EcoSpecies Modernization Roadmap
Target Product
Create a Docker Compose-based, open-source EcoSpecies successor that:
- ingests legacy SLH text files and future species submissions
- exposes a stable API for species, sections, citations, and ecological linkages
- provides a responsive public web app
- supports researcher/editor workflows for curation and publishing
- generates exports aligned with legacy reporting needs and future FLELMR-style outputs
Recommended Stack
Core platform
- Backend: Python API service
- Primary datastore: PostgreSQL
- Search/indexing: PostgreSQL full-text initially, optional Meilisearch/OpenSearch later
- Frontend: static SPA or React-based client once requirements stabilize
- Deployment/runtime: Docker Compose for development and small-scale deployment
Why this stack
- permissive licenses
- strong support for text ingestion, APIs, and data processing
- easy local development
- clear path from prototype to production
Product Capabilities By Phase
Phase 0: Discovery and migration planning
- Inventory legacy assets and user-facing capabilities.
- Capture the replacement architecture and ingestion strategy.
- Define acknowledgements, provenance, and licensing boundaries.
Phase 1: Ingestion foundation
- Parse legacy `.txt` SLH inputs into structured JSON records.
- Normalize common metadata: title, scientific name, common name, FLELMR code, headings, references.
- Create ingest diagnostics to flag malformed files and missing metadata.
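The ingestion steps above could be sketched roughly as follows. This is a minimal illustration, not the project's parser: the legacy SLH layout is not described here, so the sketch assumes (hypothetically) that a file opens with "Key: value" metadata lines and that section headings are short upper-case lines. The names `ParsedRecord`, `parse_slh`, `HEADING_RE`, and `REQUIRED_FIELDS` are invented for the example.

```python
import json
import re
from dataclasses import asdict, dataclass, field

# Assumption: a heading is a short line written entirely in upper case.
# The real SLH layout may differ; adjust this pattern to the corpus.
HEADING_RE = re.compile(r"^[A-Z][A-Z0-9 /&()\-]{2,60}$")

# Assumption: these metadata keys are mandatory for a usable record.
REQUIRED_FIELDS = ("title", "scientific_name", "common_name")


@dataclass
class ParsedRecord:
    metadata: dict = field(default_factory=dict)
    sections: list = field(default_factory=list)
    diagnostics: list = field(default_factory=list)


def parse_slh(text: str) -> ParsedRecord:
    """Split raw text into metadata, sections, and diagnostics."""
    record = ParsedRecord()
    current = None  # section currently being filled, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if HEADING_RE.match(line):
            current = {"heading": line, "body": []}
            record.sections.append(current)
        elif current is None:
            # Before the first heading, treat "Key: value" lines as metadata.
            key, sep, value = line.partition(":")
            if sep:
                normalized = key.strip().lower().replace(" ", "_")
                record.metadata[normalized] = value.strip()
            else:
                record.diagnostics.append(f"unrecognized preamble line: {line!r}")
        else:
            current["body"].append(line)
    # Diagnostics: flag missing metadata and files without any headings.
    for name in REQUIRED_FIELDS:
        if not record.metadata.get(name):
            record.diagnostics.append(f"missing metadata: {name}")
    if not record.sections:
        record.diagnostics.append("no section headings found")
    return record


def to_json(record: ParsedRecord) -> str:
    """Serialize a parsed record as a structured JSON document."""
    return json.dumps(asdict(record), indent=2)
```

Calling `to_json(parse_slh(raw_text))` on one file yields a JSON record plus a `diagnostics` list that an ingest report can aggregate across the corpus.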
Phase 2: Public read experience
- Species listing and search.
- Species detail view with section navigation.
- Provenance and acknowledgement display.
- Summary metrics on corpus coverage.
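As one possible reading of "summary metrics on corpus coverage", the sketch below reports how many parsed records carry each metadata field. The function name `corpus_coverage` and the field keys are assumptions made for illustration; the actual metrics are still to be defined.

```python
from collections import Counter

# Assumption: records are dicts keyed by normalized metadata names.
DEFAULT_FIELDS = ("title", "scientific_name", "common_name", "flelmr_code")


def corpus_coverage(records, fields=DEFAULT_FIELDS):
    """Return the record count and the share of records with each field set."""
    total = len(records)
    present = Counter()
    for rec in records:
        for name in fields:
            if rec.get(name):
                present[name] += 1
    return {
        "total_records": total,
        "field_coverage": {
            # Guard against division by zero for an empty corpus.
            name: (present[name] / total if total else 0.0)
            for name in fields
        },
    }
```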
Phase 3: Structured persistence
- Move parsed content into PostgreSQL.
- Add editor-safe import jobs and audit metadata.
- Preserve raw source alongside normalized records.
- Establish authentication and role-based access for editor and admin workflows.
- Add persisted editorial workflow state for draft, review, and published records.
- Make document sections individually addressable for editor review and revision, with audit history for section-level changes.
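The draft, review, and published states with section-level audit history might be modeled along these lines. This is an in-memory sketch under stated assumptions: the allowed transitions, the role names in `EDITORIAL_ROLES`, and the `SectionRecord` class are hypothetical, and a real implementation would persist both state and audit rows in PostgreSQL.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class State(str, Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PUBLISHED = "published"


# Assumption: review can be sent back to draft, and a published
# section returns to draft when it is revised.
ALLOWED = {
    State.DRAFT: {State.REVIEW},
    State.REVIEW: {State.DRAFT, State.PUBLISHED},
    State.PUBLISHED: {State.DRAFT},
}

# Assumption: both roles may change state; a finer policy is likely needed.
EDITORIAL_ROLES = {"editor", "admin"}


@dataclass
class SectionRecord:
    section_id: str
    state: State = State.DRAFT
    audit: list = field(default_factory=list)

    def transition(self, new_state: State, actor: str, role: str) -> None:
        """Move to new_state if permitted, recording an audit entry."""
        if role not in EDITORIAL_ROLES:
            raise PermissionError(f"role {role!r} may not change section state")
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.audit.append({
            "from": self.state.value,
            "to": new_state.value,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.state = new_state
```

Keeping the transition table as data makes it straightforward to review the policy or to mirror it as a database constraint later.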
Phase 4: Linkages and visualization
- Model predator/prey, habitat, and ecological association edges.
- Add graph endpoints and species-relationship views.
- Support public-friendly visual explanations and expert filters.
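One way to picture the linkage model is as typed, directed edges between species or habitat identifiers. The sketch below is illustrative only: the `Linkage` fields and the edge kinds shown in the comment are assumed names, not a settled vocabulary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkage:
    source: str  # identifier of the originating species
    target: str  # identifier of the related species or habitat
    kind: str    # assumed labels, e.g. "preys_on", "occupies_habitat"


def build_adjacency(linkages):
    """Group edges by source node as (kind, target) pairs."""
    graph = defaultdict(list)
    for link in linkages:
        graph[link.source].append((link.kind, link.target))
    return dict(graph)


def neighbors(graph, node, kind=None):
    """Return targets linked from node, optionally filtered by edge kind."""
    return [target for k, target in graph.get(node, []) if kind is None or k == kind]
```

A graph endpoint could return `neighbors(graph, species_id, kind)` for expert filters, while the public view shows all edge kinds together.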
Phase 5: Reports and export
- Recreate legacy-like text/RTF export.
- Add machine-readable export formats such as JSON and Markdown.
- Support FLELMR-oriented authoring/export profiles.
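The machine-readable exports could start as two small serializers over the same record. The record shape used here (a dict with `title`, optional `scientific_name`, and a `sections` list of `heading` and `body` strings) is an assumption for the example, and the legacy text/RTF and FLELMR profiles are not covered.

```python
import json


def export_json(record: dict) -> str:
    """Serialize a species record as stable, human-readable JSON."""
    return json.dumps(record, indent=2, sort_keys=True)


def export_markdown(record: dict) -> str:
    """Render a species record as a simple Markdown document."""
    lines = [f"# {record['title']}", ""]
    if record.get("scientific_name"):
        lines += [f"*{record['scientific_name']}*", ""]
    for section in record.get("sections", []):
        lines += [f"## {section['heading']}", "", section["body"], ""]
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```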
Phase 6: Assisted research workflows
- Add local-LLM-assisted extraction and drafting in a human-review loop.
- Integrate bibliography tooling for citation consolidation.
- Support candidate-species intake for records not yet in the historical corpus.
- Restrict assisted drafting and publication actions to authenticated editorial roles.
Data Model Direction
Initial core entities:
- species
- source_document
- document_section
- citation
- taxon
- linkage
- media_asset
- ingest_run
Key design rules:
- preserve raw source text
- retain provenance and import timestamps
- separate public published records from draft/editor states
- make sections addressable for citation and graph linking
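To make the design rules concrete, here is a sketch of three of the entities as Python dataclasses. The field names, the checksum property, and the `document_id#sN` address format are assumptions for illustration; the real schema would be defined in PostgreSQL.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IngestRun:
    run_id: str
    started_at: datetime


@dataclass(frozen=True)
class SourceDocument:
    document_id: str
    raw_text: str          # raw source text is preserved unchanged
    ingest_run_id: str     # provenance: which run imported this document
    imported_at: datetime  # import timestamp

    @property
    def sha256(self) -> str:
        """Checksum of the raw text, useful for detecting re-imports."""
        return hashlib.sha256(self.raw_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DocumentSection:
    document_id: str
    position: int  # order of the section within the document
    heading: str
    body: str

    @property
    def address(self) -> str:
        """Stable address that citations and graph links can point to."""
        return f"{self.document_id}#s{self.position}"
```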
LLM Extension Strategy
Use local models only for assistive tasks, never silent publication:
- extracting candidate structured fields from new SLH text
- suggesting missing headings or linkage labels
- clustering similar citations
- drafting summaries for editor review
Guardrails:
- raw text remains authoritative
- all generated content is marked as draft
- every automated extraction stores source spans where possible
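The guardrails could be enforced in the data structure that carries model output. In the sketch below, every candidate value is created with draft status and, where the value occurs verbatim in the raw text, a character span pointing back to it. `CandidateField`, `make_candidate`, and `verify_span` are hypothetical names, and no model call is shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CandidateField:
    name: str
    value: str
    span: Optional[Tuple[int, int]]  # (start, end) offsets into the raw text
    status: str = "draft"            # generated content always starts as draft


def make_candidate(raw_text: str, name: str, value: str) -> CandidateField:
    """Wrap an extracted value, locating its source span where possible."""
    start = raw_text.find(value)
    span = (start, start + len(value)) if start >= 0 else None
    return CandidateField(name=name, value=value, span=span)


def verify_span(raw_text: str, candidate: CandidateField) -> bool:
    """Check that the stored span still matches the authoritative raw text."""
    if candidate.span is None:
        return False
    start, end = candidate.span
    return raw_text[start:end] == candidate.value
```

A candidate whose span is missing or fails verification is a natural item to surface first in editor review.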
Development Roadmap
- Implement a thin ingestion API over the legacy text corpus.
- Build a responsive browser UI for listing and viewing species.
- Add a persistent PostgreSQL-backed ingest store.
- Introduce export and visualization services.
- Add editorial workflows and local-LLM assistance.
Definition Of Done For The Initial Milestone
- `docker compose up` starts a working API and frontend.
- The system can enumerate the legacy corpus and show parsed species detail for at least one real SLH file.
- Project docs describe the migration approach, target architecture, and next phases.