# EcoSpecies Modernization Roadmap

## Target Product

Create a Docker Compose-based, open-source EcoSpecies successor that:

- ingests legacy SLH text files and future species submissions
- exposes a stable API for species, sections, citations, and ecological linkages
- provides a responsive public web app
- supports researcher/editor workflows for curation and publishing
- generates exports aligned with legacy reporting needs and future FLELMR-style outputs

## Recommended Stack

### Core platform

- Backend: Python API service
- Primary datastore: PostgreSQL
- Search/indexing: PostgreSQL full-text initially, optional Meilisearch/OpenSearch later
- Frontend: static SPA or React-based client once requirements stabilize
- Deployment/runtime: Docker Compose for development and small-scale deployment

### Why this stack

- permissive licenses
- strong support for text ingestion, APIs, and data processing
- easy local development
- clear path from prototype to production

## Product Capabilities By Phase

### Phase 0: Discovery and migration planning

- Inventory legacy assets and user-facing capabilities.
- Capture the replacement architecture and ingestion strategy.
- Define acknowledgements, provenance, and licensing boundaries.

### Phase 1: Ingestion foundation

- Parse legacy `.txt` SLH inputs into structured JSON records.
- Normalize common metadata: title, scientific name, common name, FLELMR code, headings, references.
- Create ingest diagnostics to flag malformed files and missing metadata.

### Phase 2: Public read experience

- Species listing and search.
- Species detail view with section navigation.
- Provenance and acknowledgement display.
- Summary metrics on corpus coverage.

### Phase 3: Structured persistence

- Move parsed content into PostgreSQL.
- Add editor-safe import jobs and audit metadata.
- Preserve raw source alongside normalized records.
- Establish authentication and role-based access for editor and admin workflows.
- Add persisted editorial workflow state for draft, review, and published records.
- Make document sections individually addressable for editor review and revision, with audit history for section-level changes.

### Phase 4: Linkages and visualization

- Model predator/prey, habitat, and ecological association edges.
- Add graph endpoints and species-relationship views.
- Support public-friendly visual explanations and expert filters.

### Phase 5: Reports and export

- Recreate legacy-like text/RTF export.
- Add machine-readable export formats such as JSON and Markdown.
- Support FLELMR-oriented authoring/export profiles.

### Phase 6: Assisted research workflows

- Add local-LLM-assisted extraction and drafting in a human-review loop.
- Integrate bibliography tooling for citation consolidation.
- Support candidate-species intake for records not yet in the historical corpus.
- Restrict assisted drafting and publication actions to authenticated editorial roles.

## Data Model Direction

Initial core entities:

- `species`
- `source_document`
- `document_section`
- `citation`
- `taxon`
- `linkage`
- `media_asset`
- `ingest_run`

Key design rules:

- preserve raw source text
- retain provenance and import timestamps
- separate public published records from draft/editor states
- make sections addressable for citation and graph linking

## LLM Extension Strategy

Use local models only for assistive tasks, never silent publication:

- extracting candidate structured fields from new SLH text
- suggesting missing headings or linkage labels
- clustering similar citations
- drafting summaries for editor review

Guardrails:

- raw text remains authoritative
- all generated content is marked as draft
- every automated extraction stores source spans where possible

## Development Roadmap

1. Implement a thin ingestion API over the legacy text corpus.
2. Build a responsive browser UI for listing and viewing species.
3. Add a persistent PostgreSQL-backed ingest store.
4. Introduce export and visualization services.
5. Add editorial workflows and local-LLM assistance.

## Definition Of Done For The Initial Milestone

- `docker compose up` starts a working API and frontend.
- The system can enumerate the legacy corpus and show parsed species detail for at least one real SLH file.
- Project docs describe the migration approach, target architecture, and next phases.
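
As a rough illustration of the ingestion step this milestone depends on (Phase 1: parse `.txt` SLH inputs into structured JSON and flag missing metadata), the sketch below may help frame the work. The actual SLH file layout is not specified in this roadmap, so the `Key: value` metadata lines, the all-caps heading rule, and the names `parse_slh`, `ParsedRecord`, and `METADATA_KEYS` are all assumptions made for the example, not the project's parser.

```python
import json
import re
from dataclasses import asdict, dataclass, field

# ASSUMPTION: metadata appears as "Key: value" lines before the first heading,
# and section headings are short ALL-CAPS lines. Real SLH files may differ.
METADATA_KEYS = {
    "title": "title",
    "scientific name": "scientific_name",
    "common name": "common_name",
    "flelmr code": "flelmr_code",
}
META_RE = re.compile(r"^([A-Za-z ]+):\s*(.*)$")


def is_heading(line: str) -> bool:
    """Hypothetical heading rule: short, contains letters, no lowercase."""
    text = line.strip()
    return (
        bool(text)
        and len(text) <= 60
        and text.upper() == text
        and any(ch.isalpha() for ch in text)
    )


@dataclass
class ParsedRecord:
    metadata: dict = field(default_factory=dict)
    sections: list = field(default_factory=list)     # {"heading": ..., "body": ...}
    diagnostics: list = field(default_factory=list)  # human-readable ingest warnings
    raw_text: str = ""                               # raw source is kept alongside


def parse_slh(raw_text: str) -> ParsedRecord:
    """Split one legacy text file into metadata, sections, and diagnostics."""
    record = ParsedRecord(raw_text=raw_text)
    current = None
    for line in raw_text.splitlines():
        match = META_RE.match(line.strip())
        key = match.group(1).strip().lower() if match else None
        if current is None and key in METADATA_KEYS:
            record.metadata[METADATA_KEYS[key]] = match.group(2).strip()
        elif is_heading(line):
            current = {"heading": line.strip(), "body": []}
            record.sections.append(current)
        elif current is not None:
            current["body"].append(line)
        # Other lines before the first heading are ignored in this sketch.
    for section in record.sections:
        section["body"] = "\n".join(section["body"]).strip()
    for name in METADATA_KEYS.values():
        if name not in record.metadata:
            record.diagnostics.append(f"missing metadata: {name}")
    if not record.sections:
        record.diagnostics.append("no section headings found")
    return record


if __name__ == "__main__":
    # Placeholder input, not a real SLH file.
    sample = "Title: Example Species\n\nHABITAT\nPlaceholder body text.\n"
    print(json.dumps(asdict(parse_slh(sample)), indent=2))
```

Keeping `raw_text` on the record follows the design rule that raw source remains authoritative, and returning diagnostics as data rather than raising errors lets an ingest run report on every file in the corpus, including malformed ones.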