EcoSpecies Modernization Roadmap

Target Product

Create a Docker Compose-based, open-source EcoSpecies successor that:

  • ingests legacy SLH text files and future species submissions
  • exposes a stable API for species, sections, citations, and ecological linkages
  • provides a responsive public web app
  • supports researcher/editor workflows for curation and publishing
  • generates exports aligned with legacy reporting needs and future FLELMR-style outputs

Core platform

  • Backend: Python API service
  • Primary datastore: PostgreSQL
  • Search/indexing: PostgreSQL full-text initially, optional Meilisearch/OpenSearch later
  • Frontend: static SPA or React-based client, with the choice deferred until requirements stabilize
  • Deployment/runtime: Docker Compose for development and small-scale deployment

Why this stack

  • permissive licenses
  • strong support for text ingestion, APIs, and data processing
  • easy local development
  • clear path from prototype to production

Product Capabilities By Phase

Phase 0: Discovery and migration planning

  • Inventory legacy assets and user-facing capabilities.
  • Capture the replacement architecture and ingestion strategy.
  • Define acknowledgements, provenance, and licensing boundaries.

Phase 1: Ingestion foundation

  • Parse legacy .txt SLH inputs into structured JSON records.
  • Normalize common metadata: title, scientific name, common name, FLELMR code, headings, references.
  • Create ingest diagnostics to flag malformed files and missing metadata.
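
The parse-and-diagnose step above could look roughly like the sketch below. The real SLH layout is not specified in this roadmap, so the format is purely an assumption: `Key: value` metadata lines at the top of the file, followed by sections introduced by all-caps heading lines. The names `parse_slh`, `REQUIRED_FIELDS`, and the metadata keys are hypothetical.

```python
import re

# Hypothetical required metadata keys; the real SLH field names may differ.
REQUIRED_FIELDS = ("title", "scientific_name", "common_name", "flelmr_code")

# Assumed layout: "Key: value" metadata lines, then ALL-CAPS section headings.
META_RE = re.compile(r"^([A-Za-z][A-Za-z ]*):\s*(.*)$")
HEADING_RE = re.compile(r"^[A-Z][A-Z /&-]{2,}$")


def parse_slh(text: str) -> dict:
    """Parse one SLH-style text into metadata, sections, and diagnostics."""
    metadata = {}
    sections = []
    current = None  # section currently being filled, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if HEADING_RE.match(line):
            current = {"heading": line.title(), "body": []}
            sections.append(current)
        elif current is None and (m := META_RE.match(line)):
            # Metadata is only recognized before the first heading.
            key = m.group(1).strip().lower().replace(" ", "_")
            metadata[key] = m.group(2).strip()
        elif current is not None:
            current["body"].append(line)

    # Diagnostics flag missing metadata and files with no recognizable sections.
    diagnostics = [
        f"missing metadata: {name}"
        for name in REQUIRED_FIELDS
        if not metadata.get(name)
    ]
    if not sections:
        diagnostics.append("no section headings found")

    return {
        "metadata": metadata,
        "sections": [
            {"heading": s["heading"], "text": " ".join(s["body"])}
            for s in sections
        ],
        "diagnostics": diagnostics,
    }
```

The returned dictionary contains only strings and lists, so `json.dumps` can serialize it directly. Keeping diagnostics inside the record, rather than raising on the first problem, lets a batch run report every malformed file in one pass.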

Phase 2: Public read experience

  • Species listing and search.
  • Species detail view with section navigation.
  • Provenance and acknowledgement display.
  • Summary metrics on corpus coverage.

Phase 3: Structured persistence

  • Move parsed content into PostgreSQL.
  • Add editor-safe import jobs and audit metadata.
  • Preserve raw source alongside normalized records.
  • Establish authentication and role-based access for editor and admin workflows.
  • Add persisted editorial workflow state for draft, review, and published records.
  • Make document sections individually addressable for editor review and revision, with audit history for section-level changes.
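
One possible shape for the draft, review, and published workflow state is a small state machine that records an audit entry on every change. The roadmap names the states and the editor and admin roles but not the allowed transitions, so the `TRANSITIONS` table below, along with the `SectionRecord` class and its fields, is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class State(str, Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PUBLISHED = "published"


# Assumed rules: (from, to) -> roles allowed to perform the move.
TRANSITIONS = {
    (State.DRAFT, State.REVIEW): {"editor", "admin"},
    (State.REVIEW, State.DRAFT): {"editor", "admin"},
    (State.REVIEW, State.PUBLISHED): {"admin"},
    (State.PUBLISHED, State.DRAFT): {"admin"},
}


@dataclass
class SectionRecord:
    """Hypothetical per-section workflow record with an audit trail."""
    section_id: str
    state: State = State.DRAFT
    audit: list = field(default_factory=list)

    def transition(self, target: State, user: str, role: str) -> None:
        allowed = TRANSITIONS.get((self.state, target))
        if allowed is None:
            raise ValueError(f"{self.state.value} -> {target.value} is not allowed")
        if role not in allowed:
            raise PermissionError(f"role {role!r} may not move to {target.value}")
        # Append the audit entry before changing state so both ends are captured.
        self.audit.append({
            "from": self.state.value,
            "to": target.value,
            "user": user,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.state = target
```

In a PostgreSQL-backed version the audit list would more likely be its own table keyed by section, but the transition check would stay the same.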

Phase 4: Linkages and visualization

  • Model predator/prey, habitat, and ecological association edges.
  • Add graph endpoints and species-relationship views.
  • Support public-friendly visual explanations and expert filters.
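
As a rough illustration of the edge model, linkages can be treated as typed, directed edges and indexed by source node, which is the lookup a graph endpoint would need. The `Linkage` fields, the edge kind labels, and the helper names are assumptions, not a settled schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkage:
    source: str  # identifier of a species or habitat node
    target: str
    kind: str    # hypothetical labels, e.g. "preys_on", "occurs_in"


def build_index(edges):
    """Group edges by source node so neighbors can be looked up directly."""
    index = defaultdict(list)
    for edge in edges:
        index[edge.source].append(edge)
    return index


def neighbors(index, node, kind=None):
    """Return targets linked from `node`, optionally filtered by edge kind."""
    return [
        edge.target
        for edge in index.get(node, [])
        if kind is None or edge.kind == kind
    ]
```

The optional `kind` filter is one way to serve both audiences from the same data: a public view can request a single relationship type, while an expert view can request everything.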

Phase 5: Reports and export

  • Recreate legacy-like text/RTF export.
  • Add machine-readable export formats such as JSON and Markdown.
  • Support FLELMR-oriented authoring/export profiles.
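
A minimal sketch of the machine-readable exports, assuming a parsed record is a dictionary with `title`, `scientific_name`, and a list of `sections` (each with `heading` and `text`). That record shape and the function names are hypothetical. The legacy text/RTF and FLELMR profiles are not covered here.

```python
import json


def to_json(record: dict) -> str:
    """Serialize a record as stable, human-readable JSON."""
    return json.dumps(record, indent=2, ensure_ascii=False, sort_keys=True)


def to_markdown(record: dict) -> str:
    """Render a record as Markdown: title, scientific name, then sections."""
    lines = [f"# {record['title']}", ""]
    if record.get("scientific_name"):
        lines += [f"*{record['scientific_name']}*", ""]
    for section in record.get("sections", []):
        lines += [f"## {section['heading']}", "", section["text"], ""]
    # End the document with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

Sorting keys in the JSON output keeps exports diff-friendly, which helps when comparing successive ingest runs.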

Phase 6: Assisted research workflows

  • Add local-LLM-assisted extraction and drafting in a human-review loop.
  • Integrate bibliography tooling for citation consolidation.
  • Support candidate-species intake for records not yet in the historical corpus.
  • Restrict assisted drafting and publication actions to authenticated editorial roles.

Data Model Direction

Initial core entities:

  • species
  • source_document
  • document_section
  • citation
  • taxon
  • linkage
  • media_asset
  • ingest_run

Key design rules:

  • preserve raw source text
  • retain provenance and import timestamps
  • separate public published records from draft/editor states
  • make sections addressable for citation and graph linking
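
The design rules above can be sketched with a few of the core entities as plain dataclasses. Only the entity names come from the list above. Every field, the `address` format, and the `published` helper are assumptions meant to show how the rules might map onto a schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IngestRun:
    run_id: str
    # Import timestamp retained for provenance.
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class SourceDocument:
    doc_id: str
    raw_text: str       # preserved verbatim alongside normalized records
    source_path: str    # provenance: where the file came from
    ingest_run_id: str  # provenance: which import produced this row


@dataclass
class DocumentSection:
    doc_id: str
    slug: str              # e.g. "habitat"
    text: str
    status: str = "draft"  # draft/editor state kept apart from published

    @property
    def address(self) -> str:
        """Stable address that citations and graph links can point to."""
        return f"{self.doc_id}#{self.slug}"


def published(sections):
    """Public views read only sections that have been published."""
    return [s for s in sections if s.status == "published"]
```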

LLM Extension Strategy

Use local models only for assistive tasks, never for silent publication:

  • extracting candidate structured fields from new SLH text
  • suggesting missing headings or linkage labels
  • clustering similar citations
  • drafting summaries for editor review

Guardrails:

  • raw text remains authoritative
  • all generated content is marked as draft
  • every automated extraction stores source spans where possible
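
One way to make these guardrails concrete is to store each model suggestion with its character offsets into the raw text and a default status of draft. The `Extraction` type and `make_extraction` helper are hypothetical, and the exact-substring lookup is a simplifying assumption. A real extractor might report offsets itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Extraction:
    field_name: str
    value: str
    span: Optional[Tuple[int, int]]  # (start, end) offsets into the raw text
    status: str = "draft"            # generated content always starts as draft


def make_extraction(raw_text: str, field_name: str, value: str) -> Extraction:
    """Wrap a suggested value, attaching a source span when one can be found."""
    start = raw_text.find(value)
    span = (start, start + len(value)) if start != -1 else None
    return Extraction(field_name, value, span)
```

A suggestion whose span is `None` could not be located in the source, which gives editors a simple signal for what to check first.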

Development Roadmap

  1. Implement a thin ingestion API over the legacy text corpus.
  2. Build a responsive browser UI for listing and viewing species.
  3. Add a persistent PostgreSQL-backed ingest store.
  4. Introduce export and visualization services.
  5. Add editorial workflows and local-LLM assistance.

Definition Of Done For The Initial Milestone

  • Running `docker compose up` starts a working API and frontend.
  • The system can enumerate the legacy corpus and show parsed species detail for at least one real SLH file.
  • Project docs describe the migration approach, target architecture, and next phases.