EcoSpecies Modernization Roadmap

Target Product

Create a Docker Compose-based, open-source EcoSpecies successor that:

ingests legacy SLH text files and future species submissions
exposes a stable API for species, sections, citations, and ecological linkages
provides a responsive public web app
supports researcher/editor workflows for curation and publishing
generates exports aligned with legacy reporting needs and future FLELMR-style outputs

Recommended Stack

Core platform

Backend: Python API service
Primary datastore: PostgreSQL
Search/indexing: PostgreSQL full-text initially, optional Meilisearch/OpenSearch later
Frontend: static single-page app initially, with a React-based client as an option once requirements stabilize
Deployment/runtime: Docker Compose for development and small-scale deployment

Why this stack

permissive licenses
strong support for text ingestion, APIs, and data processing
easy local development
clear path from prototype to production

Product Capabilities By Phase

Phase 0: Discovery and migration planning

Inventory legacy assets and user-facing capabilities.
Capture the replacement architecture and ingestion strategy.
Define acknowledgements, provenance, and licensing boundaries.

Phase 1: Ingestion foundation

Parse legacy .txt SLH inputs into structured JSON records.
Normalize common metadata: title, scientific name, common name, FLELMR code, headings, references.
Create ingest diagnostics to flag malformed files and missing metadata.
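
The three ingestion steps above can be sketched roughly as follows. This is a minimal illustration, not the project's parser: the roadmap does not specify the SLH file layout, so the "Key: value" header lines, the all-caps section headings, the REQUIRED_FIELDS names, and the sample text are all assumptions made for the example.

```python
import json
import re

# Hypothetical metadata keys; the real SLH header layout is not defined in this roadmap.
REQUIRED_FIELDS = ("title", "scientific_name", "common_name", "flelmr_code")
HEADER_RE = re.compile(r"^(?P<key>[A-Za-z ]+):\s*(?P<value>.*)$")


def parse_slh(text):
    """Split SLH-like text into metadata, sections, and ingest diagnostics."""
    metadata = {}
    sections = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADER_RE.match(stripped)
        if current is None and match:
            # Assumed convention: "Key: value" lines before the first heading are metadata.
            key = match.group("key").strip().lower().replace(" ", "_")
            metadata[key] = match.group("value").strip()
        elif stripped and stripped == stripped.upper() and any(c.isalpha() for c in stripped):
            # Assumed convention: an all-caps line starts a new section.
            current = {"heading": stripped.title(), "body": []}
            sections.append(current)
        elif current is not None and stripped:
            current["body"].append(stripped)
    for section in sections:
        section["body"] = " ".join(section["body"])
    # Diagnostics flag missing metadata and files with no recognizable headings.
    diagnostics = [
        f"missing metadata: {name}" for name in REQUIRED_FIELDS if not metadata.get(name)
    ]
    if not sections:
        diagnostics.append("no section headings found")
    return {"metadata": metadata, "sections": sections, "diagnostics": diagnostics}


# Made-up sample input, for illustration only.
SAMPLE = """Title: Red Drum
Scientific Name: Sciaenops ocellatus
Common Name: Red drum

HABITAT
Estuaries and nearshore waters.

REFERENCES
Smith 1990.
"""

record = parse_slh(SAMPLE)
print(json.dumps(record, indent=2))
```

Returning diagnostics alongside the parsed record, rather than raising on the first problem, lets a batch run over the whole corpus report every malformed file in one pass.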

Phase 2: Public read experience

Species listing and search.
Species detail view with section navigation.
Provenance and acknowledgement display.
Summary metrics on corpus coverage.

Phase 3: Structured persistence

Move parsed content into PostgreSQL.
Add editor-safe import jobs and audit metadata.
Preserve raw source alongside normalized records.
Establish authentication and role-based access for editor and admin workflows.
Add persisted editorial workflow state for draft, review, and published records.
Make document sections individually addressable for editor review and revision, with audit history for section-level changes.
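
One way the draft, review, and published states could be enforced with role checks and a section-level audit trail is sketched below. The transition table, the role names "editor" and "admin", and the audit entry fields are assumptions for illustration; the roadmap names the states and roles but does not define the policy.

```python
from datetime import datetime, timezone
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PUBLISHED = "published"


# Assumed policy: which roles may perform each state transition.
ALLOWED = {
    (State.DRAFT, State.REVIEW): {"editor", "admin"},
    (State.REVIEW, State.DRAFT): {"editor", "admin"},
    (State.REVIEW, State.PUBLISHED): {"admin"},
    (State.PUBLISHED, State.DRAFT): {"admin"},
}


def transition(section_id, current, target, role, audit_log):
    """Return the new state, appending an audit entry, or raise if not permitted."""
    roles = ALLOWED.get((current, target))
    if roles is None:
        raise ValueError(f"transition {current.value} -> {target.value} is not allowed")
    if role not in roles:
        raise PermissionError(f"role {role!r} may not move {current.value} -> {target.value}")
    audit_log.append(
        {
            "section_id": section_id,
            "from": current.value,
            "to": target.value,
            "role": role,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return target
```

In a persisted version the audit log would be a table keyed by section, so each section's revision history can be reviewed on its own.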

Phase 4: Linkages and visualization

Model predator/prey, habitat, and ecological association edges.
Add graph endpoints and species-relationship views.
Support public-friendly visual explanations and expert filters.
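
As a rough sketch of the edge model, linkages could be stored as typed, directed edges and filtered by kind before being handed to a graph endpoint. The Linkage fields, the kind labels ("preys_on", "inhabits"), and the payload shape are hypothetical; the roadmap only names the relationship categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkage:
    source: str  # identifier of the species the edge starts from
    target: str  # identifier of the related species or habitat
    kind: str    # assumed labels, e.g. "preys_on", "inhabits", "associated_with"


def graph_payload(linkages, kinds=None):
    """Build a nodes/edges payload, optionally restricted to some edge kinds."""
    selected = [link for link in linkages if kinds is None or link.kind in kinds]
    nodes = sorted({name for link in selected for name in (link.source, link.target)})
    edges = [
        {"source": link.source, "target": link.target, "kind": link.kind}
        for link in selected
    ]
    return {"nodes": nodes, "edges": edges}


def neighbors(linkages, species, kind=None):
    """List targets directly linked from one species, optionally by edge kind."""
    return sorted(
        link.target
        for link in linkages
        if link.source == species and (kind is None or link.kind == kind)
    )
```

The kinds filter is the hook for expert filtering: a public view might request only predator/prey edges, while an expert view requests everything.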

Phase 5: Reports and export

Recreate the legacy-style plain-text and RTF exports.
Add machine-readable export formats such as JSON and Markdown.
Support FLELMR-oriented authoring/export profiles.
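
The machine-readable exports could start as small pure functions over a parsed record. The sketch below assumes a record dictionary with "title", "scientific_name", and "sections" keys; that shape is an assumption for illustration, and RTF and FLELMR-oriented profiles are not covered.

```python
import json


def to_json(record):
    """Serialize a record deterministically so exports diff cleanly."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


def to_markdown(record):
    """Render a record as Markdown: a title, the scientific name, then sections."""
    lines = [f"# {record['title']}", ""]
    if record.get("scientific_name"):
        lines += [f"*{record['scientific_name']}*", ""]
    for section in record.get("sections", []):
        lines += [f"## {section['heading']}", "", section["body"], ""]
    return "\n".join(lines).rstrip() + "\n"
```

Keeping each format in its own function leaves room to add export profiles later without touching the record model.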

Phase 6: Assisted research workflows

Add local-LLM-assisted extraction and drafting in a human-review loop.
Integrate bibliography tooling for citation consolidation.
Support candidate-species intake for records not yet in the historical corpus.
Restrict assisted drafting and publication actions to authenticated editorial roles.

Data Model Direction

Initial core entities:

species
source_document
document_section
citation
taxon
linkage
media_asset
ingest_run

Key design rules:

preserve raw source text
retain provenance and import timestamps
separate public published records from draft/editor states
make sections addressable for citation and graph linking
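
The design rules above might translate into record types along these lines. This is a sketch covering only two of the listed entities; the field names, the checksum, and the address format are assumptions, and the eventual PostgreSQL schema may look quite different.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class SourceDocument:
    """Raw source kept verbatim, with provenance and an import timestamp."""
    filename: str
    raw_text: str
    imported_at: datetime = field(default_factory=_now)

    @property
    def checksum(self):
        # A content hash lets a re-import detect whether the source changed.
        return hashlib.sha256(self.raw_text.encode("utf-8")).hexdigest()


@dataclass
class DocumentSection:
    """One normalized section, carrying its own editorial status."""
    species_slug: str
    position: int
    heading: str
    body: str
    status: str = "draft"  # assumed values: "draft", "review", "published"

    @property
    def address(self):
        # Hypothetical stable address for citation and graph linking.
        return f"{self.species_slug}#section-{self.position}"


def public_sections(sections):
    """Only published sections are visible to the public read experience."""
    return [s for s in sections if s.status == "published"]
```

Freezing SourceDocument makes the raw text immutable after import, which matches the rule that the original source is preserved rather than edited in place.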

LLM Extension Strategy

Use local models for assistive tasks only, never for silent publication:

extracting candidate structured fields from new SLH text
suggesting missing headings or linkage labels
clustering similar citations
drafting summaries for editor review

Guardrails:

raw text remains authoritative
all generated content is marked as draft
every automated extraction stores source spans where possible
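
The guardrails could be made concrete by attaching a draft flag and a source span to every model-suggested value. In the sketch below, CandidateField and record_candidate are hypothetical names, and the span is found by exact substring match, which is a simplifying assumption; a real extractor would need to report its own offsets.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CandidateField:
    name: str
    value: str
    span: Optional[Tuple[int, int]]  # (start, end) offsets into the raw text, if found
    status: str = "draft"            # generated content always starts as draft
    generated: bool = True


def record_candidate(raw_text, name, value):
    """Wrap a suggested value, locating it in the authoritative raw text if possible."""
    start = raw_text.find(value) if value else -1
    span = (start, start + len(value)) if start != -1 else None
    return CandidateField(name=name, value=value, span=span)
```

A candidate with no span is not necessarily wrong, but it cannot be traced back to the source, so an editor review queue might surface those first.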

Development Roadmap

Implement a thin ingestion API over the legacy text corpus.
Build a responsive browser UI for listing and viewing species.
Add a persistent PostgreSQL-backed ingest store.
Introduce export and visualization services.
Add editorial workflows and local-LLM assistance.

Definition Of Done For The Initial Milestone

Running docker compose up starts a working API and frontend.
The system can enumerate the legacy corpus and show parsed species detail for at least one real SLH file.
Project docs describe the migration approach, target architecture, and next phases.
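
The corpus-enumeration criterion could be checked with something as small as the sketch below. The function name and the returned fields are assumptions, and it presumes the legacy files sit under one directory with a lowercase .txt extension.

```python
from pathlib import Path


def enumerate_corpus(root):
    """List legacy .txt files under root with their relative path and size."""
    root = Path(root)
    return [
        {"path": p.relative_to(root).as_posix(), "bytes": p.stat().st_size}
        for p in sorted(root.rglob("*.txt"))  # on case-sensitive filesystems this skips .TXT
        if p.is_file()
    ]
```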