# Roadmap

This document summarizes the prioritized improvement roadmap for Didactopus as a learner-facing system.

The ordering is intentional. The project should first strengthen the graph-grounded mentor loop that defines the real learner task, then use that stable backbone for local-model evaluation, accessibility work, and broader UX improvements.

## Priorities

### 1. Graph-grounded conversational mentor loop

Status: in progress

Why first:

- It defines the actual learner-facing interaction Didactopus is trying to support.
- It makes later benchmarking and accessibility work target a real session model rather than an abstract idea.
- It uses the graph and source-corpus artifacts already present in the repository.

Near-term scope:

- continue strengthening the learner session backend
- make mentor, practice, and evaluator turns consistently source-grounded
- improve trust-preserving feedback behavior
- extend the session flow beyond one short interaction
- make scientific virtues operational in the session loop by separating observation from interpretation, preserving uncertainty, and rewarding justified revision
- replace stubbed provider output in learner-facing pilot flows with configured real model backends where available
- make learner-facing guidance explicitly distinction-aware:
  - `A vs B`
  - `A does not imply B`
  - `B can occur without A`
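
One way to make guidance distinction-aware is to store each distinction as a small typed record that mentor and evaluator turns can render. The sketch below is illustrative only: `DistinctionKind`, `Distinction`, and `render` are hypothetical names and are not taken from the current Didactopus code.

```python
from dataclasses import dataclass
from enum import Enum


class DistinctionKind(Enum):
    # The three distinction shapes named in the roadmap, as format templates.
    CONTRAST = "{a} vs {b}"
    NON_IMPLICATION = "{a} does not imply {b}"
    INDEPENDENT_OCCURRENCE = "{b} can occur without {a}"


@dataclass(frozen=True)
class Distinction:
    """Hypothetical record for one distinction between two concepts."""

    kind: DistinctionKind
    a: str
    b: str

    def render(self) -> str:
        # Fill the template for this kind with the two concept labels.
        return self.kind.value.format(a=self.a, b=self.b)


example = Distinction(DistinctionKind.NON_IMPLICATION, "correlation", "causation")
print(example.render())  # correlation does not imply causation
```

Keeping the distinction kind explicit, rather than embedding it in free text, would let the same record drive both learner-facing phrasing and later Notebook distinction modeling.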

Current code anchors:

- `didactopus.learner_session`
- `didactopus.learner_session_demo`
- `didactopus.graph_retrieval`
- `didactopus.ocw_rolemesh_transcript_demo`

### 2. Local-model adequacy benchmark for constrained hardware

Status: planned

Why next:

- The learner loop should be benchmarked as soon as its task shape is stable.
- Adequate local models on low-cost hardware would materially improve access in underserved regions.
- Didactopus does not need a single perfect model; it needs role-adequate behavior.

Primary questions:

- Which models are adequate for `mentor`, `practice`, and `evaluator` roles?
- What latency, memory, and throughput are acceptable on Raspberry Pi-class hardware?
- Which roles can degrade gracefully to smaller models?

Expected outputs:

- benchmark tasks grounded in the MIT OCW pack
- per-role adequacy scores
- recommended deployment profiles for low-end, laptop, and stronger local systems
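
As a rough illustration of what per-role adequacy scoring could look like, the sketch below averages task scores and latencies per role and applies a pass threshold. The `TaskResult` fields, the `role_adequacy` function, and the default thresholds are all assumptions made for this example; the roadmap does not yet define a scoring rubric or acceptable latency.

```python
from dataclasses import dataclass
from statistics import mean

ROLES = ("mentor", "practice", "evaluator")


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical result of one benchmark task for one role."""

    role: str         # one of ROLES
    score: float      # assumed rubric score in [0.0, 1.0]
    latency_s: float  # wall-clock seconds for the model response


def role_adequacy(results, min_score=0.7, max_latency_s=30.0):
    """Summarize results per role; thresholds are placeholder assumptions."""
    report = {}
    for role in ROLES:
        rows = [r for r in results if r.role == role]
        if not rows:
            continue  # no tasks were run for this role
        mean_score = mean(r.score for r in rows)
        mean_latency = mean(r.latency_s for r in rows)
        report[role] = {
            "mean_score": mean_score,
            "mean_latency_s": mean_latency,
            "adequate": mean_score >= min_score and mean_latency <= max_latency_s,
        }
    return report


results = [
    TaskResult("mentor", 0.5, 10.0),
    TaskResult("mentor", 1.0, 20.0),
    TaskResult("evaluator", 0.5, 5.0),
]
report = role_adequacy(results)
```

Reporting adequacy per role rather than as one global score matches the stated goal of role-adequate behavior: a small model might pass for `practice` while failing for `evaluator`.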

### 3. Accessibility-first learner interaction

Status: planned

Why high priority:

- Didactopus has clear potential for learners who do not have access to enough teachers or tutors.
- Blind learners and other accessibility-focused use cases benefit directly from structured, guided interaction.
- Voice and text accessibility can build on the same learner-session backend.

Target features:

- screen-reader-friendly learner output
- accessible HTML alternatives to purely visual artifacts
- text-first navigation of concept neighborhoods and progress
- explicit structural cues in explanations and feedback

### 4. Voice interaction with local STT and TTS

Status: planned

Why after accessibility baseline:

- The project should first ensure that the session structure is accessible in text.
- Voice interaction is more useful once the mentor loop and pending-response behavior are stable.

Target features:

- speech-to-text input for learner answers
- text-to-speech output for mentor, practice, and evaluator turns
- spoken waiting notices during slow local-model responses
- repeat, interrupt, and slow-down controls

### 5. Learner workbench UI

Status: pilot in progress

Why important:

- The repository has review-focused interfaces and generated artifacts, but the learner path is still fragmented.
- A dedicated learner workbench would make Didactopus more usable as a personal mentor rather than only a pipeline/demo system.

Target features:

- current concept and why-it-matters view
- prerequisite chain and supporting lessons
- grounded source excerpts
- definitions, constraints, and qualifications view
- quote candidates and source-trail view for argumentation workflows
- active practice task
- evaluator feedback
- recommended next step
- first external pilot should use the `evidence-trail` evo-edu pack as a learner-workbench test case

Current pilot state:

- the first external pilot pack exists at `domain-packs/evidence-trail/`
- `pack_to_frontend` output is generated and copied into `webui/public/packs/evidence-trail-pack.json`, which the frontend loads as a static learner-pack payload
- a backend learner-workbench path exists in `didactopus.learner_workbench`
- the API exposes `POST /api/learner-workbench/session`
- the web UI has a launcher that separates the review workbench from the learner workbench, plus an `Evidence Trail` pilot mode
- the learner pilot exposes question, observation, interpretation, uncertainty, and revision-trigger fields directly in the UI, emphasizing question framing and observation versus interpretation
- scientific virtues are reflected in the UI framing and in backend learner-session prompt construction
- end-to-end verification succeeded locally: the API starts, the endpoint returns structured concept/session output, and the frontend/backend contract is working

Next steps:

- replace the stubbed mentor/practice/evaluator text with a configured real provider path
- enrich the `Evidence Trail` pack with grounded source fragments so returned guidance is based on more than pack metadata and pack-level structure
- persist learner-workbench session state instead of treating each call as a stateless step
- connect the pilot, including learner progress, evidence, and revision history, more directly to the standard learner-session backend
- decide which scientific-virtues framing belongs in the stable learner path versus remaining pilot-specific
- document a simple local run path and deployment notes for running the learner workbench against the local API outside development mode
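
For orientation, a client call to the `POST /api/learner-workbench/session` endpoint noted above might be built as follows. This is a minimal sketch: the base URL, the `build_session_request` helper, and every payload field name are assumptions, since the roadmap does not specify the request schema. The field names simply mirror the question, observation, interpretation, uncertainty, and revision-trigger fields the pilot UI exposes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: address of a locally running API


def build_session_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the learner-workbench endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/learner-workbench/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; the real schema may differ.
payload = {
    "question": "What does this source fragment actually show?",
    "observation": "The excerpt reports a measured change.",
    "interpretation": "The change may reflect the proposed mechanism.",
    "uncertainty": "Only one source has been consulted so far.",
    "revision_trigger": "A second source that contradicts the measurement.",
}
request = build_session_request(payload)

# To send it against a running API:
# with urllib.request.urlopen(request) as response:
#     session = json.load(response)
```

Keeping observation and interpretation as separate fields in the payload would let the backend preserve that separation when session state is eventually persisted.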

### 6. Adaptive diagnostics and practice refinement

Status: planned

Why this matters:

- Learners need clearer answers to “what am I weak at?” and “what should I do next?”
- The repository already has evidence and evaluator machinery that can be surfaced in learner terms.

Target features:

- weak-dimension summaries by concept
- misconception tracking
- remedial branch suggestions
- hint ladders and difficulty control
- oral, short-answer, and compare-and-contrast practice modes

### 7. Source-grounded citation transparency

Status: planned

Why it matters:

- Trust depends on showing what is grounded in source material and what is model inference.
- This is especially important for learners using local models with variable quality.

Target features:

- lesson and source-fragment references in explanations
- explicit distinction between cited source support and model inference
- easier inspection of concept-to-source provenance
- explicit quote marking and attribution in any public-facing output
- no unmarked source wording in public Notebook exposition

### 8. Notebook-centered knowledge layer

Status: planned

Why it matters:

- The Foundation Notebook pilot suggests that Didactopus needs one durable
  concept-network representation between raw source grounding and
  learner-facing products, and that the Notebook is that durable center rather
  than a supplemental static page format.
- Topic labels alone are too weak; broad explanatory hubs and first-ring
  concept neighborhoods work better.
- The Notebook is the right place to preserve definitions, constraints,
  qualifications, and contrasts.

Target features:

- hub-first concept organization
- first-ring and second-ring concept neighborhoods
- first-class distinction modeling:
  - `A vs B`
  - `A does not imply B`
  - `B can occur without A`
- support for source-role weighting:
  - overview
  - mechanism
  - nuance
  - controversy
  - argumentation
- support for learner-significance cues so explanation and practice can answer
  “why does this distinction matter?”
- Notebook-adjacent secondary products:
  - definitions
  - qualifications
  - constraints
  - quote candidates
- separate rendering rules for Notebook, workbench, and public exposition
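
Source-role weighting could be as simple as scaling a fragment's retrieval relevance by a per-role weight before ranking. The sketch below assumes that approach; the weight values, the tuple layout, and the `rank_fragments` name are hypothetical, and the roadmap names the roles but not how they should be weighted.

```python
# Hypothetical weights; the roadmap lists these roles but assigns no values.
SOURCE_ROLE_WEIGHTS = {
    "overview": 1.0,
    "mechanism": 1.2,
    "nuance": 0.9,
    "controversy": 0.7,
    "argumentation": 0.8,
}


def rank_fragments(fragments, role_weights=SOURCE_ROLE_WEIGHTS):
    """Sort (fragment_id, source_role, relevance) tuples by weighted relevance."""

    def weighted(fragment):
        _, role, relevance = fragment
        # Unknown roles fall back to a neutral weight of 1.0.
        return relevance * role_weights.get(role, 1.0)

    return sorted(fragments, key=weighted, reverse=True)


fragments = [
    ("f1", "controversy", 0.9),
    ("f2", "mechanism", 0.6),
    ("f3", "overview", 0.5),
]
ranked = rank_fragments(fragments)
print([fragment_id for fragment_id, _, _ in ranked])  # ['f2', 'f1', 'f3']
```

A different weight table per learner task (for example, favoring `overview` for first exposure and `nuance` for review) would be a natural extension, though that is a design choice the roadmap leaves open.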

Immediate next steps:

- promote the Foundation Notebook pilot conclusions into the stable design
  model for Didactopus
- prefer broad explanatory hubs over narrow topic labels when organizing new
  Notebook regions
- make source-role-aware retrieval available to learner workbench flows
- treat secondary products as first-class review/export outputs rather than
  incidental metadata
- connect Notebook concept neighborhoods more directly to learner-session
  grounding and practice generation
- add a project-level `.groundrecall/work-map.{json,md}` convention so active
  source roots, export roots, temp builds, and deployment targets stay easy to
  find across long-running modernization work
- extend Notebook-related terminology work into bibliography/index workflows:
  - expand TOA/CiteGeist keyword and keyphrase coverage for Notebook concepts
  - use book-index terminology as an authoritative signal for concept ranking
  - allow opposition-index terminology to raise salience without raising
    authority score
- add citation-coverage triage for public-facing pages:
  - `citation_missing`
  - `citation_thin`
  - `citation_rich`
- use visible citation blocks for pages that do not yet have full citation
  support
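
The three triage labels above could be assigned from a simple count of supporting citations per page. The following sketch assumes that rule; the `citation_status` and `needs_citation_block` names, the threshold of three, and the reading of "full citation support" as `citation_rich` are all illustrative assumptions rather than settled policy.

```python
def citation_status(citation_count: int, rich_threshold: int = 3) -> str:
    """Map a page's citation count to a triage label (threshold is an assumption)."""
    if citation_count < 0:
        raise ValueError("citation_count must be non-negative")
    if citation_count == 0:
        return "citation_missing"
    if citation_count < rich_threshold:
        return "citation_thin"
    return "citation_rich"


def needs_citation_block(status: str) -> bool:
    """Assumed rule: show a visible citation block unless support is rich."""
    return status != "citation_rich"


for count in (0, 1, 5):
    status = citation_status(count)
    print(count, status, needs_citation_block(status))
```

A real triage pass would probably also consider citation quality and source role, not only the count.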

### 8a. Timeline framework for Archive modernization

Status: planned

Why it matters:

- The Archive needs a structured chronology path for publications, court cases,
  educational milestones, and controversy events.
- A timeline is useful even before the full citation graph and Notebook link
  structure are complete.
- A timeline framework is realistic for rollout, even if deep expansion is a
  post-rollout task.

Near-term scope:

- support timeline entry types:
  - `publication`
  - `case`
  - `event`
- support multiple time granularities:
  - exact date
  - year
  - date range
  - decade
  - century
  - deep-time epoch
- seed a small set of high-value entries for public launch
- connect timeline entries to Notebook concepts, citation status, and later
  evidence-docket expansion
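
The entry types and granularities listed above map naturally onto two enumerations and one record type. The sketch below is one possible shape, not an agreed schema: the class names, the field names, and the use of a signed `start_year` as the sort key are assumptions, and the sample entries are placeholders rather than real Archive data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntryType(Enum):
    PUBLICATION = "publication"
    CASE = "case"
    EVENT = "event"


class Granularity(Enum):
    EXACT_DATE = "exact_date"
    YEAR = "year"
    DATE_RANGE = "date_range"
    DECADE = "decade"
    CENTURY = "century"
    DEEP_TIME_EPOCH = "deep_time_epoch"


@dataclass(frozen=True)
class TimelineEntry:
    """Hypothetical timeline record; start_year is signed so deep time sorts first."""

    entry_type: EntryType
    granularity: Granularity
    title: str
    start_year: int
    end_year: Optional[int] = None  # used for ranges, decades, and centuries
    iso_date: Optional[str] = None  # used only for exact dates


# Placeholder entries for illustration only.
entries = [
    TimelineEntry(EntryType.CASE, Granularity.EXACT_DATE, "Example case", 2000, iso_date="2000-01-15"),
    TimelineEntry(EntryType.EVENT, Granularity.DEEP_TIME_EPOCH, "Example epoch", -1_000_000),
    TimelineEntry(EntryType.PUBLICATION, Granularity.DECADE, "Example series", 1950, end_year=1959),
]
ordered = sorted(entries, key=lambda entry: entry.start_year)
print([entry.title for entry in ordered])  # ['Example epoch', 'Example series', 'Example case']
```

Using one integer sort key across all granularities keeps ordering simple even when exact dates and deep-time epochs appear on the same timeline.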

Longer-term scope:

- add aggregate entries for years, decades, and centuries
- add deep-time scientific chronology back through geological eras and major
  life-history milestones
- connect publications to open-access links, cites/cited-by expansion, and
  opposition-response dockets

### 9. Pack quality, review, and concept-graph curation improvements

Status: planned

Why later:

- These are important, but they mainly improve the quality of the learning substrate rather than the immediate learner interaction.
- The graph-first path should first prove out the learner experience it supports.

Target features:

- concept merge and split workflows
- alias handling across packs
- impact analysis for concept edits
- stronger review support for noisy or broad concepts
- improved source coverage QA

### 10. Incremental re-ingestion and course updates

Status: planned

Why useful:

- External course repositories are now part of the intended workflow.
- Didactopus should avoid full rebuilds when only part of a source tree changes.

Target features:

- changed-file detection
- stable concept and fragment IDs where possible
- graph and pack diffs
- preservation of learner evidence across source updates
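
Changed-file detection is commonly done by comparing content hashes between two snapshots of a source tree. The sketch below shows that general technique using only the standard library; `hash_tree` and `diff_manifests` are hypothetical helper names, and nothing here reflects how Didactopus currently ingests sources.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict:
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(old: dict, new: dict) -> dict:
    """Compare two manifests and list added, removed, and changed paths."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            path for path in old.keys() & new.keys() if old[path] != new[path]
        ),
    }


# Usage with literal manifests (digests shortened for readability).
old_manifest = {"lecture1.md": "aaa", "lecture2.md": "bbb"}
new_manifest = {"lecture1.md": "aaa", "lecture2.md": "ccc", "lecture3.md": "ddd"}
print(diff_manifests(old_manifest, new_manifest))
# {'added': ['lecture3.md'], 'removed': [], 'changed': ['lecture2.md']}
```

Only the paths under `added` and `changed` would need re-ingestion. Keeping concept and fragment IDs stable across such partial rebuilds is a separate problem that this sketch does not address.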

### 11. Richer multimodal and notation support

Status: longer-term

Why longer-term:

- This work is valuable but more specialized and technically demanding than the earlier roadmap items.

Examples:

- spoken math rendering improvements
- diagram descriptions
- accessible handling of image-heavy source materials
- EPUB and other learner-friendly export targets

## Guiding Principles

- Use the graph and source corpus before relying on model prior knowledge.
- Optimize for guided learning, not answer offloading.
- Prefer role-adequate local models over chasing a single best model.
- Keep accessibility and low-cost deployment in scope from the start, not as cleanup work.
- Preserve provenance and license compliance as first-class constraints.
- Advance the current roadmap without assuming abundant compute, fluent English, expert supervision, or mature learners.
- Treat scientific virtues as operational principles: encourage curiosity, honesty about evidence, skepticism toward weak claims, attentiveness to caveats, and revision when the evidence changes.
- Separate observation from interpretation in learner-facing guidance so the system does not blur grounded support with model inference.
- Frame revision as progress rather than as failure, especially in mentor and evaluator feedback.
- Preserve distinctions, caveats, and scope conditions as learning assets rather
  than treating them as noise.
- Treat the Notebook as the durable knowledge layer, but not as the only
  learner-facing representation.

## Suggested Implementation Sequence

1. Strengthen `didactopus.learner_session` into the standard session backend.
2. Fold the learner-workbench pilot into that backend without losing its stronger study-state framing.
3. Add a Notebook-centered operating layer with hub concepts, distinctions, and secondary products.
4. Replace stubbed learner-workbench provider output with a configured real model backend.
5. Ground the `evidence-trail` pilot and future Notebook pilots in richer source fragments, definitions, constraints, and persisted learner state.
6. Build a small model-benchmark harness around the unified learner backend.
7. Add accessible learner HTML and text-first outputs.
8. Add local TTS and STT support to the same session flow.
9. Expand adaptive practice and diagnostics.
10. Improve review, impact analysis, and incremental update support.