Roadmap

This document summarizes the current prioritized improvement roadmap for Didactopus as a learner-facing system.

The ordering is intentional. The project should first strengthen the graph-grounded mentor loop that defines the real learner task, then use that stable backbone for local-model evaluation, accessibility work, and broader UX improvements.

Priorities

1. Graph-grounded conversational mentor loop

Status: in progress

Why first:

  • It defines the actual learner-facing interaction Didactopus is trying to support.
  • It makes later benchmarking and accessibility work target a real session model rather than an abstract idea.
  • It uses the graph and source-corpus artifacts already present in the repository.

Near-term scope:

  • continue strengthening the learner session backend
  • make mentor, practice, and evaluator turns consistently source-grounded
  • improve trust-preserving feedback behavior
  • extend the session flow beyond one short interaction
  • make scientific virtues operational in the session loop by separating observation from interpretation, preserving uncertainty, and rewarding justified revision
  • replace stubbed provider output in learner-facing pilot flows with configured real model backends where available
  • make learner-facing guidance explicitly distinction-aware:
    • A vs B
    • A does not imply B
    • B can occur without A
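
One lightweight way to make those three distinction patterns operational is to represent them as explicit records that mentor and evaluator turns can consume. The sketch below is illustrative only; `DistinctionKind` and `Distinction` are not existing didactopus names.

```python
# Minimal sketch of distinction-aware guidance records.
# All names here are illustrative, not existing didactopus identifiers.
from dataclasses import dataclass
from enum import Enum

class DistinctionKind(Enum):
    CONTRAST = "A vs B"                     # the two concepts differ
    NO_IMPLICATION = "A does not imply B"   # A alone is not sufficient for B
    INDEPENDENCE = "B can occur without A"  # A is not necessary for B

@dataclass(frozen=True)
class Distinction:
    concept_a: str
    concept_b: str
    kind: DistinctionKind
    note: str  # learner-facing explanation of why the distinction matters

    def prompt_line(self) -> str:
        """Render the distinction as a cue a mentor turn could include."""
        return f"{self.kind.value}: {self.concept_a} / {self.concept_b} ({self.note})"
```

Keeping the kind explicit, rather than burying it in free text, lets later practice and evaluator turns check whether a learner has actually internalized the distinction.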

Current code anchors:

  • didactopus.learner_session
  • didactopus.learner_session_demo
  • didactopus.graph_retrieval
  • didactopus.ocw_rolemesh_transcript_demo

2. Local-model adequacy benchmark for constrained hardware

Status: planned

Why next:

  • The learner loop should be benchmarked as soon as its task shape is stable.
  • Adequate local models on low-cost hardware would materially improve access in underserved regions.
  • Didactopus does not need a single perfect model; it needs role-adequate behavior.

Primary questions:

  • Which models are adequate for mentor, practice, and evaluator roles?
  • What latency, memory, and throughput are acceptable on Raspberry Pi-class hardware?
  • Which roles can degrade gracefully to smaller models?

Expected outputs:

  • benchmark tasks grounded in the MIT OCW pack
  • per-role adequacy scores
  • recommended deployment profiles for low-end, laptop, and stronger local systems

3. Accessibility-first learner interaction

Status: planned

Why high priority:

  • Didactopus has clear potential for learners who do not have access to enough teachers or tutors.
  • Blind learners and other accessibility-focused use cases benefit directly from structured, guided interaction.
  • Voice and text accessibility can build on the same learner-session backend.

Target features:

  • screen-reader-friendly learner output
  • accessible HTML alternatives to purely visual artifacts
  • text-first navigation of concept neighborhoods and progress
  • explicit structural cues in explanations and feedback

4. Voice interaction with local STT and TTS

Status: planned

Why after accessibility baseline:

  • The project should first ensure that the session structure is accessible in text.
  • Voice interaction is more useful once the mentor loop and pending-response behavior are stable.

Target features:

  • speech-to-text input for learner answers
  • text-to-speech output for mentor, practice, and evaluator turns
  • spoken waiting notices during slow local-model responses
  • repeat, interrupt, and slow-down controls

5. Learner workbench UI

Status: pilot in progress

Why important:

  • The repository has review-focused interfaces and generated artifacts, but the learner path is still fragmented.
  • A dedicated learner workbench would make Didactopus more usable as a personal mentor rather than only a pipeline/demo system.

Target features:

  • current concept and why-it-matters view
  • prerequisite chain and supporting lessons
  • grounded source excerpts
  • definitions, constraints, and qualifications view
  • quote candidates and source-trail view for argumentation workflows
  • active practice task
  • evaluator feedback
  • recommended next step
  • validate the workbench against the evidence-trail evo-edu pack, the first external pilot test case

Current pilot state:

  • a backend learner-workbench path exists in didactopus.learner_workbench, and the API exposes POST /api/learner-workbench/session
  • the first external pilot pack exists at domain-packs/evidence-trail/
  • pack_to_frontend output is generated and copied into webui/public/packs/evidence-trail-pack.json, and the frontend can load this static payload
  • the web UI has a launcher that separates the review workbench from the learner workbench, plus an Evidence Trail pilot mode
  • the pilot exposes question, observation, interpretation, uncertainty, and revision-trigger fields directly in the UI
  • scientific virtues are reflected in the UI framing and in backend learner-session prompt construction
  • end-to-end verification succeeded locally: the API starts, the endpoint returns structured concept/session output, and the frontend/backend contract works

Immediate next steps:

  • replace the current stubbed mentor/practice/evaluator text with a configured real provider path
  • enrich the Evidence Trail pack with grounded source fragments so returned guidance rests on more than pack metadata
  • persist learner-session state instead of treating each call as a stateless step
  • connect learner progress, evidence, and revision history to the standard backend session model
  • decide which scientific-virtues framing belongs in the stable learner path versus remaining pilot-specific
  • document a simple local run path for the learner workbench against the local API outside development mode

6. Adaptive diagnostics and practice refinement

Status: planned

Why this matters:

  • Learners need clearer answers to “what am I weak at?” and “what should I do next?”
  • The repository already has evidence and evaluator machinery that can be surfaced in learner terms.

Target features:

  • weak-dimension summaries by concept
  • misconception tracking
  • remedial branch suggestions
  • hint ladders and difficulty control
  • oral, short-answer, and compare-and-contrast practice modes

7. Source-grounded citation transparency

Status: planned

Why it matters:

  • Trust depends on showing what is grounded in source material and what is model inference.
  • This is especially important for learners using local models with variable quality.

Target features:

  • lesson and source-fragment references in explanations
  • explicit distinction between cited source support and model inference
  • easier inspection of concept-to-source provenance
  • explicit quote marking and attribution in any public-facing output
  • no unmarked source wording in public Notebook exposition

8. Notebook-centered knowledge layer

Status: planned

Why it matters:

  • The Foundation Notebook pilot suggests that Didactopus needs one durable concept-network representation between raw source grounding and learner-facing products, not just a supplemental static page format.
  • Topic labels alone are too weak; broad explanatory hubs and first-ring concept neighborhoods work better.
  • The Notebook is the right place to preserve definitions, constraints, qualifications, and contrasts.

Target features:

  • hub-first concept organization
  • first-ring and second-ring concept neighborhoods
  • first-class distinction modeling:
    • A vs B
    • A does not imply B
    • B can occur without A
  • support for source-role weighting:
    • overview
    • mechanism
    • nuance
    • controversy
    • argumentation
  • support for learner-significance cues so explanation and practice can answer “why does this distinction matter?”
  • Notebook-adjacent secondary products:
    • definitions
    • qualifications
    • constraints
    • quote candidates
  • separate rendering rules for Notebook, workbench, and public exposition
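
The target features above could hang off a single entry type per concept, combining hub-first organization, ring neighborhoods, source-role weighting, and the secondary products. The sketch below is an assumed shape, not an existing didactopus schema:

```python
# Sketch of a Notebook entry covering the target features above.
# All field names are assumptions, not an existing didactopus schema.
from dataclasses import dataclass, field

SOURCE_ROLES = {"overview", "mechanism", "nuance", "controversy", "argumentation"}

@dataclass
class NotebookEntry:
    concept: str
    is_hub: bool = False
    first_ring: list[str] = field(default_factory=list)
    second_ring: list[str] = field(default_factory=list)
    # fragment id -> source role, constrained to the roles listed above
    source_roles: dict[str, str] = field(default_factory=dict)
    definitions: list[str] = field(default_factory=list)
    qualifications: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    quote_candidates: list[str] = field(default_factory=list)

    def add_source(self, fragment_id: str, role: str) -> None:
        """Attach a source fragment under one of the fixed roles."""
        if role not in SOURCE_ROLES:
            raise ValueError(f"unknown source role: {role}")
        self.source_roles[fragment_id] = role
```

Keeping the secondary products as first-class fields, rather than incidental metadata, is what lets review and export flows treat them as outputs in their own right.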

Immediate next steps:

  • promote the Foundation Notebook pilot conclusions into the stable design model for Didactopus
  • prefer broad explanatory hubs over narrow topic labels when organizing new Notebook regions
  • make source-role-aware retrieval available to learner workbench flows
  • treat secondary products as first-class review/export outputs rather than incidental metadata
  • connect Notebook concept neighborhoods more directly to learner-session grounding and practice generation

9. Pack quality, review, and concept-graph curation improvements

Status: planned

Why later:

  • These are important, but they mainly improve the quality of the learning substrate rather than the immediate learner interaction.
  • The graph-first path should first prove out the learner experience it supports.

Target features:

  • concept merge and split workflows
  • alias handling across packs
  • impact analysis for concept edits
  • stronger review support for noisy or broad concepts
  • improved source coverage QA

10. Incremental re-ingestion and course updates

Status: planned

Why useful:

  • External course repositories are now part of the intended workflow.
  • Didactopus should avoid full rebuilds when only part of a source tree changes.

Target features:

  • changed-file detection
  • stable concept and fragment IDs where possible
  • graph and pack diffs
  • preservation of learner evidence across source updates
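
Changed-file detection can be as simple as comparing content digests against a manifest stored from the previous ingestion run. A sketch, assuming a manifest keyed by path relative to the source root:

```python
# Sketch of changed-file detection via content hashing, so re-ingestion
# can skip unchanged source files. The manifest format is an assumption.
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(root: Path, manifest: dict[str, str]) -> list[Path]:
    """Files under root whose digest differs from (or is absent in) manifest.

    manifest maps path strings (relative to root) to previous digests.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = str(path.relative_to(root))
        if manifest.get(key) != file_digest(path):
            changed.append(path)
    return changed
```

Stable concept and fragment IDs would then let the pipeline re-process only the changed subset while preserving learner evidence attached to everything else.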

11. Richer multimodal and notation support

Status: longer-term

Why longer-term:

  • This work is valuable but more specialized and technically demanding than the earlier roadmap items.

Examples:

  • spoken math rendering improvements
  • diagram descriptions
  • accessible handling of image-heavy source materials
  • EPUB and other learner-friendly export targets

Guiding Principles

  • Use the graph and source corpus before relying on model prior knowledge.
  • Optimize for guided learning, not answer offloading.
  • Prefer role-adequate local models over chasing a single best model.
  • Keep accessibility and low-cost deployment in scope from the start, not as cleanup work.
  • Preserve provenance and license compliance as first-class constraints.
  • Advance the current roadmap without assuming abundant compute, fluent English, expert supervision, or mature learners.
  • Treat scientific virtues as operational principles: encourage curiosity, honesty about evidence, skepticism toward weak claims, attentiveness to caveats, and revision when the evidence changes.
  • Separate observation from interpretation in learner-facing guidance so the system does not blur grounded support with model inference.
  • Frame revision as progress rather than as failure, especially in mentor and evaluator feedback.
  • Preserve distinctions, caveats, and scope conditions as learning assets rather than treating them as noise.
  • Treat the Notebook as the durable knowledge layer, but not as the only learner-facing representation.

Suggested Implementation Sequence

  1. Strengthen didactopus.learner_session into the standard session backend.
  2. Fold the learner-workbench pilot into that backend without losing its stronger study-state framing.
  3. Add a Notebook-centered operating layer with hub concepts, distinctions, and secondary products.
  4. Replace stubbed learner-workbench provider output with a configured real model backend.
  5. Ground the evidence-trail pilot and future Notebook pilots in richer source fragments, definitions, constraints, and persisted learner state.
  6. Build a small model-benchmark harness around the unified learner backend.
  7. Add accessible learner HTML and text-first outputs.
  8. Add local TTS and STT support to the same session flow.
  9. Expand adaptive practice and diagnostics.
  10. Improve review, impact analysis, and incremental update support.