Multi-Source Ingestion
The multi-source ingestion layer lets Didactopus build one draft domain pack from several heterogeneous inputs describing the same course or topic.
Why this matters
Real course material is often scattered across:
- syllabus files
- lesson notes
- transcripts
- assignment sheets
- HTML pages
- supplemental markdown
A parser that reads only one of these sources misses the rest, which makes it too narrow for serious curriculum distillation.
Pipeline
- detect adapter by file extension or naming convention
- normalize each source into a NormalizedSourceRecord
- merge sources into a NormalizedCourse
- extract concept candidates
- run rule-policy passes
- emit merged draft pack
- emit conflict report and attribution manifest
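The first three pipeline steps can be sketched roughly as follows. This is a minimal illustration, not the Didactopus implementation: only the names NormalizedSourceRecord and NormalizedCourse come from the text above, while their fields, the adapter names, the lookup tables, and the helper functions `detect_adapter`, `normalize`, and `merge` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical extension-to-adapter table; the real registry may differ.
EXTENSION_ADAPTERS = {
    ".md": "markdown",
    ".html": "html",
    ".htm": "html",
    ".txt": "text",
}

# Hypothetical naming conventions, checked before the extension.
NAME_HINTS = {
    "syllabus": "syllabus",
    "transcript": "transcript",
    "assignment": "assignment",
}


def detect_adapter(path: str) -> str:
    """Pick an adapter by naming convention first, then by file extension."""
    p = Path(path)
    stem = p.stem.lower()
    for hint, adapter in NAME_HINTS.items():
        if hint in stem:
            return adapter
    # Fall back to plain text when the extension is unknown.
    return EXTENSION_ADAPTERS.get(p.suffix.lower(), "text")


@dataclass
class NormalizedSourceRecord:
    # Field names are assumptions, not the project's actual schema.
    source_path: str
    adapter: str
    title: str
    text: str


@dataclass
class NormalizedCourse:
    title: str
    records: list[NormalizedSourceRecord] = field(default_factory=list)


def normalize(path: str, text: str) -> NormalizedSourceRecord:
    """Turn one raw source into a record, using its first non-empty line as title."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0].lstrip("# ").strip() if lines else Path(path).stem
    return NormalizedSourceRecord(path, detect_adapter(path), title, text)


def merge(course_title: str, records: list[NormalizedSourceRecord]) -> NormalizedCourse:
    """Collect all records for one course, keeping the source path for attribution."""
    return NormalizedCourse(title=course_title, records=list(records))
```

Keeping the source path on every record is what would let a later stage build an attribution manifest, since each merged item can still be traced back to the file it came from.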
Conflict report categories
- duplicate lesson titles across sources
- repeated key terms with different local contexts
- modules with no explicit exercises
- project-like content needing manual review
- lessons with thin mastery signals
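Two of these categories are simple enough to illustrate in a few lines. The sketch below is hypothetical: the function names and the input shapes (pairs of source name and lesson title, and a mapping from module name to exercises) are assumptions chosen for the example, not the project's real data model.

```python
from collections import defaultdict


def find_duplicate_lesson_titles(lessons):
    """Report lesson titles that appear in more than one source.

    `lessons` is an iterable of (source_name, lesson_title) pairs.
    Titles are compared case-insensitively with whitespace collapsed.
    """
    sources_by_title = defaultdict(set)
    for source, title in lessons:
        key = " ".join(title.lower().split())
        sources_by_title[key].add(source)
    return {
        title: sorted(sources)
        for title, sources in sources_by_title.items()
        if len(sources) > 1
    }


def find_modules_without_exercises(modules):
    """Report modules whose exercise list is empty.

    `modules` maps a module name to a list of exercise descriptions.
    """
    return sorted(name for name, exercises in modules.items() if not exercises)
```

For example, passing the pairs `("syllabus.md", "Intro to Probability")` and `("notes.md", "intro to  probability")` to `find_duplicate_lesson_titles` yields one entry keyed by the normalized title and listing both sources, which a reviewer could then resolve by hand.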