Added cross-course merger.

welsberr 2026-03-13 06:36:27 -04:00
parent 8defaab1c2
commit 0656f7bbe8
31 changed files with 753 additions and 90 deletions

View File

@ -8,6 +8,41 @@
## Recent revisions
### Course-to-course merger
This revision adds two major capabilities:
- **real document adapter scaffolds** for PDF, DOCX, PPTX, and HTML
- a **cross-course merger** for combining multiple course-derived packs into one stronger domain draft
These additions extend the earlier multi-source ingestion layer from "multiple files for one course"
to "multiple courses or course-like sources for one topic domain."
## What is included
- adapter registry for:
- PDF
- DOCX
- PPTX
- HTML
- Markdown
- text
- normalized document extraction interface
- course bundle ingestion across multiple source documents
- cross-course terminology and overlap analysis
- merged topic-pack emitter
- cross-course conflict report
- example source files and example merged output
## Design stance
This is still scaffold-level extraction. The purpose is to define stable interfaces and emitted artifacts,
not to claim perfect semantic parsing of every teaching document.
The implementation is designed so stronger parsers can later replace the stub extractors without changing
the surrounding pipeline.
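The replace-the-stub design can be sketched as a small extractor registry. The names below (`stub_pdf_extract`, `register_extractor`, `EXTRACTORS`) are illustrative, not the shipped API; the point is only that swapping a stub for a real parser never touches callers:

```python
from typing import Callable

def stub_pdf_extract(raw: bytes) -> str:
    # Stand-in: a real implementation would parse the PDF stream properly.
    return raw.decode("utf-8", errors="ignore")

# All extractors live behind one registry keyed by file suffix.
EXTRACTORS: dict[str, Callable[[bytes], str]] = {".pdf": stub_pdf_extract}

def register_extractor(suffix: str, fn: Callable[[bytes], str]) -> None:
    # Later: register_extractor(".pdf", real_pdf_extract) once one exists.
    EXTRACTORS[suffix] = fn

def extract(suffix: str, raw: bytes) -> str:
    return EXTRACTORS[suffix](raw)
```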
### Multi-Source Course Ingestion
This revision adds a **Multi-Source Course Ingestion Layer**.
@ -216,3 +251,4 @@ didactopus/

View File

@ -1,16 +1,19 @@
document_adapters:
  allow_pdf: true
  allow_docx: true
  allow_pptx: true
  allow_html: true
  allow_markdown: true
  allow_text: true
course_ingest:
  default_pack_author: "Wesley R. Elsberry"
  default_license: "REVIEW-REQUIRED"
  min_term_length: 4
  max_terms_per_lesson: 8
cross_course:
  detect_title_overlaps: true
  detect_term_conflicts: true
  detect_order_conflicts: true
  merge_same_named_lessons: true

View File

@ -0,0 +1,31 @@
# Cross-Course Merger
The cross-course merger combines multiple course-like inputs covering the same subject area.
## Goal
Build a stronger draft topic pack from several partially overlapping sources.
## What it does
- merges normalized source records into course bundles
- merges course bundles into one topic bundle
- compares repeated concepts across courses
- flags terminology conflicts and overlap
- emits a merged draft pack
- emits a cross-course conflict report
## Why this matters
A single course is rarely ideal for mastery-oriented domain construction.
Combining multiple sources can improve:
- concept coverage
- exercise diversity
- project identification
- terminology mapping
- prerequisite robustness
## Important caveat
This merger is draft-oriented.
Human review remains necessary before trusting the result as a final domain pack.
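As a rough illustration of the draft-merge idea (the function and data shapes here are hypothetical, not the shipped API): overlapping concept descriptions can be unioned under a normalized key, keeping the longest draft text and flagging single-source concepts for review.

```python
def merge_concept_maps(courses: list[dict[str, str]]) -> tuple[dict[str, str], list[str]]:
    # Hypothetical sketch: each course maps concept title -> description.
    merged: dict[str, str] = {}
    counts: dict[str, int] = {}
    for course in courses:
        for title, desc in course.items():
            key = title.strip().lower()
            counts[key] = counts.get(key, 0) + 1
            # Keep the longer description as the draft text.
            if len(desc) > len(merged.get(key, "")):
                merged[key] = desc
    review = [f"'{key}' appears in only one source" for key, n in sorted(counts.items()) if n == 1]
    return merged, review
```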

docs/document-adapters.md Normal file
View File

@ -0,0 +1,42 @@
# Document Adapters
Didactopus now includes adapter scaffolds for several common educational document types.
## Supported adapter interfaces
- PDF adapter
- DOCX adapter
- PPTX adapter
- HTML adapter
- Markdown adapter
- text adapter
## Current status
The current implementation is intentionally conservative:
- it focuses on stable interfaces
- it extracts text in a simplified way
- it normalizes results into shared internal structures
## Why this matters
Educational material commonly lives in:
- syllabi PDFs
- DOCX notes
- PowerPoint slide decks
- LMS HTML exports
- markdown lesson files
A useful curriculum distiller must be able to treat these as first-class inputs.
## Adapter contract
Each adapter returns a normalized document record with:
- source path
- source type
- title
- extracted text
- sections
- metadata
This record is then passed into higher-level course/topic distillation logic.
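The record shape above can be sketched with stdlib dataclasses. The repository itself uses pydantic models for this; the dataclass below only shows the contract's shape:

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    heading: str
    body: str = ""

@dataclass
class NormalizedDocument:
    source_path: str
    source_type: str
    title: str = ""
    text: str = ""
    sections: list[Section] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

# One record per source document, regardless of original format.
doc = NormalizedDocument(
    source_path="examples/intro_bayes_notes.docx",
    source_type="docx",
    title="Intro Bayes Notes",
    sections=[Section(heading="Model Checking", body="Exercise: Critique a simple inference model.")],
)
```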

View File

@ -1,27 +1,25 @@
# FAQ
## Why add document adapters now?
Because real educational material is rarely provided in only one plain-text format.
## Are these full-fidelity parsers?
Not yet. The current implementation is a stable scaffold for extraction and normalization.
## Why add cross-course merging?
Because one course often under-specifies a domain, while multiple sources together can produce a better draft pack.
## Does the merger resolve every concept conflict automatically?
No. It produces a merged draft plus a conflict report for human review.
## What kinds of issues are flagged?
Examples:
- repeated concepts with different names
- same term used with different local contexts
- courses that introduce topics in conflicting orders
- weak or thin concept descriptions
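The "same term used with different local contexts" check boils down to an inverted index from term to lesson titles. This is a simplified, self-contained sketch of what the repository's `detect_term_conflicts` does over real lesson objects:

```python
from collections import defaultdict

def term_context_flags(lessons: dict[str, list[str]]) -> list[str]:
    # lessons: lesson title -> key terms extracted from that lesson.
    term_to_lessons: dict[str, set[str]] = defaultdict(set)
    for lesson_title, terms in lessons.items():
        for term in terms:
            term_to_lessons[term.lower()].add(lesson_title)
    # A term shared by more than one lesson context gets a review flag.
    return [
        f"Key term '{term}' appears in multiple lesson contexts: {', '.join(sorted(titles))}"
        for term, titles in sorted(term_to_lessons.items())
        if len(titles) > 1
    ]
```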

View File

@ -0,0 +1,42 @@
concepts:
- id: descriptive-statistics
  title: Descriptive Statistics
  description: 'Objective: Explain mean, median, and variance.
    Exercise: Summarize a small dataset.
    Descriptive Statistics introduces center and spread.'
  prerequisites: []
  mastery_signals:
  - Summarize a small dataset.
  mastery_profile: {}
- id: probability-basics
  title: Probability Basics
  description: 'Objective: Explain conditional probability.
    Exercise: Compute a simple conditional probability.
    Probability Basics introduces events and likelihood.'
  prerequisites:
  - descriptive-statistics
  mastery_signals:
  - Compute a simple conditional probability.
  mastery_profile: {}
- id: prior-and-posterior
  title: Prior And Posterior
  description: 'Prior and Posterior are central concepts. Prior reflects assumptions
    before evidence. Exercise: Compare prior and posterior beliefs.'
  prerequisites:
  - probability-basics
  mastery_signals:
  - Compare prior and posterior beliefs.
  mastery_profile: {}
- id: model-checking
  title: Model Checking
  description: 'A weakness is hidden assumptions. A limitation is poor fit. Uncertainty
    remains. Exercise: Critique a simple inference model.'
  prerequisites:
  - prior-and-posterior
  mastery_signals:
  - Critique a simple inference model.
  mastery_profile: {}
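A quick way to sanity-check a concepts list like the one above: every prerequisite id should refer to a concept defined earlier in the file. This is a hedged sketch, not part of the pipeline:

```python
def prereqs_resolve_in_order(concepts: list[dict]) -> bool:
    # True when each concept's prerequisites were all defined before it.
    seen: set[str] = set()
    for concept in concepts:
        if any(p not in seen for p in concept.get("prerequisites", [])):
            return False
        seen.add(concept["id"])
    return True

# The prerequisite chain from the example file above.
chain = [
    {"id": "descriptive-statistics", "prerequisites": []},
    {"id": "probability-basics", "prerequisites": ["descriptive-statistics"]},
    {"id": "prior-and-posterior", "prerequisites": ["probability-basics"]},
    {"id": "model-checking", "prerequisites": ["prior-and-posterior"]},
]
```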

View File

@ -0,0 +1,3 @@
# Conflict Report
- Lesson 'prior and posterior' was merged from multiple sources; review ordering assumptions.

View File

@ -0,0 +1,30 @@
{
  "rights_note": "REVIEW REQUIRED",
  "sources": [
    {
      "source_path": "examples/intro_bayes_outline.md",
      "source_type": "markdown",
      "title": "Intro Bayes Outline"
    },
    {
      "source_path": "examples/intro_bayes_lecture.html",
      "source_type": "html",
      "title": "Intro Bayes Lecture"
    },
    {
      "source_path": "examples/intro_bayes_slides.pptx",
      "source_type": "pptx",
      "title": "Intro Bayes Slides"
    },
    {
      "source_path": "examples/intro_bayes_notes.docx",
      "source_type": "docx",
      "title": "Intro Bayes Notes"
    },
    {
      "source_path": "examples/intro_bayes_syllabus.pdf",
      "source_type": "pdf",
      "title": "Intro Bayes Syllabus"
    }
  ]
}

View File

@ -0,0 +1,14 @@
name: introductory-bayesian-inference
display_name: Introductory Bayesian Inference
version: 0.1.0-draft
schema_version: '1'
didactopus_min_version: 0.1.0
didactopus_max_version: 0.9.99
description: Draft topic pack generated from multi-course inputs for 'Introductory
  Bayesian Inference'.
author: Wesley R. Elsberry
license: REVIEW-REQUIRED
dependencies: []
overrides: []
profile_templates: {}
cross_pack_links: []

View File

@ -0,0 +1,7 @@
projects:
- id: prior-and-posterior
  title: Prior And Posterior
  difficulty: review-required
  prerequisites: []
  deliverables:
  - project artifact

View File

@ -0,0 +1,3 @@
# Review Report
- Module 'Imported from PPTX' appears to contain project-like material; review project extraction.

View File

@ -0,0 +1,17 @@
stages:
- id: stage-1
  title: Imported from MARKDOWN
  concepts:
  - descriptive-statistics
  - probability-basics
  checkpoint: []
- id: stage-2
  title: Imported from HTML
  concepts:
  - prior-and-posterior
  checkpoint: []
- id: stage-3
  title: Imported from DOCX
  concepts:
  - model-checking
  checkpoint: []
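A small check over a roadmap like the one above: each concept id should be assigned to exactly one stage. The stage dicts below are illustrative literals mirroring the example file:

```python
# Stage layout from the example roadmap above.
stages = [
    {"id": "stage-1", "concepts": ["descriptive-statistics", "probability-basics"]},
    {"id": "stage-2", "concepts": ["prior-and-posterior"]},
    {"id": "stage-3", "concepts": ["model-checking"]},
]

# Flatten stage concepts in order; duplicates would mean a concept
# was scheduled twice across stages.
ordered = [c for stage in stages for c in stage["concepts"]]
assert len(ordered) == len(set(ordered))
```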

View File

@ -0,0 +1,6 @@
rubrics:
- id: draft-rubric
  title: Draft Rubric
  criteria:
  - correctness
  - explanation

View File

@ -0,0 +1,7 @@
<html><body>
<h1>Introductory Bayesian Inference</h1>
<h2>Bayesian Updating</h2>
<h3>Prior and Posterior</h3>
<p>Prior and Posterior are central concepts. Prior reflects assumptions before evidence.</p>
<p>Exercise: Compare prior and posterior beliefs.</p>
</body></html>

View File

@ -0,0 +1,6 @@
# Bayesian Notes
## Model Critique
### Model Checking
A weakness is hidden assumptions. A limitation is poor fit. Uncertainty remains.
Exercise: Critique a simple inference model.

View File

@ -0,0 +1,12 @@
# Introductory Bayesian Inference
## Foundations
### Descriptive Statistics
Objective: Explain mean, median, and variance.
Exercise: Summarize a small dataset.
Descriptive Statistics introduces center and spread.
### Probability Basics
Objective: Explain conditional probability.
Exercise: Compute a simple conditional probability.
Probability Basics introduces events and likelihood.

View File

@ -0,0 +1,7 @@
# Bayesian Slides
## Bayesian Updating
### Prior and Posterior
Prior and Posterior summary slide text.
Capstone Mini Project
Exercise: Write a short project report comparing priors and posteriors.

View File

@ -0,0 +1,5 @@
# Bayesian Syllabus
## Schedule
### Foundations
Objective: Explain descriptive statistics and conditional probability.

View File

@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "didactopus"
version = "0.1.0"
description = "Didactopus: document-adapter and cross-course merger scaffold"
readme = "README.md"
requires-python = ">=3.10"
license = {text = "MIT"}
@ -16,7 +16,7 @@ dependencies = ["pydantic>=2.7", "pyyaml>=6.0"]
dev = ["pytest>=8.0", "ruff>=0.6"]

[project.scripts]
didactopus-topic-ingest = "didactopus.main:main"

[tool.setuptools.packages.find]
where = ["src"]

View File

@ -3,6 +3,15 @@ from pydantic import BaseModel, Field
import yaml

class DocumentAdaptersConfig(BaseModel):
    allow_pdf: bool = True
    allow_docx: bool = True
    allow_pptx: bool = True
    allow_html: bool = True
    allow_markdown: bool = True
    allow_text: bool = True

class CourseIngestConfig(BaseModel):
    default_pack_author: str = "Unknown"
    default_license: str = "REVIEW-REQUIRED"
@ -10,23 +19,17 @@ class CourseIngestConfig(BaseModel):
    max_terms_per_lesson: int = 8

class CrossCourseConfig(BaseModel):
    detect_title_overlaps: bool = True
    detect_term_conflicts: bool = True
    detect_order_conflicts: bool = True
    merge_same_named_lessons: bool = True

class AppConfig(BaseModel):
    document_adapters: DocumentAdaptersConfig = Field(default_factory=DocumentAdaptersConfig)
    course_ingest: CourseIngestConfig = Field(default_factory=CourseIngestConfig)
    cross_course: CrossCourseConfig = Field(default_factory=CrossCourseConfig)

def load_config(path: str | Path) -> AppConfig:

View File

@ -1,8 +1,21 @@
from __future__ import annotations
from pydantic import BaseModel, Field

class Section(BaseModel):
    heading: str
    body: str = ""

class NormalizedDocument(BaseModel):
    source_path: str
    source_type: str
    title: str = ""
    text: str = ""
    sections: list[Section] = Field(default_factory=list)
    metadata: dict = Field(default_factory=dict)

class Lesson(BaseModel):
    title: str
    body: str = ""
@ -17,21 +30,18 @@ class Module(BaseModel):
    lessons: list[Lesson] = Field(default_factory=list)

class NormalizedCourse(BaseModel):
    title: str
    source_name: str = ""
    source_url: str = ""
    rights_note: str = ""
    modules: list[Module] = Field(default_factory=list)
    source_records: list[NormalizedDocument] = Field(default_factory=list)

class TopicBundle(BaseModel):
    topic_title: str
    courses: list[NormalizedCourse] = Field(default_factory=list)

class ConceptCandidate(BaseModel):
@ -40,6 +50,7 @@ class ConceptCandidate(BaseModel):
    description: str = ""
    source_modules: list[str] = Field(default_factory=list)
    source_lessons: list[str] = Field(default_factory=list)
    source_courses: list[str] = Field(default_factory=list)
    prerequisites: list[str] = Field(default_factory=list)
    mastery_signals: list[str] = Field(default_factory=list)

View File

@ -0,0 +1,50 @@
from __future__ import annotations
from collections import defaultdict
from .course_schema import NormalizedCourse, ConceptCandidate

def detect_title_overlaps(course: NormalizedCourse) -> list[str]:
    lesson_to_sources = defaultdict(set)
    for module in course.modules:
        for lesson in module.lessons:
            for src in lesson.source_refs:
                lesson_to_sources[lesson.title.lower()].add(src)
    flags = []
    for title, sources in lesson_to_sources.items():
        if len(sources) > 1:
            flags.append(f"Lesson title '{title}' appears across multiple sources: {', '.join(sorted(sources))}")
    return flags

def detect_term_conflicts(course: NormalizedCourse) -> list[str]:
    term_to_lessons = defaultdict(set)
    for module in course.modules:
        for lesson in module.lessons:
            for term in lesson.key_terms:
                term_to_lessons[term.lower()].add(lesson.title)
    flags = []
    for term, lessons in term_to_lessons.items():
        if len(lessons) > 1:
            flags.append(f"Key term '{term}' appears in multiple lesson contexts: {', '.join(sorted(lessons))}")
    return flags

def detect_order_conflicts(course: NormalizedCourse) -> list[str]:
    # Placeholder heuristic: if the same lesson title carries multiple source_refs, flag it for order review.
    flags = []
    for module in course.modules:
        for lesson in module.lessons:
            if len(set(lesson.source_refs)) > 1:
                flags.append(f"Lesson '{lesson.title}' was merged from multiple sources; review ordering assumptions.")
    return flags

def detect_thin_concepts(concepts: list[ConceptCandidate]) -> list[str]:
    flags = []
    for concept in concepts:
        if len(concept.description.strip()) < 20:
            flags.append(f"Concept '{concept.title}' has a very thin description.")
        if not concept.mastery_signals:
            flags.append(f"Concept '{concept.title}' has no extracted mastery signals.")
    return flags

View File

@ -0,0 +1,141 @@
from __future__ import annotations
from pathlib import Path
import re
from .course_schema import NormalizedDocument, Section

def _title_from_path(path: str | Path) -> str:
    p = Path(path)
    return p.stem.replace("_", " ").replace("-", " ").title()

def _simple_section_split(text: str) -> list[Section]:
    sections = []
    current_heading = "Main"
    current_lines = []
    for line in text.splitlines():
        if re.match(r"^(#{1,3})\s+", line):
            if current_lines:
                sections.append(Section(heading=current_heading, body="\n".join(current_lines).strip()))
            current_heading = re.sub(r"^(#{1,3})\s+", "", line).strip()
            current_lines = []
        else:
            current_lines.append(line)
    if current_lines:
        sections.append(Section(heading=current_heading, body="\n".join(current_lines).strip()))
    return sections

def read_textish(path: str | Path) -> str:
    return Path(path).read_text(encoding="utf-8")

def adapt_markdown(path: str | Path) -> NormalizedDocument:
    text = read_textish(path)
    return NormalizedDocument(
        source_path=str(path),
        source_type="markdown",
        title=_title_from_path(path),
        text=text,
        sections=_simple_section_split(text),
        metadata={},
    )

def adapt_text(path: str | Path) -> NormalizedDocument:
    text = read_textish(path)
    return NormalizedDocument(
        source_path=str(path),
        source_type="text",
        title=_title_from_path(path),
        text=text,
        sections=_simple_section_split(text),
        metadata={},
    )

def adapt_html(path: str | Path) -> NormalizedDocument:
    raw = read_textish(path)
    text = re.sub(r"<[^>]+>", " ", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return NormalizedDocument(
        source_path=str(path),
        source_type="html",
        title=_title_from_path(path),
        text=text,
        sections=[Section(heading="HTML Extract", body=text)],
        metadata={"extraction": "stub-html-strip"},
    )

def adapt_pdf(path: str | Path) -> NormalizedDocument:
    # Stub: in a real implementation, plug in PDF text extraction here.
    text = read_textish(path)
    return NormalizedDocument(
        source_path=str(path),
        source_type="pdf",
        title=_title_from_path(path),
        text=text,
        sections=_simple_section_split(text),
        metadata={"extraction": "stub-pdf-text"},
    )

def adapt_docx(path: str | Path) -> NormalizedDocument:
    # Stub: in a real implementation, plug in DOCX extraction here.
    text = read_textish(path)
    return NormalizedDocument(
        source_path=str(path),
        source_type="docx",
        title=_title_from_path(path),
        text=text,
        sections=_simple_section_split(text),
        metadata={"extraction": "stub-docx-text"},
    )

def adapt_pptx(path: str | Path) -> NormalizedDocument:
    # Stub: in a real implementation, plug in PPTX extraction here.
    text = read_textish(path)
    return NormalizedDocument(
        source_path=str(path),
        source_type="pptx",
        title=_title_from_path(path),
        text=text,
        sections=_simple_section_split(text),
        metadata={"extraction": "stub-pptx-text"},
    )

def detect_adapter(path: str | Path) -> str:
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".md":
        return "markdown"
    if suffix in {".txt"}:
        return "text"
    if suffix in {".html", ".htm"}:
        return "html"
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".docx":
        return "docx"
    if suffix == ".pptx":
        return "pptx"
    return "text"

def adapt_document(path: str | Path) -> NormalizedDocument:
    adapter = detect_adapter(path)
    if adapter == "markdown":
        return adapt_markdown(path)
    if adapter == "html":
        return adapt_html(path)
    if adapter == "pdf":
        return adapt_pdf(path)
    if adapter == "docx":
        return adapt_docx(path)
    if adapter == "pptx":
        return adapt_pptx(path)
    return adapt_text(path)

View File

@ -4,18 +4,19 @@ import argparse
from pathlib import Path
from .config import load_config
from .document_adapters import adapt_document
from .topic_ingest import document_to_course, build_topic_bundle, merge_courses_into_topic_course, extract_concept_candidates
from .cross_course_conflicts import detect_title_overlaps, detect_term_conflicts, detect_order_conflicts, detect_thin_concepts
from .rule_policy import RuleContext, build_default_rules, run_rules
from .pack_emitter import build_draft_pack, write_draft_pack

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Didactopus document-adapter and cross-course topic ingestion")
    parser.add_argument("--inputs", nargs="+", required=True, help="Document inputs")
    parser.add_argument("--title", required=True, help="Topic title")
    parser.add_argument("--rights-note", default="REVIEW REQUIRED")
    parser.add_argument("--output-dir", default="generated-topic-pack")
    parser.add_argument("--config", default="configs/config.example.yaml")
    return parser
@ -24,33 +25,30 @@ def main() -> None:
    args = build_parser().parse_args()
    config = load_config(args.config)

    docs = [adapt_document(path) for path in args.inputs]
    courses = [document_to_course(doc, course_title=args.title) for doc in docs]
    topic = build_topic_bundle(args.title, courses)
    merged_course = merge_courses_into_topic_course(
        topic_bundle=topic,
        merge_same_named_lessons=config.cross_course.merge_same_named_lessons,
    )
    concepts = extract_concept_candidates(merged_course)

    context = RuleContext(course=merged_course, concepts=concepts)
    rules = build_default_rules()
    run_rules(context, rules)

    conflicts = []
    if config.cross_course.detect_title_overlaps:
        conflicts.extend(detect_title_overlaps(merged_course))
    if config.cross_course.detect_term_conflicts:
        conflicts.extend(detect_term_conflicts(merged_course))
    if config.cross_course.detect_order_conflicts:
        conflicts.extend(detect_order_conflicts(merged_course))
    conflicts.extend(detect_thin_concepts(context.concepts))

    draft = build_draft_pack(
        course=merged_course,
        concepts=context.concepts,
        author=config.course_ingest.default_pack_author,
        license_name=config.course_ingest.default_license,
@ -59,10 +57,11 @@ def main() -> None:
    )
    write_draft_pack(draft, args.output_dir)

    print("== Didactopus Cross-Course Topic Ingest ==")
    print(f"Topic: {args.title}")
    print(f"Documents: {len(docs)}")
    print(f"Courses: {len(courses)}")
    print(f"Merged modules: {len(merged_course.modules)}")
    print(f"Concept candidates: {len(context.concepts)}")
    print(f"Review flags: {len(context.review_flags)}")
    print(f"Conflicts: {len(conflicts)}")

View File

@ -15,7 +15,7 @@ def build_draft_pack(course: NormalizedCourse, concepts: list[ConceptCandidate],
        "schema_version": "1",
        "didactopus_min_version": "0.1.0",
        "didactopus_max_version": "0.9.99",
        "description": f"Draft topic pack generated from multi-course inputs for '{course.title}'.",
        "author": author,
        "license": license_name,
        "dependencies": [],
@ -64,7 +64,7 @@ def build_draft_pack(course: NormalizedCourse, concepts: list[ConceptCandidate],
    attribution = {
        "rights_note": course.rights_note,
        "sources": [
            {"source_path": src.source_path, "source_type": src.source_type, "title": src.title}
            for src in course.source_records
        ],
    }
@ -88,11 +88,8 @@ def write_draft_pack(pack: DraftPack, outdir: str | Path) -> None:
    (out / "roadmap.yaml").write_text(yaml.safe_dump(pack.roadmap, sort_keys=False), encoding="utf-8")
    (out / "projects.yaml").write_text(yaml.safe_dump(pack.projects, sort_keys=False), encoding="utf-8")
    (out / "rubrics.yaml").write_text(yaml.safe_dump(pack.rubrics, sort_keys=False), encoding="utf-8")
    review_lines = ["# Review Report", ""] + [f"- {flag}" for flag in pack.review_report] if pack.review_report else ["# Review Report", "", "- none"]
    (out / "review_report.md").write_text("\n".join(review_lines), encoding="utf-8")
    conflict_lines = ["# Conflict Report", ""] + [f"- {flag}" for flag in pack.conflicts] if pack.conflicts else ["# Conflict Report", "", "- none"]
    (out / "conflict_report.md").write_text("\n".join(conflict_lines), encoding="utf-8")
    (out / "license_attribution.json").write_text(json.dumps(pack.attribution, indent=2), encoding="utf-8")

View File

@ -39,6 +39,7 @@ def duplicate_term_merge_rule(context: RuleContext) -> None:
        if key in seen:
            seen[key].source_modules.extend(x for x in concept.source_modules if x not in seen[key].source_modules)
            seen[key].source_lessons.extend(x for x in concept.source_lessons if x not in seen[key].source_lessons)
            seen[key].source_courses.extend(x for x in concept.source_courses if x not in seen[key].source_courses)
            if concept.description and len(seen[key].description) < len(concept.description):
                seen[key].description = concept.description
        else:

View File

@ -0,0 +1,126 @@
from __future__ import annotations
import re
from collections import defaultdict
from .course_schema import NormalizedDocument, NormalizedCourse, Module, Lesson, TopicBundle, ConceptCandidate
def slugify(text: str) -> str:
cleaned = re.sub(r"[^a-zA-Z0-9]+", "-", text.strip().lower()).strip("-")
return cleaned or "untitled"
def extract_key_terms(text: str, min_term_length: int = 4, max_terms: int = 8) -> list[str]:
candidates = re.findall(r"\b[A-Z][A-Za-z0-9\-]{%d,}\b" % (min_term_length - 1), text)
seen = set()
out = []
for term in candidates:
if term not in seen:
seen.add(term)
out.append(term)
if len(out) >= max_terms:
break
return out
def document_to_course(doc: NormalizedDocument, course_title: str) -> NormalizedCourse:
# Conservative mapping: each section becomes a lesson; all lessons go into one module.
lessons = []
for section in doc.sections:
body = section.body.strip()
lines = body.splitlines()
objectives = []
exercises = []
for line in lines:
low = line.lower().strip()
if low.startswith("objective:"):
objectives.append(line.split(":", 1)[1].strip())
if low.startswith("exercise:"):
exercises.append(line.split(":", 1)[1].strip())
lessons.append(
Lesson(
title=section.heading.strip() or "Untitled Lesson",
body=body,
objectives=objectives,
exercises=exercises,
key_terms=extract_key_terms(section.heading + "\n" + body),
source_refs=[doc.source_path],
)
)
module = Module(title=f"Imported from {doc.source_type.upper()}", lessons=lessons)
return NormalizedCourse(title=course_title, modules=[module], source_records=[doc])
def build_topic_bundle(topic_title: str, courses: list[NormalizedCourse]) -> TopicBundle:
return TopicBundle(topic_title=topic_title, courses=courses)
def merge_courses_into_topic_course(topic_bundle: TopicBundle, merge_same_named_lessons: bool = True) -> NormalizedCourse:
modules_by_title: dict[str, Module] = {}
source_records = []
for course in topic_bundle.courses:
source_records.extend(course.source_records)
for module in course.modules:
target_module = modules_by_title.setdefault(module.title, Module(title=module.title, lessons=[]))
if merge_same_named_lessons:
lesson_map = {lesson.title: lesson for lesson in target_module.lessons}
for lesson in module.lessons:
if lesson.title in lesson_map:
existing = lesson_map[lesson.title]
if lesson.body and lesson.body not in existing.body:
existing.body = (existing.body + "\n\n" + lesson.body).strip()
for x in lesson.objectives:
if x not in existing.objectives:
existing.objectives.append(x)
for x in lesson.exercises:
if x not in existing.exercises:
existing.exercises.append(x)
for x in lesson.key_terms:
if x not in existing.key_terms:
existing.key_terms.append(x)
for x in lesson.source_refs:
if x not in existing.source_refs:
existing.source_refs.append(x)
else:
target_module.lessons.append(lesson)
else:
target_module.lessons.extend(module.lessons)
return NormalizedCourse(title=topic_bundle.topic_title, modules=list(modules_by_title.values()), source_records=source_records)
def extract_concept_candidates(course: NormalizedCourse) -> list[ConceptCandidate]:
    concepts = []
    seen_ids = set()
    for module in course.modules:
        for lesson in module.lessons:
            cid = slugify(lesson.title)
            if cid not in seen_ids:
                seen_ids.add(cid)
                concepts.append(
                    ConceptCandidate(
                        id=cid,
                        title=lesson.title,
                        description=lesson.body[:240].strip(),
                        source_modules=[module.title],
                        source_lessons=[lesson.title],
                        source_courses=list(lesson.source_refs),
                        mastery_signals=list(lesson.objectives[:3] or lesson.exercises[:2]),
                    )
                )
            for term in lesson.key_terms:
                tid = slugify(term)
                if tid in seen_ids:
                    continue
                seen_ids.add(tid)
                concepts.append(
                    ConceptCandidate(
                        id=tid,
                        title=term,
                        description=f"Candidate concept extracted from lesson '{lesson.title}'.",
                        source_modules=[module.title],
                        source_lessons=[lesson.title],
                        source_courses=list(lesson.source_refs),
                        mastery_signals=list(lesson.objectives[:2]),
                    )
                )
    return concepts
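The same-title merge policy in `merge_courses_into_topic_course` boils down to two moves: append body text that has not been seen yet, and deduplicate list fields while preserving order. A minimal self-contained sketch of that policy (`LessonStub` here is a simplified stand-in, not the package's `Lesson` model):

```python
from dataclasses import dataclass, field


@dataclass
class LessonStub:
    # Simplified stand-in for the package's Lesson model.
    title: str
    body: str = ""
    objectives: list[str] = field(default_factory=list)


def merge_same_titled(existing: LessonStub, incoming: LessonStub) -> None:
    # Concatenate unseen body text; deduplicate objectives in first-seen order.
    if incoming.body and incoming.body not in existing.body:
        existing.body = (existing.body + "\n\n" + incoming.body).strip()
    for obj in incoming.objectives:
        if obj not in existing.objectives:
            existing.objectives.append(obj)


a = LessonStub("L1", "Body A", ["Explain A"])
b = LessonStub("L1", "Body B", ["Explain A", "Apply A"])
merge_same_titled(a, b)
print(a.body)        # bodies joined with a blank line
print(a.objectives)  # ['Explain A', 'Apply A']
```

Note the substring check (`incoming.body not in existing.body`) means a body that happens to be contained in an already-merged body is silently dropped, which is acceptable for scaffold-level dedup.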


@ -0,0 +1,19 @@
from pathlib import Path
from didactopus.document_adapters import adapt_document
from didactopus.topic_ingest import document_to_course, build_topic_bundle, merge_courses_into_topic_course, extract_concept_candidates
from didactopus.cross_course_conflicts import detect_title_overlaps, detect_term_conflicts, detect_order_conflicts, detect_thin_concepts


def test_conflict_detection(tmp_path: Path) -> None:
    a = tmp_path / "a.md"
    b = tmp_path / "b.md"
    a.write_text("# T\n\n## M1\n### Bayesian Updating\nPrior and Posterior appear here.", encoding="utf-8")
    b.write_text("# T\n\n## M2\n### Bayesian Updating\nPrior and Posterior appear again.", encoding="utf-8")
    docs = [adapt_document(a), adapt_document(b)]
    courses = [document_to_course(doc, "Topic") for doc in docs]
    merged = merge_courses_into_topic_course(build_topic_bundle("Topic", courses), merge_same_named_lessons=False)
    concepts = extract_concept_candidates(merged)
    assert isinstance(detect_title_overlaps(merged), list)
    assert isinstance(detect_term_conflicts(merged), list)
    assert isinstance(detect_order_conflicts(merged), list)
    assert isinstance(detect_thin_concepts(concepts), list)
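The conflict detectors are imported above but their bodies are not part of this diff. A plausible sketch of the title-overlap check, assuming it simply reports lesson titles that occur more than once (the signature is simplified to a list of titles here; the real function takes the merged course and may differ):

```python
from collections import Counter


def detect_title_overlaps(titles: list[str]) -> list[str]:
    # Flag any lesson title seen more than once, ordered by first appearance.
    counts = Counter(titles)
    return [t for t, n in counts.items() if n > 1]


overlaps = detect_title_overlaps(["Bayesian Updating", "Priors", "Bayesian Updating"])
print(overlaps)  # ['Bayesian Updating']
```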


@ -0,0 +1,18 @@
from pathlib import Path
from didactopus.document_adapters import adapt_document, detect_adapter


def test_detect_adapter() -> None:
    assert detect_adapter("a.md") == "markdown"
    assert detect_adapter("b.html") == "html"
    assert detect_adapter("c.pdf") == "pdf"
    assert detect_adapter("d.docx") == "docx"
    assert detect_adapter("e.pptx") == "pptx"


def test_adapt_markdown(tmp_path: Path) -> None:
    p = tmp_path / "x.md"
    p.write_text("# T\n\n## A\nBody", encoding="utf-8")
    doc = adapt_document(p)
    assert doc.source_type == "markdown"
    assert len(doc.sections) >= 1
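The test above implies `detect_adapter` dispatches on file extension. A hedged sketch of how such a registry could look (the table and fallback are assumptions; the real registry lives in `didactopus.document_adapters` and may carry extractor callables rather than plain names):

```python
from pathlib import Path

# Hypothetical extension-to-adapter table.
ADAPTERS = {
    ".md": "markdown",
    ".html": "html",
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".txt": "text",
}


def detect_adapter(path: str) -> str:
    # Dispatch on the (case-insensitive) file extension;
    # unknown extensions fall back to the plain-text adapter.
    return ADAPTERS.get(Path(path).suffix.lower(), "text")
```

A fallback to `"text"` keeps ingestion total: every file yields some normalized document, at worst a raw-text one.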


@ -1,17 +1,20 @@
 from pathlib import Path
-from didactopus.course_ingest import parse_source_file, merge_source_records, extract_concept_candidates
+from didactopus.document_adapters import adapt_document
+from didactopus.topic_ingest import document_to_course, build_topic_bundle, merge_courses_into_topic_course, extract_concept_candidates
 from didactopus.rule_policy import RuleContext, build_default_rules, run_rules
 from didactopus.pack_emitter import build_draft_pack, write_draft_pack
-def test_emit_multisource_pack(tmp_path: Path) -> None:
+def test_emit_topic_pack(tmp_path: Path) -> None:
     src = tmp_path / "course.md"
-    src.write_text("# C\n\n## M1\n### Lesson A\n- Objective: Explain Topic A.\n- Exercise: Do task A.\nTopic A body.", encoding="utf-8")
-    course = merge_source_records([parse_source_file(src, title="Course")], course_title="Course")
-    concepts = extract_concept_candidates(course)
-    ctx = RuleContext(course=course, concepts=concepts)
+    src.write_text("# T\n\n## M\n### L\nExercise: Do task A.\nTopic A body.", encoding="utf-8")
+    doc = adapt_document(src)
+    course = document_to_course(doc, "Topic")
+    merged = merge_courses_into_topic_course(build_topic_bundle("Topic", [course]))
+    concepts = extract_concept_candidates(merged)
+    ctx = RuleContext(course=merged, concepts=concepts)
     run_rules(ctx, build_default_rules())
-    draft = build_draft_pack(course, ctx.concepts, "Tester", "REVIEW", ctx.review_flags, [])
+    draft = build_draft_pack(merged, ctx.concepts, "Tester", "REVIEW", ctx.review_flags, [])
     write_draft_pack(draft, tmp_path / "out")
     assert (tmp_path / "out" / "pack.yaml").exists()
     assert (tmp_path / "out" / "conflict_report.md").exists()


@ -0,0 +1,26 @@
from pathlib import Path
from didactopus.document_adapters import adapt_document
from didactopus.topic_ingest import document_to_course, build_topic_bundle, merge_courses_into_topic_course, extract_concept_candidates


def test_cross_course_merge(tmp_path: Path) -> None:
    a = tmp_path / "a.md"
    b = tmp_path / "b.docx"
    a.write_text("# T\n\n## M\n### L1\nBody A", encoding="utf-8")
    b.write_text("# T\n\n## M\n### L1\nBody B", encoding="utf-8")
    docs = [adapt_document(a), adapt_document(b)]
    courses = [document_to_course(doc, "Topic") for doc in docs]
    topic = build_topic_bundle("Topic", courses)
    merged = merge_courses_into_topic_course(topic)
    assert len(merged.modules) >= 1
    assert len(merged.modules[0].lessons) == 1


def test_extract_concepts(tmp_path: Path) -> None:
    a = tmp_path / "a.md"
    a.write_text("# T\n\n## M\n### Lesson A\nObjective: Explain Topic A.\nBody.", encoding="utf-8")
    doc = adapt_document(a)
    course = document_to_course(doc, "Topic")
    concepts = extract_concept_candidates(course)
    assert len(concepts) >= 1