# Synthesis Engine Architecture

## Purpose

The synthesis engine identifies potentially useful conceptual overlaps across packs, topics, and learning trajectories. Its goal is to help learners and maintainers discover connections that deepen understanding of the topic of interest.

This is not merely a recommendation engine. It is a **cross-domain structural discovery system**.

---

## Design goals

- identify meaningful connections across packs
- support analogy, transfer, and hidden-prerequisite discovery
- generate reviewer-friendly candidate proposals
- improve pack quality and curriculum design
- capture surprising learner or AI discoveries
- expose synthesis to users visually and operationally

---

## Kinds of synthesis targets

### 1. Cross-pack concept similarity

Examples:

- entropy ↔ entropy (the same concept surfacing in two different packs)
- drift ↔ random walk
- selection pressure ↔ optimization pressure

### 2. Structural analogy

Examples:

- feedback loops in control theory and ecology
- graph search and evolutionary exploration
- signal detection in acoustics and statistical inference

### 3. Hidden prerequisite discovery

If learners repeatedly fail on a concept despite satisfying its nominal prerequisites, a missing dependency may exist.

### 4. Example transfer

A concept may become easier to understand when illustrated by examples from another pack.

### 5. Skill transfer

A skill bundle from one domain may partially apply in another domain.

---

## Data model

### ConceptNode

- concept_id
- pack_id
- title
- description
- prerequisites
- tags
- examples
- glossary terms
- vector embedding
- graph neighborhood signature

### SynthesisCandidate

- synthesis_id
- source_concept_id
- target_concept_id
- source_pack_id
- target_pack_id
- synthesis_kind
- score_total
- score_semantic
- score_structural
- score_trajectory
- score_review_history
- explanation
- evidence
- current_status

### SynthesisCluster

Represents a small group of mutually related concepts across packs.
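The two fully specified record types above (ConceptNode and SynthesisCandidate) could be sketched as Python dataclasses. Field names follow the lists in this section; the types and defaults are assumptions, not part of the design.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ConceptNode:
    concept_id: str
    pack_id: str
    title: str
    description: str = ""
    prerequisites: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    examples: list = field(default_factory=list)
    glossary_terms: list = field(default_factory=list)
    embedding: Optional[list] = None            # vector embedding
    neighborhood_signature: Optional[str] = None  # graph neighborhood signature

@dataclass
class SynthesisCandidate:
    synthesis_id: str
    source_concept_id: str
    target_concept_id: str
    source_pack_id: str
    target_pack_id: str
    synthesis_kind: str                 # e.g. "concept_similarity"
    score_total: float = 0.0
    score_semantic: float = 0.0
    score_structural: float = 0.0
    score_trajectory: float = 0.0
    score_review_history: float = 0.0
    explanation: str = ""
    evidence: list = field(default_factory=list)
    current_status: str = "candidate"   # assumed initial review status
```

Keeping candidates as plain records like this supports the later requirement that they become reviewable objects rather than immediate pack mutations.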
Fields:

- cluster_id
- member_concepts
- centroid_embedding
- theme_label
- notes

### HiddenPrerequisiteCandidate

- source_concept_id
- suspected_missing_prerequisite_id
- signal_strength
- supporting_fail_patterns
- reviewer_status

---

## Scoring methods

The engine should combine multiple signals.

### A. Semantic similarity score

Source:

- concept text
- glossary
- examples
- descriptions
- optional embeddings

Methods:

- cosine similarity on embeddings
- term overlap
- phrase normalization
- ontology-aware synonyms if available

### B. Structural similarity score

Source:

- prerequisite neighborhoods
- downstream dependencies
- graph motif similarity
- role in pack topology

Examples:

- concepts that sit in similar graph positions
- concepts that unlock similar kinds of later work

### C. Learner trajectory score

Source:

- shared error patterns
- similar mastery progression
- evidence timing
- co-improvement patterns across learners

Examples:

- learners who master A often learn B faster
- failure on X predicts later trouble on Y

### D. Reviewer history score

Source:

- accepted past synthesis suggestions
- rejected patterns
- reviewer preference patterns

Use:

- prioritize candidate types with a strong track record

### E. Novelty score

Purpose:

- avoid flooding reviewers with obvious or duplicate links

Methods:

- de-duplicate against existing pack links
- penalize near-duplicate proposals
- boost under-explored, high-signal regions

---

## Composite score

Suggested first composite:

`score_total = 0.35 * semantic_similarity + 0.25 * structural_similarity + 0.20 * trajectory_signal + 0.10 * review_prior + 0.10 * novelty`

This weighting should remain configurable.

---

## Discovery pipeline

### Step 1. Ingest graph and learner data

Inputs:

- packs
- concepts
- pack metadata
- learner states
- evidence histories
- artifacts
- knowledge exports

### Step 2.
**Compute concept features**

For each concept:

- embedding
- prerequisite signature
- downstream signature
- learner-error signature
- example signature

### Step 3. Generate candidate pairs

Possible approaches:

- nearest neighbors in embedding space
- shared tag neighborhoods
- prerequisite motif matches
- frequent learner co-patterns

### Step 4. Re-rank candidates

Combine semantic, structural, and trajectory scores.

### Step 5. Group into synthesis clusters

Cluster related candidate pairs into themes such as:

- uncertainty
- feedback
- optimization
- conservation
- branching processes

### Step 6. Produce explanations

Each candidate should include a compact explanation, for example:

- “These concepts occupy similar prerequisite roles.”
- “Learner error patterns suggest a hidden shared dependency.”
- “Examples in pack A may clarify this concept in pack B.”

### Step 7. Send to review-and-promotion workflow

All candidates become reviewable objects rather than immediately modifying packs.

---

## Outputs

The engine should emit candidate objects suitable for promotion into:

- cross-pack links
- pack improvement suggestions
- curriculum draft notes
- skill-bundle drafts
- archived synthesis notes

---

## UI visualization

### 1. Synthesis map

Graph overlay showing:

- existing cross-pack links
- proposed synthesis links
- confidence levels
- accepted vs. candidate status

### 2. Candidate explanation panel

For a selected proposed link:

- why it was suggested
- component scores
- source evidence
- similar accepted proposals
- reviewer actions

### 3. Cluster view

Shows higher-level themes connecting multiple packs.

### 4. Learner pathway overlay

Allows a maintainer to see where synthesis would help a learner currently stuck in one pack by borrowing examples or structures from another.

### 5.
**Promotion workflow integration**

Every synthesis candidate can be:

- accepted as a pack improvement
- converted to a curriculum draft
- converted to a skill bundle
- archived
- rejected

---

## Appropriate uses

The synthesis engine is especially useful for:

- interdisciplinary education
- transfer-learning support
- AI learner introspection
- pack maintenance
- curriculum design
- discovery of hidden structure

---

## Cautions

- synthesis suggestions are candidate aids, not guaranteed truths
- semantic similarity alone is not enough
- over-linking can confuse learners
- reviewers need concise explanations and provenance
- accepted synthesis should be visible as intentional structure, not accidental clutter
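As a closing sketch, the configurable weighted sum suggested in the "Composite score" section could look like the following. The weight values come from that section; the function shape, names, and the example scores are illustrative assumptions only.

```python
# Default weights taken from the "Composite score" section of this document.
DEFAULT_WEIGHTS = {
    "semantic_similarity": 0.35,
    "structural_similarity": 0.25,
    "trajectory_signal": 0.20,
    "review_prior": 0.10,
    "novelty": 0.10,
}

def composite_score(scores, weights=DEFAULT_WEIGHTS):
    """Weighted sum of component scores; a missing component counts as 0."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())

# Hypothetical candidate: strong semantic signal, weaker trajectory signal.
example = {
    "semantic_similarity": 0.8,
    "structural_similarity": 0.6,
    "trajectory_signal": 0.4,
    "review_prior": 0.5,
    "novelty": 0.7,
}
print(round(composite_score(example), 2))  # 0.63
```

Keeping the weights in a plain dictionary keeps the composite configurable per deployment, as the document requires, without touching the scoring code itself.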