# Synthesis Engine Architecture
## Purpose
The synthesis engine identifies potentially useful conceptual overlaps across
packs, topics, and learning trajectories. Its goal is to help learners and
maintainers discover connections that improve understanding of the topic of
interest.
This is not merely a recommendation engine. It is a **cross-domain structural
discovery system**.
---
## Design goals
- identify meaningful connections across packs
- support analogy, transfer, and hidden-prerequisite discovery
- generate reviewer-friendly candidate proposals
- improve pack quality and curriculum design
- capture surprising learner or AI discoveries
- expose synthesis to users visually and operationally
---
## Kinds of synthesis targets
### 1. Cross-pack concept similarity
Examples:
- entropy ↔ entropy
- drift ↔ random walk
- selection pressure ↔ optimization pressure
### 2. Structural analogy
Examples:
- feedback loops in control theory and ecology
- graph search and evolutionary exploration
- signal detection in acoustics and statistical inference
### 3. Hidden prerequisite discovery
If learners repeatedly fail on a concept despite nominal prerequisites, a
missing dependency may exist.
### 4. Example transfer
A concept may become easier to understand when illustrated by examples from
another pack.
### 5. Skill transfer
A skill bundle from one domain may partially apply in another domain.
---
## Data model
### ConceptNode
- concept_id
- pack_id
- title
- description
- prerequisites
- tags
- examples
- glossary terms
- vector embedding
- graph neighborhood signature
### SynthesisCandidate
- synthesis_id
- source_concept_id
- target_concept_id
- source_pack_id
- target_pack_id
- synthesis_kind
- score_total
- score_semantic
- score_structural
- score_trajectory
- score_review_history
- score_novelty
- explanation
- evidence
- current_status
### SynthesisCluster
Represents a small group of mutually related concepts across packs.
Fields:
- cluster_id
- member_concepts
- centroid_embedding
- theme_label
- notes
### HiddenPrerequisiteCandidate
- source_concept_id
- suspected_missing_prerequisite_id
- signal_strength
- supporting_fail_patterns
- reviewer_status
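The records above can be sketched as dataclasses. Field names follow the lists above; the types and default values are assumptions, since the document does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """A concept as seen by the synthesis engine (types are illustrative)."""
    concept_id: str
    pack_id: str
    title: str
    description: str
    prerequisites: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)
    glossary_terms: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)
    graph_signature: dict = field(default_factory=dict)


@dataclass
class SynthesisCandidate:
    """A proposed cross-pack connection awaiting review."""
    synthesis_id: str
    source_concept_id: str
    target_concept_id: str
    source_pack_id: str
    target_pack_id: str
    synthesis_kind: str  # e.g. "concept_similarity", "structural_analogy"
    score_total: float = 0.0
    score_semantic: float = 0.0
    score_structural: float = 0.0
    score_trajectory: float = 0.0
    score_review_history: float = 0.0
    explanation: str = ""
    evidence: list[str] = field(default_factory=list)
    current_status: str = "candidate"
```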
---
## Scoring methods
The engine should combine multiple signals.
### A. Semantic similarity score
Source:
- concept text
- glossary
- examples
- descriptions
- optional embeddings
Methods:
- cosine similarity on embeddings
- term overlap
- phrase normalization
- ontology-aware synonyms if available
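A minimal sketch of the semantic score, combining cosine similarity on embeddings with term overlap. The blend weights (0.7/0.3) and function names are assumptions, not part of the design.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def term_overlap(text_a: str, text_b: str) -> float:
    """Jaccard overlap of lowercased word sets (a crude stand-in for
    phrase normalization and synonym handling)."""
    wa, wb = set(text_a.lower().split()), set(text_b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def semantic_score(emb_a, emb_b, text_a, text_b,
                   w_embed=0.7, w_terms=0.3) -> float:
    """Blend embedding similarity with surface term overlap."""
    return (w_embed * cosine_similarity(emb_a, emb_b)
            + w_terms * term_overlap(text_a, text_b))
```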
### B. Structural similarity score
Source:
- prerequisite neighborhoods
- downstream dependencies
- graph motif similarity
- role in pack topology
Examples:
- concepts that sit in similar graph positions
- concepts that unlock similar kinds of later work
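One way to sketch the structural score is neighborhood overlap: concepts in similar graph positions share prerequisite and downstream neighborhoods. The 50/50 split is an assumption.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def structural_score(prereqs_a, prereqs_b,
                     downstream_a, downstream_b) -> float:
    """Average overlap of prerequisite and downstream neighborhoods.
    In practice cross-pack neighbor ids would first be normalized to
    shared tags or roles; raw ids are compared here for brevity."""
    return (0.5 * jaccard(set(prereqs_a), set(prereqs_b))
            + 0.5 * jaccard(set(downstream_a), set(downstream_b)))
```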
### C. Learner trajectory score
Source:
- shared error patterns
- similar mastery progression
- evidence timing
- co-improvement patterns across learners
Examples:
- learners who master A often learn B faster
- failure on X predicts later trouble on Y
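The "learners who master A often learn B faster" signal could be estimated with a heuristic like the one below. The record shape and the median comparison are assumptions chosen for illustration.

```python
def co_improvement_signal(records: list[dict]) -> float:
    """Fraction of A-masters who reached mastery of B faster than the
    median of learners without A. Each record is assumed to look like
    {"mastered_a": bool, "days_to_b": float}."""
    with_a = [r["days_to_b"] for r in records if r["mastered_a"]]
    without_a = [r["days_to_b"] for r in records if not r["mastered_a"]]
    if not with_a or not without_a:
        return 0.0  # no basis for comparison
    median_without = sorted(without_a)[len(without_a) // 2]
    faster = sum(1 for d in with_a if d < median_without)
    return faster / len(with_a)
```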
### D. Reviewer history score
Source:
- accepted past synthesis suggestions
- rejected patterns
- reviewer preference patterns
Use:
- prioritize candidate types with strong track record
### E. Novelty score
Purpose:
- avoid flooding reviewers with obvious or duplicate links
Methods:
- de-duplicate against existing pack links
- penalize near-duplicate proposals
- boost under-explored high-signal regions
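De-duplication against existing links could look like the sketch below; the specific penalty values (0.0, 0.2, 1.0) are illustrative, not prescribed.

```python
def novelty_score(candidate_pair: tuple[str, str],
                  existing_links: set,
                  proposed_so_far: set) -> float:
    """1.0 for a fresh pair, 0.0 if the link already exists in a pack,
    0.2 if a near-duplicate was already proposed in this run."""
    key = frozenset(candidate_pair)
    if key in existing_links:
        return 0.0
    if key in proposed_so_far:
        return 0.2
    return 1.0
```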
---
## Composite score
Suggested first composite:

```
score_total = 0.35 * semantic_similarity
            + 0.25 * structural_similarity
            + 0.20 * trajectory_signal
            + 0.10 * review_prior
            + 0.10 * novelty
```

This weighting should remain configurable.
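A minimal configurable implementation of the composite, using the weights above; the dictionary keys are hypothetical names, not a fixed schema.

```python
# Default weights from the suggested first composite; kept configurable.
DEFAULT_WEIGHTS = {
    "semantic": 0.35,
    "structural": 0.25,
    "trajectory": 0.20,
    "review_prior": 0.10,
    "novelty": 0.10,
}


def composite_score(scores: dict[str, float],
                    weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of component scores; missing components count as 0."""
    return sum(weights[k] * scores.get(k, 0.0) for k in weights)
```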
---
## Discovery pipeline
### Step 1. Ingest graph and learner data
Inputs:
- packs
- concepts
- pack metadata
- learner states
- evidence histories
- artifacts
- knowledge exports
### Step 2. Compute concept features
For each concept:
- embedding
- prerequisite signature
- downstream signature
- learner-error signature
- example signature
### Step 3. Generate candidate pairs
Possible approaches:
- nearest neighbors in embedding space
- shared tag neighborhoods
- prerequisite motif matches
- frequent learner co-patterns
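The nearest-neighbor approach can be sketched as a brute-force pass over embeddings; the function name and thresholds are assumptions. At catalog scale an approximate vector index would replace the inner loop.

```python
import math


def nearest_neighbor_pairs(concepts: dict[str, list[float]],
                           k: int = 3, min_sim: float = 0.5):
    """Brute-force k-nearest-neighbor candidate pairs in embedding space.
    Returns (source_id, target_id, similarity) triples, de-duplicated
    so each unordered pair appears once."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    pairs, seen = [], set()
    ids = list(concepts)
    for cid in ids:
        sims = sorted(((cos(concepts[cid], concepts[o]), o)
                       for o in ids if o != cid), reverse=True)[:k]
        for s, other in sims:
            key = frozenset((cid, other))
            if s >= min_sim and key not in seen:
                seen.add(key)
                pairs.append((cid, other, s))
    return pairs
```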
### Step 4. Re-rank candidates
Combine semantic, structural, and trajectory scores.
### Step 5. Group into synthesis clusters
Cluster related candidate pairs into themes such as:
- uncertainty
- feedback
- optimization
- conservation
- branching processes
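Grouping candidate pairs into clusters can be done with connected components over the pair graph, as sketched below with union-find; theme labels would then be assigned per component. Embedding-based clustering (e.g. around the centroid field) is an alternative.

```python
def cluster_pairs(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group candidate pairs into clusters via connected components
    (union-find with path halving)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in pairs:
        union(a, b)
    clusters: dict[str, set[str]] = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())
```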
### Step 6. Produce explanations
Each candidate should include a compact explanation, for example:
- “These concepts occupy similar prerequisite roles.”
- “Learner error patterns suggest a hidden shared dependency.”
- “Examples in pack A may clarify this concept in pack B.”
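The explanations above could start as simple templates keyed by synthesis kind; the kind names and `explain` helper are hypothetical.

```python
# Compact explanation templates, keyed by (assumed) synthesis kind names.
TEMPLATES = {
    "structural_analogy":
        "These concepts occupy similar prerequisite roles.",
    "hidden_prerequisite":
        "Learner error patterns suggest a hidden shared dependency.",
    "example_transfer":
        "Examples in pack {src} may clarify this concept in pack {dst}.",
}


def explain(kind: str, src_pack: str, dst_pack: str) -> str:
    """Fill the template for a candidate; extra kwargs are ignored by
    templates without placeholders."""
    return TEMPLATES[kind].format(src=src_pack, dst=dst_pack)
```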
### Step 7. Send to review-and-promotion workflow
All candidates become reviewable objects; nothing modifies a pack until a reviewer promotes it.
---
## Outputs
The engine should emit candidate objects suitable for promotion into:
- cross-pack links
- pack improvement suggestions
- curriculum draft notes
- skill-bundle drafts
- archived synthesis notes
---
## UI visualization
### 1. Synthesis map
Graph overlay showing:
- existing cross-pack links
- proposed synthesis links
- confidence levels
- accepted vs candidate status
### 2. Candidate explanation panel
For a selected proposed link:
- why it was suggested
- component scores
- source evidence
- similar accepted proposals
- reviewer actions
### 3. Cluster view
Shows higher-level themes connecting multiple packs.
### 4. Learner pathway overlay
Allows a maintainer to see where synthesis would help a learner currently stuck in
one pack by borrowing examples or structures from another.
### 5. Promotion workflow integration
Every synthesis candidate can be:
- accepted as pack improvement
- converted to curriculum draft
- converted to skill bundle
- archived
- rejected
---
## Appropriate uses
The synthesis engine is especially useful for:
- interdisciplinary education
- transfer learning support
- AI learner introspection
- pack maintenance
- curriculum design
- discovery of hidden structure
---
## Cautions
- synthesis suggestions are candidate aids, not guaranteed truths
- semantic similarity alone is not enough
- over-linking can confuse learners
- reviewers need concise explanation and provenance
- accepted synthesis should be visible as intentional structure, not accidental clutter