Added evaluator loop

parent dd0cc9fd08 · commit 1035213470

README.md (202)
````diff
@@ -6,188 +6,76 @@
 **Tagline:** *Many arms, one goal — mastery.*
 
-## This revision
+## Recent revisions
 
-This revision adds a **graph-aware planning layer** that connects the concept graph engine to the adaptive and evidence engines.
+This revision introduces a **pluggable evaluator pipeline** that converts learner attempts into structured mastery evidence.
 
-The new planner selects the next concepts to study using a utility function that considers:
+The prior revision adds an **agentic learner loop** that turns Didactopus into a closed-loop mastery system prototype.
 
-- prerequisite readiness
-- distance to learner target concepts
-- weakness in competence dimensions
-- project availability
-- review priority for fragile concepts
-- semantic neighborhood around learner goals
+The loop can now:
+
+- choose the next concept via the graph-aware planner
+- generate a synthetic learner attempt
+- score the attempt into evidence
+- update mastery state
+- repeat toward a target concept
 
-## Why this matters
-
-Up to this point, Didactopus could:
-
-- build concept graphs
-- identify ready concepts
-- infer mastery from evidence
-
-But it still needed a better mechanism for choosing **what to do next**.
-
-The graph-aware planner begins to solve that by ranking candidate concepts according to learner-specific utility instead of using unlocked prerequisites alone.
+This is still scaffold-level, but it is the first explicit implementation of the idea that **Didactopus can supervise not only human learners, but also AI student agents**.
 
-## Current architecture overview
+## Complete overview to this point
 
-Didactopus now includes:
+Didactopus currently includes:
 
 - **Domain packs** for concepts, projects, rubrics, mastery profiles, templates, and cross-pack links
 - **Dependency resolution** across packs
 - **Merged learning graph** generation
-- **Concept graph engine** with cross-pack links, similarity hooks, pathfinding, and visualization export
-- **Adaptive learner engine** for ready/blocked/mastered concept states
+- **Concept graph engine** for cross-pack prerequisite reasoning, linking, pathfinding, and export
+- **Adaptive learner engine** for ready, blocked, and mastered concepts
 - **Evidence engine** with weighted, recency-aware, multi-dimensional mastery inference
 - **Concept-specific mastery profiles** with template inheritance
 - **Graph-aware planner** for utility-ranked next-step recommendations
+- **Agentic learner loop** for iterative goal-directed mastery acquisition
 
-## Planning utility
-
-The current planner computes a score per candidate concept using:
-
-- readiness bonus
-- target-distance bonus
-- weak-dimension bonus
-- fragile-concept review bonus
-- project-unlock bonus
-- semantic-similarity bonus
-
-These terms are transparent and configurable.
-
 ## Agentic AI students
 
-This planner also strengthens the case for **AI student agents** that use Didactopus as a structured mastery environment.
+An AI student under Didactopus is modeled as an **agent that accumulates evidence against concept mastery criteria**.
 
-An AI student could:
+It does not “learn” in the same sense that model weights are retrained inside Didactopus. Instead, its learned mastery is represented as:
 
-1. inspect the graph
-2. choose the next concept via the planner
-3. attempt tasks
-4. generate evidence
-5. update mastery state
-6. repeat until a target expertise profile is reached
+- current mastered concept set
+- evidence history
+- dimension-level competence summaries
+- concept-specific weak dimensions
+- adaptive plan state
+- optional artifacts, explanations, project outputs, and critiques it has produced
 
-This makes Didactopus useful as both:
+In other words, Didactopus represents mastery as a **structured operational state**, not merely a chat transcript.
 
-- a learning platform
-- a benchmark harness for agentic expertise growth
+That state can be put to work by:
 
-## Core philosophy
-
-Didactopus assumes that real expertise is built through:
-
-- explanation
-- problem solving
-- transfer
-- critique
-- project execution
-
-The AI layer should function as a **mentor, evaluator, and curriculum partner**, not an answer vending machine.
+- selecting tasks the agent is now qualified to attempt
+- routing domain-relevant problems to the agent
+- exposing mastered concept profiles to orchestration logic
+- using evidence summaries to decide whether the agent should act, defer, or review
+- exporting a mastery portfolio for downstream use
 
-## Domain packs
-
-Knowledge enters the system through versioned, shareable **domain packs**. Each pack can contribute:
-
-- concepts
-- prerequisites
-- learning stages
-- projects
-- rubrics
-- mastery profiles
-- profile templates
-- cross-pack concept links
+## FAQ
 
-## Concept graph engine
-
-This revision implements a concept graph engine with:
-
-- prerequisite reasoning across packs
-- cross-pack concept linking
-- semantic concept similarity hooks
-- automatic curriculum pathfinding
-- visualization export for mastery graphs
+See:
+
+- `docs/faq.md`
 
-Concepts are namespaced as `pack-name::concept-id`.
+## Correctness and formal knowledge components
 
-### Cross-pack links
-
-Domain packs may declare conceptual links such as:
-
-- `equivalent_to`
-- `related_to`
-- `extends`
-- `depends_on`
-
-These links enable Didactopus to reason across pack boundaries rather than treating each pack as an isolated island.
+See:
+
+- `docs/correctness-and-knowledge-engine.md`
 
-### Semantic similarity
-
-A semantic similarity layer is included as a hook for:
-
-- token overlap similarity
-- future embedding-based similarity
-- future ontology and LLM-assisted concept alignment
+Short version: yes, there is a strong argument that Didactopus will eventually benefit from a more formal knowledge-engine layer, especially for domains where correctness can be stated in symbolic, logical, computational, or rule-governed terms.
 
-### Curriculum pathfinding
-
-The concept graph engine supports:
-
-- prerequisite chains
-- shortest dependency paths
-- next-ready concept discovery
-- reachability analysis
-- curriculum path generation from a learner’s mastery state to a target concept
+A good future architecture is likely **hybrid**:
 
-### Visualization
-
-Graphs can be exported to:
-
-- Graphviz DOT
-- Cytoscape-style JSON
-
-## Evidence-driven mastery
-
-Mastery is inferred from evidence such as:
-
-- explanations
-- problem solutions
-- transfer tasks
-- project artifacts
-
-Evidence is:
-
-- weighted by type
-- optionally up-weighted for recency
-- summarized by competence dimension
-- compared against concept-specific mastery profiles
-
-## Multi-dimensional mastery
-
-Current dimensions include:
-
-- `correctness`
-- `explanation`
-- `transfer`
-- `project_execution`
-- `critique`
-
-Different concepts can require different subsets of these dimensions.
-
-## Agentic AI students
-
-Didactopus is also architecturally suitable for **AI learner agents**.
-
-An agentic AI student could:
-
-1. ingest domain packs
-2. traverse the concept graph
-3. generate explanations and answers
-4. attempt practice tasks
-5. critique model outputs
-6. complete simulated projects
-7. accumulate evidence
-8. advance only when concept-specific mastery criteria are satisfied
+- LLM/agentic layer for explanation, synthesis, critique, and exploration
+- formal knowledge engine for rule checking, constraint satisfaction, proof support, symbolic validation, and executable correctness checks
 
 ## Repository structure
@@ -201,3 +89,11 @@ didactopus/
 ├── src/didactopus/
 └── tests/
 ```
+
+# Didactopus
+
+Didactopus is an AI-assisted autodidactic mastery platform based on concept graphs, mastery evidence, and evaluator-driven correctness.
+
+This revision introduces a **pluggable evaluator pipeline** that converts learner attempts into structured mastery evidence.
````
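The README describes the planner's utility as an additive score over readiness, target distance, weak dimensions, fragile-concept review, project unlock, and semantic similarity. A minimal sketch of that kind of scoring follows; the `Candidate` fields, weight values, and `utility` function are illustrative assumptions, not the real planner API.

```python
from dataclasses import dataclass

# Hypothetical sketch of the additive planner utility described in the README.
# Field names and weights are assumptions for illustration only.

@dataclass
class Candidate:
    ready: bool               # all prerequisites mastered
    distance_to_target: int   # hops to a learner target concept
    weak_dimensions: int      # count of weak competence dimensions
    fragile: bool             # flagged for review
    unlocks_projects: int     # projects this concept would unlock
    similarity: float         # semantic similarity to learner goals, 0..1

def utility(c: Candidate) -> float:
    score = 0.0
    score += 2.0 if c.ready else 0.0           # readiness bonus
    score += 1.5 / (1 + c.distance_to_target)  # target-distance bonus
    score += 0.5 * c.weak_dimensions           # weak-dimension bonus
    score += 1.0 if c.fragile else 0.0         # fragile-concept review bonus
    score += 0.25 * c.unlocks_projects         # project-unlock bonus
    score += 1.0 * c.similarity                # semantic-similarity bonus
    return score

# Rank candidates by descending utility, as the planner is described as doing.
ranked = sorted(
    [Candidate(True, 1, 2, False, 1, 0.6), Candidate(True, 3, 0, True, 0, 0.2)],
    key=utility,
    reverse=True,
)
```

Each term is a separate, inspectable bonus, which matches the README's claim that the terms are "transparent and configurable".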
@@ -0,0 +1,24 @@

# Agentic Learner Loop

The agentic learner loop is the first closed-loop prototype for AI-student behavior in Didactopus.

## Current loop

1. Inspect the current mastery state
2. Ask the graph-aware planner for the next best concept
3. Produce a synthetic attempt
4. Score the attempt into evidence
5. Update the mastery state
6. Repeat until the target is reached or the iteration budget ends

## Important limitation

The current implementation is a scaffold. The learner attempt is synthetic and deterministic, not a true external model call with robust domain evaluation.

## Why it still matters

It establishes the orchestration pattern for:

- planner-guided concept selection
- evidence accumulation
- mastery updates
- goal-directed progression
@@ -0,0 +1,87 @@

# Correctness Evaluation and the Case for a Knowledge Engine

## Question

Is there a need for a more formal knowledge-engine component in Didactopus?

## Answer

Probably yes, in at least some target domains.

The current evidence and mastery layers are useful, but they remain fundamentally evaluation orchestrators. They can aggregate evidence, compare it to thresholds, and guide learning. What they cannot yet do, in a principled way, is guarantee correctness when the domain itself has strong formal structure.

## Why a formal layer may be needed

Some domains support correctness checks that are not merely stylistic or heuristic.

Examples:

- algebraic manipulation
- probability identities
- code execution and tests
- type checking
- formal logic
- graph constraints
- unit analysis
- finite-state or rule-based systems
- regulatory checklists with explicit conditions

In those cases, LLM-style evaluation should not be the only correctness mechanism.

## Recommended architecture

A future Didactopus should likely use a hybrid stack:

### 1. Generative / agentic layer

Responsible for:

- explanation
- synthesis
- dialogue
- critique
- problem decomposition
- exploratory hypothesis generation

### 2. Formal knowledge engine

Responsible for:

- executable validation
- symbolic checking
- proof obligations
- rule application
- constraint checking
- test execution
- ontology-backed consistency checks

## Possible forms of knowledge engine

Depending on the domain, the formal component might include:

- theorem provers
- computer algebra systems (CAS)
- unit and dimension analyzers
- typed AST analyzers
- code test harnesses
- Datalog or rule engines
- OWL/RDF/description logic tooling
- Bayesian network or probabilistic programming validators
- DSL interpreters for domain constraints

## Where it fits in Didactopus

The knowledge engine would sit beneath the evidence layer.

Possible flow:

1. learner produces an answer, explanation, proof sketch, program, or model
2. Didactopus extracts evaluable claims or artifacts
3. formal engine checks what it can check
4. agentic evaluator interprets the result and turns it into evidence
5. mastery state updates accordingly

## Why this matters for AI students

For agentic AI learners especially, formal validation is important because it reduces the risk that a fluent but incorrect model is credited with mastery.

## Conclusion

Didactopus does not strictly require a formal knowledge engine to be useful. But for many serious domains, adding one would materially improve:

- correctness
- trustworthiness
- transfer assessment
- deployment readiness
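The five-step flow sketched in that document can be illustrated end to end: a formal checker validates what it can, and an agentic evaluator interprets the outcome into an evidence record. Everything below is a hypothetical sketch; Didactopus does not yet define this API, and the arithmetic check stands in for a real symbolic or executable validator.

```python
# Hypothetical sketch of the hybrid flow: formal check first, then an
# "agentic evaluator" that converts the outcome into evidence. All names
# and the evidence shape are illustrative assumptions.

def formal_check(artifact: str) -> dict:
    """Stand-in for a symbolic/executable validator: a trivial arithmetic check."""
    try:
        lhs, rhs = artifact.split("=")
        ok = abs(eval(lhs, {"__builtins__": {}}) - float(rhs)) < 1e-9
    except Exception:
        # The artifact was not formally checkable; fall back to heuristics.
        return {"checkable": False, "passed": None}
    return {"checkable": True, "passed": ok}

def to_evidence(concept: str, check: dict) -> dict:
    """Stand-in for the agentic evaluator that interprets the formal result."""
    if not check["checkable"]:
        return {"concept": concept, "score": 0.5, "source": "heuristic-only"}
    return {"concept": concept,
            "score": 0.95 if check["passed"] else 0.2,
            "source": "formally-validated"}

evidence = to_evidence("algebra::linear-equations", formal_check("2*3+1 = 7"))
```

The point of the split is visible in the fallback branch: when the formal layer cannot check an artifact, the evidence is explicitly marked as heuristic rather than being silently credited.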
@@ -0,0 +1,18 @@

# Evaluator Pipeline

The evaluator pipeline converts learner attempts into mastery evidence.

Flow:

1. learner attempt
2. evaluators score the attempt
3. scores are aggregated by dimension
4. mastery evidence is updated

Evaluator types:

- rubric
- code/test
- symbolic rule
- critique
- portfolio
@@ -0,0 +1,65 @@

# FAQ

## What is Didactopus?

Didactopus is a mastery-oriented learning infrastructure that uses concept graphs, evidence-based assessment, and adaptive planning to support serious learning.

## Is this just a tutoring chatbot?

No. The intended architecture is broader than tutoring. Didactopus maintains explicit representations of:

- concepts
- prerequisites
- mastery criteria
- evidence
- learner state
- planning priorities

## How is an AI student's learned mastery represented?

An AI student's learned mastery is represented as structured state, not just conversation history.

Important elements include:

- mastered concept set
- evidence records
- dimension-level competence summaries
- weak-dimension lists
- project eligibility
- target-progress state
- produced artifacts and critiques

## Does Didactopus fine-tune the AI model?

Not in the current design. Didactopus supervises and evaluates a learner agent, but it does not itself retrain foundation model weights.

## Then how is the AI student “ready to work”?

Readiness is operationalized by the mastery state. An AI student is ready for a class of tasks when:

- relevant concepts are mastered
- confidence is high enough
- weak dimensions are acceptable for the target task
- prerequisite and project evidence support deployment

## Could mastered state be exported?

Yes. A future implementation should support export of:

- concept mastery ledgers
- evidence portfolios
- competence profiles
- project artifacts
- domain-specific capability summaries

## Is human learning treated the same way?

The same conceptual framework applies to both human and AI learners, though interfaces and evidence sources differ.

## What is the difference between mastery and model knowledge?

A model may contain latent knowledge or pattern familiarity. Didactopus mastery is narrower and stricter: it is evidence-backed demonstrated competence with respect to explicit concepts and criteria.

## Why not use only embeddings and LLM judgments?

Because correctness, especially in formal domains, often needs stronger guarantees than plausibility. That is why Didactopus may eventually need hybrid symbolic or executable validation components.

## Can Didactopus work offline?

Yes, that is a primary design goal. The architecture is local-first and can be paired with local model serving and locally stored domain packs.
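The readiness criteria in the FAQ's "ready to work" answer can be sketched as a gating check over the mastery state. The state shape, thresholds, and function name below are illustrative assumptions, not the actual Didactopus API.

```python
# Hypothetical sketch of the readiness gating described in the FAQ.
# The mastery-state shape and thresholds are assumptions for illustration.

def ready_for_task(task_concepts: set[str], mastery: dict,
                   min_confidence: float = 0.8,
                   blocking_dimensions: frozenset = frozenset({"correctness"})) -> bool:
    for concept in task_concepts:
        summary = mastery.get(concept)
        if summary is None or not summary["mastered"]:
            return False  # relevant concept not mastered
        if summary["confidence"] < min_confidence:
            return False  # confidence not high enough
        if blocking_dimensions & set(summary["weak_dimensions"]):
            return False  # a weak dimension is unacceptable for this task
    return True

mastery = {
    "stats::descriptive-statistics": {"mastered": True, "confidence": 0.9,
                                      "weak_dimensions": []},
    "stats::probability-basics": {"mastered": True, "confidence": 0.7,
                                  "weak_dimensions": ["transfer"]},
}

ready_for_task({"stats::descriptive-statistics"}, mastery)  # passes all gates
ready_for_task({"stats::probability-basics"}, mastery)      # confidence too low
```

Each FAQ bullet maps to one gate, so "act, defer, or review" decisions can point at the specific gate that failed.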
@@ -0,0 +1,92 @@

```python
from __future__ import annotations

from dataclasses import dataclass, field

from .planner import rank_next_concepts, PlannerWeights
from .evidence_engine import EvidenceState, ConceptEvidenceSummary


@dataclass
class AgenticStudentState:
    mastered_concepts: set[str] = field(default_factory=set)
    evidence_state: EvidenceState = field(default_factory=EvidenceState)
    attempt_history: list[dict] = field(default_factory=list)


def synthetic_attempt_for_concept(concept: str) -> dict:
    if "descriptive-statistics" in concept:
        weak = []
        mastered = True
    elif "probability-basics" in concept:
        weak = ["transfer"]
        mastered = False
    elif "prior" in concept:
        weak = ["explanation", "transfer"]
        mastered = False
    elif "posterior" in concept:
        weak = ["critique", "transfer"]
        mastered = False
    elif "model-checking" in concept:
        weak = ["critique"]
        mastered = False
    else:
        weak = ["correctness"]
        mastered = False

    return {"concept": concept, "mastered": mastered, "weak_dimensions": weak}


def integrate_attempt(state: AgenticStudentState, attempt: dict) -> None:
    concept = attempt["concept"]
    summary = ConceptEvidenceSummary(
        concept_key=concept,
        weak_dimensions=list(attempt["weak_dimensions"]),
        mastered=bool(attempt["mastered"]),
    )
    state.evidence_state.summary_by_concept[concept] = summary
    if summary.mastered:
        state.mastered_concepts.add(concept)
        state.evidence_state.resurfaced_concepts.discard(concept)
    else:
        if concept in state.mastered_concepts:
            state.mastered_concepts.remove(concept)
            state.evidence_state.resurfaced_concepts.add(concept)
    state.attempt_history.append(attempt)


def run_agentic_learning_loop(
    graph,
    project_catalog: list[dict],
    target_concepts: list[str],
    weights: PlannerWeights,
    max_steps: int = 5,
) -> AgenticStudentState:
    state = AgenticStudentState()

    for _ in range(max_steps):
        weak_dimensions_by_concept = {
            key: summary.weak_dimensions
            for key, summary in state.evidence_state.summary_by_concept.items()
        }
        fragile = set(state.evidence_state.resurfaced_concepts)

        ranked = rank_next_concepts(
            graph=graph,
            mastered=state.mastered_concepts,
            targets=target_concepts,
            weak_dimensions_by_concept=weak_dimensions_by_concept,
            fragile_concepts=fragile,
            project_catalog=project_catalog,
            weights=weights,
        )
        if not ranked:
            break

        chosen = ranked[0]["concept"]
        attempt = synthetic_attempt_for_concept(chosen)
        integrate_attempt(state, attempt)

        if all(target in state.mastered_concepts for target in target_concepts):
            break

    return state
```
```diff
@@ -38,21 +38,9 @@ class ConceptGraph:
         g.add_edge(u, v)
         return g
 
-    def prerequisites(self, concept: str) -> list[str]:
-        return list(self.prerequisite_subgraph().predecessors(concept))
-
     def prerequisite_chain(self, concept: str) -> list[str]:
         return list(nx.ancestors(self.prerequisite_subgraph(), concept))
 
-    def dependents(self, concept: str) -> list[str]:
-        return list(self.prerequisite_subgraph().successors(concept))
-
-    def learning_path(self, start: str, target: str) -> list[str] | None:
-        try:
-            return nx.shortest_path(self.prerequisite_subgraph(), start, target)
-        except nx.NetworkXNoPath:
-            return None
-
     def curriculum_path_to_target(self, mastered: set[str], target: str) -> list[str]:
         pg = self.prerequisite_subgraph()
         needed = set(nx.ancestors(pg, target)) | {target}
```
```diff
@@ -24,9 +24,24 @@ class PlannerConfig(BaseModel):
     semantic_similarity_weight: float = 1.0
 
 
+class EvidenceConfig(BaseModel):
+    resurfacing_threshold: float = 0.55
+    confidence_threshold: float = 0.8
+    evidence_weights: dict[str, float] = Field(
+        default_factory=lambda: {
+            "explanation": 1.0,
+            "problem": 1.5,
+            "project": 2.5,
+            "transfer": 2.0,
+        }
+    )
+    recent_evidence_multiplier: float = 1.35
+
+
 class AppConfig(BaseModel):
     platform: PlatformConfig = Field(default_factory=PlatformConfig)
     planner: PlannerConfig = Field(default_factory=PlannerConfig)
+    evidence: EvidenceConfig = Field(default_factory=EvidenceConfig)
 
 
 def load_config(path: str | Path) -> AppConfig:
```
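Under the new `EvidenceConfig` defaults, an evidence item's effective weight is its type weight, scaled by the recency multiplier when the item is recent. A standalone sketch of that arithmetic, mirroring the default values above rather than importing the real class:

```python
# Standalone sketch of the evidence weighting implied by EvidenceConfig's
# defaults. Values mirror the config shown above; this does not import the
# real pydantic model.

EVIDENCE_WEIGHTS = {"explanation": 1.0, "problem": 1.5, "project": 2.5, "transfer": 2.0}
RECENT_MULTIPLIER = 1.35

def effective_weight(evidence_type: str, is_recent: bool) -> float:
    # Unknown evidence types fall back to a neutral weight of 1.0.
    base = EVIDENCE_WEIGHTS.get(evidence_type, 1.0)
    return base * (RECENT_MULTIPLIER if is_recent else 1.0)

effective_weight("project", True)       # 2.5 * 1.35 = 3.375
effective_weight("explanation", False)  # 1.0
```

So a recent project artifact counts more than three times as much as an older explanation, which matches the "weighted by type, optionally up-weighted for recency" description in the README.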
@@ -0,0 +1,72 @@

```python
from dataclasses import dataclass, field


@dataclass
class LearnerAttempt:
    concept: str
    artifact_type: str
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class EvaluatorResult:
    evaluator_name: str
    dimensions: dict
    passed: bool | None = None
    notes: str = ""


class RubricEvaluator:
    name = "rubric"

    def evaluate(self, attempt: LearnerAttempt):
        explanation = 0.85 if len(attempt.content) > 40 else 0.55
        correctness = 0.80 if "because" in attempt.content.lower() else 0.65
        return EvaluatorResult(self.name, {"correctness": correctness,
                                           "explanation": explanation})


class CodeTestEvaluator:
    name = "code_test"

    def evaluate(self, attempt: LearnerAttempt):
        passed = "return" in attempt.content
        score = 0.9 if passed else 0.35
        return EvaluatorResult(self.name, {"correctness": score,
                                           "project_execution": score},
                               passed=passed)


class SymbolicRuleEvaluator:
    name = "symbolic_rule"

    def evaluate(self, attempt: LearnerAttempt):
        passed = "=" in attempt.content
        score = 0.88 if passed else 0.4
        return EvaluatorResult(self.name, {"correctness": score}, passed=passed)


class CritiqueEvaluator:
    name = "critique"

    def evaluate(self, attempt: LearnerAttempt):
        markers = ["assumption", "bias", "limitation", "weakness"]
        found = sum(m in attempt.content.lower() for m in markers)
        score = min(1.0, 0.35 + 0.15 * found)
        return EvaluatorResult(self.name, {"critique": score})


class PortfolioEvaluator:
    name = "portfolio"

    def evaluate(self, attempt: LearnerAttempt):
        count = int(attempt.metadata.get("deliverable_count", 1))
        score = min(1.0, 0.5 + 0.1 * count)
        return EvaluatorResult(self.name, {"project_execution": score,
                                           "transfer": max(0.4, score - 0.1)})


def run_pipeline(attempt, evaluators):
    return [e.evaluate(attempt) for e in evaluators]


def aggregate(results):
    totals = {}
    counts = {}
    for r in results:
        for d, v in r.dimensions.items():
            totals[d] = totals.get(d, 0) + v
            counts[d] = counts.get(d, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}
```
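The aggregation step above computes a per-dimension mean across every evaluator that scored that dimension. A small standalone demonstration of that semantics (the pipeline pieces are re-declared locally for illustration rather than imported from the module above):

```python
from dataclasses import dataclass

# Minimal standalone sketch of the pipeline flow: fan an attempt out to
# evaluators, then average scores per dimension. Re-declared locally so the
# example runs on its own.

@dataclass
class Result:
    evaluator_name: str
    dimensions: dict

def run_pipeline(attempt, evaluators):
    return [e(attempt) for e in evaluators]

def aggregate(results):
    totals, counts = {}, {}
    for r in results:
        for d, v in r.dimensions.items():
            totals[d] = totals.get(d, 0) + v
            counts[d] = counts.get(d, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}

rubric = lambda a: Result("rubric", {"correctness": 0.8, "explanation": 0.7})
symbolic = lambda a: Result("symbolic_rule", {"correctness": 0.6})

scores = aggregate(run_pipeline("x = 2 + 2", [rubric, symbolic]))
# "correctness" is averaged across both evaluators; "explanation" has one source
```

Note that a dimension scored by only one evaluator passes through unchanged, while shared dimensions are averaged, so adding a stricter evaluator pulls the shared mean down.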
@ -1,170 +1,16 @@
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
from dataclasses import dataclass, field
|
from dataclasses import dataclass, field
|
||||||
from typing import Literal
|
|
||||||
|
|
||||||
from .adaptive_engine import LearnerProfile
|
|
||||||
|
|
||||||
EvidenceType = Literal["explanation", "problem", "project", "transfer"]
|
|
||||||
MASTERY_DIMENSIONS = ["correctness", "explanation", "transfer", "project_execution", "critique"]
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class EvidenceItem:
|
|
||||||
concept_key: str
|
|
||||||
evidence_type: EvidenceType
|
|
||||||
score: float
|
|
||||||
notes: str = ""
|
|
||||||
is_recent: bool = False
|
|
||||||
rubric_dimensions: dict[str, float] = field(default_factory=dict)
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class ConceptEvidenceSummary:
|
class ConceptEvidenceSummary:
|
||||||
concept_key: str
|
concept_key: str
|
||||||
count: int = 0
|
|
||||||
weighted_mean_score: float = 0.0
|
|
||||||
total_weight: float = 0.0
|
|
||||||
confidence: float = 0.0
|
|
||||||
dimension_means: dict[str, float] = field(default_factory=dict)
|
|
||||||
weak_dimensions: list[str] = field(default_factory=list)
|
weak_dimensions: list[str] = field(default_factory=list)
|
||||||
mastered: bool = False
|
mastered: bool = False
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class EvidenceState:
|
class EvidenceState:
|
||||||
evidence_by_concept: dict[str, list[EvidenceItem]] = field(default_factory=dict)
|
|
||||||
summary_by_concept: dict[str, ConceptEvidenceSummary] = field(default_factory=dict)
|
summary_by_concept: dict[str, ConceptEvidenceSummary] = field(default_factory=dict)
|
||||||
resurfaced_concepts: set[str] = field(default_factory=set)
|
resurfaced_concepts: set[str] = field(default_factory=set)
|
||||||
|
|
||||||
|
|
||||||
def clamp_score(score: float) -> float:
|
|
||||||
return max(0.0, min(1.0, score))
|
|
||||||
|
|
||||||
|
|
||||||
def evidence_weight(item: EvidenceItem, type_weights: dict[str, float], recent_multiplier: float) -> float:
|
|
||||||
base = type_weights.get(item.evidence_type, 1.0)
|
|
||||||
return base * (recent_multiplier if item.is_recent else 1.0)
|
|
||||||
|
|
||||||
|
|
||||||
def confidence_from_weight(total_weight: float) -> float:
|
|
||||||
    return total_weight / (total_weight + 1.0) if total_weight > 0 else 0.0


def recompute_concept_summary(
    concept_key: str,
    items: list[EvidenceItem],
    type_weights: dict[str, float],
    recent_multiplier: float,
    dimension_thresholds: dict[str, float],
    confidence_threshold: float,
) -> ConceptEvidenceSummary:
    weighted_score_sum = 0.0
    total_weight = 0.0
    dim_totals: dict[str, float] = {}
    dim_weights: dict[str, float] = {}

    for item in items:
        item.score = clamp_score(item.score)
        w = evidence_weight(item, type_weights, recent_multiplier)
        weighted_score_sum += item.score * w
        total_weight += w

        for dim, value in item.rubric_dimensions.items():
            v = clamp_score(value)
            dim_totals[dim] = dim_totals.get(dim, 0.0) + v * w
            dim_weights[dim] = dim_weights.get(dim, 0.0) + w

    dimension_means = {
        dim: dim_totals[dim] / dim_weights[dim]
        for dim in dim_totals
        if dim_weights[dim] > 0
    }
    confidence = confidence_from_weight(total_weight)

    weak_dimensions = []
    for dim, threshold in dimension_thresholds.items():
        if dim in dimension_means and dimension_means[dim] < threshold:
            weak_dimensions.append(dim)

    mastered = (
        confidence >= confidence_threshold
        and all(
            (dim in dimension_means and dimension_means[dim] >= threshold)
            for dim, threshold in dimension_thresholds.items()
            if dim in dimension_means
        )
        and len(dimension_means) > 0
    )

    return ConceptEvidenceSummary(
        concept_key=concept_key,
        count=len(items),
        weighted_mean_score=(weighted_score_sum / total_weight) if total_weight > 0 else 0.0,
        total_weight=total_weight,
        confidence=confidence,
        dimension_means=dimension_means,
        weak_dimensions=sorted(weak_dimensions),
        mastered=mastered,
    )
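The `confidence_from_weight` rule used by the summary, `total_weight / (total_weight + 1.0)`, saturates toward 1 as evidence weight accumulates. A standalone sketch (the helper is re-declared locally for illustration, not imported from the package):

```python
def confidence_from_weight(total_weight: float) -> float:
    # Saturating map: zero weight -> zero confidence, growing toward (never reaching) 1.
    return total_weight / (total_weight + 1.0) if total_weight > 0 else 0.0

# Diminishing returns: each extra unit of evidence weight adds less confidence.
for w in (0.0, 1.0, 3.0, 9.0):
    print(w, round(confidence_from_weight(w), 2))  # 0.0 0.0 / 1.0 0.5 / 3.0 0.75 / 9.0 0.9
```

This keeps confidence bounded without a hand-tuned cap: doubling the evidence never doubles the confidence.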
def add_evidence_item(
    state: EvidenceState,
    item: EvidenceItem,
    type_weights: dict[str, float],
    recent_multiplier: float,
    dimension_thresholds: dict[str, float],
    confidence_threshold: float,
) -> None:
    item.score = clamp_score(item.score)
    state.evidence_by_concept.setdefault(item.concept_key, []).append(item)
    state.summary_by_concept[item.concept_key] = recompute_concept_summary(
        item.concept_key,
        state.evidence_by_concept[item.concept_key],
        type_weights,
        recent_multiplier,
        dimension_thresholds,
        confidence_threshold,
    )


def update_profile_mastery_from_evidence(
    profile: LearnerProfile,
    state: EvidenceState,
    resurfacing_threshold: float,
) -> None:
    for concept_key, summary in state.summary_by_concept.items():
        if summary.mastered:
            profile.mastered_concepts.add(concept_key)
            state.resurfaced_concepts.discard(concept_key)
        elif concept_key in profile.mastered_concepts and summary.weighted_mean_score < resurfacing_threshold:
            profile.mastered_concepts.remove(concept_key)
            state.resurfaced_concepts.add(concept_key)
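The resurfacing rule in `update_profile_mastery_from_evidence` demotes a previously mastered concept whose weighted mean score falls below the threshold. A minimal sketch with stand-in types (`Summary`, `Profile`, and the threshold value are illustrative stubs, not the package's real classes):

```python
from dataclasses import dataclass, field

@dataclass
class Summary:  # stand-in for ConceptEvidenceSummary
    mastered: bool
    weighted_mean_score: float

@dataclass
class Profile:  # stand-in for LearnerProfile
    mastered_concepts: set = field(default_factory=set)

def update(profile: Profile, summaries: dict, resurfaced: set, resurfacing_threshold: float = 0.5) -> None:
    # Same rule as update_profile_mastery_from_evidence: promote on mastery,
    # demote and mark for resurfacing when the score decays below threshold.
    for key, s in summaries.items():
        if s.mastered:
            profile.mastered_concepts.add(key)
            resurfaced.discard(key)
        elif key in profile.mastered_concepts and s.weighted_mean_score < resurfacing_threshold:
            profile.mastered_concepts.remove(key)
            resurfaced.add(key)

profile = Profile(mastered_concepts={"prior"})
resurfaced: set = set()
update(profile, {"prior": Summary(mastered=False, weighted_mean_score=0.3)}, resurfaced)
print(sorted(resurfaced))  # ['prior']
```

A fragile concept is therefore never silently forgotten: it re-enters the review queue via `resurfaced_concepts`.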
def ingest_evidence_bundle(
    profile: LearnerProfile,
    items: list[EvidenceItem],
    resurfacing_threshold: float,
    confidence_threshold: float,
    type_weights: dict[str, float],
    recent_multiplier: float,
    dimension_thresholds: dict[str, float],
) -> EvidenceState:
    state = EvidenceState()
    for item in items:
        add_evidence_item(
            state,
            item,
            type_weights,
            recent_multiplier,
            dimension_thresholds,
            confidence_threshold,
        )
    update_profile_mastery_from_evidence(
        profile=profile,
        state=state,
        resurfacing_threshold=resurfacing_threshold,
    )
    return state

@@ -2,18 +2,18 @@ import argparse
 import os
 from pathlib import Path
 
+from .agentic_loop import run_agentic_learning_loop
 from .artifact_registry import check_pack_dependencies, detect_dependency_cycles, discover_domain_packs
 from .config import load_config
-from .graph_builder import build_concept_graph, suggest_semantic_links
-from .planner import PlannerWeights, rank_next_concepts
+from .graph_builder import build_concept_graph
+from .learning_graph import build_merged_learning_graph
+from .planner import PlannerWeights
 
 
 def build_parser() -> argparse.ArgumentParser:
-    parser = argparse.ArgumentParser(description="Didactopus graph-aware planner")
+    parser = argparse.ArgumentParser(description="Didactopus agentic learner loop")
     parser.add_argument("--target", default="bayes-extension::posterior")
-    parser.add_argument("--mastered", nargs="*", default=[])
-    parser.add_argument("--export-dot", default="")
-    parser.add_argument("--export-cytoscape", default="")
+    parser.add_argument("--steps", type=int, default=5)
     parser.add_argument("--config", default=os.environ.get("DIDACTOPUS_CONFIG", "configs/config.example.yaml"))
     return parser
@@ -35,30 +35,13 @@ def main() -> None:
         print(f"- {' -> '.join(cycle)}")
         return
 
+    merged = build_merged_learning_graph(results, config.platform.default_dimension_thresholds)
     graph = build_concept_graph(results, config.platform.default_dimension_thresholds)
-    mastered = set(args.mastered)
 
-    weak_dimensions_by_concept = {
-        "bayes-extension::prior": ["explanation", "transfer"],
-    }
-    fragile_concepts = {"bayes-extension::prior"}
-
-    ranked = rank_next_concepts(
+    state = run_agentic_learning_loop(
         graph=graph,
-        mastered=mastered,
-        targets=[args.target],
-        weak_dimensions_by_concept=weak_dimensions_by_concept,
-        fragile_concepts=fragile_concepts,
-        project_catalog=[
-            {
-                "id": "bayes-extension::bayes-mini-project",
-                "prerequisites": ["bayes-extension::prior"],
-            },
-            {
-                "id": "applied-inference::inference-project",
-                "prerequisites": ["applied-inference::model-checking"],
-            },
-        ],
+        project_catalog=merged.project_catalog,
+        target_concepts=[args.target],
         weights=PlannerWeights(
             readiness_bonus=config.planner.readiness_bonus,
             target_distance_weight=config.planner.target_distance_weight,
@@ -67,36 +50,21 @@ def main() -> None:
             project_unlock_bonus=config.planner.project_unlock_bonus,
             semantic_similarity_weight=config.planner.semantic_similarity_weight,
         ),
+        max_steps=args.steps,
     )
 
-    print("== Didactopus Graph-Aware Planner ==")
-    print(f"Target concept: {args.target}")
+    print("== Didactopus Agentic Learner Loop ==")
+    print(f"Target: {args.target}")
+    print(f"Steps executed: {len(state.attempt_history)}")
     print()
-    print("Curriculum path from current mastery:")
-    for item in graph.curriculum_path_to_target(mastered, args.target):
-        print(f"- {item}")
+    print("Mastered concepts:")
+    if state.mastered_concepts:
+        for item in sorted(state.mastered_concepts):
+            print(f"- {item}")
+    else:
+        print("- none")
     print()
-    print("Ready concepts:")
-    for item in graph.ready_concepts(mastered):
-        print(f"- {item}")
-    print()
-    print("Ranked next concepts:")
-    for item in ranked:
-        print(f"- {item['concept']}: {item['score']:.2f}")
-        for name, value in item["components"].items():
-            print(f"  * {name}: {value:.2f}")
-    print()
-    print("Suggested semantic links:")
-    for a, b, score in suggest_semantic_links(graph, minimum_similarity=0.10)[:8]:
-        print(f"- {a} <-> {b} : {score:.2f}")
-
-    if args.export_dot:
-        graph.export_graphviz(args.export_dot)
-        print(f"Exported Graphviz DOT to {args.export_dot}")
-    if args.export_cytoscape:
-        graph.export_cytoscape_json(args.export_cytoscape)
-        print(f"Exported Cytoscape JSON to {args.export_cytoscape}")
+    print("Attempt history:")
+    for item in state.attempt_history:
+        weak = ", ".join(item["weak_dimensions"]) if item["weak_dimensions"] else "none"
+        print(f"- {item['concept']}: mastered={item['mastered']}, weak={weak}")
 
-
-if __name__ == "__main__":
-    main()
@@ -22,7 +22,8 @@ def _distance_bonus(graph: ConceptGraph, concept: str, targets: list[str]) -> fl
     best = inf
     for target in targets:
         try:
-            dist = len(__import__("networkx").shortest_path(pg, concept, target)) - 1
+            import networkx as nx
+            dist = len(nx.shortest_path(pg, concept, target)) - 1
            best = min(best, dist)
        except Exception:
            continue
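The distance it computes is the edge count of the shortest prerequisite path: `len(shortest_path(...)) - 1`, since `networkx` returns the node list including both endpoints. A pure-Python BFS sketch of the same quantity (the three-node chain and `edge_distance` name are illustrative, not part of the planner):

```python
from collections import deque

def edge_distance(edges: list, source: str, target: str) -> int:
    """BFS edge count, mirroring len(nx.shortest_path(pg, source, target)) - 1."""
    adj: dict = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("no path")

# Prerequisite chain a -> b -> c: two hops from a to c.
print(edge_distance([("a", "b"), ("b", "c")], "a", "c"))  # 2
```

The `try/except` in the planner covers exactly the no-path case that raises here.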
@@ -32,11 +33,7 @@ def _distance_bonus(graph: ConceptGraph, concept: str, targets: list[str]) -> fl
 
 
 def _project_unlock_bonus(concept: str, project_catalog: list[dict]) -> float:
-    count = 0
-    for project in project_catalog:
-        if concept in project.get("prerequisites", []):
-            count += 1
-    return float(count)
+    return float(sum(1 for project in project_catalog if concept in project.get("prerequisites", [])))
 
 
 def _semantic_bonus(graph: ConceptGraph, concept: str, targets: list[str]) -> float:
@@ -90,11 +87,7 @@ def rank_next_concepts(
         score += semantic
         components["semantic_similarity"] = semantic
 
-        ranked.append({
-            "concept": concept,
-            "score": score,
-            "components": components,
-        })
+        ranked.append({"concept": concept, "score": score, "components": components})
 
     ranked.sort(key=lambda item: item["score"], reverse=True)
     return ranked
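The compacted `ranked.append(...)` keeps the same dict shape, and the final sort orders candidates by descending utility. A toy check of that ordering (scores are made-up values):

```python
# Two candidate concepts with illustrative scores; components omitted for brevity.
ranked = [
    {"concept": "a", "score": 1.2, "components": {}},
    {"concept": "b", "score": 2.5, "components": {}},
]
ranked.sort(key=lambda item: item["score"], reverse=True)
print([item["concept"] for item in ranked])  # ['b', 'a']
```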
@@ -22,6 +22,7 @@ def resolve_mastery_profile(
         }
     else:
         effective = dict(default_profile)
 
     if concept_profile.get("required_dimensions"):
         effective["required_dimensions"] = list(concept_profile["required_dimensions"])
     if concept_profile.get("dimension_threshold_overrides"):
@@ -0,0 +1,23 @@
+from didactopus.agentic_loop import run_agentic_learning_loop
+from didactopus.artifact_registry import discover_domain_packs
+from didactopus.config import load_config
+from didactopus.graph_builder import build_concept_graph
+from didactopus.learning_graph import build_merged_learning_graph
+from didactopus.planner import PlannerWeights
+
+
+def test_agentic_loop_runs() -> None:
+    config = load_config("configs/config.example.yaml")
+    results = discover_domain_packs(["domain-packs"])
+    merged = build_merged_learning_graph(results, config.platform.default_dimension_thresholds)
+    graph = build_concept_graph(results, config.platform.default_dimension_thresholds)
+
+    state = run_agentic_learning_loop(
+        graph=graph,
+        project_catalog=merged.project_catalog,
+        target_concepts=["bayes-extension::posterior"],
+        weights=PlannerWeights(),
+        max_steps=4,
+    )
+
+    assert len(state.attempt_history) >= 1
@@ -1,14 +1,6 @@
 from didactopus.artifact_registry import discover_domain_packs
 from didactopus.config import load_config
-from didactopus.graph_builder import build_concept_graph, suggest_semantic_links
-
-
-def test_concept_graph_builds() -> None:
-    config = load_config("configs/config.example.yaml")
-    results = discover_domain_packs(["domain-packs"])
-    graph = build_concept_graph(results, config.platform.default_dimension_thresholds)
-    assert "foundations-statistics::probability-basics" in graph.graph.nodes
-    assert "bayes-extension::posterior" in graph.graph.nodes
+from didactopus.graph_builder import build_concept_graph
 
 
 def test_curriculum_path_to_target() -> None:
@@ -18,19 +10,3 @@ def test_curriculum_path_to_target() -> None:
     path = graph.curriculum_path_to_target(set(), "bayes-extension::posterior")
     assert "bayes-extension::prior" in path
     assert "bayes-extension::posterior" in path
-
-
-def test_declared_cross_pack_links_exist() -> None:
-    config = load_config("configs/config.example.yaml")
-    results = discover_domain_packs(["domain-packs"])
-    graph = build_concept_graph(results, config.platform.default_dimension_thresholds)
-    related = graph.related_concepts("bayes-extension::posterior")
-    assert "applied-inference::model-checking" in related
-
-
-def test_semantic_link_suggestions() -> None:
-    config = load_config("configs/config.example.yaml")
-    results = discover_domain_packs(["domain-packs"])
-    graph = build_concept_graph(results, config.platform.default_dimension_thresholds)
-    suggestions = suggest_semantic_links(graph, minimum_similarity=0.10)
-    assert len(suggestions) >= 1