Added mastery ledger and capability export.
This commit is contained in:
parent 687ed001fa
commit db2cca50d0

README.md | 59

@@ -8,10 +8,67 @@
## Recent revisions

### Mastery Ledger

This revision adds a **Mastery Ledger + Capability Export** layer.

The main purpose is to let Didactopus turn accumulated learner state into
portable, inspectable artifacts that can support downstream deployment,
review, orchestration, or certification-like workflows.

#### What is new

- mastery ledger data model
- capability profile export
- JSON export of mastered concepts and evaluator summaries
- Markdown export of a readable capability report
- artifact manifest for produced deliverables
- demo CLI for generating exports for an AI student or human learner
- FAQ covering how learned mastery is represented and put to work

#### Why this matters

Didactopus can now do more than guide learning. It can also emit a structured
statement of what a learner appears able to do, based on explicit concepts,
evidence, and artifacts.

That makes it easier to use Didactopus as:

- a mastery tracker
- a portfolio generator
- a deployment-readiness aid
- an orchestration input for agent routing

#### Mastery representation

A learner's mastery is represented as structured operational state, including:

- mastered concepts
- evaluator results
- evidence summaries
- weak dimensions
- attempt history
- produced artifacts
- capability export

This is stricter than a normal chat transcript or self-description.

#### Future direction

A later revision should connect the capability export with:

- formal evaluator outputs
- signed evidence ledgers
- domain-specific capability schemas
- deployment policies for agent routing

### Evaluator Pipeline

This revision introduces a **pluggable evaluator pipeline** that converts
learner attempts into structured mastery evidence.

### Agentic Learner Loop

This revision adds an **agentic learner loop** that turns Didactopus into a closed-loop mastery system prototype.

The loop can now:
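Since the capability profile is plain JSON, a downstream system can consume it without importing Didactopus at all. The sketch below shows one way a router might gate deployment on the export; the field names follow the `CapabilityProfile` dataclass in this commit, but `ready_for`, its policy, and the sample values are illustrative assumptions, not part of the commit.

```python
import json

# Sample payload in the shape written by export_capability_profile_json
# (field names from CapabilityProfile; the values here are made up).
profile_json = json.dumps({
    "learner_id": "demo-agent",
    "domain": "Bayesian inference",
    "mastered_concepts": ["foundations-statistics::descriptive-statistics"],
    "weak_dimensions_by_concept": {"bayes-extension::prior": ["critique"]},
})


def ready_for(profile: dict, required: list[str], tolerated_weak: set[str]) -> bool:
    """Hypothetical routing policy: every required concept must be mastered,
    and any remaining weak dimensions must fall inside a tolerated set."""
    if not all(c in profile["mastered_concepts"] for c in required):
        return False
    for dims in profile["weak_dimensions_by_concept"].values():
        if any(d not in tolerated_weak for d in dims):
            return False
    return True


profile = json.loads(profile_json)
print(ready_for(profile, ["foundations-statistics::descriptive-statistics"], {"critique"}))
```

A real deployment policy would likely also consult the evaluator summaries and artifact manifest, but the same inspect-and-decide pattern applies.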
docs/faq.md | 76

@@ -1,65 +1,37 @@
# FAQ

## How is an AI student's learned mastery represented?

As structured operational state, including:

- mastered concepts
- evaluator summaries
- weak dimensions
- evidence records
- artifacts
- capability export

## Does Didactopus change the AI model weights?

No. In the current architecture, Didactopus supervises and evaluates a learner
agent, but it does not retrain the foundation model.

## When is an AI student ready to be put to work?

Readiness is represented operationally. A downstream system can inspect:

- which concepts are mastered
- which weak dimensions remain
- what artifacts were produced
- what evaluator evidence supports deployment

## Is the capability export a certification?

Not by itself. It is a structured mastery report. In the future, it could be
combined with formal evaluators, signed evidence records, and policy rules.

## Why is this useful?

Because it allows Didactopus outputs to feed into:

- task routing
- portfolio review
- benchmark comparison
- agent deployment policies
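The "signed evidence records" idea mentioned in the FAQ can be prototyped with nothing more than a detached HMAC over the serialized export. This is a sketch of the concept only; it is not part of this commit, and a production evidence ledger would use asymmetric keys rather than a shared secret.

```python
import hashlib
import hmac
import json


def sign_export(export: dict, secret: bytes) -> str:
    """Detached HMAC-SHA256 over a canonical JSON serialization (illustrative)."""
    payload = json.dumps(export, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_export(export: dict, secret: bytes, signature: str) -> bool:
    # compare_digest avoids timing side channels on the hex digests.
    return hmac.compare_digest(sign_export(export, secret), signature)


export = {"learner_id": "demo-agent", "mastered_concepts": ["bayes-extension::prior"]}
sig = sign_export(export, b"demo-secret")
print(verify_export(export, b"demo-secret", sig))   # unmodified export verifies
export["mastered_concepts"].append("bayes-extension::posterior")
print(verify_export(export, b"demo-secret", sig))   # tampered export fails
```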
@@ -0,0 +1,31 @@
# Mastery Ledger

The mastery ledger is the structured record of what a learner has demonstrated.

## Core contents

- learner identity
- target domain or goal
- mastered concepts
- concept-level evidence summaries
- weak dimensions
- artifact records
- generated capability profile

## Exports

This scaffold exports:

- JSON capability profile
- Markdown capability report
- artifact manifest JSON

## Why it matters

The mastery ledger provides an explicit representation of readiness.
It supports both human and AI learners.

## Important caveat

The current scaffold is not a formal certification system. It is a structured
capability report driven by the Didactopus evidence and evaluator pipeline.
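The artifact manifest listed above is the smallest of the three exports. Its shape follows the keys used by `export_artifact_manifest` elsewhere in this commit; the entry values below are made-up examples, not output from the demo.

```python
import json

# Top-level keys match export_artifact_manifest; the artifact entry is illustrative.
manifest = {
    "learner_id": "demo-agent",
    "domain": "Bayesian inference",
    "artifacts": [
        {
            "concept": "bayes-extension::prior",
            "artifact_type": "explanation",
            "artifact_name": "prior_reflection.md",
        },
    ],
}
print(json.dumps(manifest, indent=2))
```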
@@ -1,92 +1,132 @@
from dataclasses import dataclass, field

from .evaluator_pipeline import (
    LearnerAttempt,
    RubricEvaluator,
    CodeTestEvaluator,
    SymbolicRuleEvaluator,
    CritiqueEvaluator,
    PortfolioEvaluator,
    run_pipeline,
    aggregate,
)


@dataclass
class ConceptEvidenceSummary:
    concept_key: str
    weak_dimensions: list[str] = field(default_factory=list)
    mastered: bool = False
    aggregated: dict = field(default_factory=dict)
    evaluators: list[str] = field(default_factory=list)


@dataclass
class EvidenceState:
    summary_by_concept: dict[str, ConceptEvidenceSummary] = field(default_factory=dict)
    resurfaced_concepts: set[str] = field(default_factory=set)


@dataclass
class AgenticStudentState:
    learner_id: str = "demo-agent"
    display_name: str = "Demo Agentic Student"
    mastered_concepts: set[str] = field(default_factory=set)
    evidence_state: EvidenceState = field(default_factory=EvidenceState)
    attempt_history: list[dict] = field(default_factory=list)
    artifacts: list[dict] = field(default_factory=list)


def synthetic_attempt_for_concept(concept: str) -> LearnerAttempt:
    if "descriptive-statistics" in concept:
        return LearnerAttempt(
            concept=concept,
            artifact_type="explanation",
            content="Mean and variance summarize a dataset because they describe center and spread.",
            metadata={"deliverable_count": 1, "artifact_name": "descriptive_statistics_note.md"},
        )
    if "probability-basics" in concept:
        return LearnerAttempt(
            concept=concept,
            artifact_type="explanation",
            content="Conditional probability changes because context changes the sample space.",
            metadata={"deliverable_count": 1, "artifact_name": "probability_basics_note.md"},
        )
    if "prior" in concept:
        return LearnerAttempt(
            concept=concept,
            artifact_type="explanation",
            content="A prior is an assumption before evidence, but one limitation is bias.",
            metadata={"deliverable_count": 1, "artifact_name": "prior_reflection.md"},
        )
    if "posterior" in concept:
        return LearnerAttempt(
            concept=concept,
            artifact_type="symbolic",
            content="Therefore posterior = updated belief after evidence, but one assumption may be model fit.",
            metadata={"deliverable_count": 1, "artifact_name": "posterior_symbolic_note.md"},
        )
    return LearnerAttempt(
        concept=concept,
        artifact_type="critique",
        content="A weakness is hidden assumptions; a limitation is poor fit; uncertainty remains.",
        metadata={"deliverable_count": 2, "artifact_name": "critique_report.md"},
    )


def evaluator_set_for_attempt(attempt: LearnerAttempt):
    evaluators = [RubricEvaluator(), CritiqueEvaluator()]
    if attempt.artifact_type == "code":
        evaluators.append(CodeTestEvaluator())
    if attempt.artifact_type == "symbolic":
        evaluators.append(SymbolicRuleEvaluator())
    if attempt.artifact_type in {"project", "portfolio", "critique"}:
        evaluators.append(PortfolioEvaluator())
    return evaluators


def integrate_attempt(state: AgenticStudentState, attempt: LearnerAttempt) -> None:
    results = run_pipeline(attempt, evaluator_set_for_attempt(attempt))
    aggregated = aggregate(results)
    weak = [dim for dim, score in aggregated.items() if score < 0.75]
    mastered = len(aggregated) > 0 and all(score >= 0.75 for score in aggregated.values())

    summary = ConceptEvidenceSummary(
        concept_key=attempt.concept,
        weak_dimensions=weak,
        mastered=mastered,
        aggregated=aggregated,
        evaluators=[r.evaluator_name for r in results],
    )
    state.evidence_state.summary_by_concept[attempt.concept] = summary
    if mastered:
        state.mastered_concepts.add(attempt.concept)
        state.evidence_state.resurfaced_concepts.discard(attempt.concept)
    else:
        if attempt.concept in state.mastered_concepts:
            state.mastered_concepts.remove(attempt.concept)
        state.evidence_state.resurfaced_concepts.add(attempt.concept)

    state.attempt_history.append({
        "concept": attempt.concept,
        "artifact_type": attempt.artifact_type,
        "aggregated": aggregated,
        "weak_dimensions": weak,
        "mastered": mastered,
        "evaluators": [r.evaluator_name for r in results],
    })

    state.artifacts.append({
        "concept": attempt.concept,
        "artifact_type": attempt.artifact_type,
        "artifact_name": attempt.metadata.get("artifact_name", f"{attempt.concept}.txt"),
    })


def run_demo_agentic_loop(concepts: list[str]) -> AgenticStudentState:
    state = AgenticStudentState()
    for concept in concepts:
        attempt = synthetic_attempt_for_concept(concept)
        integrate_attempt(state, attempt)
    return state
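The mastery decision inside `integrate_attempt` is simple: a dimension below 0.75 is weak, and a concept is mastered only when at least one dimension was scored and none are weak. The same rule in isolation, with the threshold copied from the code above:

```python
def mastery_decision(aggregated: dict[str, float], threshold: float = 0.75):
    """Same rule as integrate_attempt: weak dimensions score below the
    threshold, and mastery requires every scored dimension at or above it."""
    weak = [dim for dim, score in aggregated.items() if score < threshold]
    mastered = len(aggregated) > 0 and not weak
    return mastered, weak


print(mastery_decision({"correctness": 0.80, "explanation": 0.85}))  # (True, [])
print(mastery_decision({"correctness": 0.80, "critique": 0.50}))     # (False, ['critique'])
print(mastery_decision({}))                                          # (False, [])
```

The empty-dict case matters: with no evaluator evidence at all, a concept is deliberately not counted as mastered.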
@@ -1,5 +1,6 @@
from dataclasses import dataclass, field


@dataclass
class LearnerAttempt:
    concept: str
@@ -7,6 +8,7 @@ class LearnerAttempt:
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class EvaluatorResult:
    evaluator_name: str
@@ -14,59 +16,84 @@ class EvaluatorResult:
    passed: bool | None = None
    notes: str = ""


class RubricEvaluator:
    name = "rubric"

    def evaluate(self, attempt: LearnerAttempt):
        explanation = 0.85 if len(attempt.content.strip()) > 40 else 0.55
        correctness = 0.80 if "because" in attempt.content.lower() or "therefore" in attempt.content.lower() else 0.65
        return EvaluatorResult(
            self.name,
            {"correctness": correctness, "explanation": explanation},
            notes="Heuristic scaffold rubric score.",
        )


class CodeTestEvaluator:
    name = "code_test"

    def evaluate(self, attempt: LearnerAttempt):
        passed = "return" in attempt.content or "assert" in attempt.content
        score = 0.9 if passed else 0.35
        return EvaluatorResult(
            self.name,
            {"correctness": score, "project_execution": score},
            passed=passed,
            notes="Stub code/test evaluator.",
        )


class SymbolicRuleEvaluator:
    name = "symbolic_rule"

    def evaluate(self, attempt: LearnerAttempt):
        passed = "=" in attempt.content or "therefore" in attempt.content.lower()
        score = 0.88 if passed else 0.4
        return EvaluatorResult(
            self.name,
            {"correctness": score},
            passed=passed,
            notes="Stub symbolic evaluator.",
        )


class CritiqueEvaluator:
    name = "critique"

    def evaluate(self, attempt: LearnerAttempt):
        markers = ["assumption", "bias", "limitation", "weakness", "uncertain"]
        found = sum(m in attempt.content.lower() for m in markers)
        score = min(1.0, 0.35 + 0.15 * found)
        return EvaluatorResult(
            self.name,
            {"critique": score},
            notes="Stub critique evaluator.",
        )


class PortfolioEvaluator:
    name = "portfolio"

    def evaluate(self, attempt: LearnerAttempt):
        deliverable_count = int(attempt.metadata.get("deliverable_count", 1))
        score = min(1.0, 0.5 + 0.1 * deliverable_count)
        return EvaluatorResult(
            self.name,
            {"project_execution": score, "transfer": max(0.4, score - 0.1)},
            notes="Stub portfolio evaluator.",
        )


def run_pipeline(attempt, evaluators):
    return [e.evaluate(attempt) for e in evaluators]


def aggregate(results):
    totals = {}
    counts = {}
    for r in results:
        for dim, val in r.dimensions.items():
            totals[dim] = totals.get(dim, 0.0) + val
            counts[dim] = counts.get(dim, 0) + 1
    return {dim: totals[dim] / counts[dim] for dim in totals}
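`aggregate` takes a per-dimension mean across whichever evaluators reported that dimension, so evaluators that skip a dimension do not drag it down. The same arithmetic on standalone data, with a namedtuple `R` standing in for `EvaluatorResult`:

```python
from collections import namedtuple

R = namedtuple("R", "dimensions")  # stand-in for EvaluatorResult


def aggregate(results):
    # Per-dimension mean over only the evaluators that scored that dimension.
    totals, counts = {}, {}
    for r in results:
        for dim, val in r.dimensions.items():
            totals[dim] = totals.get(dim, 0.0) + val
            counts[dim] = counts.get(dim, 0) + 1
    return {dim: totals[dim] / counts[dim] for dim in totals}


# "correctness" is averaged over both evaluators; the other dimensions
# come from a single evaluator each and pass through unchanged.
scores = aggregate([
    R({"correctness": 0.80, "explanation": 0.85}),
    R({"correctness": 0.90, "project_execution": 0.90}),
])
print(scores)
```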
@@ -1,70 +1,49 @@
import argparse
from pathlib import Path

from .agentic_loop import run_demo_agentic_loop
from .mastery_ledger import (
    build_capability_profile,
    export_capability_profile_json,
    export_capability_report_markdown,
    export_artifact_manifest,
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Didactopus mastery ledger demo")
    parser.add_argument("--domain", default="Bayesian inference")
    parser.add_argument("--outdir", default="exports")
    return parser


def main() -> None:
    args = build_parser().parse_args()
    outdir = Path(args.outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    concepts = [
        "foundations-statistics::descriptive-statistics",
        "foundations-statistics::probability-basics",
        "bayes-extension::prior",
        "bayes-extension::posterior",
        "applied-inference::model-checking",
    ]
    state = run_demo_agentic_loop(concepts)
    profile = build_capability_profile(state, args.domain)

    json_path = outdir / "capability_profile.json"
    md_path = outdir / "capability_report.md"
    manifest_path = outdir / "artifact_manifest.json"

    export_capability_profile_json(profile, str(json_path))
    export_capability_report_markdown(profile, str(md_path))
    export_artifact_manifest(profile, str(manifest_path))

    print("== Didactopus Mastery Ledger Demo ==")
    print(f"Domain: {args.domain}")
    print(f"Mastered concepts: {len(profile.mastered_concepts)}")
    print(f"Artifacts: {len(profile.artifacts)}")
    print(f"Capability profile JSON: {json_path}")
    print(f"Capability report Markdown: {md_path}")
    print(f"Artifact manifest JSON: {manifest_path}")
@@ -0,0 +1,78 @@
from dataclasses import dataclass, field, asdict
from pathlib import Path
import json


@dataclass
class CapabilityProfile:
    learner_id: str
    display_name: str
    domain: str
    mastered_concepts: list[str] = field(default_factory=list)
    weak_dimensions_by_concept: dict[str, list[str]] = field(default_factory=dict)
    evaluator_summary_by_concept: dict[str, dict] = field(default_factory=dict)
    artifacts: list[dict] = field(default_factory=list)


def build_capability_profile(state, domain: str) -> CapabilityProfile:
    weak = {}
    summaries = {}
    for concept, summary in state.evidence_state.summary_by_concept.items():
        weak[concept] = list(summary.weak_dimensions)
        summaries[concept] = dict(summary.aggregated)
    return CapabilityProfile(
        learner_id=state.learner_id,
        display_name=state.display_name,
        domain=domain,
        mastered_concepts=sorted(state.mastered_concepts),
        weak_dimensions_by_concept=weak,
        evaluator_summary_by_concept=summaries,
        artifacts=list(state.artifacts),
    )


def export_capability_profile_json(profile: CapabilityProfile, path: str) -> None:
    Path(path).write_text(json.dumps(asdict(profile), indent=2), encoding="utf-8")


def export_capability_report_markdown(profile: CapabilityProfile, path: str) -> None:
    lines = [
        f"# Capability Profile: {profile.display_name}",
        "",
        f"- Learner ID: `{profile.learner_id}`",
        f"- Domain: `{profile.domain}`",
        "",
        "## Mastered Concepts",
    ]
    if profile.mastered_concepts:
        lines.extend([f"- {c}" for c in profile.mastered_concepts])
    else:
        lines.append("- none")
    lines.extend(["", "## Concept Summaries"])
    if profile.evaluator_summary_by_concept:
        for concept, dims in sorted(profile.evaluator_summary_by_concept.items()):
            lines.append(f"### {concept}")
            if dims:
                for dim, score in sorted(dims.items()):
                    lines.append(f"- {dim}: {score:.2f}")
            weak = profile.weak_dimensions_by_concept.get(concept, [])
            lines.append(f"- weak dimensions: {', '.join(weak) if weak else 'none'}")
            lines.append("")
    else:
        lines.append("- none")
    lines.extend(["## Artifacts"])
    if profile.artifacts:
        for art in profile.artifacts:
            lines.append(f"- {art['artifact_name']} ({art['artifact_type']}) for {art['concept']}")
    else:
        lines.append("- none")
    Path(path).write_text("\n".join(lines), encoding="utf-8")


def export_artifact_manifest(profile: CapabilityProfile, path: str) -> None:
    manifest = {
        "learner_id": profile.learner_id,
        "domain": profile.domain,
        "artifacts": profile.artifacts,
    }
    Path(path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
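Because the JSON export is a straight `dataclasses.asdict` dump, consumers get back plain dicts and lists with the dataclass field names as keys. A minimal round-trip, using a trimmed `MiniProfile` stand-in for `CapabilityProfile`:

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MiniProfile:  # trimmed stand-in for CapabilityProfile
    learner_id: str
    domain: str
    mastered_concepts: list[str] = field(default_factory=list)


profile = MiniProfile("demo-agent", "Bayesian inference", ["bayes-extension::prior"])
payload = json.dumps(asdict(profile), indent=2)   # asdict recurses into containers
restored = json.loads(payload)
print(restored["domain"])  # field names survive the round trip as dict keys
```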
@@ -0,0 +1,43 @@
from pathlib import Path
import json

from didactopus.agentic_loop import run_demo_agentic_loop
from didactopus.mastery_ledger import (
    build_capability_profile,
    export_capability_profile_json,
    export_capability_report_markdown,
    export_artifact_manifest,
)


def test_build_capability_profile() -> None:
    state = run_demo_agentic_loop([
        "foundations-statistics::descriptive-statistics",
        "bayes-extension::prior",
    ])
    profile = build_capability_profile(state, "Bayesian inference")
    assert profile.domain == "Bayesian inference"
    assert len(profile.artifacts) == 2


def test_exports(tmp_path: Path) -> None:
    state = run_demo_agentic_loop([
        "foundations-statistics::descriptive-statistics",
        "bayes-extension::prior",
    ])
    profile = build_capability_profile(state, "Bayesian inference")

    json_path = tmp_path / "capability_profile.json"
    md_path = tmp_path / "capability_report.md"
    manifest_path = tmp_path / "artifact_manifest.json"

    export_capability_profile_json(profile, str(json_path))
    export_capability_report_markdown(profile, str(md_path))
    export_artifact_manifest(profile, str(manifest_path))

    assert json_path.exists()
    assert md_path.exists()
    assert manifest_path.exists()

    data = json.loads(json_path.read_text(encoding="utf-8"))
    assert data["domain"] == "Bayesian inference"