Revised multilingual support with round-trip warnings; updated philosophy.

This commit is contained in:
welsberr 2026-03-17 20:36:38 -04:00
parent 34b60ac529
commit 58466bbf9f
29 changed files with 1208 additions and 137 deletions

View File

@@ -49,6 +49,8 @@ In practice, that means Didactopus tries to help with:
It explicitly tries not to become a silent answer surrogate.
The project is also being advanced with a future-compatibility constraint: avoid choices that assume abundant compute, fluent English, expert supervision, or only mature learners. That keeps the current roadmap moving while preserving eventual usefulness for more constrained and equity-sensitive educational settings.
## Who It Is For
Didactopus has several real audiences:
@@ -74,6 +76,13 @@ Current priorities are:
The live detailed roadmap is in:
- `docs/roadmap.md`
- `docs/multilingual-qa.md`
Didactopus can also generate a starter multilingual QA draft from a pack:
```bash
python -m didactopus.multilingual_qa_seed domain-packs/mit-ocw-information-entropy
```
## Start Here If You Just Want To Learn

View File

@@ -84,6 +84,12 @@ The arena currently writes:
- `arena_review_queue.json`
- `arena_report.md`
When a candidate sets a non-English `language`, the arena now also tracks a heuristic `multilingual_score` alongside the grounded behavior score. This is meant to catch obvious failures where a model ignores the requested output language or drops key grounded terms.
If the pack provides `multilingual_qa.yaml`, the arena also uses that spec to check required terms, required caveats, and forbidden confusions for the target language.
For non-English candidates, the arena now also records round-trip warnings by back-translating outputs into English and checking whether required source phrases remain recoverable.
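The round-trip layer described above can be pictured as a simple recoverability test over back-translated text. This is an illustrative sketch under assumed shapes, not the arena's actual implementation: the function name is hypothetical, and recoverability is approximated as a case-insensitive substring match, which fits the warning-oriented (not proof-oriented) intent.

```python
def round_trip_warnings(back_translated_en: str, required_phrases: list[str]) -> list[str]:
    """Warn for each required source phrase that is no longer recoverable
    after back-translating a candidate output into English.

    Recoverability here is a case-insensitive substring match; anything
    subtler (paraphrase, reordering) would need a stronger comparison.
    """
    haystack = back_translated_en.lower()
    return [
        f"Round-trip translation did not preserve source phrase '{phrase}'."
        for phrase in required_phrases
        if phrase.lower() not in haystack
    ]
```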
## Human Review Position
The LLM review summary should be treated as initial triage support only.

View File

@@ -26,6 +26,7 @@ The HTML output is meant to be screen-reader-friendly and keyboard-friendly:
- semantic headings
- reading-order sections for study plan, conversation, and evaluation
- grounded source fragments rendered as ordinary text instead of only visual diagrams
- deterministic learner-facing labels localized for supported output languages
The plain-text output is a linearized learner-session transcript that is suitable for:

View File

@@ -89,6 +89,15 @@ The current heuristic scoring asks whether each role does the right kind of work
This is deliberately narrower than a general-purpose benchmark. Didactopus cares about trustworthy learner guidance, not maximal generic fluency.
When `--language` is set to a non-English value, the benchmark now also applies a heuristic multilingual check:
- does the response appear to actually be in the target language?
- does it still preserve key grounded concept terms and caveats?
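A minimal version of the language-alignment part of that check can be sketched with stopword counting. The stopword sets, threshold, and function name below are assumptions for illustration, not the benchmark's actual heuristic:

```python
# Tiny illustrative stopword sets; a real heuristic would use larger lists
# and possibly a proper language-identification library.
STOPWORDS = {
    "es": {"el", "la", "los", "las", "de", "que", "es", "un", "una", "no"},
    "fr": {"le", "la", "les", "des", "de", "que", "est", "un", "une", "pas"},
}

def appears_in_language(text: str, language: str, threshold: float = 0.05) -> bool:
    """Return True when the share of target-language stopwords among the
    response tokens clears a small threshold."""
    tokens = [t.strip(".,;:!?()").lower() for t in text.split()]
    if not tokens or language not in STOPWORDS:
        return False
    hits = sum(1 for t in tokens if t in STOPWORDS[language])
    return hits / len(tokens) >= threshold
```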
If the pack provides `multilingual_qa.yaml`, the benchmark also applies per-pack preservation checks from that spec.
For non-English runs, the benchmark now also records a round-trip warning layer by back-translating role outputs into English and checking whether required source phrases are still recoverable. This is a warning-oriented signal, not a proof of correctness.
## Interpreting Ratings
- `adequate`

docs/multilingual-qa.md Normal file
View File

@@ -0,0 +1,80 @@
# Multilingual QA
Didactopus now supports an optional per-pack multilingual QA spec.
The goal is not to certify perfect translation quality. The goal is to make multilingual evaluation less dependent on vague fluency judgments by checking whether key terms, caveats, and forbidden confusions survive across languages.
## Spec File
Place this file in a pack directory:
- `multilingual_qa.yaml`
It is currently optional.
## Current Shape
```yaml
source_language: en
targets:
es:
required_terms:
- id: shannon-entropy
accepted:
- "entropía de shannon"
required_caveats:
- id: shannon-vs-thermo-not-identical
accepted:
- "no es idéntica"
forbidden_confusions:
- id: shannon-equals-thermodynamic-entropy
patterns:
- "es idéntica a la entropía termodinámica"
```
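After parsing the YAML (for example with PyYAML's `yaml.safe_load`), the spec can be normalized into per-language check lists. The function below is a hedged sketch of that step; its name and its defaulting behavior are assumptions, not Didactopus code:

```python
def index_spec(spec: dict) -> dict:
    """Normalize a parsed multilingual_qa.yaml mapping into per-language
    check lists, defaulting each check kind to an empty list."""
    targets = {}
    for lang, checks in (spec.get("targets") or {}).items():
        checks = checks or {}
        targets[lang] = {
            "required_terms": checks.get("required_terms") or [],
            "required_caveats": checks.get("required_caveats") or [],
            "forbidden_confusions": checks.get("forbidden_confusions") or [],
        }
    return {"source_language": spec.get("source_language", "en"), "targets": targets}
```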
## Starter Generation
Didactopus can now generate a draft starter spec for reviewer refinement:
```bash
python -m didactopus.multilingual_qa_seed domain-packs/mit-ocw-information-entropy \
--out domain-packs/mit-ocw-information-entropy/multilingual_qa.seed.yaml \
--languages es fr
```
The generated `multilingual_qa.seed.yaml` is a draft, not something to trust as-is. It is a reviewer aid that pulls:
- multi-word concept titles as draft required terms
- likely caveat candidates from grounded source fragments
- likely forbidden confusions derived from negated caveat language
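The concept-title pull can be pictured as slugging multi-word titles into draft entries. This sketch assumes the slugged-id convention visible in generated seed files; the function itself is hypothetical:

```python
import re

def draft_required_terms(concept_titles: list[str]) -> list[dict]:
    """Turn multi-word concept titles into draft required_terms entries,
    skipping single-word titles as too generic to be useful checks."""
    entries = []
    for title in concept_titles:
        if len(title.split()) < 2:
            continue  # single-word titles skipped
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        entries.append({"id": slug, "accepted": [title]})
    return entries
```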
## What It Checks
For a target language, the QA layer can check:
- required terms that should appear in acceptable translated or multilingual output
- required caveats that must survive explanation
- forbidden confusions that should trigger warnings
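Applied to one output string, the three check kinds reduce to substring scans. The sketch below mirrors the note wording seen in benchmark output for missing terms and caveats; the forbidden-confusion wording and the function name are assumptions:

```python
def preservation_notes(output: str, checks: dict, language: str) -> list[str]:
    """Scan one output for required terms, required caveats, and forbidden
    confusions, emitting a note per failed check."""
    text = output.lower()
    notes = []
    for term in checks.get("required_terms", []):
        if not any(a.lower() in text for a in term["accepted"]):
            notes.append(
                f"Missing required multilingual term '{term['id']}' for language '{language}'."
            )
    for caveat in checks.get("required_caveats", []):
        if not any(a.lower() in text for a in caveat["accepted"]):
            notes.append(
                f"Missing required multilingual caveat '{caveat['id']}' for language '{language}'."
            )
    for confusion in checks.get("forbidden_confusions", []):
        if any(p.lower() in text for p in confusion["patterns"]):
            notes.append(
                f"Forbidden confusion '{confusion['id']}' detected for language '{language}'."
            )
    return notes
```

One caveat of plain substring scans: a forbidden pattern like "es idéntica a la entropía termodinámica" also matches inside the negated caveat "no es idéntica a la entropía termodinámica", so pattern lists need careful wording.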
## Where It Is Used
This spec now feeds:
- the local model benchmark
- the Didactopus arena
Those tools still use heuristic scoring, but multilingual QA spec checks now contribute an explicit preservation signal.
## Why This Helps
This gives Didactopus a better layered multilingual evaluation model:
1. language-alignment heuristics
2. term and caveat preservation checks
3. round-trip warning checks on required phrases
4. arena comparison and LLM review support
5. human bilingual review for promoted or disputed outputs
## Current Limitation
This is still a lightweight preservation framework. It does not yet prove semantic equivalence across whole explanations. It is best treated as an early QA filter and promotion aid.

View File

@@ -190,6 +190,7 @@ Examples:
- Prefer role-adequate local models over chasing a single best model.
- Keep accessibility and low-cost deployment in scope from the start, not as cleanup work.
- Preserve provenance and license compliance as first-class constraints.
- Advance the current roadmap without assuming abundant compute, fluent English, expert supervision, or mature learners.
## Suggested Implementation Sequence

View File

@@ -0,0 +1,60 @@
source_language: en
generated_by: didactopus.multilingual_qa_seed
review_status: draft-seed
targets:
es:
required_terms: &id001
- id: mit-ocw-6-050j-information-and-entropy-course-home
accepted:
- MIT OCW 6.050J Information and Entropy Course Home
- id: information-and-entropy
accepted:
- Information and Entropy
- id: ultimate-limits-to-communication-and-computation
accepted:
- Ultimate Limits to Communication and Computation
- id: open-textbooks-problem-sets-and-programming-work
accepted:
- Open Textbooks, Problem Sets, and Programming Work
- id: mit-ocw-6-050j-information-and-entropy-syllabus
accepted:
- MIT OCW 6.050J Information and Entropy Syllabus
- id: prerequisites-and-mathematical-background
accepted:
- Prerequisites and Mathematical Background
- id: assessment-structure
accepted:
- Assessment Structure
- id: course-notes-and-reference-texts
accepted:
- Course Notes and Reference Texts
- id: independent-reasoning-and-careful-comparison
accepted:
- Independent Reasoning and Careful Comparison
- id: mit-ocw-6-050j-information-and-entropy-unit-sequence
accepted:
- MIT OCW 6.050J Information and Entropy Unit Sequence
- id: counting-and-probability
accepted:
- Counting and Probability
- id: shannon-entropy
accepted:
- Shannon Entropy
required_caveats: &id002
- id: thermodynamics-and-entropy
accepted:
- Objective Explain how thermodynamic entropy relates to, and differs from,
Shannon entropy. Exercise Compare the two entropy notions and identify what
is preserved across the analogy. The course uses entropy as a bridge concept
between communication theory and physics while insisting on careful interpretation.
forbidden_confusions: &id003
- id: thermodynamics-and-entropy-confusion
patterns:
- Objective Explain how thermodynamic entropy relates to, and is identical to,
Shannon entropy. Exercise Compare the two entropy notions and identify what
is preserved across the analogy. The course uses entropy as a bridge concept
between communication theory and physics while insisting on careful interpretation.
fr:
required_terms: *id001
required_caveats: *id002
forbidden_confusions: *id003

View File

@@ -0,0 +1,59 @@
source_language: en
targets:
es:
required_terms:
- id: shannon-entropy
accepted:
- "entropia"
- "entropía"
- "entropia de shannon"
- "entropía de shannon"
- id: channel-capacity
accepted:
- "capacidad del canal"
- "capacidad de canal"
- id: thermodynamic-entropy
accepted:
- "entropia termodinamica"
- "entropía termodinámica"
required_caveats:
- id: shannon-vs-thermo-not-identical
accepted:
- "no es identica"
- "no es idéntica"
- "no son identicas"
- "no son idénticas"
- "no equivale exactamente"
forbidden_confusions:
- id: shannon-equals-thermodynamic-entropy
patterns:
- "es identica a la entropia termodinamica"
- "es idéntica a la entropía termodinámica"
- "son identicas"
- "son idénticas"
fr:
required_terms:
- id: shannon-entropy
accepted:
- "entropie"
- "entropie de shannon"
- id: channel-capacity
accepted:
- "capacite du canal"
- "capacité du canal"
- id: thermodynamic-entropy
accepted:
- "entropie thermodynamique"
required_caveats:
- id: shannon-vs-thermo-not-identical
accepted:
- "n'est pas identique"
- "ne sont pas identiques"
- "n'est pas equivalente"
- "n'est pas équivalente"
forbidden_confusions:
- id: shannon-equals-thermodynamic-entropy
patterns:
- "est identique a l'entropie thermodynamique"
- "est identique à l'entropie thermodynamique"
- "sont identiques"

View File

@@ -3,9 +3,9 @@
- Candidates: 3
## Rankings
- - `stub-baseline` via `stub` / prompt variant `baseline`: borderline (0.667), language `en`
+ - `stub-baseline` via `stub` / prompt variant `baseline`: borderline (0.733), language `en`
- - `stub-strict-grounding` via `stub` / prompt variant `strict_grounding`: borderline (0.667), language `es`
+ - `stub-strict-grounding` via `stub` / prompt variant `strict_grounding`: inadequate (0.547), language `es`
- - `stub-trust-preserving` via `stub` / prompt variant `trust_preserving`: borderline (0.667), language `fr`
+ - `stub-trust-preserving` via `stub` / prompt variant `trust_preserving`: inadequate (0.547), language `fr`
## Human Review Queue
- `stub-baseline`: needs_human_review=True, weak_roles=['mentor', 'evaluator']

View File

@@ -10,7 +10,7 @@
"prompt_variant": "baseline",
"language": "en",
"provider": "stub",
- "overall_score": 0.667,
+ "overall_score": 0.733,
"overall_rating": "borderline",
"role_results": [
{
@@ -19,9 +19,11 @@
"model_name": "local-demo",
"prompt_variant": "baseline",
"language": "en",
- "latency_ms": 0.027,
+ "latency_ms": 0.021,
- "adequacy_score": 0.65,
+ "adequacy_score": 0.72,
"adequacy_rating": "borderline",
"grounded_score": 0.65,
"multilingual_score": 1.0,
"response_preview": "[stubbed-response] [mentor] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"notes": [
"Did not ask a focused learner question."
@@ -33,9 +35,11 @@
"model_name": "local-demo",
"prompt_variant": "baseline",
"language": "en",
- "latency_ms": 0.006,
+ "latency_ms": 0.005,
"adequacy_score": 1.0,
"adequacy_rating": "adequate",
"grounded_score": 1.0,
"multilingual_score": 1.0,
"response_preview": "[stubbed-response] [practice] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"notes": []
},
@@ -45,9 +49,11 @@
"model_name": "local-demo",
"prompt_variant": "baseline",
"language": "en",
- "latency_ms": 0.005,
+ "latency_ms": 0.004,
- "adequacy_score": 0.35,
+ "adequacy_score": 0.48,
"adequacy_rating": "inadequate",
"grounded_score": 0.35,
"multilingual_score": 1.0,
"response_preview": "[stubbed-response] [evaluator] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"notes": [
"Did not acknowledge learner strengths.",
@@ -62,8 +68,8 @@
"prompt_variant": "strict_grounding",
"language": "es",
"provider": "stub",
- "overall_score": 0.667,
+ "overall_score": 0.547,
- "overall_rating": "borderline",
+ "overall_rating": "inadequate",
"role_results": [
{
"role": "mentor",
@@ -71,12 +77,20 @@
"model_name": "local-demo",
"prompt_variant": "strict_grounding",
"language": "es",
- "latency_ms": 0.019,
+ "latency_ms": 0.028,
- "adequacy_score": 0.65,
+ "adequacy_score": 0.52,
- "adequacy_rating": "borderline",
+ "adequacy_rating": "inadequate",
"grounded_score": 0.65,
"multilingual_score": 0.0,
"response_preview": "[stubbed-response] [mentor] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"notes": [
- "Did not ask a focused learner question."
+ "Did not ask a focused learner question.",
"Response does not appear to be in Spanish.",
"Missing required multilingual term 'shannon-entropy' for language 'es'.",
"Missing required multilingual term 'channel-capacity' for language 'es'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'es'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'.",
"Did not visibly preserve a key grounded concept term in multilingual output."
]
},
{
@@ -85,11 +99,19 @@
"model_name": "local-demo",
"prompt_variant": "strict_grounding",
"language": "es",
- "latency_ms": 0.005,
+ "latency_ms": 0.006,
- "adequacy_score": 1.0,
+ "adequacy_score": 0.82,
"adequacy_rating": "adequate",
"grounded_score": 1.0,
"multilingual_score": 0.1,
"response_preview": "[stubbed-response] [practice] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
- "notes": []
+ "notes": [
"Response does not appear to be in Spanish.",
"Missing required multilingual term 'shannon-entropy' for language 'es'.",
"Missing required multilingual term 'channel-capacity' for language 'es'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'es'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'."
]
},
{
"role": "evaluator",
@@ -97,13 +119,20 @@
"model_name": "local-demo",
"prompt_variant": "strict_grounding",
"language": "es",
- "latency_ms": 0.004,
+ "latency_ms": 0.006,
- "adequacy_score": 0.35,
+ "adequacy_score": 0.3,
"adequacy_rating": "inadequate",
"grounded_score": 0.35,
"multilingual_score": 0.1,
"response_preview": "[stubbed-response] [evaluator] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"notes": [
"Did not acknowledge learner strengths.",
- "Did not provide a concrete next step."
+ "Did not provide a concrete next step.",
"Response does not appear to be in Spanish.",
"Missing required multilingual term 'shannon-entropy' for language 'es'.",
"Missing required multilingual term 'channel-capacity' for language 'es'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'es'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'."
]
}
]
@@ -114,8 +143,8 @@
"prompt_variant": "trust_preserving",
"language": "fr",
"provider": "stub",
- "overall_score": 0.667,
+ "overall_score": 0.547,
- "overall_rating": "borderline",
+ "overall_rating": "inadequate",
"role_results": [
{
"role": "mentor",
@@ -123,12 +152,20 @@
"model_name": "local-demo",
"prompt_variant": "trust_preserving",
"language": "fr",
- "latency_ms": 0.025,
+ "latency_ms": 0.024,
- "adequacy_score": 0.65,
+ "adequacy_score": 0.52,
- "adequacy_rating": "borderline",
+ "adequacy_rating": "inadequate",
"grounded_score": 0.65,
"multilingual_score": 0.0,
"response_preview": "[stubbed-response] [mentor] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"notes": [
- "Did not ask a focused learner question."
+ "Did not ask a focused learner question.",
"Response does not appear to be in French.",
"Missing required multilingual term 'shannon-entropy' for language 'fr'.",
"Missing required multilingual term 'channel-capacity' for language 'fr'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'fr'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'fr'.",
"Did not visibly preserve a key grounded concept term in multilingual output."
]
},
{
@@ -137,11 +174,19 @@
"model_name": "local-demo",
"prompt_variant": "trust_preserving",
"language": "fr",
- "latency_ms": 0.005,
+ "latency_ms": 0.006,
- "adequacy_score": 1.0,
+ "adequacy_score": 0.82,
"adequacy_rating": "adequate",
"grounded_score": 1.0,
"multilingual_score": 0.1,
"response_preview": "[stubbed-response] [practice] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
- "notes": []
+ "notes": [
"Response does not appear to be in French.",
"Missing required multilingual term 'shannon-entropy' for language 'fr'.",
"Missing required multilingual term 'channel-capacity' for language 'fr'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'fr'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'fr'."
]
},
{
"role": "evaluator",
@@ -150,12 +195,19 @@
"prompt_variant": "trust_preserving",
"language": "fr",
"latency_ms": 0.005,
- "adequacy_score": 0.35,
+ "adequacy_score": 0.3,
"adequacy_rating": "inadequate",
"grounded_score": 0.35,
"multilingual_score": 0.1,
"response_preview": "[stubbed-response] [evaluator] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"notes": [
"Did not acknowledge learner strengths.",
- "Did not provide a concrete next step."
+ "Did not provide a concrete next step.",
"Response does not appear to be in French.",
"Missing required multilingual term 'shannon-entropy' for language 'fr'.",
"Missing required multilingual term 'channel-capacity' for language 'fr'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'fr'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'fr'."
]
}
]
@@ -165,7 +217,7 @@
{
"candidate_name": "stub-baseline",
"overall_rating": "borderline",
- "overall_score": 0.667,
+ "overall_score": 0.733,
"needs_human_review": true,
"weak_roles": [
"mentor",
@@ -174,8 +226,8 @@
},
{
"candidate_name": "stub-strict-grounding",
- "overall_rating": "borderline",
+ "overall_rating": "inadequate",
- "overall_score": 0.667,
+ "overall_score": 0.547,
"needs_human_review": true,
"weak_roles": [
"mentor",
@@ -184,8 +236,8 @@
},
{
"candidate_name": "stub-trust-preserving",
- "overall_rating": "borderline",
+ "overall_rating": "inadequate",
- "overall_score": 0.667,
+ "overall_score": 0.547,
"needs_human_review": true,
"weak_roles": [
"mentor",

View File

@@ -2,7 +2,7 @@
{
"candidate_name": "stub-baseline",
"overall_rating": "borderline",
- "overall_score": 0.667,
+ "overall_score": 0.733,
"needs_human_review": true,
"weak_roles": [
"mentor",
@@ -11,8 +11,8 @@
},
{
"candidate_name": "stub-strict-grounding",
- "overall_rating": "borderline",
+ "overall_rating": "inadequate",
- "overall_score": 0.667,
+ "overall_score": 0.547,
"needs_human_review": true,
"weak_roles": [
"mentor",
@@ -21,8 +21,8 @@
},
{
"candidate_name": "stub-trust-preserving",
- "overall_rating": "borderline",
+ "overall_rating": "inadequate",
- "overall_score": 0.667,
+ "overall_score": 0.547,
"needs_human_review": true,
"weak_roles": [
"mentor",

View File

@@ -0,0 +1,152 @@
{
"benchmark": {
"name": "didactopus-local-model-adequacy",
"task_family": "graph-grounded-mentor-loop",
"provider": "stub",
"hardware_profile": {
"profile_name": "unspecified-local",
"cpu": "unknown",
"ram_gb": null,
"notes": ""
}
},
"context": {
"skill_name": "ocw-information-entropy-agent",
"study_plan_task": "Help a learner connect Shannon entropy, channel capacity, and thermodynamic entropy.",
"primary_concept": "Independent Reasoning and Careful Comparison",
"secondary_concept": "Thermodynamics and Entropy",
"source_language": "en",
"output_language": "es"
},
"role_results": [
{
"role": "mentor",
"provider": "stub",
"model_name": "local-demo",
"latency_ms": 0.025,
"response_preview": "[stubbed-response] [mentor] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"adequacy_score": 0.52,
"adequacy_rating": "inadequate",
"grounded_score": 0.65,
"multilingual_score": 0.0,
"round_trip": {
"warnings": [
"Round-trip translation did not preserve source phrase 'entropia'.",
"Round-trip translation did not preserve source phrase 'capacidad del canal'.",
"Round-trip translation did not preserve source phrase 'entropia termodinamica'.",
"Round-trip translation did not preserve source phrase 'no es identica'."
],
"summary": {
"source_phrase_count": 4,
"round_trip_warning_count": 4,
"drifted_phrases": [
"entropia",
"capacidad del canal",
"entropia termodinamica",
"no es identica"
]
}
},
"notes": [
"Did not ask a focused learner question.",
"Response does not appear to be in Spanish.",
"Missing required multilingual term 'shannon-entropy' for language 'es'.",
"Missing required multilingual term 'channel-capacity' for language 'es'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'es'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'.",
"Did not visibly preserve a key grounded concept term in multilingual output.",
"Round-trip translation did not preserve source phrase 'entropia'.",
"Round-trip translation did not preserve source phrase 'capacidad del canal'.",
"Round-trip translation did not preserve source phrase 'entropia termodinamica'.",
"Round-trip translation did not preserve source phrase 'no es identica'."
]
},
{
"role": "practice",
"provider": "stub",
"model_name": "local-demo",
"latency_ms": 0.004,
"response_preview": "[stubbed-response] [practice] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"adequacy_score": 0.82,
"adequacy_rating": "adequate",
"grounded_score": 1.0,
"multilingual_score": 0.1,
"round_trip": {
"warnings": [
"Round-trip translation did not preserve source phrase 'entropia'.",
"Round-trip translation did not preserve source phrase 'capacidad del canal'.",
"Round-trip translation did not preserve source phrase 'entropia termodinamica'.",
"Round-trip translation did not preserve source phrase 'no es identica'."
],
"summary": {
"source_phrase_count": 4,
"round_trip_warning_count": 4,
"drifted_phrases": [
"entropia",
"capacidad del canal",
"entropia termodinamica",
"no es identica"
]
}
},
"notes": [
"Response does not appear to be in Spanish.",
"Missing required multilingual term 'shannon-entropy' for language 'es'.",
"Missing required multilingual term 'channel-capacity' for language 'es'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'es'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'.",
"Round-trip translation did not preserve source phrase 'entropia'.",
"Round-trip translation did not preserve source phrase 'capacidad del canal'.",
"Round-trip translation did not preserve source phrase 'entropia termodinamica'.",
"Round-trip translation did not preserve source phrase 'no es identica'."
]
},
{
"role": "evaluator",
"provider": "stub",
"model_name": "local-demo",
"latency_ms": 0.004,
"response_preview": "[stubbed-response] [evaluator] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons",
"adequacy_score": 0.3,
"adequacy_rating": "inadequate",
"grounded_score": 0.35,
"multilingual_score": 0.1,
"round_trip": {
"warnings": [
"Round-trip translation did not preserve source phrase 'entropia'.",
"Round-trip translation did not preserve source phrase 'capacidad del canal'.",
"Round-trip translation did not preserve source phrase 'entropia termodinamica'.",
"Round-trip translation did not preserve source phrase 'no es identica'."
],
"summary": {
"source_phrase_count": 4,
"round_trip_warning_count": 4,
"drifted_phrases": [
"entropia",
"capacidad del canal",
"entropia termodinamica",
"no es identica"
]
}
},
"notes": [
"Did not acknowledge learner strengths.",
"Did not provide a concrete next step.",
"Response does not appear to be in Spanish.",
"Missing required multilingual term 'shannon-entropy' for language 'es'.",
"Missing required multilingual term 'channel-capacity' for language 'es'.",
"Missing required multilingual term 'thermodynamic-entropy' for language 'es'.",
"Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'.",
"Round-trip translation did not preserve source phrase 'entropia'.",
"Round-trip translation did not preserve source phrase 'capacidad del canal'.",
"Round-trip translation did not preserve source phrase 'entropia termodinamica'.",
"Round-trip translation did not preserve source phrase 'no es identica'."
]
}
],
"summary": {
"overall_adequacy_score": 0.547,
"overall_adequacy_rating": "inadequate",
"recommended_use": "Not recommended for learner-facing local deployment."
}
}

View File

@@ -0,0 +1,16 @@
# Didactopus Local Model Benchmark
- Provider: `stub`
- Hardware profile: `unspecified-local`
- Primary concept: Independent Reasoning and Careful Comparison
- Secondary concept: Thermodynamics and Entropy
- Overall adequacy: inadequate (0.547)
- Recommended use: Not recommended for learner-facing local deployment.
## Role Results
- `mentor` via `local-demo`: inadequate (0.52), latency 0.025 ms
Notes: Did not ask a focused learner question.; Response does not appear to be in Spanish.; Missing required multilingual term 'shannon-entropy' for language 'es'.; Missing required multilingual term 'channel-capacity' for language 'es'.; Missing required multilingual term 'thermodynamic-entropy' for language 'es'.; Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'.; Did not visibly preserve a key grounded concept term in multilingual output.; Round-trip translation did not preserve source phrase 'entropia'.; Round-trip translation did not preserve source phrase 'capacidad del canal'.; Round-trip translation did not preserve source phrase 'entropia termodinamica'.; Round-trip translation did not preserve source phrase 'no es identica'.
- `practice` via `local-demo`: adequate (0.82), latency 0.004 ms
Notes: Response does not appear to be in Spanish.; Missing required multilingual term 'shannon-entropy' for language 'es'.; Missing required multilingual term 'channel-capacity' for language 'es'.; Missing required multilingual term 'thermodynamic-entropy' for language 'es'.; Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'.; Round-trip translation did not preserve source phrase 'entropia'.; Round-trip translation did not preserve source phrase 'capacidad del canal'.; Round-trip translation did not preserve source phrase 'entropia termodinamica'.; Round-trip translation did not preserve source phrase 'no es identica'.
- `evaluator` via `local-demo`: inadequate (0.3), latency 0.004 ms
Notes: Did not acknowledge learner strengths.; Did not provide a concrete next step.; Response does not appear to be in Spanish.; Missing required multilingual term 'shannon-entropy' for language 'es'.; Missing required multilingual term 'channel-capacity' for language 'es'.; Missing required multilingual term 'thermodynamic-entropy' for language 'es'.; Missing required multilingual caveat 'shannon-vs-thermo-not-identical' for language 'es'.; Round-trip translation did not preserve source phrase 'entropia'.; Round-trip translation did not preserve source phrase 'capacidad del canal'.; Round-trip translation did not preserve source phrase 'entropia termodinamica'.; Round-trip translation did not preserve source phrase 'no es identica'.

View File

@ -1,9 +1,9 @@
<!doctype html> <!doctype html>
<html lang="en"> <html lang="es">
<head> <head>
<meta charset="utf-8"> <meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="viewport" content="width=device-width, initial-scale=1">
<title>Didactopus Learner Session</title> <title>Sesion de aprendizaje de Didactopus</title>
<style> <style>
:root { color-scheme: light; --bg: #f7f4ed; --panel: #fffdf8; --ink: #1e2b31; --muted: #53656d; --line: #d3c8b7; --accent: #155e63; } :root { color-scheme: light; --bg: #f7f4ed; --panel: #fffdf8; --ink: #1e2b31; --muted: #53656d; --line: #d3c8b7; --accent: #155e63; }
body { margin: 0; font-family: Georgia, 'Times New Roman', serif; background: var(--bg); color: var(--ink); line-height: 1.55; } body { margin: 0; font-family: Georgia, 'Times New Roman', serif; background: var(--bg); color: var(--ink); line-height: 1.55; }
@ -21,24 +21,24 @@ ol, ul { padding-left: 22px; }
</style> </style>
</head> </head>
<body> <body>
<a class="skip" href="#session-main">Skip to learner session</a> <a class="skip" href="#session-main">Saltar a la sesion de aprendizaje</a>
<main id="session-main" aria-label="Didactopus learner session"> <main id="session-main" aria-label="Didactopus learner session">
<section aria-labelledby="session-title"> <section aria-labelledby="session-title">
<h1 id="session-title">Didactopus Learner Session</h1> <h1 id="session-title">Sesion de aprendizaje de Didactopus</h1>
<p class="sr-note">This page is structured for keyboard and screen-reader use. It presents the learner goal, study plan, grounded source fragments, and conversation turns in reading order.</p> <p class="sr-note">Esta pagina esta estructurada para uso con teclado y lector de pantalla. Presenta el objetivo del aprendiz, el plan de estudio, los fragmentos de fundamento y los turnos de conversacion en orden de lectura.</p>
<p><strong>Learner goal:</strong> Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.</p> <p><strong>Objetivo del aprendiz:</strong> Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.</p>
<p><strong>Source language:</strong> en</p> <p><strong>Idioma de origen:</strong> en</p>
<p><strong>Output language:</strong> es</p> <p><strong>Idioma de salida:</strong> es</p>
</section> </section>
<section aria-labelledby="study-plan-title"> <section aria-labelledby="study-plan-title">
<h2 id="study-plan-title">Study Plan</h2> <h2 id="study-plan-title">Plan de estudio</h2>
<ol> <ol>
<li> <li>
<h3>Independent Reasoning and Careful Comparison</h3> <h3>Independent Reasoning and Careful Comparison</h3>
<p><strong>Status:</strong> mastered</p> <p><strong>Estado:</strong> mastered</p>
<p><strong>Prerequisites:</strong> Course Notes and Reference Texts</p> <p><strong>Prerrequisitos:</strong> Course Notes and Reference Texts</p>
<p><strong>Supporting lessons:</strong> Independent Reasoning and Careful Comparison</p> <p><strong>Lecciones de apoyo:</strong> Independent Reasoning and Careful Comparison</p>
<p><strong>Grounding fragments:</strong></p> <p><strong>Fragmentos de fundamento:</strong></p>
<ul> <ul>
<li><div class="fragment"><strong>Independent Reasoning and Careful Comparison</strong> (lesson_body)<br>- Objective: Explain why the course requires precise comparison of related but non-identical concepts. <li><div class="fragment"><strong>Independent Reasoning and Careful Comparison</strong> (lesson_body)<br>- Objective: Explain why the course requires precise comparison of related but non-identical concepts.
- Exercise: Write a short note distinguishing Shannon entropy, channel capacity, and thermodynamic entropy. - Exercise: Write a short note distinguishing Shannon entropy, channel capacity, and thermodynamic entropy.
@ -48,10 +48,10 @@ The syllabus framing implies a style of work where analogy is useful but dangero
</li> </li>
<li> <li>
<h3>Thermodynamics and Entropy</h3> <h3>Thermodynamics and Entropy</h3>
<p><strong>Status:</strong> mastered</p> <p><strong>Estado:</strong> mastered</p>
<p><strong>Prerequisites:</strong> Cryptography and Information Hiding</p> <p><strong>Prerrequisitos:</strong> Cryptography and Information Hiding</p>
<p><strong>Supporting lessons:</strong> Thermodynamics and Entropy</p> <p><strong>Lecciones de apoyo:</strong> Thermodynamics and Entropy</p>
<p><strong>Grounding fragments:</strong></p> <p><strong>Fragmentos de fundamento:</strong></p>
<ul> <ul>
<li><div class="fragment"><strong>Thermodynamics and Entropy</strong> (lesson_body)<br>- Objective: Explain how thermodynamic entropy relates to, and differs from, Shannon entropy. <li><div class="fragment"><strong>Thermodynamics and Entropy</strong> (lesson_body)<br>- Objective: Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
- Exercise: Compare the two entropy notions and identify what is preserved across the analogy. - Exercise: Compare the two entropy notions and identify what is preserved across the analogy.
@ -61,10 +61,10 @@ The course uses entropy as a bridge concept between communication theory and phy
</li> </li>
<li> <li>
<h3>Shannon Entropy</h3> <h3>Shannon Entropy</h3>
<p><strong>Status:</strong> mastered</p> <p><strong>Estado:</strong> mastered</p>
<p><strong>Prerequisites:</strong> Counting and Probability</p> <p><strong>Prerrequisitos:</strong> Counting and Probability</p>
<p><strong>Supporting lessons:</strong> Shannon Entropy</p> <p><strong>Lecciones de apoyo:</strong> Shannon Entropy</p>
<p><strong>Grounding fragments:</strong></p> <p><strong>Fragmentos de fundamento:</strong></p>
<ul> <ul>
<li><div class="fragment"><strong>Shannon Entropy</strong> (lesson_body)<br>- Objective: Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources. <li><div class="fragment"><strong>Shannon Entropy</strong> (lesson_body)<br>- Objective: Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.
- Exercise: Compute the entropy of a Bernoulli source and interpret the result. - Exercise: Compute the entropy of a Bernoulli source and interpret the result.
@ -75,7 +75,7 @@ The course then introduces entropy as a quantitative measure of uncertainty for
</ol> </ol>
</section> </section>
<section aria-labelledby="conversation-title"> <section aria-labelledby="conversation-title">
<h2 id="conversation-title">Conversation</h2> <h2 id="conversation-title">Conversacion</h2>
<article class="turn" aria-label="Conversation turn"> <article class="turn" aria-label="Conversation turn">
<h3>Learner Goal</h3> <h3>Learner Goal</h3>
<p class="meta">Role: user</p> <p class="meta">Role: user</p>
@ -108,10 +108,10 @@ The course then introduces entropy as a quantitative measure of uncertainty for
</article> </article>
</section> </section>
<section aria-labelledby="evaluation-title"> <section aria-labelledby="evaluation-title">
<h2 id="evaluation-title">Evaluation Summary</h2> <h2 id="evaluation-title">Resumen de evaluacion</h2>
<p><strong>Verdict:</strong> needs_revision</p> <p><strong>Veredicto:</strong> needs_revision</p>
<p><strong>Aggregated dimensions:</strong> {&quot;correctness&quot;: 0.6000000000000001, &quot;critique&quot;: 0.6499999999999999, &quot;explanation&quot;: 0.85}</p> <p><strong>Dimensiones agregadas:</strong> {&quot;correctness&quot;: 0.6000000000000001, &quot;critique&quot;: 0.6499999999999999, &quot;explanation&quot;: 0.85}</p>
<p><strong>Follow-up:</strong> Rework the answer so it states the equality/relationship explicitly and explains why it matters.</p> <p><strong>Siguiente paso:</strong> Rework the answer so it states the equality/relationship explicitly and explains why it matters.</p>
</section> </section>
</main> </main>
</body> </body>

View File

@ -1,36 +1,36 @@
Didactopus Learner Session Sesion de aprendizaje de Didactopus
Learner goal: Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy. Objetivo del aprendiz: Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.
Source language: en Idioma de origen: en
Output language: es Idioma de salida: es
Study plan: Plan de estudio:
1. Independent Reasoning and Careful Comparison 1. Independent Reasoning and Careful Comparison
Status: mastered Estado: mastered
Prerequisites: Course Notes and Reference Texts Prerrequisitos: Course Notes and Reference Texts
Supporting lessons: Independent Reasoning and Careful Comparison Lecciones de apoyo: Independent Reasoning and Careful Comparison
Source fragment (lesson_body): - Objective: Explain why the course requires precise comparison of related but non-identical concepts. Fragmento de fuente (lesson_body): - Objective: Explain why the course requires precise comparison of related but non-identical concepts.
- Exercise: Write a short note distinguishing Shannon entropy, channel capacity, and thermodynamic entropy. - Exercise: Write a short note distinguishing Shannon entropy, channel capacity, and thermodynamic entropy.
The syllabus framing implies a style of work where analogy is useful but dangerous when used loosely. Learners must compare models carefully, state assumptions, and notice where similar mathematics does not imply identical interpretation. The syllabus framing implies a style of work where analogy is useful but dangerous when used loosely. Learners must compare models carefully, state assumptions, and notice where similar mathematics does not imply identical interpretation.
Source fragment (objective): Explain why the course requires precise comparison of related but non-identical concepts. Fragmento de fuente (objective): Explain why the course requires precise comparison of related but non-identical concepts.
2. Thermodynamics and Entropy 2. Thermodynamics and Entropy
Status: mastered Estado: mastered
Prerequisites: Cryptography and Information Hiding Prerrequisitos: Cryptography and Information Hiding
Supporting lessons: Thermodynamics and Entropy Lecciones de apoyo: Thermodynamics and Entropy
Source fragment (lesson_body): - Objective: Explain how thermodynamic entropy relates to, and differs from, Shannon entropy. Fragmento de fuente (lesson_body): - Objective: Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
- Exercise: Compare the two entropy notions and identify what is preserved across the analogy. - Exercise: Compare the two entropy notions and identify what is preserved across the analogy.
The course uses entropy as a bridge concept between communication theory and physics while insisting on careful interpretation. The course uses entropy as a bridge concept between communication theory and physics while insisting on careful interpretation.
Source fragment (objective): Explain how thermodynamic entropy relates to, and differs from, Shannon entropy. Fragmento de fuente (objective): Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
3. Shannon Entropy 3. Shannon Entropy
Status: mastered Estado: mastered
Prerequisites: Counting and Probability Prerrequisitos: Counting and Probability
Supporting lessons: Shannon Entropy Lecciones de apoyo: Shannon Entropy
Source fragment (lesson_body): - Objective: Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources. Fragmento de fuente (lesson_body): - Objective: Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.
- Exercise: Compute the entropy of a Bernoulli source and interpret the result. - Exercise: Compute the entropy of a Bernoulli source and interpret the result.
The course then introduces entropy as a quantitative measure of uncertainty for a source model and uses it to reason about representation cost and surprise. The course then introduces entropy as a quantitative measure of uncertainty for a source model and uses it to reason about representation cost and surprise.
Source fragment (objective): Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources. Fragmento de fuente (objective): Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.
Conversation: Conversacion:
Learner Goal: Learner Goal:
Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy. Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.
@ -49,7 +49,7 @@ Didactopus Evaluator:
Didactopus Mentor: Didactopus Mentor:
[stubbed-response] [mentor] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons [stubbed-response] [mentor] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons
Evaluation summary: Resumen de evaluacion:
Verdict: needs_revision Veredicto: needs_revision
Aggregated dimensions: {"correctness": 0.6000000000000001, "critique": 0.6499999999999999, "explanation": 0.85} Dimensiones agregadas: {"correctness": 0.6000000000000001, "critique": 0.6499999999999999, "explanation": 0.85}
Follow-up: Rework the answer so it states the equality/relationship explicitly and explains why it matters. Siguiente paso: Rework the answer so it states the equality/relationship explicitly and explains why it matters.

View File

@ -0,0 +1,59 @@
source_language: en
targets:
es:
required_terms:
- id: shannon-entropy
accepted:
- "entropia"
- "entropía"
- "entropia de shannon"
- "entropía de shannon"
- id: channel-capacity
accepted:
- "capacidad del canal"
- "capacidad de canal"
- id: thermodynamic-entropy
accepted:
- "entropia termodinamica"
- "entropía termodinámica"
required_caveats:
- id: shannon-vs-thermo-not-identical
accepted:
- "no es identica"
- "no es idéntica"
- "no son identicas"
- "no son idénticas"
- "no equivale exactamente"
forbidden_confusions:
- id: shannon-equals-thermodynamic-entropy
patterns:
- "es identica a la entropia termodinamica"
- "es idéntica a la entropía termodinámica"
- "son identicas"
- "son idénticas"
fr:
required_terms:
- id: shannon-entropy
accepted:
- "entropie"
- "entropie de shannon"
- id: channel-capacity
accepted:
- "capacite du canal"
- "capacité du canal"
- id: thermodynamic-entropy
accepted:
- "entropie thermodynamique"
required_caveats:
- id: shannon-vs-thermo-not-identical
accepted:
- "n'est pas identique"
- "ne sont pas identiques"
- "n'est pas equivalente"
- "n'est pas équivalente"
forbidden_confusions:
- id: shannon-equals-thermodynamic-entropy
patterns:
- "est identique a l'entropie thermodynamique"
- "est identique à l'entropie thermodynamique"
- "sont identiques"
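The spec above can be exercised with a small standalone checker. This is a hedged sketch only: the real `multilingual_qa_for_text` helper in `didactopus.multilingual_qa` may differ in matching details, and the `check_against_spec` name and inline spec below are hypothetical illustrations of the same `required_terms` / `required_caveats` / `forbidden_confusions` structure:

```python
# Hypothetical, minimal re-implementation of the QA-spec check; the real
# multilingual_qa_for_text helper in didactopus may differ in detail.
def check_against_spec(spec: dict, language: str, text: str) -> list[str]:
    target = spec.get("targets", {}).get(language, {})
    lowered = text.lower()
    warnings: list[str] = []
    for term in target.get("required_terms", []):
        # A term counts as present if any accepted surface form appears.
        if not any(form in lowered for form in term["accepted"]):
            warnings.append(f"missing required term '{term['id']}'")
    for caveat in target.get("required_caveats", []):
        if not any(form in lowered for form in caveat["accepted"]):
            warnings.append(f"missing required caveat '{caveat['id']}'")
    for confusion in target.get("forbidden_confusions", []):
        # Any matching pattern flags the response as confusing the concepts.
        if any(pattern in lowered for pattern in confusion["patterns"]):
            warnings.append(f"forbidden confusion '{confusion['id']}'")
    return warnings

spec = {
    "targets": {
        "es": {
            "required_terms": [
                {"id": "shannon-entropy", "accepted": ["entropia", "entropía"]},
            ],
            "required_caveats": [
                {"id": "not-identical", "accepted": ["no es identica"]},
            ],
            "forbidden_confusions": [
                {"id": "equals-thermo", "patterns": ["son identicas"]},
            ],
        }
    }
}

good = "La entropia de Shannon no es identica a la termodinamica."
bad = "Ambas entropias son identicas."
print(check_against_spec(spec, "es", good))  # []
print(check_against_spec(spec, "es", bad))
```

Keeping the matching to plain lowercase substring checks is what makes listing both accented and unaccented surface forms in the YAML necessary.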

View File

@ -9,8 +9,16 @@ import yaml
from .config import load_config from .config import load_config
from .language_support import response_language_instruction from .language_support import response_language_instruction
from .learner_session import _grounding_block from .learner_session import _grounding_block
from .model_bench import _adequacy_rating, _score_evaluator_response, _score_mentor_response, _score_practice_response from .model_bench import (
_adequacy_rating,
_multilingual_score,
_round_trip_phrases,
_score_evaluator_response,
_score_mentor_response,
_score_practice_response,
)
from .model_provider import ModelProvider from .model_provider import ModelProvider
from .multilingual_qa import round_trip_warning_for_phrases
from .ocw_skill_agent_demo import build_skill_grounded_study_plan, evaluate_submission_with_skill, load_ocw_skill_context from .ocw_skill_agent_demo import build_skill_grounded_study_plan, evaluate_submission_with_skill, load_ocw_skill_context
from .role_prompts import system_prompt_for_role_variant from .role_prompts import system_prompt_for_role_variant
@ -110,7 +118,24 @@ def _run_candidate(candidate: dict, skill_dir: str | Path) -> dict:
) )
elapsed_ms = round((perf_counter() - started) * 1000.0, 3) elapsed_ms = round((perf_counter() - started) * 1000.0, 3)
score, notes = _scorer_for_role(role)(response.text) score, notes = _scorer_for_role(role)(response.text)
overall += score multilingual_score, multilingual_notes = _multilingual_score(role, response.text, language, context.multilingual_qa)
combined_score = (score * 0.8) + (multilingual_score * 0.2)
round_trip = {"warnings": [], "summary": {"source_phrase_count": 0, "round_trip_warning_count": 0, "drifted_phrases": []}}
if language != "en":
source_phrases = _round_trip_phrases(context.multilingual_qa, language)
if source_phrases:
back_translation = provider.generate(
(
"Translate the following text into English as faithfully as possible, preserving technical meaning and caveats.\n\n"
f"{response.text}"
),
role=role,
system_prompt=system_prompt_for_role_variant(role, variant),
temperature=0.0,
max_tokens=220,
).text
round_trip = round_trip_warning_for_phrases(source_phrases, back_translation)
overall += combined_score
role_results.append( role_results.append(
{ {
"role": role, "role": role,
@ -119,10 +144,13 @@ def _run_candidate(candidate: dict, skill_dir: str | Path) -> dict:
"prompt_variant": variant, "prompt_variant": variant,
"language": language, "language": language,
"latency_ms": elapsed_ms, "latency_ms": elapsed_ms,
"adequacy_score": round(score, 3), "adequacy_score": round(combined_score, 3),
"adequacy_rating": _adequacy_rating(score), "adequacy_rating": _adequacy_rating(combined_score),
"grounded_score": round(score, 3),
"multilingual_score": round(multilingual_score, 3),
"round_trip": round_trip,
"response_preview": response.text[:280], "response_preview": response.text[:280],
"notes": notes, "notes": [*notes, *multilingual_notes, *round_trip["warnings"]],
} }
) )
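The round-trip guard wired in above can be approximated by a simple phrase-recovery check. This sketch is an assumption about how `round_trip_warning_for_phrases` behaves, inferred from the report output and the `warnings`/`summary` shape used by the arena; the real helper may do fuzzier matching:

```python
# Hypothetical sketch of the round-trip phrase check; the actual
# round_trip_warning_for_phrases helper may implement fuzzier matching.
def round_trip_warning_for_phrases(source_phrases: list[str], back_translation: str) -> dict:
    lowered = back_translation.lower()
    warnings: list[str] = []
    drifted: list[str] = []
    for phrase in source_phrases:
        # A phrase "survives" the round trip if it is still recoverable verbatim.
        if phrase.lower() not in lowered:
            warnings.append(
                f"Round-trip translation did not preserve source phrase '{phrase}'."
            )
            drifted.append(phrase)
    return {
        "warnings": warnings,
        "summary": {
            "source_phrase_count": len(source_phrases),
            "round_trip_warning_count": len(warnings),
            "drifted_phrases": drifted,
        },
    }

phrases = ["entropy", "channel capacity"]
back = "Shannon entropy measures uncertainty in a source."
result = round_trip_warning_for_phrases(phrases, back)
print(result["summary"]["round_trip_warning_count"])  # 1
```

Because the back-translation itself comes from the candidate model at temperature 0, a failed recovery can mean either genuine drift in the translated answer or a weak back-translation, which is why these surface only as warnings rather than hard failures.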

View File

@ -14,11 +14,84 @@ LANGUAGE_LABELS = {
"ja": "Japanese", "ja": "Japanese",
} }
UI_STRINGS = {
"en": {
"didactopus_learner_session": "Didactopus Learner Session",
"learner_goal": "Learner goal",
"source_language": "Source language",
"output_language": "Output language",
"study_plan": "Study Plan",
"conversation": "Conversation",
"evaluation_summary": "Evaluation Summary",
"verdict": "Verdict",
"aggregated_dimensions": "Aggregated dimensions",
"follow_up": "Follow-up",
"status": "Status",
"prerequisites": "Prerequisites",
"supporting_lessons": "Supporting lessons",
"grounding_fragments": "Grounding fragments",
"source_fragment": "Source fragment",
"skip_to_session": "Skip to learner session",
"screen_reader_note": "This page is structured for keyboard and screen-reader use. It presents the learner goal, study plan, grounded source fragments, and conversation turns in reading order.",
},
"es": {
"didactopus_learner_session": "Sesion de aprendizaje de Didactopus",
"learner_goal": "Objetivo del aprendiz",
"source_language": "Idioma de origen",
"output_language": "Idioma de salida",
"study_plan": "Plan de estudio",
"conversation": "Conversacion",
"evaluation_summary": "Resumen de evaluacion",
"verdict": "Veredicto",
"aggregated_dimensions": "Dimensiones agregadas",
"follow_up": "Siguiente paso",
"status": "Estado",
"prerequisites": "Prerrequisitos",
"supporting_lessons": "Lecciones de apoyo",
"grounding_fragments": "Fragmentos de fundamento",
"source_fragment": "Fragmento de fuente",
"skip_to_session": "Saltar a la sesion de aprendizaje",
"screen_reader_note": "Esta pagina esta estructurada para uso con teclado y lector de pantalla. Presenta el objetivo del aprendiz, el plan de estudio, los fragmentos de fundamento y los turnos de conversacion en orden de lectura.",
},
"fr": {
"didactopus_learner_session": "Session d'apprentissage Didactopus",
"learner_goal": "Objectif de l'apprenant",
"source_language": "Langue source",
"output_language": "Langue de sortie",
"study_plan": "Plan d'etude",
"conversation": "Conversation",
"evaluation_summary": "Resume de l'evaluation",
"verdict": "Verdict",
"aggregated_dimensions": "Dimensions agregees",
"follow_up": "Etape suivante",
"status": "Statut",
        "prerequisites": "Prerequis",
"supporting_lessons": "Lecons de soutien",
"grounding_fragments": "Fragments d'ancrage",
"source_fragment": "Fragment source",
"skip_to_session": "Aller a la session d'apprentissage",
"screen_reader_note": "Cette page est structuree pour une utilisation au clavier et avec un lecteur d'ecran. Elle presente l'objectif de l'apprenant, le plan d'etude, les fragments d'ancrage et les tours de conversation dans l'ordre de lecture.",
},
}
LANGUAGE_MARKERS = {
"es": (" el ", " la ", " de ", " y ", " que ", " para ", " no ", "una ", "un "),
"fr": (" le ", " la ", " de ", " et ", " que ", " pour ", " pas ", "une ", "un "),
    "de": (" der ", " die ", " und ", " nicht ", " ist ", " fur ", " für "),
    "pt": (" o ", " a ", " de ", " e ", " para ", " nao ", " não "),
"it": (" il ", " la ", " di ", " e ", " per ", " non "),
}
def language_label(language: str) -> str: def language_label(language: str) -> str:
return LANGUAGE_LABELS.get(language, language) return LANGUAGE_LABELS.get(language, language)
def ui_text(key: str, language: str) -> str:
table = UI_STRINGS.get(language, UI_STRINGS["en"])
return table.get(key, UI_STRINGS["en"].get(key, key))
def response_language_instruction(language: str, source_language: str = "en") -> str: def response_language_instruction(language: str, source_language: str = "en") -> str:
if language == source_language: if language == source_language:
return "" return ""
@ -26,3 +99,18 @@ def response_language_instruction(language: str, source_language: str = "en") ->
f" Respond in {language_label(language)}. Preserve key source-grounded concepts and caveats faithfully, " f" Respond in {language_label(language)}. Preserve key source-grounded concepts and caveats faithfully, "
f"and make clear when you are explaining material whose source language is {language_label(source_language)}." f"and make clear when you are explaining material whose source language is {language_label(source_language)}."
) )
def language_alignment_score(text: str, language: str) -> tuple[float, list[str]]:
if language == "en":
return 1.0, []
lowered = f" {text.lower()} "
markers = LANGUAGE_MARKERS.get(language)
if markers is None:
return 0.5, [f"No language-specific heuristic markers are defined for {language} yet."]
marker_hits = sum(1 for marker in markers if marker in lowered)
if marker_hits >= 2:
return 1.0, []
if marker_hits == 1:
return 0.6, [f"Only weak evidence that the response is actually in {language_label(language)}."]
return 0.0, [f"Response does not appear to be in {language_label(language)}."]
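The marker heuristic can be demonstrated in isolation. This is a standalone mirror of `language_alignment_score` for illustration (the canonical version lives in `didactopus.language_support`; the trimmed marker table and note strings here are abbreviations, not the module's exact values):

```python
# Standalone mirror of the marker heuristic, for illustration only.
LANGUAGE_MARKERS = {
    "es": (" el ", " la ", " de ", " y ", " que ", " para ", " no ", "una ", "un "),
}

def language_alignment_score(text: str, language: str) -> tuple[float, list[str]]:
    if language == "en":
        return 1.0, []
    # Pad with spaces so markers can match at the start and end of the text.
    lowered = f" {text.lower()} "
    markers = LANGUAGE_MARKERS.get(language)
    if markers is None:
        return 0.5, [f"No heuristic markers defined for {language} yet."]
    hits = sum(1 for marker in markers if marker in lowered)
    if hits >= 2:
        return 1.0, []
    if hits == 1:
        return 0.6, ["Only weak evidence of the target language."]
    return 0.0, ["Response does not appear to be in the target language."]

score, notes = language_alignment_score(
    "La entropia de Shannon mide la incertidumbre.", "es"
)
print(score)  # 1.0
```

Common function words make cheap, surprisingly robust markers: an English response to a Spanish-language request almost never contains two of ` la `, ` de `, ` que `, so it scores 0.0 and triggers the "does not appear to be in Spanish" note seen in the benchmark report.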

View File

@ -4,36 +4,38 @@ import html
import json import json
from pathlib import Path from pathlib import Path
from .language_support import ui_text
def _escape(value: object) -> str: def _escape(value: object) -> str:
return html.escape(str(value)) return html.escape(str(value))
def build_accessible_session_text(session: dict) -> str: def build_accessible_session_text(session: dict) -> str:
language = str(session.get("output_language", "en"))
lines = [ lines = [
"Didactopus Learner Session", ui_text("didactopus_learner_session", language),
"", "",
f"Learner goal: {session.get('goal', '')}", f"{ui_text('learner_goal', language)}: {session.get('goal', '')}",
f"Source language: {session.get('source_language', 'en')}", f"{ui_text('source_language', language)}: {session.get('source_language', 'en')}",
f"Output language: {session.get('output_language', 'en')}", f"{ui_text('output_language', language)}: {session.get('output_language', 'en')}",
"", "",
"Study plan:", f"{ui_text('study_plan', language)}:",
] ]
for index, step in enumerate(session.get("study_plan", {}).get("steps", []), start=1): for index, step in enumerate(session.get("study_plan", {}).get("steps", []), start=1):
lines.extend( lines.extend(
[ [
f"{index}. {step.get('title', '')}", f"{index}. {step.get('title', '')}",
f" Status: {step.get('status', '')}", f" {ui_text('status', language)}: {step.get('status', '')}",
f" Prerequisites: {', '.join(step.get('prerequisite_titles', []) or ['none explicit'])}", f" {ui_text('prerequisites', language)}: {', '.join(step.get('prerequisite_titles', []) or ['none explicit'])}",
f" Supporting lessons: {', '.join(step.get('supporting_lessons', []) or ['none listed'])}", f" {ui_text('supporting_lessons', language)}: {', '.join(step.get('supporting_lessons', []) or ['none listed'])}",
] ]
) )
for fragment in step.get("source_fragments", [])[:2]: for fragment in step.get("source_fragments", [])[:2]:
lines.append(f" Source fragment ({fragment.get('kind', 'fragment')}): {fragment.get('text', '')}") lines.append(f" {ui_text('source_fragment', language)} ({fragment.get('kind', 'fragment')}): {fragment.get('text', '')}")
lines.extend( lines.extend(
[ [
"", "",
"Conversation:", f"{ui_text('conversation', language)}:",
] ]
) )
for turn in session.get("turns", []): for turn in session.get("turns", []):
@ -47,26 +49,27 @@ def build_accessible_session_text(session: dict) -> str:
evaluation = session.get("evaluation", {}) evaluation = session.get("evaluation", {})
lines.extend( lines.extend(
[ [
"Evaluation summary:", f"{ui_text('evaluation_summary', language)}:",
f"Verdict: {evaluation.get('verdict', '')}", f"{ui_text('verdict', language)}: {evaluation.get('verdict', '')}",
f"Aggregated dimensions: {json.dumps(evaluation.get('aggregated', {}), sort_keys=True)}", f"{ui_text('aggregated_dimensions', language)}: {json.dumps(evaluation.get('aggregated', {}), sort_keys=True)}",
f"Follow-up: {evaluation.get('follow_up', '')}", f"{ui_text('follow_up', language)}: {evaluation.get('follow_up', '')}",
] ]
) )
return "\n".join(lines).strip() + "\n" return "\n".join(lines).strip() + "\n"
def build_accessible_session_html(session: dict) -> str: def build_accessible_session_html(session: dict) -> str:
language = str(session.get("output_language", "en"))
steps = session.get("study_plan", {}).get("steps", []) steps = session.get("study_plan", {}).get("steps", [])
turns = session.get("turns", []) turns = session.get("turns", [])
evaluation = session.get("evaluation", {}) evaluation = session.get("evaluation", {})
body = [ body = [
"<!doctype html>", "<!doctype html>",
'<html lang="en">', f'<html lang="{_escape(language)}">',
"<head>", "<head>",
'<meta charset="utf-8">', '<meta charset="utf-8">',
'<meta name="viewport" content="width=device-width, initial-scale=1">', '<meta name="viewport" content="width=device-width, initial-scale=1">',
"<title>Didactopus Learner Session</title>", f"<title>{_escape(ui_text('didactopus_learner_session', language))}</title>",
"<style>", "<style>",
":root { color-scheme: light; --bg: #f7f4ed; --panel: #fffdf8; --ink: #1e2b31; --muted: #53656d; --line: #d3c8b7; --accent: #155e63; }", ":root { color-scheme: light; --bg: #f7f4ed; --panel: #fffdf8; --ink: #1e2b31; --muted: #53656d; --line: #d3c8b7; --accent: #155e63; }",
"body { margin: 0; font-family: Georgia, 'Times New Roman', serif; background: var(--bg); color: var(--ink); line-height: 1.55; }", "body { margin: 0; font-family: Georgia, 'Times New Roman', serif; background: var(--bg); color: var(--ink); line-height: 1.55; }",
@ -84,32 +87,32 @@ def build_accessible_session_html(session: dict) -> str:
"</style>", "</style>",
"</head>", "</head>",
"<body>", "<body>",
'<a class="skip" href="#session-main">Skip to learner session</a>', f'<a class="skip" href="#session-main">{_escape(ui_text("skip_to_session", language))}</a>',
'<main id="session-main" aria-label="Didactopus learner session">', '<main id="session-main" aria-label="Didactopus learner session">',
'<section aria-labelledby="session-title">', '<section aria-labelledby="session-title">',
'<h1 id="session-title">Didactopus Learner Session</h1>', f'<h1 id="session-title">{_escape(ui_text("didactopus_learner_session", language))}</h1>',
'<p class="sr-note">This page is structured for keyboard and screen-reader use. It presents the learner goal, study plan, grounded source fragments, and conversation turns in reading order.</p>', f'<p class="sr-note">{_escape(ui_text("screen_reader_note", language))}</p>',
f"<p><strong>Learner goal:</strong> {_escape(session.get('goal', ''))}</p>", f"<p><strong>{_escape(ui_text('learner_goal', language))}:</strong> {_escape(session.get('goal', ''))}</p>",
f"<p><strong>Source language:</strong> {_escape(session.get('source_language', 'en'))}</p>", f"<p><strong>{_escape(ui_text('source_language', language))}:</strong> {_escape(session.get('source_language', 'en'))}</p>",
f"<p><strong>Output language:</strong> {_escape(session.get('output_language', 'en'))}</p>", f"<p><strong>{_escape(ui_text('output_language', language))}:</strong> {_escape(session.get('output_language', 'en'))}</p>",
"</section>", "</section>",
'<section aria-labelledby="study-plan-title">', '<section aria-labelledby="study-plan-title">',
'<h2 id="study-plan-title">Study Plan</h2>', f'<h2 id="study-plan-title">{_escape(ui_text("study_plan", language))}</h2>',
'<ol>', '<ol>',
] ]
for step in steps: for step in steps:
body.append("<li>") body.append("<li>")
body.append(f"<h3>{_escape(step.get('title', ''))}</h3>") body.append(f"<h3>{_escape(step.get('title', ''))}</h3>")
-        body.append(f"<p><strong>Status:</strong> {_escape(step.get('status', ''))}</p>")
+        body.append(f"<p><strong>{_escape(ui_text('status', language))}:</strong> {_escape(step.get('status', ''))}</p>")
         body.append(
-            f"<p><strong>Prerequisites:</strong> {_escape(', '.join(step.get('prerequisite_titles', []) or ['none explicit']))}</p>"
+            f"<p><strong>{_escape(ui_text('prerequisites', language))}:</strong> {_escape(', '.join(step.get('prerequisite_titles', []) or ['none explicit']))}</p>"
         )
         body.append(
-            f"<p><strong>Supporting lessons:</strong> {_escape(', '.join(step.get('supporting_lessons', []) or ['none listed']))}</p>"
+            f"<p><strong>{_escape(ui_text('supporting_lessons', language))}:</strong> {_escape(', '.join(step.get('supporting_lessons', []) or ['none listed']))}</p>"
         )
         fragments = step.get("source_fragments", [])[:2]
         if fragments:
-            body.append("<p><strong>Grounding fragments:</strong></p>")
+            body.append(f"<p><strong>{_escape(ui_text('grounding_fragments', language))}:</strong></p>")
             body.append("<ul>")
             for fragment in fragments:
                 body.append(
@@ -123,7 +126,7 @@ def build_accessible_session_html(session: dict) -> str:
             "</ol>",
             "</section>",
             '<section aria-labelledby="conversation-title">',
-            '<h2 id="conversation-title">Conversation</h2>',
+            f'<h2 id="conversation-title">{_escape(ui_text("conversation", language))}</h2>',
         ]
     )
     for turn in turns:
@@ -136,10 +139,10 @@ def build_accessible_session_html(session: dict) -> str:
         [
             "</section>",
             '<section aria-labelledby="evaluation-title">',
-            '<h2 id="evaluation-title">Evaluation Summary</h2>',
-            f"<p><strong>Verdict:</strong> {_escape(evaluation.get('verdict', ''))}</p>",
-            f"<p><strong>Aggregated dimensions:</strong> {_escape(json.dumps(evaluation.get('aggregated', {}), sort_keys=True))}</p>",
-            f"<p><strong>Follow-up:</strong> {_escape(evaluation.get('follow_up', ''))}</p>",
+            f'<h2 id="evaluation-title">{_escape(ui_text("evaluation_summary", language))}</h2>',
+            f"<p><strong>{_escape(ui_text('verdict', language))}:</strong> {_escape(evaluation.get('verdict', ''))}</p>",
+            f"<p><strong>{_escape(ui_text('aggregated_dimensions', language))}:</strong> {_escape(json.dumps(evaluation.get('aggregated', {}), sort_keys=True))}</p>",
+            f"<p><strong>{_escape(ui_text('follow_up', language))}:</strong> {_escape(evaluation.get('follow_up', ''))}</p>",
             "</section>",
             "</main>",
             "</body>",

View File

@@ -5,9 +5,10 @@ from pathlib import Path
 from time import perf_counter

 from .config import load_config
-from .language_support import response_language_instruction
+from .language_support import language_alignment_score, response_language_instruction
 from .learner_session import _grounding_block
 from .model_provider import ModelProvider
+from .multilingual_qa import multilingual_qa_for_text, round_trip_warning_for_phrases
 from .ocw_skill_agent_demo import build_skill_grounded_study_plan, evaluate_submission_with_skill, load_ocw_skill_context
 from .role_prompts import system_prompt_for_role
@@ -77,6 +78,47 @@ def _adequacy_rating(score: float) -> str:
     return "inadequate"


+def _multilingual_score(role: str, text: str, language: str, qa_spec: dict | None = None) -> tuple[float, list[str]]:
+    score, notes = language_alignment_score(text, language)
+    if language == "en":
+        return score, notes
+    qa_score = 1.0
+    qa_notes: list[str] = []
+    if qa_spec:
+        qa_result = multilingual_qa_for_text(qa_spec, language=language, text=text)
+        qa_notes = list(qa_result["warnings"])
+        summary = qa_result["summary"]
+        denominator = summary["required_term_count"] + summary["required_caveat_count"] + summary["forbidden_confusion_count"]
+        numerator = summary["matched_term_count"] + summary["matched_caveat_count"] + (
+            summary["forbidden_confusion_count"] - summary["confusion_hit_count"]
+        )
+        if denominator > 0:
+            qa_score = max(0.0, min(1.0, numerator / denominator))
+    role_lower = role.lower()
+    if role_lower == "mentor" and "entropy" not in text.lower():
+        qa_notes = list(qa_notes)
+        qa_notes.append("Did not visibly preserve a key grounded concept term in multilingual output.")
+        qa_score = max(0.0, qa_score - 0.2)
+    combined = (score * 0.5) + (qa_score * 0.5)
+    return combined, [*notes, *qa_notes]
+
+
+def _round_trip_phrases(qa_spec: dict | None, language: str) -> list[str]:
+    if not qa_spec or language == "en":
+        return []
+    target = (qa_spec.get("targets", {}) or {}).get(language, {}) or {}
+    phrases: list[str] = []
+    for entry in target.get("required_terms", []) or []:
+        accepted = entry.get("accepted", []) or []
+        if accepted:
+            phrases.append(str(accepted[0]))
+    for entry in target.get("required_caveats", []) or []:
+        accepted = entry.get("accepted", []) or []
+        if accepted:
+            phrases.append(str(accepted[0]))
+    return phrases[:6]
+
+
 def _hardware_profile(
     *,
     profile_name: str,
@@ -163,7 +205,24 @@ def run_model_benchmark(
             )
             elapsed_ms = round((perf_counter() - started) * 1000.0, 3)
             score, notes = scorers[role](response.text)
-            adequacy_scores.append(score)
+            multilingual_score, multilingual_notes = _multilingual_score(role, response.text, language, context.multilingual_qa)
+            combined_score = (score * 0.8) + (multilingual_score * 0.2)
+            round_trip = {"warnings": [], "summary": {"source_phrase_count": 0, "round_trip_warning_count": 0, "drifted_phrases": []}}
+            if language != "en":
+                source_phrases = _round_trip_phrases(context.multilingual_qa, language)
+                if source_phrases:
+                    back_translation = provider.generate(
+                        (
+                            "Translate the following text into English as faithfully as possible, preserving technical meaning and caveats.\n\n"
+                            f"{response.text}"
+                        ),
+                        role=role,
+                        system_prompt=system_prompt_for_role(role),
+                        temperature=0.0,
+                        max_tokens=220,
+                    ).text
+                    round_trip = round_trip_warning_for_phrases(source_phrases, back_translation)
+            adequacy_scores.append(combined_score)
             role_results.append(
                 {
                     "role": role,
@@ -171,9 +230,12 @@ def run_model_benchmark(
                     "model_name": response.model_name,
                     "latency_ms": elapsed_ms,
                     "response_preview": response.text[:280],
-                    "adequacy_score": round(score, 3),
-                    "adequacy_rating": _adequacy_rating(score),
-                    "notes": notes,
+                    "adequacy_score": round(combined_score, 3),
+                    "adequacy_rating": _adequacy_rating(combined_score),
+                    "grounded_score": round(score, 3),
+                    "multilingual_score": round(multilingual_score, 3),
+                    "round_trip": round_trip,
+                    "notes": [*notes, *multilingual_notes, *round_trip["warnings"]],
                 }
             )
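To make the weighting in this hunk concrete, here is a minimal worked example of the blend; the numeric values are invented for illustration, but the weights match the code above:

```python
# Invented example values; weights mirror the benchmark code above.
grounded_score = 0.9      # existing per-role grounded scorer
alignment_score = 1.0     # output is in the requested language
qa_score = 0.75           # e.g. 3 of 4 required terms/caveats satisfied

# _multilingual_score: equal blend of language alignment and QA-spec coverage
multilingual_score = (alignment_score * 0.5) + (qa_score * 0.5)

# run_model_benchmark: grounded behavior still dominates reported adequacy
combined_score = (grounded_score * 0.8) + (multilingual_score * 0.2)
print(round(multilingual_score, 3), round(combined_score, 3))
```

So a fully grounded answer with patchy terminology coverage is nudged down but not dominated by the multilingual check.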

View File

@@ -0,0 +1,100 @@
from __future__ import annotations

from pathlib import Path

import yaml


def _contains_non_negated_pattern(lowered: str, pattern: str) -> bool:
    start = lowered.find(pattern)
    while start != -1:
        prefix = lowered[max(0, start - 4):start]
        if not prefix.endswith("no "):
            return True
        start = lowered.find(pattern, start + 1)
    return False


def load_multilingual_qa_spec(source_dir: str | Path) -> dict:
    source = Path(source_dir)
    path = source / "multilingual_qa.yaml"
    if not path.exists():
        return {}
    return yaml.safe_load(path.read_text(encoding="utf-8")) or {}


def multilingual_qa_for_text(spec: dict, *, language: str, text: str) -> dict:
    targets = spec.get("targets", {}) or {}
    target = targets.get(language, {}) or {}
    warnings: list[str] = []
    summary = {
        "language": language,
        "required_term_count": 0,
        "matched_term_count": 0,
        "required_caveat_count": 0,
        "matched_caveat_count": 0,
        "forbidden_confusion_count": 0,
        "confusion_hit_count": 0,
    }
    if not target:
        warnings.append(f"No multilingual QA spec is defined for language '{language}'.")
        return {"warnings": warnings, "summary": summary}
    lowered = text.lower()
    required_terms = target.get("required_terms", []) or []
    summary["required_term_count"] = len(required_terms)
    for term in required_terms:
        accepted = [str(item).lower() for item in term.get("accepted", []) or []]
        if any(candidate in lowered for candidate in accepted):
            summary["matched_term_count"] += 1
        else:
            warnings.append(f"Missing required multilingual term '{term.get('id', 'unknown')}' for language '{language}'.")
    required_caveats = target.get("required_caveats", []) or []
    summary["required_caveat_count"] = len(required_caveats)
    for caveat in required_caveats:
        accepted = [str(item).lower() for item in caveat.get("accepted", []) or []]
        if any(candidate in lowered for candidate in accepted):
            summary["matched_caveat_count"] += 1
        else:
            warnings.append(f"Missing required multilingual caveat '{caveat.get('id', 'unknown')}' for language '{language}'.")
    forbidden_confusions = target.get("forbidden_confusions", []) or []
    summary["forbidden_confusion_count"] = len(forbidden_confusions)
    for confusion in forbidden_confusions:
        patterns = [str(item).lower() for item in confusion.get("patterns", []) or []]
        if any(_contains_non_negated_pattern(lowered, pattern) for pattern in patterns):
            summary["confusion_hit_count"] += 1
            warnings.append(f"Detected forbidden multilingual confusion '{confusion.get('id', 'unknown')}' for language '{language}'.")
    return {"warnings": warnings, "summary": summary}


def multilingual_qa_for_pack(source_dir: str | Path, *, language: str, text: str) -> dict:
    spec = load_multilingual_qa_spec(source_dir)
    return multilingual_qa_for_text(spec, language=language, text=text)


def round_trip_warning_for_phrases(
    source_phrases: list[str],
    back_translated_text: str,
) -> dict:
    lowered = back_translated_text.lower()
    warnings: list[str] = []
    drifted: list[str] = []
    for phrase in source_phrases:
        normalized = str(phrase).strip().lower()
        if not normalized:
            continue
        if normalized not in lowered:
            warnings.append(f"Round-trip translation did not preserve source phrase '{phrase}'.")
            drifted.append(phrase)
    return {
        "warnings": warnings,
        "summary": {
            "source_phrase_count": len([phrase for phrase in source_phrases if str(phrase).strip()]),
            "round_trip_warning_count": len(warnings),
            "drifted_phrases": drifted,
        },
    }
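For orientation, a `multilingual_qa.yaml` consumed by `load_multilingual_qa_spec` above would follow this shape. The sketch below is hypothetical — the IDs and Spanish phrases are illustrative, not copied from the real pack, which ships its own reviewed spec:

```yaml
# Hypothetical sketch of the spec structure the loader above expects.
source_language: en
targets:
  es:
    required_terms:
      - id: shannon-entropy
        accepted: ["entropía de shannon", "entropia de shannon"]
    required_caveats:
      - id: entropy-not-thermodynamic
        accepted: ["no es idéntica a la entropía termodinámica"]
    forbidden_confusions:
      - id: entropy-not-thermodynamic-confusion
        patterns: ["es idéntica a la entropía termodinámica"]
```

Matching is lowercase substring matching, so `accepted` and `patterns` entries should be written in lowercase normalized form.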

View File

@@ -0,0 +1,148 @@
from __future__ import annotations

import json
from pathlib import Path

import yaml

from .pack_validator import load_pack_artifacts


def _normalize_phrase(text: str) -> str:
    return " ".join(str(text).replace(":", " ").replace("-", " ").split()).strip()


def _candidate_languages(languages: list[str] | None) -> list[str]:
    return list(languages) if languages else ["es", "fr"]


def _seed_required_terms(concepts: list[dict]) -> list[dict]:
    seeded = []
    seen = set()
    for concept in concepts:
        title = str(concept.get("title", "")).strip()
        concept_id = str(concept.get("id", "")).strip()
        if not title or not concept_id:
            continue
        normalized = _normalize_phrase(title)
        if len(normalized.split()) < 2:
            continue
        if concept_id in seen:
            continue
        seen.add(concept_id)
        seeded.append(
            {
                "id": concept_id,
                "accepted": [normalized],
            }
        )
    return seeded[:12]


def _seed_required_caveats(source_corpus: dict) -> list[dict]:
    caveats = []
    seen = set()
    for fragment in source_corpus.get("fragments", []) or []:
        texts = [fragment.get("text", "")]
        texts.extend(fragment.get("objectives", []) or [])
        texts.extend(fragment.get("exercises", []) or [])
        for text in texts:
            lowered = str(text).lower()
            if "not identical" in lowered or "differs from" in lowered or "careful interpretation" in lowered:
                lesson_title = _normalize_phrase(fragment.get("lesson_title", "lesson"))
                caveat_id = lesson_title.lower().replace(" ", "-")[:48] or "caveat"
                if caveat_id in seen:
                    continue
                seen.add(caveat_id)
                caveats.append(
                    {
                        "id": caveat_id,
                        "accepted": [_normalize_phrase(text)],
                    }
                )
    return caveats[:6]


def _seed_forbidden_confusions(required_caveats: list[dict]) -> list[dict]:
    confusions = []
    for caveat in required_caveats:
        accepted = caveat.get("accepted", []) or []
        if not accepted:
            continue
        phrase = str(accepted[0])
        lowered = phrase.lower()
        if "not identical" in lowered:
            confusion = phrase.replace("not identical", "identical")
        elif "differs from" in lowered:
            confusion = phrase.replace("differs from", "is identical to")
        else:
            continue
        confusions.append(
            {
                "id": f"{caveat['id']}-confusion",
                "patterns": [_normalize_phrase(confusion)],
            }
        )
    return confusions[:6]


def generate_multilingual_qa_seed(
    source_dir: str | Path,
    *,
    languages: list[str] | None = None,
) -> dict:
    source_dir = Path(source_dir)
    loaded = load_pack_artifacts(source_dir)
    if not loaded["ok"]:
        raise ValueError(f"Cannot seed multilingual QA for invalid pack directory: {source_dir}")
    concepts = loaded["artifacts"]["concepts"].get("concepts", []) or []
    source_corpus_path = source_dir / "source_corpus.json"
    source_corpus = json.loads(source_corpus_path.read_text(encoding="utf-8")) if source_corpus_path.exists() else {"fragments": []}
    required_terms = _seed_required_terms(concepts)
    required_caveats = _seed_required_caveats(source_corpus)
    forbidden_confusions = _seed_forbidden_confusions(required_caveats)
    targets = {}
    for language in _candidate_languages(languages):
        targets[language] = {
            "required_terms": required_terms,
            "required_caveats": required_caveats,
            "forbidden_confusions": forbidden_confusions,
        }
    return {
        "source_language": "en",
        "generated_by": "didactopus.multilingual_qa_seed",
        "review_status": "draft-seed",
        "targets": targets,
    }


def write_multilingual_qa_seed(
    source_dir: str | Path,
    *,
    out_path: str | Path | None = None,
    languages: list[str] | None = None,
) -> Path:
    source_dir = Path(source_dir)
    payload = generate_multilingual_qa_seed(source_dir, languages=languages)
    out_path = Path(out_path) if out_path is not None else source_dir / "multilingual_qa.seed.yaml"
    out_path.write_text(yaml.safe_dump(payload, sort_keys=False, allow_unicode=False), encoding="utf-8")
    return out_path


def main() -> None:
    import argparse

    parser = argparse.ArgumentParser(description="Generate a starter multilingual QA spec from a Didactopus pack.")
    parser.add_argument("pack_dir")
    parser.add_argument("--out", default=None)
    parser.add_argument("--languages", nargs="*", default=None)
    args = parser.parse_args()
    out_path = write_multilingual_qa_seed(args.pack_dir, out_path=args.out, languages=args.languages)
    print(json.dumps({"written": str(out_path)}, indent=2))


if __name__ == "__main__":
    main()

View File

@@ -8,6 +8,7 @@ import yaml

 from .evaluator_pipeline import CritiqueEvaluator, LearnerAttempt, RubricEvaluator, SymbolicRuleEvaluator, aggregate, run_pipeline
 from .graph_retrieval import GraphBundle, lesson_titles_for_concept, prerequisite_titles, source_fragments_for_concept
+from .multilingual_qa import load_multilingual_qa_spec


 @dataclass
@@ -21,6 +22,7 @@ class SkillContext:
     graph_bundle: GraphBundle
     capability_profile: dict
     run_summary: dict
+    multilingual_qa: dict


 def load_ocw_skill_context(skill_dir: str | Path) -> SkillContext:
@@ -54,6 +56,7 @@ def load_ocw_skill_context(skill_dir: str | Path) -> SkillContext:
         ),
         capability_profile=json.loads((run_dir / "capability_profile.json").read_text(encoding="utf-8")),
         run_summary=json.loads((run_dir / "run_summary.json").read_text(encoding="utf-8")),
+        multilingual_qa=load_multilingual_qa_spec(pack_dir),
     )

View File

@@ -35,4 +35,6 @@ def test_run_didactopus_arena_writes_outputs(tmp_path: Path) -> None:
     queue = json.loads((tmp_path / "arena_review_queue.json").read_text(encoding="utf-8"))
     assert queue
     assert payload["ranked_candidates"][0]["language"] in {"en", "es", "fr"}
+    assert "multilingual_score" in payload["ranked_candidates"][0]["role_results"][0]
+    assert "round_trip" in payload["ranked_candidates"][0]["role_results"][0]
     assert "LLM Review Summary" in (tmp_path / "arena_report.md").read_text(encoding="utf-8")

View File

@@ -0,0 +1,28 @@
from didactopus.language_support import language_alignment_score, response_language_instruction, ui_text


def test_response_language_instruction_is_empty_for_source_language() -> None:
    assert response_language_instruction("en", "en") == ""


def test_response_language_instruction_mentions_target_language() -> None:
    instruction = response_language_instruction("es", "en")
    assert "Spanish" in instruction
    assert "English" in instruction


def test_ui_text_uses_translated_labels() -> None:
    assert ui_text("study_plan", "es") == "Plan de estudio"
    assert ui_text("evaluation_summary", "fr") == "Resume de l'evaluation"


def test_language_alignment_score_detects_non_english_markers() -> None:
    score, notes = language_alignment_score("La entropia y la capacidad del canal se comparan para el aprendiz.", "es")
    assert score == 1.0
    assert notes == []


def test_language_alignment_score_flags_wrong_language() -> None:
    score, notes = language_alignment_score("This response is still entirely in English.", "es")
    assert score == 0.0
    assert notes

View File

@@ -30,9 +30,23 @@ def test_accessible_session_text_is_linearized() -> None:
     assert "Learner goal:" in text
     assert "Source language:" in text
     assert "Output language:" in text
-    assert "Study plan:" in text
+    assert "Study Plan:" in text
     assert "Conversation:" in text
-    assert "Evaluation summary:" in text
+    assert "Evaluation Summary:" in text


+def test_accessible_session_outputs_localize_fixed_labels() -> None:
+    root = Path(__file__).resolve().parents[1]
+    payload = run_learner_session_demo(
+        root / "configs" / "config.example.yaml",
+        root / "skills" / "ocw-information-entropy-agent",
+        language="es",
+    )
+    html = build_accessible_session_html(payload)
+    text = build_accessible_session_text(payload)
+    assert "Sesion de aprendizaje de Didactopus" in html
+    assert "Plan de estudio" in html
+    assert "Objetivo del aprendiz:" in text
+
+
 def test_render_accessible_session_outputs_writes_files(tmp_path: Path) -> None:

View File

@@ -43,3 +43,15 @@ def test_model_benchmark_captures_response_preview_and_latency(tmp_path) -> None
     assert result["latency_ms"] >= 0.0
     assert result["response_preview"]
     assert "adequacy_score" in result
+    assert "round_trip" in result
+
+
+def test_model_benchmark_penalizes_stub_for_non_english_output(tmp_path) -> None:
+    payload = run_model_benchmark(
+        config_path="configs/config.example.yaml",
+        skill_dir="skills/ocw-information-entropy-agent",
+        out_dir=tmp_path,
+        language="es",
+    )
+    assert payload["context"]["output_language"] == "es"
+    assert any(result["multilingual_score"] < 1.0 for result in payload["role_results"])

View File

@@ -0,0 +1,52 @@
from pathlib import Path

from didactopus.multilingual_qa import (
    load_multilingual_qa_spec,
    multilingual_qa_for_pack,
    multilingual_qa_for_text,
    round_trip_warning_for_phrases,
)


def test_load_multilingual_qa_spec_reads_ocw_pack() -> None:
    spec = load_multilingual_qa_spec("domain-packs/mit-ocw-information-entropy")
    assert spec["source_language"] == "en"
    assert "es" in spec["targets"]
    assert "fr" in spec["targets"]


def test_multilingual_qa_for_text_accepts_spanish_preservation() -> None:
    spec = load_multilingual_qa_spec("domain-packs/mit-ocw-information-entropy")
    result = multilingual_qa_for_text(
        spec,
        language="es",
        text="La entropía de Shannon no es idéntica a la entropía termodinámica, y la capacidad del canal impone otro límite.",
    )
    assert result["summary"]["matched_term_count"] >= 2
    assert result["summary"]["matched_caveat_count"] == 1
    assert result["summary"]["confusion_hit_count"] == 0


def test_multilingual_qa_for_text_flags_confusion() -> None:
    spec = load_multilingual_qa_spec("domain-packs/mit-ocw-information-entropy")
    result = multilingual_qa_for_text(
        spec,
        language="es",
        text="La entropía de Shannon es idéntica a la entropía termodinámica.",
    )
    assert result["summary"]["confusion_hit_count"] == 1
    assert any("forbidden multilingual confusion" in warning.lower() for warning in result["warnings"])


def test_multilingual_qa_for_pack_handles_missing_spec(tmp_path: Path) -> None:
    result = multilingual_qa_for_pack(tmp_path, language="es", text="Texto de prueba.")
    assert any("no multilingual qa spec" in warning.lower() for warning in result["warnings"])


def test_round_trip_warning_for_phrases_flags_drift() -> None:
    result = round_trip_warning_for_phrases(
        ["Shannon entropy", "channel capacity"],
        "This back translation only preserved Shannon entropy.",
    )
    assert result["summary"]["round_trip_warning_count"] == 1
    assert result["summary"]["drifted_phrases"] == ["channel capacity"]

View File

@@ -0,0 +1,27 @@
from pathlib import Path

import yaml

from didactopus.multilingual_qa_seed import generate_multilingual_qa_seed, write_multilingual_qa_seed


def test_generate_multilingual_qa_seed_uses_pack_content() -> None:
    payload = generate_multilingual_qa_seed("domain-packs/mit-ocw-information-entropy", languages=["es"])
    assert payload["source_language"] == "en"
    assert payload["review_status"] == "draft-seed"
    assert "es" in payload["targets"]
    target = payload["targets"]["es"]
    assert target["required_terms"]
    assert any(item["id"] == "shannon-entropy" for item in target["required_terms"])
    assert target["required_caveats"]


def test_write_multilingual_qa_seed_writes_yaml(tmp_path: Path) -> None:
    out = write_multilingual_qa_seed(
        "domain-packs/mit-ocw-information-entropy",
        out_path=tmp_path / "multilingual_qa.seed.yaml",
        languages=["es", "fr"],
    )
    assert out.exists()
    written = yaml.safe_load(out.read_text(encoding="utf-8"))
    assert set(written["targets"]) == {"es", "fr"}