-Didactopus Learner Session
-This page is structured for keyboard and screen-reader use. It presents the learner goal, study plan, grounded source fragments, and conversation turns in reading order.
-Learner goal: Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.
-Source language: en
-Output language: es
+Sesion de aprendizaje de Didactopus
+Esta pagina esta estructurada para uso con teclado y lector de pantalla. Presenta el objetivo del aprendiz, el plan de estudio, los fragmentos de fundamento y los turnos de conversacion en orden de lectura.
+Objetivo del aprendiz: Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.
+Idioma de origen: en
+Idioma de salida: es
-Study Plan
+Plan de estudio
 Independent Reasoning and Careful Comparison
-Status: mastered
-Prerequisites: Course Notes and Reference Texts
-Supporting lessons: Independent Reasoning and Careful Comparison
-Grounding fragments:
+Estado: mastered
+Prerrequisitos: Course Notes and Reference Texts
+Lecciones de apoyo: Independent Reasoning and Careful Comparison
+Fragmentos de fundamento:
 - Independent Reasoning and Careful Comparison (lesson_body)
 - Objective: Explain why the course requires precise comparison of related but non-identical concepts.
 - Exercise: Write a short note distinguishing Shannon entropy, channel capacity, and thermodynamic entropy.
@@ -48,10 +48,10 @@ The syllabus framing implies a style of work where analogy is useful but dangerous when used loosely.
 Thermodynamics and Entropy
-Status: mastered
-Prerequisites: Cryptography and Information Hiding
-Supporting lessons: Thermodynamics and Entropy
-Grounding fragments:
+Estado: mastered
+Prerrequisitos: Cryptography and Information Hiding
+Lecciones de apoyo: Thermodynamics and Entropy
+Fragmentos de fundamento:
 - Thermodynamics and Entropy (lesson_body)
 - Objective: Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
 - Exercise: Compare the two entropy notions and identify what is preserved across the analogy.
@@ -61,10 +61,10 @@ The course uses entropy as a bridge concept between communication theory and physics while insisting on careful interpretation.
 Shannon Entropy
-Status: mastered
-Prerequisites: Counting and Probability
-Supporting lessons: Shannon Entropy
-Grounding fragments:
+Estado: mastered
+Prerrequisitos: Counting and Probability
+Lecciones de apoyo: Shannon Entropy
+Fragmentos de fundamento:
 - Shannon Entropy (lesson_body)
 - Objective: Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.
 - Exercise: Compute the entropy of a Bernoulli source and interpret the result.
@@ -75,7 +75,7 @@ The course then introduces entropy as a quantitative measure of uncertainty for a source model
-Conversation
+Conversacion
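The lesson fragments in the hunks above insist that Shannon entropy and thermodynamic entropy share a formula but not an interpretation. As a minimal sketch of that point, assuming nothing beyond the standard definitions (the helper names and the example distribution are illustrative, not taken from the course materials):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gibbs_entropy(probs):
    """Gibbs entropy S = -k_B * sum(p * ln(p)), measured in joules per kelvin."""
    return -K_B * sum(p * math.log(p) for p in probs if p > 0)


probs = [0.5, 0.25, 0.25]
print(shannon_entropy_bits(probs))  # 1.5 bits: cost of describing the outcome
print(gibbs_entropy(probs))         # ~1.44e-23 J/K: same sum, physical units
```

The two functions differ only by the logarithm base and a physical constant, which is exactly why the course warns that identical mathematics does not imply identical meaning: one number prices a description, the other enters thermodynamic bookkeeping.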
diff --git a/examples/ocw-information-entropy-session-es.txt b/examples/ocw-information-entropy-session-es.txt
index 6b45a57..d6c1620 100644
--- a/examples/ocw-information-entropy-session-es.txt
+++ b/examples/ocw-information-entropy-session-es.txt
@@ -1,36 +1,36 @@
-Didactopus Learner Session
+Sesion de aprendizaje de Didactopus

-Learner goal: Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.
-Source language: en
-Output language: es
+Objetivo del aprendiz: Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.
+Idioma de origen: en
+Idioma de salida: es

-Study plan:
+Plan de estudio:

 1. Independent Reasoning and Careful Comparison
-   Status: mastered
-   Prerequisites: Course Notes and Reference Texts
-   Supporting lessons: Independent Reasoning and Careful Comparison
-   Source fragment (lesson_body):
-     Objective: Explain why the course requires precise comparison of related but non-identical concepts.
+   Estado: mastered
+   Prerrequisitos: Course Notes and Reference Texts
+   Lecciones de apoyo: Independent Reasoning and Careful Comparison
+   Fragmento de fuente (lesson_body):
     - Objective: Explain why the course requires precise comparison of related but non-identical concepts.
     - Exercise: Write a short note distinguishing Shannon entropy, channel capacity, and thermodynamic entropy.
     The syllabus framing implies a style of work where analogy is useful but dangerous when used loosely. Learners must compare models carefully, state assumptions, and notice where similar mathematics does not imply identical interpretation.
-   Source fragment (objective): Explain why the course requires precise comparison of related but non-identical concepts.
+   Fragmento de fuente (objective): Explain why the course requires precise comparison of related but non-identical concepts.
 2. Thermodynamics and Entropy
-   Status: mastered
-   Prerequisites: Cryptography and Information Hiding
-   Supporting lessons: Thermodynamics and Entropy
-   Source fragment (lesson_body):
-     Objective: Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
+   Estado: mastered
+   Prerrequisitos: Cryptography and Information Hiding
+   Lecciones de apoyo: Thermodynamics and Entropy
+   Fragmento de fuente (lesson_body):
     - Objective: Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
     - Exercise: Compare the two entropy notions and identify what is preserved across the analogy.
     The course uses entropy as a bridge concept between communication theory and physics while insisting on careful interpretation.
-   Source fragment (objective): Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
+   Fragmento de fuente (objective): Explain how thermodynamic entropy relates to, and differs from, Shannon entropy.
 3. Shannon Entropy
-   Status: mastered
-   Prerequisites: Counting and Probability
-   Supporting lessons: Shannon Entropy
-   Source fragment (lesson_body):
-     Objective: Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.
+   Estado: mastered
+   Prerrequisitos: Counting and Probability
+   Lecciones de apoyo: Shannon Entropy
+   Fragmento de fuente (lesson_body):
     - Objective: Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.
     - Exercise: Compute the entropy of a Bernoulli source and interpret the result.
     The course then introduces entropy as a quantitative measure of uncertainty for a source model and uses it to reason about representation cost and surprise.
-   Source fragment (objective): Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.
+   Fragmento de fuente (objective): Explain Shannon entropy as a measure of uncertainty and compare high-entropy and low-entropy sources.

-Conversation:
+Conversacion:

 Learner Goal: Help me understand how Shannon entropy leads into channel capacity and thermodynamic entropy.
@@ -49,7 +49,7 @@ Didactopus Evaluator:
 Didactopus Mentor: [stubbed-response] [mentor] Concept: Independent Reasoning and Careful Comparison Prerequisites: Course Notes and Reference Texts Supporting lessons
-Evaluation summary:
-Verdict: needs_revision
-Aggregated dimensions: {"correctness": 0.6000000000000001, "critique": 0.6499999999999999, "explanation": 0.85}
-Follow-up: Rework the answer so it states the equality/relationship explicitly and explains why it matters.
+Resumen de evaluacion:
+Veredicto: needs_revision
+Dimensiones agregadas: {"correctness": 0.6000000000000001, "critique": 0.6499999999999999, "explanation": 0.85}
+Siguiente paso: Rework the answer so it states the equality/relationship explicitly and explains why it matters.
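The Shannon Entropy step's exercise asks the learner to compute the entropy of a Bernoulli source and interpret it. A minimal sketch of that computation, assuming only the standard binary-entropy definition (the function name and sample probabilities are illustrative, not from the pack):

```python
import math


def bernoulli_entropy(p):
    """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no surprise
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


print(bernoulli_entropy(0.5))  # 1.0 bit: a fair coin is maximally uncertain
print(bernoulli_entropy(0.9))  # ~0.469 bits: a biased coin is cheaper to represent
```

The interpretation the exercise is after: entropy peaks at p = 0.5 and falls toward zero as the source becomes predictable, which is what ties it to representation cost and surprise in the lesson body.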
diff --git a/skills/ocw-information-entropy-agent/assets/generated/pack/multilingual_qa.yaml b/skills/ocw-information-entropy-agent/assets/generated/pack/multilingual_qa.yaml
new file mode 100644
index 0000000..feaf928
--- /dev/null
+++ b/skills/ocw-information-entropy-agent/assets/generated/pack/multilingual_qa.yaml
@@ -0,0 +1,59 @@
+source_language: en
+targets:
+  es:
+    required_terms:
+      - id: shannon-entropy
+        accepted:
+          - "entropia"
+          - "entropía"
+          - "entropia de shannon"
+          - "entropía de shannon"
+      - id: channel-capacity
+        accepted:
+          - "capacidad del canal"
+          - "capacidad de canal"
+      - id: thermodynamic-entropy
+        accepted:
+          - "entropia termodinamica"
+          - "entropía termodinámica"
+    required_caveats:
+      - id: shannon-vs-thermo-not-identical
+        accepted:
+          - "no es identica"
+          - "no es idéntica"
+          - "no son identicas"
+          - "no son idénticas"
+          - "no equivale exactamente"
+    forbidden_confusions:
+      - id: shannon-equals-thermodynamic-entropy
+        patterns:
+          - "es identica a la entropia termodinamica"
+          - "es idéntica a la entropía termodinámica"
+          - "son identicas"
+          - "son idénticas"
+  fr:
+    required_terms:
+      - id: shannon-entropy
+        accepted:
+          - "entropie"
+          - "entropie de shannon"
+      - id: channel-capacity
+        accepted:
+          - "capacite du canal"
+          - "capacité du canal"
+      - id: thermodynamic-entropy
+        accepted:
+          - "entropie thermodynamique"
+    required_caveats:
+      - id: shannon-vs-thermo-not-identical
+        accepted:
+          - "n'est pas identique"
+          - "ne sont pas identiques"
+          - "n'est pas equivalente"
+          - "n'est pas équivalente"
+    forbidden_confusions:
+      - id: shannon-equals-thermodynamic-entropy
+        patterns:
+          - "est identique a l'entropie thermodynamique"
+          - "est identique à l'entropie thermodynamique"
+          - "sont identiques"
diff --git a/src/didactopus/arena.py b/src/didactopus/arena.py
index e69195c..66156bd 100644
--- a/src/didactopus/arena.py
+++ b/src/didactopus/arena.py
@@ -9,8 +9,16 @@ import yaml
 from .config import load_config
 from .language_support import response_language_instruction
 from .learner_session import _grounding_block
-from .model_bench import _adequacy_rating, _score_evaluator_response, _score_mentor_response, _score_practice_response
+from .model_bench import (
+    _adequacy_rating,
+    _multilingual_score,
+    _round_trip_phrases,
+    _score_evaluator_response,
+    _score_mentor_response,
+    _score_practice_response,
+)
 from .model_provider import ModelProvider
+from .multilingual_qa import round_trip_warning_for_phrases
 from .ocw_skill_agent_demo import build_skill_grounded_study_plan, evaluate_submission_with_skill, load_ocw_skill_context
 from .role_prompts import system_prompt_for_role_variant

@@ -110,7 +118,24 @@ def _run_candidate(candidate: dict, skill_dir: str | Path) -> dict:
         )
         elapsed_ms = round((perf_counter() - started) * 1000.0, 3)
         score, notes = _scorer_for_role(role)(response.text)
-        overall += score
+        multilingual_score, multilingual_notes = _multilingual_score(role, response.text, language, context.multilingual_qa)
+        combined_score = (score * 0.8) + (multilingual_score * 0.2)
+        round_trip = {"warnings": [], "summary": {"source_phrase_count": 0, "round_trip_warning_count": 0, "drifted_phrases": []}}
+        if language != "en":
+            source_phrases = _round_trip_phrases(context.multilingual_qa, language)
+            if source_phrases:
+                back_translation = provider.generate(
+                    (
+                        "Translate the following text into English as faithfully as possible, preserving technical meaning and caveats.\n\n"
+                        f"{response.text}"
+                    ),
+                    role=role,
+                    system_prompt=system_prompt_for_role_variant(role, variant),
+                    temperature=0.0,
+                    max_tokens=220,
+                ).text
+                round_trip = round_trip_warning_for_phrases(source_phrases, back_translation)
+        overall += combined_score
         role_results.append(
             {
                 "role": role,
@@ -119,10 +144,13 @@ def _run_candidate(candidate: dict, skill_dir: str | Path) -> dict:
                 "prompt_variant": variant,
                 "language": language,
                 "latency_ms": elapsed_ms,
-                "adequacy_score": round(score, 3),
-                "adequacy_rating": _adequacy_rating(score),
+                "adequacy_score": round(combined_score, 3),
+                "adequacy_rating": _adequacy_rating(combined_score),
+                "grounded_score": round(score, 3),
+                "multilingual_score": round(multilingual_score, 3),
+                "round_trip": round_trip,
                 "response_preview": response.text[:280],
-                "notes": notes,
+                "notes": [*notes, *multilingual_notes, *round_trip["warnings"]],
             }
         )
diff --git a/src/didactopus/language_support.py b/src/didactopus/language_support.py
index 8ae297a..874828c 100644
--- a/src/didactopus/language_support.py
+++ b/src/didactopus/language_support.py
@@ -14,11 +14,84 @@ LANGUAGE_LABELS = {
     "ja": "Japanese",
 }

+UI_STRINGS = {
+    "en": {
+        "didactopus_learner_session": "Didactopus Learner Session",
+        "learner_goal": "Learner goal",
+        "source_language": "Source language",
+        "output_language": "Output language",
+        "study_plan": "Study Plan",
+        "conversation": "Conversation",
+        "evaluation_summary": "Evaluation Summary",
+        "verdict": "Verdict",
+        "aggregated_dimensions": "Aggregated dimensions",
+        "follow_up": "Follow-up",
+        "status": "Status",
+        "prerequisites": "Prerequisites",
+        "supporting_lessons": "Supporting lessons",
+        "grounding_fragments": "Grounding fragments",
+        "source_fragment": "Source fragment",
+        "skip_to_session": "Skip to learner session",
+        "screen_reader_note": "This page is structured for keyboard and screen-reader use. It presents the learner goal, study plan, grounded source fragments, and conversation turns in reading order.",
+    },
+    "es": {
+        "didactopus_learner_session": "Sesion de aprendizaje de Didactopus",
+        "learner_goal": "Objetivo del aprendiz",
+        "source_language": "Idioma de origen",
+        "output_language": "Idioma de salida",
+        "study_plan": "Plan de estudio",
+        "conversation": "Conversacion",
+        "evaluation_summary": "Resumen de evaluacion",
+        "verdict": "Veredicto",
+        "aggregated_dimensions": "Dimensiones agregadas",
+        "follow_up": "Siguiente paso",
+        "status": "Estado",
+        "prerequisites": "Prerrequisitos",
+        "supporting_lessons": "Lecciones de apoyo",
+        "grounding_fragments": "Fragmentos de fundamento",
+        "source_fragment": "Fragmento de fuente",
+        "skip_to_session": "Saltar a la sesion de aprendizaje",
+        "screen_reader_note": "Esta pagina esta estructurada para uso con teclado y lector de pantalla. Presenta el objetivo del aprendiz, el plan de estudio, los fragmentos de fundamento y los turnos de conversacion en orden de lectura.",
+    },
+    "fr": {
+        "didactopus_learner_session": "Session d'apprentissage Didactopus",
+        "learner_goal": "Objectif de l'apprenant",
+        "source_language": "Langue source",
+        "output_language": "Langue de sortie",
+        "study_plan": "Plan d'etude",
+        "conversation": "Conversation",
+        "evaluation_summary": "Resume de l'evaluation",
+        "verdict": "Verdict",
+        "aggregated_dimensions": "Dimensions agregees",
+        "follow_up": "Etape suivante",
+        "status": "Statut",
+        "prerequisites": "Prerequis",
+        "supporting_lessons": "Lecons de soutien",
+        "grounding_fragments": "Fragments d'ancrage",
+        "source_fragment": "Fragment source",
+        "skip_to_session": "Aller a la session d'apprentissage",
+        "screen_reader_note": "Cette page est structuree pour une utilisation au clavier et avec un lecteur d'ecran. Elle presente l'objectif de l'apprenant, le plan d'etude, les fragments d'ancrage et les tours de conversation dans l'ordre de lecture.",
+    },
+}
+
+LANGUAGE_MARKERS = {
+    "es": (" el ", " la ", " de ", " y ", " que ", " para ", " no ", "una ", "un "),
+    "fr": (" le ", " la ", " de ", " et ", " que ", " pour ", " pas ", "une ", "un "),
+    "de": (" der ", " die ", " und ", " nicht ", " ist ", " fur "),
+    "pt": (" o ", " a ", " de ", " e ", " para ", " nao "),
+    "it": (" il ", " la ", " di ", " e ", " per ", " non "),
+}
+

 def language_label(language: str) -> str:
     return LANGUAGE_LABELS.get(language, language)


+def ui_text(key: str, language: str) -> str:
+    table = UI_STRINGS.get(language, UI_STRINGS["en"])
+    return table.get(key, UI_STRINGS["en"].get(key, key))
+
+
 def response_language_instruction(language: str, source_language: str = "en") -> str:
     if language == source_language:
         return ""
@@ -26,3 +99,18 @@ def response_language_instruction(language: str, source_language: str = "en") ->
         f" Respond in {language_label(language)}. Preserve key source-grounded concepts and caveats faithfully, "
         f"and make clear when you are explaining material whose source language is {language_label(source_language)}."
     )
+
+
+def language_alignment_score(text: str, language: str) -> tuple[float, list[str]]:
+    if language == "en":
+        return 1.0, []
+    lowered = f" {text.lower()} "
+    markers = LANGUAGE_MARKERS.get(language)
+    if markers is None:
+        return 0.5, [f"No language-specific heuristic markers are defined for {language} yet."]
+    marker_hits = sum(1 for marker in markers if marker in lowered)
+    if marker_hits >= 2:
+        return 1.0, []
+    if marker_hits == 1:
+        return 0.6, [f"Only weak evidence that the response is actually in {language_label(language)}."]
+    return 0.0, [f"Response does not appear to be in {language_label(language)}."]
diff --git a/src/didactopus/learner_accessibility.py b/src/didactopus/learner_accessibility.py
index 9d175b5..72c0094 100644
--- a/src/didactopus/learner_accessibility.py
+++ b/src/didactopus/learner_accessibility.py
@@ -4,36 +4,38 @@ import html
 import json
 from pathlib import Path

+from .language_support import ui_text

 def _escape(value: object) -> str:
     return html.escape(str(value))


 def build_accessible_session_text(session: dict) -> str:
+    language = str(session.get("output_language", "en"))
     lines = [
-        "Didactopus Learner Session",
+        ui_text("didactopus_learner_session", language),
         "",
-        f"Learner goal: {session.get('goal', '')}",
-        f"Source language: {session.get('source_language', 'en')}",
-        f"Output language: {session.get('output_language', 'en')}",
+        f"{ui_text('learner_goal', language)}: {session.get('goal', '')}",
+        f"{ui_text('source_language', language)}: {session.get('source_language', 'en')}",
+        f"{ui_text('output_language', language)}: {session.get('output_language', 'en')}",
         "",
-        "Study plan:",
+        f"{ui_text('study_plan', language)}:",
     ]
     for index, step in enumerate(session.get("study_plan", {}).get("steps", []), start=1):
         lines.extend(
             [
                 f"{index}. {step.get('title', '')}",
-                f"   Status: {step.get('status', '')}",
-                f"   Prerequisites: {', '.join(step.get('prerequisite_titles', []) or ['none explicit'])}",
-                f"   Supporting lessons: {', '.join(step.get('supporting_lessons', []) or ['none listed'])}",
+                f"   {ui_text('status', language)}: {step.get('status', '')}",
+                f"   {ui_text('prerequisites', language)}: {', '.join(step.get('prerequisite_titles', []) or ['none explicit'])}",
+                f"   {ui_text('supporting_lessons', language)}: {', '.join(step.get('supporting_lessons', []) or ['none listed'])}",
             ]
         )
         for fragment in step.get("source_fragments", [])[:2]:
-            lines.append(f"   Source fragment ({fragment.get('kind', 'fragment')}): {fragment.get('text', '')}")
+            lines.append(f"   {ui_text('source_fragment', language)} ({fragment.get('kind', 'fragment')}): {fragment.get('text', '')}")
     lines.extend(
         [
             "",
-            "Conversation:",
+            f"{ui_text('conversation', language)}:",
         ]
     )
     for turn in session.get("turns", []):
@@ -47,26 +49,27 @@ def build_accessible_session_text(session: dict) -> str:
     evaluation = session.get("evaluation", {})
     lines.extend(
         [
-            "Evaluation summary:",
-            f"Verdict: {evaluation.get('verdict', '')}",
-            f"Aggregated dimensions: {json.dumps(evaluation.get('aggregated', {}), sort_keys=True)}",
-            f"Follow-up: {evaluation.get('follow_up', '')}",
+            f"{ui_text('evaluation_summary', language)}:",
+            f"{ui_text('verdict', language)}: {evaluation.get('verdict', '')}",
+            f"{ui_text('aggregated_dimensions', language)}: {json.dumps(evaluation.get('aggregated', {}), sort_keys=True)}",
+            f"{ui_text('follow_up', language)}: {evaluation.get('follow_up', '')}",
         ]
    )
     return "\n".join(lines).strip() + "\n"


def build_accessible_session_html(session: dict) -> str:
+    language = str(session.get("output_language", "en"))
     steps = session.get("study_plan", {}).get("steps", [])
     turns = session.get("turns", [])
     evaluation = session.get("evaluation", {})
     body = [
         "<!DOCTYPE html>",
-        '<html lang="en">',
+        f'<html lang="{_escape(language)}">',
         "<head>",
         '<meta charset="utf-8" />',
         '<meta name="viewport" content="width=device-width, initial-scale=1" />',
-        "<title>Didactopus Learner Session</title>",
+        f"<title>{_escape(ui_text('didactopus_learner_session', language))}</title>",
         "</head>",
         "<body>",
-        'Skip to learner session',
+        f'{_escape(ui_text("skip_to_session", language))}',
diff --git a/src/didactopus/model_bench.py b/src/didactopus/model_bench.py
index 654ba80..c1d8f50 100644
--- a/src/didactopus/model_bench.py
+++ b/src/didactopus/model_bench.py
@@ -5,9 +5,10 @@ from pathlib import Path
 from time import perf_counter

 from .config import load_config
-from .language_support import response_language_instruction
+from .language_support import language_alignment_score, response_language_instruction
 from .learner_session import _grounding_block
 from .model_provider import ModelProvider
+from .multilingual_qa import multilingual_qa_for_text, round_trip_warning_for_phrases
 from .ocw_skill_agent_demo import build_skill_grounded_study_plan, evaluate_submission_with_skill, load_ocw_skill_context
 from .role_prompts import system_prompt_for_role

@@ -77,6 +78,47 @@ def _adequacy_rating(score: float) -> str:
     return "inadequate"


+def _multilingual_score(role: str, text: str, language: str, qa_spec: dict | None = None) -> tuple[float, list[str]]:
+    score, notes = language_alignment_score(text, language)
+    if language == "en":
+        return score, notes
+    qa_score = 1.0
+    qa_notes: list[str] = []
+    if qa_spec:
+        qa_result = multilingual_qa_for_text(qa_spec, language=language, text=text)
+        qa_notes = list(qa_result["warnings"])
+        summary = qa_result["summary"]
+        denominator = summary["required_term_count"] + summary["required_caveat_count"] + summary["forbidden_confusion_count"]
+        numerator = summary["matched_term_count"] + summary["matched_caveat_count"] + (
+            summary["forbidden_confusion_count"] - summary["confusion_hit_count"]
+        )
+        if denominator > 0:
+            qa_score = max(0.0, min(1.0, numerator / denominator))
+    role_lower = role.lower()
+    if role_lower == "mentor" and "entropy" not in text.lower():
+        qa_notes = list(qa_notes)
+        qa_notes.append("Did not visibly preserve a key grounded concept term in multilingual output.")
+        qa_score = max(0.0, qa_score - 0.2)
+    combined = (score * 0.5) + (qa_score * 0.5)
+    return combined, [*notes, *qa_notes]
+
+
+def _round_trip_phrases(qa_spec: dict | None, language: str) -> list[str]:
+    if not qa_spec or language == "en":
+        return []
+    target = (qa_spec.get("targets", {}) or {}).get(language, {}) or {}
+    phrases: list[str] = []
+    for entry in target.get("required_terms", []) or []:
+        accepted = entry.get("accepted", []) or []
+        if accepted:
+            phrases.append(str(accepted[0]))
+    for entry in target.get("required_caveats", []) or []:
+        accepted = entry.get("accepted", []) or []
+        if accepted:
+            phrases.append(str(accepted[0]))
+    return phrases[:6]
+
+
 def _hardware_profile(
     *,
     profile_name: str,
@@ -163,7 +205,24 @@ def run_model_benchmark(
         )
         elapsed_ms = round((perf_counter() - started) * 1000.0, 3)
         score, notes = scorers[role](response.text)
-        adequacy_scores.append(score)
+        multilingual_score, multilingual_notes = _multilingual_score(role, response.text, language, context.multilingual_qa)
+        combined_score = (score * 0.8) + (multilingual_score * 0.2)
+        round_trip = {"warnings": [], "summary": {"source_phrase_count": 0, "round_trip_warning_count": 0, "drifted_phrases": []}}
+        if language != "en":
+            source_phrases = _round_trip_phrases(context.multilingual_qa, language)
+            if source_phrases:
+                back_translation = provider.generate(
+                    (
+                        "Translate the following text into English as faithfully as possible, preserving technical meaning and caveats.\n\n"
+                        f"{response.text}"
+                    ),
+                    role=role,
+                    system_prompt=system_prompt_for_role(role),
+                    temperature=0.0,
+                    max_tokens=220,
+                ).text
+                round_trip = round_trip_warning_for_phrases(source_phrases, back_translation)
+        adequacy_scores.append(combined_score)
         role_results.append(
             {
                 "role": role,
@@ -171,9 +230,12 @@ def run_model_benchmark(
                 "model_name": response.model_name,
                 "latency_ms": elapsed_ms,
                 "response_preview": response.text[:280],
-                "adequacy_score": round(score, 3),
-                "adequacy_rating": _adequacy_rating(score),
-                "notes": notes,
+                "adequacy_score": round(combined_score, 3),
+                "adequacy_rating": _adequacy_rating(combined_score),
+                "grounded_score": round(score, 3),
+                "multilingual_score": round(multilingual_score, 3),
+                "round_trip": round_trip,
+                "notes": [*notes, *multilingual_notes, *round_trip["warnings"]],
             }
         )
diff --git a/src/didactopus/multilingual_qa.py b/src/didactopus/multilingual_qa.py
new file mode 100644
index 0000000..6726290
--- /dev/null
+++ b/src/didactopus/multilingual_qa.py
@@ -0,0 +1,100 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+import yaml
+
+
+def _contains_non_negated_pattern(lowered: str, pattern: str) -> bool:
+    start = lowered.find(pattern)
+    while start != -1:
+        prefix = lowered[max(0, start - 4):start]
+        if not prefix.endswith("no "):
+            return True
+        start = lowered.find(pattern, start + 1)
+    return False
+
+
+def load_multilingual_qa_spec(source_dir: str | Path) -> dict:
+    source = Path(source_dir)
+    path = source / "multilingual_qa.yaml"
+    if not path.exists():
+        return {}
+    return yaml.safe_load(path.read_text(encoding="utf-8")) or {}
+
+
+def multilingual_qa_for_text(spec: dict, *, language: str, text: str) -> dict:
+    targets = spec.get("targets", {}) or {}
+    target = targets.get(language, {}) or {}
+    warnings: list[str] = []
+    summary = {
+        "language": language,
+        "required_term_count": 0,
+        "matched_term_count": 0,
+        "required_caveat_count": 0,
+        "matched_caveat_count": 0,
+        "forbidden_confusion_count": 0,
+        "confusion_hit_count": 0,
+    }
+    if not target:
+        warnings.append(f"No multilingual QA spec is defined for language '{language}'.")
+        return {"warnings": warnings, "summary": summary}
+
+    lowered = text.lower()
+
+    required_terms = target.get("required_terms", []) or []
+    summary["required_term_count"] = len(required_terms)
+    for term in required_terms:
+        accepted = [str(item).lower() for item in term.get("accepted", []) or []]
+        if any(candidate in lowered for candidate in accepted):
+            summary["matched_term_count"] += 1
+        else:
+            warnings.append(f"Missing required multilingual term '{term.get('id', 'unknown')}' for language '{language}'.")
+
+    required_caveats = target.get("required_caveats", []) or []
+    summary["required_caveat_count"] = len(required_caveats)
+    for caveat in required_caveats:
+        accepted = [str(item).lower() for item in caveat.get("accepted", []) or []]
+        if any(candidate in lowered for candidate in accepted):
+            summary["matched_caveat_count"] += 1
+        else:
+            warnings.append(f"Missing required multilingual caveat '{caveat.get('id', 'unknown')}' for language '{language}'.")
+
+    forbidden_confusions = target.get("forbidden_confusions", []) or []
+    summary["forbidden_confusion_count"] = len(forbidden_confusions)
+    for confusion in forbidden_confusions:
+        patterns = [str(item).lower() for item in confusion.get("patterns", []) or []]
+        if any(_contains_non_negated_pattern(lowered, pattern) for pattern in patterns):
+            summary["confusion_hit_count"] += 1
+            warnings.append(f"Detected forbidden multilingual confusion '{confusion.get('id', 'unknown')}' for language '{language}'.")
+
+    return {"warnings": warnings, "summary": summary}
+
+
+def multilingual_qa_for_pack(source_dir: str | Path, *, language: str, text: str) -> dict:
+    spec = load_multilingual_qa_spec(source_dir)
+    return multilingual_qa_for_text(spec, language=language, text=text)
+
+
+def round_trip_warning_for_phrases(
+    source_phrases: list[str],
+    back_translated_text: str,
+) -> dict:
+    lowered = back_translated_text.lower()
+    warnings: list[str] = []
+    drifted: list[str] = []
+    for phrase in source_phrases:
+        normalized = str(phrase).strip().lower()
+        if not normalized:
+            continue
+        if normalized not in lowered:
+            warnings.append(f"Round-trip translation did not preserve source phrase '{phrase}'.")
+            drifted.append(phrase)
+    return {
+        "warnings": warnings,
+        "summary": {
+            "source_phrase_count": len([phrase for phrase in source_phrases if str(phrase).strip()]),
+            "round_trip_warning_count": len(warnings),
+            "drifted_phrases": drifted,
+        },
+    }
diff --git a/src/didactopus/multilingual_qa_seed.py b/src/didactopus/multilingual_qa_seed.py
new file mode 100644
index 0000000..a168efb
--- /dev/null
+++ b/src/didactopus/multilingual_qa_seed.py
@@ -0,0 +1,148 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import yaml
+
+from .pack_validator import load_pack_artifacts
+
+
+def _normalize_phrase(text: str) -> str:
+    return " ".join(str(text).replace(":", " ").replace("-", " ").split()).strip()
+
+
+def _candidate_languages(languages: list[str] | None) -> list[str]:
+    return list(languages) if languages else ["es", "fr"]
+
+
+def _seed_required_terms(concepts: list[dict]) -> list[dict]:
+    seeded = []
+    seen = set()
+    for concept in concepts:
+        title = str(concept.get("title", "")).strip()
+        concept_id = str(concept.get("id", "")).strip()
+        if not title or not concept_id:
+            continue
+        normalized = _normalize_phrase(title)
+        if len(normalized.split()) < 2:
+            continue
+        if concept_id in seen:
+            continue
+        seen.add(concept_id)
+        seeded.append(
+            {
+                "id": concept_id,
+                "accepted": [normalized],
+            }
+        )
+    return seeded[:12]
+
+
+def _seed_required_caveats(source_corpus: dict) -> list[dict]:
+    caveats = []
+    seen = set()
+    for fragment in source_corpus.get("fragments", []) or []:
+        texts = [fragment.get("text", "")]
+        texts.extend(fragment.get("objectives", []) or [])
+        texts.extend(fragment.get("exercises", []) or [])
+        for text in texts:
+            lowered = str(text).lower()
+            if "not identical" in lowered or "differs from" in lowered or "careful interpretation" in lowered:
+                lesson_title = _normalize_phrase(fragment.get("lesson_title", "lesson"))
+                caveat_id = lesson_title.lower().replace(" ", "-")[:48] or "caveat"
+                if caveat_id in seen:
+                    continue
+                seen.add(caveat_id)
+                caveats.append(
+                    {
+                        "id": caveat_id,
+                        "accepted": [_normalize_phrase(text)],
+                    }
+                )
+    return caveats[:6]
+
+
+def _seed_forbidden_confusions(required_caveats: list[dict]) -> list[dict]:
+    confusions = []
+    for caveat in required_caveats:
+        accepted = caveat.get("accepted", []) or []
+        if not accepted:
+            continue
+        phrase = str(accepted[0])
+        lowered = phrase.lower()
+        if "not identical" in lowered:
+            confusion = phrase.replace("not identical", "identical")
+        elif "differs from" in lowered:
+            confusion = phrase.replace("differs from", "is identical to")
+        else:
+            continue
+        confusions.append(
+            {
+                "id": f"{caveat['id']}-confusion",
+                "patterns": [_normalize_phrase(confusion)],
+            }
+        )
+    return confusions[:6]
+
+
+def generate_multilingual_qa_seed(
+    source_dir: str | Path,
+    *,
+    languages: list[str] | None = None,
+) -> dict:
+    source_dir = Path(source_dir)
+    loaded = load_pack_artifacts(source_dir)
+    if not loaded["ok"]:
+        raise ValueError(f"Cannot seed multilingual QA for invalid pack directory: {source_dir}")
+
+    concepts = loaded["artifacts"]["concepts"].get("concepts", []) or []
+    source_corpus_path = source_dir / "source_corpus.json"
+    source_corpus = json.loads(source_corpus_path.read_text(encoding="utf-8")) if source_corpus_path.exists() else {"fragments": []}
+    required_terms = _seed_required_terms(concepts)
+    required_caveats = _seed_required_caveats(source_corpus)
+    forbidden_confusions = _seed_forbidden_confusions(required_caveats)
+
+    targets = {}
+    for language in _candidate_languages(languages):
+        targets[language] = {
+            "required_terms": required_terms,
+            "required_caveats": required_caveats,
+            "forbidden_confusions": forbidden_confusions,
+        }
+
+    return {
+        "source_language": "en",
+        "generated_by": "didactopus.multilingual_qa_seed",
+        "review_status": "draft-seed",
+        "targets": targets,
+    }
+
+
+def write_multilingual_qa_seed(
+    source_dir: str | Path,
+    *,
+    out_path: str | Path | None = None,
+    languages: list[str] | None = None,
+) -> Path:
+    source_dir = Path(source_dir)
+    payload = generate_multilingual_qa_seed(source_dir, languages=languages)
+    out_path = Path(out_path) if out_path is not None else source_dir / "multilingual_qa.seed.yaml"
+    out_path.write_text(yaml.safe_dump(payload, sort_keys=False, allow_unicode=False), encoding="utf-8")
+    return out_path
+
+
+def main() -> None:
+    import argparse
+
+    parser = argparse.ArgumentParser(description="Generate a starter multilingual QA spec from a Didactopus pack.")
+    parser.add_argument("pack_dir")
+    parser.add_argument("--out", default=None)
+    parser.add_argument("--languages", nargs="*", default=None)
+    args = parser.parse_args()
+    out_path = write_multilingual_qa_seed(args.pack_dir, out_path=args.out, languages=args.languages)
+    print(json.dumps({"written": str(out_path)}, indent=2))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/didactopus/ocw_skill_agent_demo.py b/src/didactopus/ocw_skill_agent_demo.py
index 7def545..a9178cb 100644
--- a/src/didactopus/ocw_skill_agent_demo.py
+++ b/src/didactopus/ocw_skill_agent_demo.py
@@ -8,6 +8,7 @@ import yaml

 from .evaluator_pipeline import CritiqueEvaluator, LearnerAttempt, RubricEvaluator, SymbolicRuleEvaluator, aggregate, run_pipeline
 from .graph_retrieval import GraphBundle, lesson_titles_for_concept, prerequisite_titles, source_fragments_for_concept
+from .multilingual_qa import load_multilingual_qa_spec


 @dataclass
@@ -21,6 +22,7 @@ class SkillContext:
     graph_bundle: GraphBundle
     capability_profile: dict
     run_summary: dict
+    multilingual_qa: dict


 def load_ocw_skill_context(skill_dir: str | Path) -> SkillContext:
@@ -54,6 +56,7 @@ def load_ocw_skill_context(skill_dir: str | Path) -> SkillContext:
         ),
         capability_profile=json.loads((run_dir / "capability_profile.json").read_text(encoding="utf-8")),
         run_summary=json.loads((run_dir / "run_summary.json").read_text(encoding="utf-8")),
+        multilingual_qa=load_multilingual_qa_spec(pack_dir),
     )
diff --git a/tests/test_arena.py b/tests/test_arena.py
index cb5feaa..a4db885 100644
--- a/tests/test_arena.py
+++ b/tests/test_arena.py
@@ -35,4 +35,6 @@ def test_run_didactopus_arena_writes_outputs(tmp_path: Path) -> None:
     queue = json.loads((tmp_path / "arena_review_queue.json").read_text(encoding="utf-8"))
     assert queue
     assert payload["ranked_candidates"][0]["language"] in {"en", "es", "fr"}
+    assert "multilingual_score" in payload["ranked_candidates"][0]["role_results"][0]
+    assert "round_trip" in payload["ranked_candidates"][0]["role_results"][0]
     assert "LLM Review Summary" in (tmp_path / "arena_report.md").read_text(encoding="utf-8")
diff --git a/tests/test_language_support.py b/tests/test_language_support.py
new file mode 100644
index 0000000..91b09c8
--- /dev/null
+++ b/tests/test_language_support.py
@@ -0,0 +1,28 @@
+from didactopus.language_support import language_alignment_score, response_language_instruction, ui_text
+
+
+def test_response_language_instruction_is_empty_for_source_language() -> None:
+    assert response_language_instruction("en", "en") == ""
+
+
+def test_response_language_instruction_mentions_target_language() -> None:
+    instruction = response_language_instruction("es", "en")
+    assert "Spanish" in instruction
+    assert "English" in instruction
+
+
+def test_ui_text_uses_translated_labels() -> None:
+    assert ui_text("study_plan", "es") == "Plan de estudio"
+    assert ui_text("evaluation_summary", "fr") == "Resume de l'evaluation"
+
+
+def test_language_alignment_score_detects_non_english_markers() -> None:
+    score, notes = language_alignment_score("La entropia y la capacidad del canal se comparan para el aprendiz.", "es")
+    assert score == 1.0
+    assert notes == []
+
+
+def test_language_alignment_score_flags_wrong_language() -> None:
+    score, notes = language_alignment_score("This response is still entirely in English.", "es")
+    assert score == 0.0
+    assert notes
diff --git a/tests/test_learner_accessibility.py b/tests/test_learner_accessibility.py
index 329f5c3..bc5e3bc 100644
--- a/tests/test_learner_accessibility.py
+++ b/tests/test_learner_accessibility.py
@@ -30,9 +30,23 @@ def
test_accessible_session_text_is_linearized() -> None: assert "Learner goal:" in text assert "Source language:" in text assert "Output language:" in text - assert "Study plan:" in text + assert "Study Plan:" in text assert "Conversation:" in text - assert "Evaluation summary:" in text + assert "Evaluation Summary:" in text + + +def test_accessible_session_outputs_localize_fixed_labels() -> None: + root = Path(__file__).resolve().parents[1] + payload = run_learner_session_demo( + root / "configs" / "config.example.yaml", + root / "skills" / "ocw-information-entropy-agent", + language="es", + ) + html = build_accessible_session_html(payload) + text = build_accessible_session_text(payload) + assert "Sesion de aprendizaje de Didactopus" in html + assert "Plan de estudio" in html + assert "Objetivo del aprendiz:" in text def test_render_accessible_session_outputs_writes_files(tmp_path: Path) -> None: diff --git a/tests/test_model_bench.py b/tests/test_model_bench.py index 9064506..89e1e34 100644 --- a/tests/test_model_bench.py +++ b/tests/test_model_bench.py @@ -43,3 +43,15 @@ def test_model_benchmark_captures_response_preview_and_latency(tmp_path) -> None assert result["latency_ms"] >= 0.0 assert result["response_preview"] assert "adequacy_score" in result + assert "round_trip" in result + + +def test_model_benchmark_penalizes_stub_for_non_english_output(tmp_path) -> None: + payload = run_model_benchmark( + config_path="configs/config.example.yaml", + skill_dir="skills/ocw-information-entropy-agent", + out_dir=tmp_path, + language="es", + ) + assert payload["context"]["output_language"] == "es" + assert any(result["multilingual_score"] < 1.0 for result in payload["role_results"]) diff --git a/tests/test_multilingual_qa.py b/tests/test_multilingual_qa.py new file mode 100644 index 0000000..2f4ab60 --- /dev/null +++ b/tests/test_multilingual_qa.py @@ -0,0 +1,52 @@ +from pathlib import Path + +from didactopus.multilingual_qa import ( + load_multilingual_qa_spec, + 
multilingual_qa_for_pack, + multilingual_qa_for_text, + round_trip_warning_for_phrases, +) + + +def test_load_multilingual_qa_spec_reads_ocw_pack() -> None: + spec = load_multilingual_qa_spec("domain-packs/mit-ocw-information-entropy") + assert spec["source_language"] == "en" + assert "es" in spec["targets"] + assert "fr" in spec["targets"] + + +def test_multilingual_qa_for_text_accepts_spanish_preservation() -> None: + spec = load_multilingual_qa_spec("domain-packs/mit-ocw-information-entropy") + result = multilingual_qa_for_text( + spec, + language="es", + text="La entropía de Shannon no es idéntica a la entropía termodinámica, y la capacidad del canal impone otro límite.", + ) + assert result["summary"]["matched_term_count"] >= 2 + assert result["summary"]["matched_caveat_count"] == 1 + assert result["summary"]["confusion_hit_count"] == 0 + + +def test_multilingual_qa_for_text_flags_confusion() -> None: + spec = load_multilingual_qa_spec("domain-packs/mit-ocw-information-entropy") + result = multilingual_qa_for_text( + spec, + language="es", + text="La entropía de Shannon es idéntica a la entropía termodinámica.", + ) + assert result["summary"]["confusion_hit_count"] == 1 + assert any("forbidden multilingual confusion" in warning.lower() for warning in result["warnings"]) + + +def test_multilingual_qa_for_pack_handles_missing_spec(tmp_path: Path) -> None: + result = multilingual_qa_for_pack(tmp_path, language="es", text="Texto de prueba.") + assert any("no multilingual qa spec" in warning.lower() for warning in result["warnings"]) + + +def test_round_trip_warning_for_phrases_flags_drift() -> None: + result = round_trip_warning_for_phrases( + ["Shannon entropy", "channel capacity"], + "This back translation only preserved Shannon entropy.", + ) + assert result["summary"]["round_trip_warning_count"] == 1 + assert result["summary"]["drifted_phrases"] == ["channel capacity"] diff --git a/tests/test_multilingual_qa_seed.py b/tests/test_multilingual_qa_seed.py new 
file mode 100644 index 0000000..452f44d --- /dev/null +++ b/tests/test_multilingual_qa_seed.py @@ -0,0 +1,27 @@ +from pathlib import Path + +import yaml + +from didactopus.multilingual_qa_seed import generate_multilingual_qa_seed, write_multilingual_qa_seed + + +def test_generate_multilingual_qa_seed_uses_pack_content() -> None: + payload = generate_multilingual_qa_seed("domain-packs/mit-ocw-information-entropy", languages=["es"]) + assert payload["source_language"] == "en" + assert payload["review_status"] == "draft-seed" + assert "es" in payload["targets"] + target = payload["targets"]["es"] + assert target["required_terms"] + assert any(item["id"] == "shannon-entropy" for item in target["required_terms"]) + assert target["required_caveats"] + + +def test_write_multilingual_qa_seed_writes_yaml(tmp_path: Path) -> None: + out = write_multilingual_qa_seed( + "domain-packs/mit-ocw-information-entropy", + out_path=tmp_path / "multilingual_qa.seed.yaml", + languages=["es", "fr"], + ) + assert out.exists() + written = yaml.safe_load(out.read_text(encoding="utf-8")) + assert set(written["targets"]) == {"es", "fr"}', - ' ", 'Didactopus Learner Session
', - 'This page is structured for keyboard and screen-reader use. It presents the learner goal, study plan, grounded source fragments, and conversation turns in reading order.
', - f"Learner goal: {_escape(session.get('goal', ''))}
", - f"Source language: {_escape(session.get('source_language', 'en'))}
", - f"Output language: {_escape(session.get('output_language', 'en'))}
", + f'{_escape(ui_text("didactopus_learner_session", language))}
', + f'{_escape(ui_text("screen_reader_note", language))}
', + f"{_escape(ui_text('learner_goal', language))}: {_escape(session.get('goal', ''))}
", + f"{_escape(ui_text('source_language', language))}: {_escape(session.get('source_language', 'en'))}
", + f"{_escape(ui_text('output_language', language))}: {_escape(session.get('output_language', 'en'))}
", "', - ' ", 'Study Plan
', + f'{_escape(ui_text("study_plan", language))}
', '- ',
]
for step in steps:
body.append("
- ")
body.append(f"
{_escape(step.get('title', ''))}
") - body.append(f"Status: {_escape(step.get('status', ''))}
") + body.append(f"{_escape(ui_text('status', language))}: {_escape(step.get('status', ''))}
") body.append( - f"Prerequisites: {_escape(', '.join(step.get('prerequisite_titles', []) or ['none explicit']))}
" + f"{_escape(ui_text('prerequisites', language))}: {_escape(', '.join(step.get('prerequisite_titles', []) or ['none explicit']))}
" ) body.append( - f"Supporting lessons: {_escape(', '.join(step.get('supporting_lessons', []) or ['none listed']))}
" + f"{_escape(ui_text('supporting_lessons', language))}: {_escape(', '.join(step.get('supporting_lessons', []) or ['none listed']))}
" ) fragments = step.get("source_fragments", [])[:2] if fragments: - body.append("Grounding fragments:
") + body.append(f"{_escape(ui_text('grounding_fragments', language))}:
") body.append("- ")
for fragment in fragments:
body.append(
@@ -123,7 +126,7 @@ def build_accessible_session_html(session: dict) -> str:
"
', - ' ", 'Conversation
', + f'{_escape(ui_text("conversation", language))}
', ] ) for turn in turns: @@ -136,10 +139,10 @@ def build_accessible_session_html(session: dict) -> str: [ "', - ' ", "Evaluation Summary
', - f"Verdict: {_escape(evaluation.get('verdict', ''))}
", - f"Aggregated dimensions: {_escape(json.dumps(evaluation.get('aggregated', {}), sort_keys=True))}
", - f"Follow-up: {_escape(evaluation.get('follow_up', ''))}
", + f'{_escape(ui_text("evaluation_summary", language))}
', + f"{_escape(ui_text('verdict', language))}: {_escape(evaluation.get('verdict', ''))}
", + f"{_escape(ui_text('aggregated_dimensions', language))}: {_escape(json.dumps(evaluation.get('aggregated', {}), sort_keys=True))}
", + f"{_escape(ui_text('follow_up', language))}: {_escape(evaluation.get('follow_up', ''))}
", " - ")
body.append(f"