# Didactopus Arena Report

- Candidates: 3

## Rankings
- `stub-baseline` via `stub` / prompt variant `baseline`: borderline (0.733), language `en`
- `stub-strict-grounding` via `stub` / prompt variant `strict_grounding`: inadequate (0.547), language `es`
- `stub-trust-preserving` via `stub` / prompt variant `trust_preserving`: inadequate (0.547), language `fr`

## Human Review Queue
- `stub-baseline`: needs_human_review=True, weak_roles=['mentor', 'evaluator']
- `stub-strict-grounding`: needs_human_review=True, weak_roles=['mentor', 'evaluator']
- `stub-trust-preserving`: needs_human_review=True, weak_roles=['mentor', 'evaluator']

## LLM Review Summary
[stubbed-response] [mentor] Review these Didactopus arena results for a human reviewer. Rank the strongest candidates, identify likely prompt improv