Commit Graph

5 Commits

Author SHA1 Message Date
welberr b4e5a1af7d P0: remove dead default_strategy field; fix benchmark quality score
Remove RoutingConfig.default_strategy: the field was never read by
resolve_route() or any other code path, creating a false impression
that routing behaviour was configurable. Also removed from all three
example config files.

Fix _benchmark_quality_score: the previous implementation used max()
for correctness signals and then *added* speed bonuses on top, allowing
the score to accumulate past 1.0 before the final clamp. Speed bonuses
were therefore dead weight whenever pass_rate or quality_score was
already ≥ 0.65. Replace with an explicit weighted average: correctness
(pass_rate / quality_score) carries 0.65 and a normalised speed
component carries 0.35. When no correctness signal is available the
speed component carries full weight. Score is always in [0, 1] without
needing a clamp.
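
As a rough illustration of the weighted average described above, here is a minimal sketch, not the actual _benchmark_quality_score implementation. The function name, parameter names, the use of max() to combine pass_rate and quality_score, and the treatment of a missing speed value as 0.0 are all assumptions; only the 0.65 / 0.35 weights and the speed-only fallback come from this commit message. Inputs are assumed to be already normalised to [0, 1].

```python
from typing import Optional

# Weights as stated in the commit message.
CORRECTNESS_WEIGHT = 0.65
SPEED_WEIGHT = 0.35


def benchmark_quality_score_sketch(
    pass_rate: Optional[float] = None,
    quality_score: Optional[float] = None,
    speed: Optional[float] = None,
) -> float:
    """Hypothetical stand-in for _benchmark_quality_score.

    All inputs are assumed to be already normalised to [0, 1], so the
    result stays in [0, 1] without a final clamp.
    """
    signals = [v for v in (pass_rate, quality_score) if v is not None]

    # Assumption: a missing speed measurement contributes nothing,
    # so adding a speed value can never lower the score.
    speed_part = speed if speed is not None else 0.0

    if not signals:
        # No correctness signal: speed carries full weight
        # (this is 0.0 when speed is absent as well).
        return speed_part

    # Assumption: the two correctness signals are combined with max().
    correctness = max(signals)
    return CORRECTNESS_WEIGHT * correctness + SPEED_WEIGHT * speed_part
```

Under these assumptions, a perfect pass_rate with no speed data scores 0.65, and a speed-only input of 0.5 scores 0.5.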

Add test_benchmark_quality_score_stays_bounded_and_weighted to lock in
the corrected behaviour: the score is bounded at 1.0, correctness
dominates, the speed-only case is non-zero, empty input scores zero,
and a speed bonus never lowers the score.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 10:45:04 -04:00
welberr a76c7e81f4 Revise architecture/roadmap docs and add LLM evaluation guide
- architecture.md: rewrite to describe the actual running system; remove
  design-phase repo-naming discussion and initial-implementation-sequence
  list; add data-flow diagram, scoring weights table, API status table
- roadmap.md: replace aspirational list with concrete completed/gap/next
  structure; document four confirmed implementation gaps (transcription
  stub, strategy field ignored, fallback_roles unimplemented, benchmark
  quality score additive overflow); prioritise fixes as P0/P1/P2/P3
- docs/local_llm_evaluation.md: new document; role taxonomy (tier 1–3),
  hardware inventory template, candidate model suggestions, three-phase
  evaluation protocol, GenieHive integration steps, results template,
  notes on Qwen3/Mistral/DeepSeek/Ollama embedding path quirks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 09:25:51 -04:00
welberr e36650a017 Add benchmarked route matching and request shaping 2026-04-07 14:45:32 -04:00
welberr b9270df3e8 Initial commit 2026-04-07 13:17:28 -04:00
welsberr dabbebd3ba Initial commit 2026-04-07 13:10:24 -04:00