scripts/smoke_test.py: end-to-end validation script covering health, cluster
state, model catalog, route resolution, non-streaming chat (role + direct
asset), streaming chat (SSE validation + reasoning-strip check), embeddings,
and Ollama discovery metrics. Auto-detects targets from /v1/models; accepts
--chat-role, --chat-asset, --embed-asset overrides. Exit 0 if all pass/skip,
exit 1 on any failure.
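The exit convention above could be sketched as follows (a minimal sketch; the helper name and per-check status strings are illustrative, not the script's actual internals):

```python
def exit_code(results):
    """Map per-check statuses to the smoke test's exit convention:
    0 when every check passed or was skipped, 1 on any failure."""
    return 1 if any(status == "fail" for status in results.values()) else 0

# One failing check trips the non-zero exit; skips do not.
results = {"health": "pass", "embeddings": "skip", "streaming_chat": "fail"}
print(exit_code(results))
```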
configs/node.singlebox.ollama.example.yaml: add discover_protocol: "ollama"
to both services so the config works out of the box for Ollama discovery
testing without manual edits.
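For reference, the added key might look like this in context (a hypothetical excerpt; the service names and surrounding structure are illustrative, only the discover_protocol key is from this change):

```yaml
services:
  chat:
    discover_protocol: "ollama"   # added so Ollama discovery works out of the box
  embeddings:
    discover_protocol: "ollama"
```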
docs/llm_demo.md: update Current Readiness to reflect v1 complete feature set;
add Smoke Test section; add New Capabilities section covering streaming,
routing strategies, Ollama discovery, and role catalogs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove RoutingConfig.default_strategy: the field was never read by
resolve_route() or any other code path, creating a false impression
that routing behaviour was configurable. Also remove it from all three
example config files.
Fix _benchmark_quality_score: the previous implementation used max()
for correctness signals and then *added* speed bonuses on top, allowing
the score to accumulate past 1.0 before the final clamp. Speed bonuses
were therefore dead weight whenever pass_rate or quality_score was
already ≥ 0.65. Replace with an explicit weighted average: correctness
(pass_rate / quality_score) carries 0.65 and a normalised speed
component carries 0.35. When no correctness signal is available the
speed component carries full weight. Score is always in [0, 1] without
needing a clamp.
Add test_benchmark_quality_score_stays_bounded_and_weighted to lock in
the corrected behaviour: bounded at 1.0, correctness-dominant, speed-
only case non-zero, empty input zero, speed bonus never hurts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>