Commit Graph

9 Commits

Author SHA1 Message Date
welsberr 2355cf8114 Add scientific translator role 2026-04-29 11:33:59 -04:00
wesley 9276505339 Documentation of translation processes 2026-04-29 15:30:19 +00:00
welsberr e4f8b14437 Add smoke test, enable Ollama discovery in singlebox config, update demo doc
scripts/smoke_test.py: end-to-end validation script covering health, cluster
state, model catalog, route resolution, non-streaming chat (role + direct
asset), streaming chat (SSE validation + reasoning-strip check), embeddings,
and Ollama discovery metrics. Auto-detects targets from /v1/models; accepts
--chat-role, --chat-asset, --embed-asset overrides. Exit 0 if all pass/skip,
exit 1 on any failure.
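The exit-code contract described above (0 if all checks pass or skip, 1 on any failure) can be sketched as follows. This is a minimal illustration, not the script's actual API; `Status` and `summarize` are hypothetical names.

```python
from enum import Enum

class Status(Enum):
    """Possible outcomes of one smoke-test check (illustrative)."""
    PASS = "pass"
    SKIP = "skip"
    FAIL = "fail"

def summarize(results: list[Status]) -> int:
    """Return exit code 0 if every check passed or was skipped, 1 on any failure."""
    return 0 if all(r is not Status.FAIL for r in results) else 1
```

A caller would pass the collected check results to `summarize` and hand the result to `sys.exit`.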

configs/node.singlebox.ollama.example.yaml: add discover_protocol: "ollama"
to both services so the config works out of the box for Ollama discovery
testing without manual edits.

docs/llm_demo.md: update Current Readiness to reflect v1 complete feature set;
add Smoke Test section; add New Capabilities section covering streaming,
routing strategies, Ollama discovery, and role catalogs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 15:13:31 -04:00
welberr e2b1000198 P1–P2 complete: routing strategies, streaming, discovery, observed metrics + role catalogs
Control plane:
- fallback_roles chain in resolve_route() with cycle protection
- round_robin and least_loaded routing strategies; default_strategy dispatches all three
- Streaming chat completions: async generator, eager route resolution, SSE reasoning-strip
- POST /v1/audio/transcriptions proxy (multipart, dedicated httpx path)
- ServiceProber background task: probes /health, falls back to /v1/models for vLLM
- ServiceObserved gains loaded_model_count and vram_used_bytes
- _runtime_signals exposes loaded_model_count to route scoring

Node agent:
- discover_protocol: "ollama"|"openai"|null per-service config field
- discovery.py: discover_ollama_assets (loaded: False), _get_ollama_ps_models helper,
  query_ollama_ps, discover_openai_models, enrich_service_assets (two-phase Ollama,
  corrects stale loaded state, populates observed metrics from /api/ps)
- Heartbeat zips service dicts with config to pass protocol; allocates discovery client
  only when needed
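The two-phase Ollama enrichment above (catalog first with `loaded: False`, then correct loaded state from `/api/ps`) reduces to a small merge step once the two responses are in hand. The helper below is an illustrative sketch, not the agent's real function; it assumes both endpoints return `{"models": [{"name": ...}, ...]}`-shaped lists.

```python
def enrich_assets(tag_models: list[dict], ps_models: list[dict]) -> list[dict]:
    """Merge the model catalog (phase 1) with running models (phase 2).

    tag_models: models listed by the catalog endpoint, initially loaded=False.
    ps_models:  models currently loaded per /api/ps.
    Returns assets with loaded state corrected, so stale flags cannot survive.
    """
    loaded = {m["name"] for m in ps_models}
    return [{"name": m["name"], "loaded": m["name"] in loaded} for m in tag_models]
```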

Tests: 47 passing (up from 19)

Role catalogs (example configs):
- roles.surgical-team.example.yaml  — Brooks/Mills surgical team (surg_ prefix, 9 roles)
- roles.belbin.example.yaml         — Belbin team roles (belbin_ prefix, 9 roles)
- roles.sixhats.example.yaml        — De Bono Six Thinking Hats (sixhats_ prefix, 6 roles)
- roles.disney.example.yaml         — Disney creative strategy (disney_ prefix, 3 roles)
- roles.xp.example.yaml             — XP team roles (xp_ prefix, 5 roles)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 14:12:54 -04:00
welberr b4e5a1af7d P0: remove dead default_strategy field; fix benchmark quality score
Remove RoutingConfig.default_strategy: the field was never read by
resolve_route() or any other code path, creating a false impression
that routing behaviour was configurable. Also removed from all three
example config files.

Fix _benchmark_quality_score: the previous implementation used max()
for correctness signals and then *added* speed bonuses on top, allowing
the score to accumulate past 1.0 before the final clamp. Speed bonuses
were therefore dead weight whenever pass_rate or quality_score was
already ≥ 0.65. Replace with an explicit weighted average: correctness
(pass_rate / quality_score) carries 0.65 and a normalised speed
component carries 0.35. When no correctness signal is available the
speed component carries full weight. Score is always in [0, 1] without
needing a clamp.

Add test_benchmark_quality_score_stays_bounded_and_weighted to lock in
the corrected behaviour: bounded at 1.0, correctness-dominant, speed-only
case non-zero, empty input zero, speed bonus never hurts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 10:45:04 -04:00
welberr a76c7e81f4 Revise architecture/roadmap docs and add LLM evaluation guide
- architecture.md: rewrite to describe the actual running system; remove
  design-phase repo-naming discussion and initial-implementation-sequence
  list; add data-flow diagram, scoring weights table, API status table
- roadmap.md: replace aspirational list with concrete completed/gap/next
  structure; document four confirmed implementation gaps (transcription
  stub, strategy field ignored, fallback_roles unimplemented, benchmark
  quality score additive overflow); prioritise fixes as P0/P1/P2/P3
- docs/local_llm_evaluation.md: new document; role taxonomy (tier 1–3),
  hardware inventory template, candidate model suggestions, three-phase
  evaluation protocol, GenieHive integration steps, results template,
  notes on Qwen3/Mistral/DeepSeek/Ollama embedding path quirks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 09:25:51 -04:00
welberr e36650a017 Add benchmarked route matching and request shaping 2026-04-07 14:45:32 -04:00
welberr b9270df3e8 Initial commit 2026-04-07 13:17:28 -04:00
welsberr dabbebd3ba Initial commit 2026-04-07 13:10:24 -04:00