# GenieHive Roadmap
Last updated: 2026-04-27
## What Is Complete
The v1 core is implemented and tested.
Registry and cluster control:
- SQLite-backed registry with hosts, services, roles, and benchmark samples
- Node registration and heartbeat protocol with auto-re-registration on 404
- Role catalog loading from YAML
- Route resolution: direct asset/service match → role resolution → clear failure
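
A rough sketch of that resolution order; the `Route` type and the registry helpers used here are hypothetical names for illustration, not the actual `registry.py` API:

```python
# Hypothetical sketch of the documented resolution order: direct asset/service
# match first, then role resolution, then a clear failure. Helper names and
# the Route type are illustrative, not the real registry.py interface.
from dataclasses import dataclass


@dataclass
class Route:
    service_id: str
    asset_id: str


class RouteNotFound(Exception):
    """Raised when neither a direct match nor a role resolution succeeds."""


def resolve(target: str, registry) -> Route:
    # 1. Direct asset/service match (e.g. an exact model name).
    direct = registry.find_asset(target)                 # hypothetical helper
    if direct is not None:
        return Route(direct.service_id, direct.asset_id)

    # 2. Role resolution through the role catalog.
    role = registry.find_role(target)                    # hypothetical helper
    if role is not None:
        for candidate in role.candidates:
            if registry.service_is_healthy(candidate.service_id):
                return Route(candidate.service_id, candidate.asset_id)

    # 3. Clear failure instead of a silent fallback.
    raise RouteNotFound(f"no service or role matches {target!r}")
```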
Client-facing API:
- `GET /v1/models` — OpenAI-compatible model list with rich metadata (loaded state, latency hints, offload classification, role aliases)
- `POST /v1/chat/completions` — proxies to upstream with request policy application
- `POST /v1/embeddings` — proxies to upstream
Request policy system:
- Body defaults and overrides via deep merge
- System prompt injection (prepend / append / replace)
- Per-asset and per-role policies, merged with role winning on prompts
- Qwen3 / Qwen3.5 auto-detection with `enable_thinking: false` applied automatically
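
As an illustration of the merge and injection semantics listed above, a minimal sketch; the function names and policy shape are assumptions, not the project's actual policy module:

```python
# Minimal sketch of body defaults via deep merge plus system prompt injection.
# Names and structure are illustrative assumptions, not the project's code.
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def inject_system_prompt(body: dict, text: str, mode: str = "prepend") -> dict:
    """Apply a system prompt as prepend / append / replace on the message list."""
    messages = deepcopy(body.get("messages", []))
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    if mode == "replace" or not system:
        system = [{"role": "system", "content": text}]
    elif mode == "prepend":
        system[0]["content"] = text + "\n" + system[0]["content"]
    else:  # append
        system[-1]["content"] = system[-1]["content"] + "\n" + text
    return {**body, "messages": system + rest}


# Asset defaults first, then the client body (client wins), then the role's
# prompt policy on top, mirroring "role winning on prompts" above.
body = {"model": "qwen3", "messages": [{"role": "user", "content": "hi"}]}
body = deep_merge({"temperature": 0.2, "enable_thinking": False}, body)
body = inject_system_prompt(body, "Answer concisely.", mode="prepend")
```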
Route matching and scoring:
- `POST /v1/cluster/routes/match` — scored candidate list for role and service targets
- Signals: text overlap, preferred family, runtime (loaded state, latency, throughput, queue depth), benchmark (workload overlap, quality score)
- `GET /v1/cluster/routes/resolve` — quick single-model resolution
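
A hedged sketch of how the listed signal groups could combine into a single candidate score; the weights, normalization constants, and field names are assumptions rather than the real scorer:

```python
# Illustrative combination of the documented signal groups (text overlap,
# preferred family, runtime, benchmark). Weights and fields are assumptions.
def score_candidate(c: dict, query: str, preferred_family: str | None) -> float:
    text = _text_overlap(query, c["name"])                                  # 0..1
    family = 1.0 if preferred_family and c["family"] == preferred_family else 0.0
    runtime = (
        (0.4 if c["loaded"] else 0.0)
        + 0.3 * _normalize(c["tokens_per_sec"], high=100.0)
        + 0.2 * (1.0 - _normalize(c["latency_ms"], high=2000.0))
        + 0.1 * (1.0 - _normalize(c["queue_depth"], high=10.0))
    )
    benchmark = c.get("benchmark_quality", 0.0)                             # 0..1
    return 0.25 * text + 0.15 * family + 0.35 * runtime + 0.25 * benchmark


def _normalize(value: float, high: float) -> float:
    return max(0.0, min(1.0, value / high))


def _text_overlap(query: str, name: str) -> float:
    q, n = set(query.lower().split()), set(name.lower().split())
    return len(q & n) / len(q) if q else 0.0
```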
Benchmark infrastructure:
- Built-in workloads: `chat.short_reasoning`, `chat.concise_support`
- `run_benchmark_workload.py` executes workloads and emits a JSON report
- `ingest_benchmark_report.py` posts results to the control plane
- Benchmark samples feed the route scoring pipeline
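
Roughly what the ingest step does, as a hedged sketch: the report fields below are drawn from the signals mentioned elsewhere in this roadmap, and the control-plane URL and ingest path are assumptions; see `ingest_benchmark_report.py` for the real format and endpoint.

```python
# Hedged sketch of posting a benchmark report to the control plane. The report
# shape and ingest URL are assumptions for illustration only.
import json
import urllib.request

report = {
    "workload": "chat.short_reasoning",
    "service_id": "node-a/llamacpp-qwen3",      # hypothetical service id
    "pass_rate": 0.9,
    "tokens_per_sec": 42.0,
    "ttft_ms": 180.0,
}

req = urllib.request.Request(
    "http://control-plane:8080/v1/cluster/benchmarks",   # assumed ingest URL
    data=json.dumps(report).encode(),
    headers={"Content-Type": "application/json", "X-Api-Key": "dev-key"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```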
Operator inspection:
- `GET /v1/cluster/hosts`, `/services`, `/roles`, `/benchmarks`, `/health`
Auth:
- Client API key (`X-Api-Key`) and node registration key (`X-GenieHive-Node-Key`)
- Empty key lists disable auth for development
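
A minimal sketch of the empty-list-disables-auth behavior, assuming a hypothetical `check_client_key` helper (the real key check may differ):

```python
# Minimal sketch of the "empty key list disables auth" rule described above.
# Function name and configuration shape are assumptions.
def check_client_key(configured_keys: list[str], header_value: str | None) -> bool:
    """Return True if the request is allowed.

    An empty configured key list disables auth (development mode); otherwise
    the X-Api-Key header must match one of the configured keys exactly.
    """
    if not configured_keys:
        return True
    return header_value is not None and header_value in configured_keys


assert check_client_key([], None)                   # dev mode: auth disabled
assert check_client_key(["secret"], "secret")       # matching X-Api-Key
assert not check_client_key(["secret"], "wrong")    # rejected
```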
Tests:
- Registry, chat proxy, node inventory, benchmark runner, full demo flow
- All passing
## Known Gaps and Issues
These are confirmed gaps in the current implementation, not aspirational items.
### 1. Transcription endpoint not implemented

`POST /v1/audio/transcriptions` is listed in the architecture and wired into `main.py`, but there is no upstream proxy handler for it. `upstream.py` has no `transcriptions()` method. The endpoint currently returns nothing useful.
### 2. Routing strategy field is ignored

`RoutingConfig.default_strategy` exists in `config.py` (default: `"loaded_first"`), but `resolve_route()` in `registry.py` does not read it. There is effectively only one strategy. The field is misleading.
### 3. Role fallback chain is not implemented

`RoutingPolicy.fallback_roles` is defined in `models.py` and appears in the schema docs, but `resolve_route()` never consults it. A role that fails to match any service fails outright rather than trying its fallbacks.
### 4. `_benchmark_quality_score` can exceed 1.0 before clamping

`pass_rate` and `quality_score` are taken as a `max()`, then `tokens_per_sec` and `ttft_ms` bonuses are added on top. A service with `pass_rate=1.0`, fast tokens, and low TTFT accumulates a score of up to 1.6 before the final `min(1.0, quality)` clamp. This means the additive bonuses have no effect once `pass_rate` or `quality_score` is already high, which is probably not the intended behavior.
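
For concreteness, a simplified numeric model of that accumulation (not the literal `_benchmark_quality_score` code; the bonus magnitudes are assumptions chosen to match the ~1.6 ceiling described above):

```python
# Simplified model of the accumulation described above; the 0.3 bonus sizes
# are assumptions so the total matches the ~1.6 ceiling noted in the text.
pass_rate, quality_score = 1.0, 0.8
tokens_bonus, ttft_bonus = 0.3, 0.3

quality = max(pass_rate, quality_score)   # 1.0
quality += tokens_bonus + ttft_bonus      # 1.6 before clamping
quality = min(1.0, quality)               # clamp: the speed bonuses are erased
print(quality)                            # 1.0 for fast and slow services alike
```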
### 5. Health is self-reported only
Service health (healthy / unhealthy) comes entirely from node-reported state.
The control plane does not probe upstream endpoints. A service can appear healthy
while its endpoint is unreachable.
### 6. No active model discovery from upstream services

The node agent scans for `.gguf` files on disk and reads static service config. It does not query running Ollama or vLLM instances for their loaded model lists. A freshly pulled Ollama model will not appear until the node config is updated and the agent restarted.
### 7. `docs/architecture.md` duplicates `GENIEWARREN_SPEC.md`

`architecture.md` contains the repo-naming rationale, name alternatives, and implementation-sequence list that are only meaningful in a design/proposal context. These are noise in a reference architecture document.
## Immediate Next Work (Priority Order)
### P0 — Fix confirmed bugs
- Remove the misleading `default_strategy` field, or implement a dispatch table so the config field actually selects behavior. Simplest fix: delete the field and the dead config surface until a second strategy is implemented.
- Fix `_benchmark_quality_score` so the additive bonuses apply only when no `pass_rate`/`quality_score` is available, or restructure it as a weighted average so the components don't stack additively (see the sketch below).
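
One possible shape for the weighted-average option, assuming normalized speed inputs; the weights are illustrative, not a decided design:

```python
# Sketch of a weighted-average restructuring so runtime bonuses can never push
# the score past the quality signal's ceiling. Weights are assumptions.
def benchmark_quality_score(
    pass_rate: float | None,
    quality_score: float | None,
    tokens_norm: float,   # tokens_per_sec normalized to 0..1
    ttft_norm: float,     # TTFT normalized to 0..1 (1.0 = lowest / best)
) -> float:
    quality_signals = [s for s in (pass_rate, quality_score) if s is not None]
    if not quality_signals:
        # No quality data: fall back to runtime signals alone.
        return 0.5 * tokens_norm + 0.5 * ttft_norm
    quality = max(quality_signals)
    # A weighted average keeps the result in [0, 1] without a clamp, so the
    # speed components still differentiate services with equal pass rates.
    return 0.7 * quality + 0.2 * tokens_norm + 0.1 * ttft_norm
```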
### P1 — Complete stated v1 scope
- Implement transcription proxy — add `upstream.transcriptions()` and wire the handler in `chat.py` and `main.py`.
- Implement role fallback chain — when `resolve_route()` finds no matching service for a role, walk `fallback_roles` in order before failing (see the sketch below).
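
A sketch of that fallback walk, assuming hypothetical registry helpers (`try_resolve_role`, `get_routing_policy`); the point is only the ordered walk over `fallback_roles` with cycle protection:

```python
# Illustrative fallback walk for role resolution. Registry helper names and
# the RoutingPolicy shape are assumptions, not the real registry.py API.
def resolve_with_fallbacks(role_name: str, registry):
    seen: set[str] = set()
    queue = [role_name]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue                                    # guard against fallback cycles
        seen.add(current)
        route = registry.try_resolve_role(current)      # hypothetical helper
        if route is not None:
            return route
        policy = registry.get_routing_policy(current)   # hypothetical helper
        if policy is not None:
            queue.extend(policy.fallback_roles)
    raise LookupError(f"role {role_name!r} and its fallbacks all failed to match a service")
```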
### P2 — Close the most important self-reported-only gaps
- Add active health probing — the control plane should periodically probe registered service endpoints (a lightweight `GET /health` or `GET /v1/models` is sufficient) and update health state independently of node heartbeats.
- Add upstream model discovery for Ollama — query `GET /api/tags` (Ollama) or `GET /v1/models` (OpenAI-compatible) from the node agent and merge the loaded model names into the service's asset list. This enables dynamic model tracking without config restarts (see the sketch below).
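
A sketch of the Ollama discovery step: `GET /api/tags` is Ollama's documented endpoint for locally available models, while the merge helper and service dict shape are assumptions for illustration.

```python
# Sketch of querying Ollama for its model list and merging the names into a
# service's asset list. The service shape and merge helper are assumptions.
import json
import urllib.request


def discover_ollama_models(base_url: str = "http://127.0.0.1:11434") -> list[str]:
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def merge_into_assets(service: dict, discovered: list[str]) -> dict:
    # Union the statically configured asset names with what Ollama reports,
    # so a freshly pulled model shows up on the next heartbeat.
    assets = dict.fromkeys([*service.get("assets", []), *discovered])
    return {**service, "assets": list(assets)}
```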
### P3 — Documentation cleanup
- Revise `architecture.md` — remove the design-phase repo-naming rationale and the first-implementation-sequence list; replace them with a description of the actual running system (the four layers as implemented, plus a data-flow diagram if possible).
- Update `roadmap.md` — this file (done).
## Near-Term Milestones (After P0–P3)
- Live LLM demo — run control + node against a real upstream (Ollama or llama.cpp) and document the end-to-end flow, including chat via role and direct asset addressing
- Validate Codex-friendly `/v1/models` offload — test `GET /v1/models` as a programmatic service catalog for a Claude Code or Codex client selecting a GenieHive-hosted model for lower-complexity subtasks
- Richer node metrics — queue depth, in-flight count, and rolling performance averages reported from node to control on every heartbeat
- Second routing strategy — implement `round_robin` or `least_loaded` as a second selectable strategy, then make `default_strategy` actually dispatch (see the sketch below)
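
A sketch of what that dispatch could look like once a second strategy exists; the strategy implementations and candidate dict shape are assumptions, and only the dispatch-table idea is the point:

```python
# Illustrative dispatch table so default_strategy actually selects behavior.
# Strategy bodies and the candidate fields are assumptions for this sketch.
import itertools
from typing import Callable

_rr_counter = itertools.count()


def loaded_first(candidates: list[dict]) -> dict:
    # Prefer already-loaded services, then the lowest-latency one.
    return sorted(candidates, key=lambda c: (not c["loaded"], c["latency_ms"]))[0]


def round_robin(candidates: list[dict]) -> dict:
    return candidates[next(_rr_counter) % len(candidates)]


def least_loaded(candidates: list[dict]) -> dict:
    return min(candidates, key=lambda c: c["queue_depth"])


STRATEGIES: dict[str, Callable[[list[dict]], dict]] = {
    "loaded_first": loaded_first,
    "round_robin": round_robin,
    "least_loaded": least_loaded,
}


def pick(candidates: list[dict], default_strategy: str) -> dict:
    return STRATEGIES[default_strategy](candidates)
```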
## V1.5 Scope (Not Yet Started)
- mTLS between control plane and node agents
- Scoped client tokens (read-only vs. operator vs. admin)
- Active load-aware model swapping (trigger unload/load on a node based on demand)
- Image and TTS generation adapter stubs
- Streaming response passthrough for chat completions
## Non-Goals (Unchanged from Original Spec)
- Peer-to-peer consensus
- Autonomous global model swapping across many nodes
- Full WAN zero-trust platform
- Distributed vector database management
- Billing or multi-tenant quota accounting