# GenieHive Architecture

Status: proposed v1 architecture
Drafted: 2026-04-05

## Repo Name

Chosen name: `GenieHive`

Why this name:

- suggestive: "genie" implies generative AI services, "hive" implies a cooperating cluster
- accessible: easy to say, remember, and explain
- whimsical enough to feel like a project name rather than a dry infrastructure label

Tradeoff:

- `GenieHive` is less search-distinct than `Geniewarren` because `hive` is a common product metaphor

## Mission

GenieHive is a local-first control plane for heterogeneous generative AI services running across one or more hosts.

It should:

- register hosts and their available services
- expose a stable client-facing API
- track health, capacity, and observed performance
- support direct model addressing and higher-level role addressing
- route requests to healthy loaded services first
- optionally coordinate loading or swapping when policy allows
- remain practical for a small self-hosted deployment with two hosts

## Non-Goals For V1

Out of scope initially:

- peer-to-peer consensus
- autonomous global model swapping across many nodes
- full WAN zero-trust platform engineering
- image and TTS generation orchestration
- distributed vector database management
- billing or multi-tenant quota accounting

## Architectural Position

GenieHive is not just an OpenAI-compatible gateway. It is a control plane with these layers:

1. Control API
   - authoritative registry
   - routing and scheduling
   - role catalog
   - operator inspection
2. Node Agent
   - host discovery
   - service discovery
   - telemetry reporting
   - optional local process management
3. Provider Adapters
   - OpenAI-compatible chat backends
   - OpenAI-compatible embedding backends
   - transcription backends
   - future adapters for image and speech synthesis
4. Client Facades
   - OpenAI-compatible facade for completions and embeddings
   - operator API for topology, health, and inventory

## Core Concepts

### Host

A physical or virtual machine participating in the cluster.
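The Host concept above maps naturally to a registry record. A minimal sketch, assuming hypothetical field names (`host_id`, `address`, `last_heartbeat`, `services`) that v1 would refine:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """One physical or virtual machine participating in the cluster."""
    host_id: str            # stable identifier chosen at registration
    address: str            # reachable LAN address of the node agent
    last_heartbeat: float   # unix timestamp of the most recent heartbeat
    services: list[str] = field(default_factory=list)  # service ids on this host
```

A record like this is what node registration would create and heartbeats would refresh; the concrete shape is a design choice for the SQLite-backed registry, not a committed schema.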
### Service

A concrete callable capability on a host. Examples:

- chat completion endpoint
- embedding endpoint
- transcription endpoint

### Asset

A model weight, model name, application, or runtime target that a service can serve.

### Role

A reusable task profile that describes how requests should be fulfilled. A role is policy, not a concrete model.

### Route Resolution

Request handling order:

1. If the requested `model` matches a currently loaded and healthy concrete asset or service alias, route directly.
2. Otherwise, if the requested `model` matches a known role, resolve the role to the best eligible service.
3. Otherwise, fail clearly.

## V1 Capability Scope

V1 supports only:

- chat completions
- embeddings
- transcription

## Topology

Recommended initial topology:

- 1 control plane
- 2 node agents
- 1 or more clients
- LAN-first deployment
- API key auth in v1
- VPN or mTLS in v1.5

## API Families

### Client API

- `GET /v1/models`
- `POST /v1/chat/completions`
- `POST /v1/embeddings`
- `POST /v1/audio/transcriptions`

`GET /v1/models` should expose enough metadata for programmatic clients to decide cheaply which work GenieHive can handle, especially lower-complexity offloaded work. That metadata should include direct assets, service-backed aliases, role aliases, operation kind, health, loaded status, and observed performance hints.

### Operator API

- `GET /v1/cluster/hosts`
- `GET /v1/cluster/services`
- `GET /v1/cluster/roles`
- `GET /v1/cluster/health`
- `GET /v1/cluster/routes/resolve?model=...`

### Node API

- `POST /v1/nodes/register`
- `POST /v1/nodes/heartbeat`
- `GET /v1/node/inventory`
- `POST /v1/node/services/refresh`

## Data Store

V1 should use SQLite for durable state.
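A SQLite-backed registry for the core concepts above could start with two tables, hosts and services. A minimal sketch using the standard-library `sqlite3` module; the table and column names are illustrative assumptions, not a committed schema:

```python
import sqlite3

def init_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Create the v1 registry tables if they do not already exist."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS hosts (
            host_id        TEXT PRIMARY KEY,
            address        TEXT NOT NULL,
            last_heartbeat REAL
        );
        CREATE TABLE IF NOT EXISTS services (
            service_id TEXT PRIMARY KEY,
            host_id    TEXT NOT NULL REFERENCES hosts(host_id),
            op_kind    TEXT NOT NULL,  -- 'chat' | 'embedding' | 'transcription'
            alias      TEXT NOT NULL,  -- client-facing model alias
            healthy    INTEGER NOT NULL DEFAULT 0,
            loaded     INTEGER NOT NULL DEFAULT 0
        );
    """)
    return conn
```

Assets and roles would get their own tables once role-based resolution lands; starting with hosts and services keeps step 2 of the implementation sequence small.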
## Routing Rules

### Direct Model Resolution

If a request names a concrete asset alias or service alias:

- prefer loaded and healthy services
- choose the lowest-cost healthy target if multiple matches exist
- fail clearly if all matches are unhealthy

### Role Resolution

If direct resolution fails, treat the requested name as a role.

Role resolution should filter by:

- operation kind
- modality
- health
- auth and exposure compatibility
- minimum context or memory requirements
- preferred model families

Then rank by:

- already loaded
- recent health
- expected latency
- queue pressure
- operator priority

## First Implementation Sequence

1. Create the repo skeleton and docs.
2. Implement SQLite-backed registry models.
3. Implement node registration and heartbeat.
4. Implement operator inspection endpoints.
5. Implement client-facing chat routing.
6. Add embeddings routing.
7. Add transcription routing.
8. Add truthful readiness and health reporting.
9. Add role catalog and role-based resolution.
10. Add optional managed local runtime support.
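The routing rules above can be sketched as a single resolution function: try direct alias matching, fall back to role resolution, otherwise fail clearly. This is a simplified sketch; the dict keys (`alias`, `op_kind`, `healthy`, `loaded`, `cost`, `latency_ms`) are assumed names, and only two of the filter and ranking criteria are shown:

```python
def resolve_route(model: str, services: list[dict], roles: dict) -> dict:
    """Return the service that should handle a request for `model`."""
    # Step 1: direct resolution -- match a loaded, healthy service alias.
    direct = [s for s in services
              if s["alias"] == model and s["healthy"] and s["loaded"]]
    if direct:
        # Prefer the lowest-cost healthy target when several match.
        return min(direct, key=lambda s: s.get("cost", 0))

    # Step 2: role resolution -- treat the name as a role and pick
    # the best eligible service (here filtered only by op kind and health).
    role = roles.get(model)
    if role:
        eligible = [s for s in services
                    if s["op_kind"] == role["op_kind"] and s["healthy"]]
        if eligible:
            # Rank: already-loaded services first, then lower expected latency.
            return min(eligible,
                       key=lambda s: (not s["loaded"], s.get("latency_ms", 0)))

    # Step 3: fail clearly.
    raise LookupError(f"no route for model or role {model!r}")
```

The full implementation would add the remaining filters (modality, auth and exposure, context and memory requirements, model families) and ranking signals (recent health, queue pressure, operator priority), but the three-step order stays the same.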