GenieHive Architecture
Status: proposed v1 architecture
Drafted: 2026-04-05
Repo Name
Chosen name: GenieHive
Why this name:
- suggestive: "genie" implies generative AI services, "hive" implies a cooperating cluster
- accessible: easy to say, remember, and explain
- whimsical enough to feel like a project name rather than a dry infrastructure label
Tradeoff:
GenieHive is less search-distinct than Geniewarren because "hive" is a common product metaphor.
Mission
GenieHive is a local-first control plane for heterogeneous generative AI services running across one or more hosts.
It should:
- register hosts and their available services
- expose a stable client-facing API
- track health, capacity, and observed performance
- support direct model addressing and higher-level role addressing
- route requests to healthy loaded services first
- optionally coordinate loading or swapping when policy allows
- remain practical for a small self-hosted deployment with two hosts
Non-Goals For V1
Out of scope initially:
- peer-to-peer consensus
- autonomous global model swapping across many nodes
- full WAN zero-trust platform engineering
- image and TTS generation orchestration
- distributed vector database management
- billing or multi-tenant quota accounting
Architectural Position
GenieHive is not just an OpenAI-compatible gateway.
It is a control plane with these layers:
- Control API
  - authoritative registry
  - routing and scheduling
  - role catalog
  - operator inspection
- Node Agent
  - host discovery
  - service discovery
  - telemetry reporting
  - optional local process management
- Provider Adapters
  - OpenAI-compatible chat backends
  - OpenAI-compatible embedding backends
  - transcription backends
  - future adapters for image and speech synthesis
- Client Facades
  - OpenAI-compatible facade for completions and embeddings
  - operator API for topology, health, and inventory
Core Concepts
Host
A physical or virtual machine participating in the cluster.
Service
A concrete callable capability on a host. Examples:
- chat completion endpoint
- embedding endpoint
- transcription endpoint
Asset
A model weight, model name, application, or runtime target that a service can serve.
Role
A reusable task profile that describes how requests should be fulfilled. A role is policy, not a concrete model.
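Since a role is policy rather than a concrete model, it can be captured as plain data. A minimal sketch of what a role definition might look like; all field names here are illustrative assumptions, not a fixed schema:

```python
# Hypothetical role definition: policy, not a concrete model.
# Field names are illustrative and not part of any finalized schema.
fast_summarizer = {
    "name": "fast-summarizer",
    "operation": "chat",             # operation kind this role fulfills
    "min_context_tokens": 8192,      # minimum context requirement
    "preferred_families": ["llama", "qwen"],
    "max_expected_latency_ms": 1500, # soft latency budget
}
```

Storing roles as data like this keeps the role catalog editable by operators without code changes.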
Route Resolution
Request handling order:
- If the requested model matches a currently loaded and healthy concrete asset or service alias, route directly.
- Otherwise, if the requested model matches a known role, resolve the role to the best eligible service.
- Otherwise, fail clearly.
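The resolution order above can be sketched as a single function. The data shapes (alias-to-service dict, role-to-candidates dict) are assumptions for illustration, not the registry's actual types:

```python
def resolve_route(model, loaded_services, roles):
    """Resolve a requested model name in the order described above.

    loaded_services: dict of alias -> service, containing only
        loaded and healthy services (illustrative shape).
    roles: dict of role name -> list of eligible services, best
        candidate first (illustrative shape).
    """
    # 1. Direct match against a loaded, healthy asset or service alias.
    if model in loaded_services:
        return loaded_services[model]
    # 2. Known role: resolve to the best eligible service.
    if model in roles and roles[model]:
        return roles[model][0]
    # 3. No direct match, no role: fail clearly rather than guess.
    raise LookupError(f"no route for model {model!r}")
```

Failing with an explicit error in the last branch keeps misconfigured clients visible instead of silently falling back to an arbitrary service.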
V1 Capability Scope
V1 supports only:
- chat completions
- embeddings
- transcription
Topology
Recommended initial topology:
- 1 control plane
- 2 node agents
- 1 or more clients
- LAN-first deployment
- API key auth in v1
- VPN or mTLS in v1.5
API Families
Client API
- GET /v1/models
- POST /v1/chat/completions
- POST /v1/embeddings
- POST /v1/audio/transcriptions
GET /v1/models should expose enough metadata for programmatic clients to decide what GenieHive can handle cheaply, especially lower-complexity work they want to offload. That metadata should include direct assets, service-backed aliases, role aliases, operation kind, health, loaded status, and observed performance hints.
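One entry in that response might look like the following. The field names and enum values are assumptions sketched from the metadata list above, not a committed wire format:

```python
# Hypothetical single entry from GET /v1/models, carrying the
# routing metadata described above. Field names are illustrative.
model_entry = {
    "id": "llama3-8b-instruct",
    "kind": "direct_asset",      # direct_asset | service_alias | role_alias
    "operation": "chat",         # chat | embedding | transcription
    "healthy": True,
    "loaded": True,
    "perf_hints": {
        "p50_latency_ms": 420,   # observed, not guaranteed
        "tokens_per_second": 55,
    },
}
```

A client can use `loaded` and `perf_hints` to decide whether a given request is worth sending here at all, without a round trip per request.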
Operator API
- GET /v1/cluster/hosts
- GET /v1/cluster/services
- GET /v1/cluster/roles
- GET /v1/cluster/health
- GET /v1/cluster/routes/resolve?model=...
Node API
- POST /v1/nodes/register
- POST /v1/nodes/heartbeat
- GET /v1/node/inventory
- POST /v1/node/services/refresh
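A heartbeat needs to carry enough telemetry for the control plane to track health, capacity, and loaded status. A sketch of a POST /v1/nodes/heartbeat payload, with all field names assumed for illustration:

```python
# Hypothetical heartbeat payload; fields are illustrative, not a
# finalized contract between node agent and control plane.
heartbeat = {
    "host_id": "node-01",
    "services": [
        {
            "id": "svc-chat-1",
            "healthy": True,
            "loaded": True,
            "queue_depth": 0,        # current queue pressure
        },
    ],
    "gpu_mem_free_mb": 9216,         # capacity telemetry
}
```

Keeping queue depth and free memory in the heartbeat lets the router rank targets without a separate telemetry channel in v1.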
Data Store
V1 should use SQLite for durable state.
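The registry tables fall out of the core concepts above (hosts, services, roles). A minimal sketch using Python's sqlite3 module; the table and column names are assumptions, not a finalized schema:

```python
import sqlite3

# Minimal sketch of a SQLite-backed registry. Table and column
# names are illustrative assumptions, not a finalized schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (
    id TEXT PRIMARY KEY,
    address TEXT NOT NULL,
    last_heartbeat REAL
);
CREATE TABLE services (
    id TEXT PRIMARY KEY,
    host_id TEXT NOT NULL REFERENCES hosts(id),
    operation TEXT NOT NULL,          -- chat | embedding | transcription
    healthy INTEGER NOT NULL DEFAULT 0,
    loaded INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE roles (
    name TEXT PRIMARY KEY,
    operation TEXT NOT NULL,
    policy_json TEXT NOT NULL         -- serialized role policy
);
""")
conn.execute("INSERT INTO hosts (id, address) VALUES (?, ?)",
             ("node-01", "10.0.0.5"))
conn.commit()
```

SQLite fits the two-host v1 scope: a single durable file, no extra service to operate, and transactions are sufficient for a single control plane writer.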
Routing Rules
Direct Model Resolution
If a request names a concrete asset alias or service alias:
- prefer loaded and healthy services
- choose the lowest-cost healthy target if multiple matches exist
- fail clearly if all matches are unhealthy
Role Resolution
If direct resolution fails, treat the requested name as a role.
Role resolution should filter by:
- operation kind
- modality
- health
- auth and exposure compatibility
- minimum context or memory requirements
- preferred model families
Then rank by:
- already loaded
- recent health
- expected latency
- queue pressure
- operator priority
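The filter-then-rank shape above translates directly into code. A sketch under assumed data shapes (services and roles as flat dicts; a subset of the filter and rank criteria shown):

```python
def resolve_role(role, services):
    """Filter candidate services for a role, then rank the survivors.

    role and each service are dicts with illustrative field names;
    only a representative subset of the criteria above is shown.
    """
    # Filter: drop services that cannot fulfill the role at all.
    eligible = [
        s for s in services
        if s["operation"] == role["operation"]
        and s["healthy"]
        and s["context_tokens"] >= role.get("min_context_tokens", 0)
    ]
    # Rank: already loaded first, then lower expected latency,
    # lower queue pressure, and higher operator priority.
    eligible.sort(key=lambda s: (
        not s["loaded"],
        s["expected_latency_ms"],
        s["queue_depth"],
        -s["priority"],
    ))
    return eligible[0] if eligible else None
```

Ranking "already loaded" first means role resolution never forces a model swap on its own; swapping stays an explicit, policy-gated action.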
First Implementation Sequence
- Create the repo skeleton and docs.
- Implement SQLite-backed registry models.
- Implement node registration and heartbeat.
- Implement operator inspection endpoints.
- Implement client-facing chat routing.
- Add embeddings routing.
- Add transcription routing.
- Add truthful readiness and health reporting.
- Add role catalog and role-based resolution.
- Add optional managed local runtime support.