# GenieHive Architecture

Status: proposed v1 architecture
Drafted: 2026-04-05

## Repo Name

Chosen name: `GenieHive`

Why this name:

- suggestive: "genie" implies generative AI services, "hive" implies a cooperating cluster
- accessible: easy to say, remember, and explain
- whimsical: reads as a project name rather than a dry infrastructure label

Tradeoff:

- `GenieHive` is less search-distinct than `Geniewarren` because `hive` is a common product metaphor

## Mission

GenieHive is a local-first control plane for heterogeneous generative AI services running across one or more hosts.

It should:

- register hosts and their available services
- expose a stable client-facing API
- track health, capacity, and observed performance
- support direct model addressing and higher-level role addressing
- route requests to healthy, loaded services first
- optionally coordinate loading or swapping when policy allows
- remain practical for a small self-hosted deployment with two hosts

## Non-Goals For V1

Out of scope initially:

- peer-to-peer consensus
- autonomous global model swapping across many nodes
- full WAN zero-trust platform engineering
- image and TTS generation orchestration
- distributed vector database management
- billing or multi-tenant quota accounting

## Architectural Position

GenieHive is not just an OpenAI-compatible gateway.

It is a control plane with these layers:

1. Control API
   - authoritative registry
   - routing and scheduling
   - role catalog
   - operator inspection

2. Node Agent
   - host discovery
   - service discovery
   - telemetry reporting
   - optional local process management

3. Provider Adapters
   - OpenAI-compatible chat backends
   - OpenAI-compatible embedding backends
   - transcription backends
   - future adapters for image and speech synthesis

4. Client Facades
   - OpenAI-compatible facade for completions and embeddings
   - operator API for topology, health, and inventory

## Core Concepts

### Host

A physical or virtual machine participating in the cluster.

### Service

A concrete callable capability on a host. Examples:

- chat completion endpoint
- embedding endpoint
- transcription endpoint

### Asset

A model weight, model name, application, or runtime target that a service can serve.

### Role

A reusable task profile that describes how requests should be fulfilled. A role is policy, not a concrete model.

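The four concepts above could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: every class and field name (`host_id`, `address`, `preferred_families`, and so on) is an assumption made for illustration. The one thing it tries to show faithfully is that a `Role` carries policy only and never points at a specific service or asset.

```python
from dataclasses import dataclass, field
from enum import Enum


class OperationKind(Enum):
    """The three operation kinds in V1 scope."""
    CHAT = "chat"
    EMBEDDING = "embedding"
    TRANSCRIPTION = "transcription"


@dataclass(frozen=True)
class Host:
    host_id: str
    address: str  # hypothetical: how the control plane reaches the node agent


@dataclass(frozen=True)
class Asset:
    name: str  # model weight, model name, application, or runtime target


@dataclass
class Service:
    service_id: str
    host_id: str  # the host this capability runs on
    kind: OperationKind
    assets: list[Asset] = field(default_factory=list)
    healthy: bool = False
    loaded: bool = False


@dataclass(frozen=True)
class Role:
    """Policy only: describes what is wanted, never a concrete model."""
    name: str
    kind: OperationKind
    preferred_families: tuple[str, ...] = ()  # hypothetical policy knob


# Illustrative values only.
host = Host("host-a", "node-a.lan")
svc = Service("svc-1", host.host_id, OperationKind.CHAT,
              [Asset("example-model")], healthy=True, loaded=True)
role = Role("summarizer", OperationKind.CHAT)
```
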
### Route Resolution

Request handling order:

1. If the requested `model` matches a currently loaded and healthy concrete asset or service alias, route directly.
2. Otherwise, if the requested `model` matches a known role, resolve the role to the best eligible service.
3. Otherwise, fail clearly.

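The three-step order above can be sketched as a single function. The shapes of `direct_targets` and `roles`, the `RouteNotFound` exception, and the choice to raise when a known role has no eligible service are all assumptions for illustration. The document only specifies the order of the checks and that failure should be clear.

```python
class RouteNotFound(LookupError):
    """Raised when a name is neither a usable direct target nor a resolvable role."""


def resolve_route(model, direct_targets, roles):
    """Apply the order: direct match, then role, then a clear failure.

    direct_targets: dict mapping alias -> target id, containing only
        loaded and healthy targets (hypothetical shape).
    roles: dict mapping role name -> zero-argument callable that returns
        the best eligible target id, or None (hypothetical shape).
    """
    # Step 1: a loaded, healthy asset or service alias wins outright.
    if model in direct_targets:
        return ("direct", direct_targets[model])

    # Step 2: fall back to role-based resolution.
    if model in roles:
        target = roles[model]()
        if target is not None:
            return ("role", target)
        raise RouteNotFound(f"role {model!r} has no eligible service")

    # Step 3: fail clearly instead of guessing.
    raise RouteNotFound(f"unknown model or role: {model!r}")
```

Returning the resolution path (`"direct"` or `"role"`) alongside the target is one way to make routing decisions easy to inspect from an operator endpoint.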
## V1 Capability Scope

V1 supports only:

- chat completions
- embeddings
- transcription

## Topology

Recommended initial topology:

- 1 control plane
- 2 node agents
- 1 or more clients
- LAN-first deployment
- API key auth in v1
- VPN or mTLS in v1.5

## API Families

### Client API

- `GET /v1/models`
- `POST /v1/chat/completions`
- `POST /v1/embeddings`
- `POST /v1/audio/transcriptions`

`GET /v1/models` should expose enough metadata for programmatic clients to decide what work GenieHive can handle cheaply, especially lower-complexity offloaded work. That metadata should include direct assets, service-backed aliases, role aliases, operation kind, health, loaded status, and observed performance hints.

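As an illustration of the metadata described above, here is one possible response shape and a client-side filter over it. The outer `object`/`data`/`id` envelope follows the familiar OpenAI-style model list. The `geniehive` key, every field inside it, and all values are hypothetical placeholders, not a defined wire format.

```python
# Hypothetical response body for GET /v1/models; values are illustrative only.
models_response = {
    "object": "list",
    "data": [
        {
            "id": "example-chat-model",
            "object": "model",
            "geniehive": {  # hypothetical extension block
                "alias_type": "asset",
                "operation": "chat",
                "healthy": True,
                "loaded": True,
                "observed_latency_ms": 850,
            },
        },
        {
            "id": "summarizer",
            "object": "model",
            "geniehive": {
                "alias_type": "role",
                "operation": "chat",
                "healthy": True,
                "loaded": False,
                "observed_latency_ms": None,
            },
        },
        {
            "id": "example-embedder",
            "object": "model",
            "geniehive": {
                "alias_type": "service",
                "operation": "embedding",
                "healthy": False,
                "loaded": False,
                "observed_latency_ms": None,
            },
        },
    ],
}


def ready_models(response, operation):
    """Return ids a client could call right away for the given operation."""
    return [
        entry["id"]
        for entry in response["data"]
        if entry["geniehive"]["operation"] == operation
        and entry["geniehive"]["healthy"]
        and entry["geniehive"]["loaded"]
    ]


chat_ready = ready_models(models_response, "chat")
```

Keeping the extra fields under one namespaced key is a design choice that would let stock OpenAI-compatible clients ignore them while GenieHive-aware clients read them.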
### Operator API

- `GET /v1/cluster/hosts`
- `GET /v1/cluster/services`
- `GET /v1/cluster/roles`
- `GET /v1/cluster/health`
- `GET /v1/cluster/routes/resolve?model=...`

### Node API

- `POST /v1/nodes/register`
- `POST /v1/nodes/heartbeat`
- `GET /v1/node/inventory`
- `POST /v1/node/services/refresh`

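The registration and heartbeat endpoints imply a small contract between node agent and control plane. The sketch below shows one way the payloads and a control-plane staleness check might look. The payload fields (`host_id`, `services`, `sent_at`) and the 30-second timeout are assumptions, since the document does not define request bodies or timing.

```python
import time


def build_register_payload(host_id, services):
    """Body a node agent might send to POST /v1/nodes/register (hypothetical fields)."""
    return {"host_id": host_id, "services": list(services)}


def build_heartbeat_payload(host_id, now=None):
    """Body a node agent might send to POST /v1/nodes/heartbeat (hypothetical fields)."""
    return {"host_id": host_id, "sent_at": time.time() if now is None else now}


def is_stale(last_heartbeat_at, now, timeout_s=30.0):
    """Control-plane side: a node counts as stale once heartbeats stop arriving.

    The 30-second default is an arbitrary illustrative value.
    """
    return (now - last_heartbeat_at) > timeout_s
```

Passing `now` explicitly keeps the staleness logic deterministic and easy to test without real clocks.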
## Data Store

V1 should use SQLite for durable state.

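A minimal sketch of what a SQLite-backed registry could look like, using Python's standard `sqlite3` module. The table and column names are assumptions for illustration. Only the choice of SQLite and the three operation kinds come from this document.

```python
import sqlite3

# Hypothetical schema: two tables covering hosts and the services they expose.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hosts (
    host_id TEXT PRIMARY KEY,
    address TEXT NOT NULL,
    last_heartbeat_at REAL
);
CREATE TABLE IF NOT EXISTS services (
    service_id TEXT PRIMARY KEY,
    host_id TEXT NOT NULL REFERENCES hosts(host_id),
    kind TEXT NOT NULL CHECK (kind IN ('chat', 'embedding', 'transcription')),
    healthy INTEGER NOT NULL DEFAULT 0,
    loaded INTEGER NOT NULL DEFAULT 0
);
"""


def open_registry(path=":memory:"):
    """Open (or create) the registry database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


conn = open_registry()
conn.execute("INSERT INTO hosts (host_id, address) VALUES (?, ?)",
             ("host-a", "node-a.lan"))
conn.execute(
    "INSERT INTO services (service_id, host_id, kind, healthy, loaded) "
    "VALUES (?, ?, ?, 1, 1)",
    ("svc-1", "host-a", "chat"),
)
routable = conn.execute(
    "SELECT service_id FROM services WHERE healthy = 1 AND loaded = 1"
).fetchall()
```

The `CHECK` constraint on `kind` mirrors the V1 capability scope, so an out-of-scope operation kind is rejected at the storage layer.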
## Routing Rules

### Direct Model Resolution

If a request names a concrete asset alias or service alias:

- prefer loaded and healthy services
- choose the lowest-cost healthy target if multiple matches exist
- fail clearly if all matches are unhealthy

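The direct-resolution rules above can be expressed as a filter followed by a `min` over a sort key. This is a sketch under assumptions: `Candidate`, `NoHealthyTarget`, and the scalar `cost` field are hypothetical (the document does not define how cost is measured), and ordering "loaded" ahead of "lowest cost" is one reading of how the first two rules combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    service_id: str
    healthy: bool
    loaded: bool
    cost: float  # hypothetical scalar; lower is cheaper


class NoHealthyTarget(RuntimeError):
    """Raised when every service matching the alias is unhealthy."""


def pick_direct(candidates):
    """Pick a target among services that match a concrete alias."""
    healthy = [c for c in candidates if c.healthy]
    if not healthy:
        raise NoHealthyTarget("all matching services are unhealthy")
    # False sorts before True, so loaded services come first; cost breaks ties.
    return min(healthy, key=lambda c: (not c.loaded, c.cost))
```
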
### Role Resolution

If direct resolution fails, treat the requested name as a role.

Role resolution should filter by:

- operation kind
- modality
- health
- auth and exposure compatibility
- minimum context or memory requirements
- preferred model families

Then rank by:

- already loaded
- recent health
- expected latency
- queue pressure
- operator priority

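The filter-then-rank procedure above might look like the following. This sketch covers only a subset of the listed criteria (operation kind, health, and minimum context for filtering; loaded state, latency, queue depth, and operator priority for ranking) and omits modality, auth and exposure, preferred families, and recent health. All type and field names are hypothetical, and treating the ranking list as a strict lexicographic order is an assumption; a weighted score would be an equally valid reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceView:
    service_id: str
    kind: str
    healthy: bool
    loaded: bool
    context_tokens: int
    expected_latency_ms: float
    queue_depth: int
    operator_priority: int  # assumption: higher means more preferred


@dataclass(frozen=True)
class RoleSpec:
    kind: str
    min_context_tokens: int = 0


def eligible(role, services):
    """Filter stage: drop services that cannot satisfy the role at all."""
    return [
        s for s in services
        if s.kind == role.kind
        and s.healthy
        and s.context_tokens >= role.min_context_tokens
    ]


def rank(services):
    """Rank stage: loaded first, then lower latency, shorter queue, higher priority."""
    return sorted(
        services,
        key=lambda s: (not s.loaded, s.expected_latency_ms,
                       s.queue_depth, -s.operator_priority),
    )


def resolve_role(role, services):
    """Return the best eligible service for the role, or None if there is none."""
    ranked = rank(eligible(role, services))
    return ranked[0] if ranked else None
```

Separating the two stages keeps hard requirements (filters) from preferences (ranking), so a slow but eligible service is still chosen when nothing better exists.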
## First Implementation Sequence

1. Create the repo skeleton and docs.
2. Implement SQLite-backed registry models.
3. Implement node registration and heartbeat.
4. Implement operator inspection endpoints.
5. Implement client-facing chat routing.
6. Add embeddings routing.
7. Add transcription routing.
8. Add truthful readiness and health reporting.
9. Add role catalog and role-based resolution.
10. Add optional managed local runtime support.