GenieHive is a generative AI router. It presents an OpenAI API-compatible endpoint to clients and routes their requests among one or more nodes that register their running servers with the control host. From running multiple LLMs on a single host to distributing them across a cluster, GenieHive aims to make local AI easier to actually use.
Control plane:
- fallback_roles chain in resolve_route() with cycle protection
- round_robin and least_loaded routing strategies; default_strategy dispatches all three
- Streaming chat completions: async generator, eager route resolution, SSE reasoning-strip
- POST /v1/audio/transcriptions proxy (multipart, dedicated httpx path)
- ServiceProber background task: probes /health, falls back to /v1/models for vLLM
- ServiceObserved gains loaded_model_count and vram_used_bytes
- _runtime_signals exposes loaded_model_count to route scoring

Node agent:
- discover_protocol: "ollama"|"openai"|null per-service config field
- discovery.py: discover_ollama_assets (loaded: False), _get_ollama_ps_models helper, query_ollama_ps, discover_openai_models, enrich_service_assets (two-phase Ollama, corrects stale loaded state, populates observed metrics from /api/ps)
- Heartbeat zips service dicts with config to pass protocol; allocates discovery client only when needed

Tests: 47 passing (up from 19)

Role catalogs (example configs):
- roles.surgical-team.example.yaml — Brooks/Mills surgical team (surg_ prefix, 9 roles)
- roles.belbin.example.yaml — Belbin team roles (belbin_ prefix, 9 roles)
- roles.sixhats.example.yaml — De Bono Six Thinking Hats (sixhats_ prefix, 6 roles)
- roles.disney.example.yaml — Disney creative strategy (disney_ prefix, 3 roles)
- roles.xp.example.yaml — XP team roles (xp_ prefix, 5 roles)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GenieHive
GenieHive is a local-first control plane for heterogeneous generative AI services running across one or more hosts.
V1 scope:
- chat completions
- embeddings
- transcription
Core goals:
- register hosts and services
- track health, inventory, and observed performance
- expose a stable client-facing API
- support direct model addressing and higher-level role addressing
- route requests to healthy loaded services first
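As a concrete illustration of direct model addressing versus role addressing, a request to the client-facing API is an ordinary OpenAI-style chat completions payload whose `model` field may carry either a model id or a role name. The helper below is a hypothetical sketch; `general_assistant` is the role name used in the benchmark examples later in this README.

```python
import json

def chat_request(model_or_role: str, prompt: str) -> dict:
    """Build a JSON body for POST /v1/chat/completions.

    The `model` field may name a model directly, or a GenieHive role
    (e.g. "general_assistant") that the control plane resolves to a
    healthy, loaded service. Hypothetical helper for illustration; any
    OpenAI-compatible client produces the same request shape.
    """
    return {
        "model": model_or_role,
        "messages": [{"role": "user", "content": prompt}],
    }

body = chat_request("general_assistant", "Summarize the routing design.")
print(json.dumps(body, indent=2))
```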
Repository layout:
- docs/architecture.md: system overview and v1 scope
- docs/roadmap.md: current milestones and near-term priorities
- docs/schemas.md: canonical data models
- docs/deployment.md: intended deployment approach
- docs/demo.md: first end-to-end control-plus-node demo flow
- docs/llm_demo.md: detailed master/peer/client LLM demo runbook
- docs/reverse_proxy.md: safer external exposure patterns
- configs/: example control-plane, node, and role configs
- scripts/: small launch and inspection helpers
- src/geniehive_control/: control-plane package
- src/geniehive_node/: node-agent package
There is now a documented single-machine path as well as the cluster-oriented path, so GenieHive can be exercised as a useful local router even without multiple hosts.
This repository is intended as the clean successor to narrower local gateway experiments. OpenAI-compatible routing remains important, but it is treated as one client facade within a broader cluster control-plane design.
Development
Local development setup:
cd /home/netuser/bin/geniehive
python -m venv .venv
. .venv/bin/activate
pip install -e '.[dev]'
Common commands:
make test
make smoke
make health
Benchmark workflow:
PYTHONPATH=src python scripts/run_benchmark_workload.py \
--base-url http://127.0.0.1:8800 \
--api-key change-me-client-key \
--model general_assistant \
--workload chat.short_reasoning \
--output /tmp/geniehive-bench.json
PYTHONPATH=src python scripts/ingest_benchmark_report.py /tmp/geniehive-bench.json \
--base-url http://127.0.0.1:8800 \
--api-key change-me-client-key
Repository conventions:
- local runtime state lives under state/ and should not be committed
- example configs under configs/ should remain runnable
- operator scripts under scripts/ are part of the supported workflow