# RoleMesh Gateway
RoleMesh Gateway is a lightweight **OpenAI-compatible** API gateway for routing chat-completions requests to multiple
locally hosted LLM backends (e.g., `llama.cpp`'s `llama-server`) **by role** (planner, writer, coder, reviewer, …).
It is designed for **agentic workflows** that benefit from using different models for different steps, and for
deployments where **different machines host different models** (e.g., a GPU box for fast inference, a big-RAM CPU box for large models).
## What you get
- OpenAI-compatible endpoints:
- `GET /v1/models`
- `POST /v1/chat/completions` (streaming and non-streaming)
- `GET /health` and `GET /ready`
- Model registry from `configs/models.yaml`
- Optional **node registration** so remote machines can announce role backends to the gateway
- Robust proxying with **explicit httpx timeouts** (no “hang forever”)
- Structured logging with request IDs
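With the gateway running (see the quick start below), the OpenAI-compatible endpoints can be exercised with `curl`. The `writer` model name here is a placeholder; use an id returned by `GET /v1/models`:

```bash
# List the models/roles the gateway knows about
curl -s http://127.0.0.1:8000/v1/models

# Non-streaming chat completion ("writer" is a placeholder role/model id)
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "writer", "messages": [{"role": "user", "content": "Say hello."}]}'
```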
## Quick start (proxy mode)
1. Create a venv and install:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
2. Copy the example config:
```bash
cp configs/models.example.yaml configs/models.yaml
```
3. Run the gateway:
```bash
ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 0.0.0.0 --port 8000
```
4. Smoke test:
```bash
bash scripts/smoke_test.sh http://127.0.0.1:8000
```
## Multi-host (node registration)
If you want machines to host backends and “register” them dynamically, run a tiny node agent on each backend host
(or just call the registration endpoint from your own tooling).
- Gateway endpoint: `POST /v1/nodes/register`
- Node payload describes which **roles** it serves and the base URL to reach its OpenAI-compatible backend.
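As a sketch, registration from a backend host might look like the following `curl` call. The JSON field names (`roles`, `base_url`) and the address are illustrative assumptions, not the confirmed schema; see `docs/CONFIG.md` for the actual payload:

```bash
# Announce this host's backend for the "coder" role
# (field names are illustrative; see docs/CONFIG.md for the real schema)
curl -s -X POST http://127.0.0.1:8000/v1/nodes/register \
  -H "Content-Type: application/json" \
  -d '{"roles": ["coder"], "base_url": "http://192.168.1.20:8080"}'
```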
See: `docs/DEPLOYMENT.md` and `docs/CONFIG.md`.
## Status
This repository is a **preliminary scaffold**:
- Proxying to OpenAI-compatible upstreams works.
- Registration and load-selection are implemented (basic round-robin), but persistence and auth are TODOs.
## License
MIT. See `LICENSE`.
## Node Agent (per-host)
This repo also includes a **RoleMesh Node Agent** (`rolemesh-node-agent`) that can manage **persistent** `llama.cpp` servers (one per GPU) and report inventory/metrics back to the gateway.
- Sample config: `configs/node_agent.example.yaml`
- Docs: `docs/NODE_AGENT.md`
## Safe-by-default binding
The gateway and node agent default to binding `127.0.0.1` to avoid accidental exposure. If you need remote access, bind only to private (LAN or VPN) interfaces and firewall the ports.
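For example, the uvicorn command from the quick start can be pointed at a specific private address instead of all interfaces (`192.168.1.10` below is a placeholder for your own LAN or VPN address):

```bash
# Loopback only (the safe default)
uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000

# Reachable on one specific private interface only
uvicorn rolemesh_gateway.main:app --host 192.168.1.10 --port 8000
```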