diff --git a/README.md b/README.md
index 8de7464..ae676b0 100644
--- a/README.md
+++ b/README.md
@@ -17,9 +17,11 @@ deployments where **different machines host different models** (e.g., GPU box fo
 - Robust proxying with **explicit httpx timeouts** (no “hang forever”)
 - Structured logging with request IDs
 
-## Quick start (proxy mode)
+## Quick Start
 
-1. Create a venv and install:
+This is the fastest path to a working local setup.
+
+### 1. Install
 
 ```bash
 python -m venv .venv
@@ -27,24 +29,68 @@ source .venv/bin/activate
 pip install -e .
 ```
 
-2. Copy the example config:
+### 2. Start two OpenAI-compatible backends
+
+Any backend that exposes `GET /v1/models` and `POST /v1/chat/completions` will work.
+One practical option is `llamafile` in server mode:
 
 ```bash
-cp configs/models.example.yaml configs/models.yaml
+llamafile --server -m /path/to/planner-model.gguf --host 127.0.0.1 --port 8011 --nobrowser
+llamafile --server -m /path/to/writer-model.gguf --host 127.0.0.1 --port 8012 --nobrowser
 ```
 
-3. Run the gateway:
+### 3. Create a gateway config
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys:
+    - "change-me-client-key"
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+    defaults:
+      temperature: 0
+      max_tokens: 128
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+    defaults:
+      temperature: 0.6
+      max_tokens: 256
+```
+
+Save that as `configs/models.yaml`.
+
+### 4. Run the gateway
 
 ```bash
-ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 0.0.0.0 --port 8000
+ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000
 ```
 
-4. Smoke test:
+### 5. Verify it
 
 ```bash
-bash scripts/smoke_test.sh http://127.0.0.1:8000
+curl -sS http://127.0.0.1:8000/v1/models \
+  -H 'X-Api-Key: change-me-client-key'
 ```
 
+```bash
+curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -H 'X-Api-Key: change-me-client-key' \
+  -d '{
+    "model": "planner",
+    "messages": [{"role":"user","content":"Say hello in 3 words."}]
+  }'
+```
+
+If you prefer the provided example file, copy `configs/models.example.yaml` and adjust the `proxy_url` values.
+
 ## Multi-host (node registration)
 
 If you want machines to host backends and “register” them dynamically, run a tiny node agent on each backend host
@@ -59,7 +105,9 @@ See: `docs/DEPLOYMENT.md` and `docs/CONFIG.md`.
 This repository is a **preliminary scaffold**:
 
 - Proxying to OpenAI-compatible upstreams works.
-- Registration and load-selection are implemented (basic round-robin), but persistence and auth are TODOs.
+- Registration and load-selection are implemented (basic round-robin).
+- API-key auth for clients and nodes is available.
+- Persistence is basic JSON-backed state, not a full service registry.
 
 ## License
 
@@ -72,8 +120,6 @@ This repo also includes a **RoleMesh Node Agent** (`rolemesh-node-agent`) that c
 - Sample config: `configs/node_agent.example.yaml`
 - Docs: `docs/NODE_AGENT.md`
 
-
-
 ## Safe-by-default binding
 
 Gateway and node-agent default to binding on `127.0.0.1` to avoid accidental exposure. Bind only to private/LAN or VPN interfaces and firewall ports if you need remote access.
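The README's scaffold status mentions basic round-robin load-selection for registered nodes. The selection policy amounts to the following sketch (illustrative only, not the gateway's actual implementation):

```python
import itertools


class RoundRobinPool:
    """Cycle through the upstream nodes registered for one role.

    Illustrative sketch only. Note that itertools.cycle snapshots the
    node list, so a real registry that adds or removes nodes at runtime
    would need to rebuild the pool on each change.
    """

    def __init__(self, nodes: list[str]):
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        # Each call returns the next node, wrapping around at the end.
        return next(self._cycle)


pool = RoundRobinPool(["http://10.0.0.11:8011", "http://10.0.0.12:8011"])
print([pool.pick() for _ in range(3)])
# → ['http://10.0.0.11:8011', 'http://10.0.0.12:8011', 'http://10.0.0.11:8011']
```

The host URLs above are placeholders; the gateway would build the pool from whatever nodes registered for the requested role.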
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 2f5dbd3..ac907ae 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -33,8 +33,9 @@ This is deliberately small so you can swap it out later for something stronger:
 
 ## Known limitations (scaffold)
 
-- No auth on registration or inference endpoints
+- Auth is optional and config-driven rather than enforced by default
 - No TTL/health polling
-- Round-robin selection only
+- No automatic config reload
+- Round-robin selection only for discovered nodes
 
 These are tracked in `docs/DEPLOYMENT.md` as next steps.
diff --git a/docs/CONFIG.md b/docs/CONFIG.md
index fd243d9..fa9bedc 100644
--- a/docs/CONFIG.md
+++ b/docs/CONFIG.md
@@ -38,6 +38,12 @@ models:
       temperature: 0.6
 ```
 
+Notes:
+- The model alias (`writer` above) is what the client sends in `model`.
+- `openai_model_name` is what the gateway returns from `GET /v1/models`.
+- `proxy_url` is the actual upstream backend to call.
+- `defaults` are only applied when the incoming request does not already set those keys.
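The `defaults` note above describes a shallow, client-wins merge. A minimal sketch of that rule in Python (the helper name is hypothetical, not the gateway's actual code):

```python
def apply_defaults(request_body: dict, defaults: dict) -> dict:
    """Fill in config defaults without overriding client-supplied keys."""
    merged = dict(defaults)      # start from the configured defaults
    merged.update(request_body)  # client-supplied values always win
    return merged


# The client sets temperature, so only max_tokens comes from defaults.
body = {"model": "writer", "temperature": 0.2}
print(apply_defaults(body, {"temperature": 0.6, "max_tokens": 256}))
# → {'temperature': 0.2, 'max_tokens': 256, 'model': 'writer'}
```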
+
 ## Discovered models
 
 Route to a dynamically registered node that claims the role:
@@ -71,3 +77,28 @@ If `auth.node_api_keys` is set (non-empty), node agents calling `/v1/nodes/register`
 Supported headers:
 - Clients: `Authorization: Bearer <key>` or `X-Api-Key: <key>`
 - Nodes: `Authorization: Bearer <key>` or `X-RoleMesh-Node-Key: <key>`
+
+## Quick example
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys: ["change-me-client-key"]
+  node_api_keys: ["change-me-node-key"]
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+    defaults:
+      temperature: 0
+      max_tokens: 128
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+    defaults:
+      temperature: 0.6
+      max_tokens: 256
+```
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 8315cd5..9f650b4 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -1,5 +1,57 @@
 # Deployment
 
+## Quick start: single host
+
+This is the recommended first deployment because it lets you verify the gateway before introducing discovery or node agents.
+
+### 1. Start local backends
+
+Example using `llamafile`:
+
+```bash
+llamafile --server -m /path/to/planner-model.gguf --host 127.0.0.1 --port 8011 --nobrowser
+llamafile --server -m /path/to/writer-model.gguf --host 127.0.0.1 --port 8012 --nobrowser
+```
+
+### 2. Point the gateway at those backends
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys:
+    - "change-me-client-key"
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+```
+
+### 3. Run the gateway
+
+```bash
+ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000
+```
+
+### 4. Smoke test
+
+```bash
+curl -sS http://127.0.0.1:8000/v1/models \
+  -H 'X-Api-Key: change-me-client-key'
+
+curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -H 'X-Api-Key: change-me-client-key' \
+  -d '{
+    "model": "planner",
+    "messages": [{"role":"user","content":"Say hello in 3 words."}]
+  }'
+```
+
 ## Network binding and exposure (Step 2 hardening)
@@ -63,8 +115,8 @@ curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
 ### Hardening checklist (recommended)
 
 - Bind gateway to localhost by default, and explicitly expose it when needed
-- Add API key checking (FastAPI dependency) for:
-  - inference endpoints
-  - registration endpoint
+- Configure API keys for:
+  - inference endpoints via `auth.client_api_keys`
+  - node registration and heartbeat via `auth.node_api_keys`
 - Add TTL and periodic health checks for registered nodes
 - Consider mTLS if registration happens over untrusted networks