Docs touch-up
parent 87fcdaaacc
commit ead47f4a6e

README.md | 68
@@ -17,9 +17,11 @@ deployments where **different machines host different models** (e.g., GPU box fo
 - Robust proxying with **explicit httpx timeouts** (no “hang forever”)
 - Structured logging with request IDs
 
-## Quick start (proxy mode)
+## Quick Start
 
-1. Create a venv and install:
+This is the fastest path to a working local setup.
+
+### 1. Install
 
 ```bash
 python -m venv .venv
@@ -27,24 +29,68 @@ source .venv/bin/activate
 pip install -e .
 ```
 
-2. Copy the example config:
+### 2. Start two OpenAI-compatible backends
 
+Any backend that exposes `GET /v1/models` and `POST /v1/chat/completions` will work.
+One practical option is `llamafile` in server mode:
+
 ```bash
-cp configs/models.example.yaml configs/models.yaml
+llamafile --server -m /path/to/planner-model.gguf --host 127.0.0.1 --port 8011 --nobrowser
+llamafile --server -m /path/to/writer-model.gguf --host 127.0.0.1 --port 8012 --nobrowser
 ```
 
-3. Run the gateway:
+### 3. Create a gateway config
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys:
+    - "change-me-client-key"
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+    defaults:
+      temperature: 0
+      max_tokens: 128
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+    defaults:
+      temperature: 0.6
+      max_tokens: 256
+```
+
+Save that as `configs/models.yaml`.
+
+### 4. Run the gateway
 
 ```bash
-ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 0.0.0.0 --port 8000
+ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000
 ```
 
-4. Smoke test:
+### 5. Verify it
 
 ```bash
-bash scripts/smoke_test.sh http://127.0.0.1:8000
+curl -sS http://127.0.0.1:8000/v1/models \
+  -H 'X-Api-Key: change-me-client-key'
 ```
 
+```bash
+curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -H 'X-Api-Key: change-me-client-key' \
+  -d '{
+    "model": "planner",
+    "messages": [{"role":"user","content":"Say hello in 3 words."}]
+  }'
+```
+
+If you prefer the provided example file, copy `configs/models.example.yaml` and adjust the `proxy_url` values.
+
 ## Multi-host (node registration)
 
 If you want machines to host backends and “register” them dynamically, run a tiny node agent on each backend host
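Because the gateway exposes the standard OpenAI chat-completions surface and accepts the client key either as `X-Api-Key` or as a Bearer token, any OpenAI-compatible client can talk to it. A minimal sketch using the `openai` Python package, assuming the gateway from the quick start is listening on `127.0.0.1:8000` (the package is not a project dependency, purely an illustration):

```python
# Illustrative only: point an OpenAI-compatible client at the gateway.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="change-me-client-key",  # sent as "Authorization: Bearer <key>"
)

resp = client.chat.completions.create(
    model="planner",  # the model alias from configs/models.yaml
    messages=[{"role": "user", "content": "Say hello in 3 words."}],
)
print(resp.choices[0].message.content)
```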
@@ -59,7 +105,9 @@ See: `docs/DEPLOYMENT.md` and `docs/CONFIG.md`.
 
 This repository is a **preliminary scaffold**:
 - Proxying to OpenAI-compatible upstreams works.
-- Registration and load-selection are implemented (basic round-robin), but persistence and auth are TODOs.
+- Registration and load-selection are implemented (basic round-robin).
+- API-key auth for clients and nodes is available.
+- Persistence is basic JSON-backed state, not a full service registry.
 
 ## License
 
@@ -72,8 +120,6 @@ This repo also includes a **RoleMesh Node Agent** (`rolemesh-node-agent`) that c
 - Sample config: `configs/node_agent.example.yaml`
 - Docs: `docs/NODE_AGENT.md`
 
-
-
 ## Safe-by-default binding
 
 Gateway and node-agent default to binding on `127.0.0.1` to avoid accidental exposure. Bind only to private/LAN or VPN interfaces and firewall ports if you need remote access.
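If remote access is genuinely needed, the safe-by-default note above boils down to binding on a specific private interface and firewalling the port rather than listening on `0.0.0.0`. A rough sketch, where the interface address and subnet are placeholders and `ufw` is just one example of a host firewall (not something the repo prescribes):

```bash
# Bind the gateway to one LAN interface instead of all interfaces (address is an example).
ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 192.168.1.20 --port 8000

# Then limit which hosts can reach that port, e.g. with ufw on the gateway machine.
sudo ufw allow from 192.168.1.0/24 to any port 8000 proto tcp
```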
@@ -33,8 +33,9 @@ This is deliberately small so you can swap it out later for something stronger:
 
 ## Known limitations (scaffold)
 
-- No auth on registration or inference endpoints
+- Auth is optional and config-driven rather than enforced by default
 - No TTL/health polling
-- Round-robin selection only
+- No automatic config reload
+- Round-robin selection only for discovered nodes
 
 These are tracked in `docs/DEPLOYMENT.md` as next steps.
@@ -38,6 +38,12 @@ models:
       temperature: 0.6
 ```
 
+Notes:
+- The model alias (`writer` above) is what the client sends in `model`.
+- `openai_model_name` is what the gateway returns from `GET /v1/models`.
+- `proxy_url` is the actual upstream backend to call.
+- `defaults` are only applied when the incoming request does not already set those keys.
+
 ## Discovered models
 
 Route to a dynamically registered node that claims the role:
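The `defaults` rule above (the client's own values win; defaults only fill keys the request leaves unset) amounts to a shallow dict merge. A rough Python sketch of that behavior, not the gateway's actual code:

```python
# Illustrative sketch: apply per-model defaults only for keys the client did not set.
def apply_defaults(request: dict, defaults: dict) -> dict:
    merged = dict(request)
    for key, value in defaults.items():
        merged.setdefault(key, value)  # keys already present in the request win
    return merged

# A request that omits temperature picks up the configured default,
# while its explicit max_tokens is left untouched.
payload = apply_defaults(
    {"model": "writer", "messages": [{"role": "user", "content": "hi"}], "max_tokens": 64},
    {"temperature": 0.6, "max_tokens": 256},
)
assert payload["temperature"] == 0.6 and payload["max_tokens"] == 64
```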
@@ -71,3 +77,28 @@ If `auth.node_api_keys` is set (non-empty), node agents calling `/v1/nodes/regis
 Supported headers:
 - Clients: `Authorization: Bearer <key>` or `X-Api-Key: <key>`
 - Nodes: `Authorization: Bearer <node_key>` or `X-RoleMesh-Node-Key: <node_key>`
+
+## Quick example
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys: ["change-me-client-key"]
+  node_api_keys: ["change-me-node-key"]
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+    defaults:
+      temperature: 0
+      max_tokens: 128
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+    defaults:
+      temperature: 0.6
+      max_tokens: 256
+```
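Both client header forms hit the same endpoints, so a quick check of the bearer variant (key from the quick example above, gateway assumed to be running locally on port 8000 as in the README quick start) is simply:

```bash
curl -sS http://127.0.0.1:8000/v1/models \
  -H 'Authorization: Bearer change-me-client-key'
```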
@@ -1,5 +1,57 @@
 # Deployment
 
+## Quick start: single host
+
+This is the recommended first deployment because it lets you verify the gateway before introducing discovery or node agents.
+
+### 1. Start local backends
+
+Example using `llamafile`:
+
+```bash
+llamafile --server -m /path/to/planner-model.gguf --host 127.0.0.1 --port 8011 --nobrowser
+llamafile --server -m /path/to/writer-model.gguf --host 127.0.0.1 --port 8012 --nobrowser
+```
+
+### 2. Point the gateway at those backends
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys:
+    - "change-me-client-key"
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+```
+
+### 3. Run the gateway
+
+```bash
+ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000
+```
+
+### 4. Smoke test
+
+```bash
+curl -sS http://127.0.0.1:8000/v1/models \
+  -H 'X-Api-Key: change-me-client-key'
+
+curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -H 'X-Api-Key: change-me-client-key' \
+  -d '{
+    "model": "planner",
+    "messages": [{"role":"user","content":"Say hello in 3 words."}]
+  }'
+```
+
 ## Network binding and exposure (Step 2 hardening)
 
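Before pointing the gateway at the two backends, it can save debugging time to confirm each one answers on its own port; since any usable backend exposes the model-listing endpoint, a direct probe is enough (ports as in the example above):

```bash
curl -sS http://127.0.0.1:8011/v1/models
curl -sS http://127.0.0.1:8012/v1/models
```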
@@ -63,8 +115,8 @@ curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
 ### Hardening checklist (recommended)
 
 - Bind gateway to localhost by default, and explicitly expose it when needed
-- Add API key checking (FastAPI dependency) for:
-  - inference endpoints
-  - registration endpoint
+- Configure API keys for:
+  - inference endpoints via `auth.client_api_keys`
+  - node registration and heartbeat via `auth.node_api_keys`
 - Add TTL and periodic health checks for registered nodes
 - Consider mTLS if registration happens over untrusted networks