Docs touch-up

parent 87fcdaaacc, commit ead47f4a6e

README.md | 68
@@ -17,9 +17,11 @@ deployments where **different machines host different models** (e.g., GPU box fo
 - Robust proxying with **explicit httpx timeouts** (no “hang forever”)
 - Structured logging with request IDs
 
-## Quick start (proxy mode)
+## Quick Start
 
-1. Create a venv and install:
+This is the fastest path to a working local setup.
+
+### 1. Install
 
 ```bash
 python -m venv .venv
@@ -27,24 +29,68 @@ source .venv/bin/activate
 pip install -e .
 ```
 
-2. Copy the example config:
+### 2. Start two OpenAI-compatible backends
+
+Any backend that exposes `GET /v1/models` and `POST /v1/chat/completions` will work.
+One practical option is `llamafile` in server mode:
 
 ```bash
-cp configs/models.example.yaml configs/models.yaml
+llamafile --server -m /path/to/planner-model.gguf --host 127.0.0.1 --port 8011 --nobrowser
+llamafile --server -m /path/to/writer-model.gguf --host 127.0.0.1 --port 8012 --nobrowser
 ```
 
-3. Run the gateway:
+### 3. Create a gateway config
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys:
+    - "change-me-client-key"
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+    defaults:
+      temperature: 0
+      max_tokens: 128
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+    defaults:
+      temperature: 0.6
+      max_tokens: 256
+```
+
+Save that as `configs/models.yaml`.
+
+### 4. Run the gateway
 
 ```bash
-ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 0.0.0.0 --port 8000
+ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000
 ```
 
-4. Smoke test:
+### 5. Verify it
 
 ```bash
-bash scripts/smoke_test.sh http://127.0.0.1:8000
+curl -sS http://127.0.0.1:8000/v1/models \
+  -H 'X-Api-Key: change-me-client-key'
+```
+
+```bash
+curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -H 'X-Api-Key: change-me-client-key' \
+  -d '{
+    "model": "planner",
+    "messages": [{"role":"user","content":"Say hello in 3 words."}]
+  }'
+```
+
+If you prefer the provided example file, copy `configs/models.example.yaml` and adjust the `proxy_url` values.
 
 ## Multi-host (node registration)
 
 If you want machines to host backends and “register” them dynamically, run a tiny node agent on each backend host
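As a companion to the curl commands in the new Quick Start, the same request can be built with Python's standard library. A minimal sketch, assuming the gateway is listening on `127.0.0.1:8000` with the example client key; `chat_request` is a hypothetical helper, not part of the repo:

```python
import json
import urllib.request


def chat_request(base_url: str, api_key: str, model: str, content: str) -> urllib.request.Request:
    """Build a POST request for the gateway's /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,
        },
        method="POST",
    )


req = chat_request("http://127.0.0.1:8000", "change-me-client-key", "planner", "Say hello in 3 words.")
# urllib.request.urlopen(req) would send it once the gateway is running.
print(req.get_full_url())
```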
@@ -59,7 +105,9 @@ See: `docs/DEPLOYMENT.md` and `docs/CONFIG.md`.
 
 This repository is a **preliminary scaffold**:
 - Proxying to OpenAI-compatible upstreams works.
-- Registration and load-selection are implemented (basic round-robin), but persistence and auth are TODOs.
+- Registration and load-selection are implemented (basic round-robin).
+- API-key auth for clients and nodes is available.
+- Persistence is basic JSON-backed state, not a full service registry.
 
 ## License
@@ -72,8 +120,6 @@ This repo also includes a **RoleMesh Node Agent** (`rolemesh-node-agent`) that c
 - Sample config: `configs/node_agent.example.yaml`
 - Docs: `docs/NODE_AGENT.md`
 
-
-
 ## Safe-by-default binding
 
 Gateway and node-agent default to binding on `127.0.0.1` to avoid accidental exposure. Bind only to private/LAN or VPN interfaces and firewall ports if you need remote access.
@@ -33,8 +33,9 @@ This is deliberately small so you can swap it out later for something stronger:
 
 ## Known limitations (scaffold)
 
-- No auth on registration or inference endpoints
+- Auth is optional and config-driven rather than enforced by default
 - No TTL/health polling
-- Round-robin selection only
+- No automatic config reload
+- Round-robin selection only for discovered nodes
 
 These are tracked in `docs/DEPLOYMENT.md` as next steps.
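The "round-robin selection" limitation in the hunk above can be pictured in a few lines. An illustrative sketch, not the registry's actual code; the `RoundRobin` class and node URLs are made up for the example:

```python
class RoundRobin:
    """Cycle through registered nodes in order (illustrative sketch only)."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def pick(self):
        # Select nodes in a fixed rotation, ignoring load and health.
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node


rr = RoundRobin(["http://10.0.0.1:8011", "http://10.0.0.2:8011"])
picks = [rr.pick() for _ in range(4)]
print(picks)  # alternates between the two nodes
```

This is exactly why TTL/health polling is listed as a gap: a dead node keeps receiving every other request until it is removed.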
@@ -38,6 +38,12 @@ models:
       temperature: 0.6
 ```
 
+Notes:
+- The model alias (`writer` above) is what the client sends in `model`.
+- `openai_model_name` is what the gateway returns from `GET /v1/models`.
+- `proxy_url` is the actual upstream backend to call.
+- `defaults` are only applied when the incoming request does not already set those keys.
+
 ## Discovered models
 
 Route to a dynamically registered node that claims the role:
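The last note in that hunk (defaults fill in only missing keys) can be sketched as a merge. A sketch of the described behavior, not the gateway's exact code; `apply_defaults` is a hypothetical helper:

```python
def apply_defaults(request_body: dict, defaults: dict) -> dict:
    """Fill in model defaults only for keys the client did not set."""
    merged = dict(request_body)
    for key, value in defaults.items():
        # setdefault leaves client-supplied values untouched.
        merged.setdefault(key, value)
    return merged


body = {"model": "writer", "temperature": 0.1}
print(apply_defaults(body, {"temperature": 0.6, "max_tokens": 256}))
# {'model': 'writer', 'temperature': 0.1, 'max_tokens': 256}
```

The client's `temperature: 0.1` wins; only `max_tokens` is taken from the config.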
@@ -71,3 +77,28 @@ If `auth.node_api_keys` is set (non-empty), node agents calling `/v1/nodes/regis
 Supported headers:
 - Clients: `Authorization: Bearer <key>` or `X-Api-Key: <key>`
 - Nodes: `Authorization: Bearer <node_key>` or `X-RoleMesh-Node-Key: <node_key>`
+
+## Quick example
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys: ["change-me-client-key"]
+  node_api_keys: ["change-me-node-key"]
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+    defaults:
+      temperature: 0
+      max_tokens: 128
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+    defaults:
+      temperature: 0.6
+      max_tokens: 256
+```
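The two accepted client header forms above suggest a small extraction step. A sketch assuming headers arrive as a plain dict; `extract_client_key` is hypothetical, not the gateway's code:

```python
def extract_client_key(headers: dict) -> "str | None":
    """Return the client key from Authorization: Bearer <key> or X-Api-Key, else None."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        # Strip the scheme prefix and keep the token.
        return auth[len("Bearer "):]
    return headers.get("X-Api-Key")


print(extract_client_key({"Authorization": "Bearer change-me-client-key"}))
# change-me-client-key
```

Node keys would follow the same pattern with `X-RoleMesh-Node-Key` as the fallback header.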
@@ -1,5 +1,57 @@
 # Deployment
 
+## Quick start: single host
+
+This is the recommended first deployment because it lets you verify the gateway before introducing discovery or node agents.
+
+### 1. Start local backends
+
+Example using `llamafile`:
+
+```bash
+llamafile --server -m /path/to/planner-model.gguf --host 127.0.0.1 --port 8011 --nobrowser
+llamafile --server -m /path/to/writer-model.gguf --host 127.0.0.1 --port 8012 --nobrowser
+```
+
+### 2. Point the gateway at those backends
+
+```yaml
+version: 1
+default_model: planner
+auth:
+  client_api_keys:
+    - "change-me-client-key"
+models:
+  planner:
+    type: proxy
+    openai_model_name: planner
+    proxy_url: http://127.0.0.1:8011
+  writer:
+    type: proxy
+    openai_model_name: writer
+    proxy_url: http://127.0.0.1:8012
+```
+
+### 3. Run the gateway
+
+```bash
+ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000
+```
+
+### 4. Smoke test
+
+```bash
+curl -sS http://127.0.0.1:8000/v1/models \
+  -H 'X-Api-Key: change-me-client-key'
+
+curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -H 'X-Api-Key: change-me-client-key' \
+  -d '{
+    "model": "planner",
+    "messages": [{"role":"user","content":"Say hello in 3 words."}]
+  }'
+```
+
 ## Network binding and exposure (Step 2 hardening)
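Before step 3 of that quick start, it can help to sanity-check the YAML for the mistakes that most often break routing. A sketch of the kinds of checks one might run; `validate_models_config` and its rules are illustrative, not part of the repo:

```python
def validate_models_config(cfg: dict) -> list:
    """Collect obvious problems in a gateway config dict (illustrative checks only)."""
    problems = []
    models = cfg.get("models") or {}
    if cfg.get("default_model") not in models:
        problems.append("default_model does not name a configured model")
    for name, model in models.items():
        # A proxy-type model is unroutable without an upstream URL.
        if model.get("type") == "proxy" and not model.get("proxy_url"):
            problems.append(f"{name}: proxy model is missing proxy_url")
    return problems


cfg = {
    "version": 1,
    "default_model": "planner",
    "models": {
        "planner": {"type": "proxy", "openai_model_name": "planner",
                    "proxy_url": "http://127.0.0.1:8011"},
        "writer": {"type": "proxy", "openai_model_name": "writer",
                   "proxy_url": "http://127.0.0.1:8012"},
    },
}
print(validate_models_config(cfg))  # [] when the config is consistent
```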
@@ -63,8 +115,8 @@ curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
 ### Hardening checklist (recommended)
 
 - Bind gateway to localhost by default, and explicitly expose it when needed
-- Add API key checking (FastAPI dependency) for:
-  - inference endpoints
-  - registration endpoint
+- Configure API keys for:
+  - inference endpoints via `auth.client_api_keys`
+  - node registration and heartbeat via `auth.node_api_keys`
 - Add TTL and periodic health checks for registered nodes
 - Consider mTLS if registration happens over untrusted networks
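For the API-key items in that checklist, key comparison should avoid timing leaks rather than using plain `==`. A minimal sketch using only the standard library; the helper name and the idea of passing the configured key lists are assumptions:

```python
import hmac


def key_is_valid(presented: str, allowed: list) -> bool:
    """Constant-time membership check for a presented API key.

    `allowed` would come from auth.client_api_keys or auth.node_api_keys.
    """
    # compare_digest takes the same time whether keys match early or late.
    return any(hmac.compare_digest(presented, key) for key in allowed)


print(key_is_valid("change-me-client-key", ["change-me-client-key"]))  # True
```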