Deployment
This scaffold supports two deployment patterns.
Pattern A: Single host, proxy to localhost backends

- Run `llama-server` (or other OpenAI-compatible servers) on the host:
  - planner → http://127.0.0.1:8011
  - writer → http://127.0.0.1:8012
- Run the gateway:
  - either directly on the host (recommended for simplicity), or
  - in Docker with `network_mode: host` (Linux only) if the upstream binds to 127.0.0.1 (see the compose fragment below)
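If the gateway runs in Docker, `network_mode: host` is what lets the container reach backends bound to 127.0.0.1. A minimal compose fragment as a sketch; the service name and build context are placeholders, not part of the scaffold:

```yaml
# Hypothetical docker-compose fragment; "gateway" and the build context
# are illustrative, not the scaffold's published layout.
services:
  gateway:
    build: .
    network_mode: host   # share the host's network namespace (Linux only),
                         # so 127.0.0.1:8011/8012 reach the host's backends
```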
Pattern B: Multi-host (roles distributed across machines)

- Choose one machine to run the gateway (or run multiple gateways)
- Each backend host exposes an OpenAI-compatible server on the LAN, e.g.:
  - http://10.0.0.12:8012 (writer)
  - http://10.0.0.13:8011 (planner)
- Update the `proxy_url` entries to those LAN URLs, or use discovery (see the config sketch after this list):
  - Set the model to `type: discovered` with `role: writer`, etc.
  - Each host registers itself with the gateway.
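A sketch of how the two wiring styles could look in the gateway config. The key names follow the fragments above (`proxy_url`, `type: discovered`, `role`), but the surrounding schema is an assumption, not the scaffold's exact format:

```yaml
models:
  planner:
    proxy_url: http://10.0.0.13:8011   # static: point straight at the LAN host
  writer:
    type: discovered                   # dynamic: resolved at request time from
    role: writer                       # nodes registered with role "writer"
```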
Minimal registration call
```bash
curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -H 'X-RoleMesh-Node-Key: <node-key>' \
  -d '{"node_id":"gpu-box-1","base_url":"http://10.0.0.12:8012","roles":["writer"]}'
```
Hardening checklist (recommended)
- Bind the gateway to localhost by default, and expose it explicitly only when needed
- Add API-key checking (a FastAPI dependency) for:
  - inference endpoints
  - the registration endpoint
- Add a TTL and periodic health checks for registered nodes (see the sketch after this list)
- Consider mTLS if registration happens over untrusted networks
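A sketch of the first three items, assuming a FastAPI gateway with an in-memory registry; every name below is illustrative, not the scaffold's actual module layout:

```python
import time
from typing import Annotated

from fastapi import Depends, FastAPI, Header, HTTPException

API_KEY = "change-me"        # load from an env var or secret store in practice
NODE_TTL_SECONDS = 60        # nodes must re-register within this window

app = FastAPI()
nodes: dict[str, dict] = {}  # node_id -> {"base_url": ..., "roles": ..., "seen": ...}


def require_node_key(
    x_rolemesh_node_key: Annotated[str | None, Header()] = None,
) -> None:
    """Dependency: reject requests missing the shared X-RoleMesh-Node-Key header."""
    if x_rolemesh_node_key != API_KEY:
        raise HTTPException(status_code=401, detail="invalid node key")


@app.post("/v1/nodes/register", dependencies=[Depends(require_node_key)])
def register(payload: dict) -> dict:
    """Record (or refresh) a node's registration with a last-seen timestamp."""
    nodes[payload["node_id"]] = {**payload, "seen": time.monotonic()}
    return {"ok": True}


def live_nodes() -> dict[str, dict]:
    """Return only nodes whose registration has not exceeded the TTL."""
    cutoff = time.monotonic() - NODE_TTL_SECONDS
    return {nid: n for nid, n in nodes.items() if n["seen"] >= cutoff}
```

The same dependency pattern can guard the inference endpoints (with a separate client key), and a background task can call `live_nodes()` or probe each registered node on an interval to implement the health checks.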