# Deployment

This scaffold supports two patterns.

## Pattern A: Single host, proxy to localhost backends

- Run `llama-server` (or other OpenAI-compatible servers) on the host:
  - planner → `http://127.0.0.1:8011`
  - writer → `http://127.0.0.1:8012`
- Run the gateway:
  - either directly on the host (recommended for simplicity), or
  - in Docker with `network_mode: host` (Linux only) if the upstreams bind to 127.0.0.1 (see the Compose sketch below)

## Pattern B: Multi-host (roles distributed across machines)

- Choose one machine to run the gateway (or run multiple gateways)
- Each backend host exposes an OpenAI-compatible server on the LAN, e.g.:
  - `http://10.0.0.12:8012` (writer)
  - `http://10.0.0.13:8011` (planner)
- Update `proxy_url` entries to those LAN URLs, **or** use discovery (see the config sketch below):
  - Set the model to `type: discovered` with `role: writer`, etc.
  - Each host registers itself with the gateway.

### Minimal registration call

```bash
curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -d '{"node_id":"gpu-box-1","base_url":"http://10.0.0.12:8012","roles":["writer"]}'
```

### Hardening checklist (recommended)

- Bind the gateway to localhost by default, and expose it explicitly only when needed
- Add API key checking (a FastAPI dependency; see the sketch below) for:
  - inference endpoints
  - the registration endpoint
- Add a TTL and periodic health checks for registered nodes (see the registry sketch below)
- Consider mTLS if registration happens over untrusted networks
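
### Sketch: running the gateway in Docker (Pattern A)

If you take the Docker route for Pattern A, the relevant setting looks like the sketch below. This is a minimal illustration assuming a Compose file; the service name and image tag are placeholders, and `network_mode: host` is Linux-only.

```yaml
# docker-compose.yml -- minimal sketch; service name and image are placeholders
services:
  gateway:
    image: my-gateway:latest   # assumption: however you build/tag the gateway
    network_mode: host         # share the host's network namespace (Linux only),
                               # so upstreams bound to 127.0.0.1:8011/8012 are reachable
```

Note that under `network_mode: host` a `ports:` mapping is neither needed nor honored; the gateway listens directly on the host's interfaces.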
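
### Sketch: static vs. discovered models (Pattern B)

A minimal illustration of the two wiring options, assuming a YAML config. Only `proxy_url`, `type: discovered`, and `role` come from this document; the `models:` layout and surrounding keys are placeholders for wherever your scaffold defines models.

```yaml
models:
  planner:
    proxy_url: http://10.0.0.13:8011   # static: fixed LAN URL from Pattern B
  writer:
    type: discovered                   # dynamic: resolved from registered nodes
    role: writer                       # matched against a node's "roles" list
```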
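
### Sketch: API key dependency

One way to implement the API key item, as a FastAPI dependency attached to the sensitive routes. The `X-API-Key` header name and the `GATEWAY_API_KEY` environment variable are assumptions, not part of the scaffold.

```python
import os
import secrets

from fastapi import Depends, FastAPI, Header, HTTPException

def require_api_key(x_api_key: str = Header(default="")) -> None:
    """Reject the request unless X-API-Key matches GATEWAY_API_KEY."""
    expected = os.environ.get("GATEWAY_API_KEY", "")
    # compare_digest gives a constant-time comparison
    if not expected or not secrets.compare_digest(x_api_key, expected):
        raise HTTPException(status_code=401, detail="invalid or missing API key")

app = FastAPI()

# Attach the same dependency to the inference routes as well.
@app.post("/v1/nodes/register", dependencies=[Depends(require_api_key)])
async def register_node(payload: dict) -> dict:
    ...  # real registration logic lives in the gateway
    return {"ok": True}
```

Clients then add `-H 'X-API-Key: ...'` to calls like the registration example above.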
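
### Sketch: TTL and health checks for registered nodes

A sketch of the TTL and health-check items, assuming an in-memory registry; the real gateway's data model may differ, and `httpx` is an assumed dependency. Re-registering refreshes `last_seen`, so hosts can reuse the registration call above as a periodic heartbeat.

```python
import asyncio
import time
from dataclasses import dataclass, field

import httpx

NODE_TTL_SECONDS = 60        # assumption: drop a node silent for this long
SWEEP_INTERVAL_SECONDS = 10  # assumption: how often to sweep and probe

@dataclass
class Node:
    node_id: str
    base_url: str
    roles: list[str]
    last_seen: float = field(default_factory=time.monotonic)

class Registry:
    def __init__(self) -> None:
        self._nodes: dict[str, Node] = {}

    def register(self, node_id: str, base_url: str, roles: list[str]) -> None:
        # Called from the /v1/nodes/register handler; re-registration
        # replaces the entry and so refreshes last_seen (the heartbeat).
        self._nodes[node_id] = Node(node_id, base_url, roles)

    def nodes(self) -> list[Node]:
        return list(self._nodes.values())

    def drop(self, node_id: str) -> None:
        self._nodes.pop(node_id, None)

    def sweep(self) -> None:
        # TTL expiry: evict nodes that have not re-registered recently.
        now = time.monotonic()
        for node in self.nodes():
            if now - node.last_seen > NODE_TTL_SECONDS:
                self.drop(node.node_id)

async def probe(node: Node) -> bool:
    # Active health check: any OpenAI-compatible server should answer
    # GET /v1/models (llama-server does).
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            resp = await client.get(f"{node.base_url}/v1/models")
            return resp.status_code == 200
    except httpx.HTTPError:
        return False

async def sweeper(registry: Registry) -> None:
    """Background task: evict nodes that expired or fail their probe."""
    while True:
        await asyncio.sleep(SWEEP_INTERVAL_SECONDS)
        registry.sweep()
        for node in registry.nodes():
            if not await probe(node):
                registry.drop(node.node_id)
```

In a FastAPI app, `sweeper` can be started once at startup, e.g. `asyncio.create_task(sweeper(registry))` from a lifespan handler.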