# Deployment

This scaffold supports two patterns.

## Pattern A: Single host, proxy to localhost backends
- Run `llama-server` (or other OpenAI-compatible servers) on the host:
  - planner → `http://127.0.0.1:8011`
  - writer → `http://127.0.0.1:8012`
- Run the gateway:
  - either directly on the host (recommended for simplicity), or
  - in Docker with `network_mode: host` (Linux) if the upstream binds to 127.0.0.1
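For the Docker variant, a minimal compose fragment might look like this; the service and image names are placeholders, and only `network_mode: host` comes from the pattern above:

```yaml
# Hypothetical compose service; image name is illustrative.
services:
  gateway:
    image: my-gateway:latest
    network_mode: host   # Linux only; lets the container reach backends bound to 127.0.0.1
```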
## Pattern B: Multi-host (roles distributed across machines)
- Choose one machine to run the gateway (or run multiple gateways).
- Each backend host exposes an OpenAI-compatible server on the LAN, e.g.:
  - `http://10.0.0.12:8012` (writer)
  - `http://10.0.0.13:8011` (planner)
- Update `proxy_url` entries to those LAN URLs, **or** use discovery:
  - Set the model to `type: discovered` with `role: writer`, etc.
  - Each host registers itself with the gateway.
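A discovered model entry might look like the following sketch; only `type: discovered` and `role` come from the bullets above, and the surrounding keys are illustrative, so check the scaffold's actual config schema:

```yaml
# Hypothetical config fragment; top-level layout and key names are assumptions.
models:
  writer:
    type: discovered   # resolve the backend via node registration
    role: writer       # matched against the roles a node advertises
```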
### Minimal registration call
```bash
curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -d '{"node_id":"gpu-box-1","base_url":"http://10.0.0.12:8012","roles":["writer"]}'
```
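On the gateway side, the registration payload implies a small node registry. Here is a minimal sketch, assuming re-registration doubles as a heartbeat and a TTL evicts stale nodes; the class, field names, and default TTL are illustrative, not the scaffold's actual code:

```python
import time

NODE_TTL_SECONDS = 30.0  # assumed default; tune to your heartbeat interval


class NodeRegistry:
    """Tracks registered nodes and expires ones that stop re-registering."""

    def __init__(self, ttl: float = NODE_TTL_SECONDS):
        self.ttl = ttl
        self._nodes: dict[str, dict] = {}

    def register(self, node_id: str, base_url: str, roles: list[str]) -> None:
        # Re-registering refreshes the timestamp, so periodic
        # re-registration acts as a heartbeat.
        self._nodes[node_id] = {
            "base_url": base_url,
            "roles": roles,
            "seen_at": time.monotonic(),
        }

    def pick(self, role: str):
        # Return the base_url of any live node advertising `role`,
        # or None if no fresh node matches.
        now = time.monotonic()
        for node in self._nodes.values():
            if role in node["roles"] and now - node["seen_at"] <= self.ttl:
                return node["base_url"]
        return None
```

With this shape, the `/v1/nodes/register` handler would just call `registry.register(...)`, and the proxy layer would call `registry.pick("writer")` to resolve a backend URL per request.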
### Hardening checklist (recommended)
- Bind the gateway to localhost by default, and expose it explicitly only when needed.
- Add API-key checking (a FastAPI dependency) for:
  - inference endpoints
  - the registration endpoint
- Add a TTL and periodic health checks for registered nodes.
- Consider mTLS if registration happens over untrusted networks.
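The API-key check in the list above can be reduced to one constant-time comparison. A minimal sketch, assuming the key arrives in a request header; the variable names are illustrative, and in FastAPI you would wrap this in a dependency that reads the header and raises a 401 on failure:

```python
import hmac

# Placeholder; in practice load the key from an environment variable or secret store.
GATEWAY_API_KEY = "change-me"


def api_key_ok(provided) -> bool:
    """Return True only if the provided key matches the configured one."""
    if not provided:
        return False
    # hmac.compare_digest avoids leaking the match position via timing.
    return hmac.compare_digest(provided, GATEWAY_API_KEY)


print(api_key_ok("change-me"))  # True
print(api_key_ok("wrong"))      # False
```

Applying the same dependency to both the inference and registration endpoints keeps an unauthenticated host from injecting itself as a backend.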