RoleMesh-Gateway/docs/DEPLOYMENT.md

Deployment

This scaffold supports two deployment patterns.

Pattern A: Single host, proxy to localhost backends

  • Run llama-server (or other OpenAI-compatible servers) on the host:
    • planner → http://127.0.0.1:8011
    • writer → http://127.0.0.1:8012
  • Run gateway:
    • either directly on the host (recommended for simplicity), or
    • in Docker with network_mode: host (Linux only), which is needed when the upstream servers bind to 127.0.0.1 (see the compose sketch after this list)
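
For the Docker variant, a compose file might look like the sketch below. This is a hedged example: the image tag, config filename, and mount path are placeholders, not something this scaffold defines.

# Hypothetical docker-compose.yml for Pattern A
services:
  gateway:
    image: rolemesh-gateway:latest    # assumed image tag
    network_mode: host                # Linux only; container shares the host's 127.0.0.1
    volumes:
      - ./config.yaml:/app/config.yaml:ro   # assumed config location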

Pattern B: Multi-host (roles distributed across machines)

  • Choose one machine to run the gateway (or run multiple gateways)
  • Each backend host exposes an OpenAI-compatible server on the LAN, e.g.:
    • http://10.0.0.12:8012 (writer)
    • http://10.0.0.13:8011 (planner)
  • Update proxy_url entries to those LAN URLs, or use discovery (see the config sketch after this list):
    • Set each model to type: discovered with the matching role (e.g. role: writer)
    • Each host registers itself with the gateway.
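
The config excerpt below shows both wirings side by side. Only the proxy_url, type, and role keys come from this doc; the surrounding structure (a top-level models map) is an assumption about the scaffold's config format.

# Hypothetical config excerpt for Pattern B
models:
  planner:
    proxy_url: http://10.0.0.13:8011   # static LAN backend
  writer:
    type: discovered                   # resolved from node registrations
    role: writer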

Minimal registration call

curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -H 'X-RoleMesh-Node-Key: <node-key>' \
  -d '{"node_id":"gpu-box-1","base_url":"http://10.0.0.12:8012","roles":["writer"]}'

Security notes

  • Bind the gateway to localhost by default, and expose it explicitly only when needed
  • Add API key checking (FastAPI dependency; see the sketch after this list) for:
    • inference endpoints
    • registration endpoint
  • Add TTL and periodic health checks for registered nodes
  • Consider mTLS if registration happens over untrusted networks
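
A minimal sketch of such a dependency is below, assuming the key arrives in the X-RoleMesh-Node-Key header shown above and is compared against an environment variable; the env var name and handler body are hypothetical, not part of this scaffold.

# Hypothetical FastAPI dependency for the registration endpoint
import os
import secrets

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

def require_node_key(x_rolemesh_node_key: str = Header(default="")) -> None:
    # FastAPI maps this parameter to the X-RoleMesh-Node-Key header
    expected = os.environ.get("ROLEMESH_NODE_KEY", "")   # assumed env var
    # constant-time comparison to avoid timing side channels
    if not expected or not secrets.compare_digest(x_rolemesh_node_key, expected):
        raise HTTPException(status_code=401, detail="invalid node key")

@app.post("/v1/nodes/register", dependencies=[Depends(require_node_key)])
async def register_node(payload: dict) -> dict:
    # a real handler would validate the payload and store it with a TTL
    return {"ok": True, "node_id": payload.get("node_id")}

The same dependency (or a sibling checking a separate client key) can be attached to the inference endpoints.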