Deployment

Network binding and exposure (Step 2 hardening)

The gateway and node-agent CLIs bind to 127.0.0.1 (localhost) by default, which prevents accidental public exposure during development.

If you need remote access:

- Bind only to a LAN/private interface (e.g. 192.168.x.y, 10.x.y.z) and restrict ingress with a firewall/VPN.
- Do not bind to 0.0.0.0 on an Internet-routable host.
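
The binding guidance above can be turned into a small pre-flight check. The following is a minimal sketch using only Python's standard `ipaddress` module; the function names `classify_bind_address` and `is_acceptable_bind` are hypothetical and not part of the gateway or node-agent CLIs.

```python
import ipaddress


def classify_bind_address(addr: str) -> str:
    """Return 'wildcard', 'loopback', 'private', or 'public' for an IP literal."""
    ip = ipaddress.ip_address(addr)  # raises ValueError for non-IP input
    # Check the wildcard first: Python also reports 0.0.0.0 as "private".
    if ip.is_unspecified:
        return "wildcard"
    # Loopback addresses are also flagged private, so test them before is_private.
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"


def is_acceptable_bind(addr: str) -> bool:
    """Accept loopback and private addresses; reject wildcard and public ones."""
    return classify_bind_address(addr) in {"loopback", "private"}
```

A launcher script could call `is_acceptable_bind` on the configured address and refuse to start otherwise. Note that Python's `is_private` excludes the 100.64.0.0/10 shared address range that Tailscale assigns from, so a VPN address in that range would need an explicit allowance in a check like this.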

Recommended firewall policy (examples)

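Linux (UFW): allow only a private subnet to reach the gateway (8080) and node agents (8091). UFW applies the first matching rule, so add the allow rules before the deny rules. Substitute the ports your gateway and node agents actually listen on: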

sudo ufw allow from 192.168.0.0/16 to any port 8080 proto tcp
sudo ufw allow from 192.168.0.0/16 to any port 8091 proto tcp
sudo ufw deny 8080/tcp
sudo ufw deny 8091/tcp

If you're using Tailscale/WireGuard, prefer binding to the VPN interface address and limiting rules to that interface/subnet.
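
If the same policy has to be applied for several subnets or ports (for example a LAN subnet and a VPN subnet), the rule list can be generated instead of typed by hand. This is a sketch only: `ufw_commands` is a hypothetical helper that prints the same two command forms used in the example above and does not run anything itself.

```python
import ipaddress


def ufw_commands(subnet: str, ports: list[int]) -> list[str]:
    """Build UFW commands: allow the subnet on each port, then deny everyone else."""
    # ip_network() validates the CIDR and raises ValueError if it is malformed.
    net = ipaddress.ip_network(subnet)
    allows = [f"sudo ufw allow from {net} to any port {p} proto tcp" for p in ports]
    denies = [f"sudo ufw deny {p}/tcp" for p in ports]
    # Allows come first because UFW stops at the first matching rule.
    return allows + denies


if __name__ == "__main__":
    for cmd in ufw_commands("192.168.0.0/16", [8080, 8091]):
        print(cmd)
```

Review the printed commands before running them; the subnet and ports here are the ones from the example and are assumptions about your network.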

Llama.cpp servers

The node agent starts persistent llama-server processes bound to localhost only (127.0.0.1). This is intentional: the llama servers should never be reachable directly from the network; only the node agent should proxy to them.

This scaffold supports two patterns.

Pattern A: Single host, proxy to localhost backends

- Run llama-server (or other OpenAI-compatible servers) on the host:
  - planner → http://127.0.0.1:8011
  - writer → http://127.0.0.1:8012
- Run the gateway:
  - either directly on the host (recommended for simplicity), or
  - in Docker with network_mode: host (Linux), so the container can reach upstream servers bound to 127.0.0.1
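
To illustrate Pattern A, the sketch below maps roles to the localhost backends listed above and builds the upstream URL a proxy would forward to. It is not the scaffold's actual routing code: `BACKENDS` and `upstream_url` are hypothetical names, and the loopback check only handles IP literals (a hostname such as `localhost` is rejected).

```python
import ipaddress
from urllib.parse import urlsplit

# Role-to-backend map taken from the Pattern A example; adjust to your setup.
BACKENDS = {
    "planner": "http://127.0.0.1:8011",
    "writer": "http://127.0.0.1:8012",
}


def upstream_url(role: str, path: str) -> str:
    """Return the backend URL for a role, refusing non-loopback backends."""
    base = BACKENDS[role]  # KeyError for an unknown role
    host = urlsplit(base).hostname
    # In Pattern A every backend is expected to live on this host.
    if host is None or not ipaddress.ip_address(host).is_loopback:
        raise ValueError(f"backend for {role!r} is not bound to loopback: {base}")
    return base.rstrip("/") + "/" + path.lstrip("/")
```

For example, `upstream_url("writer", "/v1/chat/completions")` yields `http://127.0.0.1:8012/v1/chat/completions`.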

Pattern B: Multi-host (roles distributed across machines)

- Choose one machine to run the gateway (or run multiple gateways).
- Each backend host exposes an OpenAI-compatible server on the LAN, e.g.:
  - http://10.0.0.12:8012 (writer)
  - http://10.0.0.13:8011 (planner)
- Update the proxy_url entries to those LAN URLs, or use discovery:
  - Set the model to type: discovered with role: writer, etc.
  - Each host registers itself with the gateway.
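
The discovery option can be pictured as a registry keyed by node that is queried by role. The following in-memory sketch is an assumption about how such a lookup might work, not the gateway's implementation; `Node`, `Registry`, `register`, and `resolve` are hypothetical names, and the field names mirror the registration payload used in this document.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    base_url: str
    roles: list[str]


class Registry:
    """In-memory map of registered nodes, looked up by role."""

    def __init__(self) -> None:
        self._nodes: dict[str, Node] = {}

    def register(self, node: Node) -> None:
        # Re-registering the same node_id replaces the previous entry.
        self._nodes[node.node_id] = node

    def resolve(self, role: str) -> list[str]:
        """Return the base URLs of every node that advertises the role."""
        return [n.base_url for n in self._nodes.values() if role in n.roles]


registry = Registry()
registry.register(Node("gpu-box-1", "http://10.0.0.12:8012", ["writer"]))
registry.register(Node("gpu-box-2", "http://10.0.0.13:8011", ["planner"]))
```

With these two nodes registered, `registry.resolve("writer")` returns the single writer URL, and a role nobody advertises resolves to an empty list.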

Minimal registration call

curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -H 'X-RoleMesh-Node-Key: <node-key>' \
  -d '{"node_id":"gpu-box-1","base_url":"http://10.0.0.12:8012","roles":["writer"]}'
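
A node that registers itself at startup may prefer to make the same call from code. The sketch below builds an equivalent request with Python's standard `urllib.request`; `build_register_request` is a hypothetical helper, and the URL path, header name, and payload fields are copied from the curl example above.

```python
import json
import urllib.request


def build_register_request(gateway: str, node_key: str, node_id: str,
                           base_url: str, roles: list[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request for node registration."""
    payload = {"node_id": node_id, "base_url": base_url, "roles": roles}
    return urllib.request.Request(
        url=f"{gateway.rstrip('/')}/v1/nodes/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-RoleMesh-Node-Key": node_key,
        },
        method="POST",
    )


# To send it (requires a reachable gateway):
#   req = build_register_request("http://GATEWAY:8000", "<node-key>",
#                                "gpu-box-1", "http://10.0.0.12:8012", ["writer"])
#   with urllib.request.urlopen(req, timeout=5) as resp:
#       print(resp.status)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running gateway.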

Hardening checklist (recommended)

- Bind the gateway to localhost by default, and expose it explicitly only when needed.
- Add API key checking (FastAPI dependency) for:
  - inference endpoints
  - registration endpoint
- Add a TTL and periodic health checks for registered nodes.
- Consider mTLS if registration happens over untrusted networks.
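
Two of the checklist items, key checking and node TTLs, can be sketched without committing to a framework. The code below uses only the standard library; `key_is_valid` and `LeaseTable` are hypothetical names, and in a FastAPI app the key check would typically be called from a dependency that reads the request header.

```python
import hmac
import time
from typing import Optional


def key_is_valid(presented: Optional[str], expected: str) -> bool:
    """Compare a presented API key to the expected one in constant time."""
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


class LeaseTable:
    """Track when each node was last seen and expire stale ones."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, node_id: str, now: Optional[float] = None) -> None:
        # Called on registration and on every successful health check.
        self._last_seen[node_id] = time.monotonic() if now is None else now

    def prune(self, now: Optional[float] = None) -> list[str]:
        """Remove and return the nodes whose last heartbeat is older than the TTL."""
        current = time.monotonic() if now is None else now
        expired = [n for n, seen in self._last_seen.items() if current - seen > self.ttl]
        for node_id in expired:
            del self._last_seen[node_id]
        return expired

    def live(self) -> list[str]:
        return sorted(self._last_seen)
```

The `now` parameter exists so the expiry logic can be tested with fixed timestamps; in normal use it is omitted and the monotonic clock is used, with `prune` called from a periodic task.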
