# Deployment

## Network binding and exposure (Step 2 hardening)

**Safe defaults:** the gateway and node-agent CLIs bind to `127.0.0.1` (localhost) unless told otherwise.
This prevents accidental public exposure during development.

If you need remote access:

- Bind **only** to a **LAN/private** interface (e.g. `192.168.x.y`, `10.x.y.z`) and restrict ingress with a firewall/VPN.
- Do **not** bind to `0.0.0.0` (all interfaces) on an Internet-routable host.
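If you script the remote-access setup, a small guard can refuse a non-private bind address before anything is exposed. The sketch below is a hypothetical helper (`is_private_ipv4` is not part of the gateway or node-agent CLIs). It accepts loopback, the RFC 1918 private ranges, and 100.64.0.0/10, the range Tailscale assigns addresses from, and rejects `0.0.0.0`, public addresses, and malformed input.

```bash
# Hypothetical pre-flight check (not part of the gateway CLI): succeed only if
# the address is loopback, RFC 1918 private, or in 100.64.0.0/10 (Tailscale).
is_private_ipv4() {
  local a b c d o
  IFS=. read -r a b c d <<< "$1"
  # Require exactly four decimal octets in the range 0..255.
  for o in "$a" "$b" "$c" "$d"; do
    [[ $o =~ ^[0-9]{1,3}$ ]] || return 1
    (( 10#$o <= 255 )) || return 1
  done
  a=$((10#$a)); b=$((10#$b))   # force base 10 so "010" is not read as octal
  (( a == 127 )) && return 0                         # 127.0.0.0/8 loopback
  (( a == 10 )) && return 0                          # 10.0.0.0/8
  (( a == 172 && b >= 16 && b <= 31 )) && return 0   # 172.16.0.0/12
  (( a == 192 && b == 168 )) && return 0             # 192.168.0.0/16
  (( a == 100 && b >= 64 && b <= 127 )) && return 0  # 100.64.0.0/10
  return 1
}
```

For example, `is_private_ipv4 "$BIND_ADDR" || { echo "refusing to bind to $BIND_ADDR" >&2; exit 1; }` stops a launch script before it passes a public address (or `0.0.0.0`) to the gateway; `BIND_ADDR` is an illustrative variable name.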

### Recommended firewall policy (examples)

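Linux (UFW): allow only a private subnet to reach the gateway (port 8080 in this example) and node agents (port 8091), and deny all other sources. Adjust the ports to match the ones your services actually listen on. The allow rules come first because UFW applies the first rule that matches.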

```bash
sudo ufw allow from 192.168.0.0/16 to any port 8080 proto tcp
sudo ufw allow from 192.168.0.0/16 to any port 8091 proto tcp
sudo ufw deny 8080/tcp
sudo ufw deny 8091/tcp
```
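If your ports differ from the example, a hypothetical `ufw_rules` helper can print the equivalent commands for review, keeping every allow rule ahead of the deny rules. It only echoes commands; nothing is changed until you run them.

```bash
# Hypothetical helper: print UFW commands that let one trusted subnet reach
# the given TCP ports and deny all other sources. All "allow" lines are
# emitted before any "deny" line so the subnet matches an allow rule first.
ufw_rules() {
  local subnet="$1"; shift
  local port
  for port in "$@"; do
    echo "sudo ufw allow from $subnet to any port $port proto tcp"
  done
  for port in "$@"; do
    echo "sudo ufw deny $port/tcp"
  done
}
```

`ufw_rules 192.168.0.0/16 8080 8091` prints the four commands shown above. After reviewing the output you can pipe it to `sh`, then check the resulting order with `sudo ufw status numbered`.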

If you're using Tailscale/WireGuard, prefer binding to the VPN interface address (with Tailscale, `tailscale ip -4` prints it) and limiting firewall rules to that interface/subnet.

### llama.cpp servers

The node agent starts persistent `llama-server` processes bound to **localhost only** (`127.0.0.1`).
This is intentional: the llama servers should never be reachable directly from the network; only the node agent should proxy to them.
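To confirm that nothing on the llama-server ports is reachable from the network, you can list listening TCP sockets with `ss -Hltn` (iproute2). The hypothetical helper below reads that output on stdin and reports any listener on the given ports whose local address is not loopback. Ports 8011 and 8012 are the example ports from Pattern A; substitute the ports your node agent actually uses.

```bash
# Hypothetical audit: read `ss -Hltn` lines on stdin, flag listeners on the
# given ports that are not bound to loopback, and return 1 if any are found.
check_loopback_only() {
  awk -v ports=" $* " '
    {
      addr = $4                          # local address, e.g. 127.0.0.1:8011
      n = split(addr, parts, ":")
      port = parts[n]                    # the port follows the last colon
      if (index(ports, " " port " ") == 0) next
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if (host !~ /^127\./ && host != "[::1]") {
        print "EXPOSED " addr
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Typical use: `ss -Hltn | check_loopback_only 8011 8012 && echo "llama-server ports are loopback-only"`.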

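This scaffold supports two deployment patterns: everything on a single host (Pattern A) or roles distributed across machines (Pattern B).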

## Pattern A: Single host, proxy to localhost backends

- Run `llama-server` (or other OpenAI-compatible servers) on the host:
  - planner → `http://127.0.0.1:8011`
  - writer  → `http://127.0.0.1:8012`
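- Run the gateway:
  - either directly on the host (recommended for simplicity), or
  - in Docker with `network_mode: host` (Linux) if the upstream servers bind to `127.0.0.1`; on Docker's default bridge network, `127.0.0.1` inside the container is the container's own loopback, not the host's.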
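A minimal launcher for Pattern A might look like the sketch below. It assumes llama.cpp's `llama-server` is on `PATH` and uses its `-m`, `--host`, and `--port` options; the model paths, the `ROLE_PORT`/`ROLE_MODEL` tables, and the `DRY_RUN` switch are hypothetical. `wait_healthy` polls llama-server's `/health` endpoint so the gateway is only started once the backends answer.

```bash
# Hypothetical role tables: ports from Pattern A, placeholder model paths.
declare -A ROLE_PORT=([planner]=8011 [writer]=8012)
declare -A ROLE_MODEL=([planner]=/models/planner.gguf [writer]=/models/writer.gguf)

# Start one llama-server for a role, bound to loopback only.
# With DRY_RUN=1 the command is printed instead of executed.
start_role() {
  local role="$1"
  [[ -n "${ROLE_PORT[$role]:-}" ]] || { echo "unknown role: $role" >&2; return 1; }
  local cmd=(llama-server -m "${ROLE_MODEL[$role]}" --host 127.0.0.1 --port "${ROLE_PORT[$role]}")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}" &   # background process; the gateway proxies to it
  fi
}

# Poll GET <url>/health once per second until it succeeds or tries run out.
wait_healthy() {
  local url="$1" tries="${2:-60}" i
  for (( i = 0; i < tries; i++ )); do
    curl -fsS --max-time 2 "$url/health" >/dev/null 2>&1 && return 0
    sleep 1
  done
  return 1
}
```

For example: `start_role planner; start_role writer; wait_healthy http://127.0.0.1:8011 && wait_healthy http://127.0.0.1:8012`, then start the gateway.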

## Pattern B: Multi-host (roles distributed across machines)

- Choose one machine to run the gateway (or run multiple gateways)
- Each backend host exposes an OpenAI-compatible server on the LAN, e.g.:
  - `http://10.0.0.12:8012` (writer)
  - `http://10.0.0.13:8011` (planner)
- Update the `proxy_url` entries to those LAN URLs, **or** use discovery:
  - Set the model entry to `type: discovered` with a `role` (e.g. `role: writer`).
  - Each backend host registers itself with the gateway.
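Before pointing `proxy_url` at LAN URLs, it is worth confirming from the gateway host that each backend answers. The hypothetical `check_backends` helper below assumes every backend serves the OpenAI-compatible `GET /v1/models` route (llama-server does) and prints `UP` or `DOWN` for each URL.

```bash
# Hypothetical reachability check: query GET <url>/v1/models on each backend
# with a short timeout and print UP or DOWN per URL.
check_backends() {
  local url
  for url in "$@"; do
    if curl -fsS --max-time 3 "$url/v1/models" >/dev/null 2>&1; then
      echo "UP $url"
    else
      echo "DOWN $url"
    fi
  done
}
```

Run it on the gateway machine, since that is the host that must reach the backends: `check_backends http://10.0.0.12:8012 http://10.0.0.13:8011`.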

### Minimal registration call

```bash
curl -sS -X POST http://GATEWAY:8000/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -H 'X-RoleMesh-Node-Key: <node-key>' \
  -d '{"node_id":"gpu-box-1","base_url":"http://10.0.0.12:8012","roles":["writer"]}'
```
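The same call can be wrapped for a node's startup script. This is a sketch: `build_register_payload`, `register_node`, and the `GATEWAY_URL`/`NODE_KEY` environment variables are hypothetical names, while the endpoint, the `X-RoleMesh-Node-Key` header, and the JSON fields are the ones in the call above. The payload is built with `printf`, so values must not contain double quotes or backslashes; use a JSON tool such as `jq` if they might.

```bash
# Build the registration body from node_id, base_url, then one or more roles.
# No JSON escaping is done, so values must not contain quotes or backslashes.
build_register_payload() {
  local node_id="$1" base_url="$2"; shift 2
  local roles="" r
  for r in "$@"; do
    roles+="${roles:+,}\"$r\""   # comma-separate every role after the first
  done
  printf '{"node_id":"%s","base_url":"%s","roles":[%s]}' "$node_id" "$base_url" "$roles"
}

# POST the payload; -f makes curl fail on HTTP errors. The ${VAR:?} expansions
# abort with an error message if GATEWAY_URL or NODE_KEY is unset.
register_node() {
  curl -fsS -X POST "${GATEWAY_URL:?set GATEWAY_URL}/v1/nodes/register" \
    -H 'Content-Type: application/json' \
    -H "X-RoleMesh-Node-Key: ${NODE_KEY:?set NODE_KEY}" \
    -d "$(build_register_payload "$@")"
}
```

`GATEWAY_URL=http://GATEWAY:8000 NODE_KEY=<node-key> register_node gpu-box-1 http://10.0.0.12:8012 writer` sends the same request as the example above.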

### Hardening checklist (recommended)

- Bind the gateway to localhost by default, and expose it explicitly only when needed
- Add API key checking (FastAPI dependency) for:
  - inference endpoints
  - registration endpoint
- Add a TTL (registration expiry) and periodic health checks for registered nodes
- Consider mTLS if registration happens over untrusted networks
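If you adopt mTLS, each node needs a client certificate signed by a CA the gateway trusts. The sketch below uses standard `openssl` commands to create a small private CA and one node certificate; `make_node_cert`, the file layout, and the certificate subjects are hypothetical. Separately, the gateway (or a TLS-terminating proxy in front of it) must be configured with its own server certificate and set to require client certificates verified against `ca.crt`.

```bash
# Hypothetical helper: create (or reuse) a private CA in $dir and issue a
# client certificate for one node, named after its node_id.
make_node_cert() {
  local dir="$1" node_id="$2"
  mkdir -p "$dir" || return 1
  # 1. Self-signed CA, created once and reused; keep ca.key off the nodes.
  if [[ ! -f "$dir/ca.key" ]]; then
    openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 365 \
      -subj "/CN=RoleMesh node CA" \
      -keyout "$dir/ca.key" -out "$dir/ca.crt" || return 1
  fi
  # 2. Node private key and certificate signing request (CSR).
  openssl req -newkey rsa:2048 -nodes -sha256 \
    -subj "/CN=$node_id" \
    -keyout "$dir/$node_id.key" -out "$dir/$node_id.csr" || return 1
  # 3. Sign the CSR with the CA to produce the node certificate.
  openssl x509 -req -in "$dir/$node_id.csr" \
    -CA "$dir/ca.crt" -CAkey "$dir/ca.key" -CAcreateserial \
    -days 365 -sha256 -out "$dir/$node_id.crt" || return 1
  # Private keys should be readable by their owner only.
  chmod 600 "$dir/ca.key" "$dir/$node_id.key"
}
```

A node can then present its certificate with curl's `--cert` and `--key` options (plus `--cacert` pointing at the CA that signed the gateway's server certificate) when calling `/v1/nodes/register` over HTTPS.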