RoleMesh-Gateway/docs/CONFIG.md

# Configuration
RoleMesh Gateway loads its configuration from a YAML file (default: `configs/models.yaml`).
Set the `ROLE_MESH_CONFIG` environment variable to point at a different file.
## Top-level schema
```yaml
version: 1
default_model: writer
gateway:
  host: 0.0.0.0
  port: 8000
auth:
  client_api_keys: ["..."]
  node_api_keys: ["..."]
models:
  <alias>:
    type: proxy | discovered
    openai_model_name: <string>
    ...
```
- `<alias>` is the name clients pass as `model` in `/v1/chat/completions`.
- `openai_model_name` is the model id returned by `/v1/models` (usually the same as the alias).
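From the client's side, an alias looks like any other OpenAI model id. A minimal sketch of a request body, assuming a `writer` alias as in the examples in this document; nothing is sent over the network here:

```python
import json

# Hypothetical client payload: the gateway alias goes in the standard
# OpenAI `model` field, and the rest of the request is unchanged.
payload = {
    "model": "writer",  # a key under `models:` in models.yaml
    "messages": [{"role": "user", "content": "Draft a release note."}],
}
body = json.dumps(payload).encode("utf-8")
print(json.loads(body)["model"])  # → writer
```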
## Roles are aliases, not a fixed list
RoleMesh Gateway does not reserve a built-in set of roles.
- The keys under `models:` are your project-specific role names
- Clients send those keys in the OpenAI `model` field
- You can rename or replace the sample roles entirely
- Different projects can use different role layouts with the same gateway
Example custom role set:
```yaml
models:
  researcher:
    type: proxy
    openai_model_name: researcher
    proxy_url: http://127.0.0.1:8011
  summarizer:
    type: proxy
    openai_model_name: summarizer
    proxy_url: http://127.0.0.1:8012
  security-reviewer:
    type: proxy
    openai_model_name: security-reviewer
    proxy_url: http://127.0.0.1:8013
```
## Where the actual model weights are selected
This depends on the backend pattern.
### For `type: proxy`
The gateway alias does **not** point directly to a weight file. It points to an already-running inference server:
```yaml
models:
  writer:
    type: proxy
    proxy_url: http://127.0.0.1:8012
```
The actual model weights are chosen by that upstream server, not by RoleMesh Gateway.
Examples:
- `llamafile --server -m /path/to/model.gguf ...`
- `llama-server -m /path/to/model.gguf ...`
- Ollama with `defaults.model: dolphin3:latest`
| Upstream type | Where weights/model are chosen | RoleMesh fields involved |
| --- | --- | --- |
| `llamafile --server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| `llama-server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| Ollama | request JSON `model`, optionally injected by the gateway | `proxy_url`, `defaults.model` |
### For `type: discovered`
The gateway still does not point directly to a weight file. It points to a role served by a registered node.
The actual weight file is defined on the node side, usually in the node-agent config:
```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```
In that setup:
- gateway alias -> discovered role
- discovered role -> registered node
- node-agent `path` -> actual weight file on disk
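The three hops above can be sketched in a few lines. The dictionaries are made-up stand-ins for the gateway config, the node registry, and the node-agent config, not real gateway code:

```python
# Illustrative resolution chain for `type: discovered`.
gateway_models = {"planner": {"type": "discovered", "role": "planner"}}
node_registry = {"planner": ["gpu-box-1"]}  # role -> currently fresh node ids
node_agent_paths = {"gpu-box-1": "/models/SomePlannerModel.Q5_K_M.gguf"}

role = gateway_models["planner"]["role"]   # gateway alias -> discovered role
node_id = node_registry[role][0]           # discovered role -> registered node
weight_file = node_agent_paths[node_id]    # node-agent `path` -> file on disk
print(node_id, weight_file)
```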
## Proxy models
Route to a fixed upstream (any host reachable from the gateway):
```yaml
models:
  writer:
    type: proxy
    openai_model_name: writer
    proxy_url: http://127.0.0.1:8012
    defaults:
      temperature: 0.6
```
Notes:
- The model alias (`writer` above) is what the client sends in `model`.
- `openai_model_name` is what the gateway returns from `GET /v1/models`.
- `proxy_url` is the actual upstream backend to call.
- `defaults` are only applied when the incoming request does not already set those keys.
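The `defaults` rule can be sketched as a one-way merge; the helper name is illustrative, not the gateway's actual code:

```python
# A default fills in only when the client request does not already
# carry that key; client-set values are never overridden.
def merge_defaults(request: dict, defaults: dict) -> dict:
    merged = dict(request)
    for key, value in defaults.items():
        merged.setdefault(key, value)
    return merged

client_req = {"model": "writer", "temperature": 0.2}
merged = merge_defaults(client_req, {"temperature": 0.6, "max_tokens": 256})
print(merged["temperature"], merged["max_tokens"])  # 0.2 256
```

The client's `temperature: 0.2` survives; only the missing `max_tokens` is filled in from `defaults`.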
## Discovered models
Route to a dynamically registered node that claims the role:
```yaml
models:
  reviewer:
    type: discovered
    openai_model_name: reviewer
    role: reviewer
    strategy: round_robin
```
Supported discovered-node strategies:
- `round_robin`: rotate requests across fresh matching nodes
- `random`: choose a fresh matching node at random for each request
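Toy versions of the two strategies, assuming a snapshot of the fresh-node list; a real gateway would re-filter for freshness on every request rather than cycle a static list:

```python
import itertools
import random

fresh_nodes = ["gpu-box-1", "gpu-box-2", "gpu-box-3"]

# round_robin: rotate requests across the matching nodes
rotation = itertools.cycle(fresh_nodes)
round_robin_picks = [next(rotation) for _ in range(4)]
print(round_robin_picks)  # wraps back to gpu-box-1 on the fourth request

# random: an independent pick per request
random_pick = random.choice(fresh_nodes)
print(random_pick in fresh_nodes)
```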
### Registering nodes
Nodes register by POSTing to `/v1/nodes/register`:
```json
{
  "node_id": "gpu-box-1",
  "base_url": "http://10.0.0.12:8014",
  "roles": ["reviewer", "planner"],
  "meta": {"gpu": "Tesla P40", "notes": "llama-server on GPU0"}
}
```
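A node agent could build this call with nothing but the standard library. A sketch that constructs (but does not send) the request; the gateway URL and node key are placeholders:

```python
import json
import urllib.request

payload = {
    "node_id": "gpu-box-1",
    "base_url": "http://10.0.0.12:8014",
    "roles": ["reviewer", "planner"],
}
req = urllib.request.Request(
    "http://127.0.0.1:8000/v1/nodes/register",  # placeholder gateway address
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "X-RoleMesh-Node-Key": "change-me-node-key",  # placeholder key
    },
    method="POST",
)
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```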
## Authentication
If `auth.client_api_keys` is set (non-empty), callers of `/v1/models` and `/v1/chat/completions` must provide a client API key.
If `auth.node_api_keys` is set (non-empty), node agents calling `/v1/nodes/register` and `/v1/nodes/heartbeat` must provide a node key.
Supported headers:
- Clients: `Authorization: Bearer <key>` or `X-Api-Key: <key>`
- Nodes: `Authorization: Bearer <node_key>` or `X-RoleMesh-Node-Key: <node_key>`
## Availability behavior
Configured aliases are not automatically assumed healthy.
- `GET /v1/models` probes configured upstreams and only returns aliases that are currently reachable
- unavailable aliases are included separately in `rolemesh.unavailable_models`
- `GET /ready` returns success only when the configured `default_model` is currently usable
- discovered nodes are only considered routable while they are fresh
- gateway metadata marks stale registered nodes so operators can distinguish them from healthy nodes
This is especially important when a config contains multiple optional roles but only some backends are up.
For discovered-node freshness, the gateway uses the `ROLE_MESH_NODE_STALE_AFTER_S` environment variable (default: `30` seconds).
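The freshness rule can be sketched as a simple age check; whether the boundary is inclusive is an assumption of this sketch:

```python
import os
import time

# A node is routable only while its last heartbeat is within
# ROLE_MESH_NODE_STALE_AFTER_S seconds (default 30).
STALE_AFTER_S = float(os.environ.get("ROLE_MESH_NODE_STALE_AFTER_S", "30"))

def is_fresh(last_heartbeat: float, now: float) -> bool:
    return (now - last_heartbeat) <= STALE_AFTER_S

now = time.time()
print(is_fresh(now - 10, now))  # heartbeat 10 s ago: fresh
print(is_fresh(now - 45, now))  # older than the 30 s default: stale
```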
## Quick example
```yaml
version: 1
default_model: planner
auth:
  client_api_keys: ["change-me-client-key"]
  node_api_keys: ["change-me-node-key"]
models:
  planner:
    type: proxy
    openai_model_name: planner
    proxy_url: http://127.0.0.1:8011
    defaults:
      temperature: 0
      max_tokens: 128
  writer:
    type: proxy
    openai_model_name: writer
    proxy_url: http://127.0.0.1:8012
    defaults:
      temperature: 0.6
      max_tokens: 256
```