# Configuration

RoleMesh Gateway loads configuration from a YAML file (default: `configs/models.yaml`). Set the `ROLE_MESH_CONFIG` environment variable to override the path.
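
Assuming `ROLE_MESH_CONFIG` is read from the process environment (as the name suggests), overriding the path might look like this (the path is illustrative):

```bash
export ROLE_MESH_CONFIG=/etc/rolemesh/models.yaml
```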

## Top-level schema

```yaml
version: 1
default_model: writer
gateway:
  host: 0.0.0.0
  port: 8000
auth:
  client_api_keys: ["..."]
  node_api_keys: ["..."]
models:
  <alias>:
    type: proxy | discovered
    openai_model_name: <string>
    ...
```

- `<alias>` is what clients pass as `model` in `/v1/chat/completions`.
- `openai_model_name` is the model id returned by `/v1/models` (usually the same as the alias).
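
A client selects a role purely through the `model` field. A minimal sketch, assuming the gateway listens on `localhost:8000` with a configured `writer` alias and a client key (all hypothetical values):

```bash
# The gateway alias goes in the standard OpenAI "model" field.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer change-me-client-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "writer", "messages": [{"role": "user", "content": "Draft a release note."}]}'
```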

## Roles are aliases, not a fixed list

RoleMesh Gateway does not reserve a built-in set of roles.

- The keys under `models:` are your project-specific role names
- Clients send those keys in the OpenAI `model` field
- You can rename or replace the sample roles entirely
- Different projects can use different role layouts with the same gateway

Example custom role set:

```yaml
models:
  researcher:
    type: proxy
    openai_model_name: researcher
    proxy_url: http://127.0.0.1:8011
  summarizer:
    type: proxy
    openai_model_name: summarizer
    proxy_url: http://127.0.0.1:8012
  security-reviewer:
    type: proxy
    openai_model_name: security-reviewer
    proxy_url: http://127.0.0.1:8013
```

## Where the actual model weights are selected

This depends on the backend pattern.

### For `type: proxy`

The gateway alias does **not** point directly to a weight file. It points to an already-running inference server:

```yaml
models:
  writer:
    type: proxy
    proxy_url: http://127.0.0.1:8012
```

The actual model weights are chosen by that upstream server, not by RoleMesh Gateway.

Examples:

- `llamafile --server -m /path/to/model.gguf ...`
- `llama-server -m /path/to/model.gguf ...`
- Ollama with `defaults.model: dolphin3:latest`

| Upstream type | Where weights/model are chosen | RoleMesh fields involved |
| --- | --- | --- |
| `llamafile --server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| `llama-server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| Ollama | request JSON `model`, optionally injected by the gateway | `proxy_url`, `defaults.model` |
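
For instance, pairing the `writer` alias above with a local `llama-server` might look like this (the model path and port are illustrative; check your llama.cpp build for exact flags):

```bash
# Start the upstream server that actually loads the weights...
llama-server -m /path/to/model.gguf --host 127.0.0.1 --port 8012

# ...and the gateway only needs proxy_url: http://127.0.0.1:8012 to reach it.
```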

### For `type: discovered`

The gateway still does not point directly to a weight file. It points to a role served by a registered node. The actual weight file is defined on the node side, usually in the node-agent config:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```

In that setup:

- gateway alias -> discovered role
- discovered role -> registered node + concrete upstream model ID
- node-agent `path` -> actual weight file on disk

## Proxy models

Route to a fixed upstream (any host reachable from the gateway):

```yaml
models:
  writer:
    type: proxy
    openai_model_name: writer
    proxy_url: http://127.0.0.1:8012
    defaults:
      temperature: 0.6
```

Notes:

- The model alias (`writer` above) is what the client sends in `model`.
- `openai_model_name` is what the gateway returns from `GET /v1/models`.
- `proxy_url` is the actual upstream backend to call.
- `defaults` are only applied when the incoming request does not already set those keys.
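
A sketch of the intended `defaults` behavior with the config above (request bodies are hypothetical):

```jsonc
// Client request (no temperature set):
{"model": "writer", "messages": [{"role": "user", "content": "hi"}]}

// Forwarded upstream with the configured default injected:
{"model": "writer", "messages": [{"role": "user", "content": "hi"}], "temperature": 0.6}

// If the client had sent "temperature": 0.2, the gateway would leave it untouched.
```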

## Discovered models

Route to a dynamically registered model instance that can satisfy the role:

```yaml
models:
  reviewer:
    type: discovered
    openai_model_name: reviewer
    role: reviewer
    strategy: round_robin
```

Supported discovered-node strategies:

- `round_robin`: rotate requests across fresh matching nodes
- `random`: choose a fresh matching node at random for each request

### Registering nodes

Nodes register via `POST /v1/nodes/register`:

```json
{
  "node_id": "gpu-box-1",
  "base_url": "http://10.0.0.12:8014",
  "served_models": [
    {
      "model_id": "qwen3-8b",
      "roles": ["reviewer", "planner"],
      "meta": {"family": "Qwen3", "quant": "Q5_K_M"}
    },
    {
      "model_id": "qwen2.5-coder-14b",
      "roles": ["coder"],
      "meta": {"family": "Qwen2.5-Coder", "quant": "Q5_K_M"}
    }
  ],
  "meta": {"gpu": "Tesla P40", "notes": "llama-server on GPU0"}
}
```

`served_models` is now the preferred registration schema.

- `model_id`: concrete model name the upstream node expects in the forwarded OpenAI request
- `roles`: workflow roles that this model can satisfy
- `meta`: optional operator-facing metadata

Legacy flat `roles` registration is still accepted for compatibility, but it is treated as a fallback where `model_id == role`.

### Authentication

If `auth.client_api_keys` is set (non-empty), callers of `/v1/models` and `/v1/chat/completions` must provide an API key.

If `auth.node_api_keys` is set (non-empty), node agents calling `/v1/nodes/register` and `/v1/nodes/heartbeat` must provide a node key.

Supported headers:

- Clients: `Authorization: Bearer <key>` or `X-Api-Key: <key>`
- Nodes: `Authorization: Bearer <node_key>` or `X-RoleMesh-Node-Key: <node_key>`
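
Putting the two together, a registration call might look like this (the gateway address, node details, and key are illustrative):

```bash
# Register a node using the node key header; the payload follows the schema above.
curl -s http://localhost:8000/v1/nodes/register \
  -H "X-RoleMesh-Node-Key: change-me-node-key" \
  -H "Content-Type: application/json" \
  -d '{"node_id": "gpu-box-1", "base_url": "http://10.0.0.12:8014", "served_models": [{"model_id": "qwen3-8b", "roles": ["reviewer"]}]}'
```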

## Availability behavior

Configured aliases are not automatically assumed healthy.

- `GET /v1/models` probes configured upstreams and only returns aliases that are currently reachable
- unavailable aliases are included separately in `rolemesh.unavailable_models`
- `GET /ready` returns success only when the configured `default_model` is currently usable
- discovered nodes are only considered routable while they are fresh
- gateway metadata marks stale registered nodes so operators can distinguish them from healthy nodes

This is especially important when a config contains multiple optional roles but only some backends are up.
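
As an illustration (the exact payload shape may differ), a `GET /v1/models` response when only `writer` is reachable could look roughly like:

```json
{
  "object": "list",
  "data": [
    {"id": "writer", "object": "model"}
  ],
  "rolemesh": {
    "unavailable_models": ["planner"]
  }
}
```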

For discovered-node freshness, the gateway uses the `ROLE_MESH_NODE_STALE_AFTER_S` environment variable (default: `30` seconds).
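
For example, to tolerate slower check-ins (assuming freshness is driven by the `/v1/nodes/heartbeat` calls described above):

```bash
# Treat a registered node as stale after 60 seconds instead of the default 30.
export ROLE_MESH_NODE_STALE_AFTER_S=60
```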

## Quick example

```yaml
version: 1
default_model: planner
auth:
  client_api_keys: ["change-me-client-key"]
  node_api_keys: ["change-me-node-key"]
models:
  planner:
    type: proxy
    openai_model_name: planner
    proxy_url: http://127.0.0.1:8011
    defaults:
      temperature: 0
      max_tokens: 128
  writer:
    type: proxy
    openai_model_name: writer
    proxy_url: http://127.0.0.1:8012
    defaults:
      temperature: 0.6
      max_tokens: 256
```

## Base config plus local override

Recommended pattern:

- keep tracked repo config generic
- keep machine-specific values in a separate local YAML
- merge the local YAML at launch

Gateway:

```bash
rolemesh-gateway --config configs/models.example.yaml --config-override configs/models.local.yaml
```

Node agent:

```bash
rolemesh-node-agent --config configs/node_agent.example.yaml --config-override configs/node_agent.local.yaml
```

The merge is recursive for mappings (see the sketch below):

- nested dictionaries are merged
- lists and scalar values are replaced by the override file
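
A small sketch of the merge semantics (file contents are hypothetical):

```yaml
# base (models.example.yaml)
gateway:
  host: 0.0.0.0
  port: 8000
auth:
  client_api_keys: ["change-me-client-key"]

# override (models.local.yaml)
gateway:
  port: 9000
auth:
  client_api_keys: ["real-local-key"]

# effective config: nested mappings merge key by key,
# while lists and scalars come entirely from the override
gateway:
  host: 0.0.0.0
  port: 9000
auth:
  client_api_keys: ["real-local-key"]
```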

This is useful for separating:

- real model weight paths
- local host IPs
- local API keys
- local `llama-server` paths