# Configuration
RoleMesh Gateway loads its configuration from a YAML file (default: `configs/models.yaml`).
Set the `ROLE_MESH_CONFIG` environment variable to override the path.
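For example, pointing the gateway at a machine-specific file (the path is illustrative, and this assumes the gateway is then started without an explicit `--config` flag):

```bash
# ROLE_MESH_CONFIG overrides the default configs/models.yaml
export ROLE_MESH_CONFIG=/etc/rolemesh/models.yaml
rolemesh-gateway
```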
## Top-level schema
```yaml
version: 1
default_model: writer

gateway:
  host: 0.0.0.0
  port: 8000

auth:
  client_api_keys: ["..."]
  node_api_keys: ["..."]

models:
  <alias>:
    type: proxy | discovered
    openai_model_name: <string>
    ...
```
`<alias>` is what clients pass as `model` in `/v1/chat/completions`. `openai_model_name` is the model id returned by `/v1/models` (usually the same as the alias).
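To make the mapping concrete, here is a sketch of a client call; the host, port, and key are placeholders taken from the examples in this document, and `writer` is the alias:

```bash
# The client selects the "writer" alias via the standard OpenAI "model" field.
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer change-me-client-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "writer", "messages": [{"role": "user", "content": "Hello"}]}'
```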
## Roles are aliases, not a fixed list
RoleMesh Gateway does not reserve a built-in set of roles.
- The keys under `models:` are your project-specific role names
- Clients send those keys in the OpenAI `model` field
- You can rename or replace the sample roles entirely
- Different projects can use different role layouts with the same gateway
Example custom role set:
```yaml
models:
  researcher:
    type: proxy
    openai_model_name: researcher
    proxy_url: http://127.0.0.1:8011
  summarizer:
    type: proxy
    openai_model_name: summarizer
    proxy_url: http://127.0.0.1:8012
  security-reviewer:
    type: proxy
    openai_model_name: security-reviewer
    proxy_url: http://127.0.0.1:8013
```
## Where the actual model weights are selected
This depends on the backend pattern.
### For `type: proxy`
The gateway alias does not point directly to a weight file. It points to an already-running inference server:
```yaml
models:
  writer:
    type: proxy
    proxy_url: http://127.0.0.1:8012
```
The actual model weights are chosen by that upstream server, not by RoleMesh Gateway.
Examples:

- `llamafile --server -m /path/to/model.gguf ...`
- `llama-server -m /path/to/model.gguf ...`
- Ollama with `defaults.model: dolphin3:latest`
| Upstream type | Where weights/model are chosen | RoleMesh fields involved |
|---|---|---|
| `llamafile --server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| `llama-server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| Ollama | request JSON `model`, optionally injected by the gateway | `proxy_url`, `defaults.model` |
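For the Ollama row, a sketch of what that looks like in the gateway config; the alias `chat` is illustrative, 11434 is Ollama's usual default port, and `defaults.model` names the model the gateway can inject into the forwarded request:

```yaml
models:
  chat:
    type: proxy
    openai_model_name: chat
    proxy_url: http://127.0.0.1:11434   # local Ollama server (illustrative)
    defaults:
      model: dolphin3:latest            # per the table: optionally injected by the gateway
```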
### For `type: discovered`
The gateway still does not point directly to a weight file. It points to a role served by a registered node. The actual weight file is defined on the node side, usually in the node-agent config:
```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```
In that setup:
- gateway alias -> discovered role
- discovered role -> registered node + concrete upstream model ID
- node-agent `path` -> actual weight file on disk
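For completeness, the matching gateway-side entry for that node-agent snippet could look like this sketch (the `discovered` fields are described under Discovered models below):

```yaml
models:
  planner:
    type: discovered
    openai_model_name: planner
    role: planner   # matched against the node-agent roles list above
```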
## Proxy models
Route to a fixed upstream (any host reachable from the gateway):
```yaml
models:
  writer:
    type: proxy
    openai_model_name: writer
    proxy_url: http://127.0.0.1:8012
    defaults:
      temperature: 0.6
```
Notes:
- The model alias (`writer` above) is what the client sends in `model`.
- `openai_model_name` is what the gateway returns from `GET /v1/models`.
- `proxy_url` is the actual upstream backend to call.
- `defaults` are only applied when the incoming request does not already set those keys.
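For example, this request omits `temperature`, so the gateway fills in the configured default of `0.6` before forwarding; had the client set `temperature` explicitly, that value would be kept:

```json
{
  "model": "writer",
  "messages": [{"role": "user", "content": "Draft an intro paragraph."}]
}
```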
## Discovered models
Route to a dynamically registered model instance that can satisfy the role:
```yaml
models:
  reviewer:
    type: discovered
    openai_model_name: reviewer
    role: reviewer
    strategy: round_robin
```
Supported discovered-node strategies:
- `round_robin`: rotate requests across fresh matching nodes
- `random`: choose a fresh matching node at random for each request
## Registering nodes
Nodes register via `POST /v1/nodes/register`:
```json
{
  "node_id": "gpu-box-1",
  "base_url": "http://10.0.0.12:8014",
  "served_models": [
    {
      "model_id": "qwen3-8b",
      "roles": ["reviewer", "planner"],
      "meta": {"family": "Qwen3", "quant": "Q5_K_M"}
    },
    {
      "model_id": "qwen2.5-coder-14b",
      "roles": ["coder"],
      "meta": {"family": "Qwen2.5-Coder", "quant": "Q5_K_M"}
    }
  ],
  "meta": {"gpu": "Tesla P40", "notes": "llama-server on GPU0"}
}
```
`served_models` is now the preferred registration schema.

- `model_id`: concrete model name the upstream node expects in the forwarded OpenAI request
- `roles`: workflow roles that this model can satisfy
- `meta`: optional operator-facing metadata
Legacy flat `roles` registration is still accepted for compatibility, but it is treated as a fallback where `model_id == role`.
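A node agent could perform this registration with a plain HTTP call. A sketch, assuming the gateway listens on `127.0.0.1:8000` and the payload above is saved as `register.json` (the node-key header is covered under Authentication below):

```bash
curl -X POST http://127.0.0.1:8000/v1/nodes/register \
  -H "Authorization: Bearer change-me-node-key" \
  -H "Content-Type: application/json" \
  -d @register.json
```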
## Authentication

If `auth.client_api_keys` is set (non-empty), callers of `/v1/models` and `/v1/chat/completions` must provide an API key.

If `auth.node_api_keys` is set (non-empty), node agents calling `/v1/nodes/register` and `/v1/nodes/heartbeat` must provide a node key.
Supported headers:
- Clients: `Authorization: Bearer <key>` or `X-Api-Key: <key>`
- Nodes: `Authorization: Bearer <node_key>` or `X-RoleMesh-Node-Key: <node_key>`
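For example, a client listing the available aliases with the alternate header form (placeholder key, default host/port):

```bash
curl -H "X-Api-Key: change-me-client-key" http://127.0.0.1:8000/v1/models
```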
## Availability behavior
Configured aliases are not automatically assumed healthy.
- `GET /v1/models` probes configured upstreams and only returns aliases that are currently reachable
- unavailable aliases are included separately in `rolemesh.unavailable_models`
- `GET /ready` returns success only when the configured `default_model` is currently usable
- discovered nodes are only considered routable while they are fresh
- gateway metadata marks stale registered nodes so operators can distinguish them from healthy nodes
This is especially important when a config contains multiple optional roles but only some backends are up.
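As a rough illustration (the exact response envelope is an assumption; only the `rolemesh.unavailable_models` field is documented above), a `GET /v1/models` call with the `planner` backend down might return something like:

```json
{
  "object": "list",
  "data": [
    {"id": "writer", "object": "model"}
  ],
  "rolemesh": {"unavailable_models": ["planner"]}
}
```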
For discovered-node freshness, the gateway uses the `ROLE_MESH_NODE_STALE_AFTER_S` environment variable. Default: `30`.
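For example, to tolerate slower heartbeats (the `_S` suffix suggests the value is in seconds):

```bash
export ROLE_MESH_NODE_STALE_AFTER_S=60
```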
## Quick example
```yaml
version: 1
default_model: planner

auth:
  client_api_keys: ["change-me-client-key"]
  node_api_keys: ["change-me-node-key"]

models:
  planner:
    type: proxy
    openai_model_name: planner
    proxy_url: http://127.0.0.1:8011
    defaults:
      temperature: 0
      max_tokens: 128
  writer:
    type: proxy
    openai_model_name: writer
    proxy_url: http://127.0.0.1:8012
    defaults:
      temperature: 0.6
      max_tokens: 256
```
## Base config plus local override
Recommended pattern:
- keep tracked repo config generic
- keep machine-specific values in a separate local YAML
- merge the local YAML at launch
Gateway:

```bash
rolemesh-gateway --config configs/models.example.yaml --config-override configs/models.local.yaml
```

Node agent:

```bash
rolemesh-node-agent --config configs/node_agent.example.yaml --config-override configs/node_agent.local.yaml
```
The merge is recursive for mappings:
- nested dictionaries are merged
- lists and scalar values are replaced by the override file
This is useful for separating:
- real model weight paths
- local host IPs
- local API keys
- local API keys
- local `llama-server` paths
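A sketch of what an untracked `configs/models.local.yaml` could contain; per the merge rules above, the nested `proxy_url` overrides just that key, while the `client_api_keys` list replaces the base list wholesale:

```yaml
# configs/models.local.yaml -- machine-specific values, kept out of version control
models:
  writer:
    proxy_url: http://10.0.0.12:8012       # real local host IP (illustrative)
auth:
  client_api_keys: ["my-real-client-key"]  # lists are replaced, not merged
```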