RoleMesh Gateway

RoleMesh Gateway logo

RoleMesh Gateway is a lightweight OpenAI-compatible API gateway that routes chat-completion requests to multiple locally hosted LLM backends (e.g., llama.cpp's llama-server) by role (planner, writer, coder, reviewer, …).

It is designed for agentic workflows that benefit from using different models for different steps, and for deployments where different machines host different models (e.g., a GPU box for fast inference, a large-memory CPU box for very large models).

What you get

  • OpenAI-compatible endpoints:
    • GET /v1/models
    • POST /v1/chat/completions (streaming and non-streaming)
    • GET /health and GET /ready
  • Model registry from configs/models.yaml
  • Optional node registration so remote machines can announce role backends to the gateway
  • Robust proxying with explicit httpx timeouts (no “hang forever”)
  • Structured logging with request IDs

Quick Start

This is the fastest path to a working local setup.

1. Install

python -m venv .venv
source .venv/bin/activate
pip install -e .

2. Start two OpenAI-compatible backends

Any backend that exposes GET /v1/models and POST /v1/chat/completions will work. One practical option is llamafile in server mode:

llamafile --server -m /path/to/planner-model.gguf --host 127.0.0.1 --port 8011 --nobrowser
llamafile --server -m /path/to/writer-model.gguf  --host 127.0.0.1 --port 8012 --nobrowser
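
To confirm a backend is up before wiring it into the gateway, you can query its models endpoint directly (port 8011 here is the planner backend started above):

curl -sS http://127.0.0.1:8011/v1/models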

3. Create a gateway config

version: 1
default_model: planner
auth:
  client_api_keys:
    - "change-me-client-key"
models:
  planner:
    type: proxy
    openai_model_name: planner
    proxy_url: http://127.0.0.1:8011
    defaults:
      temperature: 0
      max_tokens: 128
  writer:
    type: proxy
    openai_model_name: writer
    proxy_url: http://127.0.0.1:8012
    defaults:
      temperature: 0.6
      max_tokens: 256

Save that as configs/models.yaml.

4. Run the gateway

ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 127.0.0.1 --port 8000

5. Verify it

curl -sS http://127.0.0.1:8000/v1/models \
  -H 'X-Api-Key: change-me-client-key'
curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Api-Key: change-me-client-key' \
  -d '{
    "model": "planner",
    "messages": [{"role":"user","content":"Say hello in 3 words."}]
  }'
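
Streaming goes through the same endpoint using the standard OpenAI-style stream flag; -N keeps curl from buffering the response:

curl -sS -N -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Api-Key: change-me-client-key' \
  -d '{
    "model": "writer",
    "stream": true,
    "messages": [{"role":"user","content":"Say hello in 3 words."}]
  }'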

If you prefer the provided example file, copy configs/models.example.yaml and adjust the proxy_url values.

Known Good Inference Backends

The gateway is designed to work with any backend that exposes OpenAI-compatible GET /v1/models and POST /v1/chat/completions endpoints. The following applications have been exercised successfully with this project.

Ollama

  • Verified directly against http://127.0.0.1:11434
  • Verified through RoleMesh Gateway proxy routing
  • Tested with model dolphin3:latest

Example upstream:

models:
  planner:
    type: proxy
    openai_model_name: planner
    proxy_url: http://127.0.0.1:11434
    defaults:
      model: dolphin3:latest

Note: when proxying to Ollama's OpenAI-compatible API, the upstream Ollama model name still needs to be supplied. One simple pattern is to set it in defaults.model and let the gateway inject it.
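
With that config, clients keep addressing the role name and the gateway supplies the upstream name; under this setup, a request like the following should reach Ollama as dolphin3:latest:

curl -sS -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Api-Key: change-me-client-key' \
  -d '{"model": "planner", "messages": [{"role":"user","content":"Say hello in 3 words."}]}'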

Llamafile

  • Verified directly with the newer llamafile runner in tmp-codex/llamafile
  • Verified through RoleMesh Gateway proxy routing
  • Verified role switching between two live backends
  • Tested successfully with:
    • phi-2.Q5_K_M.llamafile
    • rocket-3b.Q5_K_M.llamafile

Example launch:

./llamafile --server -m /path/to/model.gguf --host 127.0.0.1 --port 8011 --nobrowser

llama.cpp / llama-server

  • Verified live through the RoleMesh Node Agent on NVIDIA GPUs
  • Tested with /home/netuser/bin/llama.cpp/build/bin/llama-server
  • Tested model load and model switching on Tesla P40 GPUs
  • Tested successfully with:
    • gemma-2b-it-q8_0.gguf
    • Mistral-7B-Instruct-v0.3-Q5_K_M.gguf

The node agent now waits for llama-server readiness during model load or model switch before proxying the first request, which avoids transient "Loading model" failures on cold start.
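
Conceptually, that wait is equivalent to polling llama-server's /health endpoint until it stops reporting that the model is loading, along these lines (assuming the server listens on 127.0.0.1:8011):

# llama-server returns a non-2xx status from GET /health while the model loads;
# -f makes curl fail on those, so the loop exits once the server is ready
until curl -sf http://127.0.0.1:8011/health > /dev/null; do
  sleep 1
done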

Multi-host (node registration)

If you want machines to host backends and “register” them dynamically, run a tiny node agent on each backend host (or just call the registration endpoint from your own tooling).

  • Gateway endpoint: POST /v1/nodes/register
  • Node payload describes which roles it serves and the base URL to reach its OpenAI-compatible backend.
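
A minimal registration call might look like the following; the payload field names (node_id, base_url, roles) and the node API key header are illustrative assumptions, so check docs/CONFIG.md for the actual schema:

curl -sS -X POST http://127.0.0.1:8000/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -H 'X-Api-Key: change-me-node-key' \
  -d '{
    "node_id": "gpu-box-1",
    "base_url": "http://10.0.0.5:8011",
    "roles": ["planner", "coder"]
  }'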

See: docs/DEPLOYMENT.md and docs/CONFIG.md.

Status

This repository is a preliminary scaffold:

  • Proxying to OpenAI-compatible upstreams works.
  • Node registration and backend selection are implemented (basic round-robin).
  • API-key auth for clients and nodes is available.
  • Persistence is basic JSON-backed state, not a full service registry.
  • Gateway proxying has been exercised live with Ollama and llamafile.
  • Node-agent managed inference has been exercised live with llama-server on CUDA hardware.

License

MIT. See LICENSE.

Node Agent (per-host)

This repo also includes a RoleMesh Node Agent (rolemesh-node-agent) that can manage persistent llama.cpp servers (one per GPU) and report inventory/metrics back to the gateway.

  • Sample config: configs/node_agent.example.yaml
  • Docs: docs/NODE_AGENT.md
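
As a sketch, launching the agent against the sample config might look like this; the flag name is an assumption, so see docs/NODE_AGENT.md for the actual invocation:

# hypothetical invocation; confirm the real CLI in docs/NODE_AGENT.md
rolemesh-node-agent --config configs/node_agent.example.yaml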

Safe-by-default binding

The gateway and the node agent default to binding on 127.0.0.1 to avoid accidental exposure. If you need remote access, bind only to private LAN or VPN interfaces and firewall the ports.
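
For example, to expose the gateway only on a private VPN interface (10.8.0.1 is an illustrative address), bind uvicorn to that address and restrict the port with your firewall:

ROLE_MESH_CONFIG=configs/models.yaml uvicorn rolemesh_gateway.main:app --host 10.8.0.1 --port 8000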