Clarified model weight selection in different modes
parent 79923983a0
commit 0226f7526d

@@ -39,6 +39,59 @@ Examples of project-specific roles:

If your workflow changes, update the `models:` section in config rather than treating the example roles as required.

## Where model weights are defined

This project has two backend patterns, and the model weights are defined in a different place for each.

### Proxy mode

In gateway proxy mode, the gateway does **not** point directly to a GGUF or other weight file. It only points to an upstream inference server:

```yaml
models:
  planner:
    type: proxy
    proxy_url: http://127.0.0.1:8011
```

In that setup, the actual model weights are chosen by the upstream server itself.

Examples:

- `llamafile --server -m /path/to/model.gguf ...`
- `llama-server -m /path/to/model.gguf ...`
- Ollama with `defaults.model: some-model-name`

So in proxy mode:

- RoleMesh alias `planner` -> upstream server at `proxy_url`
- upstream server -> actual weight file or model name

| Upstream type | Where weights/model are chosen | What RoleMesh config provides |
| --- | --- | --- |
| `llamafile --server` | CLI `-m /path/to/model.gguf` when the server starts | `proxy_url` |
| `llama-server` | CLI `-m /path/to/model.gguf` when the server starts | `proxy_url` |
| Ollama OpenAI-compatible API | request body `model`, often injected via `defaults.model` | `proxy_url` plus optional `defaults.model` |
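
In practice this means a client never names a weight file when talking to the gateway. A minimal sketch of a request, assuming the gateway exposes an OpenAI-compatible chat endpoint on port 8000 and accepts the role alias as the `model` name (both the port and the endpoint shape are assumptions, not taken from this commit):

```bash
# Hypothetical request: "planner" is only a RoleMesh alias; the gateway
# forwards it to the upstream at proxy_url, and that upstream decides
# which weights actually serve the completion.
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "planner", "messages": [{"role": "user", "content": "hello"}]}'
```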

### Node-agent mode

In node-agent mode, the weight file is defined explicitly in the node-agent config:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```

In that setup:

- `model_id` is the model name exposed by the node agent
- `path` is the actual GGUF weight file to load
- `roles` are the role labels the node can serve when discovery is used

So in node-agent mode:

- node-agent `model_id` -> exact weight file path via `path`
- gateway discovered alias -> node role -> node-agent model load
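
On the gateway side, this flow starts from a `type: discovered` entry. A minimal sketch, assuming the alias name itself is what gets matched against the roles advertised by registered nodes (the exact field layout for discovered entries is not shown in this commit):

```yaml
# Hypothetical gateway entry: still no weight file here; resolution goes
# alias -> discovered role -> registered node -> node-agent models[].path.
models:
  planner:
    type: discovered
```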

## Quick Start

This is the fastest path to a working local setup.

@@ -89,6 +142,7 @@ models:

Save that as `configs/models.yaml`.

You are not limited to `planner` and `writer`. Those are just placeholders for whatever roles your project needs.
In this proxy example, the actual weight files are defined by the two backend processes started in step 2, not by the gateway config.
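
A sketch of what step 2 can look like with two local `llama-server` processes; the model paths are placeholders, and whether you use llamafile, llama-server, or Ollama as the backends is up to you:

```bash
# Hypothetical backends for the planner and writer aliases; the -m paths
# are where the weights are actually chosen, matching proxy_url ports
# 8011 and 8012 from the gateway config.
llama-server -m /path/to/planner.gguf --host 127.0.0.1 --port 8011 &
llama-server -m /path/to/writer.gguf --host 127.0.0.1 --port 8012 &
```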

### 4. Run the gateway

@@ -156,6 +210,8 @@ Example launch:

./llamafile --server -m /path/to/model.gguf --host 127.0.0.1 --port 8011 --nobrowser
```

In this case, `/path/to/model.gguf` is where the actual weights are chosen, and RoleMesh only points to that running server.

### llama.cpp / llama-server

- Verified live through the RoleMesh Node Agent on NVIDIA GPUs

@@ -19,6 +19,8 @@ auth:

# - type: discovered (resolved from registered nodes by role)
# The names under "models" are project-defined role aliases, not a fixed built-in list.
# Rename or replace planner/writer/coder/reviewer with whatever your workflow needs.
# In proxy mode, the actual weight file is chosen by the upstream server behind proxy_url.
# In discovered mode, the actual weight file is chosen on the node side (for example via node-agent models[].path).
models:
  planner:
    type: proxy

@@ -18,6 +18,7 @@ model_roots:

models:
  - model_id: "planner-gguf"
    # path is the exact GGUF file that this model_id will load when requested
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
    default_ctx: 8192

@@ -51,6 +51,51 @@ models:

    proxy_url: http://127.0.0.1:8013
```

## Where the actual model weights are selected

This depends on the backend pattern.

### For `type: proxy`

The gateway alias does **not** point directly to a weight file. It points to an already-running inference server:

```yaml
models:
  writer:
    type: proxy
    proxy_url: http://127.0.0.1:8012
```

The actual model weights are chosen by that upstream server, not by RoleMesh Gateway.

Examples:

- `llamafile --server -m /path/to/model.gguf ...`
- `llama-server -m /path/to/model.gguf ...`
- Ollama with `defaults.model: dolphin3:latest`

| Upstream type | Where weights/model are chosen | RoleMesh fields involved |
| --- | --- | --- |
| `llamafile --server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| `llama-server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| Ollama | request JSON `model`, optionally injected by the gateway | `proxy_url`, `defaults.model` |
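
For the Ollama row, `defaults.model` is where the gateway can inject a model name into the request body. A minimal sketch, assuming `defaults.model` nests under the alias as shown and that the upstream is a local Ollama instance on its default port 11434 (both details are assumptions, not shown in this commit):

```yaml
# Hypothetical entry: the gateway still holds no weight file; it only adds
# "dolphin3:latest" to the request JSON, and Ollama maps that name to weights.
models:
  writer:
    type: proxy
    proxy_url: http://127.0.0.1:11434
    defaults:
      model: dolphin3:latest
```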

### For `type: discovered`

The gateway still does not point directly to a weight file. It points to a role served by a registered node. The actual weight file is defined on the node side, usually in the node-agent config:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```

In that setup:

- gateway alias -> discovered role
- discovered role -> registered node
- node-agent `path` -> actual weight file on disk

## Proxy models

Route to a fixed upstream (any host reachable from the gateway):

@@ -7,6 +7,21 @@ The **RoleMesh Node Agent** runs on each compute host and manages **persistent**

- register + heartbeat to the Dispatcher/Gateway (`/v1/nodes/register`, `/v1/nodes/heartbeat`)
- report inventory + utilization (`/v1/node/inventory`)

## Where the weight file is configured

For the node agent, the actual model weights are specified directly in the node-agent config under `models[].path`:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```

- `model_id`: name exposed by the node agent API
- `path`: exact GGUF file to load
- `roles`: role labels this model can satisfy when the node registers with a gateway
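
The same layout extends to several models on one node. A minimal sketch, assuming the list syntax above; the second entry and its GGUF path are purely hypothetical, not taken from this commit:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
  - model_id: "writer-gguf"                       # hypothetical second entry
    path: "/models/SomeWriterModel.Q5_K_M.gguf"   # assumed path, adjust to your host
    roles: ["writer", "reviewer"]                 # roles is a list, so one model can advertise several labels
```

Because `roles` is a list, a single entry can advertise more than one role label if that fits your workflow.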

## Persistent server model

For each GPU device, the node agent starts a dedicated `llama-server` process, pinned via