Clarified model weight selection in different modes
parent 79923983a0
commit 0226f7526d

@@ -39,6 +39,59 @@ Examples of project-specific roles:

If your workflow changes, update the `models:` section in config rather than treating the example roles as required.

## Where model weights are defined

This project has two backend patterns, and the model weights are defined in a different place for each.

### Proxy mode

In gateway proxy mode, the gateway does **not** point directly to a GGUF or other weight file. It only points to an upstream inference server:

```yaml
models:
  planner:
    type: proxy
    proxy_url: http://127.0.0.1:8011
```

In that setup, the actual model weights are chosen by the upstream server itself.

Examples:

- `llamafile --server -m /path/to/model.gguf ...`
- `llama-server -m /path/to/model.gguf ...`
- Ollama with `defaults.model: some-model-name`

So in proxy mode:

- RoleMesh alias `planner` -> upstream server at `proxy_url`
- upstream server -> actual weight file or model name

| Upstream type | Where weights/model are chosen | What RoleMesh config provides |
| --- | --- | --- |
| `llamafile --server` | CLI `-m /path/to/model.gguf` when the server starts | `proxy_url` |
| `llama-server` | CLI `-m /path/to/model.gguf` when the server starts | `proxy_url` |
| Ollama OpenAI-compatible API | request body `model`, often injected via `defaults.model` | `proxy_url` plus optional `defaults.model` |
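
In practice this means a client never names a weight file when talking to the gateway. A minimal sketch of a request, assuming the gateway exposes an OpenAI-compatible chat endpoint on port 8000 and accepts the role alias as the `model` name (both the port and the endpoint shape are assumptions, not taken from this commit):

```bash
# Hypothetical request: "planner" is only a RoleMesh alias; the gateway
# forwards it to the upstream at proxy_url, and that upstream decides
# which weights actually serve the completion.
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "planner", "messages": [{"role": "user", "content": "hello"}]}'
```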

### Node-agent mode

In node-agent mode, the weight file is defined explicitly in the node-agent config:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```

In that setup:

- `model_id` is the model name exposed by the node agent
- `path` is the actual GGUF weight file to load
- `roles` are the role labels the node can serve when discovery is used

So in node-agent mode:

- node-agent `model_id` -> exact weight file path via `path`
- gateway discovered alias -> node role -> node-agent model load
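
On the gateway side, this flow starts from a `type: discovered` entry. A minimal sketch, assuming the alias name itself is what gets matched against the roles advertised by registered nodes (the exact field layout for discovered entries is not shown in this commit):

```yaml
# Hypothetical gateway entry: still no weight file here; resolution goes
# alias -> discovered role -> registered node -> node-agent models[].path.
models:
  planner:
    type: discovered
```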

## Quick Start

This is the fastest path to a working local setup.

@@ -89,6 +142,7 @@ models:

Save that as `configs/models.yaml`.

You are not limited to `planner` and `writer`. Those are just placeholders for whatever roles your project needs.
In this proxy example, the actual weight files are defined by the two backend processes started in step 2, not by the gateway config.
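
A sketch of what step 2 can look like with two local `llama-server` processes; the model paths are placeholders, and whether you use llamafile, llama-server, or Ollama as the backends is up to you:

```bash
# Hypothetical backends for the planner and writer aliases; the -m paths
# are where the weights are actually chosen, matching proxy_url ports
# 8011 and 8012 from the gateway config.
llama-server -m /path/to/planner.gguf --host 127.0.0.1 --port 8011 &
llama-server -m /path/to/writer.gguf --host 127.0.0.1 --port 8012 &
```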

### 4. Run the gateway

@@ -156,6 +210,8 @@ Example launch:

./llamafile --server -m /path/to/model.gguf --host 127.0.0.1 --port 8011 --nobrowser
```

In this case, `/path/to/model.gguf` is where the actual weights are chosen, and RoleMesh only points to that running server.

### llama.cpp / llama-server

- Verified live through the RoleMesh Node Agent on NVIDIA GPUs

@@ -19,6 +19,8 @@ auth:

# - type: discovered (resolved from registered nodes by role)
# The names under "models" are project-defined role aliases, not a fixed built-in list.
# Rename or replace planner/writer/coder/reviewer with whatever your workflow needs.
# In proxy mode, the actual weight file is chosen by the upstream server behind proxy_url.
# In discovered mode, the actual weight file is chosen on the node side (for example via node-agent models[].path).
models:
  planner:
    type: proxy

@@ -18,6 +18,7 @@ model_roots:

models:
  - model_id: "planner-gguf"
    # path is the exact GGUF file that this model_id will load when requested
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
    default_ctx: 8192

@@ -51,6 +51,51 @@ models:

    proxy_url: http://127.0.0.1:8013
```

## Where the actual model weights are selected

This depends on the backend pattern.

### For `type: proxy`

The gateway alias does **not** point directly to a weight file. It points to an already-running inference server:

```yaml
models:
  writer:
    type: proxy
    proxy_url: http://127.0.0.1:8012
```

The actual model weights are chosen by that upstream server, not by RoleMesh Gateway.

Examples:

- `llamafile --server -m /path/to/model.gguf ...`
- `llama-server -m /path/to/model.gguf ...`
- Ollama with `defaults.model: dolphin3:latest`

| Upstream type | Where weights/model are chosen | RoleMesh fields involved |
| --- | --- | --- |
| `llamafile --server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| `llama-server` | backend startup CLI, usually `-m /path/to/model.gguf` | `proxy_url` |
| Ollama | request JSON `model`, optionally injected by the gateway | `proxy_url`, `defaults.model` |
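
For the Ollama row, `defaults.model` is where the gateway can inject a model name into the request body. A minimal sketch, assuming `defaults.model` nests under the alias as shown and that the upstream is a local Ollama instance on its default port 11434 (both details are assumptions, not shown in this commit):

```yaml
# Hypothetical entry: the gateway still holds no weight file; it only adds
# "dolphin3:latest" to the request JSON, and Ollama maps that name to weights.
models:
  writer:
    type: proxy
    proxy_url: http://127.0.0.1:11434
    defaults:
      model: dolphin3:latest
```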

### For `type: discovered`

The gateway still does not point directly to a weight file. It points to a role served by a registered node. The actual weight file is defined on the node side, usually in the node-agent config:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```

In that setup:

- gateway alias -> discovered role
- discovered role -> registered node
- node-agent `path` -> actual weight file on disk

## Proxy models

Route to a fixed upstream (any host reachable from the gateway):

@@ -7,6 +7,21 @@ The **RoleMesh Node Agent** runs on each compute host and manages **persistent**

- register + heartbeat to the Dispatcher/Gateway (`/v1/nodes/register`, `/v1/nodes/heartbeat`)
- report inventory + utilization (`/v1/node/inventory`)

## Where the weight file is configured

For the node agent, the actual model weights are specified directly in the node-agent config under `models[].path`:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
```

- `model_id`: name exposed by the node agent API
- `path`: exact GGUF file to load
- `roles`: role labels this model can satisfy when the node registers with a gateway
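
The same layout extends to several models on one node. A minimal sketch, assuming the list syntax above; the second entry and its GGUF path are purely hypothetical, not taken from this commit:

```yaml
models:
  - model_id: "planner-gguf"
    path: "/models/SomePlannerModel.Q5_K_M.gguf"
    roles: ["planner"]
  - model_id: "writer-gguf"                       # hypothetical second entry
    path: "/models/SomeWriterModel.Q5_K_M.gguf"   # assumed path, adjust to your host
    roles: ["writer", "reviewer"]                 # roles is a list, so one model can advertise several labels
```

Because `roles` is a list, a single entry can advertise more than one role label if that fits your workflow.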

## Persistent server model

For each GPU device, the node agent starts a dedicated `llama-server` process, pinned via