# Node Agent
The RoleMesh Node Agent runs on each compute host and manages persistent llama.cpp servers
(one per device, e.g. one per GPU). It can:
- expose OpenAI-compatible endpoints locally (`/v1/models`, `/v1/chat/completions`) (see the example below)
- register + heartbeat to the Dispatcher/Gateway (`/v1/nodes/register`, `/v1/nodes/heartbeat`)
- report inventory + utilization (`/v1/node/inventory`)
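
Once a local `llama-server` is up, its endpoints can be exercised with any OpenAI-compatible client. A minimal sketch, assuming a server listening on port 8081 (the actual port is whatever the node agent assigned) and a placeholder model id:

```python
import json
import urllib.request

# Hypothetical local port; the real value comes from the node-agent config.
BASE = "http://127.0.0.1:8081"

# List the models the local server is currently serving.
with urllib.request.urlopen(f"{BASE}/v1/models") as resp:
    print(json.load(resp))

# Send a minimal chat completion request in the OpenAI-compatible format.
req = urllib.request.Request(
    f"{BASE}/v1/chat/completions",
    data=json.dumps({
        "model": "local-model",  # placeholder model id
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```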
## Persistent server model
For each GPU device, the node agent starts a dedicated `llama-server` process, pinned via
environment variables (e.g. `CUDA_VISIBLE_DEVICES=0` for `gpu:0`) and bound to `127.0.0.1:<port>`.
In the scaffold, model switching is handled by restarting the server process.
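
A minimal sketch of this launch pattern, assuming `llama-server` is on `PATH`; the port and model path here are made up, and the scaffold's actual process supervision lives inside the node agent:

```python
import os
import subprocess

def start_llama_server(device_index: int, port: int, model_path: str) -> subprocess.Popen:
    """Launch one llama-server pinned to a single GPU (illustrative only)."""
    env = os.environ.copy()
    # Pin the process to gpu:<device_index> so each server owns exactly one device.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return subprocess.Popen(
        [
            "llama-server",
            "--model", model_path,
            "--host", "127.0.0.1",
            "--port", str(port),
        ],
        env=env,
    )

# e.g. gpu:0 served on 127.0.0.1:8081
proc = start_llama_server(device_index=0, port=8081, model_path="/models/model.gguf")
```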
## Backends
Adapters are implemented as runtime backends:
- `cuda`: scaffold implementation (NVIDIA via `nvidia-smi`)
- `metal`, `rocm`, `sycl`, `vulkan`: stubs with placeholders for device discovery and metrics
The framework keeps scheduling decisions backend-agnostic by standardizing on
`DeviceRef` + `DeviceMetrics` + `ensure_server(...)`.
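
The field and method names below are illustrative rather than the scaffold's real types; the point is that the scheduler only ever sees this narrow, backend-agnostic surface:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class DeviceRef:
    backend: str   # e.g. "cuda", "metal"
    index: int     # device index, e.g. 0 for gpu:0

@dataclass
class DeviceMetrics:
    utilization: float     # 0.0 - 1.0
    memory_used_mb: int
    memory_total_mb: int

class RuntimeBackend(Protocol):
    def discover_devices(self) -> list[DeviceRef]: ...
    def collect_metrics(self, device: DeviceRef) -> DeviceMetrics: ...
    def ensure_server(self, device: DeviceRef, model_path: str) -> str:
        """Start (or reuse) the llama-server for this device; return its base URL."""
        ...
```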
## Running
    pip install -e .
    rolemesh-node-agent --config configs/node_agent.example.yaml
## Registering
If `dispatcher_base_url` is set in the node-agent config, the node agent periodically calls
`POST <dispatcher>/v1/nodes/heartbeat` with the latest device metrics.
Registration is currently manual from the node side (or can be added as a startup step).
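
The heartbeat is an ordinary JSON POST. A sketch with a made-up payload shape and dispatcher URL (the actual schema is whatever the Dispatcher API defines):

```python
import json
import urllib.request

DISPATCHER = "http://dispatcher.example.com:8000"  # dispatcher_base_url from the config

# Hypothetical payload; field names here are assumptions for illustration.
payload = {
    "node_id": "node-01",
    "devices": [
        {"device": "gpu:0", "utilization": 0.42,
         "memory_used_mb": 6144, "memory_total_mb": 24576},
    ],
}

req = urllib.request.Request(
    f"{DISPATCHER}/v1/nodes/heartbeat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```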