
GenieHive Translation Support

This note describes the control-plane and node configuration needed to support translation clients such as SciSiteForge.

GenieHive already exposes the core transport needed for translation:

  • POST /v1/chat/completions
  • client API keys
  • role-based routing
  • OpenAI-compatible upstream services

Translation support is mostly a matter of configuration discipline.
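From the client side, a translation call is an ordinary role-routed chat completion. The sketch below only builds the request; the base URL and API key are placeholders, while the `/v1/chat/completions` path, the `X-Api-Key` header, and the `scientific_translator` role id come from this document.

```python
import json

BASE_URL = "http://localhost:8000"  # hypothetical control-plane address


def build_translation_request(text: str, target_lang: str, api_key: str):
    """Return (url, headers, body) for a role-routed translation call."""
    url = f"{BASE_URL}/v1/chat/completions"
    headers = {
        "X-Api-Key": api_key,  # client key, not the node key
        "Content-Type": "application/json",
    }
    body = {
        # Role alias; GenieHive resolves it to an actual chat service.
        "model": "scientific_translator",
        "messages": [
            {"role": "user", "content": f"Translate to {target_lang}:\n{text}"}
        ],
    }
    return url, headers, json.dumps(body)
```

The tuple can then be sent with any HTTP client; nothing about the request shape is translation-specific beyond the role alias and prompt.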

Control Plane

The control plane should provide a translation-oriented role, or a directly addressable model, that the client can target.

Recommended control-plane changes:

  1. Add a dedicated role, for example scientific_translator.
  2. Keep it as operation: "chat".
  3. Use a conservative prompt policy that returns translation only.
  4. Prefer a stable, instruction-following model family.
  5. Keep the role in the loaded role catalog so it appears in /v1/models and route-resolution output.

Example role entry:

- role_id: "scientific_translator"
  display_name: "Scientific Translator"
  description: "Translation-oriented chat route for site localization"
  operation: "chat"
  modality: "text"
  prompt_policy:
    system_prompt: "Translate faithfully. Preserve meaning, structure, citations, and technical terms. Return only the translation."
    request_policy:
      body_defaults:
        temperature: 0.1
  routing_policy:
    preferred_families: ["Qwen3", "Mistral", "Llama"]
    min_context: 8192
    require_loaded: true

What matters operationally:

  • the role must resolve to a healthy chat service
  • the role should stay loaded on a model with enough context for page-sized paragraph batches
  • the control plane should not silently route translation requests to a low-context or partially loaded fallback unless that is explicitly intended
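Since the role catalog surfaces roles through /v1/models, a quick presence check can gate deployments. A minimal sketch, assuming the control plane returns the standard OpenAI-style list shape with role ids in the "id" field:

```python
def role_available(models_response: dict, role_id: str) -> bool:
    """Check an OpenAI-style /v1/models listing for a role id.

    Sketch only: assumes roles appear as model entries with an
    "id" field, per the OpenAI list format this document implies.
    """
    return any(m.get("id") == role_id for m in models_response.get("data", []))
```

This confirms the role is in the loaded catalog; it does not by itself prove the role resolves to a healthy, loaded service, so route-resolution output should still be checked separately.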

Auth and Exposure

Keep the same separation used for other GenieHive clients:

  • X-Api-Key for client requests
  • X-GenieHive-Node-Key for node registration and heartbeats

If the control plane is exposed beyond localhost, prefer a reverse proxy and keep the upstream control port private.
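One common shape for that, sketched here with nginx. The hostname, ports, and TLS details are placeholders; the point is that only the proxy listens publicly while the control port stays on loopback.

```nginx
# Hypothetical reverse-proxy front for the control plane.
server {
    listen 443 ssl;
    server_name geniehive.example.com;     # placeholder hostname

    location /v1/ {
        proxy_pass http://127.0.0.1:8000;  # private control-plane port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```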

Node Requirements

A node that is meant to serve translation traffic should expose one or more healthy chat services that can accept small, repeated requests.

Recommended node configuration:

  • chat service kind: chat
  • runtime: any OpenAI-compatible upstream that GenieHive can route to
  • assets: a loaded instruction-following model
  • observed latency and throughput: populated so scoring can prefer the right node
  • accept_requests: true

Example service snippet:

services:
  - service_id: "atlas-01/chat/qwen3-8b"
    kind: "chat"
    endpoint: "http://127.0.0.1:18091"
    runtime:
      engine: "llama.cpp"
      launcher: "managed"
    assets:
      - asset_id: "qwen3-8b-q4km"
        loaded: true
    state:
      health: "healthy"
      load_state: "loaded"
      accept_requests: true
    observed:
      p50_latency_ms: 900
      tokens_per_sec: 40

For translation, loaded state matters more than raw capacity. A node that is nominally available but not loaded is a poor default target for a localization job that will touch many pages.

Node-Side Practices

Use the node agent to keep the registry current:

  • heartbeat frequently enough that the control plane sees the service as fresh
  • publish loaded assets honestly
  • keep queue and latency metrics current when possible
  • separate translation services from other high-latency or experimental routes

If a translation model is available through multiple runtimes, prefer the one that keeps response shape stable and context handling predictable.
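The heartbeat itself is just an authenticated payload describing the node's services. A minimal sketch: only the X-GenieHive-Node-Key header name comes from this document; the payload field names are illustrative, and the endpoint path is whatever the node agent is configured to use.

```python
import json
import time


def build_heartbeat(node_key: str, services: list):
    """Sketch of a node heartbeat request (headers, JSON body).

    Field names in the body are hypothetical; the header name
    comes from the GenieHive auth scheme described above.
    """
    headers = {
        "X-GenieHive-Node-Key": node_key,  # node key, not the client key
        "Content-Type": "application/json",
    }
    body = {
        "timestamp": int(time.time()),
        "services": services,  # publish loaded assets honestly
    }
    return headers, json.dumps(body)
```

Sending this frequently enough keeps the service "fresh" in the registry, which is what lets scoring prefer loaded translation nodes.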

Routing Advice

For translation clients, the most useful route behavior is usually:

  • a translation role with a stable model family preference
  • require_loaded: true
  • enough context to keep paragraph-level requests coherent
  • predictable prompt policy, not aggressive prompt rewriting

That keeps the client config simple. The client can point to a role alias and let GenieHive pick the actual service.
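A client config in that shape might look like the fragment below. Every key name here is illustrative rather than a documented SciSiteForge schema; the grounded parts are the role alias and the temperature matching the role's body_defaults.

```yaml
# Hypothetical translation-client config: targets the role alias
# and lets GenieHive pick the actual service.
geniehive:
  base_url: "https://geniehive.example.com"   # proxy or control plane
  api_key_env: "GENIEHIVE_API_KEY"
  model: "scientific_translator"              # role alias from the catalog
  temperature: 0.1                            # matches the role's body_defaults
```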

What Not to Do

  • Do not rely on a transient model alias unless you are willing to update the client config when the alias changes.
  • Do not expose raw upstream model endpoints directly to the translation client if GenieHive is already in the path.
  • Do not route translation through a node that cannot maintain enough context for the content size you expect.

Minimal Support Checklist

  • translation role present in the role catalog
  • client API key enabled
  • node API key enabled
  • at least one healthy chat service with a loaded model
  • route resolution confirms the translation role resolves to that service
  • client can reach the control plane or reverse proxy