Documentation of translation processes

2026-04-29 15:30:19 +00:00 · 2026-04-29 15:30:19 +00:00 · 9276505339
parent e4f8b14437
commit 9276505339
2 changed files with 147 additions and 0 deletions
--- a/README.md
+++ b/README.md
@ -22,6 +22,7 @@ Repository layout:
 - `docs/roadmap.md`: current milestones and near-term priorities
 - `docs/schemas.md`: canonical data models
 - `docs/deployment.md`: intended deployment approach
 - `docs/translation_support.md`: translation-oriented control-plane and node notes
 - `docs/demo.md`: first end-to-end control-plus-node demo flow
 - `docs/llm_demo.md`: detailed master/peer/client LLM demo runbook
 - `docs/reverse_proxy.md`: safer external exposure patterns
--- a/docs/translation_support.md
+++ b/docs/translation_support.md
@ -0,0 +1,146 @@
 # GenieHive Translation Support
 This note describes the control-plane and node configuration needed to support
 translation clients such as SciSiteForge.
 GenieHive already exposes the core transport needed for translation:
 - `POST /v1/chat/completions`
 - client API keys
 - role-based routing
 - OpenAI-compatible upstream services
 Translation support is mostly a matter of configuration discipline.
 ## Control Plane
 The control plane should provide a translation-oriented role or directly
 addressable model that the client can target.
 Recommended control-plane changes:
 1. Add a dedicated role, for example `scientific_translator`.
 2. Keep it as `operation: "chat"`.
 3. Use a conservative prompt policy that returns translation only.
 4. Prefer a stable, instruction-following model family.
 5. Keep the role in the loaded role catalog so it appears in `/v1/models`
   and route-resolution output.
 Example role entry:
 ```yaml
 - role_id: "scientific_translator"
  display_name: "Scientific Translator"
  description: "Translation-oriented chat route for site localization"
  operation: "chat"
  modality: "text"
  prompt_policy:
    system_prompt: "Translate faithfully. Preserve meaning, structure, citations, and technical terms. Return only the translation."
    request_policy:
      body_defaults:
        temperature: 0.1
  routing_policy:
    preferred_families: ["Qwen3", "Mistral", "Llama"]
    min_context: 8192
    require_loaded: true
 ```
 What matters operationally:
 - the role must resolve to a healthy chat service
 - the role should stay loaded on a model with enough context for page-sized
  paragraph batches
 - the control plane should not silently route translation requests to a
  low-context or partially loaded fallback unless that is explicitly intended
 ## Auth and Exposure
 Keep the same separation used for other GenieHive clients:
 - `X-Api-Key` for client requests
 - `X-GenieHive-Node-Key` for node registration and heartbeats
 If the control plane is exposed beyond localhost, prefer a reverse proxy and
 keep the upstream control port private.
 ## Node Requirements
 A node that is meant to serve translation traffic should expose one or more
 healthy chat services that can accept small, repeated requests.
 Recommended node configuration:
 - chat service kind: `chat`
 - runtime: any OpenAI-compatible upstream that GenieHive can route to
 - assets: a loaded instruction-following model
 - observed latency and throughput: populated so scoring can prefer the right
  node
 - `accept_requests: true`
 Example service snippet:
 ```yaml
 services:
  - service_id: "atlas-01/chat/qwen3-8b"
    kind: "chat"
    endpoint: "http://127.0.0.1:18091"
    runtime:
      engine: "llama.cpp"
      launcher: "managed"
    assets:
      - asset_id: "qwen3-8b-q4km"
        loaded: true
    state:
      health: "healthy"
      load_state: "loaded"
      accept_requests: true
    observed:
      p50_latency_ms: 900
      tokens_per_sec: 40
 ```
 For translation, loaded state matters more than raw capacity. A node that is
 nominally available but not loaded is a poor default target for a localization
 job that will touch many pages.
 ## Node-Side Practices
 Use the node agent to keep the registry current:
 - heartbeat frequently enough that the control plane sees the service as fresh
 - publish loaded assets honestly
 - keep queue and latency metrics current when possible
 - separate translation services from other high-latency or experimental routes
 If a translation model is available through multiple runtimes, prefer the one
 that keeps response shape stable and context handling predictable.
 ## Routing Advice
 For translation clients, the most useful route behavior is usually:
 - a translation role with a stable model family preference
 - `require_loaded: true`
 - enough context to keep paragraph-level requests coherent
 - predictable prompt policy, not aggressive prompt rewriting
 That keeps the client config simple. The client can point to a role alias and
 let GenieHive pick the actual service.
 ## What Not to Do
 - Do not rely on a transient model alias unless you are willing to update the
  client config when the alias changes.
 - Do not expose raw upstream model endpoints directly to the translation client
  if GenieHive is already in the path.
 - Do not route translation through a node that cannot maintain enough context
  for the content size you expect.
 ## Minimal Support Checklist
 - translation role present in the role catalog
 - client API key enabled
 - node API key enabled
 - at least one healthy chat service with a loaded model
 - route resolution confirms the translation role resolves to that service
 - client can reach the control plane or reverse proxy