Documentation of translation processes
This commit is contained in:
parent
e4f8b14437
commit
9276505339
|
|
@ -22,6 +22,7 @@ Repository layout:
|
||||||
- `docs/roadmap.md`: current milestones and near-term priorities
|
- `docs/roadmap.md`: current milestones and near-term priorities
|
||||||
- `docs/schemas.md`: canonical data models
|
- `docs/schemas.md`: canonical data models
|
||||||
- `docs/deployment.md`: intended deployment approach
|
- `docs/deployment.md`: intended deployment approach
|
||||||
|
- `docs/translation_support.md`: translation-oriented control-plane and node notes
|
||||||
- `docs/demo.md`: first end-to-end control-plus-node demo flow
|
- `docs/demo.md`: first end-to-end control-plus-node demo flow
|
||||||
- `docs/llm_demo.md`: detailed master/peer/client LLM demo runbook
|
- `docs/llm_demo.md`: detailed master/peer/client LLM demo runbook
|
||||||
- `docs/reverse_proxy.md`: safer external exposure patterns
|
- `docs/reverse_proxy.md`: safer external exposure patterns
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,146 @@
|
||||||
|
# GenieHive Translation Support
|
||||||
|
|
||||||
|
This note describes the control-plane and node configuration needed to support
|
||||||
|
translation clients such as SciSiteForge.
|
||||||
|
|
||||||
|
GenieHive already exposes the core transport needed for translation:
|
||||||
|
|
||||||
|
- `POST /v1/chat/completions`
|
||||||
|
- client API keys
|
||||||
|
- role-based routing
|
||||||
|
- OpenAI-compatible upstream services
|
||||||
|
|
||||||
|
Translation support is mostly a matter of configuration discipline.
|
||||||
|
|
||||||
|
## Control Plane
|
||||||
|
|
||||||
|
The control plane should provide a translation-oriented role or directly
|
||||||
|
addressable model that the client can target.
|
||||||
|
|
||||||
|
Recommended control-plane changes:
|
||||||
|
|
||||||
|
1. Add a dedicated role, for example `scientific_translator`.
|
||||||
|
2. Keep it as `operation: "chat"`.
|
||||||
|
3. Use a conservative prompt policy that returns translation only.
|
||||||
|
4. Prefer a stable, instruction-following model family.
|
||||||
|
5. Keep the role in the loaded role catalog so it appears in `/v1/models`
|
||||||
|
and route-resolution output.
|
||||||
|
|
||||||
|
Example role entry:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
- role_id: "scientific_translator"
|
||||||
|
display_name: "Scientific Translator"
|
||||||
|
description: "Translation-oriented chat route for site localization"
|
||||||
|
operation: "chat"
|
||||||
|
modality: "text"
|
||||||
|
prompt_policy:
|
||||||
|
system_prompt: "Translate faithfully. Preserve meaning, structure, citations, and technical terms. Return only the translation."
|
||||||
|
request_policy:
|
||||||
|
body_defaults:
|
||||||
|
temperature: 0.1
|
||||||
|
routing_policy:
|
||||||
|
preferred_families: ["Qwen3", "Mistral", "Llama"]
|
||||||
|
min_context: 8192
|
||||||
|
require_loaded: true
|
||||||
|
```
|
||||||
|
|
||||||
|
What matters operationally:
|
||||||
|
|
||||||
|
- the role must resolve to a healthy chat service
|
||||||
|
- the role should stay loaded on a model with enough context for page-sized
|
||||||
|
paragraph batches
|
||||||
|
- the control plane should not silently route translation requests to a
|
||||||
|
low-context or partially loaded fallback unless that is explicitly intended
|
||||||
|
|
||||||
|
## Auth and Exposure
|
||||||
|
|
||||||
|
Keep the same separation used for other GenieHive clients:
|
||||||
|
|
||||||
|
- `X-Api-Key` for client requests
|
||||||
|
- `X-GenieHive-Node-Key` for node registration and heartbeats
|
||||||
|
|
||||||
|
If the control plane is exposed beyond localhost, prefer a reverse proxy and
|
||||||
|
keep the upstream control port private.
|
||||||
|
|
||||||
|
## Node Requirements
|
||||||
|
|
||||||
|
A node that is meant to serve translation traffic should expose one or more
|
||||||
|
healthy chat services that can accept small, repeated requests.
|
||||||
|
|
||||||
|
Recommended node configuration:
|
||||||
|
|
||||||
|
- chat service kind: `chat`
|
||||||
|
- runtime: any OpenAI-compatible upstream that GenieHive can route to
|
||||||
|
- assets: a loaded instruction-following model
|
||||||
|
- observed latency and throughput: populated so scoring can prefer the right
|
||||||
|
node
|
||||||
|
- `accept_requests: true`
|
||||||
|
|
||||||
|
Example service snippet:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
services:
|
||||||
|
- service_id: "atlas-01/chat/qwen3-8b"
|
||||||
|
kind: "chat"
|
||||||
|
endpoint: "http://127.0.0.1:18091"
|
||||||
|
runtime:
|
||||||
|
engine: "llama.cpp"
|
||||||
|
launcher: "managed"
|
||||||
|
assets:
|
||||||
|
- asset_id: "qwen3-8b-q4km"
|
||||||
|
loaded: true
|
||||||
|
state:
|
||||||
|
health: "healthy"
|
||||||
|
load_state: "loaded"
|
||||||
|
accept_requests: true
|
||||||
|
observed:
|
||||||
|
p50_latency_ms: 900
|
||||||
|
tokens_per_sec: 40
|
||||||
|
```
|
||||||
|
|
||||||
|
For translation, loaded state matters more than raw capacity. A node that is
|
||||||
|
nominally available but not loaded is a poor default target for a localization
|
||||||
|
job that will touch many pages.
|
||||||
|
|
||||||
|
## Node-Side Practices
|
||||||
|
|
||||||
|
Use the node agent to keep the registry current:
|
||||||
|
|
||||||
|
- heartbeat frequently enough that the control plane sees the service as fresh
|
||||||
|
- publish loaded assets honestly
|
||||||
|
- keep queue and latency metrics current when possible
|
||||||
|
- separate translation services from other high-latency or experimental routes
|
||||||
|
|
||||||
|
If a translation model is available through multiple runtimes, prefer the one
|
||||||
|
that keeps response shape stable and context handling predictable.
|
||||||
|
|
||||||
|
## Routing Advice
|
||||||
|
|
||||||
|
For translation clients, the most useful route behavior is usually:
|
||||||
|
|
||||||
|
- a translation role with a stable model family preference
|
||||||
|
- `require_loaded: true`
|
||||||
|
- enough context to keep paragraph-level requests coherent
|
||||||
|
- predictable prompt policy, not aggressive prompt rewriting
|
||||||
|
|
||||||
|
That keeps the client config simple. The client can point to a role alias and
|
||||||
|
let GenieHive pick the actual service.
|
||||||
|
|
||||||
|
## What Not to Do
|
||||||
|
|
||||||
|
- Do not rely on a transient model alias unless you are willing to update the
|
||||||
|
client config when the alias changes.
|
||||||
|
- Do not expose raw upstream model endpoints directly to the translation client
|
||||||
|
if GenieHive is already in the path.
|
||||||
|
- Do not route translation through a node that cannot maintain enough context
|
||||||
|
for the content size you expect.
|
||||||
|
|
||||||
|
## Minimal Support Checklist
|
||||||
|
|
||||||
|
- translation role present in the role catalog
|
||||||
|
- client API key enabled
|
||||||
|
- node API key enabled
|
||||||
|
- at least one healthy chat service with a loaded model
|
||||||
|
- route resolution confirms the translation role resolves to that service
|
||||||
|
- client can reach the control plane or reverse proxy
|
||||||
Loading…
Reference in New Issue