Threat Model

This document defines the threat model for ThreeGate, including assets, adversaries, attack surfaces, mitigations, and explicit out-of-scope threats.

ThreeGate is designed for single-user local operation and prioritizes structural containment over behavioral promises.


1. Assets to Protect

Primary Assets

  • User data: notes, drafts, PDFs, research corpora, local documents
  • Secrets: API keys, tokens, credentials, SSH keys, cookies
  • System integrity: host OS, container images, configs, policy files
  • Assistant integrity: component separation, network isolation, validation pipelines
  • Provenance: citations, source traces, execution logs (auditability)

Secondary Assets

  • Model weights and caches (integrity and confidentiality)
  • Execution results and intermediate artifacts
  • System availability (denial of service is relevant but not primary)

2. Adversaries and Capabilities

A. Malicious Content Provider

  • Controls a webpage, PDF, or document that FETCH retrieves or that the user ingests
  • Attempts indirect prompt injection to cause unsafe actions

Capabilities:

  • Embed malicious instructions and deceptive content
  • Craft content to manipulate citations and reasoning
  • Provide poisoned research artifacts

B. Malicious User (or User Mistake)

  • Provides prompts that request unsafe actions
  • Pastes untrusted code for execution
  • Misconfigures allowlists or mounts

Capabilities:

  • Trigger tool requests
  • Place files into ingestion directories
  • Approve execution unintentionally

C. Supply-Chain Attacker

  • Tampers with container images, dependencies, the ERA binary, or model weights

Capabilities:

  • Replace artifacts at build or update time
  • Introduce malicious binaries or scripts

D. Network Attacker

  • Attempts MITM, DNS poisoning, or proxy abuse
  • Tries to induce exfiltration through allowed domains

Capabilities:

  • Manipulate network paths
  • Exploit weak TLS validation or DNS configuration

3. Security Goals

G1: Prevent Untrusted Content from Triggering Action

Untrusted documents must not cause execution, installation, persistence, or exfiltration.

G2: Minimize Blast Radius of Compromise

A compromise of any single component must not yield end-to-end authority.

G3: Preserve Auditability

Key actions must be attributable, logged, and reviewable:

  • Fetch operations and sources
  • Packets accepted vs quarantined
  • Execution requests and approvals
  • Execution results and metadata
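
As a minimal sketch of what an attributable, reviewable record could look like, the function below appends one JSON object per line to a log file. The field names (`ts`, `event`, `actor`, `detail`) and the event labels in the comments are illustrative assumptions, not ThreeGate's actual log format.

```python
import json
import time
from pathlib import Path


def append_audit_event(log_path: Path, event: str, actor: str, detail: dict) -> dict:
    """Append one audit record as a JSON line and return it.

    All field names here are hypothetical; they are not taken from ThreeGate.
    """
    record = {
        "ts": time.time(),   # when the action happened (epoch seconds)
        "event": event,      # e.g. "fetch", "packet_quarantined", "exec_approved"
        "actor": actor,      # which component or human performed the action
        "detail": detail,    # source URL, request id, exit code, ...
    }
    # Open in append mode so earlier records are never rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

One record per line keeps the log easy to review with ordinary text tools and easy to parse when reconstructing a sequence of events.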

G4: Enforce Least Privilege by Construction

Topology and filesystem permissions must ensure least privilege even if the model misbehaves.


4. Attack Surfaces

CORE

  • Prompt injection via Research Packets and local documents
  • Attempts to coerce policy violations (“ignore rules”, “run commands”, etc.)
  • Attempts to encode tool requests to bypass human review

FETCH

  • Malicious websites attempting instruction injection
  • Response content masquerading as policy, commands, or credentials
  • Proxy bypass attempts, domain confusion attacks

TOOL-EXEC

  • Malicious code in execution requests (intended or unintended)
  • Attempted sandbox escape (microVM/container breakout)
  • Attempts to write unexpected outputs or encode exfiltration payloads

Shared

  • Handoff directories (malformed artifacts, schema bypass)
  • Proxy allowlist and DNS resolution
  • Container runtime configuration drift

5. Key Mitigations (Mapped to Threats)

M1: Compartmentalization (CORE/FETCH/TOOL-EXEC)

Mitigates end-to-end compromise by ensuring no single component:

  • both browses and executes
  • both reasons and acts
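
The separation rule above can be expressed as a small configuration check. This is a sketch under stated assumptions: the capability labels (`reason`, `browse`, `execute`) are hypothetical, and `execute` stands in for "acts" in the second rule. Only the component names come from this document.

```python
# Hypothetical capability labels; the component names come from this document.
CAPABILITIES = {
    "CORE": {"reason"},
    "FETCH": {"browse"},
    "TOOL-EXEC": {"execute"},
}

# Pairs of capabilities that must never be held by the same component.
FORBIDDEN_PAIRS = [
    {"browse", "execute"},
    {"reason", "execute"},
]


def separation_violations(capabilities: dict) -> list:
    """Return (component, pair) tuples where a component holds a forbidden pair."""
    violations = []
    for component, caps in sorted(capabilities.items()):
        for pair in FORBIDDEN_PAIRS:
            if pair <= caps:  # the component holds both capabilities
                violations.append((component, tuple(sorted(pair))))
    return violations
```

An empty result means no component combines the capabilities that compartmentalization is meant to keep apart.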

M2: Network Topology Enforcement

  • CORE has no internet route
  • FETCH reaches the internet only via an allowlisted proxy
  • TOOL-EXEC has no network access by default

Mitigates exfiltration and unauthorized retrieval.
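
To illustrate the allowlist decision a proxy makes, here is a minimal sketch of exact-match host checking. The allowlist entries and the https-only rule are assumptions for the example; in practice the enforcement would live in the proxy's own configuration rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real entries would come from operator configuration.
ALLOWED_HOSTS = frozenset({"example.org", "docs.example.org"})


def is_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host exactly matches an allowlist entry."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").rstrip(".").lower()
    except ValueError:
        return False  # malformed URL: reject rather than guess
    if parts.scheme != "https":
        return False
    # Exact match only: no suffix or substring matching, so
    # "example.org.evil.example" does not pass as "example.org".
    return host in allowed_hosts
```

Matching on the parsed hostname, rather than on the raw URL string, is what defeats simple domain-confusion tricks such as placing an allowed name in the userinfo part of the URL.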

M3: Deterministic Validation + Quarantine

  • Research Packets must match strict schema
  • Tool results must match strict schema
  • Rejections go to quarantine; CORE never consumes them

Mitigates indirect injection and “format smuggling.”
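
The validate-or-quarantine flow can be sketched as follows. The schema fields and directory arguments are hypothetical; the document does not specify the Research Packet format. The point of the sketch is that the decision is deterministic and that rejected input is moved aside rather than repaired.

```python
import json
import shutil
from pathlib import Path

# Hypothetical strict schema: an exact key set with expected types.
PACKET_SCHEMA = {
    "source_url": str,
    "retrieved_at": str,
    "summary": str,
    "citations": list,
}


def validate_packet(obj) -> bool:
    """Accept only a dict with exactly the schema's keys and matching types."""
    if not isinstance(obj, dict) or set(obj) != set(PACKET_SCHEMA):
        return False
    return all(isinstance(obj[key], typ) for key, typ in PACKET_SCHEMA.items())


def route_packet(path: Path, accepted_dir: Path, quarantine_dir: Path) -> str:
    """Move a packet file to the accepted or quarantine directory."""
    try:
        ok = validate_packet(json.loads(path.read_text(encoding="utf-8")))
    except ValueError:
        ok = False  # non-JSON or undecodable input is rejected, never repaired
    dest = accepted_dir if ok else quarantine_dir
    dest.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(dest / path.name))
    return "accepted" if ok else "quarantined"
```

Rejecting unknown keys, not just missing ones, is what closes off "format smuggling": extra fields have no way to ride along with an otherwise valid packet.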

M4: Human Approval Gate for Execution

  • CORE may draft requests, but cannot execute
  • Human must promote execution requests into TOOL-EXEC
  • Every execution is logged

Mitigates automated tool abuse.
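
A minimal sketch of the promotion step, assuming a file-based handoff: drafted requests sit in a pending directory, and only an explicit human confirmation moves one into the directory that TOOL-EXEC reads. The function name, the directory layout, and the `confirm` callable are all hypothetical. The sketch also assumes that filesystem permissions, not this code, prevent CORE from writing to the approved directory.

```python
import hashlib
import shutil
from pathlib import Path


def promote_request(request: Path, approved_dir: Path, confirm) -> bool:
    """Move a drafted request into the approved directory only if a human confirms.

    `confirm` is a hypothetical callable that shows the prompt to the operator
    and returns True or False (for example, a wrapper around input()).
    """
    body = request.read_bytes()
    digest = hashlib.sha256(body).hexdigest()
    preview = body.decode("utf-8", "replace")
    # Show the operator exactly what would run, plus a digest for the audit log.
    prompt = "Approve " + request.name + " (sha256 " + digest[:12] + ")?\n" + preview
    if not confirm(prompt):
        return False  # the request stays in the pending directory
    approved_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(request), str(approved_dir / request.name))
    return True
```

Showing the full request body in the prompt matters: an approval is only meaningful if the operator sees what they are approving.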

M5: Read-Only Policy Mounts and Immutable Configuration

  • Policy files mounted read-only into containers
  • Configuration changes require explicit operator action

Mitigates self-modification and persistence via prompt.
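
The read-only mount itself is enforced by the container runtime, but a component can verify it at startup. The sketch below uses `os.statvfs` and `os.ST_RDONLY`, which are available on Unix only. The `/policy` path in the usage comment is an assumed mount point, not one defined by this document.

```python
import os


def is_mounted_read_only(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only (Unix)."""
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)


# Hypothetical startup check inside a container:
# if not is_mounted_read_only("/policy"):
#     raise SystemExit("policy mount is writable; refusing to start")
```

Failing closed at startup turns configuration drift into a visible error instead of a silent weakening of the policy boundary.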

M6: Supply-Chain Integrity Controls

  • Pin image digests
  • Verify releases (hash/signature where possible)
  • Keep minimal base images
  • Prefer reproducible builds

Mitigates tampered artifacts.
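
Hash verification of a downloaded artifact can be sketched in a few lines. This assumes the operator already holds a pinned SHA-256 hex digest obtained through a trusted channel; where that digest is stored, and how signatures are checked, is outside the sketch.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path: Path, pinned_hex: str) -> bool:
    """Compare the file's SHA-256 against a pinned hex digest."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.strip().lower())
```

A mismatch should stop the build or update; a hash check detects tampering only if the pinned value was not delivered alongside the artifact it protects.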


6. Explicit Out-of-Scope Threats

ThreeGate does not attempt to mitigate:

  • Hardware fault induction (e.g., RowHammer)
  • Microarchitectural side channels
  • Kernel/firmware compromise
  • Hostile multi-tenant co-residency scenarios

These threats fall outside the single-user, local operating assumptions that ThreeGate is designed for.


7. Residual Risks

Even with compartmentalization, residual risks include:

  • User approving unsafe execution requests
  • Allowlist misconfiguration enabling exfiltration channels
  • Supply-chain compromise of container images or binaries
  • Weak local host hygiene (unpatched kernel, insecure Docker daemon)

ThreeGate reduces the consequences of these failures but cannot replace operator diligence.


8. Security Posture Summary

ThreeGate assumes model fallibility and focuses on:

  • strict separation of duties
  • deterministic validation
  • constrained connectivity
  • human-gated execution
  • auditable workflows