Initial docs

2026-02-09 11:57:19 -05:00 · 2026-02-09 11:57:19 -05:00 · ded73126b8
parent 51e8ea8c60
commit ded73126b8
3 changed files with 374 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -1,3 +1,109 @@
 # ThreeGate
 **ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.
 ThreeGate separates **thinking**, **retrieval**, and **execution** into distinct, least-privileged components with enforced trust boundaries.
 > If prompt injection is inevitable, safety must come from structure.
 ---
 ## What ThreeGate Is
 ThreeGate is:
 - A **reference architecture** for secure local assistants
 - A **defense-in-depth design** against prompt injection, tool abuse, and data exfiltration
 - A **human-governed system**, not an autonomous agent
 - Designed for **single-user, local operation**
 - Explicitly extensible to multiple roles (research, policy analysis, data science, auditing)
 ---
 ## What ThreeGate Is Not
 ThreeGate is **not**:
 - An autonomous agent framework
 - A self-modifying system
 - A browsing-and-executing AI loop
 - A cloud-first or multi-tenant platform
 - A system that trusts LLM outputs without validation
 ---
 ## Core Insight
 Most unsafe AI systems fail because they allow a single component to:
 > **Read untrusted input, reason about it, and immediately act on the world.**
 ThreeGate prevents this by enforcing **three independent gates**:
 1. **FETCH** — retrieves untrusted external content
 2. **CORE** — performs reasoning and synthesis
 3. **TOOL-EXEC** — executes code, only when explicitly approved
 No single component holds the authority of more than one gate.
 ---
 ## High-Level Architecture
                Internet
                    ↑
            [ Managed Proxy ]
                    ↑
            FETCH (retrieval)
                    ↓
            Research Packets
                    ↓
             CORE (analysis)
                    ↓
       (optional, human-approved)
                    ↓
     TOOL-EXEC (sandboxed execution)
 ---
 ## Initial Target Role
 The first concrete role implemented using ThreeGate is a:
 **Secure Local Research Assistant**
 Capabilities:
 - Scholarly retrieval (controlled, allowlisted)
 - Analysis and writing
 - Optional sandboxed computation
 - No autonomous browsing or execution
 ---
 ## Repository Structure (Initial)
     ThreeGate/
     ├── README.md
     ├── docs/
     │   ├── architecture.md
     │   ├── threat-model.md
     │   └── why-this-is-safer.md
 ---
 ## Status
 This repository is in **early specification and reference implementation phase**.
 The design is intentionally conservative. Convenience features are added *only* when they preserve trust boundaries.
 ---
 ## License & Philosophy
 ThreeGate favors:
 - Explicit over implicit authority
 - Structural safety over behavioral promises
 - Human-in-the-loop over automation
 If a feature weakens a trust boundary, it does not belong here.
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,150 @@
 # ThreeGate Architecture
 This document specifies the **ThreeGate architecture**, including components, trust boundaries, data flow, and enforcement mechanisms.
 ---
 ## Design Objective
 Enable powerful, local, goal-directed AI assistance while preventing:
 - Prompt injection (direct and indirect)
 - Tool abuse
 - Data exfiltration
 - Accidental or malicious system modification
 This is achieved by **compartmentalization**, not by trusting model behavior.
 ---
 ## Core Components
 ThreeGate consists of **three isolated components**, each with a distinct role and privilege level.
 ### 1. CORE — Analysis & Synthesis
 **Responsibilities**
 - Reasoning
 - Synthesis
 - Writing
 - Policy interpretation
 - Drafting requests for retrieval or execution
 **Explicit Restrictions**
 - No internet access
 - No shell
 - No execution
 - No package installation
 - No modification of policy files
 CORE is the **most prompt-exposed component** and therefore the **least powerful**.
 ---
 ### 2. FETCH — Controlled Retrieval
 **Responsibilities**
 - Retrieve external information
 - Normalize content into *Research Packets*
 **Capabilities**
 - HTTPS access only
 - Internet access only via managed proxy
 - Domain allowlists (e.g., academic sources)
 **Explicit Restrictions**
 - No execution
 - No shell
 - No persistence beyond packet output
 - No access to CORE state
 FETCH treats all retrieved content as hostile by default.
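The packet format is not specified in this document. The following is a minimal sketch, assuming a hypothetical `ResearchPacket` record and `make_packet` helper (neither is defined by ThreeGate), of how FETCH might normalize a response into inert data before handing it to CORE:

```python
import hashlib
import unicodedata
from dataclasses import dataclass
from datetime import datetime, timezone

MAX_CHARS = 200_000  # assumed cap on packet size; longer text is truncated


@dataclass(frozen=True)
class ResearchPacket:
    """Hypothetical packet schema: plain data fields only, nothing executable."""
    source_url: str
    retrieved_at: str  # ISO 8601 timestamp in UTC
    sha256: str        # digest of the normalized text, for audit logs
    text: str          # normalized content, still untrusted
    trust: str = "untrusted"


def normalize(raw: str) -> str:
    """NFC-normalize, then drop control (Cc) and invisible format (Cf) characters except newline and tab."""
    text = unicodedata.normalize("NFC", raw)
    kept = (ch for ch in text if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf"))
    return "".join(kept)[:MAX_CHARS]


def make_packet(source_url: str, raw: str) -> ResearchPacket:
    """Wrap fetched text in an immutable, hashed, explicitly untrusted packet."""
    text = normalize(raw)
    return ResearchPacket(
        source_url=source_url,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
        sha256=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        text=text,
    )
```

Stripping zero-width and bidi-override characters removes one common hiding place for injected instructions, but it does not make the visible text trustworthy, which is why every packet keeps `trust="untrusted"`.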
 ---
 ### 3. TOOL-EXEC — Optional Execution Sandbox
 **Responsibilities**
 - Execute explicitly approved code or commands
 - Perform computations that cannot be done textually
 **Implementation**
 - Backed by sandboxed execution (e.g., microVMs such as ERA)
 - Ephemeral by default
 - No network unless explicitly approved
 **Explicit Restrictions**
 - No direct access to CORE or FETCH
 - No ambient credentials
 - No persistent state by default
 Execution is the **highest-risk capability** and is therefore isolated and human-gated.
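A minimal sketch of the human gate, assuming CORE emits execution drafts as JSON and a console prompt serves as the approval channel. The names `ExecutionRequest`, `ApprovedRequest`, `parse_draft`, and `approve` are hypothetical:

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExecutionRequest:
    """A draft as received from CORE. It carries no approval state."""
    code: str
    justification: str
    network: bool = False  # TOOL-EXEC default: no network


@dataclass(frozen=True)
class ApprovedRequest:
    """Constructed only by the host-side gate below, never parsed from CORE output."""
    request: ExecutionRequest
    approver_note: str


def parse_draft(payload: str) -> ExecutionRequest:
    data = json.loads(payload)
    # Only known fields are read; extra keys CORE adds (e.g. "approved": true) are ignored.
    return ExecutionRequest(
        code=str(data["code"]),
        justification=str(data.get("justification", "")),
        network=data.get("network") is True,  # anything but a literal JSON true means no network
    )


def approve(req: ExecutionRequest, ask: Callable[[str], str] = input) -> ApprovedRequest:
    """Show the full draft to the operator; only an explicit 'yes' yields an ApprovedRequest."""
    if not req.justification.strip():
        raise ValueError("rejected: missing justification")
    prompt = (
        f"--- code ---\n{req.code}\n--- justification ---\n{req.justification}\n"
        f"network: {'REQUESTED' if req.network else 'none'}\nApprove? [yes/no] "
    )
    if ask(prompt).strip().lower() != "yes":
        raise PermissionError("denied by operator")
    return ApprovedRequest(request=req, approver_note="approved via console")
```

The sandbox hand-off would accept only `ApprovedRequest` values. Because CORE runs in a separate process and can only send serialized drafts, it has no way to produce one itself.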
 ---
 ## Data Flow & Trust Boundaries
 All data movement is **one-way** and **validated**.
 | From | To | Direction | Validation |
 |----|----|-----------|------------|
 | FETCH | CORE | One-way | Required |
 | CORE | TOOL-EXEC | One-way (drafts only) | Human approval |
 | TOOL-EXEC | CORE | One-way | Required |
 There is **no shared mutable state** between components.
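The table above can be read as a default-deny edge list. A minimal sketch, assuming a hypothetical broker that routes every transfer between components (`Component`, `ALLOWED_FLOWS`, and `required_gate` are illustrative names):

```python
from enum import Enum


class Component(Enum):
    FETCH = "fetch"
    CORE = "core"
    TOOL_EXEC = "tool-exec"


# Each permitted edge, mapped to the check a transfer must pass before delivery.
ALLOWED_FLOWS = {
    (Component.FETCH, Component.CORE): "validate",
    (Component.CORE, Component.TOOL_EXEC): "human_approval",
    (Component.TOOL_EXEC, Component.CORE): "validate",
}


def required_gate(src: Component, dst: Component) -> str:
    """Return the check a transfer must pass, or raise if the edge is not in the table."""
    try:
        return ALLOWED_FLOWS[(src, dst)]
    except KeyError:
        raise PermissionError(f"no data path from {src.value} to {dst.value}") from None
```

Any pair not listed is refused outright rather than handled by a fallback rule.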
 ---
 ## Network Topology
 - CORE: no internet route
 - FETCH: internet access only via managed proxy
 - TOOL-EXEC: no network by default
 Network restrictions are enforced at:
 - Container network level
 - Host firewall level
 - Explicit proxy allowlists
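A minimal sketch of the allowlist check a managed proxy might apply to each FETCH request. The domains in `ALLOWED_DOMAINS` are example values only, and `is_allowed` is a hypothetical helper:

```python
from urllib.parse import urlsplit

# Example allowlist; the real list is deployment policy, not part of this sketch.
ALLOWED_DOMAINS = {"arxiv.org", "doi.org", "ncbi.nlm.nih.gov"}


def is_allowed(url: str) -> bool:
    """HTTPS only, default port only, host equal to or a subdomain of an allowlisted domain."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    if parts.username or parts.password:
        return False  # reject userinfo tricks such as https://arxiv.org@evil.example
    host = (parts.hostname or "").rstrip(".").lower()
    try:
        port = parts.port
    except ValueError:
        return False  # malformed or out-of-range port
    if port not in (None, 443):
        return False
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

This is only the application-level layer. The container network and host firewall still deny every route except the proxy, so a bug in this function alone does not open a path to the internet.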
 ---
 ## Policy Enforcement
 - Policies are mounted read-only
 - Instruction hierarchy is explicit
 - Tool usage requires justification and approval
 - Outputs are validated before reuse
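A minimal sketch of output validation, assuming (hypothetically) that TOOL-EXEC returns a JSON object with exactly the fields `exit_code`, `stdout`, and `stderr`; the schema and size cap are illustrative:

```python
import json

MAX_OUTPUT_BYTES = 64_000  # assumed cap on what TOOL-EXEC may hand back
ALLOWED_KEYS = {"exit_code": int, "stdout": str, "stderr": str}


def validate_exec_result(raw: bytes) -> dict:
    """Parse a TOOL-EXEC result and reject anything outside a fixed schema.

    Every failure surfaces as ValueError (JSONDecodeError and UnicodeDecodeError subclass it).
    """
    if len(raw) > MAX_OUTPUT_BYTES:
        raise ValueError("result too large")
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict) or set(data) != set(ALLOWED_KEYS):
        raise ValueError("unexpected fields")
    for key, typ in ALLOWED_KEYS.items():
        # bool is a subclass of int, so booleans are rejected explicitly (matters for exit_code)
        if not isinstance(data[key], typ) or isinstance(data[key], bool):
            raise ValueError(f"bad type for {key}")
    return data
```

This check is structural only. The `stdout` text that passes it is still untrusted and reaches CORE as data, never as instructions.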
 ---
 ## Failure Containment
 If any component is compromised:
 - FETCH cannot execute or persist anything beyond its packet output
 - CORE cannot browse or execute
 - TOOL-EXEC cannot exfiltrate or persist by default
 Failures are **observable, contained, and reversible**.
 ---
 ## Architectural Invariants
 The following must never be violated:
 1. No component both reasons and acts
 2. No component both browses and executes
 3. External content is hostile by default
 4. Execution is optional and sandboxed
 5. Network access is a scarce privilege
 Any extension must preserve these invariants.
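Invariants 1, 2, and 5 can be checked mechanically against a deployment description before anything starts. A minimal sketch, assuming a hypothetical `Capabilities` record per component and reading invariant 5 as "network access requires an explicit grant"; invariants 3 and 4 concern data handling and sandboxing and are not modeled here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    reasons: bool = False           # runs the model over content
    acts: bool = False              # executes code or commands
    browses: bool = False           # reaches the internet
    network_approved: bool = False  # network access explicitly granted


def invariant_violations(name: str, caps: Capabilities) -> list[str]:
    problems = []
    if caps.reasons and caps.acts:
        problems.append(f"{name}: both reasons and acts (invariant 1)")
    if caps.browses and caps.acts:
        problems.append(f"{name}: both browses and executes (invariant 2)")
    if caps.browses and not caps.network_approved:
        problems.append(f"{name}: network access without explicit grant (invariant 5)")
    return problems


# Reference deployment matching the component descriptions above.
DEPLOYMENT = {
    "FETCH": Capabilities(browses=True, network_approved=True),
    "CORE": Capabilities(reasons=True),
    "TOOL-EXEC": Capabilities(acts=True),
}


def check(deployment: dict[str, Capabilities]) -> list[str]:
    return [p for name, caps in deployment.items() for p in invariant_violations(name, caps)]
```

A launcher could call `check(DEPLOYMENT)` at startup and refuse to start any component if the returned list is non-empty.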
 ---
 ## Summary
 ThreeGate enforces safety by **structure**, not by instruction.  
 It assumes model fallibility and limits consequences accordingly.
--- a/docs/why-this-is-safer.md
+++ b/docs/why-this-is-safer.md
@@ -0,0 +1,117 @@
 # Why ThreeGate Is Safer Than Agent-Based Systems
 This document explains **why the ThreeGate architecture materially reduces risk** compared to common agent and tool-using AI frameworks.
 ---
 ## The Core Problem with Agents
 Most agent frameworks combine:
 - Untrusted input ingestion
 - Reasoning
 - Tool execution
 - Network access
 - Persistent state
 …inside a single loop.
 If prompt injection succeeds — and it eventually will — the model can immediately act with real-world authority.
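 This is an instance of the **confused deputy problem**: a component that holds legitimate authority is tricked by a less-privileged party (here, whoever authored the injected content) into exercising that authority on the attacker's behalf.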
 ---
 ## ThreeGate’s Structural Advantage
 ThreeGate prevents confused deputies by **separating authority**.
 | Capability | FETCH | CORE | TOOL-EXEC |
 |----------|-------|------|-----------|
 | Internet access | Yes (restricted) | No | No (default) |
 | Reasoning | No | Yes | No |
 | Execution | No | No | Yes (gated) |
 | Persistence | Minimal | Limited | None (default) |
 No component has enough authority to cause harm on its own.
 ---
 ## Prompt Injection Is Assumed, Not Denied
 ThreeGate assumes:
 - Prompt injection **cannot be perfectly prevented**
 - Indirect injection via documents and web pages is common
 - Behavioral safeguards alone are insufficient
 Therefore:
 - All external content is treated as data, not instructions
 - Outputs are constrained and validated
 - Consequences are limited by topology
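One way to keep external content in the data role is to fix the instruction slots at context-assembly time. A minimal sketch, assuming a hypothetical message format with a `"data"` role (not any particular model API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str     # "system", "user", or the hypothetical "data" role
    content: str


def build_context(policy: str, user_request: str, packets: list[str]) -> list[Message]:
    """Instructions come only from the read-only policy and the local user; packet text is always data."""
    messages = [Message("system", policy), Message("user", user_request)]
    messages.extend(Message("data", text) for text in packets)
    return messages


def instruction_sources(messages: list[Message]) -> list[str]:
    """Return only the content that occupies an instruction slot."""
    return [m.content for m in messages if m.role in ("system", "user")]
```

Labeling does not stop a model from being influenced by text in the data role. It guarantees only that packet text is never promoted to an instruction by the surrounding code, and the remaining risk is what topology is there to contain.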
 ---
 ## Tool Use Is the Primary Risk Multiplier
 Execution is where AI systems most often fail catastrophically.
 ThreeGate:
 - Makes execution optional
 - Requires explicit human approval
 - Sandboxes execution in an isolated environment
 - Treats execution output as hostile input
 This dramatically reduces the blast radius compared with agent loops that auto-execute.
 ---
 ## Network Access Is Physically Constrained
 Many systems rely on the model to “decide responsibly” when using the network.
 ThreeGate instead:
 - Removes network access entirely from CORE
 - Forces FETCH through an allowlisted proxy
 - Defaults TOOL-EXEC to no network
 This is **security by topology**, not trust.
 ---
 ## Residual Risk Is Explicitly Scoped
 ThreeGate does **not** claim to defend against:
 - Hardware fault induction (e.g., RowHammer)
 - Microarchitectural side channels
 - Kernel or firmware exploits
 - Hostile multi-tenant environments
 The system is designed for **single-user local operation** and documents its threat boundaries clearly.
 ---
 ## Why This Matters
 ThreeGate demonstrates a crucial shift in thinking:
 > Safety does not come from making models behave better.  
 > Safety comes from making misbehavior inconsequential.
 By breaking the agent loop into gated components, ThreeGate enables powerful assistance **without granting unbounded authority**.
 ---
 ## Summary
 ThreeGate is safer because it:
 - Eliminates confused deputies
 - Treats all external input as hostile
 - Separates reasoning from action
 - Makes execution rare and auditable
 - Enforces policy at the OS and network level
 This is not an optimization of existing agent designs.  
 It is a **different class of system**.