Initial docs

2026-02-09 11:57:19 -05:00 · ded73126b8
parent 51e8ea8c60
commit ded73126b8
3 changed files with 374 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,3 +1,109 @@
 # ThreeGate

-**ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.
+**ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.
+
+ThreeGate separates **thinking**, **retrieval**, and **execution** into distinct, least-privileged components with enforced trust boundaries.
+
+> If prompt injection is inevitable, safety must come from structure.
+
+---
+
+## What ThreeGate Is
+
+ThreeGate is:
+
+- A **reference architecture** for secure local assistants
+- A **defense-in-depth design** against prompt injection, tool abuse, and data exfiltration
+- A **human-governed system**, not an autonomous agent
+- Designed for **single-user, local operation**
+- Explicitly extensible to multiple roles (research, policy analysis, data science, auditing)
+
+---
+
+## What ThreeGate Is Not
+
+ThreeGate is **not**:
+
+- An autonomous agent framework
+- A self-modifying system
+- A browsing-and-executing AI loop
+- A cloud-first or multi-tenant platform
+- A system that trusts LLM outputs without validation
+
+---
+
+## Core Insight
+
+Most unsafe AI systems fail because they allow a single component to:
+
+> **Read untrusted input, reason about it, and immediately act on the world.**
+
+ThreeGate prevents this by enforcing **three independent gates**:
+
+1. **FETCH** — retrieves untrusted external content
+2. **CORE** — performs reasoning and synthesis
+3. **TOOL-EXEC** — executes code, only when explicitly approved
+
+No component crosses more than one gate.
+
+---
+
+## High-Level Architecture
+
+     Internet
+         ↑
+    [ Managed Proxy ]
+         ↑
+     FETCH (retrieval)
+         ↓
+    Research Packets
+         ↓
+     CORE (analysis)
+         ↓
+    (optional, human-approved)
+         ↓
+     TOOL-EXEC (sandboxed execution)
+
+---
+
+## Initial Target Role
+
+The first concrete role implemented using ThreeGate is a:
+
+**Secure Local Research Assistant**
+
+Capabilities:
+- Scholarly retrieval (controlled, allowlisted)
+- Analysis and writing
+- Optional sandboxed computation
+- No autonomous browsing or execution
+
+---
+
+## Repository Structure (Initial)
+
+    ThreeGate/
+    ├── README.md
+    ├── docs/
+    │   ├── architecture.md
+    │   ├── threat-model.md
+    │   └── why-this-is-safer.md
+
+---
+
+## Status
+
+This repository is in an **early specification and reference-implementation phase**.
+
+The design is intentionally conservative. Convenience features are added *only* when they preserve trust boundaries.
+
+---
+
+## License & Philosophy
+
+ThreeGate favors:
+- Explicit over implicit authority
+- Structural safety over behavioral promises
+- Human-in-the-loop over automation
+
+If a feature weakens a trust boundary, it does not belong here.
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,150 @@
+# ThreeGate Architecture
+
+This document specifies the **ThreeGate architecture**, including components, trust boundaries, data flow, and enforcement mechanisms.
+
+---
+
+## Design Objective
+
+Enable powerful, local, goal-directed AI assistance while preventing:
+
+- Prompt injection (direct and indirect)
+- Tool abuse
+- Data exfiltration
+- Accidental or malicious system modification
+
+This is achieved by **compartmentalization**, not by trusting model behavior.
+
+---
+
+## Core Components
+
+ThreeGate consists of **three isolated components**, each with a distinct role and privilege level.
+
+### 1. CORE — Analysis & Synthesis
+
+**Responsibilities**
+- Reasoning
+- Synthesis
+- Writing
+- Policy interpretation
+- Drafting requests for retrieval or execution
+
+**Explicit Restrictions**
+- No internet access
+- No shell
+- No execution
+- No package installation
+- No modification of policy files
+
+CORE is the **most prompt-exposed component** and therefore the **least powerful**.
+
+---
+
+### 2. FETCH — Controlled Retrieval
+
+**Responsibilities**
+- Retrieve external information
+- Normalize content into *Research Packets*
+
+**Capabilities**
+- HTTPS access only
+- Internet access only via managed proxy
+- Domain allowlists (e.g., academic sources)
+
+**Explicit Restrictions**
+- No execution
+- No shell
+- No persistence beyond packet output
+- No access to CORE state
+
+FETCH treats all retrieved content as hostile by default.
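
The "HTTPS only" and "domain allowlist" rules above can be sketched as a single URL check. This is a minimal illustration, not the project's implementation: the function name `is_fetch_allowed` and the example domains are assumptions, and in ThreeGate the authoritative allowlist would be enforced at the managed proxy rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; the real list belongs to proxy policy.
ALLOWED_DOMAINS = frozenset({"arxiv.org", "doi.org"})


def is_fetch_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True only for HTTPS URLs whose host is an allowlisted
    domain or a subdomain of one. Anything unparsable is refused."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower().rstrip(".")
    except ValueError:
        # urlsplit raises ValueError on malformed netlocs (e.g. bad IPv6).
        return False
    if parts.scheme != "https" or not host:
        return False
    # Match on a label boundary so "arxiv.org.evil.example" does not pass.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the parsed hostname, rather than searching the raw URL string, is what keeps lookalike hosts and `user@host` tricks from slipping through.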
+
+---
+
+### 3. TOOL-EXEC — Optional Execution Sandbox
+
+**Responsibilities**
+- Execute explicitly approved code or commands
+- Perform computations that cannot be done textually
+
+**Implementation**
+- Backed by sandboxed execution (e.g., microVMs such as ERA)
+- Ephemeral by default
+- No network unless explicitly approved
+
+**Explicit Restrictions**
+- No direct access to CORE or FETCH
+- No ambient credentials
+- No persistent state by default
+
+Execution is the **highest-risk capability** and is therefore isolated and human-gated.
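
One way to picture the human gate is as an approval record bound to the exact code that was reviewed. The sketch below is an assumption about how such a gate could work, not the documented design: `ExecutionRequest`, `ApprovalGate`, and the single-use behaviour are all hypothetical names and choices. It does not run anything; it only decides whether code may be handed to the sandbox.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionRequest:
    """A draft produced by CORE: code plus the reason it is needed."""
    code: str
    justification: str

    def digest(self) -> str:
        # The approval is tied to the exact bytes the human reviewed.
        return hashlib.sha256(self.code.encode("utf-8")).hexdigest()


class ApprovalGate:
    """Holds approvals granted by a human; hypothetical illustration."""

    def __init__(self) -> None:
        self._approved = set()

    def approve(self, request: ExecutionRequest) -> None:
        # Intended to be reachable only from the human-facing side.
        self._approved.add(request.digest())

    def release(self, request: ExecutionRequest) -> str:
        """Return the code for the sandbox, consuming the approval."""
        d = request.digest()
        if d not in self._approved:
            raise PermissionError("execution request has not been approved")
        self._approved.remove(d)  # single use: re-running needs re-approval
        return request.code
```

Binding the approval to a digest means a request altered after review no longer matches and is refused.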
+
+---
+
+## Data Flow & Trust Boundaries
+
+All data movement is **one-way** and **validated**.
+
+| From | To | Direction | Validation |
+|----|----|-----------|------------|
+| FETCH | CORE | One-way | Required |
+| CORE | TOOL-EXEC | One-way (draft only) | Human approval |
+| TOOL-EXEC | CORE | One-way | Required |
+
+There is **no shared mutable state** between components.
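
The table above can be read as a deny-by-default routing rule. The following sketch mirrors only the three rows listed; the names `ALLOWED_FLOWS` and `required_check` are assumptions for illustration, and any flow not in the table is refused.

```python
# (source, destination) -> check that must pass before data moves.
# Mirrors the three rows of the data-flow table; nothing else is allowed.
ALLOWED_FLOWS = {
    ("FETCH", "CORE"): "validation",
    ("CORE", "TOOL-EXEC"): "human approval",
    ("TOOL-EXEC", "CORE"): "validation",
}


def required_check(src: str, dst: str) -> str:
    """Return the check required for a flow, or refuse the flow outright."""
    try:
        return ALLOWED_FLOWS[(src, dst)]
    except KeyError:
        raise PermissionError(f"flow {src} -> {dst} is not permitted") from None
```

For example, `required_check("CORE", "TOOL-EXEC")` returns `"human approval"`, while a direct `FETCH` to `TOOL-EXEC` flow raises `PermissionError`.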
+
+---
+
+## Network Topology
+
+- CORE: no internet route
+- FETCH: internet access only via managed proxy
+- TOOL-EXEC: no network by default
+
+Network restrictions are enforced at:
+- Container network level
+- Host firewall level
+- Explicit proxy allowlists
+
+---
+
+## Policy Enforcement
+
+- Policies are mounted read-only
+- Instruction hierarchy is explicit
+- Tool usage requires justification and approval
+- Outputs are validated before reuse
+
+---
+
+## Failure Containment
+
+If any component is compromised:
+
+- FETCH cannot execute or persist
+- CORE cannot browse or execute
+- TOOL-EXEC cannot exfiltrate or persist by default
+
+Failures are **observable, contained, and reversible**.
+
+---
+
+## Architectural Invariants
+
+The following must never be violated:
+
+1. No component both reasons and acts
+2. No component both browses and executes
+3. External content is hostile by default
+4. Execution is optional and sandboxed
+5. Network access is a scarce privilege
+
+Any extension must preserve these invariants.
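
Invariants 1 and 2 lend themselves to a mechanical check when a new component is proposed. The sketch below is a hypothetical illustration: the `Component` fields and the reading of "acts" as "executes" are assumptions, not definitions taken from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    reasons: bool = False   # runs a model over input
    browses: bool = False   # can reach external content
    executes: bool = False  # can run code or commands


def violations(c: Component) -> list:
    """List which of invariants 1 and 2 a component description breaks."""
    problems = []
    if c.reasons and c.executes:  # assumption: "acts" means "executes"
        problems.append("reasons and acts")
    if c.browses and c.executes:
        problems.append("browses and executes")
    return problems


# The three components as described in this document.
THREEGATE = [
    Component("FETCH", browses=True),
    Component("CORE", reasons=True),
    Component("TOOL-EXEC", executes=True),
]
```

A typical single-loop agent, described as `Component("agent", True, True, True)`, fails both checks, whereas each entry in `THREEGATE` passes.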
+
+---
+
+## Summary
+
+ThreeGate enforces safety by **structure**, not by instruction.  
+It assumes model fallibility and limits consequences accordingly.
--- a/docs/why-this-is-safer.md
+++ b/docs/why-this-is-safer.md
@@ -0,0 +1,117 @@
+# Why ThreeGate Is Safer Than Agent-Based Systems
+
+This document explains **why the ThreeGate architecture materially reduces risk** compared to common agent and tool-using AI frameworks.
+
+---
+
+## The Core Problem with Agents
+
+Most agent frameworks combine:
+
+- Untrusted input ingestion
+- Reasoning
+- Tool execution
+- Network access
+- Persistent state
+
+…inside a single loop.
+
+If prompt injection succeeds — and it eventually will — the model can immediately act with real-world authority.
+
+This is an instance of the **confused deputy problem**: a component with legitimate authority is tricked into exercising it on an attacker's behalf.
+
+---
+
+## ThreeGate’s Structural Advantage
+
+ThreeGate prevents confused deputies by **separating authority**.
+
+| Capability | FETCH | CORE | TOOL-EXEC |
+|----------|-------|------|-----------|
+| Internet access | Yes (restricted) | No | No (default) |
+| Reasoning | No | Yes | No |
+| Execution | No | No | Yes (gated) |
+| Persistence | Minimal | Limited | None (default) |
+
+No single component holds enough authority to take in untrusted content, reason about it, and act on it alone.
+
+---
+
+## Prompt Injection Is Assumed, Not Denied
+
+ThreeGate assumes:
+
+- Prompt injection **cannot be perfectly prevented**
+- Indirect injection via documents and web pages is common
+- Behavioral safeguards alone are insufficient
+
+Therefore:
+- All external content is treated as data, not instructions
+- Outputs are constrained and validated
+- Consequences are limited by topology
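
"Data, not instructions" can be made concrete by carrying retrieved content in a structured record instead of splicing it into prompt text. The packet format is not specified in this document, so everything below is an assumption for illustration: the `ResearchPacket` fields, the `make_packet` helper, and the size cap. Wrapping content this way does not neutralise an injection on its own; it only keeps the content in a clearly delimited data field that downstream validation can inspect.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ResearchPacket:
    source_url: str
    sha256: str  # digest of the raw bytes, for later audit
    text: str    # normalized content, always handled as data


def make_packet(source_url: str, raw: bytes, max_chars: int = 20000) -> ResearchPacket:
    """Normalize raw retrieved bytes into a packet (hypothetical format)."""
    text = raw.decode("utf-8", errors="replace")
    # Drop non-printable characters (escape codes, zero-width marks),
    # keeping newlines and tabs so the text stays readable.
    text = "".join(ch for ch in text if ch in "\n\t" or ch.isprintable())
    return ResearchPacket(source_url, hashlib.sha256(raw).hexdigest(), text[:max_chars])


def to_json(packet: ResearchPacket) -> str:
    """Serialize for hand-off; the content travels only in the 'text' field."""
    return json.dumps(asdict(packet), ensure_ascii=False)
```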
+
+---
+
+## Tool Use Is the Primary Risk Multiplier
+
+Execution is where AI systems most often fail catastrophically.
+
+ThreeGate:
+- Makes execution optional
+- Requires explicit human approval
+- Sandboxes execution in an isolated environment
+- Treats execution output as hostile input
+
+This sharply reduces the blast radius compared to agent loops that execute automatically.
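
Treating execution output as hostile input could mean rejecting anything unexpected before it returns to CORE, rather than trying to clean it. The sketch below assumes a byte-size cap and a "printable UTF-8 only" rule; both, along with the name `validate_exec_output`, are hypothetical choices and not part of the specification.

```python
MAX_OUTPUT_BYTES = 64 * 1024  # hypothetical cap, chosen for illustration


def validate_exec_output(stdout: bytes, max_bytes: int = MAX_OUTPUT_BYTES) -> str:
    """Return sandbox output as text, or raise ValueError if it is suspect."""
    if len(stdout) > max_bytes:
        raise ValueError("output exceeds size cap")
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError,
    # which is a subclass of ValueError.
    text = stdout.decode("utf-8")
    for ch in text:
        if ch not in "\n\t" and not ch.isprintable():
            raise ValueError(f"disallowed character {ch!r} in output")
    return text
```

Rejecting rather than sanitizing keeps the failure observable: a human sees that the output was refused instead of receiving a silently altered result.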
+
+---
+
+## Network Access Is Structurally Constrained
+
+Many systems rely on the model to “decide responsibly” when using the network.
+
+ThreeGate instead:
+- Removes network access entirely from CORE
+- Forces FETCH through an allowlisted proxy
+- Defaults TOOL-EXEC to no network
+
+This is **security by topology**, not trust.
+
+---
+
+## Residual Risk Is Explicitly Scoped
+
+ThreeGate does **not** claim to defend against:
+
+- Hardware fault induction (e.g., RowHammer)
+- Microarchitectural side channels
+- Kernel or firmware exploits
+- Hostile multi-tenant environments
+
+The system is designed for **single-user local operation** and documents its threat boundaries clearly.
+
+---
+
+## Why This Matters
+
+ThreeGate demonstrates a crucial shift in thinking:
+
+> Safety does not come from making models behave better.  
+> Safety comes from making misbehavior inconsequential.
+
+By breaking the agent loop into gated components, ThreeGate enables powerful assistance **without granting unbounded authority**.
+
+---
+
+## Summary
+
+ThreeGate is safer because it:
+
+- Eliminates confused deputies
+- Treats all external input as hostile
+- Separates reasoning from action
+- Makes execution rare and auditable
+- Enforces policy at the OS and network level
+
+This is not an optimization of existing agent designs.  
+It is a **different class of system**.