diff --git a/README.md b/README.md
index eb02f00..c37b5ca 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,109 @@
 # ThreeGate
 
-**ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.
\ No newline at end of file
+**ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.
+
+ThreeGate separates **thinking**, **retrieval**, and **execution** into distinct, least-privileged components with enforced trust boundaries.
+
+> If prompt injection is inevitable, safety must come from structure.
+
+---
+
+## What ThreeGate Is
+
+ThreeGate is:
+
+- A **reference architecture** for secure local assistants
+- A **defense-in-depth design** against prompt injection, tool abuse, and data exfiltration
+- A **human-governed system**, not an autonomous agent
+- Designed for **single-user, local operation**
+- Explicitly extensible to multiple roles (research, policy analysis, data science, auditing)
+
+---
+
+## What ThreeGate Is Not
+
+ThreeGate is **not**:
+
+- An autonomous agent framework
+- A self-modifying system
+- A browsing-and-executing AI loop
+- A cloud-first or multi-tenant platform
+- A system that trusts LLM outputs without validation
+
+---
+
+## Core Insight
+
+Most unsafe AI systems fail because they allow a single component to:
+
+> **Read untrusted input, reason about it, and immediately act on the world.**
+
+ThreeGate prevents this by enforcing **three independent gates**:
+
+1. **FETCH** — retrieves untrusted external content
+2. **CORE** — performs reasoning and synthesis
+3. **TOOL-EXEC** — executes code, only when explicitly approved
+
+No component crosses more than one gate.
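
The one-gate rule stated above can be illustrated with a short Python sketch. This is not code from the repository: the names `Privilege` and `Component` are hypothetical, and the sketch assumes that each gate maps to exactly one privilege (retrieve, reason, or execute).

```python
from dataclasses import dataclass
from enum import Flag, auto


class Privilege(Flag):
    """Hypothetical privileges, one per gate (an assumption of this sketch)."""
    RETRIEVE = auto()  # FETCH: pull untrusted external content
    REASON = auto()    # CORE: analysis and synthesis
    EXECUTE = auto()   # TOOL-EXEC: run approved code


_ALL_PRIVILEGES = (Privilege.RETRIEVE, Privilege.REASON, Privilege.EXECUTE)


@dataclass(frozen=True)
class Component:
    name: str
    privileges: Privilege

    def __post_init__(self) -> None:
        # Reject any component that holds zero or more than one gate privilege.
        held = [p for p in _ALL_PRIVILEGES if p in self.privileges]
        if len(held) != 1:
            raise ValueError(
                f"{self.name} must hold exactly one gate privilege, got {len(held)}"
            )


# The three components from the list above, each behind a single gate.
FETCH = Component("FETCH", Privilege.RETRIEVE)
CORE = Component("CORE", Privilege.REASON)
TOOL_EXEC = Component("TOOL-EXEC", Privilege.EXECUTE)
```

Under these assumptions, a classic agent loop such as `Component("AGENT", Privilege.RETRIEVE | Privilege.EXECUTE)` fails at construction time, which is the structural (rather than behavioral) kind of check the README describes.
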
+
+---
+
+## High-Level Architecture
+
+ Internet
+ ↑
+ [ Managed Proxy ]
+ ↑
+ FETCH (retrieval)
+ ↓
+ Research Packets
+ ↓
+ CORE (analysis)
+ ↓
+(optional, human-approved)
+↓
+TOOL-EXEC (sandboxed execution)
+
+---
+
+## Initial Target Role
+
+The first concrete role implemented using ThreeGate is a:
+
+**Secure Local Research Assistant**
+
+Capabilities:
+- Scholarly retrieval (controlled, allowlisted)
+- Analysis and writing
+- Optional sandboxed computation
+- No autonomous browsing or execution
+
+---
+
+## Repository Structure (Initial)
+
+ThreeGate/
+├── README.md
+├── docs/
+│ ├── architecture.md
+│ ├── threat-model.md
+│ └── why-this-is-safer.md
+
+---
+
+## Status
+
+This repository is in **early specification and reference implementation phase**.
+
+The design is intentionally conservative. Convenience features are added *only* when they preserve trust boundaries.
+
+---
+
+## License & Philosophy
+
+ThreeGate favors:
+- Explicit over implicit authority
+- Structural safety over behavioral promises
+- Human-in-the-loop over automation
+
+If a feature weakens a trust boundary, it does not belong here.
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..3adf340
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,150 @@
+# ThreeGate Architecture
+
+This document specifies the **ThreeGate architecture**, including components, trust boundaries, data flow, and enforcement mechanisms.
+
+---
+
+## Design Objective
+
+Enable powerful, local, goal-directed AI assistance while preventing:
+
+- Prompt injection (direct and indirect)
+- Tool abuse
+- Data exfiltration
+- Accidental or malicious system modification
+
+This is achieved by **compartmentalization**, not by trusting model behavior.
+
+---
+
+## Core Components
+
+ThreeGate consists of **three isolated components**, each with a distinct role and privilege level.
+
+### 1. CORE — Analysis & Synthesis
+
+**Responsibilities**
+- Reasoning
+- Synthesis
+- Writing
+- Policy interpretation
+- Drafting requests for retrieval or execution
+
+**Explicit Restrictions**
+- No internet access
+- No shell
+- No execution
+- No package installation
+- No modification of policy files
+
+CORE is the **most prompt-exposed component** and therefore the **least powerful**.
+
+---
+
+### 2. FETCH — Controlled Retrieval
+
+**Responsibilities**
+- Retrieve external information
+- Normalize content into *Research Packets*
+
+**Capabilities**
+- HTTPS access only
+- Internet access only via managed proxy
+- Domain allowlists (e.g., academic sources)
+
+**Explicit Restrictions**
+- No execution
+- No shell
+- No persistence beyond packet output
+- No access to CORE state
+
+FETCH treats all retrieved content as hostile by default.
+
+---
+
+### 3. TOOL-EXEC — Optional Execution Sandbox
+
+**Responsibilities**
+- Execute explicitly approved code or commands
+- Perform computations that cannot be done textually
+
+**Implementation**
+- Backed by sandboxed execution (e.g., microVMs such as ERA)
+- Ephemeral by default
+- No network unless explicitly approved
+
+**Explicit Restrictions**
+- No direct access to CORE or FETCH
+- No ambient credentials
+- No persistent state by default
+
+Execution is the **highest-risk capability** and is therefore isolated and human-gated.
+
+---
+
+## Data Flow & Trust Boundaries
+
+All data movement is **one-way** and **validated**.
+
+| From | To | Direction | Validation |
+|----|----|-----------|------------|
+| FETCH | CORE | One-way | Required |
+| CORE | TOOL-EXEC | Draft only | Human approval |
+| TOOL-EXEC | CORE | One-way | Required |
+
+There is **no shared mutable state** between components.
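
The data-flow table above can be read as an allowlist of permitted transfers. The following Python sketch shows one way such a check might look. It is an illustration only: `ALLOWED_FLOWS`, `FlowDenied`, and `check_flow` are hypothetical names, and the `validated` and `approved` flags stand in for whatever validation and human-approval mechanisms a real implementation would use.

```python
# Hypothetical encoding of the data-flow table: (source, destination) -> requirement.
ALLOWED_FLOWS = {
    ("FETCH", "CORE"): "validation",
    ("CORE", "TOOL-EXEC"): "human_approval",
    ("TOOL-EXEC", "CORE"): "validation",
}


class FlowDenied(Exception):
    """Raised when a transfer is not permitted by the table."""


def check_flow(src: str, dst: str, *, validated: bool = False, approved: bool = False) -> None:
    """Raise FlowDenied unless the transfer is listed and its requirement is met."""
    requirement = ALLOWED_FLOWS.get((src, dst))
    if requirement is None:
        # Anything not in the table is denied by default.
        raise FlowDenied(f"{src} -> {dst} is not a permitted flow")
    if requirement == "validation" and not validated:
        raise FlowDenied(f"{src} -> {dst} requires validated content")
    if requirement == "human_approval" and not approved:
        raise FlowDenied(f"{src} -> {dst} requires human approval")


# A validated Research Packet may move from FETCH to CORE.
check_flow("FETCH", "CORE", validated=True)
```

Because unlisted pairs are rejected, a call such as `check_flow("CORE", "FETCH")` raises `FlowDenied` in this sketch. Whether CORE's retrieval requests travel over a separate channel is not specified in the table, so the sketch does not model it.
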
+
+---
+
+## Network Topology
+
+- CORE: no internet route
+- FETCH: internet access only via managed proxy
+- TOOL-EXEC: no network by default
+
+Network restrictions are enforced at:
+- Container network level
+- Host firewall level
+- Explicit proxy allowlists
+
+---
+
+## Policy Enforcement
+
+- Policies are mounted read-only
+- Instruction hierarchy is explicit
+- Tool usage requires justification and approval
+- Outputs are validated before reuse
+
+---
+
+## Failure Containment
+
+If any component is compromised:
+
+- FETCH cannot execute or persist
+- CORE cannot browse or execute
+- TOOL-EXEC cannot exfiltrate or persist by default
+
+Failures are **observable, contained, and reversible**.
+
+---
+
+## Architectural Invariants
+
+The following must never be violated:
+
+1. No component both reasons and acts
+2. No component both browses and executes
+3. External content is hostile by default
+4. Execution is optional and sandboxed
+5. Network access is a scarce privilege
+
+Any extension must preserve these invariants.
+
+---
+
+## Summary
+
+ThreeGate enforces safety by **structure**, not by instruction.
+It assumes model fallibility and limits consequences accordingly.
diff --git a/docs/why-this-is-safer.md b/docs/why-this-is-safer.md
new file mode 100644
index 0000000..16ce357
--- /dev/null
+++ b/docs/why-this-is-safer.md
@@ -0,0 +1,117 @@
+# Why ThreeGate Is Safer Than Agent-Based Systems
+
+This document explains **why the ThreeGate architecture materially reduces risk** compared to common agent and tool-using AI frameworks.
+
+---
+
+## The Core Problem with Agents
+
+Most agent frameworks combine:
+
+- Untrusted input ingestion
+- Reasoning
+- Tool execution
+- Network access
+- Persistent state
+
+…inside a single loop.
+
+If prompt injection succeeds — and it eventually will — the model can immediately act with real-world authority.
+
+This is known as the **confused deputy problem**.
+
+---
+
+## ThreeGate’s Structural Advantage
+
+ThreeGate prevents confused deputies by **separating authority**.
+
+| Capability | FETCH | CORE | TOOL-EXEC |
+|----------|-------|------|-----------|
+| Internet access | Yes (restricted) | No | No (default) |
+| Reasoning | No | Yes | No |
+| Execution | No | No | Yes (gated) |
+| Persistence | Minimal | Limited | None (default) |
+
+No component has enough authority to cause harm on its own.
+
+---
+
+## Prompt Injection Is Assumed, Not Denied
+
+ThreeGate assumes:
+
+- Prompt injection **cannot be perfectly prevented**
+- Indirect injection via documents and web pages is common
+- Behavioral safeguards alone are insufficient
+
+Therefore:
+- All external content is treated as data, not instructions
+- Outputs are constrained and validated
+- Consequences are limited by topology
+
+---
+
+## Tool Use Is the Primary Risk Multiplier
+
+Execution is where AI systems most often fail catastrophically.
+
+ThreeGate:
+- Makes execution optional
+- Requires explicit human approval
+- Sandboxes execution in an isolated environment
+- Treats execution output as hostile input
+
+This dramatically reduces blast radius compared to agent loops that auto-execute.
+
+---
+
+## Network Access Is Physically Constrained
+
+Many systems rely on the model to “decide responsibly” when using the network.
+
+ThreeGate instead:
+- Removes network access entirely from CORE
+- Forces FETCH through an allowlisted proxy
+- Defaults TOOL-EXEC to no network
+
+This is **security by topology**, not trust.
+
+---
+
+## Residual Risk Is Explicitly Scoped
+
+ThreeGate does **not** claim to defend against:
+
+- Hardware fault induction (e.g., RowHammer)
+- Microarchitectural side channels
+- Kernel or firmware exploits
+- Hostile multi-tenant environments
+
+The system is designed for **single-user local operation** and documents its threat boundaries clearly.
+
+---
+
+## Why This Matters
+
+ThreeGate demonstrates a crucial shift in thinking:
+
+> Safety does not come from making models behave better.
+> Safety comes from making misbehavior inconsequential.
+
+By breaking the agent loop into gated components, ThreeGate enables powerful assistance **without granting unbounded authority**.
+
+---
+
+## Summary
+
+ThreeGate is safer because it:
+
+- Eliminates confused deputies
+- Treats all external input as hostile
+- Separates reasoning from action
+- Makes execution rare and auditable
+- Enforces policy at the OS and network level
+
+This is not an optimization of existing agent designs.
+It is a **different class of system**.