Initial docs

This commit is contained in:
welsberr 2026-02-09 11:57:19 -05:00
parent 51e8ea8c60
commit ded73126b8
3 changed files with 374 additions and 1 deletions

106
README.md
View File

@ -1,3 +1,109 @@
# ThreeGate # ThreeGate
**ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely. **ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.
ThreeGate separates **thinking**, **retrieval**, and **execution** into distinct, least-privileged components with enforced trust boundaries.
> If prompt injection is inevitable, safety must come from structure.
---
## What ThreeGate Is
ThreeGate is:
- A **reference architecture** for secure local assistants
- A **defense-in-depth design** against prompt injection, tool abuse, and data exfiltration
- A **human-governed system**, not an autonomous agent
- Designed for **single-user, local operation**
- Explicitly extensible to multiple roles (research, policy analysis, data science, auditing)
---
## What ThreeGate Is Not
ThreeGate is **not**:
- An autonomous agent framework
- A self-modifying system
- A browsing-and-executing AI loop
- A cloud-first or multi-tenant platform
- A system that trusts LLM outputs without validation
---
## Core Insight
Most unsafe AI systems fail because they allow a single component to:
> **Read untrusted input, reason about it, and immediately act on the world.**
ThreeGate prevents this by enforcing **three independent gates**:
1. **FETCH** — retrieves untrusted external content
2. **CORE** — performs reasoning and synthesis
3. **TOOL-EXEC** — executes code, only when explicitly approved
No component crosses more than one gate.
---
## High-Level Architecture
Internet
[ Managed Proxy ]
FETCH (retrieval)
Research Packets
CORE (analysis)
(optional, human-approved)
TOOL-EXEC (sandboxed execution)
---
## Initial Target Role
The first concrete role implemented using ThreeGate is a:
**Secure Local Research Assistant**
Capabilities:
- Scholarly retrieval (controlled, allowlisted)
- Analysis and writing
- Optional sandboxed computation
- No autonomous browsing or execution
---
## Repository Structure (Initial)
ThreeGate/
├── README.md
├── docs/
│ ├── architecture.md
│ ├── threat-model.md
│ └── why-this-is-safer.md
---
## Status
This repository is in **early specification and reference implementation phase**.
The design is intentionally conservative. Convenience features are added *only* when they preserve trust boundaries.
---
## License & Philosophy
ThreeGate favors:
- Explicit over implicit authority
- Structural safety over behavioral promises
- Human-in-the-loop over automation
If a feature weakens a trust boundary, it does not belong here.

150
docs/architecture.md Normal file
View File

@ -0,0 +1,150 @@
# ThreeGate Architecture
This document specifies the **ThreeGate architecture**, including components, trust boundaries, data flow, and enforcement mechanisms.
---
## Design Objective
Enable powerful, local, goal-directed AI assistance while preventing:
- Prompt injection (direct and indirect)
- Tool abuse
- Data exfiltration
- Accidental or malicious system modification
This is achieved by **compartmentalization**, not by trusting model behavior.
---
## Core Components
ThreeGate consists of **three isolated components**, each with a distinct role and privilege level.
### 1. CORE — Analysis & Synthesis
**Responsibilities**
- Reasoning
- Synthesis
- Writing
- Policy interpretation
- Drafting requests for retrieval or execution
**Explicit Restrictions**
- No internet access
- No shell
- No execution
- No package installation
- No modification of policy files
CORE is the **most prompt-exposed component** and therefore the **least powerful**.
---
### 2. FETCH — Controlled Retrieval
**Responsibilities**
- Retrieve external information
- Normalize content into *Research Packets*
**Capabilities**
- HTTPS access only
- Internet access only via managed proxy
- Domain allowlists (e.g., academic sources)
**Explicit Restrictions**
- No execution
- No shell
- No persistence beyond packet output
- No access to CORE state
FETCH treats all retrieved content as hostile by default.
---
### 3. TOOL-EXEC — Optional Execution Sandbox
**Responsibilities**
- Execute explicitly approved code or commands
- Perform computations that cannot be done textually
**Implementation**
- Backed by sandboxed execution (e.g., microVMs such as ERA)
- Ephemeral by default
- No network unless explicitly approved
**Explicit Restrictions**
- No direct access to CORE or FETCH
- No ambient credentials
- No persistent state by default
Execution is the **highest-risk capability** and is therefore isolated and human-gated.
---
## Data Flow & Trust Boundaries
All data movement is **one-way** and **validated**.
| From | To | Direction | Validation |
|----|----|-----------|------------|
| FETCH | CORE | One-way | Required |
| CORE | TOOL-EXEC | Draft only | Human approval |
| TOOL-EXEC | CORE | One-way | Required |
There is **no shared mutable state** between components.
---
## Network Topology
- CORE: no internet route
- FETCH: internet access only via managed proxy
- TOOL-EXEC: no network by default
Network restrictions are enforced at:
- Container network level
- Host firewall level
- Explicit proxy allowlists
---
## Policy Enforcement
- Policies are mounted read-only
- Instruction hierarchy is explicit
- Tool usage requires justification and approval
- Outputs are validated before reuse
---
## Failure Containment
If any component is compromised:
- FETCH cannot execute or persist
- CORE cannot browse or execute
- TOOL-EXEC cannot exfiltrate or persist by default
Failures are **observable, contained, and reversible**.
---
## Architectural Invariants
The following must never be violated:
1. No component both reasons and acts
2. No component both browses and executes
3. External content is hostile by default
4. Execution is optional and sandboxed
5. Network access is a scarce privilege
Any extension must preserve these invariants.
---
## Summary
ThreeGate enforces safety by **structure**, not by instruction.
It assumes model fallibility and limits consequences accordingly.

117
docs/why-this-is-safer.md Normal file
View File

@ -0,0 +1,117 @@
# Why ThreeGate Is Safer Than Agent-Based Systems
This document explains **why the ThreeGate architecture materially reduces risk** compared to common agent and tool-using AI frameworks.
---
## The Core Problem with Agents
Most agent frameworks combine:
- Untrusted input ingestion
- Reasoning
- Tool execution
- Network access
- Persistent state
…inside a single loop.
If prompt injection succeeds — and it eventually will — the model can immediately act with real-world authority.
This is known as the **confused deputy problem**.
---
## ThreeGates Structural Advantage
ThreeGate prevents confused deputies by **separating authority**.
| Capability | FETCH | CORE | TOOL-EXEC |
|----------|-------|------|-----------|
| Internet access | Yes (restricted) | No | No (default) |
| Reasoning | No | Yes | No |
| Execution | No | No | Yes (gated) |
| Persistence | Minimal | Limited | None (default) |
No component has enough authority to cause harm on its own.
---
## Prompt Injection Is Assumed, Not Denied
ThreeGate assumes:
- Prompt injection **cannot be perfectly prevented**
- Indirect injection via documents and web pages is common
- Behavioral safeguards alone are insufficient
Therefore:
- All external content is treated as data, not instructions
- Outputs are constrained and validated
- Consequences are limited by topology
---
## Tool Use Is the Primary Risk Multiplier
Execution is where AI systems most often fail catastrophically.
ThreeGate:
- Makes execution optional
- Requires explicit human approval
- Sandboxes execution in an isolated environment
- Treats execution output as hostile input
This dramatically reduces blast radius compared to agent loops that auto-execute.
---
## Network Access Is Physically Constrained
Many systems rely on the model to “decide responsibly” when using the network.
ThreeGate instead:
- Removes network access entirely from CORE
- Forces FETCH through an allowlisted proxy
- Defaults TOOL-EXEC to no network
This is **security by topology**, not trust.
---
## Residual Risk Is Explicitly Scoped
ThreeGate does **not** claim to defend against:
- Hardware fault induction (e.g., RowHammer)
- Microarchitectural side channels
- Kernel or firmware exploits
- Hostile multi-tenant environments
The system is designed for **single-user local operation** and documents its threat boundaries clearly.
---
## Why This Matters
ThreeGate demonstrates a crucial shift in thinking:
> Safety does not come from making models behave better.
> Safety comes from making misbehavior inconsequential.
By breaking the agent loop into gated components, ThreeGate enables powerful assistance **without granting unbounded authority**.
---
## Summary
ThreeGate is safer because it:
- Eliminates confused deputies
- Treats all external input as hostile
- Separates reasoning from action
- Makes execution rare and auditable
- Enforces policy at the OS and network level
This is not an optimization of existing agent designs.
It is a **different class of system**.