Initial docs
parent 51e8ea8c60
commit ded73126b8
108 README.md
@@ -1,3 +1,109 @@
# ThreeGate

**ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.

ThreeGate separates **thinking**, **retrieval**, and **execution** into distinct, least-privileged components with enforced trust boundaries.

> If prompt injection is inevitable, safety must come from structure.

---

## What ThreeGate Is

ThreeGate is:

- A **reference architecture** for secure local assistants
- A **defense-in-depth design** against prompt injection, tool abuse, and data exfiltration
- A **human-governed system**, not an autonomous agent
- Designed for **single-user, local operation**
- Explicitly extensible to multiple roles (research, policy analysis, data science, auditing)

---

## What ThreeGate Is Not

ThreeGate is **not**:

- An autonomous agent framework
- A self-modifying system
- A browsing-and-executing AI loop
- A cloud-first or multi-tenant platform
- A system that trusts LLM outputs without validation

---

## Core Insight

Most unsafe AI systems fail because they allow a single component to:

> **Read untrusted input, reason about it, and immediately act on the world.**

ThreeGate prevents this by enforcing **three independent gates**:

1. **FETCH**: retrieves untrusted external content
2. **CORE**: performs reasoning and synthesis
3. **TOOL-EXEC**: executes code, only when explicitly approved

No component crosses more than one gate.

---

## High-Level Architecture

Internet
↑
[ Managed Proxy ]
↑
FETCH (retrieval)
↓
Research Packets
↓
CORE (analysis)
↓
(optional, human-approved)
↓
TOOL-EXEC (sandboxed execution)

---

## Initial Target Role

The first concrete role implemented using ThreeGate is a:

**Secure Local Research Assistant**

Capabilities:
- Scholarly retrieval (controlled, allowlisted)
- Analysis and writing
- Optional sandboxed computation
- No autonomous browsing or execution

---

## Repository Structure (Initial)

ThreeGate/
├── README.md
├── docs/
│   ├── architecture.md
│   ├── threat-model.md
│   └── why-this-is-safer.md

---

## Status

This repository is in an **early specification and reference implementation phase**.

The design is intentionally conservative. Convenience features are added *only* when they preserve trust boundaries.

---

## License & Philosophy

ThreeGate favors:
- Explicit over implicit authority
- Structural safety over behavioral promises
- Human-in-the-loop over automation

If a feature weakens a trust boundary, it does not belong here.

@@ -0,0 +1,150 @@
# ThreeGate Architecture

This document specifies the **ThreeGate architecture**, including components, trust boundaries, data flow, and enforcement mechanisms.

---

## Design Objective

Enable powerful, local, goal-directed AI assistance while preventing:

- Prompt injection (direct and indirect)
- Tool abuse
- Data exfiltration
- Accidental or malicious system modification

This is achieved by **compartmentalization**, not by trusting model behavior.

---

## Core Components

ThreeGate consists of **three isolated components**, each with a distinct role and privilege level.

### 1. CORE: Analysis & Synthesis

**Responsibilities**
- Reasoning
- Synthesis
- Writing
- Policy interpretation
- Drafting requests for retrieval or execution

**Explicit Restrictions**
- No internet access
- No shell
- No execution
- No package installation
- No modification of policy files

CORE is the **most prompt-exposed component** and therefore the **least powerful**.

---

### 2. FETCH: Controlled Retrieval

**Responsibilities**
- Retrieve external information
- Normalize content into *Research Packets*

**Capabilities**
- HTTPS access only
- Internet access only via managed proxy
- Domain allowlists (e.g., academic sources)

**Explicit Restrictions**
- No execution
- No shell
- No persistence beyond packet output
- No access to CORE state

FETCH treats all retrieved content as hostile by default.

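
The HTTPS-only and domain-allowlist rules above can be illustrated with a small check. This is a minimal sketch, not the project's implementation: the function name `is_fetch_allowed` and the example domains are assumptions made here, and in ThreeGate the authoritative enforcement point is the managed proxy, not code running inside FETCH. Subdomains of an allowlisted domain are accepted, which is a design choice of this sketch rather than something the source specifies.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the source only says "academic sources".
ALLOWED_DOMAINS = frozenset({"arxiv.org", "doi.org"})


def is_fetch_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Return True only for HTTPS URLs whose host is an allowlisted
    domain or a subdomain of one."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are denied.
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    if not host:
        return False
    # Match the exact domain or a dot-separated subdomain, so that
    # "evilarxiv.org" does not pass as "arxiv.org".
    return any(host == d or host.endswith("." + d) for d in allowed)
```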

---

### 3. TOOL-EXEC: Optional Execution Sandbox

**Responsibilities**
- Execute explicitly approved code or commands
- Perform computations that cannot be done textually

**Implementation**
- Backed by sandboxed execution (e.g., microVMs such as ERA)
- Ephemeral by default
- No network unless explicitly approved

**Explicit Restrictions**
- No direct access to CORE or FETCH
- No ambient credentials
- No persistent state by default

Execution is the **highest-risk capability** and is therefore isolated and human-gated.

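
The human gate in front of TOOL-EXEC can be sketched as a function that refuses to forward a draft unless a reviewer callback approves it. Everything named here (`ExecRequest`, `ApprovalRequired`, `submit`, and the two callbacks) is hypothetical and chosen for illustration; the source does not define an approval API. The point of the sketch is that the draft is plain data and carries no authority until a human says yes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExecRequest:
    """A draft produced by CORE; it carries no authority by itself."""
    code: str
    justification: str


class ApprovalRequired(Exception):
    """Raised when a request reaches the sandbox boundary without approval."""


def submit(
    request: ExecRequest,
    approve: Callable[[ExecRequest], bool],
    run_in_sandbox: Callable[[str], str],
) -> str:
    """Forward a draft to the sandbox only after explicit human approval."""
    if not request.justification.strip():
        raise ApprovalRequired("missing justification")
    if not approve(request):
        raise ApprovalRequired("human reviewer declined")
    # The result should still be validated before CORE reuses it.
    return run_in_sandbox(request.code)
```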

---

## Data Flow & Trust Boundaries

All data movement is **one-way** and **validated**.

| From | To | Direction | Validation |
|----|----|-----------|------------|
| FETCH | CORE | One-way | Required |
| CORE | TOOL-EXEC | Draft only | Human approval |
| TOOL-EXEC | CORE | One-way | Required |

There is **no shared mutable state** between components.

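
The table above can be read as a default-deny routing rule: a flow exists only if it has a row, and each row names the gate that must pass first. The sketch below encodes just those three rows; the names `Component`, `ALLOWED_FLOWS`, and `required_gate` are assumptions for illustration, and any flow the table does not list is rejected.

```python
from enum import Enum


class Component(Enum):
    FETCH = "FETCH"
    CORE = "CORE"
    TOOL_EXEC = "TOOL-EXEC"


# (source, destination) -> gate that must pass before data moves.
ALLOWED_FLOWS = {
    (Component.FETCH, Component.CORE): "validation",
    (Component.CORE, Component.TOOL_EXEC): "human approval",
    (Component.TOOL_EXEC, Component.CORE): "validation",
}


def required_gate(src: Component, dst: Component) -> str:
    """Return the gate for a permitted flow; deny everything else."""
    try:
        return ALLOWED_FLOWS[(src, dst)]
    except KeyError:
        raise PermissionError(
            f"flow {src.value} -> {dst.value} is not permitted"
        ) from None
```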

---

## Network Topology

- CORE: no internet route
- FETCH: internet access only via managed proxy
- TOOL-EXEC: no network by default

Network restrictions are enforced at:
- Container network level
- Host firewall level
- Explicit proxy allowlists

---

## Policy Enforcement

- Policies are mounted read-only
- Instruction hierarchy is explicit
- Tool usage requires justification and approval
- Outputs are validated before reuse

---

## Failure Containment

If any component is compromised:

- FETCH cannot execute or persist
- CORE cannot browse or execute
- TOOL-EXEC cannot exfiltrate or persist by default

Failures are **observable, contained, and reversible**.

---

## Architectural Invariants

The following must never be violated:

1. No component both reasons and acts
2. No component both browses and executes
3. External content is hostile by default
4. Execution is optional and sandboxed
5. Network access is a scarce privilege

Any extension must preserve these invariants.

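
Invariants 1 and 2 lend themselves to a mechanical check that a proposed extension could run against its privilege assignment. The sketch below is one possible reading, not part of the specification: the privilege labels, `DEFAULT_PRIVILEGES`, and `invariant_violations` are assumptions, and "browses" is interpreted as holding any internet route. That reading is stricter than the text, which permits TOOL-EXEC network access when explicitly approved.

```python
# Default privileges per component, following the restriction lists above.
DEFAULT_PRIVILEGES = {
    "FETCH": {"internet"},
    "CORE": {"reasoning"},
    "TOOL-EXEC": {"execution"},
}

# Invariant 1 forbids reasoning + execution in one component;
# invariant 2 (as interpreted here) forbids internet + execution.
FORBIDDEN_PAIRS = (
    frozenset({"reasoning", "execution"}),
    frozenset({"internet", "execution"}),
)


def invariant_violations(privileges):
    """Return (component, forbidden pair) for every pair held by one component."""
    found = []
    for component, held in privileges.items():
        for pair in FORBIDDEN_PAIRS:
            if pair <= set(held):
                found.append((component, tuple(sorted(pair))))
    return found
```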

---

## Summary

ThreeGate enforces safety by **structure**, not by instruction.
It assumes model fallibility and limits consequences accordingly.

@@ -0,0 +1,117 @@
# Why ThreeGate Is Safer Than Agent-Based Systems

This document explains **why the ThreeGate architecture materially reduces risk** compared to common agent and tool-using AI frameworks.

---

## The Core Problem with Agents

Most agent frameworks combine:

- Untrusted input ingestion
- Reasoning
- Tool execution
- Network access
- Persistent state

…inside a single loop.

If prompt injection succeeds (and it eventually will), the model can immediately act with real-world authority.

This is an instance of the **confused deputy problem**: a component holding legitimate authority is tricked by untrusted input into misusing it.

---

## ThreeGate’s Structural Advantage

ThreeGate prevents confused deputies by **separating authority**.

| Capability | FETCH | CORE | TOOL-EXEC |
|----------|-------|------|-----------|
| Internet access | Yes (restricted) | No | No (default) |
| Reasoning | No | Yes | No |
| Execution | No | No | Yes (gated) |
| Persistence | Minimal | Limited | None (default) |

No single component holds enough authority to turn a successful injection into real-world action on its own.

---

## Prompt Injection Is Assumed, Not Denied

ThreeGate assumes:

- Prompt injection **cannot be perfectly prevented**
- Indirect injection via documents and web pages is common
- Behavioral safeguards alone are insufficient

Therefore:
- All external content is treated as data, not instructions
- Outputs are constrained and validated
- Consequences are limited by topology

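
One way to picture "data, not instructions" is a packet type whose only job is to carry retrieved text plus enough metadata to validate it before reuse. The source does not define the Research Packet format, so the fields, the size cap, and the function names below are all assumptions for illustration. Structural checks like these constrain what crosses the boundary; they do not detect injected instructions, which is why the design still relies on topology to limit consequences.

```python
import hashlib
from dataclasses import dataclass

MAX_CHARS = 50_000  # arbitrary cap chosen for this sketch


@dataclass(frozen=True)
class ResearchPacket:
    source_url: str
    text: str
    sha256: str  # hex digest of the text encoded as UTF-8


def make_packet(source_url: str, text: str) -> ResearchPacket:
    """Wrap retrieved text together with a digest of its content."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ResearchPacket(source_url, text, digest)


def packet_problems(packet: ResearchPacket) -> list:
    """Return a list of structural problems; an empty list means the
    packet passed these checks (not that its content is trustworthy)."""
    problems = []
    if not packet.source_url.startswith("https://"):
        problems.append("source is not HTTPS")
    if len(packet.text) > MAX_CHARS:
        problems.append("text exceeds size cap")
    if any(ord(ch) < 32 and ch not in "\n\t" for ch in packet.text):
        problems.append("text contains control characters")
    if hashlib.sha256(packet.text.encode("utf-8")).hexdigest() != packet.sha256:
        problems.append("hash mismatch")
    return problems
```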

---

## Tool Use Is the Primary Risk Multiplier

Execution is where AI systems most often fail catastrophically.

ThreeGate:
- Makes execution optional
- Requires explicit human approval
- Sandboxes execution in an isolated environment
- Treats execution output as hostile input

This dramatically reduces blast radius compared to agent loops that auto-execute.

---

## Network Access Is Physically Constrained

Many systems rely on the model to “decide responsibly” when using the network.

ThreeGate instead:
- Removes network access entirely from CORE
- Forces FETCH through an allowlisted proxy
- Defaults TOOL-EXEC to no network

This is **security by topology**, not trust.

---

## Residual Risk Is Explicitly Scoped

ThreeGate does **not** claim to defend against:

- Hardware fault induction (e.g., RowHammer)
- Microarchitectural side channels
- Kernel or firmware exploits
- Hostile multi-tenant environments

The system is designed for **single-user local operation** and documents its threat boundaries clearly.

---

## Why This Matters

ThreeGate demonstrates a crucial shift in thinking:

> Safety does not come from making models behave better.
> Safety comes from making misbehavior inconsequential.

By breaking the agent loop into gated components, ThreeGate enables powerful assistance **without granting unbounded authority**.

---

## Summary

ThreeGate is safer because it:

- Eliminates confused deputies
- Treats all external input as hostile
- Separates reasoning from action
- Makes execution rare and auditable
- Enforces policy at the OS and network level

This is not an optimization of existing agent designs.
It is a **different class of system**.