# ThreeGate

**ThreeGate** is a compartmentalized architecture for building **secure, local AI assistants** that perform goal-directed tasks *without* relying on autonomous agents or trusting large language models to behave safely.

ThreeGate separates **thinking**, **retrieval**, and **execution** into distinct, least-privileged components with enforced trust boundaries.

> If prompt injection is inevitable, safety must come from structure.

---

## What ThreeGate Is

ThreeGate is:

- A **reference architecture** for secure local assistants
- A **defense-in-depth design** against prompt injection, tool abuse, and data exfiltration
- A **human-governed system**, not an autonomous agent
- Designed for **single-user, local operation**
- Explicitly extensible to multiple roles (research, policy analysis, data science, auditing)

---

## What ThreeGate Is Not

ThreeGate is **not**:

- An autonomous agent framework
- A self-modifying system
- A browsing-and-executing AI loop
- A cloud-first or multi-tenant platform
- A system that trusts LLM outputs without validation

---

## Core Insight

Most unsafe AI systems fail because they allow a single component to:

> **Read untrusted input, reason about it, and immediately act on the world.**

ThreeGate prevents this by enforcing **three independent gates**:

1. **FETCH** — retrieves untrusted external content
2. **CORE** — performs reasoning and synthesis
3. **TOOL-EXEC** — executes code, only when explicitly approved

Each component sits behind exactly one gate; no single component spans two.

---

## High-Level Architecture

```
Internet
    ↑
[ Managed Proxy ]
    ↑
FETCH (retrieval)
    ↓
Research Packets
    ↓
CORE (analysis)
    ↓
(optional, human-approved)
    ↓
TOOL-EXEC (sandboxed execution)
```
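
The diagram routes everything FETCH retrieves through Research Packets before CORE sees it. This README does not define the packet format, so the sketch below is hypothetical: the `ResearchPacket` fields, the size cap, and the `make_packet` helper are all assumptions. It illustrates the property the boundary depends on, namely that CORE receives inert, size-bounded text tagged with its provenance rather than a live connection or anything it could call.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlparse

MAX_CONTENT_CHARS = 200_000  # hypothetical size cap; tune per deployment


@dataclass(frozen=True)
class ResearchPacket:
    """Hypothetical Research Packet: plain data only, nothing CORE could invoke."""
    source_url: str
    retrieved_at: str  # ISO 8601 timestamp, UTC
    content: str       # normalized text; still untrusted data from CORE's view


def make_packet(source_url: str, raw_text: str) -> ResearchPacket:
    """Normalize fetched text into a packet, rejecting anything out of policy."""
    if urlparse(source_url).scheme != "https":
        raise ValueError("only HTTPS sources are accepted")
    # Drop non-printable characters (keeping newline and tab), e.g. NUL bytes
    # or zero-width format characters that can hide text from a human reviewer.
    cleaned = "".join(ch for ch in raw_text if ch in "\n\t" or ch.isprintable())
    if len(cleaned) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds packet size limit")
    return ResearchPacket(
        source_url=source_url,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
        content=cleaned,
    )
```

Freezing the dataclass keeps a packet immutable once FETCH emits it, which fits the one-way data flow; it does not replace validating the packet again on the CORE side.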

---

## Initial Target Role

The first concrete role implemented using ThreeGate is a:

**Secure Local Research Assistant**

Capabilities:
- Scholarly retrieval (controlled, allowlisted)
- Analysis and writing
- Optional sandboxed computation
- No autonomous browsing or execution

---

## Repository Structure (Initial)

```
ThreeGate/
├── README.md
├── docs/
│   ├── architecture.md
│   ├── threat-model.md
│   └── why-this-is-safer.md
```

---

## Status

This repository is in the **early specification and reference implementation phase**.

The design is intentionally conservative. Convenience features are added *only* when they preserve trust boundaries.

---

## License & Philosophy

ThreeGate favors:
- Explicit over implicit authority
- Structural safety over behavioral promises
- Human-in-the-loop over automation

If a feature weakens a trust boundary, it does not belong here.

# ThreeGate Architecture

This document specifies the **ThreeGate architecture**, including components, trust boundaries, data flow, and enforcement mechanisms.

---

## Design Objective

Enable powerful, local, goal-directed AI assistance while preventing:

- Prompt injection (direct and indirect)
- Tool abuse
- Data exfiltration
- Accidental or malicious system modification

This is achieved by **compartmentalization**, not by trusting model behavior.

---

## Core Components

ThreeGate consists of **three isolated components**, each with a distinct role and privilege level.

### 1. CORE — Analysis & Synthesis

**Responsibilities**
- Reasoning
- Synthesis
- Writing
- Policy interpretation
- Drafting requests for retrieval or execution

**Explicit Restrictions**
- No internet access
- No shell
- No execution
- No package installation
- No modification of policy files

CORE is the **most prompt-exposed component** and therefore the **least powerful**.

---

### 2. FETCH — Controlled Retrieval

**Responsibilities**
- Retrieve external information
- Normalize content into *Research Packets*

**Capabilities**
- HTTPS access only
- Internet access only via managed proxy
- Domain allowlists (e.g., academic sources)

**Explicit Restrictions**
- No execution
- No shell
- No persistence beyond packet output
- No access to CORE state

FETCH treats all retrieved content as hostile by default.
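
A minimal sketch of the allowlist test FETCH could apply before passing a URL to the managed proxy. The domains in `ALLOWED_DOMAINS` and the `is_allowed` helper are hypothetical placeholders, not a recommended policy. Because a compromised FETCH could skip its own check, the proxy must enforce the same rule independently; the in-process check only fails fast.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of scholarly sources; a real deployment defines its own.
ALLOWED_DOMAINS = {"arxiv.org", "doi.org", "pubmed.ncbi.nlm.nih.gov"}


def is_allowed(url: str) -> bool:
    """True only for HTTPS URLs whose host is an allowlisted domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # Refuse embedded credentials, e.g. https://arxiv.org@evil.example/
    # (the real host there is evil.example; "arxiv.org" is only the username).
    if parts.username is not None or parts.password is not None:
        return False
    host = (parts.hostname or "").rstrip(".").lower()
    # Exact match, or a true subdomain (note the leading dot in the suffix test).
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

Matching on a leading dot keeps look-alike hosts such as `notarxiv.org` or `arxiv.org.evil.example` from passing a naive suffix comparison.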

---

### 3. TOOL-EXEC — Optional Execution Sandbox

**Responsibilities**
- Execute explicitly approved code or commands
- Perform computations that cannot be done textually

**Implementation**
- Backed by sandboxed execution (e.g., microVMs such as ERA)
- Ephemeral by default
- No network unless explicitly approved

**Explicit Restrictions**
- No direct access to CORE or FETCH
- No ambient credentials
- No persistent state by default

Execution is the **highest-risk capability** and is therefore isolated and human-gated.
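
One way to make the human gate concrete is to bind each approval to the exact request CORE drafted. The sketch below is hypothetical (`ExecRequest`, `approve`, `check_approved`, and the key handling are assumptions, not part of this specification): an approval service that CORE cannot reach signs the canonical form of the reviewed request with an HMAC key, and the gate in front of the sandbox refuses anything that changed after review.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import asdict, dataclass

# Hypothetical key held only by the human-facing approval service; never mounted into CORE.
APPROVAL_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class ExecRequest:
    """Hypothetical draft emitted by CORE; CORE cannot submit it for execution itself."""
    command: tuple[str, ...]
    justification: str
    network: bool = False  # no network unless explicitly approved


def _canonical(req: ExecRequest) -> bytes:
    """Deterministic byte form of the request, so equal requests sign identically."""
    return json.dumps(asdict(req), sort_keys=True, separators=(",", ":")).encode("utf-8")


def approve(req: ExecRequest, human_confirmed: bool) -> str:
    """Run by the approval service after a human reviews the draft; returns a token."""
    if not human_confirmed:
        raise PermissionError("human reviewer declined the request")
    return hmac.new(APPROVAL_KEY, _canonical(req), hashlib.sha256).hexdigest()


def check_approved(req: ExecRequest, token: str) -> None:
    """Gate in front of TOOL-EXEC: refuse any request that differs from what was approved."""
    expected = hmac.new(APPROVAL_KEY, _canonical(req), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token):
        raise PermissionError("request was not approved as submitted")
```

A plain hash would not be enough here, since CORE could compute it itself; the secret key is what turns the token into evidence of human review.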

---

## Data Flow & Trust Boundaries

All data movement is **one-way** and **validated**.

| From | To | Direction | Validation |
|------|----|-----------|------------|
| FETCH | CORE | One-way | Required |
| CORE | TOOL-EXEC | One-way (draft only) | Human approval |
| TOOL-EXEC | CORE | One-way | Required |

There is **no shared mutable state** between components.
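
The table is small enough to encode directly. A minimal sketch, assuming a hypothetical router that every inter-component transfer passes through (`Gate`, `ALLOWED_FLOWS`, and `route` are illustrative names): each transfer is looked up in the table, and any edge the table does not list, such as FETCH to TOOL-EXEC, is rejected.

```python
from enum import Enum


class Gate(Enum):
    FETCH = "FETCH"
    CORE = "CORE"
    TOOL_EXEC = "TOOL-EXEC"


# (source, destination) -> required validation step, transcribed from the table above.
ALLOWED_FLOWS = {
    (Gate.FETCH, Gate.CORE): "validate",
    (Gate.CORE, Gate.TOOL_EXEC): "human_approval",
    (Gate.TOOL_EXEC, Gate.CORE): "validate",
}


def route(src: Gate, dst: Gate) -> str:
    """Return the validation a transfer needs, or raise if the edge is not permitted."""
    try:
        return ALLOWED_FLOWS[(src, dst)]
    except KeyError:
        raise PermissionError(f"flow {src.value} -> {dst.value} is not permitted") from None
```

Defaulting to rejection means a new flow has to be added to the table deliberately rather than appearing by accident.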

---

## Network Topology

- CORE: no internet route
- FETCH: internet access only via managed proxy
- TOOL-EXEC: no network by default

Network restrictions are enforced at:
- Container network level
- Host firewall level
- Explicit proxy allowlists
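
Because these restrictions are enforced outside the components, they can be tested from inside one. The sketch below is a hedged example of a start-up self-check that CORE, or a TOOL-EXEC sandbox started without network approval, might run: it attempts outbound TCP connections and treats any success as a misconfiguration. The `egress_blocked` helper and its probe targets (two well-known public DNS resolvers) are assumptions, and a failed probe is evidence that egress is blocked, not proof.

```python
import socket

# Assumed probe targets: two well-known public DNS resolvers. Any host that is
# reachable whenever egress exists would serve; adjust for your environment.
PROBES = [("1.1.1.1", 443), ("8.8.8.8", 53)]


def egress_blocked(probes=PROBES, timeout: float = 2.0) -> bool:
    """Return True if no probe connection succeeds within `timeout` seconds."""
    for host, port in probes:
        try:
            # create_connection raises OSError (timeouts included) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                return False  # a connection opened: this component has a route out
        except OSError:
            continue  # refused, unreachable, or timed out
    return True
```

A component would call `egress_blocked()` before doing any work and refuse to start on `False`, so a mis-wired container network shows up immediately instead of silently widening the trust boundary.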

---

## Policy Enforcement

- Policies are mounted read-only
- Instruction hierarchy is explicit
- Tool usage requires justification and approval
- Outputs are validated before reuse
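
Read-only mounting can be verified at start-up rather than assumed. A minimal sketch for Linux and other Unix hosts, assuming policies live under a hypothetical `/etc/threegate/policy` mount: `os.statvfs` reports the `ST_RDONLY` flag for the filesystem containing a path, and the component exits instead of starting if the flag is absent.

```python
import os
import sys

# Hypothetical location of the policy mount inside a component's container.
POLICY_DIR = "/etc/threegate/policy"


def mounted_read_only(path: str) -> bool:
    """True if the filesystem (mount) containing `path` is read-only. Unix only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def require_read_only_policies(path: str = POLICY_DIR) -> None:
    """Exit instead of starting if policy files could be modified from inside."""
    if not mounted_read_only(path):
        sys.exit(f"refusing to start: {path} is not mounted read-only")
```

If the mount is missing entirely, `os.statvfs` raises `FileNotFoundError`, so the check also fails closed in that case.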

---

## Failure Containment

If any component is compromised:

- FETCH cannot execute or persist
- CORE cannot browse or execute
- TOOL-EXEC cannot exfiltrate or persist by default

Failures are **observable, contained, and reversible**.

---

## Architectural Invariants

The following must never be violated:

1. No component both reasons and acts
2. No component both browses and executes
3. External content is hostile by default
4. Execution is optional and sandboxed
5. Network access is a scarce privilege

Any extension must preserve these invariants.
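
Invariants 1 and 2 are statements about which capabilities a single component may combine, so they can be checked mechanically whenever a component's configuration changes. A sketch under assumed, hypothetical capability labels (`reason`, `browse`, `execute`, `network`), with "acts" modeled as `execute`; mapping these labels onto real container or sandbox settings is left to the deployment.

```python
# Hypothetical capability declarations mirroring the component descriptions above.
COMPONENTS = {
    "FETCH": {"browse", "network"},
    "CORE": {"reason"},
    "TOOL-EXEC": {"execute"},
}

# Invariants 1 and 2 as forbidden capability pairs ("acts" is modeled as "execute").
FORBIDDEN_PAIRS = [
    ({"reason", "execute"}, "No component both reasons and acts"),
    ({"browse", "execute"}, "No component both browses and executes"),
]


def violations(components: dict[str, set[str]]) -> list[str]:
    """List every (component, invariant) breach; an empty list means the config passes."""
    found = []
    for name, caps in components.items():
        for pair, rule in FORBIDDEN_PAIRS:
            if pair <= caps:  # the component holds both capabilities in the pair
                found.append(f"{name}: {rule}")
    return found
```

Declaring a conventional single-loop agent as one component holding all four capabilities makes `violations` report both invariants, which is the configuration this architecture rules out.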

---

## Summary

ThreeGate enforces safety by **structure**, not by instruction.
It assumes model fallibility and limits consequences accordingly.

# Why ThreeGate Is Safer Than Agent-Based Systems

This document explains **why the ThreeGate architecture materially reduces risk** compared to common agent and tool-using AI frameworks.

---

## The Core Problem with Agents

Most agent frameworks combine:

- Untrusted input ingestion
- Reasoning
- Tool execution
- Network access
- Persistent state

…inside a single loop.

If prompt injection succeeds — and it eventually will — the model can immediately act with real-world authority.

This is an instance of the **confused deputy problem**: a component that holds real authority is tricked by untrusted input into exercising that authority on an attacker's behalf.

---

## ThreeGate’s Structural Advantage

ThreeGate prevents confused deputies by **separating authority**.

| Capability | FETCH | CORE | TOOL-EXEC |
|------------|-------|------|-----------|
| Internet access | Yes (restricted) | No | No (default) |
| Reasoning | No | Yes | No |
| Execution | No | No | Yes (gated) |
| Persistence | Minimal | Limited | None (default) |

No component has enough authority to cause harm on its own.

---

## Prompt Injection Is Assumed, Not Denied

ThreeGate assumes:

- Prompt injection **cannot be perfectly prevented**
- Indirect injection via documents and web pages is common
- Behavioral safeguards alone are insufficient

Therefore:
- All external content is treated as data, not instructions
- Outputs are constrained and validated
- Consequences are limited by topology
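
One common way to keep external content on the data side of the line is to wrap it in an explicit envelope before CORE sees it, escaping anything that could forge or close that envelope. The `<untrusted_data>` tag, the `wrap_untrusted` helper, and the length cap are hypothetical conventions, not part of the ThreeGate specification. Delimiting alone does not stop injection; it only makes the data/instruction boundary explicit, while the topological limits listed above carry the real weight.

```python
import html

MAX_CHARS = 50_000  # hypothetical cap on untrusted text passed to CORE


def wrap_untrusted(source: str, text: str) -> str:
    """Wrap external text as quoted data; escape markup so it cannot close the envelope."""
    text = text[:MAX_CHARS]  # truncate rather than pass unbounded input
    body = html.escape(text, quote=False)  # turns &, <, > into entities
    src = html.escape(source, quote=True)  # also escapes quotes inside the attribute
    return (
        f'<untrusted_data source="{src}">\n'
        f"{body}\n"
        "</untrusted_data>"
    )
```

Because `<` and `>` are escaped, a fetched page containing a literal `</untrusted_data>` appears inside the envelope as `&lt;/untrusted_data&gt;` instead of ending it early.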

---

## Tool Use Is the Primary Risk Multiplier

Execution is where AI systems most often fail catastrophically.

ThreeGate:
- Makes execution optional
- Requires explicit human approval
- Sandboxes execution in an isolated environment
- Treats execution output as hostile input

This dramatically reduces blast radius compared to agent loops that auto-execute.

---

## Network Access Is Physically Constrained

Many systems rely on the model to “decide responsibly” when using the network.

ThreeGate instead:
- Removes network access entirely from CORE
- Forces FETCH through an allowlisted proxy
- Defaults TOOL-EXEC to no network

This is **security by topology**, not trust.

---

## Residual Risk Is Explicitly Scoped

ThreeGate does **not** claim to defend against:

- Hardware fault induction (e.g., RowHammer)
- Microarchitectural side channels
- Kernel or firmware exploits
- Hostile multi-tenant environments

The system is designed for **single-user local operation** and documents its threat boundaries clearly.

---

## Why This Matters

ThreeGate demonstrates a crucial shift in thinking:

> Safety does not come from making models behave better.
> Safety comes from making misbehavior inconsequential.

By breaking the agent loop into gated components, ThreeGate enables powerful assistance **without granting unbounded authority**.

---

## Summary

ThreeGate is safer because it:

- Eliminates confused deputies
- Treats all external input as hostile
- Separates reasoning from action
- Makes execution rare and auditable
- Enforces policy at the OS and network level

This is not an optimization of existing agent designs.
It is a **different class of system**.