# Why ThreeGate Is Safer Than Agent-Based Systems

This document explains **why the ThreeGate architecture materially reduces risk** compared to common agent and tool-using AI frameworks.

---

## The Core Problem with Agents

Most agent frameworks combine:

- Untrusted input ingestion
- Reasoning
- Tool execution
- Network access
- Persistent state

…inside a single loop.

If prompt injection succeeds, and it eventually will, the model can immediately act with real-world authority.

This is an instance of the **confused deputy problem**: a component that holds legitimate authority is tricked into exercising it on an attacker's behalf.

---

## ThreeGate’s Structural Advantage

ThreeGate prevents confused deputies by **separating authority**.

| Capability      | FETCH            | CORE    | TOOL-EXEC      |
|-----------------|------------------|---------|----------------|
| Internet access | Yes (restricted) | No      | No (default)   |
| Reasoning       | No               | Yes     | No             |
| Execution       | No               | No      | Yes (gated)    |
| Persistence     | Minimal          | Limited | None (default) |

No component has enough authority to cause harm on its own.

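
The table can be read as a capability matrix. As a minimal sketch, assuming hypothetical names (`Capability`, `GRANTS`, `is_allowed`, `components_holding`) that are not taken from ThreeGate itself, the default grants might be encoded and checked as follows. Persistence is left out because the table describes it in degrees rather than yes/no. ThreeGate enforces the separation through topology, not through an in-process check, so this illustrates the policy rather than its enforcement. The last helper shows the property the table is meant to convey: no component holds network access and execution at the same time.

```python
from enum import Flag, auto


class Capability(Flag):
    """Authorities from the table above; the names are illustrative."""
    NONE = 0
    NETWORK = auto()
    REASONING = auto()
    EXECUTION = auto()


# Default grants per component, transcribed from the table.
# Persistence is omitted because the table gives it in degrees, not yes/no.
GRANTS = {
    "FETCH": Capability.NETWORK,        # restricted internet access only
    "CORE": Capability.REASONING,       # no network, no execution
    "TOOL-EXEC": Capability.EXECUTION,  # gated execution, no network by default
}


def is_allowed(component: str, needed: Capability) -> bool:
    """Return True only if the component holds every capability in `needed`."""
    granted = GRANTS.get(component, Capability.NONE)
    return (granted & needed) == needed


def components_holding(needed: Capability) -> list:
    """List the components that hold all of `needed` at once."""
    return [name for name in GRANTS if is_allowed(name, needed)]
```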

---

## Prompt Injection Is Assumed, Not Denied

ThreeGate assumes:

- Prompt injection **cannot be perfectly prevented**
- Indirect injection via documents and web pages is common
- Behavioral safeguards alone are insufficient

Therefore:
- All external content is treated as data, not instructions
- Outputs are constrained and validated
- Consequences are limited by topology

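
One way to picture "data, not instructions" and "constrained and validated" is the sketch below. It assumes a hypothetical JSON output contract (`ALLOWED_FIELDS`, `ALLOWED_ACTIONS`, `MAX_SUMMARY_CHARS`) and a hypothetical `UntrustedDocument` wrapper; none of these names or formats come from ThreeGate, whose actual output schema is not described here. The idea is that fetched text travels as an inert value with its origin attached, and that anything the model emits is rejected unless it fits a small, fixed shape.

```python
import json
from dataclasses import dataclass

# Hypothetical output contract: the only fields and actions CORE may emit.
ALLOWED_FIELDS = {"action", "summary"}
ALLOWED_ACTIONS = {"answer", "request_fetch", "request_execution"}
MAX_SUMMARY_CHARS = 2000


@dataclass(frozen=True)
class UntrustedDocument:
    """Fetched content, carried as inert data with its origin attached."""
    source: str
    text: str


def validate_output(raw: str) -> dict:
    """Parse a model output and reject anything outside the fixed contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    extra = set(data) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("action is not in the allowed set")
    summary = data.get("summary", "")
    if not isinstance(summary, str) or len(summary) > MAX_SUMMARY_CHARS:
        raise ValueError("summary must be a string within the size limit")
    return data
```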

---

## Tool Use Is the Primary Risk Multiplier

Execution is where AI systems most often fail catastrophically.

ThreeGate:
- Makes execution optional
- Requires explicit human approval
- Sandboxes execution in an isolated environment
- Treats execution output as hostile input

This dramatically reduces blast radius compared to agent loops that auto-execute.

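
The four properties above can be sketched as a single gate. This is an illustration under stated assumptions, not ThreeGate's implementation: `ExecutionRequest`, `ExecutionResult`, `gated_execute`, and the `approve` and `run_sandboxed` callbacks are hypothetical names, and the sandbox itself is left abstract. Nothing runs unless the approval callback (standing in for a human decision) returns `True`, and whatever comes back is marked as untrusted so that downstream code handles it like any other hostile input.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ExecutionRequest:
    """A proposed command; nothing runs until a human approves it."""
    command: Tuple[str, ...]
    reason: str


@dataclass(frozen=True)
class ExecutionResult:
    """Output from the sandbox, flagged so callers treat it as untrusted."""
    stdout: str
    trusted: bool = False


def gated_execute(
    request: ExecutionRequest,
    approve: Callable[[ExecutionRequest], bool],
    run_sandboxed: Callable[[Tuple[str, ...]], str],
) -> Optional[ExecutionResult]:
    """Run the request only if `approve` returns True; otherwise do nothing."""
    if not approve(request):
        return None  # execution is optional: a refusal has no side effects
    return ExecutionResult(stdout=run_sandboxed(request.command))
```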

---

## Network Access Is Physically Constrained

Many systems rely on the model to “decide responsibly” when using the network.

ThreeGate instead:
- Removes network access entirely from CORE
- Forces FETCH through an allowlisted proxy
- Defaults TOOL-EXEC to no network

This is **security by topology**, not trust.

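
To make the allowlist idea concrete, here is a minimal sketch of the kind of decision an allowlisting proxy might make for FETCH. The host list and the function name `is_fetch_allowed` are assumptions for illustration only; the document does not say how ThreeGate's proxy is configured. The check would live in the proxy, outside the model's control, and it matches exact hostnames so that lookalike domains and userinfo tricks do not pass.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from proxy config.
ALLOWED_HOSTS = frozenset({"example.org", "docs.python.org"})


def is_fetch_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only https URLs whose exact hostname is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL (for example, a broken IPv6 literal)
    if parts.scheme != "https":
        return False
    host = parts.hostname  # lowercased by urlsplit; None if absent
    return host is not None and host in allowed_hosts
```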

---

## Residual Risk Is Explicitly Scoped

ThreeGate does **not** claim to defend against:

- Hardware fault induction (e.g., RowHammer)
- Microarchitectural side channels
- Kernel or firmware exploits
- Hostile multi-tenant environments

The system is designed for **single-user local operation** and documents its threat boundaries clearly.

---

## Why This Matters

ThreeGate demonstrates a crucial shift in thinking:

> Safety does not come from making models behave better.
> Safety comes from making misbehavior inconsequential.

By breaking the agent loop into gated components, ThreeGate enables powerful assistance **without granting unbounded authority**.

---

## Summary

ThreeGate is safer because it:

- Eliminates confused deputies
- Treats all external input as hostile
- Separates reasoning from action
- Makes execution rare and auditable
- Enforces policy at the OS and network level

This is not an optimization of existing agent designs.
It is a **different class of system**.