# Logging & Metrics (Privacy + Security)

ThreeGate benefits from audit logs, but logs can themselves become an exfiltration channel and a privacy liability.

This document defines what to log, what not to log, and how to bound retention.

---

## Goals

- Enable debugging and security review
- Prove that boundaries are enforced
- Avoid capturing sensitive content or secrets
- Avoid turning logs into a data lake

---

## What to Log (Recommended)

### System events (safe)
- Container/service start/stop
- Validator ACCEPT/REJECT with artifact filename and reason codes
- Proxy allowlist changes (who/when/what)
- Firewall rules application success/failure
- Tool execution metadata:
  - request_id
  - backend
  - runtime_sec
  - exit_code
  - artifact hashes (sha256)
  - size of stdout/stderr (bytes)
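
As a minimal sketch of the tool execution metadata above, the following builds one structured log line from a finished run. The class and function names (`ToolRunRecord`, `build_tool_run_record`, `to_log_line`) and the JSON-lines layout are assumptions for illustration, not ThreeGate's actual schema. Only hashes and sizes are recorded; the stdout/stderr bytes themselves never enter the record.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class ToolRunRecord:
    """Metadata-only view of one tool run (hypothetical schema)."""
    request_id: str
    backend: str
    runtime_sec: float
    exit_code: int
    artifact_sha256: List[str]
    stdout_bytes: int
    stderr_bytes: int


def build_tool_run_record(request_id, backend, runtime_sec, exit_code,
                          artifacts, stdout, stderr):
    """Reduce raw outputs to hashes and sizes before anything is logged."""
    return ToolRunRecord(
        request_id=request_id,
        backend=backend,
        runtime_sec=round(runtime_sec, 3),
        exit_code=exit_code,
        # Hash each artifact's bytes; the content itself is not kept here.
        artifact_sha256=[hashlib.sha256(a).hexdigest() for a in artifacts],
        stdout_bytes=len(stdout),
        stderr_bytes=len(stderr),
    )


def to_log_line(record):
    """Serialize the record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

A caller would pass the captured `stdout`/`stderr` bytes once, write the returned line to the operational log, and store the raw output separately as an artifact.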

### Retrieval metadata (safe)
- source_kind
- source_ref (URL/DOI) *if not sensitive*
- retrieved_utc
- content_type
- byte count
- redirect chain hosts (hostnames only, not full URLs or query strings)
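
One way to assemble the retrieval fields above is sketched here. The function names, the `sensitive` flag, and the dictionary keys `byte_count` and `redirect_hosts` are illustrative assumptions; how a source is judged sensitive is left to the caller. The redirect chain is reduced to hostnames so that paths and query strings never reach the log.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def redirect_hosts(chain):
    """Keep only the hostname of each hop in a redirect chain."""
    return [urlsplit(url).hostname or "" for url in chain]


def retrieval_record(source_kind, source_ref, content_type, body,
                     redirect_chain, sensitive=False):
    """Build a metadata-only retrieval record (hypothetical field layout)."""
    return {
        "source_kind": source_kind,
        # Drop the reference entirely when the caller marks it sensitive.
        "source_ref": None if sensitive else source_ref,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "content_type": content_type,
        # Record the size of the body, never the body itself.
        "byte_count": len(body),
        "redirect_hosts": redirect_hosts(redirect_chain),
    }
```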

---

## Do NOT Log (Hard Prohibitions)

- Full fetched page content / HTML bodies
- Full PDFs or extracted text
- Tool stdout/stderr content by default (store as artifacts, not logs)
- Secrets or tokens
- Local filesystem paths that reveal private structure (beyond controlled volumes)
- User prompts if they may contain sensitive content
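
These prohibitions can be backed by a guard at the point where events are written. The sketch below rejects any event that carries a prohibited field. The key names in `PROHIBITED_KEYS` are assumptions about what such fields might be called, and the check only inspects top-level keys; it does not scan values for secrets, so it complements rather than replaces careful call sites.

```python
# Hypothetical field names that would carry prohibited content.
PROHIBITED_KEYS = frozenset({
    "body", "html", "pdf_text", "extracted_text",
    "stdout", "stderr", "prompt", "token", "secret",
})


def assert_loggable(event):
    """Return the event unchanged, or raise if it has a prohibited field."""
    bad = sorted(PROHIBITED_KEYS.intersection(event))
    if bad:
        # Name only the offending keys; never echo their values.
        raise ValueError(f"refusing to log prohibited fields: {bad}")
    return event
```

Failing loudly, instead of silently dropping the field, makes an accidental content log show up in testing rather than in production logs.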

---

## Retention

- Default: 7–30 days for operational logs
- Keep artifacts (packets/results) under your normal project retention policy
- Rotate proxy logs aggressively; they are high volume
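
A retention bound is only real if something enforces it. As a rough sketch, assuming operational logs live as `*.log*` files in a single directory (an assumption about layout, not a ThreeGate convention), a periodic job could delete files older than the chosen window:

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, max_age_days, now=None):
    """Delete log files whose mtime is older than max_age_days.

    Returns the names of the files removed. `now` is injectable for testing.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(Path(log_dir).glob("*.log*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

For example, `prune_old_logs("/var/log/threegate", max_age_days=30)` would apply the upper end of the default window; the directory path here is hypothetical.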

---

## Redaction

If you must log URLs, consider stripping:
- query strings (`?x=y`)
- fragments (`#section`)
- known tracking parameters
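
The stripping rules above can be sketched with the standard library's `urllib.parse`. The function names and the tracking-parameter list (`utm_*`, `gclid`, `fbclid`) are illustrative assumptions; a real deployment would maintain its own list. Both helpers also drop any `user:password@` prefix from the host part, since credentials in URLs count as secrets.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative examples of tracking parameters, not an exhaustive list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = frozenset({"gclid", "fbclid"})


def _host_without_userinfo(netloc):
    """Drop any 'user:password@' prefix from a netloc."""
    return netloc.rpartition("@")[2]


def strip_query_and_fragment(url):
    """Strictest form: keep only scheme, host, and path."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, _host_without_userinfo(p.netloc), p.path, "", ""))


def strip_tracking_params(url):
    """Looser form: keep the query but remove tracking parameters and fragment."""
    p = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(p.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
        and not k.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (p.scheme, _host_without_userinfo(p.netloc), p.path, urlencode(kept), "")
    )
```

When in doubt, the stricter `strip_query_and_fragment` is the safer default, because arbitrary query parameters can carry tokens or search terms.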

---

## Metrics (Minimal)

- count_validations_accept / count_validations_reject
- count_fetch_requests, bytes_fetched
- count_tool_runs by backend
- mean runtime_sec by backend
- quarantine counts (packets/requests/results)
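
A minimal in-memory sketch of these metrics follows. The `Metrics` class, its method names, and the `count_quarantine_<kind>` counter naming are assumptions for illustration; a real deployment would likely export the same values through whatever metrics system it already uses. Every value is a count, a byte total, or a mean, so no content can leak through this path.

```python
from collections import Counter, defaultdict


class Metrics:
    """Aggregate-only counters; nothing here stores content."""

    def __init__(self):
        self.counters = Counter()
        self.tool_runs = Counter()               # runs per backend
        self._runtime_sum = defaultdict(float)   # total runtime_sec per backend

    def record_validation(self, accepted):
        key = "count_validations_accept" if accepted else "count_validations_reject"
        self.counters[key] += 1

    def record_fetch(self, nbytes):
        self.counters["count_fetch_requests"] += 1
        self.counters["bytes_fetched"] += nbytes

    def record_tool_run(self, backend, runtime_sec):
        self.tool_runs[backend] += 1
        self._runtime_sum[backend] += runtime_sec

    def record_quarantine(self, kind):
        # kind is expected to be "packets", "requests", or "results".
        self.counters[f"count_quarantine_{kind}"] += 1

    def mean_runtime_sec(self, backend):
        runs = self.tool_runs[backend]
        return self._runtime_sum[backend] / runs if runs else 0.0
```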

---

## Summary

Audit metadata is useful; content logging is dangerous.

Prefer:
- hashed artifacts + deterministic validators
- small, structured logs
- strict retention + rotation