Logging & Metrics (Privacy + Security)
ThreeGate benefits from audit logs, but logs can become an exfiltration and privacy liability.
This document defines what to log, what not to log, and how to bound retention.
Goals
- Enable debugging and security review
- Prove that boundaries are enforced
- Avoid capturing sensitive content or secrets
- Avoid turning logs into a data lake
What to Log (Recommended)
System events (safe)
- Container/service start/stop
- Validator ACCEPT/REJECT with artifact filename and reason codes
- Proxy allowlist changes (who/when/what)
- Firewall rules application success/failure
- Tool execution metadata:
  - request_id
  - backend
  - runtime_sec
  - exit_code
  - artifact hashes (sha256)
  - size of stdout/stderr (bytes)
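As an illustration, the tool execution metadata listed above could be assembled into a structured record along the following lines. This is a minimal sketch, not ThreeGate's actual logging code: the function name `tool_run_record`, the `event` field, and the example values are assumptions made for the example. Note that stdout and stderr contribute only their byte counts, and artifacts contribute only their sha256 digests.

```python
import hashlib
import json


def tool_run_record(request_id, backend, runtime_sec, exit_code,
                    stdout, stderr, artifacts):
    """Build a metadata-only log record for one tool run.

    stdout and stderr are bytes; only their sizes are recorded.
    artifacts maps a (hypothetical) artifact name to its bytes;
    only the sha256 digest of each artifact is recorded.
    """
    return {
        "event": "tool_run",  # hypothetical event label
        "request_id": request_id,
        "backend": backend,
        "runtime_sec": runtime_sec,
        "exit_code": exit_code,
        "artifact_sha256": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(artifacts.items())
        },
        "stdout_bytes": len(stdout),
        "stderr_bytes": len(stderr),
    }


record = tool_run_record(
    request_id="req-0001",
    backend="example-backend",
    runtime_sec=1.25,
    exit_code=0,
    stdout=b"hello\n",
    stderr=b"",
    artifacts={"result.json": b"{}"},
)
print(json.dumps(record, sort_keys=True))
```

Emitting one JSON object per line keeps the log small and easy to parse, and the record never holds the output content itself.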
Retrieval metadata (safe)
- source_kind
- source_ref (URL/DOI) if not sensitive
- retrieved_utc
- content_type
- byte count
- redirect chain hosts (not full query strings)
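A retrieval record with these fields might be built as sketched below. The helper names `redirect_hosts` and `retrieval_record` and the `event` field are hypothetical. The sketch assumes the caller passes `source_ref=None` when the reference is sensitive, and it reduces each URL in the redirect chain to its hostname so that paths and query strings never reach the log.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def redirect_hosts(urls):
    """Reduce a redirect chain to hostnames, dropping paths and query strings."""
    return [urlsplit(u).hostname or "" for u in urls]


def retrieval_record(source_kind, source_ref, content_type, body, redirect_chain):
    """Build a metadata-only record for one fetch; body is bytes and is not stored."""
    return {
        "event": "retrieval",  # hypothetical event label
        "source_kind": source_kind,
        "source_ref": source_ref,  # pass None when the reference is sensitive
        "retrieved_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "content_type": content_type,
        "byte_count": len(body),
        "redirect_hosts": redirect_hosts(redirect_chain),
    }


rec = retrieval_record(
    source_kind="web",
    source_ref=None,
    content_type="text/html",
    body=b"<p/>\n",
    redirect_chain=[
        "https://a.example/start?token=abc",
        "https://b.example/final",
    ],
)
print(rec["redirect_hosts"])
```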
Do NOT Log (Hard Prohibitions)
- Full fetched page content / HTML bodies
- Full PDFs or extracted text
- Tool stdout/stderr content by default (store as artifacts, not logs)
- Secrets or tokens
- Local filesystem paths that reveal private structure (beyond controlled volumes)
- User prompts if they may contain sensitive content
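One way to back these prohibitions with code is an allowlist applied just before a record is written: any field not explicitly approved is dropped, so a new content field cannot leak by accident. The sketch below is an assumption about how this could be done, not a description of ThreeGate's implementation. The names `ALLOWED_FIELDS` and `scrub` are hypothetical, and the allowed set simply mirrors the metadata fields named earlier in this document.

```python
# Hypothetical allowlist; field names mirror the "safe" metadata above.
ALLOWED_FIELDS = frozenset({
    "event", "request_id", "backend", "runtime_sec", "exit_code",
    "artifact_sha256", "stdout_bytes", "stderr_bytes",
    "source_kind", "source_ref", "retrieved_utc", "content_type",
    "byte_count", "redirect_hosts",
})


def scrub(record):
    """Return a copy of record containing only allowlisted fields.

    Names (never values) of dropped fields are kept so that reviewers
    can see that something was withheld.
    """
    kept = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    dropped = sorted(k for k in record if k not in ALLOWED_FIELDS)
    if dropped:
        kept["dropped_fields"] = dropped
    return kept


unsafe = {
    "request_id": "req-0001",
    "exit_code": 0,
    "stdout": "full tool output that must not be logged",
    "prompt": "user text that may be sensitive",
}
print(scrub(unsafe))
```

An allowlist fails closed: forgetting to register a new field loses a log value, whereas a denylist that misses a field leaks content.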
Retention
- Default: 7–30 days for operational logs
- Keep artifacts (packets/results) under your normal project retention policy
- Rotate proxy logs aggressively (high volume)
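For services that log through Python's standard library, bounded retention can be approximated with `logging.handlers.TimedRotatingFileHandler`, which rotates on a schedule and deletes files beyond `backupCount`. The sketch below assumes daily rotation and the 7-day lower bound of the default range. The logger name `threegate.operational`, the file name, and the helper `make_operational_logger` are hypothetical.

```python
import logging
import logging.handlers
import os
import tempfile

RETENTION_DAYS = 7  # assumed: lower bound of the 7-30 day default


def make_operational_logger(log_dir, retention_days=RETENTION_DAYS):
    """Create a logger whose file rotates at midnight UTC.

    Rotated files beyond retention_days are deleted by the handler.
    """
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, "operational.log"),
        when="midnight",
        backupCount=retention_days,
        utc=True,
        delay=True,  # do not create the file until the first record
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger("threegate.operational")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger, handler


# Usage: write one record into a throwaway directory.
with tempfile.TemporaryDirectory() as log_dir:
    logger, handler = make_operational_logger(log_dir)
    logger.info("service start")
    handler.close()
    logger.removeHandler(handler)
    created = os.listdir(log_dir)
print(created)
```

High-volume proxy logs would likely need a shorter interval or size-based rotation (for example `RotatingFileHandler`); the right values depend on traffic.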
Redaction
If you must log URLs, consider stripping:
- query strings (?x=y)
- fragments (#section)
- known tracking parameters
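The stripping rules above can be sketched with the standard `urllib.parse` module. The function name `redact_url`, the `keep_query` flag, and the contents of `TRACKING_PARAMS` are assumptions for illustration; the tracking-parameter set in particular is a small sample, not a complete list. By default the whole query string and the fragment are removed; with `keep_query=True` only the listed tracking parameters are removed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative sample only; a real deployment would maintain its own list.
TRACKING_PARAMS = frozenset({
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
})


def redact_url(url, keep_query=False):
    """Strip the fragment and the query string from url before logging.

    With keep_query=True, keep the query but drop known tracking parameters.
    """
    parts = urlsplit(url)
    query = ""
    if keep_query:
        pairs = [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in TRACKING_PARAMS
        ]
        query = urlencode(pairs)
    # The empty last element removes the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(redact_url("https://example.org/a/b?x=y#section"))
print(redact_url("https://example.org/p?id=7&utm_source=news#top", keep_query=True))
```

Dropping the entire query string is the safer default, since query parameters can carry tokens or identifiers that no fixed list will anticipate.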
Metrics (Minimal)
- count_validations_accept, count_validations_reject
- count_fetch_requests, bytes_fetched
- count_tool_runs by backend
- mean runtime_sec by backend
- quarantine counts (packets/requests/results)
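A minimal in-memory collector for these metrics might look like the sketch below. The class `Metrics` and its method names are hypothetical, and exporting the values to a monitoring system is out of scope here. The collector stores only numbers and short labels, so it cannot accumulate content.

```python
from collections import Counter, defaultdict


class Metrics:
    """In-memory counters; holds numbers and short labels only, never content."""

    def __init__(self):
        self.counts = Counter()
        self.bytes_fetched = 0
        self._runtime_total = defaultdict(float)

    def validation(self, accepted):
        key = "count_validations_accept" if accepted else "count_validations_reject"
        self.counts[key] += 1

    def fetch(self, nbytes):
        self.counts["count_fetch_requests"] += 1
        self.bytes_fetched += nbytes

    def tool_run(self, backend, runtime_sec):
        self.counts[("count_tool_runs", backend)] += 1
        self._runtime_total[backend] += runtime_sec

    def quarantine(self, kind):
        # kind is one of "packets", "requests", "results"
        self.counts[("quarantine", kind)] += 1

    def mean_runtime_sec(self, backend):
        runs = self.counts[("count_tool_runs", backend)]
        return self._runtime_total[backend] / runs if runs else 0.0


m = Metrics()
m.validation(True)
m.validation(False)
m.fetch(2048)
m.tool_run("example-backend", 1.0)
m.tool_run("example-backend", 3.0)
m.quarantine("packets")
print(m.mean_runtime_sec("example-backend"))
```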
Summary
Audit metadata is useful; content logging is dangerous.
Prefer:
- hashed artifacts + deterministic validators
- small, structured logs
- strict retention + rotation