# Logging & Metrics (Privacy + Security)

ThreeGate benefits from audit logs, but logs can become an exfiltration and privacy liability.
This document defines what to log, what not to log, and how to bound retention.

---

## Goals

- Enable debugging and security review
- Prove that boundaries are enforced
- Avoid capturing sensitive content or secrets
- Avoid turning logs into a data lake

---

## Log What (Recommended)

### System events (safe)

- Container/service start/stop
- Validator ACCEPT/REJECT with artifact filename and reason codes
- Proxy allowlist changes (who/when/what)
- Firewall rules application success/failure
- Tool execution metadata:
  - request_id
  - backend
  - runtime_sec
  - exit_code
  - artifact hashes (sha256)
  - size of stdout/stderr (bytes)

### Retrieval metadata (safe)

- source_kind
- source_ref (URL/DOI) *if not sensitive*
- retrieved_utc
- content_type
- byte count
- redirect chain hosts (not full query strings)

---

## Do NOT Log (Hard Prohibitions)

- Full fetched page content / HTML bodies
- Full PDFs or extracted text
- Tool stdout/stderr content by default (store as artifacts, not logs)
- Secrets or tokens
- Local filesystem paths that reveal private structure (beyond controlled volumes)
- User prompts if they may contain sensitive content

---

## Retention

- Default: 7–30 days for operational logs
- Keep artifacts (packets/results) under your normal project retention policy
- Rotate proxy logs aggressively (high volume)

---

## Redaction

If you must log URLs, consider stripping:

- query strings (`?x=y`)
- fragments (`#section`)
- known tracking parameters

---

## Metrics (Minimal)

- count_validations_accept / reject
- count_fetch_requests, bytes_fetched
- count_tool_runs by backend
- mean runtime_sec by backend
- quarantine counts (packets/requests/results)

---

## Summary

Audit metadata is useful; content logging is dangerous.

Prefer:

- hashed artifacts + deterministic validators
- small, structured logs
- strict retention + rotation
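
---

## Example Sketch (Illustrative)

The redaction rules and the tool execution metadata listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ThreeGate's actual implementation: the function names `redact_url` and `tool_run_record`, the `TRACKING_PARAMS` set, the `event` field, and the JSON field names `stdout_bytes`, `stderr_bytes`, and `artifact_sha256` are all hypothetical. Only the standard library is used.

```python
import hashlib
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking-parameter list; extend it to match your own traffic.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def redact_url(url: str, keep_query: bool = False) -> str:
    """Return a loggable form of `url`.

    The fragment and any userinfo (user:password@) are always dropped.
    The query string is dropped by default; with keep_query=True only the
    known tracking parameters are removed.
    """
    parts = urlsplit(url)

    query = ""
    if keep_query:
        kept = [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in TRACKING_PARAMS
        ]
        query = urlencode(kept)

    # Rebuild the network location from host and port only.
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit removed
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    return urlunsplit((parts.scheme, host, parts.path, query, ""))


def tool_run_record(
    request_id: str,
    backend: str,
    runtime_sec: float,
    exit_code: int,
    stdout: bytes,
    stderr: bytes,
    artifacts: list[bytes],
) -> str:
    """Build one structured log line holding metadata only.

    stdout/stderr are reduced to byte counts and artifacts to sha256
    hashes, so no tool output content reaches the log.
    """
    record = {
        "event": "tool_run",
        "request_id": request_id,
        "backend": backend,
        "runtime_sec": round(runtime_sec, 3),
        "exit_code": exit_code,
        "stdout_bytes": len(stdout),
        "stderr_bytes": len(stderr),
        "artifact_sha256": sorted(hashlib.sha256(a).hexdigest() for a in artifacts),
    }
    return json.dumps(record, sort_keys=True)
```

For example, `redact_url("https://example.com/a?x=y#section")` returns `https://example.com/a`. Dropping the whole query string by default is the conservative choice here, since query parameters can carry tokens; `keep_query=True` is only appropriate where the remaining parameters are known to be non-sensitive.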