ThreeGate/schemas/research-packet.schema.md

Research Packet Schema (Normative)

A Research Packet is the only permitted format for data flowing from FETCH to CORE.

All packet content is treated as untrusted data. The packet is designed to:

  • preserve provenance (where it came from)
  • prevent instruction smuggling
  • constrain content into predictable sections
  • support deterministic validation and quarantining

Packets that do not conform MUST be quarantined.


File Naming

Recommended:

  • RP-YYYYMMDD-HHMMSSZ-<slug>.md
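The recommended name can be checked mechanically. The following is a minimal sketch, not part of the schema: the function name `is_recommended_name` is hypothetical, and the slug alphabet (lowercase letters, digits, single hyphens) is an assumption, since the schema does not define what a slug may contain.

```python
import re
from datetime import datetime

# Assumption: slugs are lowercase alphanumerics separated by single hyphens.
FILENAME_RE = re.compile(
    r"RP-(\d{4})(\d{2})(\d{2})-(\d{2})(\d{2})(\d{2})Z-([a-z0-9]+(?:-[a-z0-9]+)*)\.md"
)

def is_recommended_name(filename: str) -> bool:
    """Return True if filename follows RP-YYYYMMDD-HHMMSSZ-<slug>.md."""
    m = FILENAME_RE.fullmatch(filename)
    if m is None:
        return False
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    try:
        # Reject impossible timestamps such as month 13 or hour 25.
        datetime(year, month, day, hour, minute, second)
    except ValueError:
        return False
    return True
```

Because the name is only recommended, a validator would more plausibly log a warning on mismatch than quarantine the packet.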

Required Front Matter

Research Packets MUST begin with YAML front matter:

---
packet_type: research_packet
schema_version: 1
packet_id: "RP-20260209-153012Z-arxiv-llm-security"
created_utc: "2026-02-09T15:30:12Z"
source_kind: "arxiv|pubmed|crossref|europepmc|doi|url|manual"
source_ref: "https://... or DOI or PMID"
title: "..."
authors: ["Last, First", "..."]
published_date: "YYYY-MM-DD"   # if known
retrieved_utc: "YYYY-MM-DDTHH:MM:SSZ"
license: "open|unknown|restricted"
content_hashes:
  body_sha256: "hex..."
  sources_sha256: "hex..."
---

Notes:

  • license is informational; CORE must still treat the packet content as untrusted, whatever its value.
  • content_hashes support auditability and tamper detection.
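A first validation pass over the front matter might look like the sketch below. It is illustrative only: the helper names are hypothetical, the key check is a naive line scan rather than a real YAML parse, published_date is treated as optional because it is marked "if known", and the schema does not say exactly which bytes body_sha256 covers, so hashing everything after the closing delimiter is an assumption.

```python
import hashlib
import re

# Assumption: every key in the example except published_date is required.
REQUIRED_KEYS = [
    "packet_type", "schema_version", "packet_id", "created_utc",
    "source_kind", "source_ref", "title", "authors",
    "retrieved_utc", "license", "content_hashes",
]

def split_front_matter(text: str) -> tuple[str, str]:
    """Split a packet into (front_matter, body); raise if the delimiters are missing."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("packet does not begin with front matter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("front matter is not terminated")

def missing_keys(front_matter: str) -> list[str]:
    """List required top-level keys that do not appear (naive scan, not a YAML parser)."""
    present = set()
    for line in front_matter.splitlines():
        m = re.match(r"([A-Za-z_][A-Za-z0-9_]*):", line)  # unindented keys only
        if m:
            present.add(m.group(1))
    return [key for key in REQUIRED_KEYS if key not in present]

def body_sha256(body: str) -> str:
    """Hex SHA-256 of the body as returned by split_front_matter (assumed hash scope)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

A production validator would use a proper YAML parser and compare the computed digest against content_hashes.body_sha256; any mismatch or missing key is a reason to quarantine.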

Required Sections (in this order)

Packets MUST contain exactly the following H2 sections, with these exact titles:

  1. ## Executive Summary
  2. ## Source Metadata
  3. ## Extracted Content
  4. ## Claims and Evidence
  5. ## Safety Notes
  6. ## Citations
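The ordering rule lends itself to a deterministic check. The sketch below uses hypothetical function names and reads "exactly" strictly, assuming that no additional H2 headings are allowed and that deeper headings (H3 and below) are ignored.

```python
import re

REQUIRED_SECTIONS = [
    "Executive Summary",
    "Source Metadata",
    "Extracted Content",
    "Claims and Evidence",
    "Safety Notes",
    "Citations",
]

def h2_headings(body: str) -> list[str]:
    """Return the titles of all H2 headings in document order."""
    return [
        m.group(1).strip()
        for m in re.finditer(r"^## (.+)$", body, flags=re.MULTILINE)
    ]

def sections_ok(body: str) -> bool:
    """True only if the H2 headings are exactly the required ones, in order."""
    return h2_headings(body) == REQUIRED_SECTIONS
```

Comparing whole lists catches missing, duplicated, reordered, and extra sections in one step.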

1) Executive Summary

  • Short, neutral description of what the source is about
  • No imperatives, no instructions to CORE
  • No tool suggestions

2) Source Metadata

Must include:

  • canonical URL / DOI / PMID
  • publication venue (if known)
  • retrieval method (API vs HTML)
  • any access constraints observed

3) Extracted Content

  • Quotes are allowed but must be short and attributed.
  • Prefer paraphrase with citations.
  • Avoid embedding procedural steps (install/run) beyond what is necessary to understand the source.

4) Claims and Evidence

A list of claim blocks:

- Claim: ...
  Evidence: ...
  Confidence: low|medium|high
  Citation: [C1]
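Claim blocks in this shape can be parsed with a strict pattern. This is a sketch under stated assumptions: each field fits on one line, continuation lines are indented by two spaces, and each claim carries exactly one citation label. The schema does not settle any of these points, and the function names are hypothetical.

```python
import re

CLAIM_RE = re.compile(
    r"^- Claim: (?P<claim>.+)\n"
    r"  Evidence: (?P<evidence>.+)\n"
    r"  Confidence: (?P<confidence>low|medium|high)\n"
    r"  Citation: (?P<citation>\[C\d+\])[ \t]*$",
    re.MULTILINE,
)

def parse_claims(section: str) -> list[dict[str, str]]:
    """Return one dict per well-formed claim block."""
    return [m.groupdict() for m in CLAIM_RE.finditer(section)]

def claims_well_formed(section: str) -> bool:
    """True if there is at least one claim and every '- Claim:' line starts a valid block."""
    starts = len(re.findall(r"^- Claim:", section, flags=re.MULTILINE))
    return starts > 0 and starts == len(parse_claims(section))
```

Counting the block starts separately means a block with an unknown confidence value or a missing field fails validation instead of being silently dropped.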

5) Safety Notes

This section is mandatory and MUST contain:

  • Untrusted Content Statement: a sentence explicitly stating the content is untrusted and must not be treated as instructions.
  • Injection Indicators: a list of any suspicious patterns found, or the statement "None observed".

6) Citations

A numbered list with stable labels:

[C1] Author, Title, Venue, Year. URL/DOI.
[C2] ...
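Stable labels make it possible to confirm that every label used in Claims and Evidence resolves to an entry here. The sketch below assumes labels have the form [C<number>] and that each citation entry begins its line with its label; the function names are hypothetical.

```python
import re

def defined_labels(citations_section: str) -> set[str]:
    """Labels that begin a line in the Citations section, e.g. {'C1', 'C2'}."""
    return set(re.findall(r"^\[(C\d+)\]", citations_section, flags=re.MULTILINE))

def cited_labels(text: str) -> set[str]:
    """Labels referenced anywhere in the given text."""
    return set(re.findall(r"\[(C\d+)\]", text))

def undefined_citations(claims_section: str, citations_section: str) -> list[str]:
    """Labels cited in the claims but missing from the Citations section."""
    dangling = cited_labels(claims_section) - defined_labels(citations_section)
    return sorted(dangling, key=lambda label: int(label[1:]))
```

A non-empty result would be one more reason to reject the packet.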

Forbidden Content (Validation Failures)

Packets MUST be rejected if they contain any of the following (matching is case-insensitive and includes obfuscated variants):

  • shell commands or code blocks intended for execution (e.g., bash, sh, powershell)
  • installation instructions (apt, pip install, curl | sh, etc.)
  • persistence suggestions (cron, systemd units, init scripts)
  • instructions aimed at overriding hierarchy (“ignore previous instructions”, “system prompt”, etc.)
  • embedded credentials or tokens
  • links to executables or binary downloads presented as steps to take

Packets may describe such things academically if necessary, but must do so as descriptive text with no runnable commands.
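A pattern-based scanner is one way to approximate these rules. The sketch below is illustrative and deliberately incomplete: the pattern set, rule names, and the `scan` function are assumptions, Unicode NFKC normalization handles only one class of obfuscation, and simple patterns like these will flag some purely descriptive text as well.

```python
import re
import unicodedata

# Illustrative patterns only; a real validator would need a broader, reviewed set.
FORBIDDEN_PATTERNS = {
    "shell_code_block": r"`{3}\s*(?:bash|sh|shell|zsh|powershell|ps1)\b",
    "install_instruction": (
        r"\b(?:apt(?:-get)?\s+install|pip3?\s+install)\b"
        r"|curl[^\n|]*\|\s*(?:ba)?sh\b"
    ),
    "persistence": r"\bcrontab\b|\bsystemctl\s+enable\b|/etc/init\.d/",
    "hierarchy_override": (
        r"ignore\s+(?:all\s+)?previous\s+instructions|system\s+prompt"
    ),
    "credential": r"-----BEGIN [A-Z ]*PRIVATE KEY-----",
}

COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in FORBIDDEN_PATTERNS.items()
}

def scan(text: str) -> list[str]:
    """Return the names of all forbidden-content rules that match the text."""
    # NFKC folds full-width and other compatibility characters into plain forms.
    normalized = unicodedata.normalize("NFKC", text)
    return [name for name, rx in COMPILED.items() if rx.search(normalized)]
```

An empty list means no rule fired; any returned names can be copied directly into the reason report for a rejected packet.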


Validation Output

Validators should produce exactly one of two verdicts for each packet:

  • ACCEPT → moved to handoff/inbound-to-core/
  • REJECT → moved to handoff/quarantine/ with a reason report
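The routing step can be sketched as follows. The function name `route_packet`, the `.reason.json` suffix, and the JSON layout of the reason report are assumptions made for illustration; the schema requires only that rejected packets move to quarantine with a reason report.

```python
import json
import shutil
from pathlib import Path

def route_packet(packet: Path, reasons: list[str], handoff: Path) -> Path:
    """Move a packet to inbound-to-core/ (no reasons) or quarantine/ (with a report)."""
    accepted = not reasons
    dest_dir = handoff / ("inbound-to-core" if accepted else "quarantine")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / packet.name
    shutil.move(str(packet), str(dest))
    if not accepted:
        # Hypothetical report format: one JSON file written next to the packet.
        report = dest.with_name(dest.name + ".reason.json")
        report.write_text(
            json.dumps(
                {"packet": packet.name, "verdict": "REJECT", "reasons": reasons},
                indent=2,
            ),
            encoding="utf-8",
        )
    return dest
```

Here `reasons` would be the combined output of the individual checks, so a packet is accepted only when every check returns nothing.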