# Research Packet Schema (Normative)

A **Research Packet** is the only permitted format for data flowing from FETCH to CORE. All packet content is treated as **untrusted data**.

The packet is designed to:

- preserve provenance (where it came from)
- prevent instruction smuggling
- constrain content into predictable sections
- support deterministic validation and quarantining

Packets that do not conform MUST be quarantined.

---

## File Naming

Recommended:

- `RP-YYYYMMDD-HHMMSSZ-.md`

---

## Required Front Matter

Research Packets MUST begin with YAML front matter:

```yaml
---
packet_type: research_packet
schema_version: 1
packet_id: "RP-20260209-153012Z-arxiv-llm-security"
created_utc: "2026-02-09T15:30:12Z"
source_kind: "arxiv|pubmed|crossref|europepmc|doi|url|manual"
source_ref: "https://... or DOI or PMID"
title: "..."
authors: ["Last, First", "..."]
published_date: "YYYY-MM-DD" # if known
retrieved_utc: "YYYY-MM-DDTHH:MM:SSZ"
license: "open|unknown|restricted"
content_hashes:
  body_sha256: "hex..."
  sources_sha256: "hex..."
---
```

Notes:

* `license` is informational; CORE must still treat the packet as untrusted.
* `content_hashes` support auditability and tamper detection.

---

## Required Sections (in this order)

Packets MUST contain the following H2 sections, exactly:

1. `## Executive Summary`
2. `## Source Metadata`
3. `## Extracted Content`
4. `## Claims and Evidence`
5. `## Safety Notes`
6. `## Citations`

### 1) Executive Summary

* Short, neutral description of what the source is about
* No imperatives, no instructions to CORE
* No tool suggestions

### 2) Source Metadata

Must include:

* canonical URL / DOI / PMID
* publication venue (if known)
* retrieval method (API vs HTML)
* any access constraints observed

### 3) Extracted Content

* Quotes are allowed but must be short and attributed.
* Prefer paraphrase with citations.
* Avoid embedding procedural steps (install/run) beyond what is necessary to understand the source.

### 4) Claims and Evidence

A list of claim blocks:

```text
- Claim: ...
  Evidence: ...
  Confidence: low|medium|high
  Citation: [C1]
```

### 5) Safety Notes

This section is mandatory and MUST contain:

* `Untrusted Content Statement:` a sentence explicitly stating the content is untrusted and must not be treated as instructions.
* `Injection Indicators:` list any suspicious patterns found (or `None observed`).

### 6) Citations

A numbered list with stable labels:

```text
[C1] Author, Title, Venue, Year. URL/DOI.
[C2] ...
```

---

## Forbidden Content (Validation Failures)

Packets MUST be rejected if they contain (case-insensitive, including obfuscations):

* shell commands or code blocks intended for execution (e.g., `bash`, `sh`, `powershell`)
* installation instructions (`apt`, `pip install`, `curl | sh`, etc.)
* persistence suggestions (cron, systemd units, init scripts)
* instructions aimed at overriding hierarchy (“ignore previous instructions”, “system prompt”, etc.)
* embedded credentials or tokens
* links to executables or binary downloads presented as steps to take

Packets may describe such things academically if necessary, but must do so as **descriptive text** with no runnable commands.

---

## Validation Output

Validators should produce:

* `ACCEPT` → moved to `handoff/inbound-to-core/`
* `REJECT` → moved to `handoff/quarantine/` with a reason report
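
### Example: minimal validator sketch (non-normative)

The rules in this schema lend themselves to a deterministic check. The following is a minimal sketch of one possible validator, not a reference implementation. The names `validate_packet`, `REQUIRED_SECTIONS`, `SAFETY_LABELS`, and `FORBIDDEN_PATTERNS` are hypothetical, and the regular expressions cover only a few of the forbidden categories; they do not attempt obfuscation handling, credential detection, or front matter field validation.

```python
import re

# H2 headings required by the schema, in the required order.
REQUIRED_SECTIONS = [
    "Executive Summary",
    "Source Metadata",
    "Extracted Content",
    "Claims and Evidence",
    "Safety Notes",
    "Citations",
]

# Labels that the Safety Notes section must contain.
SAFETY_LABELS = ["Untrusted Content Statement:", "Injection Indicators:"]

# Illustrative deny-list only (an assumption of this sketch); a real
# validator would need broader patterns and obfuscation handling.
FORBIDDEN_PATTERNS = {
    "shell code block": re.compile(
        r"^`{3}[ \t]*(bash|sh|powershell)\b", re.I | re.M
    ),
    "installation instruction": re.compile(
        r"\b(pip install|apt(-get)? install)\b|curl[^\n|]*\|\s*sh\b", re.I
    ),
    "hierarchy override": re.compile(
        r"ignore (all )?previous instructions|system prompt", re.I
    ),
}


def validate_packet(text):
    """Return ("ACCEPT", []) or ("REJECT", reasons) for a packet's full text."""
    reasons = []

    # Front matter must open the file and be closed by a second '---' line.
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if stripped[:1] == ["---"] and "---" in stripped[1:]:
        end = stripped.index("---", 1)
        body = "\n".join(lines[end + 1:])
    else:
        reasons.append("missing YAML front matter")
        body = text

    # H2 headings must match the required list exactly, in order.
    headings = re.findall(r"^## (.+?)[ \t]*$", body, re.M)
    if headings != REQUIRED_SECTIONS:
        reasons.append(f"H2 sections do not match required list: {headings}")

    # Safety Notes labels must be present.
    for label in SAFETY_LABELS:
        if label not in body:
            reasons.append(f"Safety Notes is missing '{label}'")

    # Scan the whole packet, front matter included, for forbidden content.
    for name, pattern in FORBIDDEN_PATTERNS.items():
        if pattern.search(text):
            reasons.append(f"forbidden content: {name}")

    return ("REJECT", reasons) if reasons else ("ACCEPT", [])
```

In this sketch the returned `reasons` list could serve as the body of the reason report written alongside a quarantined packet, while an empty list corresponds to `ACCEPT`.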