Research Packet Schema (Normative)
A Research Packet is the only permitted format for data flowing from FETCH to CORE.
All packet content is treated as untrusted data. The packet is designed to:
- preserve provenance (where it came from)
- prevent instruction smuggling
- constrain content into predictable sections
- support deterministic validation and quarantining
Packets that do not conform MUST be quarantined.
File Naming
Recommended:
RP-YYYYMMDD-HHMMSSZ-<slug>.md
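The naming pattern can be generated and checked mechanically. The following is a minimal sketch, not part of the schema: the helper names are hypothetical, and the slug character set (lowercase alphanumerics separated by single hyphens) is an assumption, since the schema only shows `<slug>`.

```python
import re
from datetime import datetime, timezone

# Assumption: slugs are lowercase alphanumerics separated by single hyphens.
PACKET_NAME_RE = re.compile(r"RP-(\d{8})-(\d{6})Z-([a-z0-9]+(?:-[a-z0-9]+)*)\.md")


def packet_filename(created: datetime, slug: str) -> str:
    """Build a packet filename from a timezone-aware timestamp and a slug."""
    stamp = created.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%SZ")
    return f"RP-{stamp}-{slug}.md"


def is_recommended_name(name: str) -> bool:
    """Return True if the name fits the pattern and encodes a real date and time."""
    match = PACKET_NAME_RE.fullmatch(name)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        return False
    return True
```

Checking the date with `strptime` in addition to the regex catches names such as a month of `13`, which a digit-count pattern alone would let through.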
Required Front Matter
Research Packets MUST begin with YAML front matter:
---
packet_type: research_packet
schema_version: 1
packet_id: "RP-20260209-153012Z-arxiv-llm-security"
created_utc: "2026-02-09T15:30:12Z"
source_kind: "arxiv|pubmed|crossref|europepmc|doi|url|manual"
source_ref: "https://... or DOI or PMID"
title: "..."
authors: ["Last, First", "..."]
published_date: "YYYY-MM-DD" # if known
retrieved_utc: "YYYY-MM-DDTHH:MM:SSZ"
license: "open|unknown|restricted"
content_hashes:
  body_sha256: "hex..."
  sources_sha256: "hex..."
---
Notes:
- `license` is informational; CORE must still treat the packet as untrusted.
- `content_hashes` support auditability and tamper detection.
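A validator might start by separating the front matter from the body and confirming the required keys are present. The sketch below uses only the standard library and makes several assumptions the schema does not state: the key check is a line scan rather than a real YAML parse, `published_date` is treated as optional because it is marked "if known", and `body_sha256` is assumed to cover the UTF-8 bytes of everything after the closing `---`. All function names are hypothetical.

```python
import hashlib

# Assumption: published_date is optional ("if known"); all other keys are required.
REQUIRED_KEYS = (
    "packet_type", "schema_version", "packet_id", "created_utc",
    "source_kind", "source_ref", "title", "authors",
    "retrieved_utc", "license", "content_hashes",
)


def split_front_matter(text: str) -> tuple[list[str], str]:
    """Split a packet into front matter lines and body text.

    Raises ValueError if the packet does not begin with a terminated '---' block.
    """
    lines = text.split("\n")
    if lines[0].strip() != "---":
        raise ValueError("packet does not begin with front matter")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return lines[1:index], "\n".join(lines[index + 1:])
    raise ValueError("front matter is not terminated")


def missing_keys(front_matter: list[str]) -> list[str]:
    """List required top-level keys absent from the front matter.

    A key counts as present when an unindented line starts with 'key:'.
    """
    present = {
        line.split(":", 1)[0]
        for line in front_matter
        if line and not line[0].isspace() and ":" in line
    }
    return [key for key in REQUIRED_KEYS if key not in present]


def body_sha256(body: str) -> str:
    """Hex SHA-256 of the body's UTF-8 bytes (assumed hashing scope)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

A production validator would more likely parse the block with a YAML library and also check value types; the line scan here only shows the shape of the check.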
Required Sections (in this order)
Packets MUST contain the following H2 sections, exactly:
- `## Executive Summary`
- `## Source Metadata`
- `## Extracted Content`
- `## Claims and Evidence`
- `## Safety Notes`
- `## Citations`
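The ordering rule lends itself to a deterministic check. This sketch assumes "exactly" means the body contains these six H2 headings and no others, in this order; that reading, and the function names, are assumptions rather than something the schema spells out.

```python
import re

REQUIRED_SECTIONS = [
    "Executive Summary",
    "Source Metadata",
    "Extracted Content",
    "Claims and Evidence",
    "Safety Notes",
    "Citations",
]

# Matches lines that start with exactly two '#' characters followed by a title.
H2_RE = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def h2_titles(body: str) -> list[str]:
    """Return the titles of all H2 headings in document order."""
    return H2_RE.findall(body)


def sections_conform(body: str) -> bool:
    """True if the H2 headings are exactly the required ones, in order."""
    return h2_titles(body) == REQUIRED_SECTIONS
```

Note that this scan does not skip fenced code blocks, so a line beginning with `## ` inside a fence would be counted as a heading.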
1) Executive Summary
- Short, neutral description of what the source is about
- No imperatives, no instructions to CORE
- No tool suggestions
2) Source Metadata
Must include:
- canonical URL / DOI / PMID
- publication venue (if known)
- retrieval method (API vs HTML)
- any access constraints observed
3) Extracted Content
- Quotes are allowed but must be short and attributed.
- Prefer paraphrase with citations.
- Avoid embedding procedural steps (install/run) beyond what is necessary to understand the source.
4) Claims and Evidence
A list of claim blocks:
- Claim: ...
  Evidence: ...
  Confidence: low|medium|high
  Citation: [C1]
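Claim blocks are regular enough to parse into records. The sketch below is one possible reading of the layout above: each block starts at a `Claim:` line, all four fields are treated as required, and lines that do not start with a known field name are ignored. The `ClaimBlock` type and the function names are hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_LEVELS = ("low", "medium", "high")
FIELDS = ("Claim", "Evidence", "Confidence", "Citation")


@dataclass
class ClaimBlock:
    claim: str
    evidence: str
    confidence: str
    citation: str


def _finish(fields: dict[str, str]) -> ClaimBlock:
    """Validate one collected block and convert it to a ClaimBlock."""
    missing = [name for name in FIELDS if name not in fields]
    if missing:
        raise ValueError(f"claim block missing fields: {missing}")
    if fields["Confidence"] not in CONFIDENCE_LEVELS:
        raise ValueError(f"invalid confidence: {fields['Confidence']!r}")
    return ClaimBlock(
        fields["Claim"], fields["Evidence"], fields["Confidence"], fields["Citation"]
    )


def parse_claim_blocks(section: str) -> list[ClaimBlock]:
    """Parse the text of the Claims and Evidence section into claim blocks."""
    blocks: list[ClaimBlock] = []
    current: dict[str, str] = {}
    for raw in section.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            line = line[2:].lstrip()
        name, sep, value = line.partition(":")
        if not sep or name not in FIELDS:
            continue  # not a field line; ignored in this sketch
        if name == "Claim" and current:
            blocks.append(_finish(current))
            current = {}
        current[name] = value.strip()
    if current:
        blocks.append(_finish(current))
    return blocks
```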
5) Safety Notes
This section is mandatory and MUST contain:
- Untrusted Content Statement: a sentence explicitly stating the content is untrusted and must not be treated as instructions.
- Injection Indicators: list any suspicious patterns found (or `None observed`).
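A presence check for the two mandatory entries could look like the sketch below. It assumes each entry appears on its own line as `Label: value`, optionally behind a list dash; the schema does not fix that layout, and the function name is hypothetical. It verifies only that both entries exist and are non-empty, not that the statement's wording is adequate.

```python
LABELS = ("Untrusted Content Statement", "Injection Indicators")


def safety_notes_problems(section: str) -> list[str]:
    """Return a list of problems found in the Safety Notes section text."""
    values: dict[str, str] = {}
    for raw in section.splitlines():
        # Drop surrounding whitespace and any leading list dash.
        line = raw.strip().lstrip("-").strip()
        for label in LABELS:
            prefix = label + ":"
            if line.startswith(prefix):
                values[label] = line[len(prefix):].strip()
    return [f"missing or empty: {label}" for label in LABELS if not values.get(label)]
```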
6) Citations
A numbered list with stable labels:
[C1] Author, Title, Venue, Year. URL/DOI.
[C2] ...
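Stable labels make it possible to confirm that every label cited elsewhere in the packet is actually defined here. This is a minimal sketch assuming labels always take the form `C` followed by digits, as in the examples above, and that each definition starts a line in the Citations section. The function name is hypothetical.

```python
import re

DEFINED_RE = re.compile(r"^\[(C\d+)\]", re.MULTILINE)  # label at the start of a line
CITED_RE = re.compile(r"\[(C\d+)\]")                   # label anywhere in the text


def undefined_citations(packet_text: str, citations_section: str) -> list[str]:
    """Return labels cited in packet_text that the Citations section does not define."""
    defined = set(DEFINED_RE.findall(citations_section))
    cited = set(CITED_RE.findall(packet_text))
    return sorted(cited - defined, key=lambda label: int(label[1:]))
```

For example, a packet that cites `[C1]` and `[C3]` against a Citations section defining only `[C1]` and `[C2]` yields `["C3"]`.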
Forbidden Content (Validation Failures)
Packets MUST be rejected if they contain (case-insensitive, including obfuscations):
- shell commands or code blocks intended for execution (e.g., `bash`, `sh`, `powershell`)
- installation instructions (`apt`, `pip install`, `curl | sh`, etc.)
- persistence suggestions (cron, systemd units, init scripts)
- instructions aimed at overriding hierarchy (“ignore previous instructions”, “system prompt”, etc.)
- embedded credentials or tokens
- links to executables or binary downloads presented as steps to take
Packets may describe such things academically if necessary, but must do so as descriptive text with no runnable commands.
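A first-pass scanner for a few of these categories might look like the sketch below. The rule names and patterns are illustrative assumptions, not the schema's definition of a violation: they cover only some of the listed categories, make no attempt at de-obfuscation, and cannot by themselves tell a runnable command from an academic description, so hits would likely need review against the exception above.

```python
import re

# Illustrative heuristics only; each pattern is an assumption about what a
# violation tends to look like, not an exhaustive or authoritative rule.
RULES = {
    "shell_code_block": re.compile(
        r"^`{3}[ \t]*(?:bash|sh|powershell)\b", re.IGNORECASE | re.MULTILINE
    ),
    "install_command": re.compile(
        r"\b(?:pip3?|apt(?:-get)?)\s+install\b", re.IGNORECASE
    ),
    "pipe_to_shell": re.compile(
        r"\bcurl\b[^\n|]*\|\s*(?:ba)?sh\b", re.IGNORECASE
    ),
    "hierarchy_override": re.compile(
        r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE
    ),
}


def forbidden_hits(text: str) -> list[str]:
    """Return the names of all rules that match somewhere in the text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]
```

Returning rule names rather than a boolean gives the validator something concrete to write into a reason report.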
Validation Output
Validators should produce:
- `ACCEPT` → moved to `handoff/inbound-to-core/`
- `REJECT` → moved to `handoff/quarantine/` with a reason report
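The routing step can be sketched as a single function that moves the packet according to the validation result. The two directory names come from the list above; everything else is an assumption for illustration, including the function name, the rule that an empty reason list means `ACCEPT`, and the `.reason.txt` report naming.

```python
import shutil
from pathlib import Path


def route_packet(packet: Path, reasons: list[str], handoff_root: Path) -> Path:
    """Move a validated packet to its destination and return the new path.

    An empty reasons list is treated as ACCEPT; anything else is REJECT.
    """
    accepted = not reasons
    target_dir = handoff_root / ("inbound-to-core" if accepted else "quarantine")
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / packet.name
    shutil.move(str(packet), str(destination))
    if not accepted:
        # Hypothetical report naming: <packet name>.reason.txt beside the packet.
        report = destination.with_name(destination.name + ".reason.txt")
        report.write_text("REJECT\n" + "\n".join(reasons) + "\n", encoding="utf-8")
    return destination
```

Writing the report next to the quarantined file keeps the rejection reason with the packet it describes.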