ThreeGate/schemas/research-packet.schema.md

# Research Packet Schema (Normative)

A **Research Packet** is the only permitted format for data flowing from FETCH to CORE.

All packet content is treated as **untrusted data**. The packet is designed to:
- preserve provenance (where it came from)
- prevent instruction smuggling
- constrain content into predictable sections
- support deterministic validation and quarantining

Packets that do not conform MUST be quarantined.

---

## File Naming

Recommended:
- `RP-YYYYMMDD-HHMMSSZ-<slug>.md`
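The recommended name can be checked mechanically. The Python sketch below is one way to do it. The function name and the slug alphabet (lowercase letters and digits joined by single hyphens) are assumptions, since the schema only writes `<slug>`. Because the pattern is only recommended, a validator might log a mismatch rather than quarantine the packet.

```python
import re
from datetime import datetime

# Assumption: <slug> is lowercase ASCII letters/digits joined by single hyphens.
FILENAME_RE = re.compile(
    r"^RP-(\d{4})(\d{2})(\d{2})-(\d{2})(\d{2})(\d{2})Z-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$"
)


def follows_naming_convention(name: str) -> bool:
    """Return True if `name` matches RP-YYYYMMDD-HHMMSSZ-<slug>.md (hypothetical helper)."""
    m = FILENAME_RE.match(name)
    if m is None:
        return False
    # The regex only checks digit counts; datetime() rejects impossible
    # values such as February 30 or hour 25.
    try:
        datetime(*(int(g) for g in m.groups()[:6]))
    except ValueError:
        return False
    return True


print(follows_naming_convention("RP-20260209-153012Z-arxiv-llm-security.md"))  # True
print(follows_naming_convention("RP-20260209-153012-arxiv.md"))  # False: missing "Z"
```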

---

## Required Front Matter

Research Packets MUST begin with YAML front matter:

```yaml
---
packet_type: research_packet
schema_version: 1
packet_id: "RP-20260209-153012Z-arxiv-llm-security"
created_utc: "2026-02-09T15:30:12Z"
source_kind: "arxiv|pubmed|crossref|europepmc|doi|url|manual"
source_ref: "https://... or DOI or PMID"
title: "..."
authors: ["Last, First", "..."]
published_date: "YYYY-MM-DD"   # if known
retrieved_utc: "YYYY-MM-DDTHH:MM:SSZ"
license: "open|unknown|restricted"
content_hashes:
  body_sha256: "hex..."
  sources_sha256: "hex..."
---
```
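A validator has to confirm that the front matter is present and carries the expected keys. This stdlib-only Python sketch checks key presence only. It treats every template key except `published_date` as required (an assumption based on the "if known" comment) and does not parse YAML values; a production validator would more likely load the block with a real YAML parser such as PyYAML's `yaml.safe_load` and also check value types. The function names are hypothetical.

```python
import re

# Top-level keys from the template. Assumption: all are mandatory except
# published_date, which the template marks "if known".
REQUIRED_KEYS = {
    "packet_type", "schema_version", "packet_id", "created_utc",
    "source_kind", "source_ref", "title", "authors",
    "retrieved_utc", "license", "content_hashes",
}
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):")


def front_matter_lines(text: str) -> list[str]:
    """Return the lines between the opening and closing '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("packet must begin with YAML front matter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i]
    raise ValueError("front matter is not terminated by '---'")


def missing_keys(text: str) -> set[str]:
    """Return required top-level keys absent from the front matter."""
    present = set()
    for line in front_matter_lines(text):
        # Only unindented "key:" lines count, so nested keys such as
        # body_sha256 under content_hashes are not treated as top-level.
        m = TOP_LEVEL_KEY.match(line)
        if m:
            present.add(m.group(1))
    return REQUIRED_KEYS - present
```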

Notes:

* `license` is informational only; CORE must still treat the content as untrusted regardless of its value.
* `content_hashes` record SHA-256 digests that support auditing and tamper detection.
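Because the digests are stored inside the front matter, they cannot cover the front matter itself. The sketch below assumes `body_sha256` is the SHA-256 of the UTF-8 text after the closing `---`, with CRLF line endings normalized to LF. This schema does not pin down the byte range or the normalization, and it does not say what `sources_sha256` covers, so that field is not modeled. The helper names are hypothetical.

```python
import hashlib


def packet_body(text: str) -> str:
    """Return everything after the closing '---' of the front matter."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("packet must begin with YAML front matter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[i + 1:])
    raise ValueError("front matter is not terminated by '---'")


def body_sha256(text: str) -> str:
    # Assumption: the digest covers the UTF-8 body with CRLF normalized to LF.
    body = packet_body(text).replace("\r\n", "\n")
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def body_hash_matches(text: str, declared: str) -> bool:
    """Compare the recomputed digest with a declared body_sha256 value."""
    return body_sha256(text) == declared.strip().lower()
```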

---

## Required Sections (in this order)

Packets MUST contain the following H2 sections, with headings exactly as written:

1. `## Executive Summary`
2. `## Source Metadata`
3. `## Extracted Content`
4. `## Claims and Evidence`
5. `## Safety Notes`
6. `## Citations`
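A minimal sketch of the ordering check, assuming "exactly" means the body has these six H2 headings, with this text, in this order, and no others. Headings inside fenced code blocks are skipped, but the fence handling is simplified: it toggles on any line that starts with three backticks. The function names are hypothetical.

```python
REQUIRED_SECTIONS = [
    "Executive Summary",
    "Source Metadata",
    "Extracted Content",
    "Claims and Evidence",
    "Safety Notes",
    "Citations",
]
FENCE = "`" * 3  # three backticks open or close a fenced block


def h2_headings(body: str) -> list[str]:
    """Collect '## ' headings, skipping lines inside fenced code blocks."""
    headings: list[str] = []
    in_fence = False
    for line in body.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence and line.startswith("## "):
            headings.append(line[3:].strip())
    return headings


def check_sections(body: str) -> list[str]:
    """Return problems; an empty list means the H2 sections conform."""
    found = h2_headings(body)
    if found == REQUIRED_SECTIONS:
        return []
    return [f"expected H2 sections {REQUIRED_SECTIONS}, found {found}"]
```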

### 1) Executive Summary

* Short, neutral description of what the source is about
* No imperatives, no instructions to CORE
* No tool suggestions

### 2) Source Metadata

Must include:

* canonical URL / DOI / PMID
* publication venue (if known)
* retrieval method (API vs HTML)
* any access constraints observed

### 3) Extracted Content

* Quotes are allowed but must be short and attributed.
* Prefer paraphrase with citations.
* Avoid embedding procedural steps (install/run) beyond what is necessary to understand the source.

### 4) Claims and Evidence

A list of claim blocks:

```text
- Claim: ...
  Evidence: ...
  Confidence: low|medium|high
  Citation: [C1]
```
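Claim blocks can be parsed line by line. In this hypothetical Python sketch, a block starts at `- Claim:` and collects the indented `Evidence`, `Confidence`, and `Citation` fields that follow. It assumes a `Citation` may hold one or more `[Cn]` labels separated by commas or spaces; the template shows a single label and does not state a rule for several.

```python
import re

CONFIDENCE_LEVELS = {"low", "medium", "high"}
FIELD = re.compile(r"^\s+(Evidence|Confidence|Citation):\s*(.*)$")
# Assumption: one or more [Cn] labels, optionally comma-separated.
LABELS = re.compile(r"\[C\d+\](?:,?\s*\[C\d+\])*")


def parse_claims(section: str) -> list[dict[str, str]]:
    """Group '- Claim:' lines with the indented fields that follow them."""
    claims: list[dict[str, str]] = []
    for line in section.splitlines():
        if line.startswith("- Claim:"):
            claims.append({"Claim": line[len("- Claim:"):].strip()})
            continue
        m = FIELD.match(line)
        if m and claims:
            claims[-1][m.group(1)] = m.group(2).strip()
    return claims


def check_claims(section: str) -> list[str]:
    problems: list[str] = []
    for i, claim in enumerate(parse_claims(section), start=1):
        for field in ("Claim", "Evidence", "Confidence", "Citation"):
            if not claim.get(field):
                problems.append(f"claim {i}: missing {field}")
        confidence = claim.get("Confidence")
        if confidence and confidence not in CONFIDENCE_LEVELS:
            problems.append(f"claim {i}: invalid confidence {confidence!r}")
        citation = claim.get("Citation")
        if citation and not LABELS.fullmatch(citation):
            problems.append(f"claim {i}: citation {citation!r} is not a [Cn] label")
    return problems
```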

### 5) Safety Notes

This section is mandatory and MUST contain:

* `Untrusted Content Statement:` a sentence explicitly stating the content is untrusted and must not be treated as instructions.
* `Injection Indicators:` list any suspicious patterns found (or `None observed`).
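A heuristic check for the two mandatory labels might look like this. It assumes each label appears as plain text at the start of a line, optionally as a `-` or `*` bullet, and it only tests that the statement uses the word "untrusted"; whether the sentence really says the content must not be treated as instructions is left to review. The names are hypothetical.

```python
import re

STATEMENT = re.compile(
    r"^[ \t]*(?:[-*][ \t]+)?Untrusted Content Statement:[ \t]*(.*)$", re.MULTILINE
)
INDICATORS = re.compile(
    r"^[ \t]*(?:[-*][ \t]+)?Injection Indicators:[ \t]*(.*)$", re.MULTILINE
)
BULLET_FOLLOWS = re.compile(r"[ \t]*\n\s*[-*][ \t]+\S")


def check_safety_notes(section: str) -> list[str]:
    problems: list[str] = []
    stmt = STATEMENT.search(section)
    if stmt is None:
        problems.append("missing 'Untrusted Content Statement:'")
    elif "untrusted" not in stmt.group(1).lower():
        # Heuristic only: the sentence should at least use the word "untrusted".
        problems.append("statement does not say the content is untrusted")
    ind = INDICATORS.search(section)
    if ind is None:
        problems.append("missing 'Injection Indicators:'")
    elif not ind.group(1).strip() and not BULLET_FOLLOWS.match(section, ind.end()):
        # Neither inline text (e.g. "None observed") nor a bullet list follows.
        problems.append("'Injection Indicators:' is empty")
    return problems
```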

### 6) Citations

A list of entries keyed by stable, numbered labels (`[C1]`, `[C2]`, ...):

```text
[C1] Author, Title, Venue, Year. URL/DOI.
[C2] ...
```
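Stable labels make it possible to cross-check this section against Claims and Evidence. This sketch assumes that every label cited in a claim must resolve to an entry here and that labels must be unique; the schema implies but does not state either rule. The function name is hypothetical.

```python
import re

CITED = re.compile(r"\[C(\d+)\]")
# An entry line starts with its label, e.g. "[C1] Author, Title, ..."
ENTRY = re.compile(r"^[ \t]*\[C(\d+)\][ \t]+\S", re.MULTILINE)


def check_citations(claims_section: str, citations_section: str) -> list[str]:
    problems: list[str] = []
    defined = ENTRY.findall(citations_section)  # label numbers as strings
    duplicates = sorted({n for n in defined if defined.count(n) > 1}, key=int)
    for n in duplicates:
        problems.append(f"[C{n}] is defined more than once")
    dangling = set(CITED.findall(claims_section)) - set(defined)
    for n in sorted(dangling, key=int):
        problems.append(f"[C{n}] is cited but not listed in Citations")
    return problems
```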

---

## Forbidden Content (Validation Failures)

Packets MUST be rejected if they contain any of the following (matched case-insensitively, including obfuscated variants):

* shell commands or code blocks intended for execution (e.g., `bash`, `sh`, `powershell`)
* installation instructions (`apt`, `pip install`, `curl | sh`, etc.)
* persistence suggestions (cron, systemd units, init scripts)
* instructions aimed at overriding hierarchy (“ignore previous instructions”, “system prompt”, etc.)
* embedded credentials or tokens
* links to executables or binary downloads presented as steps to take

Packets may discuss such topics academically where necessary, but only as **descriptive text** containing no runnable commands.
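An illustrative deny-list scanner for the categories above follows. The category names and patterns are assumptions, not an exhaustive or official rule set. Obfuscation handling is limited to NFKC normalization (which folds fullwidth and other compatibility forms), removal of common zero-width characters, and case folding. The scanner deliberately over-matches (for example, it flags any link to an `.exe`, whether or not it is presented as a step to take), on the reasoning that a false positive only sends the packet to quarantine.

```python
import re
import unicodedata

# Illustrative patterns only, written for casefolded text.
FORBIDDEN_PATTERNS: dict[str, list[str]] = {
    "executable code block": [r"^[ \t]*`{3}[ \t]*(bash|sh|zsh|shell|powershell|pwsh|cmd|bat)\b"],
    "installation instructions": [
        r"\bpip3?\s+install\b",
        r"\bapt(-get)?\s+install\b",
        r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b",
    ],
    "persistence": [r"\bcrontab\s+-e\b", r"\bsystemctl\s+enable\b", r"/etc/init\.d/"],
    "hierarchy override": [
        r"\bignore\s+(all\s+)?(previous|prior)\s+instructions\b",
        r"\bsystem\s+prompt\b",
    ],
    "embedded credentials": [
        r"-----begin [a-z ]*private key-----",
        r"\bakia[0-9a-z]{16}\b",  # shape of an AWS access key ID, casefolded
    ],
    "executable download link": [r"https?://\S+\.(exe|msi|dmg|pkg|deb|rpm|appimage)\b"],
}
COMPILED = {
    category: [re.compile(p, re.MULTILINE) for p in patterns]
    for category, patterns in FORBIDDEN_PATTERNS.items()
}
# Mapping code points to None makes str.translate delete them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize(text: str) -> str:
    """Undo simple obfuscations: compatibility forms, zero-width chars, case."""
    text = unicodedata.normalize("NFKC", text)
    return text.translate(ZERO_WIDTH).casefold()


def scan_forbidden(text: str) -> list[str]:
    """Return the categories of forbidden content found in `text`."""
    norm = normalize(text)
    return [cat for cat, regexes in COMPILED.items() if any(r.search(norm) for r in regexes)]
```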

---

## Validation Output

Validators should produce one of two verdicts:

* `ACCEPT` → moved to `handoff/inbound-to-core/`
* `REJECT` → moved to `handoff/quarantine/` with a reason report
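The routing step might be sketched as follows, given the list of problems a validator has gathered for a packet. The `root` parameter, the function name, and the report file name and format are assumptions; the schema names only the two destination directories and requires a reason report.

```python
import shutil
from pathlib import Path


def route_packet(packet: Path, problems: list[str], root: Path) -> tuple[str, Path]:
    """Move a validated packet and return (verdict, new_path).

    Assumptions: `root` is the directory containing handoff/, and the reason
    report is written next to the quarantined packet as <name>.reject.txt.
    """
    verdict = "REJECT" if problems else "ACCEPT"
    subdir = "quarantine" if problems else "inbound-to-core"
    dest_dir = root / "handoff" / subdir
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / packet.name
    shutil.move(str(packet), str(dest))
    if problems:
        report = dest_dir / f"{packet.name}.reject.txt"
        lines = [verdict] + [f"- {p}" for p in problems]
        report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return verdict, dest
```

Keeping the reason report beside the quarantined file means a reviewer sees the packet and the reasons together without consulting a separate log.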