# Bundle Format

## Top-level

`manifest.json`

- bundle version
- source root
- converter summary
- document list

`conversion_report.json`

- per-document conversion metrics
- counts for tables, figure references, and errors

`assets/figure_asset_inventory.json`

- optional inventory of external image/figure files discovered under an asset root

## Per-document

Each normalized document lives under `documents//`.

`document.md`

- readable normalized text
- extracted table and figure sections when available

`document.layout.json`

- line-oriented layout manifest
- indentation, tabs, and coarse line classification

`document.tables.json`

- table references found in text
- recovered tables with captions, raw lines, parsed rows, and source line ranges

`document.figures.json`

- explicit figure references from text
- related external assets when available

## Stability

The schema should be stable enough for downstream adapters. Converters may improve row parsing or figure linking without breaking field names.
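
A downstream adapter may want to confirm that a bundle has the file layout described above before reading it. The following is a minimal sketch, not part of the converter: the function name `check_bundle` is hypothetical, and it assumes that every immediate subdirectory of `documents/` is one normalized document and that all four per-document files are expected to be present. It checks file presence only and makes no assumption about field names inside the JSON files.

```python
from pathlib import Path

# File names taken from the layout described in this document.
TOP_LEVEL_FILES = ["manifest.json", "conversion_report.json"]
PER_DOCUMENT_FILES = [
    "document.md",
    "document.layout.json",
    "document.tables.json",
    "document.figures.json",
]


def check_bundle(bundle_root):
    """Return bundle-relative paths of expected files that are missing.

    Hypothetical helper. Assumes each immediate subdirectory of
    documents/ is one document and that all per-document files are
    required. The optional assets inventory is not checked.
    """
    root = Path(bundle_root)
    missing = [name for name in TOP_LEVEL_FILES if not (root / name).is_file()]

    docs_dir = root / "documents"
    if docs_dir.is_dir():
        # Sort so the report is deterministic across platforms.
        for doc_dir in sorted(p for p in docs_dir.iterdir() if p.is_dir()):
            for name in PER_DOCUMENT_FILES:
                if not (doc_dir / name).is_file():
                    missing.append(f"documents/{doc_dir.name}/{name}")
    return missing
```

An empty list means every expected file was found. If some per-document files turn out to be optional in practice, the corresponding names can be dropped from `PER_DOCUMENT_FILES`.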