# Bundle Format

## Top-level

`manifest.json`
- bundle version
- source root
- converter summary
- document list

`conversion_report.json`
- per-document conversion metrics
- counts for tables, figure references, and errors

`assets/figure_asset_inventory.json`
- optional inventory of external image/figure files discovered under an asset root
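
The top-level files listed above can be checked before any further processing. The following is a minimal sketch, not part of the converter: the function name `check_top_level` is hypothetical, and treating `manifest.json` and `conversion_report.json` as required is an assumption (only the asset inventory is described as optional here).

```python
from pathlib import Path

# Top-level files named in this section. Treating the first two as required
# is an assumption; only the asset inventory is described as optional.
REQUIRED_FILES = ("manifest.json", "conversion_report.json")
OPTIONAL_FILES = ("assets/figure_asset_inventory.json",)


def check_top_level(bundle_root):
    """Return (missing_required, present_optional) for a bundle directory."""
    root = Path(bundle_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    present = [name for name in OPTIONAL_FILES if (root / name).is_file()]
    return missing, present
```

An adapter might call `check_top_level(path)` first and stop early if the first list is non-empty, rather than failing later on a missing file.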

## Per-document

Each normalized document lives under `documents/<document-id>/`.

`document.md`
- readable normalized text
- extracted table and figure sections when available

`document.layout.json`
- line-oriented layout manifest
- indentation, tabs, and coarse line classification

`document.tables.json`
- table references found in text
- recovered tables with captions, raw lines, parsed rows, and source line ranges

`document.figures.json`
- explicit figure references from text
- related external assets when available
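
The per-document layout above maps directly onto paths. As a small sketch, the helper below builds the four file paths for one document; the function name `document_paths` and the dictionary labels (`text`, `layout`, `tables`, `figures`) are illustrative and not part of the bundle format.

```python
from pathlib import Path

# File names as listed in this section; the short labels are illustrative only.
DOCUMENT_FILES = {
    "text": "document.md",
    "layout": "document.layout.json",
    "tables": "document.tables.json",
    "figures": "document.figures.json",
}


def document_paths(bundle_root, document_id):
    """Map each label to its path under documents/<document-id>/."""
    doc_dir = Path(bundle_root) / "documents" / document_id
    return {label: doc_dir / name for label, name in DOCUMENT_FILES.items()}
```

The helper only builds paths and does not check that the files exist, so callers can decide how to handle a document with no recovered tables or figures.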

## Stability

The schema should remain stable enough for downstream adapters to rely on.
Converters may improve row parsing or figure linking, as long as existing field names do not change.
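
One way a downstream adapter could lean on this guarantee is to read only the fields it needs and ignore everything else, so that richer converter output does not break it. The sketch below assumes records are plain JSON objects loaded as dictionaries; `pick_fields` and the keys `caption`, `rows`, and `confidence` are hypothetical and are not names defined by the schema.

```python
def pick_fields(record, required, optional=()):
    """Copy only the fields an adapter uses; ignore any other keys."""
    missing = [key for key in required if key not in record]
    if missing:
        raise KeyError(f"missing required fields: {missing}")
    picked = {key: record[key] for key in required}
    picked.update({key: record[key] for key in optional if key in record})
    return picked


# Hypothetical table record; the extra key is dropped rather than rejected.
table = {"caption": "Table 1", "rows": [], "confidence": 0.9}
subset = pick_fields(table, required=("caption", "rows"))
```

Failing loudly on a missing required field while tolerating unknown ones keeps the adapter strict about the names it depends on and indifferent to additions.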