Bundle Format

Top-level

manifest.json

  • bundle version
  • source root
  • converter summary
  • document list

conversion_report.json

  • per-document conversion metrics
  • counts for tables, figure references, and errors

assets/figure_asset_inventory.json

  • optional inventory of external image/figure files discovered under an asset root
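As a rough illustration of the top-level layout, the sketch below loads the three files listed above from a bundle directory. The helper name `load_top_level` is hypothetical, and treating manifest.json and conversion_report.json as required is an assumption; this document only states that the figure asset inventory is optional.

```python
import json
from pathlib import Path

# File names come from the layout above. Treating the first two as required
# is an assumption; only the asset inventory is described as optional.
REQUIRED_TOP_LEVEL = ("manifest.json", "conversion_report.json")
OPTIONAL_TOP_LEVEL = ("assets/figure_asset_inventory.json",)


def load_top_level(bundle_root):
    """Return {relative_path: parsed JSON} for the top-level bundle files.

    Raises FileNotFoundError if an assumed-required file is missing.
    Optional files that are absent map to None.
    """
    root = Path(bundle_root)
    loaded = {}
    for name in REQUIRED_TOP_LEVEL:
        path = root / name
        if not path.is_file():
            raise FileNotFoundError(f"missing required bundle file: {name}")
        loaded[name] = json.loads(path.read_text(encoding="utf-8"))
    for name in OPTIONAL_TOP_LEVEL:
        path = root / name
        if path.is_file():
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
        else:
            loaded[name] = None
    return loaded
```

The function returns parsed JSON without interpreting it, since the exact field names inside each file are not spelled out here.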

Per-document

Each normalized document lives under documents/<document-id>/.

document.md

  • readable normalized text
  • extracted table and figure sections when available

document.layout.json

  • line-oriented layout manifest
  • indentation, tabs, and coarse line classification

document.tables.json

  • table references found in text
  • recovered tables with captions, raw lines, parsed rows, and source line ranges

document.figures.json

  • explicit figure references from text
  • related external assets when available
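The per-document layout can be walked with a small helper such as the one below. It is a minimal sketch: `iter_documents` is a hypothetical name, and the code assumes that any of the four files may be absent for a given document, which this document does not confirm.

```python
from pathlib import Path

# File names are taken from the list above; everything else is an assumption.
PER_DOCUMENT_FILES = (
    "document.md",
    "document.layout.json",
    "document.tables.json",
    "document.figures.json",
)


def iter_documents(bundle_root):
    """Yield (document_id, {file_name: Path or None}) per document directory.

    Directories are visited in sorted order; files that do not exist map to None.
    """
    documents_dir = Path(bundle_root) / "documents"
    if not documents_dir.is_dir():
        return
    for doc_dir in sorted(p for p in documents_dir.iterdir() if p.is_dir()):
        files = {}
        for name in PER_DOCUMENT_FILES:
            path = doc_dir / name
            files[name] = path if path.is_file() else None
        yield doc_dir.name, files
```

A caller might use it as `for doc_id, files in iter_documents("bundle"): ...` and skip any entry whose value is None.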

Stability

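Field names in the schema are intended to stay stable so that downstream adapters can depend on them. Converters may improve row parsing or figure linking, which can change field values, but such improvements should not break existing field names.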
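One way a downstream adapter could lean on this guarantee is to read only the fields it needs by name and ignore everything else, so that newly added fields do not break it. The sketch below is an assumption about adapter design, not part of the bundle format; `select_fields` and the keys in the usage note are hypothetical.

```python
def select_fields(record, required, optional=()):
    """Copy only the named fields out of a parsed JSON object.

    Raises KeyError if a required field is missing. Unknown fields in
    `record` are ignored, and optional fields are copied only when present.
    """
    missing = [key for key in required if key not in record]
    if missing:
        raise KeyError(f"missing required fields: {missing}")
    selected = {key: record[key] for key in required}
    for key in optional:
        if key in record:
            selected[key] = record[key]
    return selected
```

For example, `select_fields(table, required=("caption",), optional=("rows",))` would return just those two entries from a table record, assuming the record used those (hypothetical) key names.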