Add license and project roadmap

welsberr 2026-04-22 17:46:41 -04:00
parent aa0951ebf1
commit d0f2352341
3 changed files with 66 additions and 0 deletions

LICENSE Executable file

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -14,6 +14,12 @@ The initial target is legacy Word `.doc` files, but the repository boundary is i
`doclift` is not a learner-facing system. It is a source-normalization layer that other projects can consume.
Project planning and lifecycle notes live in:
- `docs/architecture.md`
- `docs/bundle-format.md`
- `docs/roadmap.md`
Current implementation:
- legacy Word `.doc` conversion through `catdoc`
@@ -82,3 +88,7 @@ out/
- `Didactopus` should consume `doclift` bundles rather than own legacy format handling.
- `GroundRecall` can use the same bundles for provenance-aware import.
- other archival or scholarly tooling can reuse the same normalization path without depending on Didactopus.
## License
`doclift` is licensed under the MIT license. See `LICENSE`.

docs/roadmap.md Executable file

@@ -0,0 +1,35 @@
# Roadmap

## Near Term

- stabilize the normalized bundle schema for downstream adapters
- add a `doclift` bundle consumer path in Didactopus
- extend test coverage around table parsing, title repair, and layout manifests
- add fixture-based regression tests from representative legacy corpora
- improve CLI output for batch conversion summaries

## Format Expansion

- add a higher-fidelity `.docx` path
- add RTF support
- add WordPerfect discovery and conversion plugins
- add optional OCR-assisted pipelines for scanned legacy material

## Structural Recovery

- improve multi-line table caption handling
- distinguish equations, taxonomy outlines, and nested lists more accurately
- support figure-to-text linking when explicit references exist
- separate external asset inventory from inferred figure linkage confidence

## Runtime and Packaging

- harden the Docker/Compose runtime for reproducible cross-platform conversion
- add a small HTTP service wrapper for queued conversions
- publish container image and package release workflow

## Integration

- define a Didactopus source adapter for `doclift` bundles
- define a GroundRecall importer for `doclift` manifests and sidecars
- document provenance mapping from `doclift` artifacts into downstream stores