From d0f23523412e388d53a3ff44ed1556f9a08e47c0 Mon Sep 17 00:00:00 2001
From: welsberr
Date: Wed, 22 Apr 2026 17:46:41 -0400
Subject: [PATCH] Add license and project roadmap

---
 LICENSE         | 21 +++++++++++++++++++++
 README.md       | 10 ++++++++++
 docs/roadmap.md | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)
 create mode 100755 LICENSE
 create mode 100755 docs/roadmap.md

diff --git a/LICENSE b/LICENSE
new file mode 100755
index 0000000..14fac91
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
index 557bdeb..ef6c368 100755
--- a/README.md
+++ b/README.md
@@ -14,6 +14,12 @@ The initial target is legacy Word `.doc` files, but the repository boundary is i
 
 `doclift` is not a learner-facing system. It is a source-normalization layer that other projects can consume.
 
+Project planning and lifecycle notes live in:
+
+- `docs/architecture.md`
+- `docs/bundle-format.md`
+- `docs/roadmap.md`
+
 Current implementation:
 
 - legacy Word `.doc` conversion through `catdoc`
@@ -82,3 +88,7 @@ out/
 - `Didactopus` should consume `doclift` bundles rather than own legacy format handling.
 - `GroundRecall` can use the same bundles for provenance-aware import.
 - other archival or scholarly tooling can reuse the same normalization path without depending on Didactopus.
+
+## License
+
+`doclift` is licensed under the MIT license. See `LICENSE`.
diff --git a/docs/roadmap.md b/docs/roadmap.md
new file mode 100755
index 0000000..5feac79
--- /dev/null
+++ b/docs/roadmap.md
@@ -0,0 +1,35 @@
+# Roadmap
+
+## Near Term
+
+- stabilize the normalized bundle schema for downstream adapters
+- add a `doclift` bundle consumer path in Didactopus
+- extend test coverage around table parsing, title repair, and layout manifests
+- add fixture-based regression tests from representative legacy corpora
+- improve CLI output for batch conversion summaries
+
+## Format Expansion
+
+- add a higher-fidelity `.docx` path
+- add RTF support
+- add WordPerfect discovery and conversion plugins
+- add optional OCR-assisted pipelines for scanned legacy material
+
+## Structural Recovery
+
+- improve multi-line table caption handling
+- distinguish equations, taxonomy outlines, and nested lists more accurately
+- support figure-to-text linking when explicit references exist
+- separate external asset inventory from inferred figure linkage confidence
+
+## Runtime and Packaging
+
+- harden the Docker/Compose runtime for reproducible cross-platform conversion
+- add a small HTTP service wrapper for queued conversions
+- publish container image and package release workflow
+
+## Integration
+
+- define a Didactopus source adapter for `doclift` bundles
+- define a GroundRecall importer for `doclift` manifests and sidecars
+- document provenance mapping from `doclift` artifacts into downstream stores