Add Foundation Notebook inception pilot plan

This commit is contained in:
welsberr 2026-05-07 21:05:33 -04:00
parent addc920ac3
commit 73d89b5f5b
2 changed files with 293 additions and 0 deletions

View File

@ -231,6 +231,7 @@ The fuller bridge workflow is documented in:
- `docs/groundrecall-bridge.md`
- `docs/evo-edu-notebook-pipeline.md`
- `docs/foundation-notebook-inception-pilot.md`
## Didactopus As Pedagogy Support

View File

@ -0,0 +1,292 @@
# Foundation Notebook Inception Pilot
This note turns the broader Notebook pipeline into one concrete first run.
It answers a narrower question than
[evo-edu-notebook-pipeline.md](./evo-edu-notebook-pipeline.md):
- what is the first pilot region,
- what repos and commands are already ready,
- what exact artifacts should be produced,
- and what still blocks calling the Notebook "incepted".
## Current status
The stack is already past pure planning.
Implemented now:
- `CiteGeist` can export Notebook-ready topic bibliography bundles with
`export-notebook-topic`.
- `GroundRecall` can export `groundrecall_query_bundle.json` for one concept.
- `Didactopus` can build `notebook_page.json` directly from a GroundRecall
concept or bundle.
- `Didactopus` pack emission can already carry Notebook-facing artifacts.
Not yet done:
- one named pilot source workspace
- one reviewed pilot concept region carried end to end on real local sources
- one first published Notebook page candidate built from those real sources
So the missing step is no longer "invent Notebook machinery". The missing step
is "run one reproducible pilot from provisioned local sources".
## Chosen first pilot
Use:
- `natural selection and adaptation`
Reasons:
- it is already represented in the Notebook page tests and example graph
structure:
- `natural-selection`
- `variation`
- `adaptation`
- `common descent`
- it is narrow enough for a first pass
- it is central enough that the Notebook navigation model is meaningful
- it can draw on both textbook and web-corpus sources without needing an
enormous bibliography first
This is a better first region than a broad "history of evolutionary thought"
pilot because it stresses concept navigation without forcing huge historical
scope immediately.
## Pilot workspace
Create one stable workspace outside the library root, for example:
```text
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/
```
Recommended layout:
```text
natural-selection/
README.md
manifests/
source-manifest.yaml
sources/
textbooks/
web/
bibliographies/
normalized/
doclift/
citegeist/
groundrecall/
didactopus/
publish/
```
This keeps the library as upstream source storage while making the Notebook run
reproducible in one project-local tree.
## Minimum pilot sources
Start with a deliberately small set.
### Textbook side
Choose 1 to 2 textbook sections on natural selection/adaptation from the local
library root:
- `/mnt/CIFS/pengolodh/Docs/Library`
The exact textbooks can be finalized during provisioning, but the likely first
choices from the existing plan are:
- Futuyma, `Evolutionary Biology`
- Pianka, `Evolutionary Ecology`
### Web corpus side
Provision one small local snapshot from an evolution-focused corpus such as:
- TalkOrigins Archive
- Panda's Thumb
For inception, prefer a small curated subset over a full corpus mirror.
### Bibliography seed side
Use:
- one local `.bib` seed if available
- bibliography material extracted from the chosen textbook sections
- any relevant TalkOrigins bibliography fragments
## Inception steps
### Step 0. Provision the workspace
Deliverables:
- `sources/textbooks/`
- `sources/web/`
- `sources/bibliographies/`
- `manifests/source-manifest.yaml`
Completion check:
- every pilot input is copied or symlinked into the workspace
- the manifest names source type, origin, and local path
### Step 1. Normalize textbook/web material
Use `doclift` for textbook-like source material where it helps.
Expected output root:
```text
normalized/doclift/
```
Representative command pattern:
```bash
cd /home/netuser/bin/doclift
PYTHONPATH=src .venv/bin/python -m doclift.cli convert-dir \
/path/to/pilot-source-dir \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/normalized/doclift
```
Completion check:
- a deterministic normalized bundle exists
- markdown and sidecars are present where applicable
### Step 2. Build the bibliography substrate
Use `CiteGeist` for the Notebook bibliography layer.
Representative command pattern:
```bash
cd /home/netuser/bin/CiteGeist
PYTHONPATH=src .venv/bin/python -m citegeist --db \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/citegeist/library.sqlite3 \
ingest /path/to/pilot.bib
```
Then export the Notebook topic bibliography bundle once the pilot topic exists:
```bash
cd /home/netuser/bin/CiteGeist
PYTHONPATH=src .venv/bin/python -m citegeist --db \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/citegeist/library.sqlite3 \
export-notebook-topic natural-selection --output-dir \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/citegeist/notebook-bundle
```
Completion check:
- `notebook_topic_bundle.json`
- `notebook_topic_bibliography.bib`
### Step 3. Import and review canonical concepts in GroundRecall
The first real review target should be the concept neighborhood around
`natural-selection`.
Expected output root:
```text
groundrecall/store/
```
Completion check:
- reviewed concept for `natural-selection`
- at least a small connected concept neighborhood
- supporting observations and source artifacts retained
### Step 4. Export the Notebook concept bundle
Use `GroundRecall` export:
```bash
PYTHONPATH=/home/netuser/bin/GroundRecall/src python -m groundrecall.export \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/groundrecall/store \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/groundrecall/export \
--pack-ready-concept natural-selection
```
Completion check:
- `groundrecall_query_bundle.json`
### Step 5. Build the Notebook page artifact
Use the direct `Didactopus` wrapper:
```bash
didactopus notebook-page-groundrecall \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/groundrecall/store \
natural-selection \
/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/didactopus/notebook-page
```
Completion check:
- `groundrecall_query_bundle.json`
- `notebook_page.json`
The page artifact should already include:
- concept summary
- graph navigation buckets
- supporting sources
- supporting excerpts
- review context
- illustration opportunities
- suggested next actions
### Step 6. Decide whether inception is complete
For this first pilot, Foundation Notebook inception should mean:
1. one stable pilot workspace exists
2. one real pilot concept region is provisioned locally
3. one reviewed `GroundRecall` concept neighborhood exists
4. one `groundrecall_query_bundle.json` exists for that concept
5. one `notebook_page.json` exists from real reviewed sources
6. one Notebook bibliography bundle exists for the same region
If all six are true, Notebook inception has happened even if public publishing
and richer UI are still pending.
## Expected artifact inventory
At minimum, the first successful inception run should leave:
```text
natural-selection/
manifests/source-manifest.yaml
normalized/doclift/...
citegeist/library.sqlite3
citegeist/notebook-bundle/notebook_topic_bundle.json
citegeist/notebook-bundle/notebook_topic_bibliography.bib
groundrecall/store/...
groundrecall/export/groundrecall_query_bundle.json
didactopus/notebook-page/notebook_page.json
```
## Recommended immediate next actions
1. Create the pilot workspace directory and `source-manifest.yaml`.
2. Pick the exact textbook sections and one small web snapshot.
3. Run one small `doclift` normalization pass.
4. Seed one `CiteGeist` pilot database.
5. Build one real `GroundRecall` concept neighborhood for `natural-selection`.
6. Export the first real `notebook_page.json`.
## Bottom line
The Foundation Notebook is now blocked more by pilot execution than by missing
infrastructure. The first real threshold is not "build more Notebook code". It
is "produce the first real Notebook page artifact from provisioned, reviewed,
local sources".