Foundation Notebook Inception Pilot

This note turns the broader Notebook pipeline into one concrete first run.

It answers a narrower set of questions than evo-edu-notebook-pipeline.md:

  • what the first pilot region is,
  • which repos and commands are ready now,
  • which exact artifacts should be produced,
  • and what still blocks calling the Notebook "incepted".

Current status

The stack is already past pure planning.

Implemented now:

  • CiteGeist can export Notebook-ready topic bibliography bundles with export-notebook-topic.
  • GroundRecall can export groundrecall_query_bundle.json for one concept.
  • Didactopus can build notebook_page.json directly from a GroundRecall concept or bundle.
  • Didactopus pack emission can already carry Notebook-facing artifacts.

Not yet done:

  • one named pilot source workspace
  • one reviewed pilot concept region carried end to end on real local sources
  • one first published Notebook page candidate built from those real sources

So the missing step is no longer "invent Notebook machinery". The missing step is "run one reproducible pilot from provisioned local sources".

Chosen first pilot

Use:

  • natural selection and adaptation

Reasons:

  • it is already represented in the Notebook page tests and example graph structure:
    • natural-selection
    • variation
    • adaptation
    • common descent
  • it is narrow enough for a first pass
  • it is central enough that the Notebook navigation model is meaningful
  • it can draw on both textbook and web-corpus sources without needing an enormous bibliography first

This is a better first region than a broad "history of evolutionary thought" pilot because it exercises concept navigation without requiring a large historical scope up front.

Pilot workspace

Create one stable workspace outside the library root, for example:

/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/

Recommended layout:

natural-selection/
  README.md
  manifests/
    source-manifest.yaml
  sources/
    textbooks/
    web/
    bibliographies/
  normalized/
    doclift/
  citegeist/
  groundrecall/
  didactopus/
  publish/

This keeps the library as upstream source storage while making the Notebook run reproducible in one project-local tree.
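The layout above can be created in one repeatable step. The following is a minimal sketch, not part of any of the repos named in this note; the function name create_pilot_workspace and the choice to create README.md and source-manifest.yaml as empty placeholders are assumptions made for illustration.

```python
from pathlib import Path

# Subdirectories taken from the recommended layout in this note.
PILOT_SUBDIRS = [
    "manifests",
    "sources/textbooks",
    "sources/web",
    "sources/bibliographies",
    "normalized/doclift",
    "citegeist",
    "groundrecall",
    "didactopus",
    "publish",
]


def create_pilot_workspace(root: Path) -> list[Path]:
    """Create the pilot directory tree under root; safe to run repeatedly."""
    created = []
    for rel in PILOT_SUBDIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Assumption: README.md and the manifest start as empty placeholder files.
    for rel in ("README.md", "manifests/source-manifest.yaml"):
        (root / rel).touch(exist_ok=True)
    return created
```

Calling create_pilot_workspace(Path("/mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection")) would produce the tree shown above. Running it a second time leaves existing content untouched.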

Minimum pilot sources

Start with a deliberately small set.

Textbook side

Choose one or two textbook sections on natural selection and adaptation from the local library root:

  • /mnt/CIFS/pengolodh/Docs/Library

The exact textbooks can be finalized during provisioning, but the likely first choices from the existing plan are:

  • Futuyma, Evolutionary Biology
  • Pianka, Evolutionary Ecology

Web corpus side

Provision one small local snapshot from an evolution-focused corpus such as:

  • TalkOrigins Archive
  • Panda's Thumb

For inception, prefer a small curated subset over a full corpus mirror.

Bibliography seed side

Use:

  • one local .bib seed if available
  • bibliography material extracted from the chosen textbook sections
  • any relevant TalkOrigins bibliography fragments

Inception steps

Step 0. Provision the workspace

Deliverables:

  • sources/textbooks/
  • sources/web/
  • sources/bibliographies/
  • manifests/source-manifest.yaml

Completion check:

  • every pilot input is copied or symlinked into the workspace
  • the manifest names source type, origin, and local path
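The Step 0 completion check can be made mechanical. The sketch below assumes the manifest has already been parsed into a list of dictionaries, for example with a YAML loader. The field names source_type, origin, and local_path are hypothetical, since this note does not fix the manifest schema, and local_path is assumed to be relative to the workspace root.

```python
from pathlib import Path

# HYPOTHETICAL field names: the note only says the manifest names
# source type, origin, and local path for each pilot input.
REQUIRED_FIELDS = ("source_type", "origin", "local_path")


def manifest_problems(entries: list[dict], workspace: Path) -> list[str]:
    """Return human-readable problems; an empty list means the check passes."""
    problems = []
    for index, entry in enumerate(entries):
        missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
            continue
        # exists() follows symlinks, so copied and symlinked inputs both pass,
        # while a dangling symlink is reported as not found.
        if not (workspace / entry["local_path"]).exists():
            problems.append(
                f"entry {index}: {entry['local_path']} not found in workspace"
            )
    return problems
```

An empty result corresponds to both completion conditions holding: every entry names the three fields, and every named input is present in the workspace.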

Step 1. Normalize textbook/web material

Use doclift for textbook-like source material where it helps.

Expected output root:

normalized/doclift/

Representative command pattern:

cd /home/netuser/bin/doclift
PYTHONPATH=src .venv/bin/python -m doclift.cli convert-dir \
  /path/to/pilot-source-dir \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/normalized/doclift

Completion check:

  • a deterministic normalized bundle exists
  • markdown and sidecars are present where applicable
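One way to test the "deterministic" condition is to run the normalization twice into separate output directories and compare a digest of each tree. The helper below is an illustrative sketch that uses only the Python standard library. It is not a doclift feature, and the name tree_digest is hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return a SHA-256 digest over relative paths and file contents."""
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so traversal order cannot matter.
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    for rel, path in files:
        data = path.read_bytes()  # acceptable for a small pilot bundle
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        # Length prefix keeps path and content boundaries unambiguous.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

If two runs over the same inputs yield the same digest, the bundle is byte-for-byte reproducible. A mismatch usually points at embedded timestamps or unordered output, which can then be inspected file by file.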

Step 2. Build the bibliography substrate

Use CiteGeist for the Notebook bibliography layer.

Representative command pattern:

cd /home/netuser/bin/CiteGeist
PYTHONPATH=src .venv/bin/python -m citegeist --db \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/citegeist/library.sqlite3 \
  ingest /path/to/pilot.bib

Then export the Notebook topic bibliography bundle once the pilot topic exists:

cd /home/netuser/bin/CiteGeist
PYTHONPATH=src .venv/bin/python -m citegeist --db \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/citegeist/library.sqlite3 \
  export-notebook-topic natural-selection --output-dir \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/citegeist/notebook-bundle

Completion check:

  • notebook_topic_bundle.json
  • notebook_topic_bibliography.bib

Step 3. Import and review canonical concepts in GroundRecall

The first real review target should be the concept neighborhood around natural-selection.

Expected output root:

groundrecall/store/

Completion check:

  • reviewed concept for natural-selection
  • at least a small connected concept neighborhood
  • supporting observations and source artifacts retained

Step 4. Export the Notebook concept bundle

Use GroundRecall export:

PYTHONPATH=/home/netuser/bin/GroundRecall/src python -m groundrecall.export \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/groundrecall/store \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/groundrecall/export \
  --pack-ready-concept natural-selection

Completion check:

  • groundrecall_query_bundle.json

Step 5. Build the Notebook page artifact

Use the direct Didactopus wrapper:

didactopus notebook-page-groundrecall \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/groundrecall/store \
  natural-selection \
  /mnt/CIFS/pengolodh/Docs/Projects/evo-edu-notebook-pilot/natural-selection/didactopus/notebook-page

Completion check:

  • groundrecall_query_bundle.json
  • notebook_page.json

The page artifact should already include:

  • concept summary
  • graph navigation buckets
  • supporting sources
  • supporting excerpts
  • review context
  • illustration opportunities
  • suggested next actions
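A quick structural check can confirm that a generated page carries all seven sections. The sketch below assumes notebook_page.json is a JSON object with one top-level key per section. The key names are hypothetical placeholders, not the actual Didactopus schema, and should be replaced with the real keys once the first page has been produced.

```python
import json
from pathlib import Path

# HYPOTHETICAL top-level keys mapped to the section names listed in this note.
# The real notebook_page.json schema may use different names or nesting.
EXPECTED_SECTIONS = {
    "concept_summary": "concept summary",
    "graph_navigation": "graph navigation buckets",
    "supporting_sources": "supporting sources",
    "supporting_excerpts": "supporting excerpts",
    "review_context": "review context",
    "illustration_opportunities": "illustration opportunities",
    "suggested_next_actions": "suggested next actions",
}


def missing_page_sections(page_path: Path) -> list[str]:
    """Return the labels of expected sections absent from the page file."""
    page = json.loads(page_path.read_text(encoding="utf-8"))
    return [label for key, label in EXPECTED_SECTIONS.items() if key not in page]
```

An empty list means every expected section is present. The check says nothing about whether a section's content is adequate, which remains a review question.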

Step 6. Decide whether inception is complete

For this first pilot, Foundation Notebook inception should mean:

  1. one stable pilot workspace exists
  2. one real pilot concept region is provisioned locally
  3. one reviewed GroundRecall concept neighborhood exists
  4. one groundrecall_query_bundle.json exists for that concept
  5. one notebook_page.json exists from real reviewed sources
  6. one Notebook bibliography bundle exists for the same region

If all six are true, Notebook inception has happened even if public publishing and richer UI are still pending.

Expected artifact inventory

At minimum, the first successful inception run should leave:

natural-selection/
  manifests/source-manifest.yaml
  normalized/doclift/...
  citegeist/library.sqlite3
  citegeist/notebook-bundle/notebook_topic_bundle.json
  citegeist/notebook-bundle/notebook_topic_bibliography.bib
  groundrecall/store/...
  groundrecall/export/groundrecall_query_bundle.json
  didactopus/notebook-page/notebook_page.json
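The inventory above can be checked with a short script. This is a minimal sketch: the relative paths come from the inventory, while the function name missing_artifacts is hypothetical. For the two entries ending in "...", it assumes that a non-empty directory is enough.

```python
from pathlib import Path

# Files named explicitly in the expected artifact inventory.
REQUIRED_FILES = [
    "manifests/source-manifest.yaml",
    "citegeist/library.sqlite3",
    "citegeist/notebook-bundle/notebook_topic_bundle.json",
    "citegeist/notebook-bundle/notebook_topic_bibliography.bib",
    "groundrecall/export/groundrecall_query_bundle.json",
    "didactopus/notebook-page/notebook_page.json",
]

# Entries listed with "..." in the inventory.
# Assumption: any content inside the directory counts as present.
REQUIRED_NONEMPTY_DIRS = ["normalized/doclift", "groundrecall/store"]


def missing_artifacts(workspace: Path) -> list[str]:
    """Return inventory entries that are absent from the workspace."""
    missing = [rel for rel in REQUIRED_FILES if not (workspace / rel).is_file()]
    for rel in REQUIRED_NONEMPTY_DIRS:
        directory = workspace / rel
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(rel + "/")
    return missing
```

This covers only file presence. Whether the GroundRecall concept neighborhood has actually been reviewed, and whether the page was built from real sources, still has to be confirmed by a person.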

Next actions

  1. Create the pilot workspace directory and source-manifest.yaml.
  2. Pick the exact textbook sections and one small web snapshot.
  3. Run one small doclift normalization pass.
  4. Seed one CiteGeist pilot database.
  5. Build one real GroundRecall concept neighborhood for natural-selection.
  6. Export the first real notebook_page.json.

Bottom line

The Foundation Notebook is now blocked more by pilot execution than by missing infrastructure. The first real threshold is not "build more Notebook code". It is "produce the first real Notebook page artifact from provisioned, reviewed, local sources".