8.9 KiB

Raw Blame History

evo-edu Notebook Pipeline

This note turns the current Notebook idea into a concrete cross-repo workflow for doclift, GroundRecall, Didactopus, and CiteGeist.

The target is the conceptual resource at:

https://evo-edu.org/evo/notebook/

The important shift is that the Notebook should not be treated as "just another wiki". The strongest differentiator available in the current stack is graph-first navigation over reviewed concepts, claims, citations, and learner next-step suggestions.

Why this fits the current stack

The stack already divides responsibility in a useful way:

doclift: normalize messy source material into deterministic bundles
GroundRecall: canonical reviewed claims, concept graph, provenance, and query/export surfaces
Didactopus: learner-facing packs, sequencing, workbench flows, and concept navigation
CiteGeist: bibliography extraction, enrichment, review, and expansion

The Notebook use case needs all four:

explanation text
accessible concept sequencing
explicit source grounding
bibliography compilation and enrichment
illustration planning
visible graph structure for "what to learn next"

Source classes

The Notebook will likely need at least four source classes.

1. Web corpora

Examples:

TalkOrigins Archive FAQs and articles
TalkDesign posts
Panda's Thumb posts

Operational note:

these corpora should be provisioned locally before ingestion
do not rely on live scraping as the primary production path
keep source snapshots versioned or at least manifest-tracked

2. Scanned textbooks and monographs

Examples already named:

Futuyma, Evolutionary Biology
Pianka, Evolutionary Ecology
Bowler, Evolution: The History of an Idea

The current local library root is:

/mnt/CIFS/pengolodh/Docs/Library

This should be treated as the upstream source corpus, not as the final working directory for Notebook artifacts.

3. Bibliographic seed corpora

Examples:

TalkOrigins bibliographies
textbook reference sections
existing .bib files in the library

These are where CiteGeist becomes especially important.

4. Planned illustration sources

These are not just assets. They should be reviewable planning objects:

target concept
illustration intent
source basis
rights/compliance note
status: planned / needed / drafted / reviewed / published

Recommended working position for the Notebook

The Notebook should be positioned as:

a graph-guided conceptual atlas
a source-grounded explanation layer
a learner-facing bridge between articles, textbooks, and bibliographies

It should not try to compete by being the flattest or largest encyclopedia.

The distinguishing feature should be that the learner can see:

antecedent concepts
nearby or "closer" concepts
derivative or downstream concepts
representative supporting sources
bibliography growth points
illustration opportunities

That is much more consistent with the current stack than a generic article CMS.

Proposed pipeline

Phase 0. Provision the corpora locally

Create a local Notebook source workspace containing:

provisioned web corpora snapshots
selected textbook scan directories
bibliography seeds
source manifests

Expected result:

stable local inputs for repeatable ingestion

Phase 1. Normalize source material with `doclift`

Use doclift for:

OCR-derived text normalization where practical
sidecar generation
document.chunks.json emission
bundle manifests for scanned or converted materials

For web corpora, either:

convert into bundle-like normalized document trees, or
ingest through direct text/markdown adapters where that is simpler

Expected result:

deterministic source bundles for longer-form documents

Phase 2. Build bibliographic substrate with `CiteGeist`

Use CiteGeist to:

scrape or ingest TalkOrigins bibliography materials
expand weak references
enrich textbook references
cluster duplicates
build review exports for uncertain entries
maintain one or more Notebook .bib outputs

Expected result:

a reviewed bibliography layer rather than ad hoc citation lists

Phase 3. Import canonical knowledge into `GroundRecall`

Use GroundRecall to import:

doclift bundles for textbooks and scans
provisioned article/essay corpora
optional Didactopus-native artifacts where useful

Then use its review flow to:

standardize concepts
preserve fragments and provenance
compute graph diagnostics
queue bridge/isolated/small-component concepts for review
retain review rationale in promoted candidates

Expected result:

canonical Notebook concept/claim substrate with provenance and graph signals

Phase 4. Export pack-ready concept bundles from `GroundRecall`

For important notebook concepts, export:

groundrecall_query_bundle.json

This becomes the handoff object for learner-facing or page-facing pack flows.

Expected result:

reviewed concept payloads that can feed Didactopus and page generation

Phase 5. Build `Didactopus` packs and learner navigation

Use Didactopus to:

create draft packs around concept neighborhoods or topical modules
carry groundrecall_query_bundle.json as a declared supporting artifact
expose learner-workbench context that includes review and graph signals
sequence "what next" items from prerequisites and nearby graph structure

Expected result:

learner-facing concept packs grounded in reviewed Notebook knowledge

Phase 6. Publish the Notebook

Publication outputs should probably include:

accessible concept pages
graph-first navigation controls
bibliography sections or per-page reading lists
illustration status or image slots
links into interactive apps and learner-workbench flows

Expected result:

a Notebook that is not just readable, but navigable through conceptual structure

This is the main product differentiator.

For each concept page, the learner should be able to see a small graph-guided navigation panel with categories such as:

Antecedent concepts Concepts that must usually be understood first
Closer concepts Nearby concepts in the same explanatory neighborhood
Derivative concepts Concepts that extend or depend on the current concept
Supporting sources Canonical bibliography or source entries that materially support the concept
Illustration opportunities Candidate figures or planned visual explanations

The labels can be refined later, but the structure should come from typed graph relations rather than from arbitrary page links alone.

The current stack does not need all of these on day one, but they are useful as target categories:

prerequisite
supports
contrasts_with
historical_predecessor
historical_successor
applies_to
example_of
misconception_about
illustrated_by

Some can live in GroundRecall first and only later appear in learner-facing Didactopus packs.

Illustration planning

Illustrations should be tracked as structured planning artifacts, not buried in page notes.

At minimum, each planned illustration should record:

target concept id
working caption or purpose
source grounding
rights/compliance note
priority
status

This can begin as JSON or markdown sidecars before becoming a richer model.

Bibliography strategy

The Notebook may want both:

per-concept reading lists
larger topical bibliographies

Recommended split:

CiteGeist maintains the main bibliography workbench and review discipline
GroundRecall stores links between concepts/claims and source artifacts
published Notebook pages surface only the citations relevant to the current concept and nearby graph region

That avoids turning the Notebook itself into the bibliography editor.

Concrete first pilot

A good first Notebook pilot would be one narrow concept region rather than the whole corpus.

For example:

historical development of evolutionary thought
evidence for common descent
natural selection and adaptation

Choose one region with:

1 to 3 textbooks
a small local article/blog corpus
one reviewed bibliography export
one explicit graph-navigation experiment

Recommended next implementation tasks

Provision one local Notebook corpus workspace outside the library root.
Choose one pilot concept region and one target concept.
Normalize one textbook source with doclift.
Provision one local TalkOrigins or Panda's Thumb snapshot.
Run CiteGeist on the pilot bibliography inputs.
Import the pilot sources into GroundRecall.
Export one groundrecall_query_bundle.json.
Feed that into a Didactopus pack flow.
Prototype one Notebook page that exposes graph-guided next-to-learn links.

Bottom line

The Notebook is a strong fit for the current stack if it is treated as:

concept-first
graph-guided
provenance-aware
bibliography-backed
learner-navigable

It is a weaker fit if treated as only a flat wiki rewrite of source material.

8.9 KiB

Raw Blame History

evo-edu Notebook Pipeline

Why this fits the current stack

Source classes

1. Web corpora

2. Scanned textbooks and monographs

3. Bibliographic seed corpora

4. Planned illustration sources

Recommended working position for the Notebook

Proposed pipeline

Phase 0. Provision the corpora locally

Phase 1. Normalize source material with `doclift`

Phase 2. Build bibliographic substrate with `CiteGeist`

Phase 3. Import canonical knowledge into `GroundRecall`

Phase 4. Export pack-ready concept bundles from `GroundRecall`

Phase 5. Build `Didactopus` packs and learner navigation

Phase 6. Publish the Notebook

Knowledge-graph-first navigation

Suggested relation types for Notebook navigation

Illustration planning

Bibliography strategy

Concrete first pilot

Recommended next implementation tasks

Bottom line

8.9 KiB Raw Blame History

evo-edu Notebook Pipeline

Why this fits the current stack

Source classes

1. Web corpora

2. Scanned textbooks and monographs

3. Bibliographic seed corpora

4. Planned illustration sources

Recommended working position for the Notebook

Proposed pipeline

Phase 0. Provision the corpora locally

Phase 1. Normalize source material with doclift

Phase 2. Build bibliographic substrate with CiteGeist

Phase 3. Import canonical knowledge into GroundRecall

Phase 4. Export pack-ready concept bundles from GroundRecall

Phase 5. Build Didactopus packs and learner navigation

Phase 6. Publish the Notebook

Knowledge-graph-first navigation

Suggested relation types for Notebook navigation

Illustration planning

Bibliography strategy

Concrete first pilot

Recommended next implementation tasks

Bottom line

8.9 KiB

Raw Blame History

Phase 1. Normalize source material with `doclift`

Phase 2. Build bibliographic substrate with `CiteGeist`

Phase 3. Import canonical knowledge into `GroundRecall`

Phase 4. Export pack-ready concept bundles from `GroundRecall`

Phase 5. Build `Didactopus` packs and learner navigation