# evo-edu Notebook Pipeline

This note turns the current `Notebook` idea into a concrete cross-repo
workflow for `doclift`, `GroundRecall`, `Didactopus`, and `CiteGeist`.

The target is the conceptual resource at:

- <https://evo-edu.org/evo/notebook/>

The important shift is that the Notebook should not be treated as "just another
wiki". The strongest differentiator available in the current stack is
graph-first navigation over reviewed concepts, claims, citations, and learner
next-step suggestions.

## Why this fits the current stack

The stack already divides responsibility in a useful way:

- `doclift`: normalize messy source material into deterministic bundles
- `GroundRecall`: canonical reviewed claims, concept graph, provenance, and
  query/export surfaces
- `Didactopus`: learner-facing packs, sequencing, workbench flows, and concept
  navigation
- `CiteGeist`: bibliography extraction, enrichment, review, and expansion

The Notebook use case needs all four tools, because it calls for:

- explanation text
- accessible concept sequencing
- explicit source grounding
- bibliography compilation and enrichment
- illustration planning
- visible graph structure for "what to learn next"

## Source classes

The Notebook will likely need at least four source classes.

### 1. Web corpora

Examples:

- TalkOrigins Archive FAQs and articles
- TalkDesign posts
- Panda's Thumb posts

Operational note:

- these corpora should be provisioned locally before ingestion
- do not rely on live scraping as the primary production path
- keep source snapshots versioned or at least manifest-tracked

### 2. Scanned textbooks and monographs

Examples already named:

- Futuyma, `Evolutionary Biology`
- Pianka, `Evolutionary Ecology`
- Bowler, `Evolution: The History of an Idea`

The current local library root is:

- `/mnt/CIFS/pengolodh/Docs/Library`

This should be treated as the upstream source corpus, not as the final working
directory for Notebook artifacts.

### 3. Bibliographic seed corpora

Examples:

- TalkOrigins bibliographies
- textbook reference sections
- existing `.bib` files in the library

These are where `CiteGeist` becomes especially important.

### 4. Planned illustration sources

These are not just assets. Each one should be a reviewable planning object that
records:

- target concept
- illustration intent
- source basis
- rights/compliance note
- status: planned / needed / drafted / reviewed / published

## Recommended working position for the Notebook

The Notebook should be positioned as:

- a graph-guided conceptual atlas
- a source-grounded explanation layer
- a learner-facing bridge between articles, textbooks, and bibliographies

It should not try to compete by being the flattest or largest encyclopedia.

The distinguishing feature should be that the learner can see:

- antecedent concepts
- nearby or "closer" concepts
- derivative or downstream concepts
- representative supporting sources
- bibliography growth points
- illustration opportunities

That is much more consistent with the current stack than a generic article CMS.

## Proposed pipeline

### Phase 0. Provision the corpora locally

Create a local Notebook source workspace containing:

- provisioned web corpora snapshots
- selected textbook scan directories
- bibliography seeds
- source manifests

Expected result:

- stable local inputs for repeatable ingestion

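
The source manifests listed above can start very simply. The following is a minimal sketch, not part of any of the four tools: it walks one snapshot directory and records the relative path, size, and SHA-256 digest of every file. The function names, the manifest field names, and the JSON layout are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record path, size, and digest for every file under `root` (hypothetical layout)."""
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {"root": root.name, "files": entries}


def write_manifest(root: Path, out_path: Path) -> None:
    """Write the manifest with sorted keys so successive versions diff cleanly."""
    text = json.dumps(build_manifest(root), indent=2, sort_keys=True)
    out_path.write_text(text + "\n", encoding="utf-8")
```

Writing the manifest outside the snapshot directory keeps it from hashing itself on the next run. Comparing two manifests is then enough to tell whether a snapshot changed between ingestion runs.
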
### Phase 1. Normalize source material with `doclift`

Use `doclift` for:

- OCR-derived text normalization where practical
- sidecar generation
- `document.chunks.json` emission
- bundle manifests for scanned or converted materials

For web corpora, either:

- convert into bundle-like normalized document trees, or
- ingest through direct text/markdown adapters where that is simpler

Expected result:

- deterministic source bundles for longer-form documents

### Phase 2. Build bibliographic substrate with `CiteGeist`

Use `CiteGeist` to:

- scrape or ingest TalkOrigins bibliography materials
- expand weak references
- enrich textbook references
- cluster duplicates
- build review exports for uncertain entries
- maintain one or more Notebook `.bib` outputs

Expected result:

- a reviewed bibliography layer rather than ad hoc citation lists

### Phase 3. Import canonical knowledge into `GroundRecall`

Use `GroundRecall` to import:

- `doclift` bundles for textbooks and scans
- provisioned article/essay corpora
- optional Didactopus-native artifacts where useful

Then use its review flow to:

- standardize concepts
- preserve fragments and provenance
- compute graph diagnostics
- queue bridge/isolated/small-component concepts for review
- retain review rationale in promoted candidates

Expected result:

- canonical Notebook concept/claim substrate with provenance and graph signals

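
To make the bridge/isolated/small-component queue concrete, here is a small standalone sketch of how such diagnostics could be computed over an undirected concept graph. It does not use or describe `GroundRecall`'s actual implementation. Reading "bridge" as a cut vertex (a concept whose removal splits its component) is an assumption, as are the function names and the `small` threshold.

```python
from collections import defaultdict


def components(nodes, edges, skip=None):
    """Connected components of an undirected graph, optionally ignoring one node."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if skip in (a, b):
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start == skip or start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        result.append(group)
    return result


def review_queue(nodes, edges, small=2):
    """Group concepts into review buckets (bucket names are hypothetical)."""
    base = components(nodes, edges)
    isolated = sorted(next(iter(g)) for g in base if len(g) == 1)
    small_component = sorted(n for g in base if 1 < len(g) <= small for n in g)
    # Brute-force cut-vertex test: removing the concept raises the component count.
    bridge = sorted(
        n for n in nodes if len(components(nodes, edges, skip=n)) > len(base)
    )
    return {"isolated": isolated, "small_component": small_component, "bridge": bridge}
```

The brute-force bridge test reruns the component search once per concept, which is fine for a pilot-sized graph but would want a linear-time articulation-point algorithm on a large one.
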
### Phase 4. Export pack-ready concept bundles from `GroundRecall`

For important Notebook concepts, export:

- `groundrecall_query_bundle.json`

If you only need the page-ready artifact for a concept, `Didactopus` now also
has a direct wrapper that writes both the query bundle and `notebook_page.json`
into one output directory:

```bash
didactopus notebook-page-groundrecall \
  /path/to/groundrecall-store \
  natural-selection \
  /tmp/notebook-page-export
```

The query bundle becomes the handoff object for learner-facing or page-facing
pack flows.

Expected result:

- reviewed concept payloads that can feed Didactopus and page generation

### Phase 5. Build `Didactopus` packs and learner navigation

Use `Didactopus` to:

- create draft packs around concept neighborhoods or topical modules
- carry `groundrecall_query_bundle.json` as a declared supporting artifact
- expose learner-workbench context that includes review and graph signals
- sequence "what next" items from prerequisites and nearby graph structure

Expected result:

- learner-facing concept packs grounded in reviewed Notebook knowledge

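
Sequencing "what next" items from prerequisites can be illustrated with the standard library alone. The sketch below is not `Didactopus` code: the prerequisite map, the function names, and the idea of a `mastered` set are assumptions chosen to show the technique, which is a topological ordering plus a readiness filter.

```python
from graphlib import TopologicalSorter


def study_order(prerequisites):
    """One valid learning order; raises graphlib.CycleError on circular prerequisites."""
    return list(TopologicalSorter(prerequisites).static_order())


def next_steps(prerequisites, mastered):
    """Concepts not yet mastered whose prerequisites are all mastered."""
    mastered = set(mastered)
    concepts = set(prerequisites) | {p for deps in prerequisites.values() for p in deps}
    return sorted(
        c for c in concepts
        if c not in mastered and set(prerequisites.get(c, ())) <= mastered
    )


# Hypothetical prerequisite map: concept -> concepts that should come first.
PREREQUISITES = {
    "natural-selection": {"variation", "heredity"},
    "adaptation": {"natural-selection"},
    "speciation": {"natural-selection", "reproductive-isolation"},
}

if __name__ == "__main__":
    print(study_order(PREREQUISITES))
    print(next_steps(PREREQUISITES, mastered={"variation", "heredity"}))
```

A real pack flow would presumably also weigh nearby graph structure and review signals; this sketch covers only the prerequisite part.
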
### Phase 6. Publish the Notebook

Publication outputs should probably include:

- accessible concept pages
- graph-first navigation controls
- bibliography sections or per-page reading lists
- illustration status or image slots
- links into interactive apps and learner-workbench flows

Expected result:

- a Notebook that is not just readable, but navigable through conceptual
  structure

## Knowledge-graph-first navigation

This is the main product differentiator.

For each concept page, the learner should be able to see a small graph-guided
navigation panel with categories such as:

- `Antecedent concepts`
  Concepts that must usually be understood first

- `Closer concepts`
  Nearby concepts in the same explanatory neighborhood

- `Derivative concepts`
  Concepts that extend or depend on the current concept

- `Supporting sources`
  Canonical bibliography or source entries that materially support the concept

- `Illustration opportunities`
  Candidate figures or planned visual explanations

The labels can be refined later, but the structure should come from typed graph
relations rather than from arbitrary page links alone.

## Suggested relation types for Notebook navigation

The current stack does not need all of these on day one, but they are useful as
target categories:

- `prerequisite`
- `supports`
- `contrasts_with`
- `historical_predecessor`
- `historical_successor`
- `applies_to`
- `example_of`
- `misconception_about`
- `illustrated_by`

Some can live in `GroundRecall` first and only later appear in learner-facing
Didactopus packs.

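
One way to derive the navigation panel from typed relations is a lookup from (relation, direction) to panel label. The sketch below is illustrative only: the triple layout `(subject, relation, object)`, the particular mapping, and all ids in the example are assumptions, not an existing `GroundRecall` or `Didactopus` schema.

```python
from collections import defaultdict

# "in": the page concept is the object of the triple; "out": it is the subject.
# This mapping is a guess at sensible defaults, not a reviewed design.
PANEL_RULES = {
    ("prerequisite", "in"): "Antecedent concepts",
    ("historical_predecessor", "in"): "Antecedent concepts",
    ("prerequisite", "out"): "Derivative concepts",
    ("historical_successor", "in"): "Derivative concepts",
    ("contrasts_with", "in"): "Closer concepts",
    ("contrasts_with", "out"): "Closer concepts",
    ("example_of", "in"): "Closer concepts",
    ("misconception_about", "in"): "Closer concepts",
    ("supports", "in"): "Supporting sources",
    ("illustrated_by", "out"): "Illustration opportunities",
}


def navigation_panel(concept, triples):
    """Group a concept's typed neighbors under panel labels, keeping first-seen order."""
    panel = defaultdict(list)
    for subject, relation, obj in triples:
        if obj == concept:
            label, other = PANEL_RULES.get((relation, "in")), subject
        elif subject == concept:
            label, other = PANEL_RULES.get((relation, "out")), obj
        else:
            continue
        if label and other not in panel[label]:
            panel[label].append(other)
    return dict(panel)


# Placeholder triples; every id here is made up for the example.
TRIPLES = [
    ("variation", "prerequisite", "natural-selection"),
    ("heredity", "prerequisite", "natural-selection"),
    ("natural-selection", "prerequisite", "adaptation"),
    ("natural-selection", "contrasts_with", "genetic-drift"),
    ("source-example-a", "supports", "natural-selection"),
    ("natural-selection", "illustrated_by", "figure-example-1"),
]

if __name__ == "__main__":
    for label, items in navigation_panel("natural-selection", TRIPLES).items():
        print(f"{label}: {', '.join(items)}")
```

Keeping the mapping in one table means relation types that are not yet surfaced to learners simply have no rule and are skipped.
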
## Illustration planning

Illustrations should be tracked as structured planning artifacts, not buried in
page notes.

At minimum, each planned illustration should record:

- target concept id
- working caption or purpose
- source grounding
- rights/compliance note
- priority
- status

This can begin as JSON or markdown sidecars before becoming a richer model.

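
As a starting point for the JSON sidecar option, the fields above map naturally onto a small dataclass. This is a sketch under assumptions: the class name, field names, and integer priority are hypothetical, and the status values are taken from the list in the "Planned illustration sources" section.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Status(str, Enum):
    PLANNED = "planned"
    NEEDED = "needed"
    DRAFTED = "drafted"
    REVIEWED = "reviewed"
    PUBLISHED = "published"


@dataclass
class IllustrationPlan:
    concept_id: str
    caption: str
    source_grounding: list[str]
    rights_note: str
    priority: int
    status: Status = Status.PLANNED

    def to_json(self) -> str:
        """Serialize to a sidecar string; the str-based enum is written as its value."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "IllustrationPlan":
        """Load a sidecar; an unknown status string raises ValueError."""
        data = json.loads(text)
        data["status"] = Status(data["status"])
        return cls(**data)
```

Validating the status on load means a typo in a hand-edited sidecar fails early instead of silently creating a new workflow state.
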
## Bibliography strategy

The Notebook may want both:

- per-concept reading lists
- larger topical bibliographies

Recommended split:

- `CiteGeist` maintains the main bibliography workbench and review discipline
- `GroundRecall` stores links between concepts/claims and source artifacts
- published Notebook pages surface only the citations relevant to the current
  concept and nearby graph region

That avoids turning the Notebook itself into the bibliography editor.

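
Surfacing only the citations for the current concept and its nearby graph region amounts to a bounded breadth-first search. The following sketch assumes two plain dictionaries (concept to neighbors, concept to citation keys) and a hypothetical `max_hops` setting; it does not reflect how `GroundRecall` or `CiteGeist` actually store these links.

```python
from collections import deque


def citations_near(concept, neighbors, citations, max_hops=1):
    """Citation keys for `concept` and every concept within `max_hops` graph steps.

    Keys are returned nearest-concept first, without duplicates.
    """
    distance = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    keys = []
    for node in sorted(distance, key=lambda n: (distance[n], n)):
        for key in citations.get(node, ()):
            if key not in keys:
                keys.append(key)
    return keys
```

Ordering by graph distance puts the current concept's own sources at the top of a per-page reading list, with neighborhood sources after them.
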
## Concrete first pilot

A good first Notebook pilot would be one narrow concept region rather than the
whole corpus.

For example:

- historical development of evolutionary thought
- evidence for common descent
- natural selection and adaptation

Choose one region with:

- 1 to 3 textbooks
- a small local article/blog corpus
- one reviewed bibliography export
- one explicit graph-navigation experiment

## Recommended next implementation tasks

1. Provision one local Notebook corpus workspace outside the library root.
2. Choose one pilot concept region and one target concept.
3. Normalize one textbook source with `doclift`.
4. Provision one local TalkOrigins or Panda's Thumb snapshot.
5. Run `CiteGeist` on the pilot bibliography inputs.
6. Import the pilot sources into `GroundRecall`.
7. Export one `groundrecall_query_bundle.json`.
8. Feed that into a `Didactopus` pack flow.
9. Prototype one Notebook page that exposes graph-guided next-to-learn links.

## Bottom line

The Notebook is a strong fit for the current stack if it is treated as:

- concept-first
- graph-guided
- provenance-aware
- bibliography-backed
- learner-navigable

It is a weaker fit if treated as only a flat wiki rewrite of source material.