Add evo-edu Notebook pipeline note
This commit is contained in:
parent
f296326d4c
commit
9d972d4144
|
|
@ -0,0 +1,332 @@
|
|||
# evo-edu Notebook Pipeline
|
||||
|
||||
This note turns the current `Notebook` idea into a concrete cross-repo
|
||||
workflow for `doclift`, `GroundRecall`, `Didactopus`, and `CiteGeist`.
|
||||
|
||||
The target is the conceptual resource at:
|
||||
|
||||
- <https://evo-edu.org/evo/notebook/>
|
||||
|
||||
The important shift is that the Notebook should not be treated as "just another
|
||||
wiki". The strongest differentiator available in the current stack is
|
||||
graph-first navigation over reviewed concepts, claims, citations, and learner
|
||||
next-step suggestions.
|
||||
|
||||
## Why this fits the current stack
|
||||
|
||||
The stack already divides responsibility in a useful way:
|
||||
|
||||
- `doclift`: normalize messy source material into deterministic bundles
|
||||
- `GroundRecall`: canonical reviewed claims, concept graph, provenance, and
|
||||
query/export surfaces
|
||||
- `Didactopus`: learner-facing packs, sequencing, workbench flows, and concept
|
||||
navigation
|
||||
- `CiteGeist`: bibliography extraction, enrichment, review, and expansion
|
||||
|
||||
The Notebook use case needs all four:
|
||||
|
||||
- explanation text
|
||||
- accessible concept sequencing
|
||||
- explicit source grounding
|
||||
- bibliography compilation and enrichment
|
||||
- illustration planning
|
||||
- visible graph structure for "what to learn next"
|
||||
|
||||
## Source classes
|
||||
|
||||
The Notebook will likely need at least four source classes.
|
||||
|
||||
### 1. Web corpora
|
||||
|
||||
Examples:
|
||||
|
||||
- TalkOrigins Archive FAQs and articles
|
||||
- TalkDesign posts
|
||||
- Panda's Thumb posts
|
||||
|
||||
Operational note:
|
||||
|
||||
- these corpora should be provisioned locally before ingestion
|
||||
- do not rely on live scraping as the primary production path
|
||||
- keep source snapshots versioned or at least manifest-tracked
|
||||
|
||||
### 2. Scanned textbooks and monographs
|
||||
|
||||
Examples already named:
|
||||
|
||||
- Futuyma, `Evolutionary Biology`
|
||||
- Pianka, `Evolutionary Ecology`
|
||||
- Bowler, `Evolution: The History of an Idea`
|
||||
|
||||
The current local library root is:
|
||||
|
||||
- `/mnt/CIFS/pengolodh/Docs/Library`
|
||||
|
||||
This should be treated as the upstream source corpus, not as the final working
|
||||
directory for Notebook artifacts.
|
||||
|
||||
### 3. Bibliographic seed corpora
|
||||
|
||||
Examples:
|
||||
|
||||
- TalkOrigins bibliographies
|
||||
- textbook reference sections
|
||||
- existing `.bib` files in the library
|
||||
|
||||
These are where `CiteGeist` becomes especially important.
|
||||
|
||||
### 4. Planned illustration sources
|
||||
|
||||
These are not just assets. They should be reviewable planning objects:
|
||||
|
||||
- target concept
|
||||
- illustration intent
|
||||
- source basis
|
||||
- rights/compliance note
|
||||
- status: planned / needed / drafted / reviewed / published
|
||||
|
||||
## Recommended working position for the Notebook
|
||||
|
||||
The Notebook should be positioned as:
|
||||
|
||||
- a graph-guided conceptual atlas
|
||||
- a source-grounded explanation layer
|
||||
- a learner-facing bridge between articles, textbooks, and bibliographies
|
||||
|
||||
It should not try to compete by being the flattest or largest encyclopedia.
|
||||
|
||||
The distinguishing feature should be that the learner can see:
|
||||
|
||||
- antecedent concepts
|
||||
- nearby or "closer" concepts
|
||||
- derivative or downstream concepts
|
||||
- representative supporting sources
|
||||
- bibliography growth points
|
||||
- illustration opportunities
|
||||
|
||||
That is much more consistent with the current stack than a generic article CMS.
|
||||
|
||||
## Proposed pipeline
|
||||
|
||||
### Phase 0. Provision the corpora locally
|
||||
|
||||
Create a local Notebook source workspace containing:
|
||||
|
||||
- provisioned web corpora snapshots
|
||||
- selected textbook scan directories
|
||||
- bibliography seeds
|
||||
- source manifests
|
||||
|
||||
Expected result:
|
||||
|
||||
- stable local inputs for repeatable ingestion
|
||||
|
||||
### Phase 1. Normalize source material with `doclift`
|
||||
|
||||
Use `doclift` for:
|
||||
|
||||
- OCR-derived text normalization where practical
|
||||
- sidecar generation
|
||||
- `document.chunks.json` emission
|
||||
- bundle manifests for scanned or converted materials
|
||||
|
||||
For web corpora, either:
|
||||
|
||||
- convert into bundle-like normalized document trees, or
|
||||
- ingest through direct text/markdown adapters where that is simpler
|
||||
|
||||
Expected result:
|
||||
|
||||
- deterministic source bundles for longer-form documents
|
||||
|
||||
### Phase 2. Build bibliographic substrate with `CiteGeist`
|
||||
|
||||
Use `CiteGeist` to:
|
||||
|
||||
- scrape or ingest TalkOrigins bibliography materials
|
||||
- expand weak references
|
||||
- enrich textbook references
|
||||
- cluster duplicates
|
||||
- build review exports for uncertain entries
|
||||
- maintain one or more Notebook `.bib` outputs
|
||||
|
||||
Expected result:
|
||||
|
||||
- a reviewed bibliography layer rather than ad hoc citation lists
|
||||
|
||||
### Phase 3. Import canonical knowledge into `GroundRecall`
|
||||
|
||||
Use `GroundRecall` to import:
|
||||
|
||||
- `doclift` bundles for textbooks and scans
|
||||
- provisioned article/essay corpora
|
||||
- optional Didactopus-native artifacts where useful
|
||||
|
||||
Then use its review flow to:
|
||||
|
||||
- standardize concepts
|
||||
- preserve fragments and provenance
|
||||
- compute graph diagnostics
|
||||
- queue bridge/isolated/small-component concepts for review
|
||||
- retain review rationale in promoted candidates
|
||||
|
||||
Expected result:
|
||||
|
||||
- canonical Notebook concept/claim substrate with provenance and graph signals
|
||||
|
||||
### Phase 4. Export pack-ready concept bundles from `GroundRecall`
|
||||
|
||||
For important notebook concepts, export:
|
||||
|
||||
- `groundrecall_query_bundle.json`
|
||||
|
||||
This becomes the handoff object for learner-facing or page-facing pack flows.
|
||||
|
||||
Expected result:
|
||||
|
||||
- reviewed concept payloads that can feed Didactopus and page generation
|
||||
|
||||
### Phase 5. Build `Didactopus` packs and learner navigation
|
||||
|
||||
Use `Didactopus` to:
|
||||
|
||||
- create draft packs around concept neighborhoods or topical modules
|
||||
- carry `groundrecall_query_bundle.json` as a declared supporting artifact
|
||||
- expose learner-workbench context that includes review and graph signals
|
||||
- sequence "what next" items from prerequisites and nearby graph structure
|
||||
|
||||
Expected result:
|
||||
|
||||
- learner-facing concept packs grounded in reviewed Notebook knowledge
|
||||
|
||||
### Phase 6. Publish the Notebook
|
||||
|
||||
Publication outputs should probably include:
|
||||
|
||||
- accessible concept pages
|
||||
- graph-first navigation controls
|
||||
- bibliography sections or per-page reading lists
|
||||
- illustration status or image slots
|
||||
- links into interactive apps and learner-workbench flows
|
||||
|
||||
Expected result:
|
||||
|
||||
- a Notebook that is not just readable, but navigable through conceptual
|
||||
structure
|
||||
|
||||
## Knowledge-graph-first navigation
|
||||
|
||||
This is the main product differentiator.
|
||||
|
||||
For each concept page, the learner should be able to see a small graph-guided
|
||||
navigation panel with categories such as:
|
||||
|
||||
- `Antecedent concepts`
|
||||
Concepts that must usually be understood first
|
||||
|
||||
- `Closer concepts`
|
||||
Nearby concepts in the same explanatory neighborhood
|
||||
|
||||
- `Derivative concepts`
|
||||
Concepts that extend or depend on the current concept
|
||||
|
||||
- `Supporting sources`
|
||||
Canonical bibliography or source entries that materially support the concept
|
||||
|
||||
- `Illustration opportunities`
|
||||
Candidate figures or planned visual explanations
|
||||
|
||||
The labels can be refined later, but the structure should come from typed graph
|
||||
relations rather than from arbitrary page links alone.
|
||||
|
||||
## Suggested relation types for Notebook navigation
|
||||
|
||||
The current stack does not need all of these on day one, but they are useful as
|
||||
target categories:
|
||||
|
||||
- `prerequisite`
|
||||
- `supports`
|
||||
- `contrasts_with`
|
||||
- `historical_predecessor`
|
||||
- `historical_successor`
|
||||
- `applies_to`
|
||||
- `example_of`
|
||||
- `misconception_about`
|
||||
- `illustrated_by`
|
||||
|
||||
Some can live in `GroundRecall` first and only later appear in learner-facing
|
||||
Didactopus packs.
|
||||
|
||||
## Illustration planning
|
||||
|
||||
Illustrations should be tracked as structured planning artifacts, not buried in
|
||||
page notes.
|
||||
|
||||
At minimum, each planned illustration should record:
|
||||
|
||||
- target concept id
|
||||
- working caption or purpose
|
||||
- source grounding
|
||||
- rights/compliance note
|
||||
- priority
|
||||
- status
|
||||
|
||||
This can begin as JSON or markdown sidecars before becoming a richer model.
|
||||
|
||||
## Bibliography strategy
|
||||
|
||||
The Notebook may want both:
|
||||
|
||||
- per-concept reading lists
|
||||
- larger topical bibliographies
|
||||
|
||||
Recommended split:
|
||||
|
||||
- `CiteGeist` maintains the main bibliography workbench and review discipline
|
||||
- `GroundRecall` stores links between concepts/claims and source artifacts
|
||||
- published Notebook pages surface only the citations relevant to the current
|
||||
concept and nearby graph region
|
||||
|
||||
That avoids turning the Notebook itself into the bibliography editor.
|
||||
|
||||
## Concrete first pilot
|
||||
|
||||
A good first Notebook pilot would be one narrow concept region rather than the
|
||||
whole corpus.
|
||||
|
||||
For example:
|
||||
|
||||
- historical development of evolutionary thought
|
||||
- evidence for common descent
|
||||
- natural selection and adaptation
|
||||
|
||||
Choose one region with:
|
||||
|
||||
- 1 to 3 textbooks
|
||||
- a small local article/blog corpus
|
||||
- one reviewed bibliography export
|
||||
- one explicit graph-navigation experiment
|
||||
|
||||
## Recommended next implementation tasks
|
||||
|
||||
1. Provision one local Notebook corpus workspace outside the library root.
|
||||
2. Choose one pilot concept region and one target concept.
|
||||
3. Normalize one textbook source with `doclift`.
|
||||
4. Provision one local TalkOrigins or Panda's Thumb snapshot.
|
||||
5. Run `CiteGeist` on the pilot bibliography inputs.
|
||||
6. Import the pilot sources into `GroundRecall`.
|
||||
7. Export one `groundrecall_query_bundle.json`.
|
||||
8. Feed that into a `Didactopus` pack flow.
|
||||
9. Prototype one Notebook page that exposes graph-guided next-to-learn links.
|
||||
|
||||
## Bottom line
|
||||
|
||||
The Notebook is a strong fit for the current stack if it is treated as:
|
||||
|
||||
- concept-first
|
||||
- graph-guided
|
||||
- provenance-aware
|
||||
- bibliography-backed
|
||||
- learner-navigable
|
||||
|
||||
It is a weaker fit if treated as only a flat wiki rewrite of source material.
|
||||
Loading…
Reference in New Issue