Compare commits
5 commits: f296326d4c ... addc920ac3
| Author | SHA1 | Date |
|---|---|---|
| | addc920ac3 | |
| | 3a83fbb2cf | |
| | 75f0b5c06a | |
| | 839590006a | |
| | 9d972d4144 | |

14 README.md
@@ -214,9 +214,23 @@ That flow:

- makes the resulting pack consumable by the learner workbench with
  GroundRecall review and graph context intact

If you want just the Notebook page artifact without building a full pack, use
the direct export wrapper:

```bash
didactopus notebook-page-groundrecall \
    /path/to/groundrecall-store \
    channel-capacity \
    /tmp/notebook-page-export
```

That command writes both `groundrecall_query_bundle.json` and
`notebook_page.json` into the output directory.

The fuller bridge workflow is documented in:

- `docs/groundrecall-bridge.md`
- `docs/evo-edu-notebook-pipeline.md`

## Didactopus As Pedagogy Support
@@ -0,0 +1,343 @@
# evo-edu Notebook Pipeline

This note turns the current `Notebook` idea into a concrete cross-repo
workflow for `doclift`, `GroundRecall`, `Didactopus`, and `CiteGeist`.

The target is the conceptual resource at:

- <https://evo-edu.org/evo/notebook/>

The important shift is that the Notebook should not be treated as "just another
wiki". The strongest differentiator available in the current stack is
graph-first navigation over reviewed concepts, claims, citations, and learner
next-step suggestions.

## Why this fits the current stack

The stack already divides responsibility in a useful way:

- `doclift`: normalize messy source material into deterministic bundles
- `GroundRecall`: canonical reviewed claims, concept graph, provenance, and
  query/export surfaces
- `Didactopus`: learner-facing packs, sequencing, workbench flows, and concept
  navigation
- `CiteGeist`: bibliography extraction, enrichment, review, and expansion

The Notebook use case needs all four:

- explanation text
- accessible concept sequencing
- explicit source grounding
- bibliography compilation and enrichment
- illustration planning
- visible graph structure for "what to learn next"

## Source classes

The Notebook will likely need at least four source classes.

### 1. Web corpora

Examples:

- TalkOrigins Archive FAQs and articles
- TalkDesign posts
- Panda's Thumb posts

Operational notes:

- these corpora should be provisioned locally before ingestion
- do not rely on live scraping as the primary production path
- keep source snapshots versioned, or at least manifest-tracked

### 2. Scanned textbooks and monographs

Examples already named:

- Futuyma, `Evolutionary Biology`
- Pianka, `Evolutionary Ecology`
- Bowler, `Evolution: The History of an Idea`

The current local library root is:

- `/mnt/CIFS/pengolodh/Docs/Library`

This should be treated as the upstream source corpus, not as the final working
directory for Notebook artifacts.

### 3. Bibliographic seed corpora

Examples:

- TalkOrigins bibliographies
- textbook reference sections
- existing `.bib` files in the library

These are where `CiteGeist` becomes especially important.

### 4. Planned illustration sources

These are not just assets. They should be reviewable planning objects:

- target concept
- illustration intent
- source basis
- rights/compliance note
- status: planned / needed / drafted / reviewed / published

## Recommended working position for the Notebook

The Notebook should be positioned as:

- a graph-guided conceptual atlas
- a source-grounded explanation layer
- a learner-facing bridge between articles, textbooks, and bibliographies

It should not try to compete by being the flattest or largest encyclopedia.

The distinguishing feature should be that the learner can see:

- antecedent concepts
- nearby or "closer" concepts
- derivative or downstream concepts
- representative supporting sources
- bibliography growth points
- illustration opportunities

That is much more consistent with the current stack than a generic article CMS.

## Proposed pipeline

### Phase 0. Provision the corpora locally

Create a local Notebook source workspace containing:

- provisioned web corpora snapshots
- selected textbook scan directories
- bibliography seeds
- source manifests

Expected result:

- stable local inputs for repeatable ingestion
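A source manifest can start as a very small JSON document. A minimal sketch, assuming nothing beyond the workspace layout above (the field names and paths here are illustrative, not an existing schema):

```python
import json
from pathlib import Path

# Hypothetical manifest entry: field names are illustrative, not a fixed schema.
manifest = {
    "manifest_kind": "notebook_source_manifest",
    "sources": [
        {
            "source_id": "talkorigins-snapshot",
            "source_class": "web_corpus",
            "local_path": "corpora/talkorigins",
            "provisioned": True,
        }
    ],
}

# Write the manifest at the root of the (hypothetical) workspace directory.
workspace = Path("/tmp/notebook-workspace")
workspace.mkdir(parents=True, exist_ok=True)
(workspace / "sources.manifest.json").write_text(
    json.dumps(manifest, indent=2), encoding="utf-8"
)
```

Even this much gives later phases a stable record of which snapshot each ingestion run actually consumed.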
### Phase 1. Normalize source material with `doclift`

Use `doclift` for:

- OCR-derived text normalization where practical
- sidecar generation
- `document.chunks.json` emission
- bundle manifests for scanned or converted materials

For web corpora, either:

- convert into bundle-like normalized document trees, or
- ingest through direct text/markdown adapters where that is simpler

Expected result:

- deterministic source bundles for longer-form documents

### Phase 2. Build bibliographic substrate with `CiteGeist`

Use `CiteGeist` to:

- scrape or ingest TalkOrigins bibliography materials
- expand weak references
- enrich textbook references
- cluster duplicates
- build review exports for uncertain entries
- maintain one or more Notebook `.bib` outputs

Expected result:

- a reviewed bibliography layer rather than ad hoc citation lists

### Phase 3. Import canonical knowledge into `GroundRecall`

Use `GroundRecall` to import:

- `doclift` bundles for textbooks and scans
- provisioned article/essay corpora
- optional Didactopus-native artifacts where useful

Then use its review flow to:

- standardize concepts
- preserve fragments and provenance
- compute graph diagnostics
- queue bridge/isolated/small-component concepts for review
- retain review rationale in promoted candidates

Expected result:

- a canonical Notebook concept/claim substrate with provenance and graph signals

### Phase 4. Export pack-ready concept bundles from `GroundRecall`

For important Notebook concepts, export:

- `groundrecall_query_bundle.json`

If you only need the page-ready artifact for a concept, `Didactopus` now also
has a direct wrapper that writes both the query bundle and `notebook_page.json`
into one output directory:

```bash
didactopus notebook-page-groundrecall \
    /path/to/groundrecall-store \
    natural-selection \
    /tmp/notebook-page-export
```

This becomes the handoff object for learner-facing or page-facing pack flows.

Expected result:

- reviewed concept payloads that can feed Didactopus and page generation

### Phase 5. Build `Didactopus` packs and learner navigation

Use `Didactopus` to:

- create draft packs around concept neighborhoods or topical modules
- carry `groundrecall_query_bundle.json` as a declared supporting artifact
- expose learner-workbench context that includes review and graph signals
- sequence "what next" items from prerequisites and nearby graph structure

Expected result:

- learner-facing concept packs grounded in reviewed Notebook knowledge

### Phase 6. Publish the Notebook

Publication outputs should probably include:

- accessible concept pages
- graph-first navigation controls
- bibliography sections or per-page reading lists
- illustration status or image slots
- links into interactive apps and learner-workbench flows

Expected result:

- a Notebook that is not just readable, but navigable through conceptual
  structure

## Knowledge-graph-first navigation

This is the main product differentiator.

For each concept page, the learner should be able to see a small graph-guided
navigation panel with categories such as:

- `Antecedent concepts`:
  concepts that must usually be understood first

- `Closer concepts`:
  nearby concepts in the same explanatory neighborhood

- `Derivative concepts`:
  concepts that extend or depend on the current concept

- `Supporting sources`:
  canonical bibliography or source entries that materially support the concept

- `Illustration opportunities`:
  candidate figures or planned visual explanations

The labels can be refined later, but the structure should come from typed graph
relations rather than from arbitrary page links alone.
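As a sketch, the panel can be driven by a payload shaped like the `graph_navigation` block the page builder in this change emits; the concept ids and titles below are illustrative examples, not real store content:

```python
# Illustrative navigation payload; entries are made-up examples, but the
# bucket names match the page builder's graph_navigation output.
graph_navigation = {
    "antecedent_concepts": [
        {"concept_id": "concept::variation", "title": "Variation",
         "relation_types": ["prerequisite"]},
    ],
    "closer_concepts": [
        {"concept_id": "concept::common-descent", "title": "Common Descent",
         "relation_types": ["supports"]},
    ],
    "derivative_concepts": [
        {"concept_id": "concept::adaptation", "title": "Adaptation",
         "relation_types": ["historical_successor"]},
    ],
}

# A page renderer can walk the buckets in a fixed display order.
for bucket in ("antecedent_concepts", "closer_concepts", "derivative_concepts"):
    for entry in graph_navigation[bucket]:
        print(bucket, entry["title"])
```

The point is that each entry carries its typed relation, so the renderer never has to guess why a link appears in a given bucket.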
## Suggested relation types for Notebook navigation

The current stack does not need all of these on day one, but they are useful as
target categories:

- `prerequisite`
- `supports`
- `contrasts_with`
- `historical_predecessor`
- `historical_successor`
- `applies_to`
- `example_of`
- `misconception_about`
- `illustrated_by`

Some can live in `GroundRecall` first and only later appear in learner-facing
Didactopus packs.
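A minimal sketch of how these typed relations map onto the navigation buckets, mirroring the direction-aware bucketing rule the page builder in this change uses:

```python
# "prerequisite"-style edges pointing INTO the concept are antecedents;
# "successor"-style edges pointing OUT of it are derivatives; every other
# typed relation lands in the nearby ("closer") bucket.
ANTECEDENT_TYPES = {"prerequisite", "historical_predecessor"}
DERIVATIVE_TYPES = {"historical_successor"}


def bucket_for(relation_type: str, source_id: str, target_id: str, concept_id: str) -> str:
    if relation_type in ANTECEDENT_TYPES:
        return "antecedent_concepts" if target_id == concept_id else "derivative_concepts"
    if relation_type in DERIVATIVE_TYPES:
        return "derivative_concepts" if source_id == concept_id else "antecedent_concepts"
    return "closer_concepts"
```

New relation types added later default safely to the "closer" bucket until they are given an explicit direction rule.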
## Illustration planning

Illustrations should be tracked as structured planning artifacts, not buried in
page notes.

At minimum, each planned illustration should record:

- target concept id
- working caption or purpose
- source grounding
- rights/compliance note
- priority
- status

This can begin as JSON or markdown sidecars before becoming a richer model.
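A minimal JSON sidecar sketch, assuming only the field list above; the file name, values, and layout are hypothetical:

```python
import json
from pathlib import Path

# Hypothetical sidecar; the keys follow the field list above but are not an
# existing schema, and the values are illustrative.
sidecar = {
    "target_concept_id": "concept::natural-selection",
    "purpose": "Show selection acting on heritable variation.",
    "source_grounding": ["texts/futuyma/ch1.md"],
    "rights_note": "original diagram, no third-party assets",
    "priority": "high",
    "status": "planned",
}

out = Path("/tmp/illustrations")
out.mkdir(parents=True, exist_ok=True)
(out / "natural-selection.illustration.json").write_text(
    json.dumps(sidecar, indent=2), encoding="utf-8"
)
```

One sidecar per planned figure keeps the status field reviewable without inventing a new data model up front.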
## Bibliography strategy

The Notebook may want both:

- per-concept reading lists
- larger topical bibliographies

Recommended split:

- `CiteGeist` maintains the main bibliography workbench and review discipline
- `GroundRecall` stores links between concepts/claims and source artifacts
- published Notebook pages surface only the citations relevant to the current
  concept and nearby graph region

That avoids turning the Notebook itself into the bibliography editor.

## Concrete first pilot

A good first Notebook pilot would be one narrow concept region rather than the
whole corpus.

For example:

- historical development of evolutionary thought
- evidence for common descent
- natural selection and adaptation

Choose one region with:

- 1 to 3 textbooks
- a small local article/blog corpus
- one reviewed bibliography export
- one explicit graph-navigation experiment

## Recommended next implementation tasks

1. Provision one local Notebook corpus workspace outside the library root.
2. Choose one pilot concept region and one target concept.
3. Normalize one textbook source with `doclift`.
4. Provision one local TalkOrigins or Panda's Thumb snapshot.
5. Run `CiteGeist` on the pilot bibliography inputs.
6. Import the pilot sources into `GroundRecall`.
7. Export one `groundrecall_query_bundle.json`.
8. Feed that into a `Didactopus` pack flow.
9. Prototype one Notebook page that exposes graph-guided next-to-learn links.

## Bottom line

The Notebook is a strong fit for the current stack if it is treated as:

- concept-first
- graph-guided
- provenance-aware
- bibliography-backed
- learner-navigable

It is a weaker fit if treated as only a flat wiki rewrite of source material.
@@ -96,3 +96,12 @@ Run the plain `doclift` bundle conversion without GroundRecall:

```bash
didactopus doclift-bundle /tmp/doclift-bundle /tmp/didactopus-pack --course-title "Example Course"
```

Build just the Notebook page artifact from a GroundRecall concept:

```bash
didactopus notebook-page-groundrecall \
    /path/to/groundrecall-store \
    channel-capacity \
    /tmp/notebook-page-export
```
@@ -63,6 +63,7 @@ def run_doclift_bundle_demo(
        "concept_count": len(ctx.concepts),
        "review_flags": list(ctx.review_flags),
        "groundrecall_bundle_included": bool(groundrecall_bundle),
        "notebook_page_included": bool(groundrecall_bundle),
    }
    (pack_dir / "doclift_bundle_summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary
@@ -7,6 +7,8 @@ from pathlib import Path

from .config import load_config
from .doclift_bundle_demo import run_doclift_bundle_demo
from .groundrecall_pack_bridge import run_doclift_bundle_with_groundrecall
from .notebook_page import export_notebook_page_from_groundrecall_bundle
from .notebook_page import export_notebook_page_from_groundrecall_store
from .review_loader import load_draft_pack
from .review_schema import ReviewSession, ReviewAction
from .review_actions import apply_action

@@ -48,6 +50,21 @@ def build_parser() -> argparse.ArgumentParser:
    doclift_gr_parser.add_argument("--course-title", required=True)
    doclift_gr_parser.add_argument("--author", default="doclift bundle import")
    doclift_gr_parser.add_argument("--license-name", default="See source bundle metadata")

    notebook_parser = subparsers.add_parser(
        "notebook-page",
        help="Build a Notebook page payload from a GroundRecall query bundle",
    )
    notebook_parser.add_argument("groundrecall_query_bundle")
    notebook_parser.add_argument("output_path")

    notebook_gr_parser = subparsers.add_parser(
        "notebook-page-groundrecall",
        help="Build a Notebook page and query bundle directly from a GroundRecall concept",
    )
    notebook_gr_parser.add_argument("groundrecall_store_dir")
    notebook_gr_parser.add_argument("groundrecall_concept_ref")
    notebook_gr_parser.add_argument("output_dir")
    return parser

@@ -120,4 +137,19 @@ def main() -> None:
        )
        print(summary)
        return
    if args.command == "notebook-page":
        summary = export_notebook_page_from_groundrecall_bundle(
            args.groundrecall_query_bundle,
            args.output_path,
        )
        print(summary)
        return
    if args.command == "notebook-page-groundrecall":
        summary = export_notebook_page_from_groundrecall_store(
            args.groundrecall_store_dir,
            args.groundrecall_concept_ref,
            args.output_dir,
        )
        print(summary)
        return
    build_parser().print_help()
@@ -0,0 +1,231 @@
from __future__ import annotations

import json
import sys
from pathlib import Path
from typing import Any


_ANTECEDENT_TYPES = {"prerequisite", "historical_predecessor"}
_DERIVATIVE_TYPES = {"historical_successor"}


def _concept_entry(concept: dict[str, Any], relation_types: set[str] | None = None) -> dict[str, Any]:
    entry = {
        "concept_id": concept.get("concept_id", ""),
        "title": concept.get("title", ""),
        "description": concept.get("description", ""),
    }
    if relation_types:
        entry["relation_types"] = sorted(relation_types)
    return entry


def _bucket_relation(
    relation: dict[str, Any],
    concept_id: str,
    concepts_by_id: dict[str, dict[str, Any]],
) -> tuple[str | None, dict[str, Any] | None]:
    source_id = str(relation.get("source_id", ""))
    target_id = str(relation.get("target_id", ""))
    relation_type = str(relation.get("relation_type", "")).strip() or "related_to"
    if concept_id not in {source_id, target_id}:
        return None, None

    other_id = target_id if source_id == concept_id else source_id
    other = concepts_by_id.get(other_id)
    if other is None:
        return None, None

    if relation_type in _ANTECEDENT_TYPES:
        bucket = "antecedent_concepts" if target_id == concept_id else "derivative_concepts"
    elif relation_type in _DERIVATIVE_TYPES:
        bucket = "derivative_concepts" if source_id == concept_id else "antecedent_concepts"
    else:
        bucket = "closer_concepts"

    return bucket, _concept_entry(other, {relation_type})


def _merge_bucket_entries(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    merged: dict[str, dict[str, Any]] = {}
    for item in items:
        concept_id = str(item.get("concept_id", ""))
        if not concept_id:
            continue
        existing = merged.setdefault(
            concept_id,
            {
                "concept_id": concept_id,
                "title": item.get("title", ""),
                "description": item.get("description", ""),
                "relation_types": [],
            },
        )
        existing["relation_types"] = sorted(set(existing["relation_types"]) | set(item.get("relation_types", [])))
    return list(merged.values())


def _review_context(bundle: dict[str, Any]) -> dict[str, Any]:
    review_candidates = bundle.get("review_candidates", []) or []
    graph_codes = sorted(
        {
            code
            for item in review_candidates
            for code in item.get("finding_codes", []) or []
            if "concept" in str(code) or "bridge" in str(code) or "component" in str(code)
        }
    )
    top_rationales = [
        str(item.get("rationale", "")).strip()
        for item in review_candidates
        if str(item.get("rationale", "")).strip()
    ][:3]
    return {
        "review_candidate_count": len(review_candidates),
        "graph_codes": graph_codes,
        "top_rationales": top_rationales,
    }


def _supporting_sources(bundle: dict[str, Any]) -> list[dict[str, Any]]:
    artifacts = bundle.get("source_artifacts", []) or []
    observations = bundle.get("supporting_observations", []) or []
    by_origin: dict[str, int] = {}
    for observation in observations:
        origin_path = str(observation.get("origin_path", "")).strip()
        if origin_path:
            by_origin[origin_path] = by_origin.get(origin_path, 0) + 1

    sources = []
    for artifact in artifacts:
        path = str(artifact.get("path", "")).strip()
        sources.append(
            {
                "artifact_id": artifact.get("artifact_id", ""),
                "title": artifact.get("title", ""),
                "path": path,
                "artifact_kind": artifact.get("artifact_kind", ""),
                "supporting_observation_count": by_origin.get(path, 0),
            }
        )
    return sources


def _illustration_opportunities(bundle: dict[str, Any], navigation: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    concept = bundle.get("concept", {}) or {}
    concept_title = str(concept.get("title", "")).strip() or str(concept.get("concept_id", "")).strip()
    opportunities = []
    if navigation["antecedent_concepts"] or navigation["derivative_concepts"]:
        opportunities.append(
            {
                "kind": "concept_path",
                "target_concept_id": concept.get("concept_id", ""),
                "purpose": f"Show how {concept_title} fits into a prerequisite or downstream concept path.",
                "status": "planned",
            }
        )
    if navigation["closer_concepts"]:
        titles = ", ".join(item["title"] for item in navigation["closer_concepts"][:3] if item.get("title"))
        opportunities.append(
            {
                "kind": "comparison",
                "target_concept_id": concept.get("concept_id", ""),
                "purpose": f"Compare {concept_title} with nearby concepts: {titles}." if titles else f"Compare {concept_title} with nearby concepts.",
                "status": "planned",
            }
        )
    if bundle.get("supporting_observations"):
        opportunities.append(
            {
                "kind": "evidence_trace",
                "target_concept_id": concept.get("concept_id", ""),
                "purpose": f"Trace the evidence and claims currently grounding {concept_title}.",
                "status": "planned",
            }
        )
    return opportunities


def build_notebook_page_from_groundrecall_bundle(bundle: dict[str, Any]) -> dict[str, Any]:
    concept = bundle.get("concept", {}) or {}
    concept_id = str(concept.get("concept_id", "")).strip()
    concepts_by_id = {concept_id: concept}
    for item in bundle.get("related_concepts", []) or []:
        item_id = str(item.get("concept_id", "")).strip()
        if item_id:
            concepts_by_id[item_id] = item

    navigation: dict[str, list[dict[str, Any]]] = {
        "antecedent_concepts": [],
        "closer_concepts": [],
        "derivative_concepts": [],
    }
    for relation in bundle.get("relations", []) or []:
        bucket, entry = _bucket_relation(relation, concept_id, concepts_by_id)
        if bucket and entry:
            navigation[bucket].append(entry)

    navigation = {key: _merge_bucket_entries(items) for key, items in navigation.items()}
    supporting_observations = bundle.get("supporting_observations", []) or []
    supporting_excerpts = [
        {
            "observation_id": item.get("observation_id", ""),
            "text": item.get("text", ""),
            "origin_path": item.get("origin_path", ""),
            "grounding_status": item.get("grounding_status", ""),
        }
        for item in supporting_observations[:5]
    ]

    return {
        "page_kind": "didactopus_notebook_page",
        "concept": {
            "concept_id": concept.get("concept_id", ""),
            "title": concept.get("title", ""),
            "description": concept.get("description", ""),
            "aliases": concept.get("aliases", []) or [],
        },
        "summary": {
            "claim_count": len(bundle.get("relevant_claims", []) or []),
            "supporting_observation_count": len(supporting_observations),
            "related_concept_count": len(bundle.get("related_concepts", []) or []),
        },
        "graph_navigation": navigation,
        "supporting_sources": _supporting_sources(bundle),
        "supporting_excerpts": supporting_excerpts,
        "review_context": _review_context(bundle),
        "illustration_opportunities": _illustration_opportunities(bundle, navigation),
        "suggested_next_actions": bundle.get("suggested_next_actions", []) or [],
    }


def export_notebook_page_from_groundrecall_bundle(bundle_path: str | Path, out_path: str | Path) -> dict[str, Any]:
    bundle_file = Path(bundle_path)
    payload = json.loads(bundle_file.read_text(encoding="utf-8"))
    page = build_notebook_page_from_groundrecall_bundle(payload)
    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(page, indent=2), encoding="utf-8")
    return {"page_path": str(target), "page": page}


def export_notebook_page_from_groundrecall_store(
    store_dir: str | Path,
    concept_ref: str,
    out_dir: str | Path,
) -> dict[str, Any]:
    export_groundrecall_query_bundle = _load_groundrecall_export()
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    exported = export_groundrecall_query_bundle(store_dir, concept_ref, target)
    page_path = target / "notebook_page.json"
    page_result = export_notebook_page_from_groundrecall_bundle(exported["bundle_path"], page_path)
    page_result["groundrecall_query_bundle_path"] = exported["bundle_path"]
    page_result["concept_ref"] = concept_ref
    return page_result


def _load_groundrecall_export():
    groundrecall_src = Path("/home/netuser/bin/GroundRecall/src")
    if groundrecall_src.exists():
        sys.path.insert(0, str(groundrecall_src))
    from groundrecall.export import export_groundrecall_query_bundle  # type: ignore

    return export_groundrecall_query_bundle
@@ -4,6 +4,7 @@ from pathlib import Path

import json
import yaml
from .course_schema import NormalizedCourse, ConceptCandidate, DraftPack
from .notebook_page import build_notebook_page_from_groundrecall_bundle


def build_source_corpus(course: NormalizedCourse) -> dict:

@@ -76,7 +77,7 @@ def build_draft_pack(
    pack_name = course.title.lower().replace(" ", "-")
    supporting_artifacts = ["source_corpus.json", "knowledge_graph.json"]
    if groundrecall_query_bundle is not None:
-        supporting_artifacts.append("groundrecall_query_bundle.json")
+        supporting_artifacts.extend(["groundrecall_query_bundle.json", "notebook_page.json"])
    pack = {
        "name": pack_name,
        "display_name": course.title,

@@ -140,6 +141,7 @@ def build_draft_pack(
    }
    if groundrecall_query_bundle is not None:
        attribution["groundrecall_query_bundle"] = groundrecall_query_bundle
        attribution["notebook_page"] = build_notebook_page_from_groundrecall_bundle(groundrecall_query_bundle)
    return DraftPack(
        pack=pack,
        concepts=concepts_yaml,

@@ -170,6 +172,11 @@ def write_draft_pack(pack: DraftPack, outdir: str | Path) -> None:
            json.dumps(pack.attribution["groundrecall_query_bundle"], indent=2),
            encoding="utf-8",
        )
    if isinstance(pack.attribution.get("notebook_page"), dict):
        (out / "notebook_page.json").write_text(
            json.dumps(pack.attribution["notebook_page"], indent=2),
            encoding="utf-8",
        )


def write_source_corpus(course: NormalizedCourse, outdir: str | Path) -> None:
@@ -96,3 +96,61 @@ def test_main_legacy_review_mode_uses_review_parser(monkeypatch, tmp_path: Path)

    assert called["draft_pack"] == str(tmp_path / "draft")
    assert called["output_dir"] == str(tmp_path / "out")


def test_main_notebook_page_subcommand(monkeypatch, capsys, tmp_path: Path) -> None:
    captured: dict = {}

    def _fake_export_notebook_page_from_groundrecall_bundle(bundle_path, out_path):
        captured["bundle_path"] = str(bundle_path)
        captured["out_path"] = str(out_path)
        return {"page_path": str(out_path)}

    monkeypatch.setattr(main_module, "export_notebook_page_from_groundrecall_bundle", _fake_export_notebook_page_from_groundrecall_bundle)
    monkeypatch.setattr(
        main_module.sys,
        "argv",
        [
            "didactopus",
            "notebook-page",
            str(tmp_path / "groundrecall_query_bundle.json"),
            str(tmp_path / "notebook_page.json"),
        ],
    )

    main_module.main()
    out = capsys.readouterr().out

    assert captured["bundle_path"].endswith("groundrecall_query_bundle.json")
    assert captured["out_path"].endswith("notebook_page.json")
    assert "page_path" in out


def test_main_notebook_page_groundrecall_subcommand(monkeypatch, capsys, tmp_path: Path) -> None:
    captured: dict = {}

    def _fake_export_notebook_page_from_groundrecall_store(store_dir, concept_ref, out_dir):
        captured["store_dir"] = str(store_dir)
        captured["concept_ref"] = concept_ref
        captured["out_dir"] = str(out_dir)
        return {"page_path": str(Path(out_dir) / "notebook_page.json")}

    monkeypatch.setattr(main_module, "export_notebook_page_from_groundrecall_store", _fake_export_notebook_page_from_groundrecall_store)
    monkeypatch.setattr(
        main_module.sys,
        "argv",
        [
            "didactopus",
            "notebook-page-groundrecall",
            str(tmp_path / "store"),
            "natural-selection",
            str(tmp_path / "out"),
        ],
    )

    main_module.main()
    out = capsys.readouterr().out

    assert captured["concept_ref"] == "natural-selection"
    assert captured["out_dir"].endswith("out")
    assert "page_path" in out
@@ -0,0 +1,142 @@
from __future__ import annotations

import json
from pathlib import Path

from didactopus.notebook_page import (
    build_notebook_page_from_groundrecall_bundle,
    export_notebook_page_from_groundrecall_bundle,
    export_notebook_page_from_groundrecall_store,
)


def _sample_bundle() -> dict:
    return {
        "bundle_kind": "groundrecall_query_bundle",
        "concept": {
            "concept_id": "concept::natural-selection",
            "title": "Natural Selection",
            "description": "Differential survival and reproduction.",
            "aliases": ["selection"],
        },
        "relevant_claims": [
            {"claim_id": "clm_001", "claim_text": "Selection can change trait frequencies."},
            {"claim_id": "clm_002", "claim_text": "Selection depends on heritable variation."},
        ],
        "relations": [
            {
                "relation_id": "rel_001",
                "source_id": "concept::variation",
                "target_id": "concept::natural-selection",
                "relation_type": "prerequisite",
            },
            {
                "relation_id": "rel_002",
                "source_id": "concept::natural-selection",
                "target_id": "concept::adaptation",
                "relation_type": "historical_successor",
            },
            {
                "relation_id": "rel_003",
                "source_id": "concept::natural-selection",
                "target_id": "concept::common-descent",
                "relation_type": "supports",
            },
        ],
        "related_concepts": [
            {
                "concept_id": "concept::variation",
                "title": "Variation",
                "description": "Differences among individuals.",
            },
            {
                "concept_id": "concept::adaptation",
                "title": "Adaptation",
                "description": "Traits fit to local conditions.",
            },
            {
                "concept_id": "concept::common-descent",
                "title": "Common Descent",
                "description": "Shared ancestry of organisms.",
            },
        ],
        "supporting_observations": [
            {
                "observation_id": "obs_001",
                "text": "Population differences can affect survival.",
                "origin_path": "texts/futuyma/ch1.md",
                "grounding_status": "grounded",
            }
        ],
        "source_artifacts": [
            {
                "artifact_id": "art_001",
                "artifact_kind": "compiled_page",
                "title": "Evolutionary Biology Chapter 1",
                "path": "texts/futuyma/ch1.md",
            }
        ],
        "review_candidates": [
            {
                "candidate_id": "concept::natural-selection",
                "finding_codes": ["bridge_concept"],
                "rationale": "Natural Selection | lane=conflict_resolution | priority=12 | graph=bridge_concept",
            }
        ],
        "suggested_next_actions": ["Inspect supporting observations before export."],
    }


def test_build_notebook_page_buckets_graph_navigation() -> None:
    page = build_notebook_page_from_groundrecall_bundle(_sample_bundle())

    assert page["page_kind"] == "didactopus_notebook_page"
    assert page["concept"]["title"] == "Natural Selection"
    assert page["summary"]["claim_count"] == 2
    assert page["graph_navigation"]["antecedent_concepts"][0]["title"] == "Variation"
    assert page["graph_navigation"]["derivative_concepts"][0]["title"] == "Adaptation"
    assert page["graph_navigation"]["closer_concepts"][0]["title"] == "Common Descent"
    assert page["supporting_sources"][0]["supporting_observation_count"] == 1
    assert page["review_context"]["graph_codes"] == ["bridge_concept"]
    assert page["illustration_opportunities"]


def test_export_notebook_page_writes_json(tmp_path: Path) -> None:
    bundle_path = tmp_path / "groundrecall_query_bundle.json"
    out_path = tmp_path / "notebook_page.json"
    bundle_path.write_text(json.dumps(_sample_bundle()), encoding="utf-8")

    payload = export_notebook_page_from_groundrecall_bundle(bundle_path, out_path)

    assert out_path.exists()
    assert payload["page_path"].endswith("notebook_page.json")
    written = json.loads(out_path.read_text(encoding="utf-8"))
    assert written["concept"]["concept_id"] == "concept::natural-selection"


def test_export_notebook_page_from_groundrecall_store_writes_bundle_and_page(monkeypatch, tmp_path: Path) -> None:
    captured: dict = {}

    def _fake_export(store_dir, concept_ref, out_dir):
        out_dir = Path(out_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        bundle_path = out_dir / "groundrecall_query_bundle.json"
        bundle_path.write_text(json.dumps(_sample_bundle()), encoding="utf-8")
        captured["store_dir"] = str(store_dir)
        captured["concept_ref"] = concept_ref
        captured["out_dir"] = str(out_dir)
        return {"bundle_path": str(bundle_path), "bundle": _sample_bundle()}

    monkeypatch.setattr("didactopus.notebook_page._load_groundrecall_export", lambda: _fake_export)

    payload = export_notebook_page_from_groundrecall_store(
        tmp_path / "store",
        "natural-selection",
        tmp_path / "out",
    )

    assert captured["concept_ref"] == "natural-selection"
    assert (tmp_path / "out" / "groundrecall_query_bundle.json").exists()
    assert (tmp_path / "out" / "notebook_page.json").exists()
    assert payload["concept_ref"] == "natural-selection"
    assert payload["groundrecall_query_bundle_path"].endswith("groundrecall_query_bundle.json")
@@ -51,5 +51,8 @@ def test_emit_pack_can_write_groundrecall_query_bundle(tmp_path: Path) -> None:

    pack_yaml = (tmp_path / "pack.yaml").read_text(encoding="utf-8")
    bundle_payload = (tmp_path / "groundrecall_query_bundle.json").read_text(encoding="utf-8")
    notebook_payload = (tmp_path / "notebook_page.json").read_text(encoding="utf-8")
    assert "groundrecall_query_bundle.json" in pack_yaml
    assert "notebook_page.json" in pack_yaml
    assert '"bundle_kind": "groundrecall_query_bundle"' in bundle_payload
    assert '"page_kind": "didactopus_notebook_page"' in notebook_payload