Compare commits

...

3 Commits

12 changed files with 678 additions and 113 deletions

View File: README.md

@ -56,11 +56,15 @@ The initial repo includes:
- OAI-PMH repository discovery via `Identify`, `ListSets`, and `ListMetadataFormats` to target harvests more precisely;
- bibliography bootstrap workflows that can start from a seed `.bib`, a topic phrase, or both;
- batch bootstrap orchestration from JSON job files containing seed BibTeX paths, topic phrases, or both;
- a TalkOrigins scraper that fixes repeated-author plaintext references, emits per-topic seed BibTeX files, and writes a batch JSON specification;
- normalized tables for entries, creators, identifiers, and citation relations;
- full-text-search-ready indexing over title, abstract, and fulltext when SQLite FTS5 is available;
- tests covering parsing, ingestion, relation storage, and search.
Example applications live alongside the core package rather than defining it. Current examples include:
- a topic-only bootstrap workflow for `artificial life` in [examples/artificial-life/README.md](./examples/artificial-life/README.md);
- the TalkOrigins bibliography pipeline under [`citegeist.examples.talkorigins`](./src/citegeist/examples/talkorigins.py) with a usage guide in [examples/talkorigins/README.md](./examples/talkorigins/README.md).
The prioritized execution plan lives in [ROADMAP.md](./ROADMAP.md).
## Layout
@ -69,6 +73,7 @@ The prioritized execution plan lives in [ROADMAP.md](./ROADMAP.md).
citegeist/
  src/citegeist/
    bibtex.py
    examples/
    storage.py
  tests/
    test_storage.py
@ -125,7 +130,6 @@ PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 resolve-confli
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 apply-conflict smith2024graphs title
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap --seed-bib seed.bib --topic "bayesian nonparametrics"
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "bayesian nonparametrics" --preview --topic-commit-limit 5
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 scrape-talkorigins talkorigins-out --limit-topics 5 --limit-entries-per-topic 20
PYTHONPATH=src .venv/bin/python -m citegeist extract references.txt --output draft.bib
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 resolve smith2024graphs
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topics
@ -143,44 +147,14 @@ PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 export --outpu
For live-source development, prefer fixture-backed or cache-backed source clients so resolver and expansion work can be exercised repeatedly without re-hitting upstream APIs on every run.
For large legacy plaintext corpora such as the TalkOrigins bibliography, prefer a two-step workflow:
1. `scrape-talkorigins` to generate cleaned per-topic `seed_bib` files plus a `talkorigins_jobs.json` batch spec.
2. `bootstrap-batch` on that JSON file when you want to ingest, resolve, and expand from the generated seeds.
The TalkOrigins scrape output now includes:
- `seeds/*.bib` per-topic seed BibTeX files for `bootstrap-batch`
- `plaintext/*.txt` per-topic cleaned GSA-style plaintext with repeated authors expanded
- `site/topics/*.html` reconstructed topic pages with hide/show BibTeX blocks
- `talkorigins_full.txt` and `talkorigins_full.bib` aggregate downloads
- `snapshots/*.json` cached topic payloads so reruns can resume without re-fetching already scraped topics
After a full scrape, run:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist validate-talkorigins talkorigins-out/talkorigins_manifest.json
PYTHONPATH=src .venv/bin/python -m citegeist duplicates-talkorigins talkorigins-out/talkorigins_manifest.json --limit 20
PYTHONPATH=src .venv/bin/python -m citegeist duplicates-talkorigins talkorigins-out/talkorigins_manifest.json --limit 20 --preview --weak-only
PYTHONPATH=src .venv/bin/python -m citegeist suggest-talkorigins-phrases talkorigins-out/talkorigins_manifest.json --output topic-phrases.json
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 stage-topic-phrases topic-phrases.json
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrase abiogenesis accepted --notes "curated from local corpus"
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 apply-topic-phrases topic-phrases.json
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 enrich-talkorigins talkorigins-out/talkorigins_manifest.json --limit 20
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins-copy.sqlite3 enrich-talkorigins talkorigins-out/talkorigins_manifest.json --limit 5 --apply --allow-unsafe-search-matches
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 review-talkorigins talkorigins-out/talkorigins_manifest.json --output talkorigins-review.json
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 apply-talkorigins-corrections talkorigins-out/talkorigins_manifest.json talkorigins-corrections.json
```
The `validate-talkorigins` report summarizes parse coverage and flags suspicious entry-type/venue combinations for manual cleanup.
It also reports duplicate clusters across topic seed files so you can gauge how much deduplication pressure to expect before ingestion.
Use `duplicates-talkorigins` when you want to inspect specific clusters, filter by text, restrict the audit to one topic slug, or preview only weak canonicalization outcomes before importing.
Use `suggest-talkorigins-phrases` to derive candidate stored expansion phrases from the TalkOrigins topic corpus itself. The output is deterministic JSON keyed by topic slug, carrying a suggested phrase plus the extracted keywords that drove it. This is a useful first pass before setting topic phrases in the database or editing generated batch jobs.
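As a quick sanity check before staging, that JSON can be inspected programmatically. A minimal sketch, assuming the export is either a list of suggestion objects or an object keyed by slug, and assuming `suggested_phrase` and `keywords` as field names (the exact keys are not pinned down here):
```python
import json
from pathlib import Path

# Field names below are assumptions based on the description above.
payload = json.loads(Path("topic-phrases.json").read_text(encoding="utf-8"))
items = payload.values() if isinstance(payload, dict) else payload
for item in items:
    slug = item.get("slug", "?")
    phrase = item.get("suggested_phrase") or item.get("phrase")
    keywords = item.get("keywords", [])
    print(f"{slug}: {phrase!r} ({len(keywords)} keywords)")
```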
## Example Application
Use `stage-topic-phrases` to load those suggestions into the database as review items. Staging stores the candidate in `suggested_phrase` and marks the topic `pending` without changing the active `expansion_phrase`.
Use `review-topic-phrase` to accept or reject one staged suggestion in place. Accepting a suggestion copies it into `expansion_phrase`; rejecting it preserves the review state without changing the live phrase.
Use `export-topic-phrase-reviews` to write an editable JSON template directly from the database for the currently staged suggestions. That gives you a round-trip path from DB review queue to file edits and back into `review-topic-phrases`.
Use `review-topic-phrase` to accept or reject one staged suggestion in place. Accepting a suggestion copies it into `expansion_phrase` and clears it from the staged review queue; rejecting it preserves the staged suggestion together with its review state.
Use `review-topic-phrases` when you want to apply many accept/reject decisions from one JSON file. Each item should carry `slug`, `status`, and optional `phrase` / `review_notes`.
Use `apply-topic-phrases` when you want a direct patch path instead of the staged review flow. It accepts either the raw suggestion list or an object with a `topics` list, and will apply `suggested_phrase` or `phrase` to matching topic slugs immediately.
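Both accepted shapes are easy to produce from Python. A minimal sketch of the two payload forms described above; everything except `slug`, `phrase`, and `suggested_phrase` is illustrative:
```python
import json
from pathlib import Path

# Form 1: the raw suggestion list.
raw_list = [{"slug": "abiogenesis", "suggested_phrase": "abiogenesis life origin"}]
# Form 2: an object with a "topics" list; "phrase" works in place of "suggested_phrase".
wrapped = {"topics": [{"slug": "abiogenesis", "phrase": "abiogenesis life origin"}]}
# Either form is accepted; write one of them out for apply-topic-phrases.
Path("topic-phrases-direct.json").write_text(json.dumps(raw_list, indent=2) + "\n", encoding="utf-8")
```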
Use `topic-phrase-reviews --phrase-review-status pending` when you want a compact audit view of unresolved staged suggestions, including both the current live phrase and the pending replacement.
Use `enrich-talkorigins` when you want to target those weak canonical entries for resolver-based metadata upgrades before retrying graph expansion on imported topic slices.
Use `review-talkorigins` when you want one JSON review artifact that combines weak canonical clusters with dry-run enrichment outcomes for manual cleanup.
Use `expand-topic` when you already have both a topic phrase and a curated topic seed set in the database: it expands outward from the topic's existing entries, then assigns discovered works back to that topic only if they clear a topic-relevance threshold. Write-enabled assignment is stricter than preview ranking: a candidate must clear the score threshold and show a non-generic title anchor to the topic phrase, so broad methods papers do not get attached just because their abstracts or related terms overlap. On large noisy topics, prefer `--seed-key` to restrict the run to the trusted seed entries you want to expand from, and use `--preview` first to inspect discovered candidates and relevance scores before writing anything.
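As an illustration of that two-part gate, here is a hypothetical sketch; the helper name, stoplist, and scoring are invented for explanation and are not the actual `expand-topic` implementation:
```python
# Hypothetical stoplist of generic title words that should not count as anchors.
GENERIC_TERMS = {"analysis", "methods", "model", "study", "approach", "survey"}

def should_assign(title: str, relevance_score: float, topic_phrase: str,
                  threshold: float = 0.5) -> bool:
    """Hypothetical gate: require a score above threshold AND a
    non-generic title token shared with the topic phrase."""
    anchors = (set(title.lower().split()) & set(topic_phrase.lower().split())) - GENERIC_TERMS
    return relevance_score >= threshold and bool(anchors)
```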
@ -189,6 +163,24 @@ Use `set-topic-phrase` to store a curated expansion phrase on the topic itself.
Use `topics --phrase-review-status pending` when you want to audit only topics whose staged phrase suggestions still need review.
`--allow-unsafe-search-matches` exists only for bounded experiments on copied databases when you explicitly want to relax trust to exercise downstream expansion behavior.
The TalkOrigins corpus pipeline remains in the repository as an example application rather than a core package surface. Use the example-scoped Python namespace:
```python
from citegeist.examples.talkorigins import TalkOriginsScraper
```
and the example-scoped CLI commands:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --limit-topics 5 --limit-entries-per-topic 20
PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-validate talkorigins-out/talkorigins_manifest.json
PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-duplicates talkorigins-out/talkorigins_manifest.json --limit 20 --preview --weak-only
```
The older `scrape-talkorigins`-style command names remain available as compatibility aliases. The full example workflow and reconstruction notes live in [examples/talkorigins/README.md](./examples/talkorigins/README.md).
For a smaller example that starts from a topic phrase alone, see [examples/artificial-life/README.md](./examples/artificial-life/README.md).
Correction files are simple JSON:
```json
@ -210,15 +202,6 @@ Correction files are simple JSON:
`fields` values overwrite the canonical entry for that duplicate-cluster key. Set a field to `null` to remove it.
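The overwrite-or-remove semantics amount to a dictionary patch. An illustrative helper, not the library's internal code:
```python
def patch_canonical(canonical: dict, fields: dict) -> dict:
    """Illustrative only: `fields` values overwrite the canonical entry,
    and a null (None) value removes that field."""
    patched = dict(canonical)
    for key, value in fields.items():
        if value is None:
            patched.pop(key, None)
        else:
            patched[key] = value
    return patched
```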
To import the reconstructed corpus into SQLite while collapsing duplicate works across topics into canonical entries:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 ingest-talkorigins talkorigins-out/talkorigins_manifest.json
```
That import preserves many-to-many topic membership through the `topics` and `entry_topics` tables.
After import, use `topics`, `topic-entries`, `search --topic`, and `export-topic` to inspect or export topic slices from the consolidated database.
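That membership can also be checked with raw SQL. A minimal sketch against the consolidated database, using only the `topics` and `entry_topics` columns that appear in the storage layer:
```python
import sqlite3

# Count entries per topic through the entry_topics join table.
conn = sqlite3.connect("talkorigins.sqlite3")
rows = conn.execute(
    """
    SELECT t.slug, COUNT(et.entry_id) AS entry_count
    FROM topics t
    LEFT JOIN entry_topics et ON et.topic_id = t.id
    GROUP BY t.id
    ORDER BY entry_count DESC
    """
).fetchall()
for slug, count in rows:
    print(f"{slug}: {count}")
conn.close()
```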
Live-source workflow:
```bash

View File: examples/artificial-life/README.md

@ -0,0 +1,100 @@
# Artificial Life Topic-Seeding Example
This example shows the smallest useful `citegeist` workflow that starts from a topic phrase alone.
The seed phrase is:
```text
artificial life
```
## What It Demonstrates
- topic-only bootstrap without a seed `.bib`;
- previewing ranked candidate seed entries before writing anything;
- storing a curated topic slug, topic name, and expansion phrase in the database;
- running later topic-aware expansion from that stored phrase.
## Preview First
Use a preview run to inspect the best candidate seed entries without changing the database:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
bootstrap \
--topic "artificial life" \
--topic-slug artificial-life \
--topic-name "Artificial life" \
--store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation" \
--topic-limit 10 \
--topic-commit-limit 5 \
--preview
```
That returns ranked candidates gathered through the configured resolver/search stack.
## Commit The Topic Seeds
Once the preview looks reasonable, run the same bootstrap without `--preview`:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
bootstrap \
--topic "artificial life" \
--topic-slug artificial-life \
--topic-name "Artificial life" \
--store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation" \
--topic-limit 10 \
--topic-commit-limit 5
```
That does three things:
1. finds topic-relevant seed entries;
2. stores them in the bibliography database;
3. creates or updates the `artificial-life` topic row with the curated expansion phrase.
## Inspect The Result
```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topics
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topic-entries artificial-life
```
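The same check works from Python. A minimal sketch, assuming a local checkout with `src` on the import path:
```python
from pathlib import Path
from citegeist.storage import BibliographyStore

# Open the database the CLI wrote and read back the topic row.
store = BibliographyStore(Path("library.sqlite3"))
try:
    topic = store.get_topic("artificial-life")
    if topic is not None:
        print(topic.get("name"), "->", topic.get("expansion_phrase"))
finally:
    store.close()
```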
If you want to adjust the stored phrase later:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
set-topic-phrase artificial-life "artificial life alife artificial organisms autonomous agents evolution simulation"
```
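Programmatically, that is one store call; per its signature, `set_topic_expansion_phrase` takes `str | None` and returns a bool (passing `None` presumably clears the stored phrase):
```python
from pathlib import Path
from citegeist.storage import BibliographyStore

store = BibliographyStore(Path("library.sqlite3"))
try:
    # Returns a bool; a False result suggests no topic row matched the slug.
    updated = store.set_topic_expansion_phrase(
        "artificial-life",
        "artificial life alife artificial organisms autonomous agents evolution simulation",
    )
    print("updated:", updated)
finally:
    store.close()
```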
## Optional Batch Form
The same topic-only seed can be expressed as a batch job:
```json
[
{
"name": "artificial-life-topic-seed",
"topic": "artificial life",
"topic_slug": "artificial-life",
"topic_name": "Artificial life",
"topic_phrase": "artificial life alife artificial organisms complex systems evolution simulation",
"topic_limit": 10,
"topic_commit_limit": 5,
"expand": false
}
]
```
Run it with:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap-batch artificial-life.json
```
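The job file can also be loaded from Python first, e.g. to sanity-check it before a long run. A minimal sketch, assuming `load_batch_jobs` accepts a path to the JSON file (its exact signature is not shown here):
```python
from pathlib import Path
from citegeist.batch import load_batch_jobs

# Assumption: load_batch_jobs takes the JSON path and returns parsed job specs.
jobs = load_batch_jobs(Path("artificial-life.json"))
for job in jobs:
    print(job)
```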
## Notes
- This example is intentionally generic and corpus-independent.
- The exact candidate set depends on live source availability and resolver behavior.
- Prefer preview mode before committing topic-only seeds, because topic phrases are noisier than curated seed `.bib` inputs.

View File: examples/talkorigins/README.md

@ -0,0 +1,52 @@
# TalkOrigins Example
This example shows how to use `citegeist` on a large legacy plaintext bibliography corpus.
It is intentionally positioned as an application of the core library, not as the main product surface.
## What It Demonstrates
- scraping a legacy bibliography index;
- normalizing repeated-author plaintext references;
- converting topic pages into per-topic seed BibTeX;
- generating batch bootstrap specs for downstream ingest and expansion;
- reconstructing cleaned plaintext and BibTeX topic pages for review;
- validating parse quality, duplicate clusters, and weak canonical entries;
- curating topic phrases and correction files before broader enrichment.
The example implementation lives under the Python namespace:
```python
from citegeist.examples.talkorigins import TalkOriginsScraper
```
The preferred CLI commands are example-scoped:
```bash
PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --limit-topics 5 --limit-entries-per-topic 20
PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-validate talkorigins-out/talkorigins_manifest.json
PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-duplicates talkorigins-out/talkorigins_manifest.json --limit 20 --preview --weak-only
PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-suggest-phrases talkorigins-out/talkorigins_manifest.json --output topic-phrases.json
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 stage-topic-phrases topic-phrases.json
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews --output topic-phrase-review.json
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrases topic-phrase-review.json
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-enrich talkorigins-out/talkorigins_manifest.json --limit 20
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-review talkorigins-out/talkorigins_manifest.json --output talkorigins-review.json
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-apply-corrections talkorigins-out/talkorigins_manifest.json talkorigins-corrections.json
PYTHONPATH=src .venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-ingest talkorigins-out/talkorigins_manifest.json
```
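Downstream tooling can also consume the example's report types directly. A minimal sketch that touches only fields visible in this change set; how the report object is produced is left out:
```python
from citegeist.examples.talkorigins import TalkOriginsValidationReport

def summarize(report: TalkOriginsValidationReport) -> str:
    # These fields appear in this change's test fixtures for the report type.
    return (
        f"{report.manifest_path}: {report.topic_count} topics, "
        f"{report.entry_count} entries, "
        f"{report.duplicate_entry_count} duplicate entries"
    )
```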
## Output Artifacts
The example scrape writes:
- `seeds/*.bib` per-topic seed BibTeX files;
- `plaintext/*.txt` cleaned GSA-style plaintext with repeated authors expanded;
- `site/topics/*.html` reconstructed topic pages with hide/show BibTeX blocks;
- `talkorigins_full.txt` and `talkorigins_full.bib` aggregate downloads;
- `snapshots/*.json` cached topic payloads so reruns can resume.
## Notes
- The example-specific CLI names have compatibility aliases matching the older `scrape-talkorigins`-style command names.
- Topic phrase staging, review, and export commands are generic `citegeist` functionality and are not specific to TalkOrigins.

View File: src/citegeist/__init__.py

@ -7,18 +7,6 @@ from .harvest import OaiMetadataFormat, OaiPmhHarvester, OaiSet
from .resolve import MetadataResolver, merge_entries, merge_entries_with_conflicts
from .sources import SourceClient
from .storage import BibliographyStore
from .talkorigins import (
TalkOriginsBatchExport,
TalkOriginsDuplicateCluster,
TalkOriginsEnrichmentResult,
TalkOriginsIngestReport,
TalkOriginsReviewExport,
TalkOriginsScraper,
TalkOriginsSeedSet,
TalkOriginsTopicPhraseSuggestion,
TalkOriginsTopic,
TalkOriginsValidationReport,
)
__all__ = [
"BibEntry",
@ -34,16 +22,6 @@ __all__ = [
"OaiMetadataFormat",
"OaiSet",
"SourceClient",
"TalkOriginsBatchExport",
"TalkOriginsDuplicateCluster",
"TalkOriginsEnrichmentResult",
"TalkOriginsIngestReport",
"TalkOriginsReviewExport",
"TalkOriginsScraper",
"TalkOriginsSeedSet",
"TalkOriginsTopicPhraseSuggestion",
"TalkOriginsTopic",
"TalkOriginsValidationReport",
"extract_references",
"load_batch_jobs",
"merge_entries",

View File: src/citegeist/cli.py

@ -9,12 +9,12 @@ from pathlib import Path
from .batch import BatchBootstrapRunner, load_batch_jobs
from .bibtex import parse_bibtex, render_bibtex
from .bootstrap import Bootstrapper
from .examples.talkorigins import TalkOriginsScraper
from .expand import CrossrefExpander, OpenAlexExpander, TopicExpander
from .extract import extract_references
from .harvest import OaiPmhHarvester
from .resolve import MetadataResolver, merge_entries_with_conflicts
from .storage import BibliographyStore
from .talkorigins import TalkOriginsScraper
def build_parser() -> argparse.ArgumentParser:
@ -205,8 +205,9 @@ def build_parser() -> argparse.ArgumentParser:
batch_parser.add_argument("input", help="Path to batch JSON file")
talkorigins_parser = subparsers.add_parser(
"scrape-talkorigins",
help="Scrape TalkOrigins into per-topic seed BibTeX files and a bootstrap-batch JSON file",
"example-talkorigins-scrape",
aliases=["scrape-talkorigins"],
help="Example workflow: scrape TalkOrigins into per-topic seed BibTeX files and a bootstrap-batch JSON file",
)
talkorigins_parser.add_argument(
"output_dir",
@ -257,14 +258,16 @@ def build_parser() -> argparse.ArgumentParser:
talkorigins_parser.add_argument("--status", default="draft", help="Review status for generated seed jobs")
validate_talkorigins_parser = subparsers.add_parser(
"validate-talkorigins",
help="Validate a generated TalkOrigins manifest and report parse coverage and suspicious entries",
"example-talkorigins-validate",
aliases=["validate-talkorigins"],
help="Example workflow: validate a generated TalkOrigins manifest and report parse coverage and suspicious entries",
)
validate_talkorigins_parser.add_argument("manifest", help="Path to talkorigins_manifest.json")
suggest_talkorigins_parser = subparsers.add_parser(
"suggest-talkorigins-phrases",
help="Suggest stored topic expansion phrases from a TalkOrigins manifest",
"example-talkorigins-suggest-phrases",
aliases=["suggest-talkorigins-phrases"],
help="Example workflow: suggest stored topic expansion phrases from a TalkOrigins manifest",
)
suggest_talkorigins_parser.add_argument("manifest", help="Path to talkorigins_manifest.json")
suggest_talkorigins_parser.add_argument("--topic", help="Optional topic slug to restrict suggestions")
@ -298,9 +301,16 @@ def build_parser() -> argparse.ArgumentParser:
help="Optional expansion phrase override to apply with the review decision",
)
review_topic_phrases_parser = subparsers.add_parser(
"review-topic-phrases",
help="Apply topic phrase review decisions in bulk from JSON",
)
review_topic_phrases_parser.add_argument("input", help="Path to JSON file containing topic phrase review records")
duplicates_talkorigins_parser = subparsers.add_parser(
"duplicates-talkorigins",
help="Inspect duplicate clusters in a generated TalkOrigins manifest",
"example-talkorigins-duplicates",
aliases=["duplicates-talkorigins"],
help="Example workflow: inspect duplicate clusters in a generated TalkOrigins manifest",
)
duplicates_talkorigins_parser.add_argument("manifest", help="Path to talkorigins_manifest.json")
duplicates_talkorigins_parser.add_argument("--limit", type=int, default=20, help="Maximum clusters to show")
@ -324,8 +334,9 @@ def build_parser() -> argparse.ArgumentParser:
)
ingest_talkorigins_parser = subparsers.add_parser(
"ingest-talkorigins",
help="Ingest a TalkOrigins manifest into the database with duplicate consolidation and topic membership",
"example-talkorigins-ingest",
aliases=["ingest-talkorigins"],
help="Example workflow: ingest a TalkOrigins manifest into the database with duplicate consolidation and topic membership",
)
ingest_talkorigins_parser.add_argument("manifest", help="Path to talkorigins_manifest.json")
ingest_talkorigins_parser.add_argument("--status", default="draft", help="Review status for imported entries")
@ -336,8 +347,9 @@ def build_parser() -> argparse.ArgumentParser:
)
enrich_talkorigins_parser = subparsers.add_parser(
"enrich-talkorigins",
help="Attempt metadata enrichment for weak TalkOrigins canonical entries",
"example-talkorigins-enrich",
aliases=["enrich-talkorigins"],
help="Example workflow: attempt metadata enrichment for weak TalkOrigins canonical entries",
)
enrich_talkorigins_parser.add_argument("manifest", help="Path to talkorigins_manifest.json")
enrich_talkorigins_parser.add_argument("--limit", type=int, default=20, help="Maximum weak clusters to inspect")
@ -366,8 +378,9 @@ def build_parser() -> argparse.ArgumentParser:
)
review_talkorigins_parser = subparsers.add_parser(
"review-talkorigins",
help="Export weak TalkOrigins clusters plus dry-run enrichment outcomes for manual review",
"example-talkorigins-review",
aliases=["review-talkorigins"],
help="Example workflow: export weak TalkOrigins clusters plus dry-run enrichment outcomes for manual review",
)
review_talkorigins_parser.add_argument("manifest", help="Path to talkorigins_manifest.json")
review_talkorigins_parser.add_argument("--limit", type=int, default=20, help="Maximum weak clusters to export")
@ -382,8 +395,9 @@ def build_parser() -> argparse.ArgumentParser:
review_talkorigins_parser.add_argument("--output", help="Write review export JSON to a file instead of stdout")
apply_review_talkorigins_parser = subparsers.add_parser(
"apply-talkorigins-corrections",
help="Apply curated TalkOrigins review corrections to the consolidated database",
"example-talkorigins-apply-corrections",
aliases=["apply-talkorigins-corrections"],
help="Example workflow: apply curated TalkOrigins review corrections to the consolidated database",
)
apply_review_talkorigins_parser.add_argument("manifest", help="Path to talkorigins_manifest.json")
apply_review_talkorigins_parser.add_argument("corrections", help="Path to corrections JSON")
@ -401,6 +415,33 @@ def build_parser() -> argparse.ArgumentParser:
help="Restrict topics to one stored phrase review state",
)
topic_phrase_reviews_parser = subparsers.add_parser(
"topic-phrase-reviews",
help="List staged topic phrase suggestions and their review state",
)
topic_phrase_reviews_parser.add_argument("--limit", type=int, default=100, help="Maximum reviews to list")
topic_phrase_reviews_parser.add_argument(
"--phrase-review-status",
choices=["unreviewed", "pending", "accepted", "rejected"],
help="Restrict results to one stored phrase review state",
)
export_topic_phrase_reviews_parser = subparsers.add_parser(
"export-topic-phrase-reviews",
help="Export an editable JSON review template for staged topic phrase suggestions",
)
export_topic_phrase_reviews_parser.add_argument("--limit", type=int, default=100, help="Maximum reviews to export")
export_topic_phrase_reviews_parser.add_argument(
"--phrase-review-status",
choices=["unreviewed", "pending", "accepted", "rejected"],
default="pending",
help="Restrict exported reviews to one stored phrase review state",
)
export_topic_phrase_reviews_parser.add_argument(
"--output",
help="Write the review template JSON to a file instead of stdout",
)
topic_entries_parser = subparsers.add_parser(
"topic-entries",
help="List entries assigned to one topic",
@ -497,7 +538,7 @@ def main(argv: list[str] | None = None) -> int:
)
if args.command == "bootstrap-batch":
return _run_bootstrap_batch(store, Path(args.input))
if args.command == "scrape-talkorigins":
if args.command in {"example-talkorigins-scrape", "scrape-talkorigins"}:
return _run_scrape_talkorigins(
store,
args.base_url,
@ -512,9 +553,9 @@ def main(argv: list[str] | None = None) -> int:
args.topic_commit_limit,
args.status,
)
if args.command == "validate-talkorigins":
if args.command in {"example-talkorigins-validate", "validate-talkorigins"}:
return _run_validate_talkorigins(Path(args.manifest))
if args.command == "suggest-talkorigins-phrases":
if args.command in {"example-talkorigins-suggest-phrases", "suggest-talkorigins-phrases"}:
return _run_suggest_talkorigins_phrases(Path(args.manifest), args.topic, args.limit, args.output)
if args.command == "apply-topic-phrases":
return _run_apply_topic_phrases(store, Path(args.input))
@ -522,7 +563,9 @@ def main(argv: list[str] | None = None) -> int:
return _run_stage_topic_phrases(store, Path(args.input))
if args.command == "review-topic-phrase":
return _run_review_topic_phrase(store, args.topic_slug, args.status, args.notes, args.phrase)
if args.command == "duplicates-talkorigins":
if args.command == "review-topic-phrases":
return _run_review_topic_phrases(store, Path(args.input))
if args.command in {"example-talkorigins-duplicates", "duplicates-talkorigins"}:
return _run_duplicates_talkorigins(
Path(args.manifest),
args.limit,
@ -532,9 +575,9 @@ def main(argv: list[str] | None = None) -> int:
args.preview,
args.weak_only,
)
if args.command == "ingest-talkorigins":
if args.command in {"example-talkorigins-ingest", "ingest-talkorigins"}:
return _run_ingest_talkorigins(store, Path(args.manifest), args.status, not args.no_dedupe)
if args.command == "enrich-talkorigins":
if args.command in {"example-talkorigins-enrich", "enrich-talkorigins"}:
return _run_enrich_talkorigins(
store,
Path(args.manifest),
@ -546,7 +589,7 @@ def main(argv: list[str] | None = None) -> int:
args.status,
args.allow_unsafe_search_matches,
)
if args.command == "review-talkorigins":
if args.command in {"example-talkorigins-review", "review-talkorigins"}:
return _run_review_talkorigins(
store,
Path(args.manifest),
@ -556,7 +599,7 @@ def main(argv: list[str] | None = None) -> int:
args.topic,
args.output,
)
if args.command == "apply-talkorigins-corrections":
if args.command in {"example-talkorigins-apply-corrections", "apply-talkorigins-corrections"}:
return _run_apply_talkorigins_corrections(
store,
Path(args.manifest),
@ -565,6 +608,10 @@ def main(argv: list[str] | None = None) -> int:
)
if args.command == "topics":
return _run_topics(store, args.limit, args.phrase_review_status)
if args.command == "topic-phrase-reviews":
return _run_topic_phrase_reviews(store, args.limit, args.phrase_review_status)
if args.command == "export-topic-phrase-reviews":
return _run_export_topic_phrase_reviews(store, args.limit, args.phrase_review_status, args.output)
if args.command == "topic-entries":
return _run_topic_entries(store, args.topic_slug, args.limit)
if args.command == "export-topic":
@ -1056,6 +1103,51 @@ def _run_review_topic_phrase(
return 0
def _run_review_topic_phrases(store: BibliographyStore, input_path: Path) -> int:
payload = json.loads(input_path.read_text(encoding="utf-8"))
if isinstance(payload, dict):
items = payload.get("topics", payload.get("items", []))
else:
items = payload
if not isinstance(items, list):
print("Topic phrase review JSON must be a list or an object with a 'topics' or 'items' list", file=sys.stderr)
return 1
results: list[dict[str, object]] = []
exit_code = 0
for item in items:
if not isinstance(item, dict):
continue
slug = str(item.get("slug") or "")
status = str(item.get("status") or item.get("phrase_review_status") or "")
notes = item.get("review_notes")
phrase = item.get("phrase", item.get("expansion_phrase"))
if not slug or status not in {"accepted", "rejected"}:
continue
if notes is not None:
notes = str(notes)
if phrase is not None:
phrase = str(phrase)
reviewed = store.review_topic_phrase_suggestion(
slug,
review_status=status,
review_notes=notes,
applied_phrase=phrase,
)
if not reviewed:
exit_code = 1
results.append(
{
"slug": slug,
"phrase_review_status": status,
"expansion_phrase": phrase,
"reviewed": reviewed,
}
)
print(json.dumps(results, indent=2))
return exit_code
def _run_duplicates_talkorigins(
manifest_path: Path,
limit: int,
@ -1171,6 +1263,39 @@ def _run_topics(store: BibliographyStore, limit: int, phrase_review_status: str
return 0
def _run_topic_phrase_reviews(store: BibliographyStore, limit: int, phrase_review_status: str | None) -> int:
print(json.dumps(store.list_topic_phrase_reviews(limit=limit, phrase_review_status=phrase_review_status), indent=2))
return 0
def _run_export_topic_phrase_reviews(
store: BibliographyStore,
limit: int,
phrase_review_status: str | None,
output: str | None,
) -> int:
items = store.list_topic_phrase_reviews(limit=limit, phrase_review_status=phrase_review_status)
payload = [
{
"slug": item["slug"],
"topic": item["name"],
"current_expansion_phrase": item.get("expansion_phrase"),
"suggested_phrase": item.get("suggested_phrase"),
"current_status": item.get("phrase_review_status"),
"review_notes": item.get("phrase_review_notes"),
"status": "",
"phrase": item.get("suggested_phrase"),
}
for item in items
]
rendered = json.dumps(payload, indent=2)
if output:
Path(output).write_text(rendered + "\n", encoding="utf-8")
else:
print(rendered)
return 0
def _run_topic_entries(store: BibliographyStore, topic_slug: str, limit: int) -> int:
topic = store.get_topic(topic_slug)
if topic is None:

View File

@ -0,0 +1,29 @@
from .talkorigins import (
TalkOriginsBatchExport,
TalkOriginsCorrectionResult,
TalkOriginsDuplicateCluster,
TalkOriginsEnrichmentResult,
TalkOriginsIngestReport,
TalkOriginsReviewExport,
TalkOriginsScraper,
TalkOriginsSeedSet,
TalkOriginsTopic,
TalkOriginsTopicPhraseSuggestion,
TalkOriginsValidationReport,
normalize_topic_entries,
)
__all__ = [
"TalkOriginsBatchExport",
"TalkOriginsCorrectionResult",
"TalkOriginsDuplicateCluster",
"TalkOriginsEnrichmentResult",
"TalkOriginsIngestReport",
"TalkOriginsReviewExport",
"TalkOriginsScraper",
"TalkOriginsSeedSet",
"TalkOriginsTopic",
"TalkOriginsTopicPhraseSuggestion",
"TalkOriginsValidationReport",
"normalize_topic_entries",
]

View File

@ -0,0 +1,29 @@
from ..talkorigins import (
TalkOriginsBatchExport,
TalkOriginsCorrectionResult,
TalkOriginsDuplicateCluster,
TalkOriginsEnrichmentResult,
TalkOriginsIngestReport,
TalkOriginsReviewExport,
TalkOriginsScraper,
TalkOriginsSeedSet,
TalkOriginsTopic,
TalkOriginsTopicPhraseSuggestion,
TalkOriginsValidationReport,
normalize_topic_entries,
)
__all__ = [
"TalkOriginsBatchExport",
"TalkOriginsCorrectionResult",
"TalkOriginsDuplicateCluster",
"TalkOriginsEnrichmentResult",
"TalkOriginsIngestReport",
"TalkOriginsReviewExport",
"TalkOriginsScraper",
"TalkOriginsSeedSet",
"TalkOriginsTopic",
"TalkOriginsTopicPhraseSuggestion",
"TalkOriginsValidationReport",
"normalize_topic_entries",
]

View File

@ -603,6 +603,43 @@ class BibliographyStore:
).fetchone()
return dict(row) if row else None
def list_topic_phrase_reviews(
self,
limit: int = 100,
phrase_review_status: str | None = None,
) -> list[dict[str, object]]:
where = "WHERE t.suggested_phrase IS NOT NULL"
params: list[object] = []
if phrase_review_status is not None:
where += " AND t.phrase_review_status = ?"
params.append(phrase_review_status)
params.append(limit)
rows = self.connection.execute(
f"""
SELECT t.slug, t.name, t.expansion_phrase, t.suggested_phrase,
t.phrase_review_status, t.phrase_review_notes,
COUNT(et.entry_id) AS entry_count
FROM topics t
LEFT JOIN entry_topics et ON et.topic_id = t.id
{where}
GROUP BY t.id, t.slug, t.name, t.expansion_phrase, t.suggested_phrase,
t.phrase_review_status, t.phrase_review_notes
ORDER BY
CASE t.phrase_review_status
WHEN 'pending' THEN 0
WHEN 'unreviewed' THEN 1
WHEN 'rejected' THEN 2
WHEN 'accepted' THEN 3
ELSE 4
END,
t.name,
t.slug
LIMIT ?
""",
params,
).fetchall()
return [dict(row) for row in rows]
def set_topic_expansion_phrase(self, slug: str, expansion_phrase: str | None) -> bool:
row = self.connection.execute(
"""
@ -651,8 +688,10 @@ class BibliographyStore:
suggested_phrase = topic.get("suggested_phrase")
expansion_phrase = topic.get("expansion_phrase")
stored_suggested_phrase = suggested_phrase
if review_status == "accepted":
expansion_phrase = applied_phrase if applied_phrase is not None else suggested_phrase
stored_suggested_phrase = None
elif applied_phrase is not None:
expansion_phrase = applied_phrase
@ -660,13 +699,14 @@ class BibliographyStore:
"""
UPDATE topics
SET expansion_phrase = ?,
suggested_phrase = ?,
phrase_review_status = ?,
phrase_review_notes = ?,
updated_at = CURRENT_TIMESTAMP
WHERE slug = ?
RETURNING id
""",
(expansion_phrase, review_status, review_notes, slug),
(expansion_phrase, stored_suggested_phrase, review_status, review_notes, slug),
).fetchone()
self.connection.commit()
return row is not None

View File

@ -1,3 +1,10 @@
"""TalkOrigins example implementation.
This module backs the example-facing namespace at ``citegeist.examples.talkorigins``.
New code should prefer importing from the examples namespace rather than treating
TalkOrigins support as part of the core top-level package surface.
"""
from __future__ import annotations
from collections import Counter

View File

@ -7,6 +7,16 @@ from pathlib import Path
from unittest.mock import patch
from citegeist.cli import main
from citegeist.examples.talkorigins import (
TalkOriginsBatchExport,
TalkOriginsCorrectionResult,
TalkOriginsDuplicateCluster,
TalkOriginsEnrichmentResult,
TalkOriginsIngestReport,
TalkOriginsReviewExport,
TalkOriginsTopicPhraseSuggestion,
TalkOriginsValidationReport,
)
SAMPLE_BIB = """
@ -313,7 +323,7 @@ def test_cli_scrape_talkorigins_accepts_output_dir(tmp_path):
database = tmp_path / "library.sqlite3"
with patch("citegeist.cli.TalkOriginsScraper.scrape_to_directory") as mocked_scrape:
mocked_scrape.return_value = __import__("citegeist").TalkOriginsBatchExport(
mocked_scrape.return_value = TalkOriginsBatchExport(
base_url="https://www.talkorigins.org/origins/biblio/",
output_dir=str(tmp_path),
topic_count=1,
@ -326,7 +336,7 @@ def test_cli_scrape_talkorigins_accepts_output_dir(tmp_path):
[
"--db",
str(database),
"scrape-talkorigins",
"example-talkorigins-scrape",
str(tmp_path / "talkorigins-out"),
"--limit-topics",
"3",
@ -346,7 +356,7 @@ def test_cli_validate_talkorigins_accepts_manifest(tmp_path):
manifest = tmp_path / "talkorigins_manifest.json"
manifest.write_text("{}", encoding="utf-8")
with patch("citegeist.cli.TalkOriginsScraper.validate_export") as mocked_validate:
mocked_validate.return_value = __import__("citegeist").TalkOriginsValidationReport(
mocked_validate.return_value = TalkOriginsValidationReport(
manifest_path=str(manifest),
topic_count=1,
entry_count=2,
@ -360,7 +370,7 @@ def test_cli_validate_talkorigins_accepts_manifest(tmp_path):
duplicate_entry_count=0,
duplicate_examples=[],
)
exit_code = main(["validate-talkorigins", str(manifest)])
exit_code = main(["example-talkorigins-validate", str(manifest)])
assert exit_code == 0
@ -373,7 +383,7 @@ def test_cli_suggest_talkorigins_phrases_writes_output(tmp_path):
output = tmp_path / "phrases.json"
with patch("citegeist.cli.TalkOriginsScraper.suggest_topic_phrases") as mocked_suggest:
mocked_suggest.return_value = [
__import__("citegeist", fromlist=["TalkOriginsTopicPhraseSuggestion"]).TalkOriginsTopicPhraseSuggestion(
TalkOriginsTopicPhraseSuggestion(
slug="abiogenesis",
topic="Abiogenesis",
entry_count=2,
@ -385,7 +395,7 @@ def test_cli_suggest_talkorigins_phrases_writes_output(tmp_path):
]
exit_code = main(
[
"suggest-talkorigins-phrases",
"example-talkorigins-suggest-phrases",
str(manifest),
"--topic",
"abiogenesis",
@ -406,7 +416,7 @@ def test_cli_duplicates_talkorigins_accepts_manifest(tmp_path):
manifest.write_text("{}", encoding="utf-8")
with patch("citegeist.cli.TalkOriginsScraper.inspect_duplicate_clusters") as mocked_duplicates:
mocked_duplicates.return_value = [
__import__("citegeist.talkorigins", fromlist=["TalkOriginsDuplicateCluster"]).TalkOriginsDuplicateCluster(
TalkOriginsDuplicateCluster(
key="smith|1999|duplicate paper",
count=2,
items=[
@ -431,7 +441,7 @@ def test_cli_duplicates_talkorigins_accepts_manifest(tmp_path):
]
exit_code = main(
[
"duplicates-talkorigins",
"example-talkorigins-duplicates",
str(manifest),
"--topic",
"abiogenesis",
@ -452,7 +462,7 @@ def test_cli_ingest_talkorigins_accepts_manifest(tmp_path):
manifest = tmp_path / "talkorigins_manifest.json"
manifest.write_text("{}", encoding="utf-8")
with patch("citegeist.cli.TalkOriginsScraper.ingest_export") as mocked_ingest:
mocked_ingest.return_value = __import__("citegeist").TalkOriginsIngestReport(
mocked_ingest.return_value = TalkOriginsIngestReport(
manifest_path=str(manifest),
topic_count=1,
raw_entry_count=2,
@ -461,7 +471,7 @@ def test_cli_ingest_talkorigins_accepts_manifest(tmp_path):
duplicate_entry_count=2,
canonicalized_count=1,
)
exit_code = main(["--db", str(database), "ingest-talkorigins", str(manifest)])
exit_code = main(["--db", str(database), "example-talkorigins-ingest", str(manifest)])
assert exit_code == 0
@ -474,7 +484,7 @@ def test_cli_enrich_talkorigins_accepts_manifest(tmp_path):
manifest.write_text("{}", encoding="utf-8")
with patch("citegeist.cli.TalkOriginsScraper.enrich_weak_canonicals") as mocked_enrich:
mocked_enrich.return_value = [
__import__("citegeist.talkorigins", fromlist=["TalkOriginsEnrichmentResult"]).TalkOriginsEnrichmentResult(
TalkOriginsEnrichmentResult(
key="smith|1999|duplicate paper",
citation_key="dup1",
weak_reasons_before=["missing:doi"],
@ -490,7 +500,7 @@ def test_cli_enrich_talkorigins_accepts_manifest(tmp_path):
[
"--db",
str(database),
"enrich-talkorigins",
"example-talkorigins-enrich",
str(manifest),
"--limit",
"5",
@ -510,7 +520,7 @@ def test_cli_review_talkorigins_writes_output(tmp_path):
manifest.write_text("{}", encoding="utf-8")
output = tmp_path / "review.json"
with patch("citegeist.cli.TalkOriginsScraper.build_review_export") as mocked_review:
mocked_review.return_value = __import__("citegeist.talkorigins", fromlist=["TalkOriginsReviewExport"]).TalkOriginsReviewExport(
mocked_review.return_value = TalkOriginsReviewExport(
manifest_path=str(manifest),
item_count=1,
items=[{"key": "smith|1999|duplicate paper", "canonical": {}, "enrichment": {}}],
@ -519,7 +529,7 @@ def test_cli_review_talkorigins_writes_output(tmp_path):
[
"--db",
str(database),
"review-talkorigins",
"example-talkorigins-review",
str(manifest),
"--output",
str(output),
@ -540,7 +550,7 @@ def test_cli_apply_talkorigins_corrections_accepts_files(tmp_path):
corrections.write_text('{"corrections": []}', encoding="utf-8")
with patch("citegeist.cli.TalkOriginsScraper.apply_review_corrections") as mocked_apply:
mocked_apply.return_value = [
__import__("citegeist.talkorigins", fromlist=["TalkOriginsCorrectionResult"]).TalkOriginsCorrectionResult(
TalkOriginsCorrectionResult(
key="smith|1999|duplicate paper",
citation_key="dup1",
applied=True,
@ -551,7 +561,7 @@ def test_cli_apply_talkorigins_corrections_accepts_files(tmp_path):
[
"--db",
str(database),
"apply-talkorigins-corrections",
"example-talkorigins-apply-corrections",
str(manifest),
str(corrections),
]
@ -797,7 +807,7 @@ def test_cli_can_review_topic_phrase(tmp_path: Path):
)
assert result.returncode == 0
payload = json.loads(result.stdout)
assert payload["suggested_phrase"] == "graph networks biology"
assert payload["suggested_phrase"] is None
assert payload["expansion_phrase"] == "graph networks biology"
assert payload["phrase_review_status"] == "accepted"
assert payload["phrase_review_notes"] == "curated and approved"
@ -844,6 +854,172 @@ def test_cli_topics_can_filter_by_phrase_review_status(tmp_path: Path):
assert [topic["slug"] for topic in payload] == ["graph-methods"]
def test_cli_can_list_topic_phrase_reviews(tmp_path: Path):
bib_path = tmp_path / "input.bib"
bib_path.write_text(
"""
@article{seed2024,
author = {Seed, Alice},
title = {Seed Paper},
year = {2024}
}
""",
encoding="utf-8",
)
ingest = run_cli(tmp_path, "ingest", str(bib_path))
assert ingest.returncode == 0
from citegeist.storage import BibliographyStore
database = tmp_path / "library.sqlite3"
store = BibliographyStore(database)
try:
store.add_entry_topic(
"seed2024",
topic_slug="graph-methods",
topic_name="Graph Methods",
source_type="talkorigins",
source_url="https://example.org/topics/graph-methods",
source_label="topic-seed",
)
store.ensure_topic("abiogenesis", "Abiogenesis")
store.stage_topic_phrase_suggestion("graph-methods", "graph networks biology")
store.stage_topic_phrase_suggestion("abiogenesis", "abiogenesis life origin")
store.review_topic_phrase_suggestion("abiogenesis", "accepted")
finally:
store.close()
result = run_cli(tmp_path, "topic-phrase-reviews", "--phrase-review-status", "pending")
assert result.returncode == 0
payload = json.loads(result.stdout)
assert [review["slug"] for review in payload] == ["graph-methods"]
assert payload[0]["suggested_phrase"] == "graph networks biology"
assert payload[0]["phrase_review_status"] == "pending"
def test_cli_can_review_topic_phrases_in_bulk(tmp_path: Path):
bib_path = tmp_path / "input.bib"
bib_path.write_text(
"""
@article{seed2024,
author = {Seed, Alice},
title = {Seed Paper},
year = {2024}
}
""",
encoding="utf-8",
)
ingest = run_cli(tmp_path, "ingest", str(bib_path))
assert ingest.returncode == 0
from citegeist.storage import BibliographyStore
database = tmp_path / "library.sqlite3"
store = BibliographyStore(database)
try:
store.add_entry_topic(
"seed2024",
topic_slug="graph-methods",
topic_name="Graph Methods",
source_type="talkorigins",
source_url="https://example.org/topics/graph-methods",
source_label="topic-seed",
)
store.ensure_topic("abiogenesis", "Abiogenesis")
store.stage_topic_phrase_suggestion("graph-methods", "graph networks biology")
store.stage_topic_phrase_suggestion("abiogenesis", "abiogenesis life origin")
finally:
store.close()
review_path = tmp_path / "phrase-review.json"
review_path.write_text(
json.dumps(
[
{
"slug": "graph-methods",
"status": "accepted",
"review_notes": "good phrase",
},
{
"slug": "abiogenesis",
"status": "rejected",
"review_notes": "too sparse",
},
]
),
encoding="utf-8",
)
result = run_cli(tmp_path, "review-topic-phrases", str(review_path))
assert result.returncode == 0
payload = json.loads(result.stdout)
assert payload[0]["reviewed"] is True
assert payload[1]["reviewed"] is True
pending_result = run_cli(tmp_path, "topic-phrase-reviews", "--phrase-review-status", "pending")
assert pending_result.returncode == 0
assert json.loads(pending_result.stdout) == []
rejected_result = run_cli(tmp_path, "topic-phrase-reviews", "--phrase-review-status", "rejected")
assert rejected_result.returncode == 0
rejected_payload = json.loads(rejected_result.stdout)
assert [review["slug"] for review in rejected_payload] == ["abiogenesis"]
topics_result = run_cli(tmp_path, "topics", "--phrase-review-status", "accepted")
assert topics_result.returncode == 0
topics_payload = json.loads(topics_result.stdout)
assert [topic["slug"] for topic in topics_payload] == ["graph-methods"]
def test_cli_can_export_topic_phrase_review_template(tmp_path: Path):
bib_path = tmp_path / "input.bib"
bib_path.write_text(
"""
@article{seed2024,
author = {Seed, Alice},
title = {Seed Paper},
year = {2024}
}
""",
encoding="utf-8",
)
ingest = run_cli(tmp_path, "ingest", str(bib_path))
assert ingest.returncode == 0
from citegeist.storage import BibliographyStore
database = tmp_path / "library.sqlite3"
store = BibliographyStore(database)
try:
store.add_entry_topic(
"seed2024",
topic_slug="graph-methods",
topic_name="Graph Methods",
source_type="talkorigins",
source_url="https://example.org/topics/graph-methods",
source_label="topic-seed",
)
store.stage_topic_phrase_suggestion("graph-methods", "graph networks biology")
finally:
store.close()
output_path = tmp_path / "topic-phrase-review.json"
result = run_cli(
tmp_path,
"export-topic-phrase-reviews",
"--output",
str(output_path),
)
assert result.returncode == 0
payload = json.loads(output_path.read_text(encoding="utf-8"))
assert [item["slug"] for item in payload] == ["graph-methods"]
assert payload[0]["current_expansion_phrase"] is None
assert payload[0]["suggested_phrase"] == "graph networks biology"
assert payload[0]["current_status"] == "pending"
assert payload[0]["status"] == ""
assert payload[0]["phrase"] == "graph networks biology"
def test_cli_export_topic(tmp_path: Path):
bib_path = tmp_path / "input.bib"
bib_path.write_text(

View File

@ -307,7 +307,7 @@ def test_store_can_stage_and_review_topic_phrase_suggestion():
reviewed = store.get_topic("graph-methods")
assert reviewed is not None
assert reviewed["suggested_phrase"] == "graph networks biology"
assert reviewed["suggested_phrase"] is None
assert reviewed["expansion_phrase"] == "graph networks biology"
assert reviewed["phrase_review_status"] == "accepted"
assert reviewed["phrase_review_notes"] == "looks good"
@ -333,6 +333,52 @@ def test_store_can_filter_topics_by_phrase_review_status():
store.close()
def test_store_can_list_topic_phrase_reviews():
store = BibliographyStore()
try:
store.ensure_topic("graph-methods", "Graph Methods")
store.ensure_topic("abiogenesis", "Abiogenesis")
store.ensure_topic("plain-topic", "Plain Topic")
store.stage_topic_phrase_suggestion("graph-methods", "graph networks biology")
store.stage_topic_phrase_suggestion("abiogenesis", "abiogenesis life origin")
store.review_topic_phrase_suggestion("abiogenesis", "accepted")
reviews = store.list_topic_phrase_reviews()
pending_reviews = store.list_topic_phrase_reviews(phrase_review_status="pending")
assert [review["slug"] for review in reviews] == ["graph-methods"]
assert reviews[0]["suggested_phrase"] == "graph networks biology"
assert reviews[0]["phrase_review_status"] == "pending"
assert [review["slug"] for review in pending_reviews] == ["graph-methods"]
finally:
store.close()
def test_store_rejected_topic_phrase_stays_in_review_queue():
store = BibliographyStore()
try:
store.ensure_topic("graph-methods", "Graph Methods")
store.stage_topic_phrase_suggestion("graph-methods", "graph networks biology")
assert store.review_topic_phrase_suggestion(
"graph-methods",
"rejected",
review_notes="too broad",
) is True
topic = store.get_topic("graph-methods")
assert topic is not None
assert topic["suggested_phrase"] == "graph networks biology"
assert topic["expansion_phrase"] is None
assert topic["phrase_review_status"] == "rejected"
reviews = store.list_topic_phrase_reviews()
assert [review["slug"] for review in reviews] == ["graph-methods"]
assert reviews[0]["phrase_review_status"] == "rejected"
finally:
store.close()
def test_store_search_text_can_filter_by_topic():
store = BibliographyStore()
try:

View File

@ -5,8 +5,8 @@ from pathlib import Path
from citegeist.batch import load_batch_jobs
from citegeist.bibtex import BibEntry
from citegeist.examples.talkorigins import TalkOriginsScraper, normalize_topic_entries
from citegeist.storage import BibliographyStore
from citegeist.talkorigins import TalkOriginsScraper, normalize_topic_entries
INDEX_HTML = """