Add CLI examples cookbook

welsberr 2026-03-20 10:37:16 -04:00
parent e4eaf52393
commit c0fe9de6f0
2 changed files with 644 additions and 0 deletions

@@ -62,6 +62,7 @@ The initial repo includes:
Example applications live alongside the core package rather than defining it. Current examples include:
- a comprehensive CLI cookbook in [examples/cli/README.md](./examples/cli/README.md);
- a topic-only bootstrap workflow for `artificial life` in [examples/artificial-life/README.md](./examples/artificial-life/README.md);
- the TalkOrigins bibliography pipeline under [`citegeist.examples.talkorigins`](./src/citegeist/examples/talkorigins.py) with a usage guide in [examples/talkorigins/README.md](./examples/talkorigins/README.md).
@@ -150,6 +151,8 @@ PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 harvest-oai ht
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 export --output reviewed.bib
```
For a fuller option-by-option CLI cookbook, see [examples/cli/README.md](./examples/cli/README.md).
For live-source development, prefer fixture-backed or cache-backed source clients so resolver and expansion work can be exercised repeatedly without re-hitting upstream APIs on every run.
## Example Application

examples/cli/README.md Normal file

@@ -0,0 +1,641 @@
# CLI Examples
This guide gives example invocations for the `citegeist` CLI, including the major option combinations for each command.
Where a topic is named, this guide uses:
- topic phrase: `artificial life`
- topic slug: `artificial-life`
- topic name: `Artificial life`
All commands assume the following setup, run from the directory that contains your `citegeist` checkout:
```bash
cd citegeist
export PYTHONPATH=src
```
## Global Option
Use a non-default database path:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topics
```
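Every example in this guide repeats the same interpreter path and `--db` value, so a small shell function can shorten interactive use. This is only a convenience sketch: the function name `cg` and the `PYTHON` and `CITEGEIST_DB` variables are assumptions made for illustration, not something `citegeist` provides.

```bash
# Hypothetical helper; `cg`, PYTHON, and CITEGEIST_DB are names chosen for this sketch.
PYTHON="${PYTHON:-.venv/bin/python}"
CITEGEIST_DB="${CITEGEIST_DB:-library.sqlite3}"

# Run any citegeist subcommand against the configured database.
cg() {
  "$PYTHON" -m citegeist --db "$CITEGEIST_DB" "$@"
}

# Usage (same effect as the full command above):
#   cg topics
#   cg search "artificial life" --limit 5
```

Setting `CITEGEIST_DB` before calling `cg` switches databases without editing each command.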
## Ingest
Basic ingest:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib
```
Set initial review status:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib --status reviewed
```
Set a provenance label:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib --source-label "examples/artificial-life/references.bib"
```
Use both ingest options together:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib --status draft --source-label "manual-import:artificial-life"
```
## Search
Basic search:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 search "artificial life"
```
Limit the number of matches:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 search "artificial life" --limit 5
```
Restrict search to one topic slice:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 search "artificial life" --topic artificial-life
```
## Show
Show one entry:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1
```
List entries:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 show --limit 10
```
Include provenance:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1 --provenance
```
Include conflicts:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1 --conflicts
```
Include both provenance and conflicts:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1 --provenance --conflicts
```
## Export
Export the whole library:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export
```
Export selected citation keys:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export langton1989artificial1 bedau2003artificial2
```
Write BibTeX to a file:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export --output artificial-life.bib
```
## Entry Review
Set review status:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 set-status langton1989artificial1 reviewed
```
Resolve the conflicts on one field (here `title`) as accepted:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 resolve-conflicts langton1989artificial1 title accepted
```
Reject a conflict instead:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 resolve-conflicts langton1989artificial1 title rejected
```
Apply the latest proposed conflict value:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 apply-conflict langton1989artificial1 title
```
## Extract
Extract draft BibTeX from plaintext:
```bash
.venv/bin/python -m citegeist extract references.txt
```
Write extracted BibTeX to a file:
```bash
.venv/bin/python -m citegeist extract references.txt --output extracted-artificial-life.bib
```
## Resolve
Resolve one or more entries against remote metadata:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 resolve langton1989artificial1 bedau2003artificial2
```
## Graph Traversal
Basic traversal:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1
```
Use multiple relation filters:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --relation cites --relation cited_by
```
Set traversal depth:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --depth 2
```
Filter by target review status:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --review-status reviewed
```
Show only unresolved targets:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --missing-only
```
Render DOT instead of traversal rows:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --format dot
```
Render node/edge JSON for visualization:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --format json-graph
```
Write graph output to a file:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --depth 2 --format dot --output artificial-life.dot
```
## Graph Viewer
Render a standalone HTML page from a `json-graph` export:
```bash
.venv/bin/python -m citegeist graph-view artificial-life.json --output artificial-life.html
```
Set the HTML page title:
```bash
.venv/bin/python -m citegeist graph-view artificial-life.json --output artificial-life.html --title "Artificial Life Graph"
```
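The traversal and viewer commands can be chained so that one call produces both the JSON export and the HTML page. The sketch below assumes that `--output` works with `--format json-graph` the same way it does with `--format dot`. The function name `graph_html` and the default title are hypothetical.

```bash
PYTHON="${PYTHON:-.venv/bin/python}"

# graph_html KEY BASENAME [TITLE]   (hypothetical helper)
# Writes BASENAME.json (json-graph export), then BASENAME.html (standalone viewer page).
# The viewer step only runs if the export step succeeds.
graph_html() {
  "$PYTHON" -m citegeist --db library.sqlite3 graph "$1" \
    --format json-graph --output "$2.json" &&
  "$PYTHON" -m citegeist graph-view "$2.json" \
    --output "$2.html" --title "${3:-Citation graph}"
}

# Usage:
#   graph_html langton1989artificial1 artificial-life "Artificial Life Graph"
```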
## Graph Expansion
Expand from one or more seed entries:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1
```
Choose the source:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1 --source openalex
```
Choose relation direction:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1 --source openalex --relation cited_by
```
Limit discoveries per seed:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1 --source openalex --limit 10
```
## Topic Expansion
Basic topic expansion from stored topic metadata:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life
```
Override the topic phrase:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --topic-phrase "artificial life alife artificial organisms"
```
Choose source and relation:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --source openalex --relation cited_by
```
Control seed and discovery limits:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --seed-limit 10 --per-seed-limit 5
```
Restrict to trusted seed entries:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --seed-key langton1989artificial1 --seed-key bedau2003artificial2
```
Raise or lower the topic assignment threshold:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --min-relevance 0.3
```
Preview without writing:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --preview
```
## Topic Phrase Storage
Set a stored topic phrase:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 set-topic-phrase artificial-life "artificial life alife artificial organisms complex systems evolution simulation"
```
Clear a stored topic phrase:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 set-topic-phrase artificial-life --clear
```
## OAI-PMH Harvesting
Inspect a repository:
```bash
.venv/bin/python -m citegeist discover-oai https://example.edu/oai
```
Harvest with default metadata prefix:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai
```
Use an alternate metadata prefix:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --metadata-prefix mods
```
Restrict to a set:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --set artificial-life
```
Harvest a date range:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --from 2024-01-01 --until 2024-12-31
```
Limit harvested records and set review status:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --limit 10 --status draft
```
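For a large repository, it may help to split a harvest into one date window per calendar year so that a failed run can be resumed from the year that failed. The loop below is a sketch using only the `--from` and `--until` options shown above. The function name `harvest_years` is hypothetical, and whether yearly windows suit a given repository is an assumption.

```bash
PYTHON="${PYTHON:-.venv/bin/python}"

# harvest_years URL FIRST_YEAR LAST_YEAR   (hypothetical helper)
# Issues one harvest-oai call per calendar year and stops at the first failure.
harvest_years() {
  year="$2"
  while [ "$year" -le "$3" ]; do
    "$PYTHON" -m citegeist --db library.sqlite3 harvest-oai "$1" \
      --from "$year-01-01" --until "$year-12-31" || return 1
    year=$((year + 1))
  done
}

# Usage:
#   harvest_years https://example.edu/oai 2023 2024
```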
## Bootstrap
Seed from a BibTeX file:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --seed-bib artificial-life.bib
```
Seed from a topic phrase alone:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life"
```
Use both a seed `.bib` and a topic phrase:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --seed-bib artificial-life.bib --topic "artificial life"
```
Store topic metadata while bootstrapping:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 \
bootstrap \
--topic "artificial life" \
--topic-slug artificial-life \
--topic-name "Artificial life" \
--store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation"
```
Control topic-search candidate count:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --topic-limit 10
```
Control how many topic candidates are actually committed:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --topic-commit-limit 5
```
Disable immediate expansion:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --no-expand
```
Preview without writing:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --preview
```
Set review status for imported entries:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --status reviewed
```
## Batch Bootstrap
Run a JSON batch file:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 bootstrap-batch artificial-life.json
```
## Topic Phrase Review Workflow
Apply topic phrases directly:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 apply-topic-phrases topic-phrases.json
```
Stage topic phrase suggestions:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 stage-topic-phrases topic-phrases.json
```
Review one staged phrase:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrase artificial-life accepted
```
Add notes while reviewing:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrase artificial-life accepted --notes "good fit for topic expansion"
```
Override the accepted phrase while reviewing:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrase artificial-life accepted --phrase "artificial life alife artificial organisms autonomous agents"
```
Apply review decisions in bulk:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrases topic-phrase-review.json
```
List staged phrase reviews:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topic-phrase-reviews
```
Filter review rows by status:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topic-phrase-reviews --phrase-review-status pending
```
Export an editable review template:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews
```
Limit exported review rows:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews --limit 10
```
Filter exported rows by status:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews --phrase-review-status rejected
```
Write the review template to a file:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews --output topic-phrase-review.json
```
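Taken together, the commands in this section suggest a stage, export, edit, apply loop. The sketch below wires that order into two functions. It assumes that newly staged suggestions carry the `pending` status, that `--phrase-review-status` and `--output` can be combined, and that the exported template is the file `review-topic-phrases` expects. The function names are hypothetical.

```bash
PYTHON="${PYTHON:-.venv/bin/python}"

# stage_for_review SUGGESTIONS_JSON REVIEW_JSON   (hypothetical helper)
# Step 1: stage the suggestions, then export the pending rows as an editable file.
stage_for_review() {
  "$PYTHON" -m citegeist --db library.sqlite3 stage-topic-phrases "$1" &&
  "$PYTHON" -m citegeist --db library.sqlite3 export-topic-phrase-reviews \
    --phrase-review-status pending --output "$2"
}

# apply_reviews REVIEW_JSON   (hypothetical helper)
# Step 2, after editing the exported file by hand: apply the decisions in bulk.
apply_reviews() {
  "$PYTHON" -m citegeist --db library.sqlite3 review-topic-phrases "$1"
}

# Usage:
#   stage_for_review topic-phrases.json topic-phrase-review.json
#   "${EDITOR:-vi}" topic-phrase-review.json
#   apply_reviews topic-phrase-review.json
```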
## Topic Inspection
List topics:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topics
```
Limit topic rows:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topics --limit 20
```
Filter topics by phrase review status:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topics --phrase-review-status pending
```
List entries for a topic:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topic-entries artificial-life
```
Limit topic entries:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 topic-entries artificial-life --limit 25
```
Export one topic slice as BibTeX:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export-topic artificial-life
```
Write the topic slice to a file:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 export-topic artificial-life --output artificial-life-topic.bib
```
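To export several topic slices in one pass, the `export-topic` command can be run in a loop. This is a minimal sketch: the function name `export_topics` and the `<slug>-topic.bib` naming pattern are choices made here, following the file name used above.

```bash
PYTHON="${PYTHON:-.venv/bin/python}"

# export_topics SLUG [SLUG...]   (hypothetical helper)
# Writes one BibTeX file per topic slug and stops at the first failure.
export_topics() {
  for slug in "$@"; do
    "$PYTHON" -m citegeist --db library.sqlite3 export-topic "$slug" \
      --output "${slug}-topic.bib" || return 1
  done
}

# Usage:
#   export_topics artificial-life abiogenesis
```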
## TalkOrigins Example Commands
Scrape the example corpus:
```bash
.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out
```
Override the source URL:
```bash
.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --base-url https://www.talkorigins.org/origins/biblio/
```
Limit topics and entries:
```bash
.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --limit-topics 5 --limit-entries-per-topic 20
```
Resolve seeds, ingest immediately, and keep expansion disabled:
```bash
.venv/bin/python -m citegeist --db library.sqlite3 example-talkorigins-scrape talkorigins-out --resolve-seeds --ingest --no-expand
```
Disable snapshot reuse:
```bash
.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --no-resume
```
Control generated bootstrap defaults:
```bash
.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --topic-limit 10 --topic-commit-limit 5 --status draft
```
Validate the generated manifest:
```bash
.venv/bin/python -m citegeist example-talkorigins-validate talkorigins-out/talkorigins_manifest.json
```
Suggest phrases from the corpus:
```bash
.venv/bin/python -m citegeist example-talkorigins-suggest-phrases talkorigins-out/talkorigins_manifest.json --topic abiogenesis --limit 10 --output topic-phrases.json
```
Inspect duplicate clusters:
```bash
.venv/bin/python -m citegeist example-talkorigins-duplicates talkorigins-out/talkorigins_manifest.json --limit 20 --min-count 2 --match origin --topic abiogenesis --preview --weak-only
```
Ingest the reconstructed corpus:
```bash
.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-ingest talkorigins-out/talkorigins_manifest.json --status draft
```
Disable deduplication during example ingest:
```bash
.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-ingest talkorigins-out/talkorigins_manifest.json --no-dedupe
```
Enrich weak canonical entries:
```bash
.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-enrich talkorigins-out/talkorigins_manifest.json --limit 20 --min-count 2 --match origin --topic abiogenesis --status enriched
```
Apply enrichment and allow unsafe search matches for experiments:
```bash
.venv/bin/python -m citegeist --db talkorigins-copy.sqlite3 example-talkorigins-enrich talkorigins-out/talkorigins_manifest.json --apply --allow-unsafe-search-matches
```
Export a review artifact:
```bash
.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-review talkorigins-out/talkorigins_manifest.json --limit 20 --min-count 2 --match origin --topic abiogenesis --output talkorigins-review.json
```
Apply curated corrections:
```bash
.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-apply-corrections talkorigins-out/talkorigins_manifest.json talkorigins-corrections.json --status reviewed
```
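For a first trial run, the scrape, validate, and ingest steps can be chained so that each step runs only if the previous one succeeded. The sketch assumes that `example-talkorigins-validate` exits with a non-zero status when the manifest is invalid, which this guide does not confirm. The function name `talkorigins_trial` is hypothetical.

```bash
PYTHON="${PYTHON:-.venv/bin/python}"

# talkorigins_trial OUT_DIR DB_PATH   (hypothetical helper)
# Scrapes a small sample, validates the manifest, then ingests it as draft entries.
talkorigins_trial() {
  manifest="$1/talkorigins_manifest.json"
  "$PYTHON" -m citegeist example-talkorigins-scrape "$1" \
    --limit-topics 5 --limit-entries-per-topic 20 &&
  "$PYTHON" -m citegeist example-talkorigins-validate "$manifest" &&
  "$PYTHON" -m citegeist --db "$2" example-talkorigins-ingest "$manifest" --status draft
}

# Usage:
#   talkorigins_trial talkorigins-out talkorigins.sqlite3
```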
## Notes
- Some commands depend on live source access, such as `resolve` (remote metadata), `expand --source openalex`, and `harvest-oai`.
- For topic-oriented commands such as `expand-topic` and `bootstrap`, run with `--preview` before committing changes when possible.
- The older TalkOrigins alias commands remain available, but the example-prefixed names are the preferred surface.
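The advice to preview before committing can be wrapped in a small helper that runs a command with `--preview`, asks for confirmation, and only then reruns it without the flag. This is a sketch that applies only to subcommands that accept `--preview` (shown in this guide for `expand-topic` and `bootstrap`). The function name `preview_then_run` is hypothetical.

```bash
PYTHON="${PYTHON:-.venv/bin/python}"

# preview_then_run SUBCOMMAND [ARGS...]   (hypothetical helper)
# Runs the command with --preview, prompts on stderr, then reruns it for real on "y".
preview_then_run() {
  "$PYTHON" -m citegeist --db library.sqlite3 "$@" --preview || return 1
  printf 'Apply these changes? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y|Y) "$PYTHON" -m citegeist --db library.sqlite3 "$@" ;;
    *)   echo "Skipped." ;;
  esac
}

# Usage:
#   preview_then_run expand-topic artificial-life --min-relevance 0.3
```

Any answer other than `y` or `Y`, including an empty line, leaves the database untouched.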