welsberr f06a68aedc Normalize malformed discovered author names 2026-03-21 03:17:47 -04:00

CLI Examples

This guide gives example invocations for the citegeist CLI, including the major option combinations for each command.

Where a topic is named, this guide uses:

  • topic phrase: artificial life
  • topic slug: artificial-life
  • topic name: Artificial life

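The examples assume a shell in the citegeist checkout, with the package source on PYTHONPATH and a virtual environment at .venv: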

cd citegeist
export PYTHONPATH=src

Setup

Purpose: point commands at the right database before doing anything else.

Global Option

Use a non-default database path. --db is a global option, so it goes before the subcommand name:

.venv/bin/python -m citegeist --db library.sqlite3 topics

Build And Inspect A Library

Purpose: ingest records, search them, inspect them, and export them.

Ingest

Basic ingest:

.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib

Set initial review status:

.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib --status reviewed

Set a provenance label:

.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib --source-label "examples/artificial-life/references.bib"

Use both ingest options together:

.venv/bin/python -m citegeist --db library.sqlite3 ingest references.bib --status draft --source-label "manual-import:artificial-life"
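
To ingest several BibTeX files in one pass while keeping each file's path as its provenance label, a small shell loop over the ingest options above is enough. This is a sketch: the ingest_bib_files function and the CITEGEIST variable are illustrative names, not part of citegeist.

```shell
# Command prefix used by the helper; override CITEGEIST to use another interpreter.
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: ingest_bib_files DB STATUS FILE...
# Ingests each file with the given review status, labelled with its own path.
# Stops at the first failing ingest.
ingest_bib_files() {
  db=$1 status=$2
  shift 2
  for bib in "$@"; do
    $CITEGEIST --db "$db" ingest "$bib" --status "$status" --source-label "$bib" || return
  done
}
```

For example: ingest_bib_files library.sqlite3 draft references.bib examples/artificial-life/references.bib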

Search

Basic search:

.venv/bin/python -m citegeist --db library.sqlite3 search "artificial life"

Limit the number of matches:

.venv/bin/python -m citegeist --db library.sqlite3 search "artificial life" --limit 5

Restrict search to one topic slice:

.venv/bin/python -m citegeist --db library.sqlite3 search "artificial life" --topic artificial-life

Show

Show one entry:

.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1

List entries:

.venv/bin/python -m citegeist --db library.sqlite3 show --limit 10

Include provenance:

.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1 --provenance

Include conflicts:

.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1 --conflicts

Use both:

.venv/bin/python -m citegeist --db library.sqlite3 show langton1989artificial1 --provenance --conflicts

Export

Export the whole library:

.venv/bin/python -m citegeist --db library.sqlite3 export

Export selected citation keys:

.venv/bin/python -m citegeist --db library.sqlite3 export langton1989artificial1 bedau2003artificial2

Write BibTeX to a file:

.venv/bin/python -m citegeist --db library.sqlite3 export --output artificial-life.bib

Include DOI-only placeholder records in a broad export:

.venv/bin/python -m citegeist --db library.sqlite3 export --include-stubs --output artificial-life.bib
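
To export a curated set of citation keys kept in a text file, the keys can be passed as positional arguments to export. One caution matters here: export with no keys exports the whole library, so the sketch below refuses to run when the key file is empty. The export_key_list name and CITEGEIST variable are illustrative, not part of citegeist.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: export_key_list DB KEYFILE OUTPUT
# KEYFILE holds one citation key per line; blank lines and # comments are skipped.
# Returns non-zero (and exports nothing) when no keys remain.
export_key_list() {
  db=$1 keyfile=$2 output=$3
  keys=$(grep -v -e '^[[:space:]]*$' -e '^#' "$keyfile") || return
  # Citation keys contain no whitespace or glob characters, so splitting is safe.
  $CITEGEIST --db "$db" export $keys --output "$output"
}
```

For example: export_key_list library.sqlite3 keys.txt artificial-life.bib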

Review And Clean Metadata

Purpose: inspect merge conflicts, apply corrections, and enrich incomplete records.

Entry Review

Set review status:

.venv/bin/python -m citegeist --db library.sqlite3 set-status langton1989artificial1 reviewed

Mark a field conflict as accepted:

.venv/bin/python -m citegeist --db library.sqlite3 resolve-conflicts langton1989artificial1 title accepted

Reject a conflict instead:

.venv/bin/python -m citegeist --db library.sqlite3 resolve-conflicts langton1989artificial1 title rejected

Apply the latest proposed conflict value:

.venv/bin/python -m citegeist --db library.sqlite3 apply-conflict langton1989artificial1 title
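
The review commands above chain naturally: inspect the conflicts, decide, apply the latest proposed value, then mark the entry reviewed. The sketch below asks for confirmation between inspection and applying. The review_field_conflict name and CITEGEIST variable are illustrative; whether you also want resolve-conflicts in this sequence depends on how you track conflict status.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: review_field_conflict DB KEY FIELD
# Shows the entry with its conflicts, asks before applying the latest proposed
# value for FIELD, and marks the entry reviewed once the value is applied.
review_field_conflict() {
  db=$1 key=$2 field=$3
  $CITEGEIST --db "$db" show "$key" --conflicts || return
  printf 'Apply the latest proposed %s for %s? [y/N] ' "$field" "$key" >&2
  read -r answer || return
  [ "$answer" = y ] || return 1
  $CITEGEIST --db "$db" apply-conflict "$key" "$field" &&
    $CITEGEIST --db "$db" set-status "$key" reviewed
}
```

For example: review_field_conflict library.sqlite3 langton1989artificial1 title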

Extract

Extract draft BibTeX from plaintext:

.venv/bin/python -m citegeist extract references.txt

Write extracted BibTeX to a file:

.venv/bin/python -m citegeist extract references.txt --output extracted-artificial-life.bib
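
Extracted BibTeX is a draft, so a common next step is to ingest it with draft status and a provenance label pointing back at the text file. A minimal sketch chaining extract and ingest; the extract_and_ingest name, its argument order, and the "extract:" label prefix are illustrative choices, not citegeist conventions.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: extract_and_ingest DB TEXTFILE BIBFILE
# Converts plaintext references into draft BibTeX at BIBFILE, then ingests it
# as draft entries labelled with the text file they came from.
extract_and_ingest() {
  db=$1 txt=$2 bib=$3
  $CITEGEIST extract "$txt" --output "$bib" &&
    $CITEGEIST --db "$db" ingest "$bib" --status draft --source-label "extract:$txt"
}
```

For example: extract_and_ingest library.sqlite3 references.txt extracted-artificial-life.bib. Inspecting the .bib file before running the ingest step is still worthwhile for messy input.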

Resolve

Resolve one or more entries against remote metadata:

.venv/bin/python -m citegeist --db library.sqlite3 resolve langton1989artificial1 bedau2003artificial2

Preview DOI-bearing placeholder records before enriching them:

.venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --preview --limit 25

Enrich DOI-bearing placeholder records inside one topic slice:

.venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --topic artificial-life --limit 25

Preview all current @misc entries with DOIs, not just placeholder-like stubs:

.venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --all-misc --preview --limit 25

Re-enrich all current @misc entries with DOIs:

.venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --all-misc --limit 25
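
The preview and apply forms of resolve-stubs can be paired so that nothing is written until you have read the preview. This sketch assumes --preview and --topic can be combined, as the other flags above are; the enrich_topic_stubs name and CITEGEIST variable are illustrative.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: enrich_topic_stubs DB TOPIC LIMIT
# Previews DOI-bearing placeholder records in one topic slice, then asks
# before running the same command without --preview.
enrich_topic_stubs() {
  db=$1 topic=$2 limit=$3
  $CITEGEIST --db "$db" resolve-stubs --doi-only --topic "$topic" --limit "$limit" --preview || return
  printf 'Enrich these records? [y/N] ' >&2
  read -r answer || return
  [ "$answer" = y ] || return 1
  $CITEGEIST --db "$db" resolve-stubs --doi-only --topic "$topic" --limit "$limit"
}
```

For example: enrich_topic_stubs library.sqlite3 artificial-life 25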

When Crossref expansion yields only an unstructured citation string without a DOI, citegeist skips that discovery rather than storing it as a weak @misc entry. Fallback cases where a more specific type can be inferred, such as proceedings-like titles, are still admitted. Thesis and dissertation citation strings are also normalized so that fallback @phdthesis entries keep the work title rather than the entire ProQuest-style citation string.

OpenAlex expansion applies the same kind of admission control before writing or previewing discoveries:

  • DOI-backed discoveries prefer DOI-based citation keys.
  • Noisy webpage or export abstracts are dropped.
  • Generic venue-title stubs are rejected.
  • Weak DOI-less article records that merely shadow an existing book, chapter, or dissertation title in your store are suppressed instead of being stored as parallel duplicates.

Both OpenAlex and Crossref discovery also normalize some malformed upstream author strings, so a record like "J., Fogel L." is stored in stable BibTeX form as "Fogel, L. J." instead.
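
The author-name fix is easiest to see as a before/after rewrite: in the malformed string a trailing initial has landed in the surname slot. The sed one-liner below is a hypothetical illustration of that single input shape only; it is not citegeist's implementation, whose normalization rules are not documented here and may cover other cases.

```shell
# Illustration only: move a stray leading initial ("J., Fogel L.") back behind
# the given-name initial, giving BibTeX "Last, First" order ("Fogel, L. J.").
# Strings that do not match this exact shape pass through unchanged.
fix_split_initial() {
  printf '%s\n' "$1" |
    sed -E 's/^([A-Z])\., ([A-Za-z-]+) ([A-Z])\.$/\2, \3. \1./'
}
```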

Explore Citation Graphs

Purpose: traverse citation edges, export graph data, and render quick visualizations.

Graph Traversal

Basic traversal:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1

Use multiple relation filters:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --relation cites --relation cited_by

Set traversal depth:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --depth 2

Filter by target review status:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --review-status reviewed

Show only unresolved targets:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --missing-only

Render DOT instead of traversal rows:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --format dot

Render node/edge JSON for visualization:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --format json-graph

Write graph output to a file:

.venv/bin/python -m citegeist --db library.sqlite3 graph langton1989artificial1 --depth 2 --format dot --output artificial-life.dot
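
A DOT file can be laid out with Graphviz, a separate tool that must be installed; dot -Tsvg IN -o OUT is standard Graphviz usage. The graph_svg name and the CITEGEIST and GRAPHVIZ_DOT variables are illustrative.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"
: "${GRAPHVIZ_DOT:=dot}"   # Graphviz layout command

# Hypothetical helper: graph_svg DB KEY DEPTH BASENAME
# Writes BASENAME.dot with citegeist, then renders it to BASENAME.svg.
graph_svg() {
  db=$1 key=$2 depth=$3 base=$4
  $CITEGEIST --db "$db" graph "$key" --depth "$depth" --format dot --output "$base.dot" &&
    $GRAPHVIZ_DOT -Tsvg "$base.dot" -o "$base.svg"
}
```

For example: graph_svg library.sqlite3 langton1989artificial1 2 artificial-life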

Graph Viewer

Render a standalone HTML page from a json-graph export (for example, a file written with graph --format json-graph --output artificial-life.json):

.venv/bin/python -m citegeist graph-view artificial-life.json --output artificial-life.html

Set the HTML page title:

.venv/bin/python -m citegeist graph-view artificial-life.json --output artificial-life.html --title "Artificial Life Graph"
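
The graph export and the viewer fit together as a two-step pipeline: write json-graph output to a file, then render that file. A minimal sketch using only the options shown in this guide; the graph_html name and CITEGEIST variable are illustrative.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: graph_html DB KEY DEPTH BASENAME TITLE
# Exports the citation graph around KEY as BASENAME.json, then renders
# BASENAME.html with the given page title.
graph_html() {
  db=$1 key=$2 depth=$3 base=$4 title=$5
  $CITEGEIST --db "$db" graph "$key" --depth "$depth" --format json-graph --output "$base.json" &&
    $CITEGEIST graph-view "$base.json" --output "$base.html" --title "$title"
}
```

For example: graph_html library.sqlite3 langton1989artificial1 2 artificial-life "Artificial Life Graph"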

Graph Expansion

Expand from one or more seed entries:

.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1

Choose the source:

.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1 --source openalex

Choose relation direction:

.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1 --source openalex --relation cited_by

Limit discoveries per seed:

.venv/bin/python -m citegeist --db library.sqlite3 expand langton1989artificial1 --source openalex --limit 10
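
After an expansion it is useful to look at what was linked. This sketch expands one seed along cited_by from OpenAlex and then traverses the same relation with graph. It assumes the relation names used by expand and graph mean the same thing; the expand_and_list name and CITEGEIST variable are illustrative.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: expand_and_list DB KEY LIMIT
# Pulls up to LIMIT cited_by discoveries from OpenAlex for KEY, then lists the
# entry's cited_by edges with graph.
expand_and_list() {
  db=$1 key=$2 limit=$3
  $CITEGEIST --db "$db" expand "$key" --source openalex --relation cited_by --limit "$limit" &&
    $CITEGEIST --db "$db" graph "$key" --relation cited_by
}
```

For example: expand_and_list library.sqlite3 langton1989artificial1 10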

Build A Topic-Centered Bibliography

Purpose: create, expand, inspect, and export a topic slice such as artificial life.

Topic Expansion

Basic topic expansion from stored topic metadata:

.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life

Override the topic phrase:

.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --topic-phrase "artificial life alife artificial organisms"

Choose source and relation:

.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --source openalex --relation cited_by

Control seed and discovery limits:

.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --seed-limit 10 --per-seed-limit 5

Restrict to trusted seed entries:

.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --seed-key langton1989artificial1 --seed-key bedau2003artificial2

Raise or lower the topic assignment threshold:

.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --min-relevance 0.3

Preview without writing:

.venv/bin/python -m citegeist --db library.sqlite3 expand-topic artificial-life --preview

Topic Phrase Storage

Set a stored topic phrase:

.venv/bin/python -m citegeist --db library.sqlite3 set-topic-phrase artificial-life "artificial life alife artificial organisms complex systems evolution simulation"

Clear a stored topic phrase:

.venv/bin/python -m citegeist --db library.sqlite3 set-topic-phrase artificial-life --clear

Topic Inspection

List topics:

.venv/bin/python -m citegeist --db library.sqlite3 topics

Limit topic rows:

.venv/bin/python -m citegeist --db library.sqlite3 topics --limit 20

Filter topics by phrase review status:

.venv/bin/python -m citegeist --db library.sqlite3 topics --phrase-review-status pending

List entries for a topic:

.venv/bin/python -m citegeist --db library.sqlite3 topic-entries artificial-life

Limit topic entries:

.venv/bin/python -m citegeist --db library.sqlite3 topic-entries artificial-life --limit 25

Export one topic slice as BibTeX:

.venv/bin/python -m citegeist --db library.sqlite3 export-topic artificial-life

Write the topic slice to a file:

.venv/bin/python -m citegeist --db library.sqlite3 export-topic artificial-life --output artificial-life-topic.bib

Include DOI-only placeholder records in the topic export:

.venv/bin/python -m citegeist --db library.sqlite3 export-topic artificial-life --include-stubs --output artificial-life-topic.bib
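
For a topic whose phrase is already stored (via set-topic-phrase or bootstrap --store-topic-phrase), refreshing the slice is expand, inspect, export. The sketch below takes an optional --preview argument that stops after a dry run, in line with the advice to preview topic changes first. The refresh_topic name, its arguments, and the limit of 25 are illustrative.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: refresh_topic DB SLUG OUTPUT [--preview]
# With --preview, only shows what expand-topic would add. Otherwise expands
# the topic, lists some of its entries, and writes the slice to OUTPUT.
refresh_topic() {
  db=$1 slug=$2 output=$3
  if [ "${4:-}" = --preview ]; then
    $CITEGEIST --db "$db" expand-topic "$slug" --preview
    return
  fi
  $CITEGEIST --db "$db" expand-topic "$slug" &&
    $CITEGEIST --db "$db" topic-entries "$slug" --limit 25 &&
    $CITEGEIST --db "$db" export-topic "$slug" --output "$output"
}
```

For example, run refresh_topic library.sqlite3 artificial-life artificial-life-topic.bib --preview first, then the same call without --preview.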

Bootstrap

Seed from a BibTeX file:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --seed-bib artificial-life.bib

Seed from a topic phrase alone:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life"

Use both a seed .bib and a topic phrase:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --seed-bib artificial-life.bib --topic "artificial life"

Store topic metadata while bootstrapping:

.venv/bin/python -m citegeist --db library.sqlite3 \
  bootstrap \
  --topic "artificial life" \
  --topic-slug artificial-life \
  --topic-name "Artificial life" \
  --store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation"

Control topic-search candidate count:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --topic-limit 10

Control how many topic candidates are actually committed:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --topic-commit-limit 5

Disable immediate expansion:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --no-expand

Preview without writing:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --preview

Set review status for imported entries:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "artificial life" --status reviewed

Batch Bootstrap

Run a JSON batch file:

.venv/bin/python -m citegeist --db library.sqlite3 bootstrap-batch artificial-life-batch.json

Topic Phrase Review Workflow

Apply topic phrases directly:

.venv/bin/python -m citegeist --db library.sqlite3 apply-topic-phrases topic-phrases.json

Stage topic phrase suggestions:

.venv/bin/python -m citegeist --db library.sqlite3 stage-topic-phrases topic-phrases.json

Review one staged phrase:

.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrase artificial-life accepted

Add notes while reviewing:

.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrase artificial-life accepted --notes "good fit for topic expansion"

Override the accepted phrase while reviewing:

.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrase artificial-life accepted --phrase "artificial life alife artificial organisms autonomous agents"

Apply review decisions in bulk:

.venv/bin/python -m citegeist --db library.sqlite3 review-topic-phrases topic-phrase-review.json

List staged phrase reviews:

.venv/bin/python -m citegeist --db library.sqlite3 topic-phrase-reviews

Filter review rows by status:

.venv/bin/python -m citegeist --db library.sqlite3 topic-phrase-reviews --phrase-review-status pending

Export an editable review template:

.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews

Limit exported review rows:

.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews --limit 10

Filter exported rows by status:

.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews --phrase-review-status rejected

Write the review template to a file:

.venv/bin/python -m citegeist --db library.sqlite3 export-topic-phrase-reviews --output topic-phrase-review.json
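
The exported review template and the bulk review command use the same file, so a review session can be a round trip: export pending rows, edit the decisions, apply them. A minimal sketch; the review_pending_phrases name and CITEGEIST variable are illustrative, and EDITOR falls back to vi when unset.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: review_pending_phrases DB FILE
# Exports pending phrase reviews to FILE, opens it in $EDITOR, then applies
# the edited decisions in bulk. Stops if any step fails.
review_pending_phrases() {
  db=$1 file=$2
  $CITEGEIST --db "$db" export-topic-phrase-reviews --phrase-review-status pending --output "$file" &&
    ${EDITOR:-vi} "$file" &&
    $CITEGEIST --db "$db" review-topic-phrases "$file"
}
```

For example: review_pending_phrases library.sqlite3 topic-phrase-review.json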

Harvest External Repositories

Purpose: inspect and harvest OAI-PMH repositories into the library.

OAI-PMH Harvesting

Inspect a repository:

.venv/bin/python -m citegeist discover-oai https://example.edu/oai

Harvest with the default metadata prefix:

.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai

Use an alternate metadata prefix:

.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --metadata-prefix mods

Restrict to a set:

.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --set artificial-life

Harvest a date range:

.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --from 2024-01-01 --until 2024-12-31

Limit harvested records and set review status:

.venv/bin/python -m citegeist --db library.sqlite3 harvest-oai https://example.edu/oai --limit 10 --status draft
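
For a large repository, splitting a year into smaller date windows keeps each run shorter and makes a failed window easy to retry. OAI-PMH from and until bounds are inclusive, so the quarters below do not overlap. The harvest_year_by_quarter name, the quarterly split, and draft status are illustrative choices.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: harvest_year_by_quarter DB URL YEAR
# Harvests one calendar year in four date windows as draft entries,
# stopping at the first failing window.
harvest_year_by_quarter() {
  db=$1 url=$2 year=$3
  for window in 01-01:03-31 04-01:06-30 07-01:09-30 10-01:12-31; do
    start=${window%%:*} end=${window#*:}
    $CITEGEIST --db "$db" harvest-oai "$url" \
      --from "$year-$start" --until "$year-$end" --status draft || return
  done
}
```

For example: harvest_year_by_quarter library.sqlite3 https://example.edu/oai 2024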

Work Through Example Corpora

Purpose: run the repository's example workflows without treating them as the core product surface.

TalkOrigins Example Commands

Scrape the example corpus:

.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out

Override the source URL:

.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --base-url https://www.talkorigins.org/origins/biblio/

Limit topics and entries:

.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --limit-topics 5 --limit-entries-per-topic 20

Resolve seeds, ingest immediately, and keep expansion disabled:

.venv/bin/python -m citegeist --db library.sqlite3 example-talkorigins-scrape talkorigins-out --resolve-seeds --ingest --no-expand

Disable snapshot reuse:

.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --no-resume

Control generated bootstrap defaults:

.venv/bin/python -m citegeist example-talkorigins-scrape talkorigins-out --topic-limit 10 --topic-commit-limit 5 --status draft

Validate the generated manifest:

.venv/bin/python -m citegeist example-talkorigins-validate talkorigins-out/talkorigins_manifest.json

Suggest phrases from the corpus:

.venv/bin/python -m citegeist example-talkorigins-suggest-phrases talkorigins-out/talkorigins_manifest.json --topic abiogenesis --limit 10 --output topic-phrases.json

Inspect duplicate clusters:

.venv/bin/python -m citegeist example-talkorigins-duplicates talkorigins-out/talkorigins_manifest.json --limit 20 --min-count 2 --match origin --topic abiogenesis --preview --weak-only

Ingest the reconstructed corpus:

.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-ingest talkorigins-out/talkorigins_manifest.json --status draft

Disable deduplication during example ingest:

.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-ingest talkorigins-out/talkorigins_manifest.json --no-dedupe

Enrich weak canonical entries:

.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-enrich talkorigins-out/talkorigins_manifest.json --limit 20 --min-count 2 --match origin --topic abiogenesis --status enriched

For experiments only, apply enrichment and allow unsafe search matches (this example writes to a copy of the database, talkorigins-copy.sqlite3):

.venv/bin/python -m citegeist --db talkorigins-copy.sqlite3 example-talkorigins-enrich talkorigins-out/talkorigins_manifest.json --apply --allow-unsafe-search-matches

Export a review artifact:

.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-review talkorigins-out/talkorigins_manifest.json --limit 20 --min-count 2 --match origin --topic abiogenesis --output talkorigins-review.json

Apply curated corrections:

.venv/bin/python -m citegeist --db talkorigins.sqlite3 example-talkorigins-apply-corrections talkorigins-out/talkorigins_manifest.json talkorigins-corrections.json --status reviewed
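
The TalkOrigins commands form a pipeline from scrape to curation. This sketch runs a small sample end to end and stops at the review artifact, leaving corrections to a human. It assumes example-talkorigins-validate exits non-zero on an invalid manifest and that the review command's filtering options are optional; the talkorigins_pipeline name and CITEGEIST variable are illustrative.

```shell
: "${CITEGEIST:=.venv/bin/python -m citegeist}"

# Hypothetical helper: talkorigins_pipeline OUTDIR DB
# Scrapes a small sample, validates the manifest, ingests it as draft entries,
# and writes talkorigins-review.json for manual curation.
talkorigins_pipeline() {
  outdir=$1 db=$2
  manifest="$outdir/talkorigins_manifest.json"
  $CITEGEIST example-talkorigins-scrape "$outdir" --limit-topics 5 --limit-entries-per-topic 20 &&
    $CITEGEIST example-talkorigins-validate "$manifest" &&
    $CITEGEIST --db "$db" example-talkorigins-ingest "$manifest" --status draft &&
    $CITEGEIST --db "$db" example-talkorigins-review "$manifest" --output talkorigins-review.json
}
```

For example: talkorigins_pipeline talkorigins-out talkorigins.sqlite3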

Notes

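  • Commands that contact remote services, such as resolve, resolve-stubs, expand, expand-topic, discover-oai, harvest-oai, and example-talkorigins-scrape, depend on live source access.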
  • For topic-oriented examples, use preview mode before committing changes when possible.
  • The older TalkOrigins alias commands remain available, but the example-prefixed names are the preferred surface.