From c1a977b5e215f512b685fa31c16d1a59a10d51e3 Mon Sep 17 00:00:00 2001
From: welsberr
Date: Fri, 20 Mar 2026 08:25:36 -0400
Subject: [PATCH] Add artificial life seeding example

---
 README.md                          |   7 +-
 examples/artificial-life/README.md | 100 +++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 examples/artificial-life/README.md

diff --git a/README.md b/README.md
index cef32e0..62ded23 100644
--- a/README.md
+++ b/README.md
@@ -60,7 +60,10 @@ The initial repo includes:
 - full-text-search-ready indexing over title, abstract, and fulltext when SQLite FTS5 is available;
 - tests covering parsing, ingestion, relation storage, and search.
 
-Example applications live alongside the core package rather than defining it. The current example corpus pipeline is the TalkOrigins bibliography workflow under [`citegeist.examples.talkorigins`](./src/citegeist/examples/talkorigins.py) with a usage guide in [examples/talkorigins/README.md](./examples/talkorigins/README.md).
+Example applications live alongside the core package rather than defining it. Current examples include:
+
+- a topic-only bootstrap workflow for `artificial life` in [examples/artificial-life/README.md](./examples/artificial-life/README.md);
+- the TalkOrigins bibliography pipeline under [`citegeist.examples.talkorigins`](./src/citegeist/examples/talkorigins.py) with a usage guide in [examples/talkorigins/README.md](./examples/talkorigins/README.md).
 
 The prioritized execution plan lives in [ROADMAP.md](./ROADMAP.md).
 
@@ -176,6 +179,8 @@ PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-duplicates talk
 
 The older `scrape-talkorigins`-style command names remain available as compatibility aliases. The full example workflow and reconstruction notes live in [examples/talkorigins/README.md](./examples/talkorigins/README.md).
 
+For a smaller example that starts from a topic phrase alone, see [examples/artificial-life/README.md](./examples/artificial-life/README.md).
+
 Correction files are simple JSON:
 
 ```json
diff --git a/examples/artificial-life/README.md b/examples/artificial-life/README.md
new file mode 100644
index 0000000..dfb1d7d
--- /dev/null
+++ b/examples/artificial-life/README.md
@@ -0,0 +1,100 @@
+# Artificial Life Topic-Seeding Example
+
+This example shows the smallest useful `citegeist` workflow that starts from a topic phrase alone.
+
+The seed phrase is:
+
+```text
+artificial life
+```
+
+## What It Demonstrates
+
+- topic-only bootstrap without a seed `.bib`;
+- previewing ranked candidate seed entries before writing anything;
+- storing a curated topic slug, topic name, and expansion phrase in the database;
+- running later topic-aware expansion from that stored phrase.
+
+## Preview First
+
+Use a preview run to inspect the best candidate seed entries without changing the database:
+
+```bash
+PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
+  bootstrap \
+    --topic "artificial life" \
+    --topic-slug artificial-life \
+    --topic-name "Artificial life" \
+    --store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation" \
+    --topic-limit 10 \
+    --topic-commit-limit 5 \
+    --preview
+```
+
+That returns ranked candidates gathered through the configured resolver/search stack.
+
+## Commit The Topic Seeds
+
+Once the preview looks reasonable, run the same bootstrap without `--preview`:
+
+```bash
+PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
+  bootstrap \
+    --topic "artificial life" \
+    --topic-slug artificial-life \
+    --topic-name "Artificial life" \
+    --store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation" \
+    --topic-limit 10 \
+    --topic-commit-limit 5
+```
+
+That does three things:
+
+1. finds topic-relevant seed entries;
+2. stores them in the bibliography database;
+3. creates or updates the `artificial-life` topic row with the curated expansion phrase.
+
+## Inspect The Result
+
+```bash
+PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topics
+PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topic-entries artificial-life
+```
+
+If you want to adjust the stored phrase later:
+
+```bash
+PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
+  set-topic-phrase artificial-life "artificial life alife artificial organisms autonomous agents evolution simulation"
+```
+
+## Optional Batch Form
+
+The same topic-only seed can be expressed as a batch job:
+
+```json
+[
+  {
+    "name": "artificial-life-topic-seed",
+    "topic": "artificial life",
+    "topic_slug": "artificial-life",
+    "topic_name": "Artificial life",
+    "topic_phrase": "artificial life alife artificial organisms complex systems evolution simulation",
+    "topic_limit": 10,
+    "topic_commit_limit": 5,
+    "expand": false
+  }
+]
+```
+
+Run it with:
+
+```bash
+PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap-batch artificial-life.json
+```
+
+## Notes
+
+- This example is intentionally generic and corpus-independent.
+- The exact candidate set depends on live source availability and resolver behavior.
+- Prefer preview mode before committing topic-only seeds, because topic phrases are noisier than curated seed `.bib` inputs.
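
Review note (not part of the patch): the batch file in "Optional Batch Form" could also be generated rather than hand-written. The following is a minimal sketch that assumes only the JSON keys shown in that section. The helper names `build_topic_seed_job` and `write_batch_file` are hypothetical and are not part of `citegeist`; the script only writes JSON and does not call the package.

```python
import json
from pathlib import Path


def build_topic_seed_job(topic, slug, name, phrase, limit=10, commit_limit=5):
    """Return one batch job dict using the keys shown in the example README.

    Hypothetical helper: the key names are copied from the README's JSON,
    nothing here is taken from the citegeist source code.
    """
    return {
        "name": f"{slug}-topic-seed",
        "topic": topic,
        "topic_slug": slug,
        "topic_name": name,
        "topic_phrase": phrase,
        "topic_limit": limit,
        "topic_commit_limit": commit_limit,
        "expand": False,  # serialized as JSON false, as in the README
    }


def write_batch_file(jobs, path):
    """Write the job list as a JSON array with a trailing newline."""
    Path(path).write_text(json.dumps(jobs, indent=2) + "\n", encoding="utf-8")


job = build_topic_seed_job(
    topic="artificial life",
    slug="artificial-life",
    name="Artificial life",
    phrase="artificial life alife artificial organisms complex systems evolution simulation",
)

if __name__ == "__main__":
    # File name taken from the README's bootstrap-batch command.
    write_batch_file([job], "artificial-life.json")
```

The resulting file would then be passed to the `bootstrap-batch` command exactly as the README shows. Keeping the phrase in one place like this may help when the same topic is seeded with and without `--preview`.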