Add artificial life seeding example
parent c76707e45e
commit c1a977b5e2
@@ -60,7 +60,10 @@ The initial repo includes:
 - full-text-search-ready indexing over title, abstract, and fulltext when SQLite FTS5 is available;
 - tests covering parsing, ingestion, relation storage, and search.

-Example applications live alongside the core package rather than defining it. The current example corpus pipeline is the TalkOrigins bibliography workflow under [`citegeist.examples.talkorigins`](./src/citegeist/examples/talkorigins.py) with a usage guide in [examples/talkorigins/README.md](./examples/talkorigins/README.md).
+Example applications live alongside the core package rather than defining it. Current examples include:
+
+- a topic-only bootstrap workflow for `artificial life` in [examples/artificial-life/README.md](./examples/artificial-life/README.md);
+- the TalkOrigins bibliography pipeline under [`citegeist.examples.talkorigins`](./src/citegeist/examples/talkorigins.py) with a usage guide in [examples/talkorigins/README.md](./examples/talkorigins/README.md).

 The prioritized execution plan lives in [ROADMAP.md](./ROADMAP.md).

@@ -176,6 +179,8 @@ PYTHONPATH=src .venv/bin/python -m citegeist example-talkorigins-duplicates talk

 The older `scrape-talkorigins`-style command names remain available as compatibility aliases. The full example workflow and reconstruction notes live in [examples/talkorigins/README.md](./examples/talkorigins/README.md).

+For a smaller example that starts from a topic phrase alone, see [examples/artificial-life/README.md](./examples/artificial-life/README.md).
+
 Correction files are simple JSON:

@@ -0,0 +1,100 @@
# Artificial Life Topic-Seeding Example

This example shows the smallest useful `citegeist` workflow that starts from a topic phrase alone.

The seed phrase is:

```text
artificial life
```

## What It Demonstrates

- topic-only bootstrap without a seed `.bib`;
- previewing ranked candidate seed entries before writing anything;
- storing a curated topic slug, topic name, and expansion phrase in the database;
- running later topic-aware expansion from that stored phrase.

## Preview First

Use a preview run to inspect the best candidate seed entries without changing the database:

```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
  bootstrap \
  --topic "artificial life" \
  --topic-slug artificial-life \
  --topic-name "Artificial life" \
  --store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation" \
  --topic-limit 10 \
  --topic-commit-limit 5 \
  --preview
```

That returns ranked candidates gathered through the configured resolver/search stack.

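If the preview and commit runs are scripted, building both command lines from one set of options keeps them from drifting apart. The sketch below is a hypothetical wrapper, not part of `citegeist`: the `bootstrap_argv` helper and its parameter names are assumptions, and it uses only the flags shown in this README. It also assumes the script runs under the same interpreter that can import `citegeist` (for example the project's `.venv` with `PYTHONPATH=src`).

```python
import shlex
import sys


def bootstrap_argv(topic, slug, name, phrase, limit=10, commit_limit=5,
                   preview=True, db="library.sqlite3"):
    """Build the argv for a `citegeist bootstrap` run (hypothetical helper)."""
    argv = [
        sys.executable, "-m", "citegeist", "--db", db,
        "bootstrap",
        "--topic", topic,
        "--topic-slug", slug,
        "--topic-name", name,
        "--store-topic-phrase", phrase,
        "--topic-limit", str(limit),
        "--topic-commit-limit", str(commit_limit),
    ]
    if preview:
        # Preview mode inspects candidates without changing the database.
        argv.append("--preview")
    return argv


options = dict(
    topic="artificial life",
    slug="artificial-life",
    name="Artificial life",
    phrase="artificial life alife artificial organisms "
           "complex systems evolution simulation",
)

preview_cmd = bootstrap_argv(**options)                # dry run
commit_cmd = bootstrap_argv(**options, preview=False)  # same options, no --preview

# Print shell-quoted command lines for review before running anything.
print(shlex.join(preview_cmd))
print(shlex.join(commit_cmd))
```

Either list can then be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.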
## Commit The Topic Seeds

Once the preview looks reasonable, run the same bootstrap without `--preview`:

```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
  bootstrap \
  --topic "artificial life" \
  --topic-slug artificial-life \
  --topic-name "Artificial life" \
  --store-topic-phrase "artificial life alife artificial organisms complex systems evolution simulation" \
  --topic-limit 10 \
  --topic-commit-limit 5
```

That does three things:

1. finds topic-relevant seed entries;
2. stores them in the bibliography database;
3. creates or updates the `artificial-life` topic row with the curated expansion phrase.

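The "creates or updates" behaviour in step 3 is an upsert keyed on the topic slug, which is why rerunning the commit is safe. The sketch below illustrates that idea with Python's built-in `sqlite3` module. The `topics` table and its column names are hypothetical and are not taken from the `citegeist` schema; the `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical schema for illustration only; citegeist's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE topics (
        slug TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        expansion_phrase TEXT NOT NULL
    )
    """
)


def upsert_topic(conn, slug, name, phrase):
    """Insert the topic row, or update name and phrase if the slug exists."""
    conn.execute(
        """
        INSERT INTO topics (slug, name, expansion_phrase)
        VALUES (?, ?, ?)
        ON CONFLICT(slug) DO UPDATE SET
            name = excluded.name,
            expansion_phrase = excluded.expansion_phrase
        """,
        (slug, name, phrase),
    )
    conn.commit()


# First run creates the row; the second run updates it in place.
upsert_topic(conn, "artificial-life", "Artificial life",
             "artificial life alife artificial organisms "
             "complex systems evolution simulation")
upsert_topic(conn, "artificial-life", "Artificial life",
             "artificial life alife artificial organisms "
             "autonomous agents evolution simulation")

rows = conn.execute(
    "SELECT slug, name, expansion_phrase FROM topics"
).fetchall()
print(rows)
```

Keying on the slug means repeated bootstrap runs leave one row per topic, and the most recent phrase wins.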
## Inspect The Result

```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topics
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topic-entries artificial-life
```

If you want to adjust the stored phrase later:

```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 \
  set-topic-phrase artificial-life "artificial life alife artificial organisms autonomous agents evolution simulation"
```

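Before replacing a stored phrase, it can help to see exactly which terms change. The sketch below compares the original phrase with the adjusted one from the command above. It treats a phrase as whitespace-separated terms, which is an assumption made for illustration; `citegeist` may tokenize or match the phrase differently.

```python
def phrase_terms(phrase):
    """Split a phrase into a set of whitespace-separated terms (assumed model)."""
    return set(phrase.split())


old_phrase = ("artificial life alife artificial organisms "
              "complex systems evolution simulation")
new_phrase = ("artificial life alife artificial organisms "
              "autonomous agents evolution simulation")

# Terms present in only one of the two phrases.
removed = sorted(phrase_terms(old_phrase) - phrase_terms(new_phrase))
added = sorted(phrase_terms(new_phrase) - phrase_terms(old_phrase))

print("removed:", removed)  # removed: ['complex', 'systems']
print("added:", added)      # added: ['agents', 'autonomous']
```

Here the adjusted phrase swaps `complex systems` for `autonomous agents` and leaves the other terms unchanged.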
## Optional Batch Form

The same topic-only seed can be expressed as a batch job:

```json
[
  {
    "name": "artificial-life-topic-seed",
    "topic": "artificial life",
    "topic_slug": "artificial-life",
    "topic_name": "Artificial life",
    "topic_phrase": "artificial life alife artificial organisms complex systems evolution simulation",
    "topic_limit": 10,
    "topic_commit_limit": 5,
    "expand": false
  }
]
```

Run it with:

```bash
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap-batch artificial-life.json
```

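If batch files are generated rather than written by hand, the standard `json` module is enough. The sketch below writes the job shown above and reads it back to confirm that it round-trips. The `write_batch` helper is hypothetical, the keys are copied from the example rather than from a documented schema, and the file goes to a temporary directory so the sketch has no side effects.

```python
import json
import tempfile
from pathlib import Path


def write_batch(path, jobs):
    """Write a list of job dicts as a JSON array (hypothetical helper)."""
    path = Path(path)
    path.write_text(json.dumps(jobs, indent=2) + "\n", encoding="utf-8")
    return path


# Keys and values copied from the batch example in this README.
job = {
    "name": "artificial-life-topic-seed",
    "topic": "artificial life",
    "topic_slug": "artificial-life",
    "topic_name": "Artificial life",
    "topic_phrase": "artificial life alife artificial organisms "
                    "complex systems evolution simulation",
    "topic_limit": 10,
    "topic_commit_limit": 5,
    "expand": False,  # serialized as JSON false
}

with tempfile.TemporaryDirectory() as tmp:
    batch_path = write_batch(Path(tmp) / "artificial-life.json", [job])
    # Read the file back to confirm it parses and matches the input.
    loaded = json.loads(batch_path.read_text(encoding="utf-8"))

print(loaded[0]["name"], loaded[0]["topic_limit"], loaded[0]["expand"])
```

In real use, write the file to the working directory instead and pass its path to `bootstrap-batch`.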
## Notes

- This example is intentionally generic and corpus-independent.
- The exact candidate set depends on live source availability and resolver behavior.
- Prefer preview mode before committing topic-only seeds, because topic phrases are noisier than curated seed `.bib` inputs.