Add stub resolution workflow

parent 425e153bee
commit 4eba64d352

README.md | 26
@@ -133,6 +133,7 @@ PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap --se
 PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "bayesian nonparametrics" --preview --topic-commit-limit 5
 PYTHONPATH=src .venv/bin/python -m citegeist extract references.txt --output draft.bib
 PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 resolve smith2024graphs
+PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --preview --limit 25
 PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topics
 PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 topic-entries abiogenesis
 PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 export-topic abiogenesis --output abiogenesis.bib
@@ -157,19 +158,20 @@ For live-source development, prefer fixture-backed or cache-backed source client

 ## Example Application

-Use `stage-topic-phrases` to load those suggestions into the database as review items. Staging stores the candidate in `suggested_phrase` and marks the topic `pending` without changing the active `expansion_phrase`.
-Use `export-topic-phrase-reviews` to write an editable JSON template directly from the database for the currently staged suggestions. That gives you a round-trip path from DB review queue to file edits and back into `review-topic-phrases`.
-Use `review-topic-phrase` to accept or reject one staged suggestion in place. Accepting a suggestion copies it into `expansion_phrase` and clears it from the staged review queue; rejecting it preserves the staged suggestion together with its review state.
-Use `review-topic-phrases` when you want to apply many accept/reject decisions from one JSON file. Each item should carry `slug`, `status`, and optional `phrase` / `review_notes`.
-Use `apply-topic-phrases` when you want a direct patch path instead of the staged review flow. It accepts either the raw suggestion list or an object with a `topics` list, and will apply `suggested_phrase` or `phrase` to matching topic slugs immediately.
-Use `topic-phrase-reviews --phrase-review-status pending` when you want a compact audit view of unresolved staged suggestions, including both the current live phrase and the pending replacement.
-Use `enrich-talkorigins` when you want to target those weak canonical entries for resolver-based metadata upgrades before retrying graph expansion on imported topic slices.
-Use `review-talkorigins` when you want one JSON review artifact that combines weak canonical clusters with dry-run enrichment outcomes for manual cleanup.
-Use `expand-topic` when you already have both a topic phrase and a curated topic seed set in the database: it expands outward from the topic’s existing entries, then only assigns discovered works back to that topic if they clear a topic-relevance threshold. Write-enabled assignment is stricter than preview ranking: a candidate must clear the score threshold and show a non-generic title anchor to the topic phrase, so broad methods papers do not get attached just because their abstracts or related terms overlap. On large noisy topics, prefer `--seed-key` to restrict the run to just the trusted seed entries you want to expand from, and use `--preview` first to inspect discovered candidates and relevance scores before writing anything.
-Use `set-topic-phrase` to store a curated expansion phrase on the topic itself. When a stored phrase exists, `expand-topic` will use it automatically if you do not pass `--topic-phrase`. Batch bootstrap jobs can also set `topic_slug`, `topic_name`, and `topic_phrase` so curated topic metadata is created as part of the run.
-Use `topics --phrase-review-status pending` when you want to audit only topics whose staged phrase suggestions still need review.
-`--allow-unsafe-search-matches` exists only for bounded experiments on copied databases when you explicitly want to relax trust to exercise downstream expansion behavior.
+- Use `stage-topic-phrases` to load those suggestions into the database as review items. Staging stores the candidate in `suggested_phrase` and marks the topic `pending` without changing the active `expansion_phrase`.
+- Use `export-topic-phrase-reviews` to write an editable JSON template directly from the database for the currently staged suggestions. That gives you a round-trip path from DB review queue to file edits and back into `review-topic-phrases`.
+- Use `review-topic-phrase` to accept or reject one staged suggestion in place. Accepting a suggestion copies it into `expansion_phrase` and clears it from the staged review queue; rejecting it preserves the staged suggestion together with its review state.
+- Use `review-topic-phrases` when you want to apply many accept/reject decisions from one JSON file. Each item should carry `slug`, `status`, and optional `phrase` / `review_notes`.
+- Use `apply-topic-phrases` when you want a direct patch path instead of the staged review flow. It accepts either the raw suggestion list or an object with a `topics` list, and will apply `suggested_phrase` or `phrase` to matching topic slugs immediately.
+- Use `topic-phrase-reviews --phrase-review-status pending` when you want a compact audit view of unresolved staged suggestions, including both the current live phrase and the pending replacement.
+- Use `enrich-talkorigins` when you want to target those weak canonical entries for resolver-based metadata upgrades before retrying graph expansion on imported topic slices.
+- Use `review-talkorigins` when you want one JSON review artifact that combines weak canonical clusters with dry-run enrichment outcomes for manual cleanup.
+- Use `expand-topic` when you already have both a topic phrase and a curated topic seed set in the database: it expands outward from the topic’s existing entries, then only assigns discovered works back to that topic if they clear a topic-relevance threshold. Write-enabled assignment is stricter than preview ranking: a candidate must clear the score threshold and show a non-generic title anchor to the topic phrase, so broad methods papers do not get attached just because their abstracts or related terms overlap. On large noisy topics, prefer `--seed-key` to restrict the run to just the trusted seed entries you want to expand from, and use `--preview` first to inspect discovered candidates and relevance scores before writing anything.
+- Use `set-topic-phrase` to store a curated expansion phrase on the topic itself. When a stored phrase exists, `expand-topic` will use it automatically if you do not pass `--topic-phrase`. Batch bootstrap jobs can also set `topic_slug`, `topic_name`, and `topic_phrase` so curated topic metadata is created as part of the run.
+- Use `topics --phrase-review-status pending` when you want to audit only topics whose staged phrase suggestions still need review.
+- `--allow-unsafe-search-matches` exists only for bounded experiments on copied databases when you explicitly want to relax trust to exercise downstream expansion behavior.

 The TalkOrigins corpus pipeline remains in the repository as an example application rather than a core package surface. Use the example-scoped Python namespace:
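The bullets above describe a JSON round-trip for phrase review. As a rough illustration only (not the project's implementation; the helper name `load_review_items` and the exact `status` values are assumptions), a `review-topic-phrases`-style input file could be parsed and sanity-checked like this:

```python
import json

# Hypothetical sketch: each review item carries "slug", "status",
# and optional "phrase" / "review_notes", per the README text above.
VALID_STATUSES = {"accepted", "rejected"}  # assumed status vocabulary


def load_review_items(raw: str) -> list[dict]:
    """Parse a review file, accepting either a raw list or a {"topics": [...]} object."""
    data = json.loads(raw)
    items = data["topics"] if isinstance(data, dict) else data
    for item in items:
        if "slug" not in item or item.get("status") not in VALID_STATUSES:
            raise ValueError(f"malformed review item: {item!r}")
    return items


example = """
[
  {"slug": "abiogenesis", "status": "accepted", "phrase": "origin of life"},
  {"slug": "artificial-life", "status": "rejected", "review_notes": "too broad"}
]
"""
items = load_review_items(example)
print([item["slug"] for item in items])
```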
@@ -181,6 +181,18 @@ Resolve one or more entries against remote metadata:

 .venv/bin/python -m citegeist --db library.sqlite3 resolve langton1989artificial1 bedau2003artificial2
 ```

+Preview DOI-bearing placeholder records before enriching them:
+
+```bash
+.venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --preview --limit 25
+```
+
+Enrich DOI-bearing placeholder records inside one topic slice:
+
+```bash
+.venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --topic artificial-life --limit 25
+```
+
 ## Explore Citation Graphs

 Purpose: traverse citation edges, export graph data, and render quick visualizations.
@@ -67,6 +67,26 @@ def build_parser() -> argparse.ArgumentParser:
     resolve_parser = subparsers.add_parser("resolve", help="Enrich stored entries from external metadata sources")
     resolve_parser.add_argument("citation_keys", nargs="+", help="Citation keys to enrich")

+    resolve_stubs_parser = subparsers.add_parser(
+        "resolve-stubs",
+        help="Find and enrich stub-like stored entries, optionally limited to DOI-bearing candidates",
+    )
+    resolve_stubs_parser.add_argument("--limit", type=int, default=25, help="Maximum candidate entries to inspect")
+    resolve_stubs_parser.add_argument(
+        "--doi-only",
+        action="store_true",
+        help="Only consider candidates that already have a DOI",
+    )
+    resolve_stubs_parser.add_argument(
+        "--topic",
+        help="Optional topic slug to limit candidate selection",
+    )
+    resolve_stubs_parser.add_argument(
+        "--preview",
+        action="store_true",
+        help="Show the selected candidate entries without resolving them",
+    )
+
     graph_parser = subparsers.add_parser("graph", help="Traverse citation relations from one or more seed entries")
     graph_parser.add_argument("citation_keys", nargs="+", help="Seed citation keys")
     graph_parser.add_argument(
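As a standalone sketch of the subparser wiring in the hunk above (independent of the citegeist module itself), the `resolve-stubs` flags parse like this; note that argparse maps `--doi-only` to the attribute `doi_only`:

```python
import argparse

# Standalone sketch of the resolve-stubs flag wiring shown above.
parser = argparse.ArgumentParser(prog="citegeist-sketch")
subparsers = parser.add_subparsers(dest="command")

resolve_stubs = subparsers.add_parser(
    "resolve-stubs",
    help="Find and enrich stub-like stored entries",
)
resolve_stubs.add_argument("--limit", type=int, default=25)
resolve_stubs.add_argument("--doi-only", action="store_true")
resolve_stubs.add_argument("--topic")
resolve_stubs.add_argument("--preview", action="store_true")

args = parser.parse_args(["resolve-stubs", "--doi-only", "--preview", "--limit", "10"])
# "--doi-only" becomes args.doi_only; an omitted --topic stays None.
print(args.command, args.limit, args.doi_only, args.topic, args.preview)
```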
@@ -502,6 +522,8 @@ def main(argv: list[str] | None = None) -> int:
         return _run_extract(Path(args.input), args.output)
     if args.command == "resolve":
         return _run_resolve(store, args.citation_keys)
+    if args.command == "resolve-stubs":
+        return _run_resolve_stubs(store, args.limit, args.doi_only, args.topic, args.preview)
     if args.command == "graph":
         return _run_graph(
             store,
@@ -744,22 +766,25 @@ def _run_resolve(store: BibliographyStore, citation_keys: list[str]) -> int:
     resolver = MetadataResolver()
     exit_code = 0
     for citation_key in citation_keys:
+        if not _resolve_one(store, resolver, citation_key):
+            exit_code = 1
+    return exit_code
+
+
+def _resolve_one(store: BibliographyStore, resolver: MetadataResolver, citation_key: str) -> bool:
     existing = store.get_entry(citation_key)
     if existing is None:
         print(f"Entry not found: {citation_key}", file=sys.stderr)
-        exit_code = 1
-        continue
+        return False
     bibtex = store.get_entry_bibtex(citation_key)
     if not bibtex:
         print(f"Entry not renderable: {citation_key}", file=sys.stderr)
-        exit_code = 1
-        continue
+        return False
     current_entry = parse_bibtex(bibtex)[0]
     resolution = resolver.resolve_entry(current_entry)
     if resolution is None:
         print(f"No resolver match: {citation_key}", file=sys.stderr)
-        exit_code = 1
-        continue
+        return False
     merged, conflicts = merge_entries_with_conflicts(current_entry, resolution.entry)
     store.replace_entry(
         citation_key,
@@ -776,6 +801,31 @@ def _run_resolve(store: BibliographyStore, citation_keys: list[str]) -> int:
         source_label=resolution.source_label,
     )
     print(f"{citation_key}\t{resolution.source_label}")
+    return True
+
+
+def _run_resolve_stubs(
+    store: BibliographyStore,
+    limit: int,
+    doi_only: bool,
+    topic_slug: str | None,
+    preview: bool,
+) -> int:
+    candidates = store.list_resolution_candidates(
+        limit=limit,
+        doi_only=doi_only,
+        stub_only=True,
+        topic_slug=topic_slug,
+    )
+    if preview:
+        print(json.dumps(candidates, indent=2))
+        return 0
+
+    resolver = MetadataResolver()
+    exit_code = 0
+    for candidate in candidates:
+        if not _resolve_one(store, resolver, str(candidate["citation_key"])):
+            exit_code = 1
     return exit_code
@@ -239,6 +239,9 @@ def merge_entries_with_conflicts(base: BibEntry, resolved: BibEntry) -> tuple[Bi
         if not value:
             continue
         current_value = merged_fields.get(key, "")
+        if _is_placeholder_value(key, current_value) and current_value != value:
+            merged_fields[key] = value
+            continue
         if current_value and current_value != value:
             conflicts.append(
                 {
@@ -260,6 +263,16 @@ def merge_entries_with_conflicts(base: BibEntry, resolved: BibEntry) -> tuple[Bi
             )


+def _is_placeholder_value(field_name: str, value: str) -> bool:
+    normalized = " ".join((value or "").split()).strip()
+    if not normalized:
+        return True
+    lowered = normalized.lower()
+    if field_name == "title":
+        return bool(re.fullmatch(r"referenced work \d+", lowered)) or lowered.startswith("untitled")
+    return False
+
+
 def _crossref_message_to_entry(message: dict) -> BibEntry:
     entry_type = _crossref_type_to_bibtype(message.get("type", "article"))
     title_values = message.get("title", [])
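The `_is_placeholder_value` helper added above treats empty values, `Referenced work N` titles, and titles starting with "untitled" as replaceable. A self-contained restatement of that check (same logic, renamed without the underscore) behaves like this:

```python
import re

# Restatement of the placeholder check from the hunk above.
def is_placeholder_value(field_name: str, value: str) -> bool:
    # Collapse internal whitespace and trim before testing.
    normalized = " ".join((value or "").split()).strip()
    if not normalized:
        return True  # empty / whitespace-only values are always placeholders
    lowered = normalized.lower()
    if field_name == "title":
        # Stub titles like "Referenced work 6", or anything starting "untitled"
        return bool(re.fullmatch(r"referenced work \d+", lowered)) or lowered.startswith("untitled")
    return False


print(is_placeholder_value("title", "Referenced work 6"))  # True
print(is_placeholder_value("title", "  Untitled draft "))  # True
print(is_placeholder_value("title", "Complete Record"))    # False
print(is_placeholder_value("doi", ""))                     # True
```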
@@ -466,6 +466,72 @@ class BibliographyStore:
         ).fetchall()
         return [dict(row) for row in rows]

+    def list_resolution_candidates(
+        self,
+        *,
+        limit: int = 50,
+        doi_only: bool = False,
+        stub_only: bool = False,
+        topic_slug: str | None = None,
+    ) -> list[dict[str, object]]:
+        clauses: list[str] = []
+        params: list[object] = []
+        joins = ""
+
+        if topic_slug is not None:
+            joins = """
+                JOIN entry_topics et ON et.entry_id = e.id
+                JOIN topics t ON t.id = et.topic_id
+            """
+            clauses.append("t.slug = ?")
+            params.append(topic_slug)
+
+        if doi_only:
+            clauses.append("e.doi IS NOT NULL AND TRIM(e.doi) <> ''")
+
+        if stub_only:
+            clauses.append(
+                """
+                (
+                    e.title IS NULL
+                    OR TRIM(e.title) = ''
+                    OR LOWER(TRIM(e.title)) GLOB 'referenced work *'
+                    OR LOWER(TRIM(e.title)) GLOB 'untitled*'
+                    OR (
+                        e.entry_type = 'misc'
+                        AND (
+                            e.abstract IS NULL
+                            OR TRIM(e.abstract) = ''
+                        )
+                    )
+                )
+                """
+            )
+
+        where_clause = ""
+        if clauses:
+            where_clause = "WHERE " + " AND ".join(clauses)
+
+        rows = self.connection.execute(
+            f"""
+            SELECT DISTINCT
+                e.citation_key,
+                e.entry_type,
+                e.review_status,
+                e.title,
+                e.year,
+                e.doi,
+                e.abstract
+            FROM entries e
+            {joins}
+            {where_clause}
+            ORDER BY COALESCE(e.year, ''), e.citation_key
+            LIMIT ?
+            """,
+            (*params, limit),
+        ).fetchall()
+        return [dict(row) for row in rows]
+
     def ensure_topic(
         self,
         slug: str,
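The stub predicate above leans on SQLite `GLOB` (case-sensitive, shell-style wildcards) applied to lowercased, trimmed titles. A toy in-memory demonstration of that filter on a simplified single-table schema (column names reduced from the real store):

```python
import sqlite3

# Toy demonstration of the stub-selection predicate on a simplified schema.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE entries (citation_key TEXT, entry_type TEXT, title TEXT, doi TEXT, abstract TEXT)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?, ?)",
    [
        ("stubdoi", "misc", "Referenced work 6", "10.1200/JCO.2002.04.117", None),
        ("complete", "article", "Complete Record", "10.1000/complete", "Has text."),
        ("nodoi", "misc", "Untitled note", "", None),  # excluded: blank DOI
    ],
)

rows = conn.execute(
    """
    SELECT citation_key FROM entries e
    WHERE (e.doi IS NOT NULL AND TRIM(e.doi) <> '')
      AND (
        e.title IS NULL
        OR TRIM(e.title) = ''
        OR LOWER(TRIM(e.title)) GLOB 'referenced work *'
        OR LOWER(TRIM(e.title)) GLOB 'untitled*'
        OR (e.entry_type = 'misc' AND (e.abstract IS NULL OR TRIM(e.abstract) = ''))
      )
    ORDER BY citation_key
    """
).fetchall()
print([row["citation_key"] for row in rows])  # ['stubdoi']
```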
@@ -154,6 +154,87 @@ def test_cli_resolve_updates_entry(tmp_path: Path):
     assert payload["field_conflicts"][0]["field_name"] == "title"


+def test_cli_resolve_stubs_preview_lists_doi_stub_candidates(tmp_path: Path):
+    bib_path = tmp_path / "input.bib"
+    bib_path.write_text(
+        """
+        @misc{stubdoi,
+          title = {Referenced work 6},
+          doi = {10.1200/JCO.2002.04.117},
+          url = {https://doi.org/10.1200/JCO.2002.04.117}
+        }
+
+        @article{complete,
+          author = {Smith, Jane},
+          title = {Complete Record},
+          year = {2024},
+          doi = {10.1000/complete}
+        }
+        """,
+        encoding="utf-8",
+    )
+    assert run_cli(tmp_path, "ingest", str(bib_path)).returncode == 0
+
+    result = run_cli(tmp_path, "resolve-stubs", "--doi-only", "--preview", "--limit", "10")
+    assert result.returncode == 0
+    payload = json.loads(result.stdout)
+    assert [row["citation_key"] for row in payload] == ["stubdoi"]
+    assert payload[0]["title"] == "Referenced work 6"
+
+
+def test_cli_resolve_stubs_enriches_matching_candidates(tmp_path: Path):
+    bib_path = tmp_path / "input.bib"
+    bib_path.write_text(
+        """
+        @misc{stubdoi,
+          title = {Referenced work 6},
+          doi = {10.1200/JCO.2002.04.117},
+          url = {https://doi.org/10.1200/JCO.2002.04.117}
+        }
+        """,
+        encoding="utf-8",
+    )
+    assert run_cli(tmp_path, "ingest", str(bib_path)).returncode == 0
+
+    from citegeist.bibtex import BibEntry
+    from citegeist.resolve import Resolution
+
+    database = tmp_path / "library.sqlite3"
+
+    with patch("citegeist.cli.MetadataResolver.resolve_entry") as mocked_resolve:
+        mocked_resolve.return_value = Resolution(
+            entry=BibEntry(
+                entry_type="article",
+                citation_key="resolvedkey",
+                fields={
+                    "author": "Doe, Alex",
+                    "title": "Resolved Work",
+                    "year": "2002",
+                    "doi": "10.1200/JCO.2002.04.117",
+                    "journal": "Journal of Clinical Oncology",
+                },
+            ),
+            source_type="resolver",
+            source_label="crossref:doi:10.1200/JCO.2002.04.117",
+        )
+        exit_code = main(
+            [
+                "--db",
+                str(database),
+                "resolve-stubs",
+                "--doi-only",
+                "--limit",
+                "10",
+            ]
+        )
+
+    assert exit_code == 0
+    show = run_cli(tmp_path, "show", "stubdoi")
+    payload = json.loads(show.stdout)
+    assert payload["title"] == "Resolved Work"
+    assert payload["review_status"] == "enriched"
+
+
 def test_cli_resolve_conflicts_updates_status(tmp_path: Path):
     bib_path = tmp_path / "input.bib"
     bib_path.write_text(
@@ -108,6 +108,25 @@ def test_merge_entries_with_conflicts_records_disagreements():
     ]


+def test_merge_entries_replaces_placeholder_titles_without_conflict():
+    base = BibEntry(
+        entry_type="misc",
+        citation_key="stubdoi",
+        fields={"title": "Referenced work 6", "doi": "10.1200/JCO.2002.04.117"},
+    )
+    resolved = BibEntry(
+        entry_type="article",
+        citation_key="resolved",
+        fields={"title": "Resolved Work", "journal": "Journal of Clinical Oncology"},
+    )
+
+    merged, conflicts = merge_entries_with_conflicts(base, resolved)
+
+    assert merged.fields["title"] == "Resolved Work"
+    assert merged.fields["journal"] == "Journal of Clinical Oncology"
+    assert conflicts == []
+
+
 def test_resolver_tries_doi_before_dblp():
     resolver = MetadataResolver()
     calls: list[tuple[str, str]] = []
@@ -281,6 +281,46 @@ def test_store_can_set_topic_expansion_phrase():
     store.close()


+def test_store_lists_stub_resolution_candidates():
+    store = BibliographyStore()
+    try:
+        store.ingest_bibtex(
+            """
+            @misc{stubdoi,
+              title = {Referenced work 6},
+              doi = {10.1200/JCO.2002.04.117},
+              url = {https://doi.org/10.1200/JCO.2002.04.117}
+            }
+
+            @article{complete,
+              author = {Smith, Jane},
+              title = {Complete Record},
+              year = {2024},
+              doi = {10.1000/complete}
+            }
+            """
+        )
+        store.add_entry_topic(
+            "stubdoi",
+            topic_slug="artificial-life",
+            topic_name="Artificial life",
+            source_label="test",
+        )
+
+        candidates = store.list_resolution_candidates(limit=10, doi_only=True, stub_only=True)
+        assert [row["citation_key"] for row in candidates] == ["stubdoi"]
+
+        topic_candidates = store.list_resolution_candidates(
+            limit=10,
+            doi_only=True,
+            stub_only=True,
+            topic_slug="artificial-life",
+        )
+        assert [row["citation_key"] for row in topic_candidates] == ["stubdoi"]
+    finally:
+        store.close()
+
+
 def test_store_can_stage_and_review_topic_phrase_suggestion():
     store = BibliographyStore()
     try: