Verification added to process.

This commit is contained in:
welsberr 2026-03-29 05:09:18 -04:00
parent fc7e1f1844
commit 35d8eb8386
8 changed files with 638 additions and 3 deletions

.gitignore vendored

@@ -3,4 +3,5 @@ __pycache__/
.venv/
.cache/
*.pyc
*.egg-info/
library.sqlite3


@@ -48,6 +48,7 @@ The initial repo includes:
- a small CLI for ingest, search, inspection, and export;
- review-state tracking on entries, per-field ingest provenance, and field-level conflict review;
- plaintext reference extraction into draft BibTeX for numbered, APA-like, wrapped-line, and simple book-style references;
- standalone verification and disambiguation of free-text references or partial BibTeX into auditable BibTeX/JSON results with `x_status`, `x_confidence`, `x_source`, `x_query`, and alternate-candidate traces;
- identifier-first metadata resolution for DOI, OpenAlex, DBLP, arXiv, and DataCite-backed entries, with OpenAlex/DataCite title-search fallback;
- local citation-graph traversal over stored `cites`, `cited_by`, and `crossref` edges;
- Crossref- and OpenAlex-backed graph expansion that materializes draft related works and edge provenance;
@@ -132,6 +133,8 @@ PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 apply-conflict
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap --seed-bib seed.bib --topic "bayesian nonparametrics"
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 bootstrap --topic "bayesian nonparametrics" --preview --topic-commit-limit 5
PYTHONPATH=src .venv/bin/python -m citegeist extract references.txt --output draft.bib
PYTHONPATH=src .venv/bin/python -m citegeist verify --string '"Graph-first bibliography augmentation" Smith 2024' --context "citation graphs" --format json
PYTHONPATH=src .venv/bin/python -m citegeist verify --bib draft.bib --output verified.bib
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 resolve smith2024graphs
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --preview --limit 25
PYTHONPATH=src .venv/bin/python -m citegeist --db library.sqlite3 resolve-stubs --doi-only --all-misc --limit 25
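A `verify` run in BibTeX mode annotates each returned entry with the audit fields described above; a result might look roughly like the following (values are illustrative, not real output):

```bibtex
@article{smith2024graphs,
  author       = {Smith, Jane},
  title        = {Graph-first bibliography augmentation},
  year         = {2024},
  doi          = {10.1000/example-doi},
  x_status     = {high_confidence},
  x_confidence = {0.82},
  x_source     = {crossref:search:Graph-first bibliography augmentation},
  x_query      = {"Graph-first bibliography augmentation" Smith 2024},
  x_context    = {citation graphs}
}
```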
@@ -167,6 +170,13 @@ OpenAlex expansion is also conservative about noisy secondary records. Discoveri
For live-source development, prefer fixture-backed or cache-backed source clients so resolver and expansion work can be exercised repeatedly without re-hitting upstream APIs on every run.
## Adopted Ideas From Earlier Repos
`citegeist` now absorbs two useful patterns from adjacent bibliography tools while keeping them inside the main Python 3 package boundary:
- From `VeriBib`: a standalone `verify` workflow for ambiguous strings or rough BibTeX, with explicit confidence/status audit fields and alternate-candidate traces before you commit changes to the main library.
- From `TOA-Bib-Updater`: resumable, artifact-oriented corpus processing remains the preferred process model for large imports. In practice this already appears in the TalkOrigins example pipeline through saved manifests, review exports, duplicate reports, and staged topic-phrase review flows.
## Example Application
- Use `stage-topic-phrases` to load those suggestions into the database as review items. Staging stores the candidate in `suggested_phrase` and marks the topic `pending` without changing the active `expansion_phrase`.


@@ -25,8 +25,22 @@ Completed:
- lightweight BibTeX parsing;
- SQLite storage for entries, creators, identifiers, and relations;
- local text search using SQLite FTS5 when available;
- standalone verification/disambiguation output for free-text references and partial BibTeX with auditable match metadata;
- tests for ingest, relation storage, and search.
## Comparison Notes From Related Repos
The adjacent `TOA-Bib-Updater` and `VeriBib` repositories are useful prior art, but they contribute different things:
- `VeriBib` contributes a good pre-ingest verification pattern: inspect ambiguous strings or partial BibTeX, rank candidates from legitimate metadata sources, and emit explicit audit fields instead of silently trusting a single match.
- `TOA-Bib-Updater` contributes process discipline more than core data modeling: resumable long-running jobs, preserved source artifacts, and generated review outputs for manual inspection.
`citegeist` should absorb those ideas where they improve the main local research workflow:
1. keep verification and auditability in the core package, not just entry resolution after ingest;
2. keep resumable manifests and review exports for large acquisition workflows, especially example pipelines and batch imports;
3. avoid coupling the core model to brittle source-specific scraping logic.
## Phase 1: Core Ingestion And Export
Priority: P0
@@ -67,7 +81,8 @@ Tasks:
- support ingestion of OCR- or PDF-derived plaintext bibliography sections;
- add normalization for author names, years, title casing, and page ranges;
- prefer sentence-boundary venue detection over naive keyword splits so title text containing words like `report` is not truncated;
- repair partially extracted venue stubs such as `Occas.` or `Proc.` by reparsing the full raw reference line when the structured fields are obviously incomplete;
- preserve improved local draft parses even when remote enrichment remains unresolved, so later parser fixes can refresh stored BibTeX without requiring a successful metadata match;
- build gold-test fixtures from real, messy reference examples.
@@ -122,7 +137,6 @@ Tasks:
- expose unresolved nodes so the user can decide what to enrich next.
Why this matters:
- this is central to literature discovery rather than mere bibliography cleanup;
- it turns the database into a research navigation tool.
@@ -164,7 +178,6 @@ Goal:
Broaden source acquisition without mixing that complexity into the core model.
Tasks:
- add source adapters for open-access theses and dissertation repositories;
- add support for harvesting publisher citation pages and preprint metadata pages;
- define per-source import provenance and rate-limit behavior;


@@ -7,12 +7,14 @@ from .harvest import OaiMetadataFormat, OaiPmhHarvester, OaiSet
from .resolve import MetadataResolver, merge_entries, merge_entries_with_conflicts
from .sources import SourceClient
from .storage import BibliographyStore
from .verify import BibliographyVerifier, VerificationResult, VerificationMatch
__all__ = [
"BibEntry",
"BatchBootstrapRunner",
"BatchJobResult",
"BibliographyStore",
"BibliographyVerifier",
"BootstrapResult",
"Bootstrapper",
"CrossrefExpander",
@@ -22,6 +24,8 @@ __all__ = [
"OaiMetadataFormat",
"OaiSet",
"SourceClient",
"VerificationMatch",
"VerificationResult",
"extract_references",
"load_batch_jobs",
"merge_entries",


@@ -16,6 +16,7 @@ from .extract import extract_references
from .harvest import OaiPmhHarvester
from .resolve import MetadataResolver, merge_entries_with_conflicts
from .storage import BibliographyStore
from .verify import BibliographyVerifier, render_verification_results
def build_parser() -> argparse.ArgumentParser:
@@ -69,6 +70,24 @@
extract_parser.add_argument("input", help="Plaintext file containing bibliography-style references")
extract_parser.add_argument("--output", help="Write extracted BibTeX to a file instead of stdout")
verify_parser = subparsers.add_parser(
"verify",
help="Verify or disambiguate free-text references or BibTeX entries without modifying the database",
)
verify_group = verify_parser.add_mutually_exclusive_group(required=True)
verify_group.add_argument("--string", help="Single free-text reference query")
verify_group.add_argument("--list", dest="list_input", help="Path to a text file with one query per line")
verify_group.add_argument("--bib", help="Path to a BibTeX file whose entries should be verified")
verify_parser.add_argument("--context", default="", help="Optional topic context used for scoring")
verify_parser.add_argument("--limit", type=int, default=5, help="Maximum candidates to inspect per input")
verify_parser.add_argument(
"--format",
choices=["bibtex", "json"],
default="bibtex",
help="Output format for verification results",
)
verify_parser.add_argument("--output", help="Write verification results to a file instead of stdout")
resolve_parser = subparsers.add_parser("resolve", help="Enrich stored entries from external metadata sources")
resolve_parser.add_argument("citation_keys", nargs="+", help="Citation keys to enrich")
@@ -535,6 +554,8 @@ def main(argv: list[str] | None = None) -> int:
return _run_apply_conflict(store, args.citation_key, args.field_name)
if args.command == "extract":
return _run_extract(Path(args.input), args.output)
if args.command == "verify":
return _run_verify(args.string, args.list_input, args.bib, args.context, args.limit, args.format, args.output)
if args.command == "resolve":
return _run_resolve(store, args.citation_keys)
if args.command == "resolve-stubs":
@@ -783,6 +804,36 @@ def _run_extract(input_path: Path, output: str | None) -> int:
return 0
def _run_verify(
string_input: str | None,
list_input: str | None,
bib_input: str | None,
context: str,
limit: int,
output_format: str,
output: str | None,
) -> int:
verifier = BibliographyVerifier()
if string_input is not None:
results = [verifier.verify_string(string_input, context=context, limit=limit)]
elif list_input is not None:
values = [line.strip() for line in Path(list_input).read_text(encoding="utf-8").splitlines() if line.strip()]
results = verifier.verify_strings(values, context=context, limit=limit)
elif bib_input is not None:
results = verifier.verify_bib_file(bib_input, context=context, limit=limit)
else:
print("verify requires one input source", file=sys.stderr)
return 1
rendered = render_verification_results(results, output_format)
if output:
Path(output).write_text(rendered + ("\n" if rendered and not rendered.endswith("\n") else ""), encoding="utf-8")
else:
if rendered:
print(rendered)
return 0
def _print_progress(label: str, index: int, total: int, detail: str | None = None) -> None:
message = f"[{index}/{total}] {label}"
if detail:

src/citegeist/verify.py Normal file

@@ -0,0 +1,358 @@
from __future__ import annotations
import json
import re
from dataclasses import dataclass
from pathlib import Path
from .bibtex import BibEntry, parse_bibtex, render_bibtex
from .resolve import MetadataResolver, Resolution
@dataclass(slots=True)
class VerificationMatch:
entry: BibEntry
score: float
source_label: str
@dataclass(slots=True)
class VerificationResult:
query: str
context: str
status: str
confidence: float
entry: BibEntry
source_label: str
alternates: list[VerificationMatch]
input_type: str
input_key: str | None = None
def to_bib_entry(self) -> BibEntry:
fields = dict(self.entry.fields)
fields["x_status"] = self.status
fields["x_confidence"] = f"{self.confidence:.2f}"
fields["x_source"] = self.source_label
fields["x_query"] = self.query
fields["x_context"] = self.context
if self.input_type == "bib" and self.input_key:
fields["x_input_key"] = self.input_key
if self.alternates:
fields["x_alternates"] = " || ".join(
_serialize_alternate(match) for match in self.alternates
)
return BibEntry(
entry_type=self.entry.entry_type,
citation_key=self.entry.citation_key,
fields=fields,
)
def to_dict(self) -> dict[str, object]:
return {
"query": self.query,
"context": self.context,
"input_type": self.input_type,
"input_key": self.input_key,
"status": self.status,
"confidence": round(self.confidence, 4),
"source_label": self.source_label,
"entry": {
"citation_key": self.entry.citation_key,
"entry_type": self.entry.entry_type,
"fields": dict(self.entry.fields),
},
"alternates": [
{
"citation_key": match.entry.citation_key,
"entry_type": match.entry.entry_type,
"score": round(match.score, 4),
"source_label": match.source_label,
"fields": dict(match.entry.fields),
}
for match in self.alternates
],
}
class BibliographyVerifier:
def __init__(self, resolver: MetadataResolver | None = None) -> None:
self.resolver = resolver or MetadataResolver()
def verify_string(self, value: str, context: str = "", limit: int = 5) -> VerificationResult:
query_fields = _fields_from_string(value)
return self._verify_query(
query_fields,
query=value,
context=context,
limit=limit,
input_type="string",
)
def verify_bib_entry(self, entry: BibEntry, context: str = "", limit: int = 5) -> VerificationResult:
query = " ".join(
part
for part in (
entry.fields.get("doi", ""),
entry.fields.get("title", ""),
entry.fields.get("author", ""),
entry.fields.get("year", ""),
)
if part
).strip()
query_fields = {
"title": entry.fields.get("title", ""),
"authors": _split_authors(entry.fields.get("author", "")),
"year": entry.fields.get("year", ""),
"venue": entry.fields.get("journal", "") or entry.fields.get("booktitle", ""),
}
return self._verify_query(
query_fields,
query=query or entry.citation_key,
context=context,
limit=limit,
input_type="bib",
input_key=entry.citation_key,
source_entry=entry,
)
def verify_strings(self, values: list[str], context: str = "", limit: int = 5) -> list[VerificationResult]:
return [self.verify_string(value, context=context, limit=limit) for value in values if value.strip()]
def verify_bib_file(self, path: str | Path, context: str = "", limit: int = 5) -> list[VerificationResult]:
entries = parse_bibtex(Path(path).read_text(encoding="utf-8"))
return [self.verify_bib_entry(entry, context=context, limit=limit) for entry in entries]
def _verify_query(
self,
query_fields: dict[str, object],
*,
query: str,
context: str,
limit: int,
input_type: str,
input_key: str | None = None,
source_entry: BibEntry | None = None,
) -> VerificationResult:
if source_entry is not None and source_entry.fields.get("doi"):
direct = self.resolver.resolve_doi(source_entry.fields["doi"]) or self.resolver.resolve_datacite_doi(
source_entry.fields["doi"]
)
if direct is not None:
return VerificationResult(
query=query,
context=context,
status="exact",
confidence=1.0,
entry=direct.entry,
source_label=direct.source_label,
alternates=[],
input_type=input_type,
input_key=input_key,
)
candidate_limit = max(1, limit)
candidates = self._collect_candidates(
title=str(query_fields.get("title", "")),
query=query,
limit=candidate_limit,
)
scored = [
VerificationMatch(
entry=entry,
score=_score_candidate(query_fields, context, entry),
source_label=source_label,
)
for entry, source_label in candidates
]
scored.sort(
key=lambda item: (
-item.score,
item.entry.fields.get("year", ""),
item.entry.citation_key,
)
)
best = scored[0] if scored else None
if best is None:
fallback_entry = source_entry or _placeholder_entry(query_fields, query, input_key)
return VerificationResult(
query=query,
context=context,
status="not_found",
confidence=0.0,
entry=fallback_entry,
source_label="none",
alternates=[],
input_type=input_type,
input_key=input_key,
)
status = _status_from_match(best)
return VerificationResult(
query=query,
context=context,
status=status,
confidence=best.score,
entry=best.entry,
source_label=best.source_label,
alternates=scored[1: min(len(scored), 4)],
input_type=input_type,
input_key=input_key,
)
def _collect_candidates(self, *, title: str, query: str, limit: int) -> list[tuple[BibEntry, str]]:
candidates: list[tuple[BibEntry, str]] = []
seen: set[str] = set()
search_title = title or query
for source_name, source_entries in (
("crossref", self.resolver.search_crossref(search_title, limit=limit)),
("openalex", self.resolver.search_openalex(search_title, limit=limit)),
("datacite", self.resolver.search_datacite(search_title, limit=limit)),
):
for entry in source_entries:
signature = _candidate_signature(entry)
if signature in seen:
continue
seen.add(signature)
candidates.append((entry, f"{source_name}:search:{search_title}"))
return candidates
def render_verification_results(results: list[VerificationResult], output_format: str) -> str:
if output_format == "json":
return json.dumps([result.to_dict() for result in results], indent=2)
return render_bibtex([result.to_bib_entry() for result in results])
def _fields_from_string(value: str) -> dict[str, object]:
year_match = re.search(r"\b(1[6-9]\d{2}|20\d{2}|21\d{2})\b", value)
year = year_match.group(1) if year_match else ""
quoted_title = re.search(r"[\"“”‘’'`](.+?)[\"“”‘’'`]", value)
title = quoted_title.group(1).strip() if quoted_title else ""
author_source = value
if quoted_title:
author_source = author_source.replace(quoted_title.group(0), " ")
if year:
author_source = author_source.replace(year, " ")
author_tokens = [token.strip(",.;:") for token in author_source.split() if token.strip(",.;:")]
authors: list[str] = [author_tokens[0]] if author_tokens else []
return {"title": title, "authors": authors, "year": year, "venue": ""}
def _score_candidate(query_fields: dict[str, object], context: str, entry: BibEntry) -> float:
score = 0.0
query_title = _tokenize(str(query_fields.get("title", "")))
candidate_title = _tokenize(entry.fields.get("title", ""))
if query_title:
overlap = len(query_title & candidate_title) / max(1, len(query_title))
if overlap >= 0.9:
score += 0.55
elif overlap >= 0.7:
score += 0.40
elif overlap >= 0.5:
score += 0.20
query_authors = [author for author in query_fields.get("authors", []) if author]
if query_authors:
query_surname = _surname(query_authors[0])
        # Guard against author fields that parse to an empty author list
        # (e.g. stray "and" separators), which would otherwise raise IndexError.
        candidate_authors = _split_authors(entry.fields.get("author", ""))
        candidate_surname = _surname(candidate_authors[0]) if candidate_authors else ""
if query_surname and query_surname == candidate_surname:
score += 0.25
query_year = str(query_fields.get("year", "")).strip()
candidate_year = entry.fields.get("year", "").strip()
if query_year and candidate_year:
if query_year == candidate_year:
score += 0.15
else:
try:
delta = abs(int(query_year) - int(candidate_year))
if delta == 1:
score += 0.07
except ValueError:
pass
query_venue = str(query_fields.get("venue", "")).strip()
candidate_venue = entry.fields.get("journal", "").strip() or entry.fields.get("booktitle", "").strip()
if query_venue and candidate_venue and _normalize(query_venue) == _normalize(candidate_venue):
score += 0.05
if context:
context_tokens = _tokenize(context)
abstract_tokens = _tokenize(entry.fields.get("abstract", ""))
if context_tokens & abstract_tokens:
score += 0.05
return min(score, 1.0)
def _status_from_match(match: VerificationMatch) -> str:
if match.entry.fields.get("doi") and match.score >= 0.95:
return "exact"
if match.score >= 0.75:
return "high_confidence"
return "ambiguous"
def _split_authors(value: str) -> list[str]:
return [part.strip() for part in value.split(" and ") if part.strip()]
def _surname(value: str) -> str:
text = value.strip()
if not text:
return ""
if "," in text:
return text.split(",", 1)[0].strip().lower()
return text.split()[-1].strip().lower()
def _tokenize(value: str) -> set[str]:
return {token for token in re.split(r"\W+", value.lower()) if token}
def _normalize(value: str) -> str:
return " ".join(value.lower().split())
def _serialize_alternate(match: VerificationMatch) -> str:
authors = _split_authors(match.entry.fields.get("author", ""))
first_author = authors[0] if authors else ""
return "|".join(
(
match.entry.fields.get("doi", ""),
match.entry.fields.get("title", ""),
first_author,
match.entry.fields.get("year", ""),
f"{match.score:.2f}",
)
)
def _candidate_signature(entry: BibEntry) -> str:
return "|".join(
(
entry.fields.get("doi", "").lower(),
_normalize(entry.fields.get("title", "")),
entry.fields.get("year", ""),
)
)
def _placeholder_entry(query_fields: dict[str, object], query: str, input_key: str | None) -> BibEntry:
title = str(query_fields.get("title", "")) or query
authors = query_fields.get("authors", [])
year = str(query_fields.get("year", ""))
citation_key = input_key or _slugify_key(title or query)
fields = {"title": title}
if authors:
fields["author"] = " and ".join(str(author) for author in authors)
if year:
fields["year"] = year
return BibEntry(entry_type="misc", citation_key=citation_key, fields=fields)
def _slugify_key(value: str) -> str:
slug = re.sub(r"[^a-z0-9]+", "", value.lower())
return slug[:40] or "verification"
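The candidate scoring in `_score_candidate` above is a weighted heuristic over title overlap, surname, year, venue, and context. The core title-overlap tier can be sketched standalone; this is a simplified illustration with assumed names (`title_overlap_score` is not part of the package), not the module's exact implementation:

```python
import re


def title_overlap_score(query_title: str, candidate_title: str) -> float:
    """Score by the fraction of query-title tokens that appear in the candidate title."""
    def tokenize(value: str) -> set[str]:
        return {token for token in re.split(r"\W+", value.lower()) if token}

    query_tokens = tokenize(query_title)
    if not query_tokens:
        return 0.0
    overlap = len(query_tokens & tokenize(candidate_title)) / len(query_tokens)
    # Tiers mirror the thresholds used above: strong, good, then weak partial match.
    if overlap >= 0.9:
        return 0.55
    if overlap >= 0.7:
        return 0.40
    if overlap >= 0.5:
        return 0.20
    return 0.0


score = title_overlap_score(
    "Graph-first bibliography augmentation",
    "Graph-first bibliography augmentation",
)
print(score)  # identical titles give full overlap, so the top tier: 0.55
```

Note that the tiers reward recall on the query side only, so a short query matching inside a longer candidate title still scores well; the surname and year bonuses then separate near-duplicates.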


@@ -138,6 +138,115 @@ def test_cli_provenance_and_status_updates(tmp_path: Path):
assert "reviewed" in status.stdout
def test_cli_verify_string_outputs_json_with_audit_fields(tmp_path: Path):
from citegeist.bibtex import BibEntry
database = tmp_path / "library.sqlite3"
with patch("citegeist.cli.BibliographyVerifier.verify_string") as mocked_verify:
from citegeist.verify import VerificationResult
mocked_verify.return_value = VerificationResult(
query='"Graph-first bibliography augmentation" Smith 2024',
context="citation graphs",
status="high_confidence",
confidence=0.82,
entry=BibEntry(
entry_type="article",
citation_key="smith2024graphs",
fields={
"author": "Smith, Jane",
"title": "Graph-first bibliography augmentation",
"year": "2024",
"doi": "10.1000/example-doi",
},
),
source_label="crossref:search:Graph-first bibliography augmentation",
alternates=[],
input_type="string",
input_key=None,
)
stdout_buffer = io.StringIO()
with redirect_stdout(stdout_buffer):
exit_code = main(
[
"--db",
str(database),
"verify",
"--string",
'"Graph-first bibliography augmentation" Smith 2024',
"--context",
"citation graphs",
"--format",
"json",
]
)
assert exit_code == 0
payload = json.loads(stdout_buffer.getvalue())
assert payload[0]["status"] == "high_confidence"
assert payload[0]["source_label"] == "crossref:search:Graph-first bibliography augmentation"
assert payload[0]["entry"]["citation_key"] == "smith2024graphs"
def test_cli_verify_bib_outputs_json(tmp_path: Path):
bib_path = tmp_path / "partial.bib"
bib_path.write_text(
"""
@misc{roughentry,
title = {Graph-first bibliography augmentation},
year = {2024}
}
""",
encoding="utf-8",
)
with patch("citegeist.cli.BibliographyVerifier.verify_bib_file") as mocked_verify:
from citegeist.bibtex import BibEntry
from citegeist.verify import VerificationResult
mocked_verify.return_value = [
VerificationResult(
query="Graph-first bibliography augmentation 2024",
context="",
status="ambiguous",
confidence=0.61,
entry=BibEntry(
entry_type="article",
citation_key="candidate2024",
fields={
"title": "Graph-first bibliography augmentation",
"year": "2024",
},
),
source_label="openalex:search:Graph-first bibliography augmentation",
alternates=[],
input_type="bib",
input_key="roughentry",
)
]
stdout_buffer = io.StringIO()
with redirect_stdout(stdout_buffer):
exit_code = main(
[
"--db",
str(tmp_path / "library.sqlite3"),
"verify",
"--bib",
str(bib_path),
"--format",
"json",
]
)
assert exit_code == 0
payload = json.loads(stdout_buffer.getvalue())
assert payload[0]["status"] == "ambiguous"
assert payload[0]["input_key"] == "roughentry"
assert payload[0]["entry"]["citation_key"] == "candidate2024"
def test_cli_resolve_updates_entry(tmp_path: Path):
bib_path = tmp_path / "input.bib"
bib_path.write_text(

tests/test_verify.py Normal file

@@ -0,0 +1,89 @@
from __future__ import annotations
from citegeist.bibtex import BibEntry
from citegeist.resolve import Resolution
from citegeist.verify import BibliographyVerifier
def test_verifier_uses_direct_doi_resolution_for_bib_entries():
verifier = BibliographyVerifier()
verifier.resolver.resolve_doi = lambda value: Resolution( # type: ignore[method-assign]
entry=BibEntry(
entry_type="article",
citation_key="doi101000example",
fields={
"author": "Smith, Jane",
"title": "Resolved Work",
"year": "2024",
"doi": value,
},
),
source_type="resolver",
source_label=f"crossref:doi:{value}",
)
result = verifier.verify_bib_entry(
BibEntry(
entry_type="misc",
citation_key="seed2024",
fields={"title": "Rough Work", "doi": "10.1000/example"},
)
)
assert result.status == "exact"
assert result.confidence == 1.0
assert result.entry.fields["title"] == "Resolved Work"
assert result.source_label == "crossref:doi:10.1000/example"
def test_verifier_scores_and_sorts_search_candidates():
verifier = BibliographyVerifier()
verifier.resolver.search_crossref = lambda title, limit=5: [ # type: ignore[method-assign]
BibEntry(
entry_type="article",
citation_key="goodmatch",
fields={
"author": "Smith, Jane",
"title": "Graph-first bibliography augmentation",
"year": "2024",
"doi": "10.1000/good",
},
),
BibEntry(
entry_type="article",
citation_key="weaker",
fields={
"author": "Doe, Alex",
"title": "Graph search methods",
"year": "2023",
},
),
]
verifier.resolver.search_openalex = lambda title, limit=5: [] # type: ignore[method-assign]
verifier.resolver.search_datacite = lambda title, limit=5: [] # type: ignore[method-assign]
result = verifier.verify_string('"Graph-first bibliography augmentation" Smith 2024')
assert result.entry.citation_key == "goodmatch"
assert result.status in {"high_confidence", "exact"}
assert result.alternates[0].entry.citation_key == "weaker"
def test_verification_result_to_bib_entry_contains_audit_fields():
verifier = BibliographyVerifier()
verifier.resolver.search_crossref = lambda title, limit=5: [] # type: ignore[method-assign]
verifier.resolver.search_openalex = lambda title, limit=5: [] # type: ignore[method-assign]
verifier.resolver.search_datacite = lambda title, limit=5: [] # type: ignore[method-assign]
result = verifier._verify_query( # type: ignore[attr-defined]
{"title": "Missing Work", "authors": [], "year": "", "venue": ""},
query="Missing Work",
context="",
limit=1,
input_type="string",
)
bib_entry = result.to_bib_entry()
assert bib_entry.fields["x_status"] == "not_found"
assert bib_entry.fields["x_query"] == "Missing Work"