Sources-First Status

Current Focus: identify and prioritize the next open bibliographic sources to add, using the existing SQLite-based workflow as the baseline.


Phase Matrix

| Phase | Title | Status | Outcome |
| --- | --- | --- | --- |
| 0 | Scope Reframe | Complete | Planning now answers the source question directly |
| 1 | Source Layer Tightening | Complete | Registry, CrossRef plugin, compatibility seam, and source catalog repaired |
| 2 | Next Open Source Additions | 🚧 In Progress | OpenCitations, Unpaywall, and Europe PMC integrated |
| 3 | Optional Source Evaluation | Planned | OpenAIRE evaluated later if acquisition breadth matters |
| D | Database / Vector Expansion | ⏸ Deferred | Not required for the current source-incorporation decision |

Test Coverage Summary

✅ test_sources_plugin.py
✅ test_sources_catalog.py
✅ existing full suite still expected to pass

Key Artifacts

Documentation

docs/
├── source-landscape.md          ✅ Source inventory and recommendations
├── implementation-progress.md   ✅ Sources-first progress tracker
└── phase-completion.md          ✅ Short status summary

Source Layer

src/citegeist/sources/
├── base.py                      ✅ Base source interface
├── catalog.py                   ✅ Source inventory in code
├── registry.py                  ✅ Registry for known source classes
├── crossref.py                  ✅ Repaired CrossRef plugin
└── _old_sources_compat.py       ✅ Repo-relative compatibility bridge
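
As a rough illustration of how a base interface and a registry of known source classes can fit together, here is a minimal sketch. It is not the code in `base.py` or `registry.py`: the names `BaseSource`, `SourceRegistry`, `resolve`, and the `enabled_sources` config key are all assumptions made for this example.

```python
from typing import Dict, List, Type


class BaseSource:
    """Hypothetical base interface; the real base.py may look different."""

    name = "base"

    def resolve(self, doi: str) -> dict:
        raise NotImplementedError


class SourceRegistry:
    """Maps source names to known classes and instantiates the enabled ones."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type[BaseSource]] = {}

    def register(self, cls: Type[BaseSource]) -> Type[BaseSource]:
        self._classes[cls.name] = cls
        return cls  # returning cls lets register() double as a class decorator

    def load(self, config: dict) -> List[BaseSource]:
        # "enabled_sources" is an assumed config key, not a documented one.
        enabled = config.get("enabled_sources", [])
        unknown = [n for n in enabled if n not in self._classes]
        if unknown:
            # Fail loudly rather than silently skipping a misspelled source.
            raise KeyError(f"unknown sources in config: {unknown}")
        return [self._classes[n]() for n in enabled]


registry = SourceRegistry()


@registry.register
class CrossRefSource(BaseSource):
    name = "crossref"

    def resolve(self, doi: str) -> dict:
        # Placeholder body: a real plugin would query the CrossRef API here.
        return {"source": self.name, "doi": doi.lower()}


@registry.register
class UnpaywallSource(BaseSource):
    name = "unpaywall"

    def resolve(self, doi: str) -> dict:
        # Placeholder body: a real plugin would query the Unpaywall API here.
        return {"source": self.name, "doi": doi.lower()}


# Only the sources named in the config are instantiated.
sources = registry.load({"enabled_sources": ["crossref"]})
```

Restricting the registry to explicitly registered classes means a config typo raises an error instead of quietly disabling a source.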

Tests

tests/
├── test_sources_plugin.py       ✅ Source plugin tests
└── test_sources_catalog.py      ✅ Source catalog/registry tests

Key Features Implemented

  • Source catalog covering current and candidate open sources
  • Config-driven registry loading for known real source classes
  • CrossRef normalization that works for both single-record and search-result payloads
  • Compatibility bridge that no longer depends on one checkout path
  • OpenCitations DOI-based graph expansion with CLI support
  • Unpaywall OA-link enrichment with CLI support
  • Europe PMC biomedical resolver/search support
  • Semantic Scholar broad-science resolver/search support
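
The CrossRef normalization point can be sketched as follows. The CrossRef REST API wraps a single work directly in `message`, while search results nest the works under `message["items"]`. The function name `normalize_crossref`, the output keys (`doi`, `title`, `year`), and the sample DOIs are assumptions for illustration only, not the plugin's actual interface.

```python
from typing import Any, Dict, List, Optional


def _first_year(work: Dict[str, Any]) -> Optional[int]:
    # CrossRef dates look like {"date-parts": [[2021, 3, 15]]}; parts may be missing or null.
    parts = (work.get("issued") or {}).get("date-parts") or [[]]
    first = parts[0] if parts else []
    return first[0] if first else None


def _normalize_work(work: Dict[str, Any]) -> Dict[str, Any]:
    titles = work.get("title") or []  # CrossRef returns the title as a list of strings
    return {
        "doi": (work.get("DOI") or "").lower(),
        "title": titles[0] if titles else "",
        "year": _first_year(work),
    }


def normalize_crossref(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a flat list of records for either CrossRef response shape."""
    message = payload.get("message") or {}
    if isinstance(message.get("items"), list):
        works = message["items"]  # search-result ("work-list") payload
    elif message:
        works = [message]  # single-record ("work") payload
    else:
        works = []
    return [_normalize_work(w) for w in works]


# Hypothetical sample payloads, trimmed to the fields used above.
single = {
    "message-type": "work",
    "message": {
        "DOI": "10.1234/Example.One",
        "title": ["First example"],
        "issued": {"date-parts": [[2021, 3]]},
    },
}
search = {
    "message-type": "work-list",
    "message": {
        "items": [
            single["message"],
            {"DOI": "10.1234/example.two", "title": [], "issued": {"date-parts": [[None]]}},
        ]
    },
}

single_records = normalize_crossref(single)
search_records = normalize_crossref(search)
```

Because both shapes return a list, downstream code does not need to branch on which endpoint produced the data.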

Next Milestones

Immediate

  1. Decide whether repository-acquisition scope justifies OpenAIRE
  2. Keep the OA-enrichment flow aligned with review/export needs
  3. Keep graph-source scope disciplined as broader coverage grows

Later

  1. Evaluate Semantic Scholar
  2. Evaluate OpenAIRE
  3. Revisit database/vector work only if a concrete source need demands it

Success Metrics

Completed

  • Planning now matches the actual source question
  • Source-layer defects from the first pass have been corrected
  • OpenCitations is now a working integrated source
  • Unpaywall is now a working integrated source
  • Europe PMC is now a working integrated source
  • Semantic Scholar is now a working integrated source
  • The next source priorities are explicit

Planned

  • Better source selection discipline before adding more integrations

Recommendations

  1. Treat the current SQLite/FTS workflow as the baseline, not as a blocker.
  2. Add source integrations only when they materially improve bibliographic coverage, citation coverage, or open-access linkage.
  3. Keep database/vector work explicitly subordinate to source-incorporation needs.

Last Updated: 2026-04-25

Status: Sources-first plan in effect

Confidence: High