# Sources-First Status
**Current Focus:** Identify and prioritize the next open bibliographic sources to add, using the existing SQLite-based workflow as the baseline.

---
## Phase Matrix
| Phase | Title | Status | Outcome |
|-------|-------|--------|---------|
| **0** | Scope Reframe | ✅ Complete | Planning now directly addresses which open sources to add next |
| **1** | Source Layer Tightening | ✅ Complete | Registry, CrossRef plugin, compatibility seam, and source catalog repaired |
| **2** | Next Open Source Additions | 🚧 In Progress | OpenCitations, Unpaywall, Europe PMC, and Semantic Scholar integrated |
| **3** | Optional Source Evaluation | ⏳ Planned | OpenAIRE to be evaluated later if repository-acquisition breadth matters |
| **D** | Database / Vector Expansion | ⏸ Deferred | Not required for the current source-incorporation decision |
---
## Test Coverage Summary
```
✅ test_sources_plugin.py
✅ test_sources_catalog.py
✅ existing full suite still expected to pass
```
---
## Key Artifacts
### Documentation
```
docs/
├── source-landscape.md ✅ Source inventory and recommendations
├── implementation-progress.md ✅ Sources-first progress tracker
└── phase-completion.md ✅ Short status summary
```
### Source Layer
```
src/citegeist/sources/
├── base.py ✅ Base source interface
├── catalog.py ✅ Source inventory in code
├── registry.py ✅ Registry for known source classes
├── crossref.py ✅ Repaired CrossRef plugin
└── _old_sources_compat.py ✅ Repo-relative compatibility bridge
```
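
As a rough illustration of how a base interface and a registry of known source classes can fit together, here is a minimal sketch. It is not the code in `base.py` or `registry.py`: the names `BaseSource`, `SourceRegistry`, and the `enabled_sources` config key are all hypothetical, chosen only to show config-driven loading that rejects unknown source names.

```python
from typing import Dict, List, Type


class BaseSource:
    """Hypothetical stand-in for the interface defined in base.py."""

    name: str = "base"

    def search(self, query: str) -> List[dict]:
        raise NotImplementedError


class SourceRegistry:
    """Maps source names to classes and instantiates the ones a config enables."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type[BaseSource]] = {}

    def register(self, cls: Type[BaseSource]) -> Type[BaseSource]:
        # Usable as a class decorator; the class itself is returned unchanged.
        self._classes[cls.name] = cls
        return cls

    def load(self, config: dict) -> Dict[str, BaseSource]:
        # "enabled_sources" is an assumed config key, not the project's schema.
        enabled = config.get("enabled_sources", [])
        unknown = [n for n in enabled if n not in self._classes]
        if unknown:
            raise KeyError(f"unknown sources in config: {unknown}")
        return {n: self._classes[n]() for n in enabled}


registry = SourceRegistry()


@registry.register
class CrossRefSource(BaseSource):
    name = "crossref"

    def search(self, query: str) -> List[dict]:
        return []  # placeholder: no network call in this sketch


sources = registry.load({"enabled_sources": ["crossref"]})
```

Failing loudly on an unknown name is one possible design choice; a real registry might instead skip unknown entries and log a warning.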
### Tests
```
tests/
├── test_sources_plugin.py ✅ Source plugin tests
└── test_sources_catalog.py ✅ Source catalog/registry tests
```
---
## Key Features Implemented
- ✅ Source catalog covering current and candidate open sources
- ✅ Config-driven registry loading for known real source classes
- ✅ CrossRef normalization that works for both single-record and search-result payloads
- ✅ Compatibility bridge that no longer depends on one checkout path
- ✅ OpenCitations DOI-based graph expansion with CLI support
- ✅ Unpaywall OA-link enrichment with CLI support
- ✅ Europe PMC biomedical resolver/search support
- ✅ Semantic Scholar broad-science resolver/search support
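
The CrossRef normalization item can be illustrated with a short sketch. The CrossRef REST API wraps a single work in a `message` object and wraps search results in `message.items`; the functions below accept either shape and return a flat list of records. The function names and the output keys (`doi`, `title`, `year`, `authors`) are assumptions made for this example, not the fields used by `crossref.py`.

```python
from typing import Any, Dict, List, Optional


def _first(values: Any) -> str:
    """CrossRef returns `title` as a list of strings; take the first entry."""
    if isinstance(values, list) and values:
        return str(values[0])
    return ""


def _year(work: Dict[str, Any]) -> Optional[int]:
    """Read the year from `issued.date-parts`, e.g. [[2020, 5, 1]]."""
    date_parts = (work.get("issued") or {}).get("date-parts") or [[]]
    first = date_parts[0] if date_parts else []
    return first[0] if first else None


def normalize_work(work: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one CrossRef work object into a small record (assumed shape)."""
    authors = [
        " ".join(p for p in (a.get("given"), a.get("family")) if p)
        for a in work.get("author", [])
    ]
    return {
        "doi": (work.get("DOI") or "").lower(),
        "title": _first(work.get("title")),
        "year": _year(work),
        "authors": authors,
    }


def normalize_payload(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Accept a single-record or a search-result payload; always return a list."""
    message = payload.get("message", payload)
    if isinstance(message.get("items"), list):
        works = message["items"]  # search result: message-type "work-list"
    else:
        works = [message]  # single record: message-type "work"
    return [normalize_work(w) for w in works]
```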
---
## Next Milestones
### Immediate
1. Decide whether repository-acquisition scope justifies `OpenAIRE`
2. Keep the OA-enrichment flow aligned with review/export needs
3. Keep graph-source scope disciplined as broader coverage grows
### Later
1. Re-evaluate `Semantic Scholar` coverage now that it is integrated
2. Evaluate `OpenAIRE`
3. Revisit database/vector work only if a concrete source need demands it
---
## Success Metrics
### Completed
- ✅ Planning now addresses the actual question of which open sources to add next
- ✅ Source-layer defects from the first pass have been corrected
- ✅ OpenCitations is now a working integrated source
- ✅ Unpaywall is now a working integrated source
- ✅ Europe PMC is now a working integrated source
- ✅ Semantic Scholar is now a working integrated source
- ✅ The next source priorities are explicit
### Planned
- ⏳ Stricter source-selection discipline before adding more integrations
---
## Recommendations
1. Treat the current SQLite/FTS workflow as the baseline, not as a blocker.
2. Add source integrations only when they materially improve bibliographic coverage, citation coverage, or open-access linkage.
3. Keep database/vector work explicitly subordinate to source-incorporation needs.
---
**Last Updated:** 2026-04-25

**Status:** Sources-first plan in effect

**Confidence:** High