# Docs

The top-level documentation in this repository is intended to describe `GroundRecall` as a standalone project.

Primary docs:

- [quickstart.md](quickstart.md)
- [architecture.md](architecture.md)
- [llmwiki-import.md](llmwiki-import.md)
- [sync-roadmap.md](sync-roadmap.md)

Legacy extraction notes:

- [legacy/groundrecall-assistant-architecture.md](legacy/groundrecall-assistant-architecture.md)
- [legacy/groundrecall-ingestion-refactor.md](legacy/groundrecall-ingestion-refactor.md)
- [legacy/groundrecall-llmwiki-import.md](legacy/groundrecall-llmwiki-import.md)
- [legacy/groundrecall-migration-plan.md](legacy/groundrecall-migration-plan.md)
- [legacy/groundrecall-repo-bootstrap.md](legacy/groundrecall-repo-bootstrap.md)

Those legacy documents were carried over from the earlier `Didactopus`-embedded phase. They remain useful as design history, but they are not the preferred starting point for current standalone `GroundRecall` work.
# Architecture

`GroundRecall` is the grounded knowledge substrate in a larger stack:

- `GroundRecall`: canonical knowledge ingestion, promotion, query, export, and future sync
- `Didactopus`: learner-facing workflows and educational tooling
- `GenieHive`: model and routing layer where runtime assistant/service resolution is needed

## Core Design

The system is built around one canonical flow:

1. ingest weakly structured sources
2. normalize them into stable knowledge objects
3. lint and queue them for review
4. promote reviewed objects into a canonical store
5. query and export promoted state
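One way to picture that flow end to end is as a chain of stage functions. Every name below is a hypothetical stand-in for illustration, not the real GroundRecall API:

```python
# Hypothetical sketch of the canonical flow; function names mirror the
# five stages above and are not the actual package surface.

def ingest(raw_sources: list[str]) -> list[dict]:
    # 1. ingest weakly structured sources
    return [{"text": source} for source in raw_sources]

def normalize(fragments: list[dict]) -> list[dict]:
    # 2. normalize them into stable knowledge objects
    return [{"claim_text": f["text"].strip(), "status": "draft"} for f in fragments]

def lint_and_queue(objects: list[dict]) -> list[dict]:
    # 3. lint and queue for review (one toy check: empty claims)
    for obj in objects:
        obj["findings"] = [] if obj["claim_text"] else ["empty_claim"]
    return objects

def promote(objects: list[dict]) -> list[dict]:
    # 4. promote reviewed objects with no findings into the canonical store
    return [{**obj, "status": "promoted"} for obj in objects if not obj["findings"]]

def export(store: list[dict]) -> list[str]:
    # 5. query and export promoted state
    return [obj["claim_text"] for obj in store if obj["status"] == "promoted"]

store = promote(lint_and_queue(normalize(ingest(["  capacity bounds rate  ", ""]))))
```

In this toy run the empty source never reaches the store: it is flagged at the lint stage and filtered at promotion.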
## Core Objects

The canonical store is built from these object families:

- `Source`
- `Fragment`
- `Artifact`
- `Observation`
- `Claim`
- `Concept`
- `Relation`
- `ReviewCandidate`
- `PromotionRecord`
- `GroundRecallSnapshot`

These objects are assistant-neutral. Assistant-specific formatting belongs at the adapter layer.
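As a rough illustration, two of these families might be shaped like the following dataclasses. The field names borrow from the import contracts elsewhere in these docs, but this is not the actual `groundrecall.models` schema:

```python
from dataclasses import dataclass, field

# Illustrative shapes only; field names are assumptions.

@dataclass
class Claim:
    claim_id: str
    claim_text: str
    supporting_fragment_ids: list[str] = field(default_factory=list)
    current_status: str = "draft"

@dataclass
class PromotionRecord:
    claim_id: str
    promoted_by: str
    verdict: str  # e.g. "accept" or "reject"

claim = Claim(claim_id="clm_001", claim_text="Channel capacity bounds reliable rate.")
record = PromotionRecord(claim_id=claim.claim_id, promoted_by="reviewer_1", verdict="accept")
```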
## Package Surface

The main standalone package surface is:

- `groundrecall.ingest`
- `groundrecall.lint`
- `groundrecall.models`
- `groundrecall.store`
- `groundrecall.promotion`
- `groundrecall.query`
- `groundrecall.export`
- `groundrecall.assistant_export`
- `groundrecall.inspect`
- `groundrecall.source_adapters.*`
- `groundrecall.assistants.*`

There are also compatibility-style helper modules prefixed with `groundrecall_` inside the package. Those exist because the standalone repo was extracted from an earlier monorepo layout.

## Source Adapters

Adapters handle source-shape-specific discovery and mapping while the downstream pipeline stays generic.

Current adapter families include:

- `llmwiki`
- `markdown_notes`
- `transcript`
- `didactopus_pack`

## Assistant Boundary

Assistant integration is intentionally outside the core store and query semantics.

The rule is:

- core `GroundRecall` owns truth, provenance, lifecycle, and retrieval semantics
- assistant adapters own presentation, bundle shaping, and tool-specific exports

Current adapters include:

- `codex`
- `claude_code`

## Alpha Boundary

The current alpha is strong enough for:

- local import and promotion
- canonical query and export
- assistant-neutral bundles
- assistant-targeted bundle generation

It is not yet complete for:

- multi-node sync and merge
- re-import/update semantics
- richer review adjudication
- large-scale distributed corpus integration
# GroundRecall Assistant Integration Architecture

This document defines how GroundRecall should support Codex, Claude Code, and future assistant environments without treating any single assistant as the authoritative integration target.

## Design rule

GroundRecall core must be assistant-agnostic.

Assistant-specific formats are derived views over promoted GroundRecall objects, not the canonical representation of knowledge.

## Why this boundary matters

If assistant-specific prompt packaging leaks into the core model too early, GroundRecall becomes:

- harder to evolve
- harder to validate
- harder to sync across machines
- harder to support across multiple assistant environments

The stable boundary should instead be:

- canonical grounded knowledge objects in core
- assistant adapters at the edge

## Core vs adapter split

### Core GroundRecall responsibilities

These should remain assistant-neutral:

- schemas for `Source`, `Fragment`, `Artifact`, `Observation`, `Claim`, `Concept`, `Relation`, `ReviewCandidate`, and `PromotionRecord`
- provenance and confidence modeling
- contradiction and supersession handling
- linting and review queue generation
- review and promotion workflows
- persistent storage for promoted objects
- query and retrieval semantics
- sync and multi-machine consolidation
- canonical export formats

### Assistant adapter responsibilities

These should be adapter-specific:

- prompt/context packaging
- assistant-specific bundle layout
- memory-file rendering
- skill-file rendering
- assistant capability declarations
- token-budget shaping and truncation policy
- tool-specific metadata

## Canonical export contract

GroundRecall should export assistant-neutral artifacts first.

Recommended canonical exports:

- `groundrecall_snapshot.json`
- `claims.jsonl`
- `concepts.jsonl`
- `relations.jsonl`
- `provenance_manifest.json`
- `query_bundle.json`

Assistant adapters then derive secondary outputs from those canonical exports.

## Assistant adapter interface

GroundRecall should expose a small adapter protocol.

Example shape:

```python
from pathlib import Path
from typing import Protocol


class AssistantAdapter(Protocol):
    name: str

    def export_bundle(self, snapshot: dict, out_dir: Path) -> list[Path]:
        ...

    def build_context(self, query_result: dict) -> dict:
        ...

    def supported_capabilities(self) -> dict[str, bool]:
        ...
```

This is a strategy/plugin boundary. A small registry or factory is acceptable, but the important architectural decision is the separation of concerns, not the factory itself.
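To make the boundary concrete, here is a minimal adapter satisfying that protocol, with the protocol repeated so the sketch stands alone. The class name, bundle file name, and capability keys are invented for illustration:

```python
import json
from pathlib import Path
from typing import Protocol

class AssistantAdapter(Protocol):
    name: str

    def export_bundle(self, snapshot: dict, out_dir: Path) -> list[Path]: ...
    def build_context(self, query_result: dict) -> dict: ...
    def supported_capabilities(self) -> dict[str, bool]: ...

class ExampleAdapter:
    # Hypothetical adapter: renders the whole snapshot as one JSON bundle.
    name = "example"

    def export_bundle(self, snapshot: dict, out_dir: Path) -> list[Path]:
        out_dir.mkdir(parents=True, exist_ok=True)
        bundle = out_dir / "bundle.json"
        bundle.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
        return [bundle]

    def build_context(self, query_result: dict) -> dict:
        # Presentation-only shaping; canonical semantics stay in core.
        return {"claims": [c["claim_text"] for c in query_result.get("claims", [])]}

    def supported_capabilities(self) -> dict[str, bool]:
        return {"memory_files": False, "skill_files": False}

adapter: AssistantAdapter = ExampleAdapter()
```

Because `Protocol` uses structural typing, `ExampleAdapter` needs no inheritance; any class with the right methods satisfies the boundary.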
## Recommended package layout

Recommended modules:

- `didactopus.groundrecall.models`
- `didactopus.groundrecall.store`
- `didactopus.groundrecall.promotion`
- `didactopus.groundrecall.query`
- `didactopus.groundrecall.export`
- `didactopus.groundrecall.assistants.base`
- `didactopus.groundrecall.assistants.codex`
- `didactopus.groundrecall.assistants.claude_code`

## Export layering

Recommended filesystem layout:

- `exports/canonical/`
- `exports/assistants/codex/`
- `exports/assistants/claude-code/`

Canonical exports remain the durable interchange format.

Assistant exports remain reproducible derived artifacts.

## Query layering

The query layer should return assistant-neutral structures such as:

- relevant claims
- supporting fragments
- provenance
- contradictions
- supersessions
- confidence and recency
- suggested next actions

Adapters may then convert this payload into:

- Codex skill/context bundles
- Claude Code project memory/context bundles
- future assistant context packages
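Sketched as a typed payload, the neutral structure might look like this. The field names are assumptions drawn from the bullet list, not a published GroundRecall schema:

```python
from typing import TypedDict

class QueryResult(TypedDict):
    claims: list[dict]            # relevant claims
    fragments: list[dict]         # supporting fragments
    provenance: list[dict]
    contradictions: list[dict]
    supersessions: list[dict]
    confidence: dict              # confidence and recency signals
    suggested_actions: list[str]  # suggested next actions

empty: QueryResult = {
    "claims": [],
    "fragments": [],
    "provenance": [],
    "contradictions": [],
    "supersessions": [],
    "confidence": {},
    "suggested_actions": [],
}
```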
## Stability policy

GroundRecall should adopt these rules early:

1. No assistant-specific fields in canonical `Claim` or `Concept` objects.
2. No assistant-specific persistence formats as authoritative storage.
3. No review or promotion decisions based on assistant-specific packaging.
4. Assistant adapters may be added or removed without changing canonical objects.

## Migration implication

Current and future GroundRecall work should replace language like:

- "Codex-facing export"
- "Codex skill bundle"

with:

- "assistant adapter bundle"
- "assistant-facing export"
- "assistant-specific derived bundle"

Codex can still be one adapter and may remain the first implemented adapter, but it should not define the system boundary.

## Immediate implementation impact

The next GroundRecall milestones should be interpreted as:

1. build assistant-neutral canonical models and storage
2. build review and promotion over canonical objects
3. build canonical query and export layers
4. add assistant adapters as thin renderers over those canonical outputs

This is the lowest-risk path for long-term stability.
# GroundRecall Ingestion Refactor Plan

GroundRecall should treat `llmwiki` as one upstream source shape, not as the defining architecture for grounded knowledge import.

Didactopus already has broader ambitions around ingestion of weakly structured materials such as:

- markdown notes
- transcripts
- HTML/text course materials
- generated draft packs
- review sessions
- learner artifacts

The GroundRecall import pipeline should therefore be generalized around a shared normalization and promotion substrate with pluggable source adapters.

## Design rule

Source-specific logic should live at the ingestion edge.

These stages should be generic:

- segmentation
- extraction
- normalization
- lint
- review queue generation
- review bridge
- promotion
- canonical store
- query
- canonical export

## Recommended module split

Recommended package layout:

- `didactopus.groundrecall_ingest`
- `didactopus.groundrecall_source_adapters.base`
- `didactopus.groundrecall_source_adapters.llmwiki`
- `didactopus.groundrecall_source_adapters.markdown_notes`
- `didactopus.groundrecall_source_adapters.transcript`
- `didactopus.groundrecall_source_adapters.didactopus_pack`
- `didactopus.groundrecall_source_adapters.didactopus_review`

## Shared intermediate envelope

Adapters should emit shared discovery records rather than jumping straight into canonical GroundRecall objects.

Recommended intermediate types:

- `DiscoveredImportSource`
- `SegmentCandidate`
- `ImportProfile`

This keeps adapter-specific parsing separate from the shared import pipeline.
## Output intent

Not every imported source should be treated the same way.

Adapters should declare an output intent:

- `grounded_knowledge`
- `curriculum`
- `both`

Examples:

- `llmwiki` usually targets `grounded_knowledge`
- loose transcripts may target `grounded_knowledge`
- syllabus/course folders often target `curriculum`
- Didactopus packs or review sessions may target `both`

## First refactor milestones

### Milestone 1

- introduce adapter registry and adapter protocol
- move current `llmwiki` discovery/classification behind an adapter
- preserve the current import CLI behavior
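The registry-plus-protocol pairing in Milestone 1 could start as small as this sketch; the registration API and class name are assumptions, not existing didactopus code:

```python
# Minimal adapter registry sketch. Real discovery logic would move behind
# the registered adapter class.

_ADAPTERS: dict[str, type] = {}

def register_adapter(name: str):
    def wrap(cls: type) -> type:
        _ADAPTERS[name] = cls
        return cls
    return wrap

def get_adapter(name: str):
    return _ADAPTERS[name]()

@register_adapter("llmwiki")
class LlmwikiAdapter:
    # Would wrap the current llmwiki discovery/classification behavior.
    def discover(self, root: str) -> list[str]:
        return []  # placeholder: real discovery walks raw/, wiki/, logs/, sources/
```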
### Milestone 2

- add a `markdown_notes` adapter
- add a `transcript` adapter
- add import profiles that tune extraction strictness

### Milestone 3

- add a `didactopus_pack` adapter for pack and review artifacts
- allow current Didactopus outputs to feed into GroundRecall directly

## Why this matters

This avoids building two parallel ingestion stacks inside Didactopus:

- one for packs and educational structures
- another for grounded knowledge capture

Instead, the system gets one generic ingestion substrate with multiple source adapters and multiple downstream promotion/export paths.
# GroundRecall `llmwiki` Import Specification

This document defines the first-pass import path for users who already have some form of `llmwiki`-style repository and want to migrate it into the broader GroundRecall substrate while staying compatible with Didactopus review and promotion flows.

## Goal

The import path should let an existing `llmwiki` corpus become:

- searchable without immediate manual cleanup
- reviewable rather than blindly trusted
- grounded in explicit provenance
- promotable into durable structured knowledge objects
- exportable back into compiled wiki pages, assistant adapter bundles, and queryable graph artifacts

The key rule is:

Imported wiki pages are **derived artifacts**, not automatic source truth.

## Import philosophy

Users coming from `llmwiki` often have a mixture of:

- raw notes
- compiled markdown pages
- local source files
- generated summaries
- ad hoc link graphs
- session transcripts
- speculative or weakly-supported synthesis

GroundRecall should preserve that work without pretending all of it is already promoted knowledge.

The import pipeline therefore has two responsibilities:

1. Preserve the original material with minimal loss.
2. Reify explicit structured objects that can later be reviewed and promoted.

## Scope of the first implementation

The first implementation should support common `llmwiki` layouts such as:

- `raw/`
- `wiki/`
- `schema.*`
- `logs/`
- `sources/`
- top-level markdown pages

The importer should not require a canonical upstream schema. It should operate from directory conventions plus simple heuristics.

## Import modes

### 1. `archive`

Purpose:

- preserve an existing `llmwiki` tree as read-only imported artifacts
- index it for search and later review

Behavior:

- no claim promotion
- minimal extraction
- all compiled pages remain `draft`

Use when:

- the user wants backward compatibility first
- the corpus quality is unknown

### 2. `quick`

Purpose:

- bootstrap usable structured objects fast

Behavior:

- import pages and raw sources
- extract candidate claims and concepts heuristically
- attach lightweight provenance
- queue uncertain items for review

Use when:

- the user wants early utility and accepts heuristic noise

### 3. `grounded`

Purpose:

- perform a migration suitable for long-lived shared knowledge

Behavior:

- require provenance for promoted claims
- mark unsupported statements explicitly
- produce review records and lint findings
- populate promotion queues rather than auto-promoting

Use when:

- the imported corpus will be shared across machines or agents

## Pipeline stages

### 1. Capture

The importer records the source repository as an import artifact.

Required metadata:

- `import_id`
- `import_mode`
- `source_root`
- `imported_at`
- `machine_id`
- `agent_id`
- `source_repo_kind=llmwiki`

Outputs:

- import manifest
- artifact records for all discovered files

### 2. Segment

Imported content is split into stable units.

Primary segment types:

- `source_document`
- `source_fragment`
- `compiled_page`
- `section_summary`
- `candidate_claim`
- `candidate_concept`
- `candidate_relation`
- `session_observation`

Segmentation should preserve:

- original path
- section heading
- line or byte offsets when possible
- page title
- frontmatter fields
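Heading-based segmentation of a compiled page could start from a sketch like this, which keeps the section heading and line offsets the stage requires (a simplification: frontmatter and byte offsets are ignored):

```python
def segment_by_headings(markdown: str) -> list[dict]:
    # Split a compiled page into per-heading segments, preserving the
    # section heading and line offsets.
    segments: list[dict] = []
    heading, body, start = None, [], 1
    lines = markdown.splitlines()
    for i, line in enumerate(lines, start=1):
        if line.startswith("#"):
            # Close the previous segment (skip an empty leading region).
            if heading is not None or any(part.strip() for part in body):
                segments.append({"heading": heading, "text": "\n".join(body).strip(),
                                 "line_start": start, "line_end": i - 1})
            heading, body, start = line.lstrip("#").strip(), [], i
        else:
            body.append(line)
    segments.append({"heading": heading, "text": "\n".join(body).strip(),
                     "line_start": start, "line_end": len(lines)})
    return segments

page = "# Channel Capacity\nIntro text.\n## Definition\nCapacity bounds rate.\n"
segments = segment_by_headings(page)
```

For this sample page the function yields two segments, one per heading, each carrying its line range.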
### 3. Classify

Each segment gets a semantic role.

Recommended roles:

- `source`
- `derivation`
- `claim`
- `summary`
- `question`
- `todo`
- `speculation`
- `obsolete`
- `transcript`

This prevents unsupported prose from being confused with grounded knowledge.

### 4. Ground

Each imported segment gets provenance and support metadata.

Required grounding fields:

- `origin_artifact_id`
- `origin_path`
- `origin_section`
- `source_url` when known
- `retrieval_date` when known
- `machine_id`
- `session_id` when known
- `support_kind`
- `grounding_status`

Suggested values:

- `support_kind`: `direct_source`, `derived_from_page`, `derived_from_session`, `inferred`, `unknown`
- `grounding_status`: `grounded`, `partially_grounded`, `ungrounded`

### 5. Normalize

The importer emits explicit GroundRecall objects.

Minimum object set:

- `Source`
- `Fragment`
- `Artifact`
- `Observation`
- `Claim`
- `Concept`
- `Relation`

### 6. Lint

The importer produces machine-readable findings before promotion.

Required lint checks:

- claim has no supporting fragment
- multiple claims appear text-identical
- concept is orphaned
- relation points to missing concept
- page summary has no cited support
- imported item marked `obsolete` still linked as current
- same claim imported with conflicting confidence or polarity
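The first two checks, sketched over records shaped like the claim contracts in this document; the finding codes themselves are illustrative:

```python
def lint_claims(claims: list[dict]) -> list[dict]:
    # Two required checks: claims with no supporting fragment, and
    # text-identical duplicate claims.
    findings = []
    seen: dict[str, str] = {}
    for claim in claims:
        if not claim.get("supporting_fragment_ids"):
            findings.append({"claim_id": claim["claim_id"],
                             "code": "claim_without_supporting_fragment"})
        text = claim["claim_text"].strip().lower()
        if text in seen:
            findings.append({"claim_id": claim["claim_id"],
                             "code": "duplicate_claim_text",
                             "duplicate_of": seen[text]})
        else:
            seen[text] = claim["claim_id"]
    return findings

findings = lint_claims([
    {"claim_id": "clm_001", "claim_text": "X bounds Y.",
     "supporting_fragment_ids": ["frag_1"]},
    {"claim_id": "clm_002", "claim_text": "x bounds y.",
     "supporting_fragment_ids": []},
])
```

The second claim trips both checks: it has no supporting fragment and duplicates the first claim's text up to case.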
### 7. Promote

Imported objects enter existing Didactopus review/promotion lanes rather than becoming trusted immediately.

Recommended states:

- `draft`
- `triaged`
- `reviewed`
- `promoted`
- `superseded`
- `archived`

### 8. Export

Promoted objects can then be rendered back out as:

- compiled wiki pages
- graph snapshots
- assistant adapter bundles
- review reports
- query bundles for assistant-facing use

## Object contracts

### `ImportedArtifact`

```json
{
  "artifact_id": "ia_001",
  "import_id": "imp_2026_04_16_a",
  "artifact_kind": "compiled_page",
  "path": "wiki/channel-capacity.md",
  "title": "Channel Capacity",
  "sha256": "abc123",
  "created_at": "2026-04-16T14:00:00Z",
  "metadata": {
    "frontmatter": {},
    "headings": ["Definition", "Examples"]
  },
  "current_status": "draft"
}
```

### `ImportedObservation`

```json
{
  "observation_id": "obs_001",
  "import_id": "imp_2026_04_16_a",
  "artifact_id": "ia_001",
  "role": "summary",
  "text": "Capacity bounds reliable communication over a noisy channel.",
  "origin_path": "wiki/channel-capacity.md",
  "origin_section": "Definition",
  "line_start": 12,
  "line_end": 14,
  "grounding_status": "partially_grounded",
  "support_kind": "derived_from_page",
  "confidence_hint": 0.63,
  "current_status": "draft"
}
```

### `ImportedClaim`

```json
{
  "claim_id": "clm_001",
  "import_id": "imp_2026_04_16_a",
  "claim_text": "Channel capacity is the maximum reliable communication rate for a channel model.",
  "claim_kind": "definition",
  "source_observation_ids": ["obs_001"],
  "supporting_fragment_ids": ["frag_014"],
  "concept_ids": ["concept::channel-capacity"],
  "confidence_hint": 0.74,
  "grounding_status": "grounded",
  "current_status": "triaged"
}
```

### `ImportedConcept`

```json
{
  "concept_id": "concept::channel-capacity",
  "import_id": "imp_2026_04_16_a",
  "title": "Channel Capacity",
  "aliases": [],
  "description": "Imported concept from llmwiki corpus.",
  "source_artifact_ids": ["ia_001"],
  "current_status": "triaged"
}
```

### `ImportedRelation`

```json
{
  "relation_id": "rel_001",
  "import_id": "imp_2026_04_16_a",
  "source_id": "concept::shannon-entropy",
  "target_id": "concept::channel-capacity",
  "relation_type": "supports_understanding_of",
  "evidence_ids": ["obs_015"],
  "current_status": "draft"
}
```

## Mapping from `llmwiki` into GroundRecall

Recommended first-pass mapping:

- `raw/*` -> `Source` or `Artifact(kind=raw_note)`
- `wiki/*.md` -> `Artifact(kind=compiled_page)`
- frontmatter -> artifact metadata
- headings -> section boundaries
- linked page names -> candidate `Concept` and `Relation`
- bullet or sentence extraction -> candidate `Observation` and `Claim`
- chat or session logs -> `Observation(kind=session_note)`
- schema files -> import metadata only unless a future adapter exists

## Confidence and trust policy

Imported confidence must remain clearly separate from reviewed confidence.

Recommended fields:

- `confidence_hint`
- `review_confidence`
- `grounding_status`
- `review_verdict`

Policy:

- `confidence_hint` comes from heuristic import scoring
- `review_confidence` exists only after review
- promotion requires at least `partially_grounded`
- fully ungrounded claims can be stored, but only as `draft` or `archived`
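That policy can be condensed into a single predicate; the exact field handling below is an assumption about how the recommended fields would combine:

```python
def promotion_eligible(claim: dict) -> bool:
    # A claim must actually have been reviewed (review_confidence exists
    # only after review) and must be at least partially grounded.
    if claim.get("review_confidence") is None:
        return False
    return claim.get("grounding_status") in {"grounded", "partially_grounded"}

# A high confidence_hint alone is never enough: it is heuristic import
# scoring, not a review verdict.
hint_only = {"confidence_hint": 0.9, "grounding_status": "grounded"}
reviewed = {"review_confidence": 0.8, "grounding_status": "partially_grounded"}
ungrounded = {"review_confidence": 0.8, "grounding_status": "ungrounded"}
```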
## Provenance policy

The importer should follow the existing Didactopus provenance direction:

- preserve source identity
- preserve retrieval date when available
- preserve adaptation status
- keep both human-readable and machine-readable provenance

When only a compiled wiki page exists and the original source is missing:

- the compiled page becomes the immediate origin artifact
- all extracted claims must be marked `derived_from_page`
- such claims should not auto-promote in `grounded` mode

## Review and promotion integration

Imported `Claim` and `Concept` objects should feed into the same general review machinery already used for pack-oriented promotion:

- create candidate records
- attach lint findings
- route to a triage lane
- collect review verdicts
- emit promotion records

Suggested triage lanes:

- `knowledge_capture`
- `pack_improvement`
- `skill_export`
- `source_cleanup`
- `conflict_resolution`

## Module layout

First-pass module layout:

- `didactopus.groundrecall_import`

  Entry points and top-level orchestration.

- `didactopus.groundrecall_discovery`

  Finds `llmwiki`-style files and classifies paths.

- `didactopus.groundrecall_segmenter`

  Splits pages and logs into stable observations and candidate claims.

- `didactopus.groundrecall_normalizer`

  Emits normalized import objects.

- `didactopus.groundrecall_lint`

  Import-time lint checks.

- `didactopus.groundrecall_review_bridge`

  Converts imported objects into review candidates and promotion records.

- `didactopus.groundrecall_export`

  Renders promoted objects back to wiki, graph, and skill artifacts.

## CLI shape

Suggested CLI:

```bash
python -m didactopus.groundrecall.cli import /path/to/llmwiki --mode archive
python -m didactopus.groundrecall.cli import /path/to/llmwiki --mode quick
python -m didactopus.groundrecall.cli import /path/to/llmwiki --mode grounded
python -m didactopus.groundrecall.cli lint imports/<import-id>
python -m didactopus.groundrecall.cli promote imports/<import-id> /path/to/store
python -m didactopus.groundrecall.cli export /path/to/store exports/groundrecall --concept channel-capacity
```

Compatibility wrappers still exist during migration:

```bash
python -m didactopus.groundrecall_import /path/to/llmwiki --mode grounded
python -m didactopus.groundrecall_lint imports/<import-id>
python -m didactopus.groundrecall_export /path/to/store exports/groundrecall --concept channel-capacity
```

## Filesystem layout

Suggested repository-local layout:

- `imports/<import-id>/manifest.json`
- `imports/<import-id>/artifacts.jsonl`
- `imports/<import-id>/observations.jsonl`
- `imports/<import-id>/claims.jsonl`
- `imports/<import-id>/concepts.jsonl`
- `imports/<import-id>/relations.jsonl`
- `imports/<import-id>/lint_findings.json`
- `imports/<import-id>/review_queue.json`

This keeps imported state auditable and easy to sync across machines.
## Multi-machine sync implication

For distributed assistant use, imported state should be append-oriented and rebuildable.

Recommended sync primitives:

- import manifests
- normalized jsonl object streams
- review records
- promotion records

Non-authoritative derived artifacts:

- rendered wiki pages
- local indexes
- embeddings
- cache files

This allows multiple machines to contribute import events without making the compiled page tree the merge primitive.
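A minimal sketch of the append-oriented primitive, assuming one jsonl stream per object family (function names are illustrative):

```python
import json
from pathlib import Path

def append_records(path: Path, records: list[dict]) -> None:
    # Append-only: machines only ever append normalized objects, so
    # streams from different machines concatenate cleanly.
    with path.open("a", encoding="utf-8") as stream:
        for record in records:
            stream.write(json.dumps(record, sort_keys=True) + "\n")

def load_records(path: Path) -> list[dict]:
    # Derived state (indexes, rendered pages) can always be rebuilt
    # from the stream, so it never needs to be the merge primitive.
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
```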
## First implementation milestones
|
||||||
|
|
||||||
|
### Milestone 1
|
||||||
|
|
||||||
|
- discover `raw/` and `wiki/`
|
||||||
|
- import artifacts
|
||||||
|
- segment markdown by headings
|
||||||
|
- emit observations and candidate claims
|
||||||
|
- write import manifest and jsonl outputs
### Milestone 2

- add grounding metadata
- add lint checks
- add triage lanes and review queue output

### Milestone 3

- map promoted claims into assistant-neutral exports plus assistant adapter bundles
- render compiled wiki views from promoted objects
- support multi-machine import manifests and merge-safe event storage

## Non-goals for the first pass

- perfect semantic claim extraction
- automatic trust assignment
- full upstream `llmwiki` schema compatibility
- lossless import of every custom plugin or script
- embeddings-first retrieval

The first pass should be conservative, inspectable, and easy to improve.
@ -0,0 +1,281 @@

# GroundRecall Migration Plan

This document turns the boundary decisions in [deployment-modes.md](deployment-modes.md) into an implementation plan.

The goal is not an immediate repo split. The goal is to let `GroundRecall` become independently deployable and operable without destabilizing ongoing `Didactopus` learner work.

## Current State

Today, GroundRecall exists as a set of modules under `src/didactopus/`:

- `groundrecall_import`
- `groundrecall_source_adapters/*`
- `groundrecall_lint`
- `groundrecall_review_queue`
- `groundrecall_review_bridge`
- `groundrecall_models`
- `groundrecall_store`
- `groundrecall_promotion`
- `groundrecall_query`
- `groundrecall_export`
- `groundrecall_assistant_export`
- `groundrecall_assistants/*`

This is acceptable as an implementation phase, but it creates two risks:

1. generic knowledge-substrate functionality may continue to accrete under `didactopus.main`
2. feature work may silently assume the presence of learner-facing Didactopus components

## Migration Goal

Target state:

- `Didactopus` remains the learner-facing application
- `GroundRecall` becomes the standalone grounded knowledge substrate
- `GenieHive` remains the model and routing control plane

The package, CLI, and deployment boundaries should eventually reflect that.

## Target Ownership

### GroundRecall should own

- source ingestion and normalization
- claim/concept/relation/artifact/provenance schemas
- canonical store and snapshots
- lint and review queue generation
- promotion and merge semantics
- assistant-neutral query and export
- assistant adapter export
- sync, merge, and team/shared knowledge operations

### Didactopus should own

- learner session flows
- mentor/practice/evaluator/project-advisor workflows
- pack and curriculum-specific review UX
- mastery-ledger and learner evidence experiences
- educational packaging over grounded knowledge

### Shared boundary helpers should stay narrow

- provider policy that depends on GenieHive route resolution but serves learner workflows
- review bridges where GroundRecall needs to feed an existing Didactopus review process during the transition

## Packaging Direction

### Phase 0: Present layout, stricter discipline

Keep the code in `src/didactopus/`, but use naming and imports that preserve the eventual split.

Rules:

- new generic knowledge features go into `groundrecall_*` modules
- new learner-facing features go into `didactopus` learner modules
- do not add generic knowledge operations to `didactopus.main`
- treat review bridges as bridges, not permanent core ownership

### Phase 1: Explicit namespace inside the repo

Preferred direction:

- move GroundRecall modules under `src/didactopus/groundrecall/`

Target structure:

- `src/didactopus/groundrecall/ingest.py`
- `src/didactopus/groundrecall/source_adapters/`
- `src/didactopus/groundrecall/models.py`
- `src/didactopus/groundrecall/store.py`
- `src/didactopus/groundrecall/promotion.py`
- `src/didactopus/groundrecall/query.py`
- `src/didactopus/groundrecall/export.py`
- `src/didactopus/groundrecall/assistants/`
- `src/didactopus/groundrecall/sync.py`
- `src/didactopus/groundrecall/merge.py`
- `src/didactopus/groundrecall/cli.py`

Benefits:

- cleaner conceptual grouping
- easier extraction later
- clearer import discipline

Compatibility path:

- keep thin wrapper modules at old import paths during transition
- deprecate wrappers only after tests and docs have moved
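A thin wrapper at an old flat path might look like the following sketch. The concrete module chosen (`groundrecall_models`) is only an example; this is one conventional shim pattern, not the project's confirmed approach:

```python
# didactopus/groundrecall_models.py -- transitional compatibility wrapper.
#
# Re-export the relocated module under the old flat import path, and warn
# callers so the wrapper can be retired once tests and docs have moved.
import warnings

warnings.warn(
    "didactopus.groundrecall_models has moved to didactopus.groundrecall.models",
    DeprecationWarning,
    stacklevel=2,
)

from didactopus.groundrecall.models import *  # noqa: E402,F401,F403
```

Emitting `DeprecationWarning` (rather than printing) lets test suites surface remaining old-path imports via warning filters.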
### Phase 2: Dual CLI identity

Before any repo split, expose GroundRecall as a first-class CLI namespace.

Desired commands:

- `python -m didactopus.groundrecall.cli import ...`
- `python -m didactopus.groundrecall.cli lint ...`
- `python -m didactopus.groundrecall.cli promote ...`
- `python -m didactopus.groundrecall.cli query ...`
- `python -m didactopus.groundrecall.cli export ...`
- `python -m didactopus.groundrecall.cli inspect ...`
At that point, `didactopus.main` should only surface:

- learner-facing commands
- review-workflow commands with educational intent
- possibly a pointer to GroundRecall commands, but not ownership of them

### Phase 3: Optional package extraction

Only after sync/merge and standalone use are mature:

- move GroundRecall to its own package or repo if that becomes operationally useful
- keep Didactopus consuming it as a dependency

This step is optional. A clean package boundary inside one repo may be sufficient for a long time.

## CLI Migration Plan

### Keep under `didactopus.main`

- `review`
- future learner-facing workbench commands

### Move toward GroundRecall CLI

- import
- lint
- review queue
- promotion
- canonical query
- canonical export
- assistant export
- sync and merge

### Transitional exception

`provider-inspect` can remain on the Didactopus umbrella CLI for now because:

- it is already useful operationally
- it supports learner-node deployments
- it is not a GroundRecall-specific operation

Longer term, it may also belong on a separate operator surface depending on whether Didactopus becomes the standard local application shell.
## Module Mapping

### Move first

Current -> target

- `didactopus.groundrecall_import` -> `didactopus.groundrecall.ingest`
- `didactopus.groundrecall_source_adapters.*` -> `didactopus.groundrecall.source_adapters.*`
- `didactopus.groundrecall_models` -> `didactopus.groundrecall.models`
- `didactopus.groundrecall_store` -> `didactopus.groundrecall.store`
- `didactopus.groundrecall_promotion` -> `didactopus.groundrecall.promotion`
- `didactopus.groundrecall_query` -> `didactopus.groundrecall.query`
- `didactopus.groundrecall_export` -> `didactopus.groundrecall.export`
- `didactopus.groundrecall_assistants.*` -> `didactopus.groundrecall.assistants.*`

### Keep as transitional bridges

- `didactopus.groundrecall_review_bridge`
- source adapters that ingest Didactopus-native artifacts

These are legitimate but should be documented as cross-boundary adapters rather than intrinsic ownership proof.

### Stay in Didactopus

- `learner_session`
- `learner_session_demo`
- `mentor`
- `practice`
- `project_advisor`
- educational review UX modules
- pack and graph-planning modules
## Service Boundary Direction

### GroundRecall service candidates

Once needed, a GroundRecall service should focus on:

- canonical knowledge query
- import status and queue inspection
- promotion status
- sync/merge status
- assistant-neutral bundle retrieval

### Didactopus service candidates

- learner session orchestration
- learner progress and evaluation
- pack/workbench interactions

### GenieHive service candidates

- model and service inspection
- route resolution
- cluster health
## Milestones

### Milestone 1: Namespace discipline

Done when:

- new generic knowledge work lands only in GroundRecall-oriented modules
- `didactopus.main` stops growing generic knowledge commands
- docs consistently describe GroundRecall as a substrate, not a learner feature

### Milestone 2: Internal package reorganization

Done when:

- GroundRecall modules live under an explicit package path
- old flat import paths are wrappers only
- tests target the new package paths

### Milestone 3: First-class GroundRecall CLI

Done when:

- import/lint/promote/query/export/inspect are available under one GroundRecall CLI surface
- operator docs no longer require `Didactopus` framing for generic knowledge tasks

### Milestone 4: Sync and merge maturity

Done when:

- append-only event ingestion exists
- promoted-state merge semantics exist
- team/shared knowledge workflows are practical without learner workflows

### Milestone 5: Extraction decision

Done when:

- the project can make an informed choice between:
  - one repo, multiple packages
  - separate GroundRecall package/repo

## Immediate Next Work

Recommended next implementation steps:

1. Introduce `didactopus.groundrecall` as an internal package namespace.
2. Add a single GroundRecall umbrella CLI module.
3. Keep thin wrapper modules for compatibility.
4. Start moving docs and tests to the new namespace.
5. Begin implementing sync/merge primitives under GroundRecall rather than under Didactopus learner flows.

## Decision Rule For New Work

Before adding a new command, module, or service, ask:

1. Would this still be needed if there were no learner session?
2. Would a team using only shared knowledge still need it?
3. Is the canonical artifact knowledge state or educational interaction?
4. Would it still matter if the Didactopus UI vanished?

If yes, default toward GroundRecall.
@ -0,0 +1,286 @@

# GroundRecall Repo Bootstrap Checklist

This document turns the broader [groundrecall-migration-plan.md](groundrecall-migration-plan.md) into a practical checklist for creating a standalone `GroundRecall` repository.

The goal here is narrower than full feature completion. The goal is to get to a standalone repository that can be installed, run locally, and used for real `llmwiki++`-style work without requiring `Didactopus` as the primary shell.

## Bootstrap Goal

Minimum viable standalone `GroundRecall` repo:

- installable as its own Python package
- exposes a first-class `groundrecall` CLI
- imports and normalizes knowledge sources
- promotes reviewed knowledge into a canonical store
- supports query and export over promoted state
- supports assistant-neutral exports plus adapter exports
- remains consumable by `Didactopus` as a dependency or sibling package

This is enough for a local standalone alpha. It is not yet the full distributed team and corpus-scale vision.

## What Already Exists

The current `Didactopus` codebase already contains most of the implementation spine:

- `didactopus.groundrecall.ingest`
- `didactopus.groundrecall.source_adapters.*`
- `didactopus.groundrecall.models`
- `didactopus.groundrecall.store`
- `didactopus.groundrecall.promotion`
- `didactopus.groundrecall.query`
- `didactopus.groundrecall.export`
- `didactopus.groundrecall.assistant_export`
- `didactopus.groundrecall.assistants.*`
- `didactopus.groundrecall.inspect`
- `didactopus.groundrecall.cli`

This means the repo bootstrap is primarily a packaging and boundary exercise, not a greenfield implementation.

## Target Repo Shape

Suggested standalone layout:

```text
groundrecall/
  pyproject.toml
  README.md
  LICENSE
  src/
    groundrecall/
      __init__.py
      cli.py
      ingest.py
      inspect.py
      lint.py
      models.py
      store.py
      promotion.py
      query.py
      export.py
      assistant_export.py
      review_queue.py
      review_bridge.py
      source_adapters/
      assistants/
  tests/
  docs/
    quickstart.md
    llmwiki-import.md
    deployment-modes.md
    assistant-architecture.md
    sync-roadmap.md
```
Notes:

- `review_bridge.py` may remain optional if the standalone repo only needs generic review artifacts.
- `review_queue.py` belongs in `GroundRecall`; it is not a learner-only concern.
- `review_bridge.py` is the most likely file to stay transitional if it depends too directly on Didactopus review objects.

## Move / Keep / Bridge

### Move into standalone `GroundRecall`

Move first:

- `didactopus.groundrecall.ingest`
- `didactopus.groundrecall.inspect`
- `didactopus.groundrecall.lint`
- `didactopus.groundrecall.models`
- `didactopus.groundrecall.store`
- `didactopus.groundrecall.promotion`
- `didactopus.groundrecall.query`
- `didactopus.groundrecall.export`
- `didactopus.groundrecall.assistant_export`
- `didactopus.groundrecall.review_queue`
- `didactopus.groundrecall.source_adapters.*`
- `didactopus.groundrecall.assistants.*`
- `didactopus.groundrecall.cli`

### Keep in `Didactopus`

These should not move:

- learner session and mentor/practice flows
- educational pack authoring and pack-specific UX
- mastery/evidence learner experiences
- provider demos that exist to support Didactopus learner workflows

### Keep as temporary bridges

These may need a staged treatment:

- `groundrecall_review_bridge`
- `didactopus_pack` source adapter

Those are useful during transition, but they are cross-boundary integrations, not proof that `GroundRecall` must remain inside `Didactopus`.

## Bootstrap Checklist

### 1. Create the new repo skeleton

Required:

- create a new repo root
- add `pyproject.toml`
- add `src/groundrecall/`
- add `tests/`
- add `docs/`
- add a minimal `README.md`
- add `LICENSE`

Definition of done:

- `pip install -e .` works
- `python -m groundrecall.cli --help` works
### 2. Move the package code

Required:

- copy the current `didactopus.groundrecall.*` package into `src/groundrecall/`
- update relative imports as needed
- remove `didactopus`-prefixed assumptions in docstrings and parser help text

Definition of done:

- module imports succeed under `groundrecall.*`
- no package file requires `didactopus` imports except explicit transition bridges

### 3. Extract the tests

Required:

- move GroundRecall-focused tests into the new repo
- keep Didactopus integration tests in Didactopus
- add an end-to-end CLI smoke test that runs:
  - `import`
  - `promote`
  - `query`
  - `export`
  - `inspect`

Definition of done:

- the new repo has its own passing test suite
- Didactopus retains only integration tests that prove interoperability
### 4. Harden the standalone CLI

Required commands:

- `groundrecall import`
- `groundrecall lint`
- `groundrecall promote`
- `groundrecall query`
- `groundrecall export`
- `groundrecall inspect`

Recommended additions:

- `groundrecall assistant-export`
- `groundrecall review-queue`

Definition of done:

- the CLI help text is standalone and does not refer users back to `Didactopus`

### 5. Publish a repo-local data layout

Pick and document a stable layout such as:

```text
.groundrecall/
  imports/
  store/
  exports/
  events/
```

Required:

- make these paths configurable
- define sane defaults
- remove assumptions that the caller already knows the Didactopus workspace layout

Definition of done:

- a new user can run GroundRecall in an empty directory and get predictable local state
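One way to get configurable paths with sane defaults is a small layout object. This is a sketch: the `GROUNDRECALL_HOME` environment variable and the `Layout` class are illustrative names, not a settled interface.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


def _default_root() -> Path:
    # Hypothetical override knob; the real flag/env name is undecided.
    return Path(os.environ.get("GROUNDRECALL_HOME", ".groundrecall"))


@dataclass
class Layout:
    """Repo-local data layout with sane defaults and a single root override."""

    root: Path = field(default_factory=_default_root)

    @property
    def imports(self) -> Path:
        return self.root / "imports"

    @property
    def store(self) -> Path:
        return self.root / "store"

    @property
    def exports(self) -> Path:
        return self.root / "exports"

    @property
    def events(self) -> Path:
        return self.root / "events"

    def ensure(self) -> "Layout":
        """Create the tree so GroundRecall works in an empty directory."""
        for path in (self.imports, self.store, self.exports, self.events):
            path.mkdir(parents=True, exist_ok=True)
        return self
```

Everything hangs off one `root`, so a single override relocates the whole state tree predictably.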
### 6. Document the standalone workflows

At minimum:

- quickstart
- migrate from `llmwiki`
- query and export patterns
- assistant adapter exports
- relationship to `Didactopus`
- relationship to `GenieHive`

Definition of done:

- the README can orient a new user without requiring Didactopus-specific context

### 7. Leave compatibility shims in `Didactopus`

Required:

- keep thin wrappers at `didactopus.groundrecall_*` or `didactopus.groundrecall.*` integration paths as needed
- make `Didactopus` import the extracted package where possible
- clearly mark the wrappers as compatibility paths

Definition of done:

- existing Didactopus workflows do not break during the split
## Alpha Completion Criteria

The standalone repo is alpha-ready when:

- `llmwiki` import works
- `markdown_notes` import works
- at least one Didactopus-native adapter still works as an integration adapter
- canonical store creation and snapshot export work
- query works over promoted objects
- assistant-neutral export works
- at least two assistant adapters export usable bundles

This is the right threshold for “functional GroundRecall repo.”

## Still Missing After Alpha

A standalone alpha is not yet the full target system. These remain post-bootstrap priorities:

- re-import and update semantics
- append-only event logs for multi-node merge
- shared/private scope support
- merge and sync conflict handling
- stronger claim extraction
- richer claim-level review and adjudication
- corpus-scale distributed coordination

Those features should be built in `GroundRecall`, but they do not need to block repo extraction.

## Recommended Execution Order

Use this order:

1. create the repo and package skeleton
2. copy the current `groundrecall` package and make imports pass
3. move tests and get the standalone suite green
4. finalize CLI and README
5. switch Didactopus integration points to consume the extracted package
6. only then continue with sync/merge and corpus-scale features

This keeps the boundary clean without stalling feature progress.

## First PR-Sized Steps

If this were executed as concrete work, the first three small changes should be:

1. create the new repo with package skeleton and copy `src/didactopus/groundrecall/`
2. move the existing namespace-focused tests and make them pass under `groundrecall.*`
3. add a standalone README quickstart and one end-to-end CLI smoke test

After that, the repo is real enough to iterate in place rather than continuing to plan around it.
@ -0,0 +1,85 @@

# llmwiki Import

`GroundRecall` treats `llmwiki` as one important source shape, not as the defining architecture.

An imported `llmwiki` tree is treated as:

- raw source material
- prior synthesized artifacts
- candidate claims and concepts
- provenance that needs to be normalized and reviewed

Compiled wiki pages are useful artifacts, but they are not automatically promoted as canonical truth.

## Import Modes

### `archive`

- preserve source material with minimal interpretation
- index and normalize without assuming promotion readiness
- useful for long-tail historical corpora

### `quick`

- fast bootstrap mode
- extracts candidate concepts, claims, and relations heuristically
- useful when getting an old corpus into GroundRecall quickly matters more than perfect grounding

### `grounded`

- stricter mode
- expects better provenance and cleaner support signals
- better fit for shared or promoted knowledge

## Import Flow

The normalized import flow is:

1. capture source files
2. discover and classify artifacts
3. segment content into observations
4. normalize claims, concepts, and relations
5. lint the import
6. emit a review queue and review bundle
7. promote reviewed artifacts into the canonical store

## Commands

```bash
groundrecall import /path/to/llmwiki --mode archive
groundrecall import /path/to/llmwiki --mode quick
groundrecall import /path/to/llmwiki --mode grounded

groundrecall lint imports/<import-id>
groundrecall promote imports/<import-id> store/
groundrecall export store/ exports/groundrecall --concept channel-capacity
```
## Current Heuristics

Today’s importer already supports:

- `raw/` and `wiki/` discovery
- markdown and log segmentation
- claim extraction with inline contradiction and supersession markers
- review queue generation
- review bundle export

Areas still planned:

- stronger re-import/update semantics
- more robust transcript and semi-structured document handling
- stronger large-corpus extraction and consolidation

## Recommended Promotion Rule

Treat imported wiki pages as derived artifacts.

That means:

- preserve them
- mine them for claims and concepts
- review what matters
- promote canonical claims and concepts into the store

This is the main difference between `GroundRecall` and a plain markdown wiki.
@ -0,0 +1,97 @@

# Quickstart

`GroundRecall` is a local-first grounded knowledge substrate for `llmwiki++`-style workflows.

This quickstart assumes a fresh checkout of the standalone repository.

## Install

```bash
pip install -e .
groundrecall --help
```

You can also use the module entry point:

```bash
PYTHONPATH=src python -m groundrecall --help
```

## Import A Knowledge Source

Fast import from an `llmwiki`-style tree:

```bash
groundrecall import /path/to/llmwiki --mode quick
```

More conservative import with stronger grounding expectations:

```bash
groundrecall import /path/to/llmwiki --mode grounded
```

The importer writes normalized artifacts under `imports/<import-id>/`.

## Review And Promote

Inspect the import outputs:

```bash
groundrecall lint imports/<import-id>
```

Promote the imported review artifacts into a canonical store:

```bash
groundrecall promote imports/<import-id> store/
```

## Query The Canonical Store

Query a concept:

```bash
groundrecall query store/ channel-capacity
```

Inspect the overall store:

```bash
groundrecall inspect store/
```

## Export

Export assistant-neutral artifacts:

```bash
groundrecall export store/ exports/groundrecall --concept channel-capacity
```

Export assistant-targeted bundles:

```bash
groundrecall assistant-export store/ codex exports/codex --concept channel-capacity
groundrecall assistant-export store/ claude_code exports/claude --concept channel-capacity
```
## Default Working Layout

A simple local layout is:

```text
.groundrecall/
  imports/
  store/
  exports/
  events/
```

The current alpha does not require this exact layout, but it is a sensible starting point.

## Next Reading

- [architecture.md](architecture.md)
- [llmwiki-import.md](llmwiki-import.md)
- [sync-roadmap.md](sync-roadmap.md)
@ -0,0 +1,73 @@

# Sync Roadmap

The current standalone alpha is local-first. Sync and merge are planned next-stage features.

## Goal

Support these use cases cleanly:

- one user across multiple machines
- teams with shared and individual knowledge
- parallel corpus transformation and consolidation

## Planned Model

The intended model is:

- append-only event capture at the edge
- canonical promoted store as the durable reviewed state
- generated exports and assistant bundles as derived artifacts

This avoids treating compiled wiki pages or generated bundles as merge primitives.

## Likely Local Layout

```text
.groundrecall/
  events/
  imports/
  store/
  exports/
```

## Planned Phases

### Phase 1: Re-import And Update Semantics

- import the same source tree repeatedly without duplicating everything
- support import lineage and supersession
- track object continuity across imports

### Phase 2: Event Log Capture

- record machine-local observations and import events
- distinguish machine-local state from promoted shared state
- preserve provenance and timestamps explicitly
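Event log capture along these lines could be sketched as a per-machine jsonl appender. The field names and one-log-per-machine convention are assumptions for illustration, not the decided schema.

```python
import json
import time
import uuid
from pathlib import Path


def append_event(events_dir: Path, kind: str, payload: dict, machine: str) -> dict:
    """Append one event to a machine-local append-only jsonl log.

    Every event carries explicit provenance (machine) and a timestamp, so
    logs from several machines can later be merged by replaying them.
    """
    events_dir.mkdir(parents=True, exist_ok=True)
    event = {
        "event_id": uuid.uuid4().hex,
        "machine": machine,
        "kind": kind,
        "recorded_at": time.time(),
        "payload": payload,
    }
    log = events_dir / f"{machine}.jsonl"
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

Keeping one log per machine means no writer ever contends with another; merging becomes a read-side replay rather than a write-side lock.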
### Phase 3: Merge And Consolidation

- merge append-only events from multiple machines
- consolidate draft claims and review candidates
- preserve contradiction and supersession history

### Phase 4: Shared And Private Scopes

- private notes and private candidate knowledge
- shared promoted knowledge
- controlled promotion from private to shared

### Phase 5: Team And Corpus Workflows

- parallel ingestion over large corpora
- coordinated claim review and adjudication
- export of consolidated assistant-neutral snapshots

## Non-Goals For The Current Alpha

The current repo does not yet provide:

- real-time networked sync
- conflict-free replicated data types
- hosted review services

The next useful milestone is a practical local event-log and re-import model, not a full distributed platform in one step.
@ -0,0 +1,31 @@

[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "groundrecall"
version = "0.1.0a0"
description = "Grounded knowledge substrate for llmwiki++ style workflows."
readme = "README.md"
requires-python = ">=3.10"
license = { text = "MIT" }
authors = [
    { name = "GroundRecall contributors" }
]
dependencies = [
    "pydantic>=2,<3",
    "PyYAML>=6,<7",
]

[project.scripts]
groundrecall = "groundrecall.cli:main"

[tool.setuptools]
package-dir = { "" = "src" }

[tool.setuptools.packages.find]
where = ["src"]

[tool.pytest.ini_options]
pythonpath = ["src"]
testpaths = ["tests"]
|
||||||
|
|
@@ -0,0 +1,34 @@

from __future__ import annotations

from .inspect import inspect_store, summarize_store
from .ingest import ImportResult, build_parser as build_import_parser, main as import_main, run_groundrecall_import
from .models import *  # noqa: F403
from .promotion import build_parser as build_promotion_parser, main as promotion_main, promote_import_to_store
from .query import (
    build_parser as build_query_parser,
    build_query_bundle_for_concept,
    main as query_main,
    query_concept,
    query_provenance,
    search_claims,
)
from .store import GroundRecallStore

__all__ = [
    "GroundRecallStore",
    "ImportResult",
    "run_groundrecall_import",
    "build_import_parser",
    "import_main",
    "promote_import_to_store",
    "build_promotion_parser",
    "promotion_main",
    "query_concept",
    "query_provenance",
    "search_claims",
    "build_query_bundle_for_concept",
    "build_query_parser",
    "query_main",
    "summarize_store",
    "inspect_store",
]
@@ -0,0 +1,5 @@

from .cli import main


if __name__ == "__main__":
    main()
@@ -0,0 +1,81 @@

from typing import Any

from pydantic import BaseModel, Field


class DependencySpec(BaseModel):
    name: str
    min_version: str = "0.0.0"
    max_version: str = "9999.9999.9999"


class MasteryProfileSpec(BaseModel):
    template: str | None = None
    required_dimensions: list[str] = Field(default_factory=list)
    dimension_threshold_overrides: dict[str, float] = Field(default_factory=dict)


class CrossPackLinkSpec(BaseModel):
    source_concept: str
    target_concept: str
    relation: str


class ProfileTemplateSpec(BaseModel):
    required_dimensions: list[str] = Field(default_factory=list)
    dimension_threshold_overrides: dict[str, float] = Field(default_factory=dict)


class PackManifest(BaseModel):
    name: str
    display_name: str
    version: str
    schema_version: str
    didactopus_min_version: str
    didactopus_max_version: str
    description: str = ""
    author: str = ""
    license: str = "unspecified"
    dependencies: list[DependencySpec] = Field(default_factory=list)
    overrides: list[str] = Field(default_factory=list)
    profile_templates: dict[str, ProfileTemplateSpec] = Field(default_factory=dict)
    cross_pack_links: list[CrossPackLinkSpec] = Field(default_factory=list)


class ConceptEntry(BaseModel):
    id: str
    title: str
    description: str = ""
    prerequisites: list[str] = Field(default_factory=list)
    mastery_signals: list[str] = Field(default_factory=list)
    mastery_profile: MasteryProfileSpec = Field(default_factory=MasteryProfileSpec)


class ConceptsFile(BaseModel):
    concepts: list[ConceptEntry]


class RoadmapStageEntry(BaseModel):
    id: str
    title: str
    concepts: list[str] = Field(default_factory=list)
    checkpoint: list[str] = Field(default_factory=list)


class RoadmapFile(BaseModel):
    stages: list[RoadmapStageEntry]


class ProjectEntry(BaseModel):
    id: str
    title: str
    difficulty: str = ""
    prerequisites: list[str] = Field(default_factory=list)
    deliverables: list[str] = Field(default_factory=list)


class ProjectsFile(BaseModel):
    projects: list[ProjectEntry]


class RubricsFile(BaseModel):
    rubrics: list[dict[str, Any]]
@@ -0,0 +1,59 @@

from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Any

from .assistants.base import get_assistant_adapter
from .query import build_query_bundle_for_concept
from .store import GroundRecallStore


def export_assistant_bundle(
    store_dir: str | Path,
    assistant: str,
    out_dir: str | Path,
    concept_refs: list[str] | None = None,
) -> dict[str, Any]:
    store = GroundRecallStore(store_dir)
    snapshot = store.build_snapshot(
        snapshot_id="assistant-export",
        created_at="",
        metadata={"export_kind": "assistant_adapter", "assistant": assistant},
    ).model_dump()
    query_bundles = []
    for concept_ref in concept_refs or []:
        payload = build_query_bundle_for_concept(store_dir, concept_ref)
        if payload is not None:
            query_bundles.append(payload)
    adapter = get_assistant_adapter(assistant)
    paths = adapter.export_bundle(snapshot, query_bundles, out_dir)
    manifest = {
        "assistant": assistant,
        "output_paths": [str(path) for path in paths],
        "query_bundle_count": len(query_bundles),
    }
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    (Path(out_dir) / "assistant_export_manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Export assistant-specific GroundRecall bundles from canonical store data.")
    parser.add_argument("store_dir")
    parser.add_argument("assistant")
    parser.add_argument("out_dir")
    parser.add_argument("--concept", action="append", default=[])
    return parser


def main() -> None:
    args = build_parser().parse_args()
    payload = export_assistant_bundle(
        store_dir=args.store_dir,
        assistant=args.assistant,
        out_dir=args.out_dir,
        concept_refs=list(args.concept or []),
    )
    print(json.dumps(payload, indent=2))
@@ -0,0 +1,2 @@

from __future__ import annotations

@@ -0,0 +1,3 @@

from __future__ import annotations

from ..groundrecall_assistants.base import *  # noqa: F403

@@ -0,0 +1,3 @@

from __future__ import annotations

from ..groundrecall_assistants.claude_code import *  # noqa: F403

@@ -0,0 +1,3 @@

from __future__ import annotations

from ..groundrecall_assistants.codex import *  # noqa: F403
@@ -0,0 +1,239 @@

from __future__ import annotations

import json
import re
from pathlib import Path
from typing import Any


def _load_citegeist_symbols() -> dict[str, Any] | None:
    import sys

    citegeist_src = Path("/home/netuser/bin/CiteGeist/src")
    if citegeist_src.exists():
        sys.path.insert(0, str(citegeist_src))
    try:
        from citegeist.app_api import LiteratureExplorerApi  # type: ignore
        from citegeist.bibtex import BibEntry, parse_bibtex, render_bibtex  # type: ignore
        from citegeist.storage import BibliographyStore  # type: ignore
    except Exception:
        return None
    return {
        "LiteratureExplorerApi": LiteratureExplorerApi,
        "BibEntry": BibEntry,
        "parse_bibtex": parse_bibtex,
        "render_bibtex": render_bibtex,
        "BibliographyStore": BibliographyStore,
    }


def discover_bib_files(source_root: str | Path) -> list[Path]:
    root = Path(source_root)
    if not root.exists():
        return []
    candidates = [
        path
        for path in root.rglob("*.bib")
        if path.is_file() and not path.name.endswith("-bak.bib") and not path.name.startswith(".")
    ]

    def rank(path: Path) -> tuple[int, int, str]:
        rel = path.relative_to(root)
        name = path.name
        if rel == Path("refs.bib"):
            return (0, len(rel.parts), str(rel))
        if rel == Path("biblio.bib"):
            return (1, len(rel.parts), str(rel))
        if name == "refs.bib":
            return (2, len(rel.parts), str(rel))
        if name == "biblio.bib":
            return (3, len(rel.parts), str(rel))
        return (4, len(rel.parts), str(rel))

    return sorted(candidates, key=rank)


def load_bibliography_index(source_root: str | Path) -> dict[str, dict[str, Any]]:
    symbols = _load_citegeist_symbols()
    root = Path(source_root)
    index: dict[str, dict[str, Any]] = {}
    for bib_path in discover_bib_files(root):
        try:
            entries = _parse_bib_entries(bib_path.read_text(encoding="utf-8"), symbols=symbols)
        except Exception:
            continue
        for entry in entries:
            raw_bibtex = _render_entry_bibtex(entry, symbols=symbols)
            payload = {
                "citation_key": entry.citation_key,
                "entry_type": entry.entry_type,
                "fields": dict(entry.fields),
                "source_bib_path": str(bib_path.relative_to(root)),
                "raw_bibtex": raw_bibtex,
                "duplicate_source_bib_paths": [],
            }
            existing = index.get(entry.citation_key)
            if existing is None:
                index[entry.citation_key] = payload
            else:
                existing.setdefault("duplicate_source_bib_paths", []).append(str(bib_path.relative_to(root)))
    return index


def materialize_citegeist_store(import_dir: str | Path, source_root: str | Path) -> dict[str, Any]:
    symbols = _load_citegeist_symbols()
    if symbols is None:
        return {"available": False}
    BibliographyStore = symbols["BibliographyStore"]
    LiteratureExplorerApi = symbols["LiteratureExplorerApi"]

    import_root = Path(import_dir)
    db_path = import_root / "citegeist.sqlite3"
    if db_path.exists():
        db_path.unlink()
    store = BibliographyStore(db_path)
    ingested_files: list[str] = []
    for bib_path in discover_bib_files(source_root):
        try:
            text = bib_path.read_text(encoding="utf-8")
            entries = _parse_bib_entries(text, symbols=symbols)
            for entry in entries:
                store.upsert_entry(
                    entry,
                    raw_bibtex=_render_entry_bibtex(entry, symbols=symbols),
                    source_type="bibtex",
                    source_label=str(bib_path.relative_to(Path(source_root))),
                    review_status="draft",
                )
            store.connection.commit()
            ingested_files.append(str(bib_path.relative_to(Path(source_root))))
        except Exception:
            continue
    api = LiteratureExplorerApi(store)
    return {
        "available": True,
        "db_path": str(db_path),
        "ingested_files": ingested_files,
        "api": api,
        "store": store,
    }


def bibliography_summary_payload(source_root: str | Path) -> dict[str, Any]:
    index = load_bibliography_index(source_root)
    source_files = discover_bib_files(source_root)
    return {
        "enabled": bool(index),
        "entry_count": len(index),
        "source_files": [str(path.relative_to(Path(source_root))) for path in source_files],
    }


def serialize_bib_entry(entry: dict[str, Any] | None) -> dict[str, Any] | None:
    if entry is None:
        return None
    return {
        "citation_key": entry.get("citation_key", ""),
        "entry_type": entry.get("entry_type", ""),
        "fields": dict(entry.get("fields", {})),
        "source_bib_path": entry.get("source_bib_path", ""),
        "raw_bibtex": entry.get("raw_bibtex", ""),
        "duplicate_source_bib_paths": list(entry.get("duplicate_source_bib_paths", [])),
    }


def serialize_citegeist_entry_payload(payload: dict[str, Any] | None) -> dict[str, Any] | None:
    if payload is None:
        return None
    result = dict(payload)
    if "raw_bibtex" in result and isinstance(result["raw_bibtex"], str):
        return result
    return json.loads(json.dumps(result))


def _parse_bib_entries(text: str, *, symbols: dict[str, Any] | None) -> list[Any]:
    if symbols is not None:
        try:
            return symbols["parse_bibtex"](text)
        except Exception:
            pass
    return _fallback_parse_bibtex(text, symbols=symbols)


def _render_entry_bibtex(entry: Any, *, symbols: dict[str, Any] | None) -> str:
    if symbols is not None:
        try:
            return symbols["render_bibtex"]([entry])
        except Exception:
            pass
    fields = []
    for key, value in entry.fields.items():
        fields.append(f" {key} = {{{value}}}")
    body = ",\n".join(fields)
    return f"@{entry.entry_type}{{{entry.citation_key},\n{body}\n}}"


def _fallback_parse_bibtex(text: str, *, symbols: dict[str, Any] | None) -> list[Any]:
    BibEntry = symbols["BibEntry"] if symbols is not None else None
    entries: list[Any] = []
    pattern = re.compile(r"@(?P<entry_type>[A-Za-z]+)\s*\{\s*(?P<citation_key>[^,\s]+)\s*,", re.MULTILINE)
    matches = list(pattern.finditer(text))
    for index, match in enumerate(matches):
        start = match.end()
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        body = text[start:end]
        fields = _fallback_parse_fields(body)
        if BibEntry is not None:
            entries.append(
                BibEntry(
                    entry_type=match.group("entry_type").lower(),
                    citation_key=match.group("citation_key").strip(),
                    fields=fields,
                )
            )
        else:
            entries.append(
                type(
                    "BibEntryFallback",
                    (),
                    {
                        "entry_type": match.group("entry_type").lower(),
                        "citation_key": match.group("citation_key").strip(),
                        "fields": fields,
                    },
                )()
            )
    return entries


def _fallback_parse_fields(body: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    index = 0
    length = len(body)
    while index < length:
        while index < length and body[index] in " \t\r\n,":
            index += 1
        if index >= length or body[index] == "}":
            break
        key_start = index
        while index < length and re.match(r"[A-Za-z0-9_:-]", body[index]):
            index += 1
        key = body[key_start:index].strip().lower()
        while index < length and body[index] in " \t\r\n=":
            index += 1
        value = ""
        if index < length and body[index] == "{":
            depth = 1
            index += 1
            value_start = index
            while index < length and depth > 0:
                if body[index] == "{":
                    depth += 1
                elif body[index] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                index += 1
            value = body[value_start:index].strip()
            index += 1
        elif index < length and body[index] == '"':
            index += 1
            value_start = index
            while index < length and body[index] != '"':
                if body[index] == "\\":
                    index += 1
                index += 1
            value = body[value_start:index].strip()
            index += 1
        else:
            value_start = index
            while index < length and body[index] not in ",\n":
                index += 1
            value = body[value_start:index].strip()
        if key:
            fields[key] = value.rstrip(",")
    return fields
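The fallback parser's core trick is brace-depth counting, which is what lets BibTeX values contain nested braces. A self-contained miniature of that idea (the names `parse_entry_fields` and the regex are illustrative, not part of the module above):

```python
import re


def parse_entry_fields(body: str) -> dict[str, str]:
    """Miniature of the brace-balanced field scan: key = {value} or key = "value"."""
    fields: dict[str, str] = {}
    for match in re.finditer(r"(\w[\w:-]*)\s*=\s*", body):
        key = match.group(1).lower()
        index = match.end()
        if index < len(body) and body[index] == "{":
            # Track nesting depth so inner braces stay part of the value.
            depth, start = 1, index + 1
            index += 1
            while index < len(body) and depth > 0:
                if body[index] == "{":
                    depth += 1
                elif body[index] == "}":
                    depth -= 1
                index += 1
            fields[key] = body[start:index - 1].strip()
        elif index < len(body) and body[index] == '"':
            end = body.index('"', index + 1)
            fields[key] = body[index + 1:end].strip()
    return fields
```

A flat regex over `key = {...}` cannot express this nesting, which is why the real fallback walks the body character by character instead.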
@@ -0,0 +1,40 @@

from __future__ import annotations

import argparse
import sys

from . import assistant_export, export, ingest, inspect, lint, promotion, query, review_server


COMMANDS = {
    "import": ingest.main,
    "lint": lint.main,
    "promote": promotion.main,
    "query": query.main,
    "export": export.main,
    "assistant-export": assistant_export.main,
    "inspect": inspect.main,
    "review-server": review_server.main,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="GroundRecall command-line tools")
    parser.add_argument("command", nargs="?", choices=sorted(COMMANDS))
    return parser


def main() -> None:
    argv = sys.argv[1:]
    parser = build_parser()
    args, remainder = parser.parse_known_args(argv)
    if not args.command:
        parser.print_help()
        return
    handler = COMMANDS[args.command]
    original_argv = sys.argv
    try:
        sys.argv = [f"groundrecall.cli {args.command}", *remainder]
        handler()
    finally:
        sys.argv = original_argv
@@ -0,0 +1,136 @@

from __future__ import annotations

import argparse
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

from .query import build_query_bundle_for_concept
from .store import GroundRecallStore


def _now() -> str:
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")


def _write_json(path: Path, payload: dict[str, Any]) -> None:
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def _write_jsonl(path: Path, rows: list[dict[str, Any]]) -> None:
    text = "\n".join(json.dumps(row, sort_keys=True) for row in rows)
    if text:
        text += "\n"
    path.write_text(text, encoding="utf-8")


def export_canonical_snapshot(
    store_dir: str | Path,
    out_dir: str | Path,
    snapshot_id: str | None = None,
    metadata: dict[str, Any] | None = None,
) -> dict[str, str]:
    store = GroundRecallStore(store_dir)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    actual_snapshot_id = snapshot_id or f"snapshot-export-{datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')}"
    snapshot = store.build_snapshot(
        snapshot_id=actual_snapshot_id,
        created_at=_now(),
        metadata={"export_kind": "canonical", **(metadata or {})},
    )
    store.save_snapshot(snapshot)

    snapshot_path = target / "groundrecall_snapshot.json"
    _write_json(snapshot_path, snapshot.model_dump())
    _write_jsonl(target / "claims.jsonl", [item.model_dump() for item in snapshot.claims])
    _write_jsonl(target / "concepts.jsonl", [item.model_dump() for item in snapshot.concepts])
    _write_jsonl(target / "relations.jsonl", [item.model_dump() for item in snapshot.relations])
    provenance_manifest = {
        "snapshot_id": snapshot.snapshot_id,
        "created_at": snapshot.created_at,
        "source_count": len(snapshot.sources),
        "artifact_count": len(snapshot.artifacts),
        "observation_count": len(snapshot.observations),
    }
    _write_json(target / "provenance_manifest.json", provenance_manifest)
    manifest = {
        "export_kind": "canonical",
        "snapshot_id": snapshot.snapshot_id,
        "files": [
            "groundrecall_snapshot.json",
            "claims.jsonl",
            "concepts.jsonl",
            "relations.jsonl",
            "provenance_manifest.json",
        ],
    }
    _write_json(target / "export_manifest.json", manifest)
    return {
        "snapshot_json": str(snapshot_path),
        "claims_jsonl": str(target / "claims.jsonl"),
        "concepts_jsonl": str(target / "concepts.jsonl"),
        "relations_jsonl": str(target / "relations.jsonl"),
        "provenance_manifest_json": str(target / "provenance_manifest.json"),
        "export_manifest_json": str(target / "export_manifest.json"),
    }


def export_query_bundle(
    store_dir: str | Path,
    concept_ref: str,
    out_path: str | Path,
) -> dict[str, Any]:
    payload = build_query_bundle_for_concept(store_dir, concept_ref)
    if payload is None:
        raise KeyError(f"Unknown concept reference: {concept_ref}")
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    _write_json(path, payload)
    return payload


def export_canonical_bundle(
    store_dir: str | Path,
    out_dir: str | Path,
    concept_refs: list[str] | None = None,
    snapshot_id: str | None = None,
) -> dict[str, Any]:
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    outputs = export_canonical_snapshot(store_dir, target, snapshot_id=snapshot_id)
    query_bundle_paths: list[str] = []
    for concept_ref in concept_refs or []:
        safe_name = concept_ref.lower().replace(" ", "-").replace("::", "-")
        bundle_path = target / f"query_bundle__{safe_name}.json"
        export_query_bundle(store_dir, concept_ref, bundle_path)
        query_bundle_paths.append(str(bundle_path))
    manifest = json.loads((target / "export_manifest.json").read_text(encoding="utf-8"))
    manifest["query_bundles"] = query_bundle_paths
    _write_json(target / "export_manifest.json", manifest)
    return {
        "canonical_outputs": outputs,
        "query_bundles": query_bundle_paths,
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Export canonical GroundRecall artifacts.")
    parser.add_argument("store_dir")
    parser.add_argument("out_dir")
    parser.add_argument("--snapshot-id", default=None)
    parser.add_argument("--concept", action="append", default=[])
    return parser


def main() -> None:
    args = build_parser().parse_args()
    payload = export_canonical_bundle(
        store_dir=args.store_dir,
        out_dir=args.out_dir,
        concept_refs=list(args.concept or []),
        snapshot_id=args.snapshot_id,
    )
    print(json.dumps(payload, indent=2))
@@ -0,0 +1,12 @@

"""Legacy flat GroundRecall assistant export module.

Compatibility path retained during the internal namespace migration.
Prefer imports under ``didactopus.groundrecall.assistant_export`` or CLI usage
via ``didactopus.groundrecall.cli`` for new code.
"""

from __future__ import annotations

from .groundrecall.assistant_export import build_parser, export_assistant_bundle, main

__all__ = ["export_assistant_bundle", "build_parser", "main"]
@@ -0,0 +1,9 @@

"""Legacy flat GroundRecall assistants package.

Compatibility path retained during the internal namespace migration.
Prefer imports under ``didactopus.groundrecall.assistants`` for new code.
"""

from .base import get_assistant_adapter, list_assistant_adapters

__all__ = ["get_assistant_adapter", "list_assistant_adapters"]
@@ -0,0 +1,43 @@

"""Legacy flat GroundRecall assistant adapter base module.

Compatibility path retained during the internal namespace migration.
Prefer imports under ``didactopus.groundrecall.assistants.base`` for new code.
"""

from __future__ import annotations

from pathlib import Path
from typing import Protocol


class AssistantAdapter(Protocol):
    name: str

    def export_bundle(self, snapshot: dict, query_bundles: list[dict], out_dir: str | Path) -> list[Path]:
        ...

    def build_context(self, query_result: dict) -> dict:
        ...

    def supported_capabilities(self) -> dict[str, bool]:
        ...


_REGISTRY: dict[str, AssistantAdapter] = {}


def register_assistant_adapter(adapter: AssistantAdapter) -> AssistantAdapter:
    _REGISTRY[adapter.name] = adapter
    return adapter


def get_assistant_adapter(name: str) -> AssistantAdapter:
    try:
        return _REGISTRY[name]
    except KeyError as exc:
        raise KeyError(f"Unknown assistant adapter: {name}") from exc


def list_assistant_adapters() -> list[str]:
    return sorted(_REGISTRY)
@@ -0,0 +1,69 @@

from __future__ import annotations

import json
from pathlib import Path

from .base import register_assistant_adapter


class ClaudeCodeAdapter:
    name = "claude_code"

    def export_bundle(self, snapshot: dict, query_bundles: list[dict], out_dir: str | Path) -> list[Path]:
        target = Path(out_dir)
        target.mkdir(parents=True, exist_ok=True)
        paths: list[Path] = []

        memory_md = "\n".join(
            [
                "# GroundRecall Memory",
                "",
                f"- Snapshot: `{snapshot.get('snapshot_id', '')}`",
                f"- Concepts: {len(snapshot.get('concepts', []))}",
                f"- Claims: {len(snapshot.get('claims', []))}",
                "",
                "Prefer the canonical GroundRecall snapshot and query bundles over free-form recollection.",
                "",
                "## Query Bundles",
            ]
            + [f"- `{bundle.get('concept', {}).get('concept_id', 'unknown')}`" for bundle in query_bundles]
        )
        memory_path = target / "CLAUDE.md"
        memory_path.write_text(memory_md, encoding="utf-8")
        paths.append(memory_path)

        bundle_path = target / "claude_code_bundle.json"
        bundle_path.write_text(
            json.dumps(
                {
                    "assistant": "claude_code",
                    "snapshot_id": snapshot.get("snapshot_id", ""),
                    "query_bundle_count": len(query_bundles),
                    "query_bundles": query_bundles,
                },
                indent=2,
            ),
            encoding="utf-8",
        )
        paths.append(bundle_path)
        return paths

    def build_context(self, query_result: dict) -> dict:
        return {
            "assistant": "claude_code",
            "memory_kind": "groundrecall_query_bundle",
            "concept": query_result.get("concept", {}),
            "claims": query_result.get("relevant_claims", []),
            "support": query_result.get("supporting_observations", []),
            "next_actions": query_result.get("suggested_next_actions", []),
        }

    def supported_capabilities(self) -> dict[str, bool]:
        return {
            "skill_markdown": False,
            "json_bundle": True,
            "project_memory": True,
        }


register_assistant_adapter(ClaudeCodeAdapter())
@ -0,0 +1,78 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
from pathlib import Path

from .base import register_assistant_adapter


class CodexAdapter:
    name = "codex"

    def export_bundle(self, snapshot: dict, query_bundles: list[dict], out_dir: str | Path) -> list[Path]:
        target = Path(out_dir)
        target.mkdir(parents=True, exist_ok=True)
        paths: list[Path] = []

        skill_payload = {
            "name": f"groundrecall-{snapshot.get('snapshot_id', 'snapshot')}",
            "description": "GroundRecall assistant adapter bundle for Codex.",
            "snapshot_id": snapshot.get("snapshot_id", ""),
            "concept_count": len(snapshot.get("concepts", [])),
            "claim_count": len(snapshot.get("claims", [])),
        }
        skill_md = "\n".join(
            [
                "---",
                f"name: {skill_payload['name']}",
                f"description: {skill_payload['description']}",
                "---",
                "",
                "# GroundRecall Codex Bundle",
                "",
                f"- Snapshot: `{skill_payload['snapshot_id']}`",
                f"- Concepts: {skill_payload['concept_count']}",
                f"- Claims: {skill_payload['claim_count']}",
                "",
                "Use the accompanying canonical JSON and query bundles as the primary source of grounded context.",
            ]
        )
        skill_path = target / "SKILL.md"
        skill_path.write_text(skill_md, encoding="utf-8")
        paths.append(skill_path)

        bundle_path = target / "codex_bundle.json"
        bundle_path.write_text(
            json.dumps(
                {
                    "assistant": "codex",
                    "snapshot_id": snapshot.get("snapshot_id", ""),
                    "query_bundle_count": len(query_bundles),
                    "query_bundles": query_bundles,
                },
                indent=2,
            ),
            encoding="utf-8",
        )
        paths.append(bundle_path)
        return paths

    def build_context(self, query_result: dict) -> dict:
        return {
            "assistant": "codex",
            "context_kind": "groundrecall_query_bundle",
            "concept": query_result.get("concept", {}),
            "relevant_claims": query_result.get("relevant_claims", []),
            "supporting_observations": query_result.get("supporting_observations", []),
            "suggested_next_actions": query_result.get("suggested_next_actions", []),
        }

    def supported_capabilities(self) -> dict[str, bool]:
        return {
            "skill_markdown": True,
            "json_bundle": True,
            "project_memory": False,
        }


register_assistant_adapter(CodexAdapter())
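The SKILL.md assembly in `export_bundle` above can be sketched standalone; the snapshot values below are purely illustrative sample data, not real output:

```python
# Minimal standalone sketch of the SKILL.md frontmatter assembly above.
snapshot = {"snapshot_id": "snap_001", "concepts": [{}], "claims": [{}, {}]}

payload = {
    "name": f"groundrecall-{snapshot.get('snapshot_id', 'snapshot')}",
    "description": "GroundRecall assistant adapter bundle for Codex.",
}

# The adapter joins a YAML-frontmatter header and a markdown body line by line.
skill_md = "\n".join(
    [
        "---",
        f"name: {payload['name']}",
        f"description: {payload['description']}",
        "---",
        "",
        "# GroundRecall Codex Bundle",
    ]
)
print(skill_md.splitlines()[1])  # name: groundrecall-snap_001
```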
@@ -0,0 +1,54 @@
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


TEXT_EXTENSIONS = {
    ".md",
    ".markdown",
    ".txt",
    ".tex",
    ".json",
    ".yaml",
    ".yml",
    ".csv",
    ".log",
}


@dataclass
class DiscoveredArtifact:
    path: Path
    relative_path: str
    artifact_kind: str
    is_text: bool


def classify_artifact(root: Path, path: Path) -> DiscoveredArtifact:
    rel = path.relative_to(root).as_posix()
    top = rel.split("/", 1)[0]
    suffix = path.suffix.lower()
    is_text = suffix in TEXT_EXTENSIONS or path.name in {"README", "LICENSE"}
    artifact_kind = "generic_artifact"
    if top == "wiki":
        artifact_kind = "compiled_page"
    elif top in {"raw", "sources"}:
        artifact_kind = "raw_note"
    elif top == "logs":
        artifact_kind = "session_log"
    elif path.name.startswith("schema."):
        artifact_kind = "schema_file"
    elif suffix in {".md", ".markdown"}:
        artifact_kind = "markdown_note"
    return DiscoveredArtifact(path=path, relative_path=rel, artifact_kind=artifact_kind, is_text=is_text)


def discover_llmwiki_artifacts(root: str | Path) -> list[DiscoveredArtifact]:
    base = Path(root)
    artifacts: list[DiscoveredArtifact] = []
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        if any(part in {".git", "__pycache__", ".pytest_cache"} for part in path.parts):
            continue
        artifacts.append(classify_artifact(base, path))
    return artifacts
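The classification rules above key off the first path component. As a quick check, the same mapping can be exercised in isolation; the paths below are hypothetical and the helper simply mirrors `classify_artifact`'s branching:

```python
from pathlib import Path

def kind_for(rel: str) -> str:
    # Mirrors classify_artifact: the top-level directory drives the artifact kind,
    # with schema files and markdown as fallbacks.
    top = rel.split("/", 1)[0]
    suffix = Path(rel).suffix.lower()
    if top == "wiki":
        return "compiled_page"
    if top in {"raw", "sources"}:
        return "raw_note"
    if top == "logs":
        return "session_log"
    if Path(rel).name.startswith("schema."):
        return "schema_file"
    if suffix in {".md", ".markdown"}:
        return "markdown_note"
    return "generic_artifact"

print(kind_for("wiki/graphs.md"))     # compiled_page
print(kind_for("notes/todo.md"))      # markdown_note
print(kind_for("logs/session1.log"))  # session_log
```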
@@ -0,0 +1,24 @@
"""Legacy flat GroundRecall export module.

Compatibility path retained during the internal namespace migration.
Prefer imports under ``didactopus.groundrecall.export`` or CLI usage via
``didactopus.groundrecall.cli`` for new code.
"""

from __future__ import annotations

from .groundrecall.export import (
    build_parser,
    export_canonical_bundle,
    export_canonical_snapshot,
    export_query_bundle,
    main,
)

__all__ = [
    "export_canonical_snapshot",
    "export_query_bundle",
    "export_canonical_bundle",
    "build_parser",
    "main",
]
@@ -0,0 +1,9 @@
"""Legacy extracted GroundRecall import module.

Compatibility path retained while the standalone repo converges on the
top-level ``groundrecall.ingest`` module as the primary implementation.
"""

from __future__ import annotations

from .ingest import ImportResult, build_parser, main, run_groundrecall_import
@@ -0,0 +1,9 @@
"""Legacy extracted GroundRecall lint module.

Compatibility path retained while the standalone repo converges on the
top-level ``groundrecall.lint`` module as the primary implementation.
"""

from __future__ import annotations

from .lint import build_parser, lint_import_directory, main
@@ -0,0 +1,9 @@
"""Legacy extracted GroundRecall models module.

Compatibility path retained while the standalone repo converges on the
top-level ``groundrecall.models`` module as the primary implementation.
"""

from __future__ import annotations

from .models import *  # noqa: F403
@@ -0,0 +1,136 @@
from __future__ import annotations

from dataclasses import asdict, dataclass
from hashlib import sha256
from pathlib import Path
from typing import Any

from .groundrecall_discovery import DiscoveredArtifact
from .groundrecall_segmenter import SegmentedPage, SegmentedObservation


@dataclass
class ImportContext:
    import_id: str
    import_mode: str
    machine_id: str
    agent_id: str
    source_root: str
    imported_at: str


def _sanitize_claim_key(value: str) -> str:
    text = "".join(ch.lower() if ch.isalnum() else "-" for ch in value).strip("-")
    return text or "claim"


def _claim_id_for_observation(observation_record: dict[str, Any], observation: SegmentedObservation, index: int) -> str:
    if observation.explicit_claim_key:
        return f"clm_{_sanitize_claim_key(observation.explicit_claim_key)}"
    return f"clm_{observation_record['observation_id']}_{index}"


def build_artifact_record(context: ImportContext, artifact: DiscoveredArtifact, page: SegmentedPage | None) -> dict[str, Any]:
    record = {
        "artifact_id": f"ia_{sha256(artifact.relative_path.encode('utf-8')).hexdigest()[:12]}",
        "import_id": context.import_id,
        "artifact_kind": artifact.artifact_kind,
        "path": artifact.relative_path,
        "title": page.title if page else Path(artifact.relative_path).stem,
        "sha256": sha256(artifact.path.read_bytes()).hexdigest(),
        "created_at": context.imported_at,
        "metadata": {
            "frontmatter": page.frontmatter if page else {},
            "headings": page.headings if page else [],
        },
        "current_status": "draft",
    }
    return record


def build_observation_record(
    context: ImportContext,
    artifact_record: dict[str, Any],
    observation: SegmentedObservation,
    index: int,
) -> dict[str, Any]:
    return {
        "observation_id": f"obs_{artifact_record['artifact_id']}_{index}",
        "import_id": context.import_id,
        "artifact_id": artifact_record["artifact_id"],
        "role": observation.role,
        "text": observation.text,
        "origin_path": observation.artifact_relative_path,
        "origin_section": observation.section,
        "line_start": observation.line_start,
        "line_end": observation.line_end,
        "grounding_status": observation.grounding_status,
        "support_kind": observation.support_kind,
        "confidence_hint": observation.confidence_hint,
        "current_status": "draft",
    }


def build_claim_record(
    context: ImportContext,
    observation_record: dict[str, Any],
    observation: SegmentedObservation,
    concept_ids: list[str],
    index: int,
) -> dict[str, Any]:
    return {
        "claim_id": _claim_id_for_observation(observation_record, observation, index),
        "import_id": context.import_id,
        "claim_text": observation_record["text"],
        "claim_kind": "statement" if observation_record["role"] == "claim" else "summary",
        "source_observation_ids": [observation_record["observation_id"]],
        "supporting_fragment_ids": [],
        "concept_ids": [f"concept::{concept_id}" for concept_id in concept_ids],
        "contradicts_claim_ids": [f"clm_{_sanitize_claim_key(value)}" for value in observation.contradict_keys],
        "supersedes_claim_ids": [f"clm_{_sanitize_claim_key(value)}" for value in observation.supersede_keys],
        "confidence_hint": observation_record["confidence_hint"],
        "grounding_status": observation_record["grounding_status"],
        "current_status": "triaged" if observation_record["grounding_status"] != "ungrounded" else "draft",
    }


def build_concept_records(context: ImportContext, artifact_record: dict[str, Any], concept_ids: list[str]) -> list[dict[str, Any]]:
    records = []
    for concept_id in concept_ids:
        records.append(
            {
                "concept_id": f"concept::{concept_id}",
                "import_id": context.import_id,
                "title": concept_id.replace("-", " ").title(),
                "aliases": [],
                "description": "Imported concept from llmwiki corpus.",
                "source_artifact_ids": [artifact_record["artifact_id"]],
                "current_status": "triaged",
            }
        )
    return records


def build_relation_records(context: ImportContext, artifact_record: dict[str, Any], concept_ids: list[str], links: list[str]) -> list[dict[str, Any]]:
    if not concept_ids:
        return []
    primary = f"concept::{concept_ids[0]}"
    records = []
    for idx, link in enumerate(links, start=1):
        target = f"concept::{link.lower().replace(' ', '-')}"
        records.append(
            {
                "relation_id": f"rel_{artifact_record['artifact_id']}_{idx}",
                "import_id": context.import_id,
                "source_id": primary,
                "target_id": target,
                "relation_type": "references",
                "evidence_ids": [],
                "current_status": "draft",
            }
        )
    return records


def manifest_record(context: ImportContext) -> dict[str, Any]:
    return asdict(context) | {"source_repo_kind": "llmwiki"}
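The claim-key sanitizer above lowercases alphanumerics, turns everything else into hyphens, trims the ends, and falls back to `"claim"` when nothing survives. A standalone copy shows the effect:

```python
def sanitize_claim_key(value: str) -> str:
    # Same logic as _sanitize_claim_key above: non-alphanumerics become hyphens,
    # leading/trailing hyphens are trimmed, empty results fall back to "claim".
    text = "".join(ch.lower() if ch.isalnum() else "-" for ch in value).strip("-")
    return text or "claim"

print(sanitize_claim_key("Boiling Point (H2O)"))  # boiling-point--h2o
print(sanitize_claim_key("!!!"))                  # claim
```

Note that interior runs of punctuation are kept as repeated hyphens rather than collapsed, so two keys differing only in punctuation still map to distinct claim IDs.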
@@ -0,0 +1,9 @@
"""Legacy extracted GroundRecall promotion module.

Compatibility path retained while the standalone repo converges on the
top-level ``groundrecall.promotion`` module as the primary implementation.
"""

from __future__ import annotations

from .promotion import build_parser, main, promote_import_to_store
@@ -0,0 +1,26 @@
"""Legacy flat GroundRecall query module.

Compatibility path retained during the internal namespace migration.
Prefer imports under ``didactopus.groundrecall.query`` or CLI usage via
``didactopus.groundrecall.cli`` for new code.
"""

from __future__ import annotations

from .groundrecall.query import (
    build_parser,
    build_query_bundle_for_concept,
    main,
    query_concept,
    query_provenance,
    search_claims,
)

__all__ = [
    "query_concept",
    "search_claims",
    "query_provenance",
    "build_query_bundle_for_concept",
    "build_parser",
    "main",
]
@@ -0,0 +1,138 @@
from __future__ import annotations

import argparse
import json
from collections import defaultdict
from pathlib import Path
from typing import Any

from .review_export import build_citation_review_entries_from_import, export_review_state_json, export_review_ui_data
from .review_schema import ConceptReviewEntry, DraftPackData, ReviewSession


def _read_json(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def _read_jsonl(path: Path) -> list[dict[str, Any]]:
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]


def _claim_summary(claims: list[dict[str, Any]]) -> list[str]:
    lines: list[str] = []
    for claim in claims[:3]:
        grounding = claim.get("grounding_status", "unknown")
        lines.append(f"Claim: {claim.get('claim_text', '')} [{grounding}]")
    if len(claims) > 3:
        lines.append(f"{len(claims) - 3} additional claims omitted from notes summary.")
    return lines


def build_review_session_from_import(import_dir: str | Path, reviewer: str = "GroundRecall Import") -> ReviewSession:
    base = Path(import_dir)
    manifest = _read_json(base / "manifest.json")
    lint_payload = _read_json(base / "lint_findings.json")
    claims = _read_jsonl(base / "claims.jsonl")
    concepts = _read_jsonl(base / "concepts.jsonl")

    claims_by_concept: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    for claim in claims:
        for concept_id in claim.get("concept_ids", []):
            claims_by_concept[concept_id].append(claim)

    findings_by_target: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    concept_findings: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    for finding in lint_payload.get("findings", []):
        findings_by_target[finding["target_id"]].append(finding)
    for claim in claims:
        for concept_id in claim.get("concept_ids", []):
            concept_findings[concept_id].extend(findings_by_target.get(claim["claim_id"], []))
    for concept in concepts:
        concept_findings[concept["concept_id"]].extend(findings_by_target.get(concept["concept_id"], []))

    entries: list[ConceptReviewEntry] = []
    for concept in concepts:
        concept_id = concept["concept_id"]
        related_claims = claims_by_concept.get(concept_id, [])
        related_findings = concept_findings.get(concept_id, [])
        has_errors = any(item["severity"] == "error" for item in related_findings)
        all_grounded = bool(related_claims) and all(item.get("grounding_status") == "grounded" for item in related_claims)
        status = "needs_review"
        if not has_errors and all_grounded:
            status = "provisional"

        notes = _claim_summary(related_claims)
        notes.extend(item["message"] for item in related_findings[:5])

        entries.append(
            ConceptReviewEntry(
                concept_id=concept_id.replace("concept::", "", 1),
                title=concept.get("title", concept_id),
                description=concept.get("description", ""),
                prerequisites=[],
                mastery_signals=[],
                status=status,
                notes=notes,
            )
        )

    conflicts = [item["message"] for item in lint_payload.get("findings", []) if item["severity"] == "error"]
    review_flags = [item["message"] for item in lint_payload.get("findings", []) if item["severity"] == "warning"]
    pack = {
        "name": f"groundrecall-import-{manifest['import_id']}",
        "display_name": f"GroundRecall Import {manifest['import_id']}",
        "version": "0.1.0-draft",
        "source_import_id": manifest["import_id"],
        "source_root": manifest.get("source_root", ""),
    }
    attribution = {
        "source_repo_kind": manifest.get("source_repo_kind", "llmwiki"),
        "source_root": manifest.get("source_root", ""),
        "imported_at": manifest.get("imported_at", ""),
        "machine_id": manifest.get("machine_id", ""),
        "rights_note": "Imported llmwiki-style corpus requires review before promotion.",
    }
    return ReviewSession(
        reviewer=reviewer,
        draft_pack=DraftPackData(
            pack=pack,
            concepts=entries,
            conflicts=conflicts,
            review_flags=review_flags,
            attribution=attribution,
        ),
        citation_reviews=build_citation_review_entries_from_import(base),
    )


def export_review_bundle_from_import(import_dir: str | Path, out_dir: str | Path | None = None, reviewer: str = "GroundRecall Import") -> dict[str, str]:
    base = Path(import_dir)
    target = Path(out_dir) if out_dir is not None else base
    target.mkdir(parents=True, exist_ok=True)
    session = build_review_session_from_import(base, reviewer=reviewer)
    review_state_path = target / "review_session.json"
    export_review_state_json(session, review_state_path)
    export_review_ui_data(session, target, import_dir=base)
    return {
        "review_session_json": str(review_state_path),
        "review_data_json": str(target / "review_data.json"),
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Build Didactopus review artifacts from a GroundRecall import.")
    parser.add_argument("import_dir")
    parser.add_argument("--out-dir", default=None)
    parser.add_argument("--reviewer", default="GroundRecall Import")
    return parser


def main() -> None:
    args = build_parser().parse_args()
    outputs = export_review_bundle_from_import(args.import_dir, out_dir=args.out_dir, reviewer=args.reviewer)
    print(json.dumps(outputs, indent=2))
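The concept status rule in `build_review_session_from_import` above is easy to check in isolation: a concept is `provisional` only when there are no error-severity findings and every related claim is grounded (which requires at least one claim). The helper below mirrors that rule; the dicts are illustrative:

```python
def concept_status(claims: list[dict], findings: list[dict]) -> str:
    # Mirrors the rule above: provisional requires no error findings AND
    # a non-empty, fully grounded set of related claims.
    has_errors = any(f["severity"] == "error" for f in findings)
    all_grounded = bool(claims) and all(c.get("grounding_status") == "grounded" for c in claims)
    return "provisional" if not has_errors and all_grounded else "needs_review"

print(concept_status([{"grounding_status": "grounded"}], []))                       # provisional
print(concept_status([], []))                                                       # needs_review
print(concept_status([{"grounding_status": "grounded"}], [{"severity": "error"}]))  # needs_review
```

The `bool(claims)` guard matters: with zero claims, `all(...)` would be vacuously true, so a concept with no evidence would otherwise be promoted.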
@@ -0,0 +1,114 @@
from __future__ import annotations

import argparse
import json
from collections import defaultdict
from pathlib import Path
from typing import Any


def _read_json(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def _read_jsonl(path: Path) -> list[dict[str, Any]]:
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]


def _triage_lane(item: dict[str, Any], finding_codes: set[str]) -> str:
    if {"claim_ungrounded", "ungrounded_summary"} & finding_codes:
        return "source_cleanup"
    if {"relation_missing_source", "relation_missing_target", "orphan_concept"} & finding_codes:
        return "conflict_resolution"
    return "knowledge_capture"


def _priority(item: dict[str, Any], finding_codes: set[str]) -> int:
    priority = 50
    if item.get("grounding_status") == "grounded":
        priority -= 10
    if item.get("current_status") == "triaged":
        priority -= 5
    if any(code.startswith("claim_") or code.startswith("relation_") for code in finding_codes):
        priority += 20
    priority -= min(len(finding_codes) * 2, 10)
    return max(priority, 1)


def build_review_queue(import_dir: str | Path) -> dict[str, Any]:
    base = Path(import_dir)
    manifest = _read_json(base / "manifest.json")
    lint_payload = _read_json(base / "lint_findings.json")
    claims = _read_jsonl(base / "claims.jsonl")
    concepts = _read_jsonl(base / "concepts.jsonl")

    findings_by_target: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    for finding in lint_payload.get("findings", []):
        findings_by_target[finding["target_id"]].append(finding)

    queue: list[dict[str, Any]] = []

    for claim in claims:
        related = findings_by_target.get(claim["claim_id"], [])
        finding_codes = {item["code"] for item in related}
        queue.append(
            {
                "queue_id": f"rq_{claim['claim_id']}",
                "candidate_type": "claim",
                "candidate_id": claim["claim_id"],
                "title": claim["claim_text"][:100],
                "triage_lane": _triage_lane(claim, finding_codes),
                "priority": _priority(claim, finding_codes),
                "grounding_status": claim.get("grounding_status"),
                "status": "needs_review",
                "finding_codes": sorted(finding_codes),
                "concept_ids": list(claim.get("concept_ids", [])),
            }
        )

    for concept in concepts:
        related = findings_by_target.get(concept["concept_id"], [])
        finding_codes = {item["code"] for item in related}
        if not finding_codes:
            continue
        queue.append(
            {
                "queue_id": f"rq_{concept['concept_id'].replace('::', '_')}",
                "candidate_type": "concept",
                "candidate_id": concept["concept_id"],
                "title": concept["title"],
                "triage_lane": _triage_lane(concept, finding_codes),
                "priority": _priority(concept, finding_codes),
                "grounding_status": concept.get("grounding_status", "triaged"),
                "status": "needs_review",
                "finding_codes": sorted(finding_codes),
                "concept_ids": [concept["concept_id"]],
            }
        )

    queue.sort(key=lambda item: (item["priority"], item["candidate_type"], item["candidate_id"]))
    return {
        "import_id": manifest["import_id"],
        "queue_length": len(queue),
        "items": queue,
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Build a GroundRecall review queue from import artifacts.")
    parser.add_argument("import_dir")
    parser.add_argument("--out", default=None)
    return parser


def main() -> None:
    args = build_parser().parse_args()
    payload = build_review_queue(args.import_dir)
    out_path = Path(args.out) if args.out else Path(args.import_dir) / "review_queue.json"
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    print(f"Wrote {out_path}")
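The priority heuristic above starts every candidate at 50, rewards grounded/triaged items, penalizes claim- and relation-level findings, and clamps at a floor of 1 (lower numbers sort first). A standalone copy makes the arithmetic concrete; the input dicts are illustrative:

```python
def priority(item: dict, finding_codes: set[str]) -> int:
    # Mirrors _priority above: base 50, -10 if grounded, -5 if triaged,
    # +20 for any claim_*/relation_* finding, minus up to 10 for finding count.
    p = 50
    if item.get("grounding_status") == "grounded":
        p -= 10
    if item.get("current_status") == "triaged":
        p -= 5
    if any(code.startswith(("claim_", "relation_")) for code in finding_codes):
        p += 20
    p -= min(len(finding_codes) * 2, 10)
    return max(p, 1)

# A clean, grounded, triaged claim: 50 - 10 - 5 = 35.
print(priority({"grounding_status": "grounded", "current_status": "triaged"}, set()))  # 35
# An ungrounded claim with one claim_* finding: 50 + 20 - 2 = 68.
print(priority({"grounding_status": "ungrounded"}, {"claim_ungrounded"}))              # 68
```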
@@ -0,0 +1,180 @@
from __future__ import annotations

import re
from dataclasses import dataclass, field
from pathlib import Path

from .groundrecall_discovery import DiscoveredArtifact


HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FRONTMATTER_DELIM = "---"
ANNOTATION_RE = re.compile(r"\[(claim_id|contradicts|supersedes):([^\]]+)\]", re.IGNORECASE)
TABLE_SEPARATOR_RE = re.compile(r"^\|(?:\s*:?-{3,}:?\s*\|)+\s*$")
LATEX_STRUCTURAL_RE = re.compile(r"^\\(begin|end|centering|caption|label|tikzset|node|draw|path|matrix|includegraphics)\b")
LATEX_MATH_ONLY_RE = re.compile(r"^[\\{}[\]()$&_^%.,;:=+\-*/|<>~0-9A-Za-z ]+$")


@dataclass
class SegmentedObservation:
    artifact_relative_path: str
    role: str
    text: str
    section: str
    line_start: int
    line_end: int
    grounding_status: str
    support_kind: str
    confidence_hint: float
    explicit_claim_key: str = ""
    contradict_keys: list[str] = field(default_factory=list)
    supersede_keys: list[str] = field(default_factory=list)


@dataclass
class SegmentedPage:
    title: str
    headings: list[str] = field(default_factory=list)
    frontmatter: dict[str, str] = field(default_factory=dict)
    observations: list[SegmentedObservation] = field(default_factory=list)
    concepts: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)


def _parse_frontmatter(lines: list[str]) -> tuple[dict[str, str], int]:
    if not lines or lines[0].strip() != FRONTMATTER_DELIM:
        return {}, 0
    data: dict[str, str] = {}
    idx = 1
    while idx < len(lines):
        stripped = lines[idx].strip()
        if stripped == FRONTMATTER_DELIM:
            return data, idx + 1
        if ":" in stripped:
            key, value = stripped.split(":", 1)
            data[key.strip()] = value.strip()
        idx += 1
    return data, 0


def _extract_links(text: str) -> list[str]:
    return re.findall(r"\[\[([^\]]+)\]\]", text)


def _to_concept_id(text: str) -> str:
    text = re.sub(r"[^a-zA-Z0-9]+", "-", text.strip().lower()).strip("-")
    return text or "untitled"


def _parse_annotations(text: str) -> tuple[str, str, list[str], list[str]]:
    claim_key = ""
    contradict_keys: list[str] = []
    supersede_keys: list[str] = []
    for kind, raw_value in ANNOTATION_RE.findall(text):
        values = [value.strip() for value in raw_value.split(",") if value.strip()]
        kind_lower = kind.lower()
        if kind_lower == "claim_id" and values:
            claim_key = values[0]
        elif kind_lower == "contradicts":
            contradict_keys.extend(values)
        elif kind_lower == "supersedes":
            supersede_keys.extend(values)
    cleaned = ANNOTATION_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, claim_key, contradict_keys, supersede_keys


def _should_skip_line(text: str) -> bool:
    stripped = text.strip()
    if not stripped:
        return True
    if stripped.startswith("!["):
        return True
    if stripped in {"---", "```", "};", "{", "}", "</div>", "<div>", ":::"}:
        return True
    if stripped.startswith(":::"):
        return True
    if stripped.startswith("|") and stripped.endswith("|"):
        return True
    if TABLE_SEPARATOR_RE.match(stripped):
        return True
    if LATEX_STRUCTURAL_RE.match(stripped):
        return True
    if stripped.startswith("%"):
        return True
    if stripped.startswith("\\") and LATEX_MATH_ONLY_RE.match(stripped):
        return True
    return False


def segment_markdown_artifact(artifact: DiscoveredArtifact, text: str | None = None) -> SegmentedPage:
    text = artifact.path.read_text(encoding="utf-8") if text is None else text
    lines = text.splitlines()
    frontmatter, start_idx = _parse_frontmatter(lines)
    current_section = frontmatter.get("title", Path(artifact.relative_path).stem.replace("-", " ").title())
    title = current_section
    headings: list[str] = []
    observations: list[SegmentedObservation] = []
    concepts: list[str] = []
    links: list[str] = []

    for idx in range(start_idx, len(lines)):
        raw_line = lines[idx]
        stripped = raw_line.strip()
        if _should_skip_line(stripped):
            continue
        heading_match = HEADING_RE.match(raw_line)
        if heading_match:
            current_section = heading_match.group(2).strip()
            headings.append(current_section)
            if not title and heading_match.group(1) == "#":
                title = current_section
            concepts.append(_to_concept_id(current_section))
            continue

        role = "summary"
        obs_text = stripped
        if stripped.startswith(("- ", "* ")):
            role = "claim"
            obs_text = stripped[2:].strip()
        elif stripped.lower().startswith(("todo:", "question:", "q:")):
            role = "question"
        elif stripped.lower().startswith(("speculation:", "hypothesis:")):
            role = "speculation"
        elif artifact.artifact_kind == "session_log":
            role = "transcript"

        obs_text, claim_key, contradict_keys, supersede_keys = _parse_annotations(obs_text)

        links.extend(_extract_links(obs_text))
        if role in {"summary", "claim"}:
            concepts.extend(_to_concept_id(link) for link in _extract_links(obs_text))
        observations.append(
            SegmentedObservation(
                artifact_relative_path=artifact.relative_path,
                role=role,
                text=obs_text,
                section=current_section,
                line_start=idx + 1,
                line_end=idx + 1,
                grounding_status="partially_grounded" if artifact.artifact_kind == "compiled_page" else "grounded",
                support_kind="derived_from_page" if artifact.artifact_kind == "compiled_page" else "direct_source",
                confidence_hint=0.55 if role == "speculation" else 0.7 if role == "claim" else 0.6,
                explicit_claim_key=claim_key,
                contradict_keys=contradict_keys,
                supersede_keys=supersede_keys,
            )
        )

    if not headings and title:
        headings.append(title)
    if not concepts and title:
        concepts.append(_to_concept_id(title))
    return SegmentedPage(
        title=title,
        headings=headings,
        frontmatter=frontmatter,
        observations=observations,
        concepts=sorted({c for c in concepts if c}),
        links=sorted({link for link in links if link}),
    )
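The inline annotation syntax the segmenter handles above (`[claim_id: ...]`, `[contradicts: ...]`, `[supersedes: ...]`) can be exercised directly with the same regex; the sample sentence is illustrative:

```python
import re

# Same pattern as ANNOTATION_RE above.
ANNOTATION_RE = re.compile(r"\[(claim_id|contradicts|supersedes):([^\]]+)\]", re.IGNORECASE)

text = "Water boils at 100C at sea level [claim_id: boiling-point] [contradicts: old-note, draft-1]"

# Each match yields the annotation kind and its comma-separated values.
for kind, raw in ANNOTATION_RE.findall(text):
    values = [v.strip() for v in raw.split(",") if v.strip()]
    print(kind.lower(), values)
# claim_id ['boiling-point']
# contradicts ['old-note', 'draft-1']

# Stripping annotations and collapsing doubled whitespace recovers the claim text.
cleaned = re.sub(r"\s{2,}", " ", ANNOTATION_RE.sub("", text)).strip()
print(cleaned)  # Water boils at 100C at sea level
```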
@@ -0,0 +1,15 @@
"""Legacy flat GroundRecall source adapter package.

Compatibility path retained during the internal namespace migration.
Prefer imports under ``didactopus.groundrecall.source_adapters`` for new code.
"""

from .base import get_source_adapter, list_source_adapters
from . import llmwiki  # noqa: F401
from . import polypaper  # noqa: F401
from . import doclift_bundle  # noqa: F401
from . import markdown_notes  # noqa: F401
from . import transcript  # noqa: F401
from . import didactopus_pack  # noqa: F401

__all__ = ["get_source_adapter", "list_source_adapters"]
@@ -0,0 +1,76 @@
"""Legacy flat GroundRecall source adapter base module.

Compatibility path retained during the internal namespace migration.
Prefer imports under ``didactopus.groundrecall.source_adapters.base`` for new
code.
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Literal, Protocol


ImportIntent = Literal["grounded_knowledge", "curriculum", "both"]


@dataclass
class DiscoveredImportSource:
    path: Path
    relative_path: str
    source_kind: str
    artifact_kind: str
    is_text: bool
    metadata: dict


@dataclass
class StructuredImportRows:
    artifact_rows: list[dict]
    observation_rows: list[dict]
    claim_rows: list[dict]
    concept_rows: list[dict]
    relation_rows: list[dict]


class GroundRecallSourceAdapter(Protocol):
    name: str

    def detect(self, root: str | Path) -> bool:
        ...

    def discover(self, root: str | Path) -> list[DiscoveredImportSource]:
        ...

    def import_intent(self) -> ImportIntent:
        ...

    def build_rows(self, context, sources: list[DiscoveredImportSource]) -> StructuredImportRows | None:
        ...


_REGISTRY: dict[str, GroundRecallSourceAdapter] = {}


def register_source_adapter(adapter: GroundRecallSourceAdapter) -> GroundRecallSourceAdapter:
    _REGISTRY[adapter.name] = adapter
    return adapter


def get_source_adapter(name: str) -> GroundRecallSourceAdapter:
    try:
        return _REGISTRY[name]
    except KeyError as exc:
        raise KeyError(f"Unknown GroundRecall source adapter: {name}") from exc


def list_source_adapters() -> list[str]:
    return sorted(_REGISTRY)


def detect_source_adapter(root: str | Path) -> GroundRecallSourceAdapter:
    for adapter in _REGISTRY.values():
        if adapter.detect(root):
            return adapter
    raise ValueError(f"No GroundRecall source adapter detected for {root}")
@@ -0,0 +1,234 @@
from __future__ import annotations

from hashlib import sha256
from pathlib import Path

import yaml

from ..artifact_schemas import ConceptsFile, RoadmapFile
from .base import DiscoveredImportSource, StructuredImportRows, register_source_adapter


class DidactopusPackSourceAdapter:
    name = "didactopus_pack"

    def detect(self, root: str | Path) -> bool:
        base = Path(root)
        required = {"pack.yaml", "concepts.yaml"}
        return required.issubset({path.name for path in base.iterdir() if path.exists()})

    def discover(self, root: str | Path) -> list[DiscoveredImportSource]:
        base = Path(root)
        rows: list[DiscoveredImportSource] = []
        for filename in ["pack.yaml", "concepts.yaml", "roadmap.yaml", "projects.yaml", "rubrics.yaml", "review_ledger.json"]:
            path = base / filename
            if not path.exists():
                continue
            rows.append(
                DiscoveredImportSource(
                    path=path,
                    relative_path=path.relative_to(base).as_posix(),
                    source_kind="didactopus_pack",
                    artifact_kind="didactopus_pack_artifact",
                    is_text=True,
                    metadata={},
                )
            )
        return rows

    def import_intent(self) -> str:
        return "both"

    def build_rows(self, context, sources: list[DiscoveredImportSource]) -> StructuredImportRows | None:
        by_name = {Path(item.relative_path).name: item for item in sources}
        concepts_src = by_name.get("concepts.yaml")
        if concepts_src is None:
            return None

        pack_src = by_name.get("pack.yaml")
        pack_payload = {}
        if pack_src is not None:
            pack_payload = yaml.safe_load(pack_src.path.read_text(encoding="utf-8")) or {}
        concepts_payload = ConceptsFile.model_validate(
            yaml.safe_load(concepts_src.path.read_text(encoding="utf-8")) or {"concepts": []}
        )
        roadmap_payload = None
        roadmap_src = by_name.get("roadmap.yaml")
        if roadmap_src is not None:
            roadmap_payload = RoadmapFile.model_validate(
                yaml.safe_load(roadmap_src.path.read_text(encoding="utf-8")) or {"stages": []}
            )

        artifact_rows: list[dict] = []
        observation_rows: list[dict] = []
        claim_rows: list[dict] = []
        concept_rows: list[dict] = []
        relation_rows: list[dict] = []

        for source in sources:
            artifact_rows.append(
                {
                    "artifact_id": f"ia_{sha256(source.relative_path.encode('utf-8')).hexdigest()[:12]}",
                    "import_id": context.import_id,
                    "artifact_kind": source.artifact_kind,
                    "path": source.relative_path,
                    "title": source.path.stem,
                    "sha256": sha256(source.path.read_bytes()).hexdigest(),
                    "created_at": context.imported_at,
                    "metadata": {"source_kind": source.source_kind},
                    "current_status": "draft",
                }
            )

        pack_name = pack_payload.get("name", Path(context.source_root).name)
        concepts_artifact_id = next(
            (row["artifact_id"] for row in artifact_rows if row["path"] == concepts_src.relative_path),
            "",
        )

        for index, concept in enumerate(concepts_payload.concepts, start=1):
            concept_key = f"concept::{concept.id}"
            concept_rows.append(
                {
                    "concept_id": concept_key,
                    "import_id": context.import_id,
                    "title": concept.title,
                    "aliases": [],
                    "description": concept.description or f"Imported concept from Didactopus pack {pack_name}.",
                    "source_artifact_ids": [concepts_artifact_id] if concepts_artifact_id else [],
                    "current_status": "triaged",
                }
            )
            observation_id = f"obs_pack_{concept.id}_{index}"
            observation_rows.append(
                {
                    "observation_id": observation_id,
                    "import_id": context.import_id,
                    "artifact_id": concepts_artifact_id,
                    "role": "summary",
                    "text": concept.description or concept.title,
                    "origin_path": concepts_src.relative_path,
                    "origin_section": concept.title,
                    "line_start": 0,
                    "line_end": 0,
                    "grounding_status": "grounded",
                    "support_kind": "direct_source",
                    "confidence_hint": 0.85,
                    "current_status": "draft",
                }
            )
            claim_rows.append(
                {
                    "claim_id": f"clm_pack_{concept.id}",
                    "import_id": context.import_id,
                    "claim_text": concept.description or f"{concept.title} is a concept in pack {pack_name}.",
                    "claim_kind": "summary",
                    "source_observation_ids": [observation_id],
                    "supporting_fragment_ids": [],
                    "concept_ids": [concept_key],
                    "contradicts_claim_ids": [],
                    "supersedes_claim_ids": [],
                    "confidence_hint": 0.85,
                    "grounding_status": "grounded",
                    "current_status": "triaged",
                }
            )
            for prereq in concept.prerequisites:
                relation_rows.append(
                    {
                        "relation_id": f"rel_prereq_{concept.id}_{prereq}",
                        "import_id": context.import_id,
                        "source_id": f"concept::{prereq}",
                        "target_id": concept_key,
                        "relation_type": "prerequisite",
                        "evidence_ids": [f"clm_pack_{concept.id}"],
                        "current_status": "draft",
                    }
                )
            for signal_idx, signal in enumerate(concept.mastery_signals, start=1):
                signal_obs_id = f"obs_signal_{concept.id}_{signal_idx}"
                observation_rows.append(
                    {
                        "observation_id": signal_obs_id,
                        "import_id": context.import_id,
                        "artifact_id": concepts_artifact_id,
                        "role": "summary",
                        "text": signal,
                        "origin_path": concepts_src.relative_path,
                        "origin_section": f"{concept.title} mastery signal",
                        "line_start": 0,
                        "line_end": 0,
                        "grounding_status": "grounded",
                        "support_kind": "direct_source",
                        "confidence_hint": 0.8,
                        "current_status": "draft",
                    }
                )
                claim_rows.append(
                    {
                        "claim_id": f"clm_signal_{concept.id}_{signal_idx}",
                        "import_id": context.import_id,
                        "claim_text": signal,
                        "claim_kind": "mastery_signal",
                        "source_observation_ids": [signal_obs_id],
                        "supporting_fragment_ids": [],
                        "concept_ids": [concept_key],
                        "contradicts_claim_ids": [],
                        "supersedes_claim_ids": [],
                        "confidence_hint": 0.8,
                        "grounding_status": "grounded",
                        "current_status": "triaged",
                    }
                )

        if roadmap_payload is not None and roadmap_src is not None:
            roadmap_artifact_id = next(
                (row["artifact_id"] for row in artifact_rows if row["path"] == roadmap_src.relative_path),
                "",
            )
            for stage in roadmap_payload.stages:
                for concept_id in stage.concepts:
                    observation_id = f"obs_stage_{stage.id}_{concept_id}"
                    observation_rows.append(
                        {
                            "observation_id": observation_id,
                            "import_id": context.import_id,
                            "artifact_id": roadmap_artifact_id,
                            "role": "summary",
                            "text": f"{concept_id} appears in roadmap stage {stage.title}.",
                            "origin_path": roadmap_src.relative_path,
                            "origin_section": stage.title,
                            "line_start": 0,
                            "line_end": 0,
                            "grounding_status": "grounded",
                            "support_kind": "direct_source",
                            "confidence_hint": 0.75,
                            "current_status": "draft",
                        }
                    )
                    claim_rows.append(
                        {
                            "claim_id": f"clm_stage_{stage.id}_{concept_id}",
                            "import_id": context.import_id,
                            "claim_text": f"{concept_id} belongs to roadmap stage {stage.title}.",
                            "claim_kind": "roadmap_stage",
                            "source_observation_ids": [observation_id],
                            "supporting_fragment_ids": [],
                            "concept_ids": [f"concept::{concept_id}"],
                            "contradicts_claim_ids": [],
                            "supersedes_claim_ids": [],
                            "confidence_hint": 0.75,
                            "grounding_status": "grounded",
                            "current_status": "triaged",
                        }
                    )

        return StructuredImportRows(
            artifact_rows=artifact_rows,
            observation_rows=observation_rows,
            claim_rows=claim_rows,
            concept_rows=concept_rows,
            relation_rows=relation_rows,
        )


register_source_adapter(DidactopusPackSourceAdapter())
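The artifact ids in the pack adapter are derived from the POSIX relative path only, so re-importing the same tree produces the same ids regardless of file contents. A small sketch of that derivation (`artifact_id_for` is an illustrative helper name, not part of this commit):

```python
from hashlib import sha256


def artifact_id_for(relative_path: str) -> str:
    # "ia_" + first 12 hex chars of sha256 over the relative path, as in
    # the adapter's artifact_rows construction above.
    return f"ia_{sha256(relative_path.encode('utf-8')).hexdigest()[:12]}"


first = artifact_id_for("concepts.yaml")
assert first == artifact_id_for("concepts.yaml")  # deterministic across runs
print(first)
```

Note that content changes are still tracked separately via the full `sha256` of the file bytes stored on each artifact row; only the identity is path-based.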
@@ -0,0 +1,150 @@
from __future__ import annotations

import json
from hashlib import sha256
from pathlib import Path

from .base import DiscoveredImportSource, StructuredImportRows, register_source_adapter


class DocliftBundleSourceAdapter:
    name = "doclift_bundle"

    def detect(self, root: str | Path) -> bool:
        base = Path(root)
        return (base / "manifest.json").exists() and (base / "documents").exists()

    def discover(self, root: str | Path) -> list[DiscoveredImportSource]:
        base = Path(root)
        rows: list[DiscoveredImportSource] = []
        for path in sorted(p for p in base.rglob("*") if p.is_file() and p.suffix.lower() in {".json", ".md"}):
            rows.append(
                DiscoveredImportSource(
                    path=path,
                    relative_path=path.relative_to(base).as_posix(),
                    source_kind="doclift_bundle",
                    artifact_kind="doclift_bundle_artifact",
                    is_text=True,
                    metadata={},
                )
            )
        return rows

    def import_intent(self) -> str:
        return "both"

    def build_rows(self, context, sources: list[DiscoveredImportSource]) -> StructuredImportRows | None:
        base = Path(context.source_root)
        manifest_path = base / "manifest.json"
        if not manifest_path.exists():
            return None
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

        artifact_rows: list[dict] = []
        observation_rows: list[dict] = []
        claim_rows: list[dict] = []
        concept_rows: list[dict] = []
        relation_rows: list[dict] = []

        artifact_by_path: dict[str, str] = {}
        for source in sources:
            artifact_id = f"ia_{sha256(source.relative_path.encode('utf-8')).hexdigest()[:12]}"
            artifact_rows.append(
                {
                    "artifact_id": artifact_id,
                    "import_id": context.import_id,
                    "artifact_kind": source.artifact_kind,
                    "path": source.relative_path,
                    "title": source.path.stem,
                    "sha256": sha256(source.path.read_bytes()).hexdigest(),
                    "created_at": context.imported_at,
                    "metadata": {"source_kind": source.source_kind},
                    "current_status": "draft",
                }
            )
            artifact_by_path[source.relative_path] = artifact_id

        documents = [item for item in manifest.get("documents", []) if isinstance(item, dict)]
        previous_concept_id: str | None = None
        for index, document in enumerate(documents, start=1):
            title = str(document.get("title") or f"Document {index}")
            concept_id = f"concept::{document.get('document_id') or title.lower().replace(' ', '-')}"
            markdown_path = Path(document.get("markdown_path", ""))
            relative_markdown = (
                markdown_path.relative_to(base).as_posix()
                if markdown_path.is_absolute() and markdown_path.exists() and markdown_path.is_relative_to(base)
                else document.get("markdown_path", "")
            )
            artifact_id = artifact_by_path.get(str(relative_markdown), "")
            figures_path = Path(document.get("figures_path", ""))
            figure_payload = {}
            if figures_path.exists():
                figure_payload = json.loads(figures_path.read_text(encoding="utf-8"))
            source_path = str(figure_payload.get("source_path") or document.get("source_path") or relative_markdown)

            concept_rows.append(
                {
                    "concept_id": concept_id,
                    "import_id": context.import_id,
                    "title": title,
                    "aliases": [],
                    "description": f"Imported from doclift bundle document kind '{document.get('document_kind', 'document')}'.",
                    "source_artifact_ids": [artifact_id] if artifact_id else [],
                    "current_status": "triaged",
                }
            )
            observation_id = f"obs_doclift_{index}"
            observation_rows.append(
                {
                    "observation_id": observation_id,
                    "import_id": context.import_id,
                    "artifact_id": artifact_id,
                    "role": "summary",
                    "text": title,
                    "origin_path": relative_markdown,
                    "origin_section": title,
                    "line_start": 0,
                    "line_end": 0,
                    "source_url": source_path,
                    "grounding_status": "grounded",
                    "support_kind": "direct_source",
                    "confidence_hint": 0.85,
                    "current_status": "draft",
                }
            )
            claim_rows.append(
                {
                    "claim_id": f"clm_doclift_{index}",
                    "import_id": context.import_id,
                    "claim_text": f"{title} is a {document.get('document_kind', 'document')} in the imported doclift bundle.",
                    "claim_kind": "summary",
                    "source_observation_ids": [observation_id],
                    "supporting_fragment_ids": [],
                    "concept_ids": [concept_id],
                    "contradicts_claim_ids": [],
                    "supersedes_claim_ids": [],
                    "confidence_hint": 0.85,
                    "grounding_status": "grounded",
                    "current_status": "triaged",
                }
            )
            if previous_concept_id is not None:
                relation_rows.append(
                    {
                        "relation_id": f"rel_doclift_seq_{index}",
                        "import_id": context.import_id,
                        "source_id": previous_concept_id,
                        "target_id": concept_id,
                        "relation_type": "references",
                        "evidence_ids": [f"clm_doclift_{index}"],
                        "current_status": "draft",
                    }
                )
            previous_concept_id = concept_id

        return StructuredImportRows(
            artifact_rows=artifact_rows,
            observation_rows=observation_rows,
            claim_rows=claim_rows,
            concept_rows=concept_rows,
            relation_rows=relation_rows,
        )


register_source_adapter(DocliftBundleSourceAdapter())
@@ -0,0 +1,36 @@
from __future__ import annotations

from pathlib import Path

from ..groundrecall_discovery import discover_llmwiki_artifacts
from .base import DiscoveredImportSource, StructuredImportRows, register_source_adapter


class LLMWikiSourceAdapter:
    name = "llmwiki"

    def detect(self, root: str | Path) -> bool:
        base = Path(root)
        return (
            (base / "wiki").exists()
            or (base / "raw").exists()
            or any(path.name.startswith("schema.") for path in base.iterdir() if path.exists())
        )

    def discover(self, root: str | Path) -> list[DiscoveredImportSource]:
        return [
            DiscoveredImportSource(
                path=item.path,
                relative_path=item.relative_path,
                source_kind="llmwiki",
                artifact_kind=item.artifact_kind,
                is_text=item.is_text,
                metadata={},
            )
            for item in discover_llmwiki_artifacts(root)
        ]

    def import_intent(self) -> str:
        return "grounded_knowledge"

    def build_rows(self, context, sources: list[DiscoveredImportSource]) -> StructuredImportRows | None:
        return None


register_source_adapter(LLMWikiSourceAdapter())
@@ -0,0 +1,41 @@
from __future__ import annotations

from pathlib import Path

from .base import DiscoveredImportSource, StructuredImportRows, register_source_adapter


TEXT_SUFFIXES = {".md", ".markdown", ".txt", ".tex"}


class MarkdownNotesSourceAdapter:
    name = "markdown_notes"

    def detect(self, root: str | Path) -> bool:
        base = Path(root)
        return any(path.suffix.lower() in TEXT_SUFFIXES for path in base.rglob("*") if path.is_file())

    def discover(self, root: str | Path) -> list[DiscoveredImportSource]:
        base = Path(root)
        rows: list[DiscoveredImportSource] = []
        for path in sorted(p for p in base.rglob("*") if p.is_file() and p.suffix.lower() in TEXT_SUFFIXES):
            rows.append(
                DiscoveredImportSource(
                    path=path,
                    relative_path=path.relative_to(base).as_posix(),
                    source_kind="markdown_notes",
                    artifact_kind="markdown_note",
                    is_text=True,
                    metadata={},
                )
            )
        return rows

    def import_intent(self) -> str:
        return "grounded_knowledge"

    def build_rows(self, context, sources: list[DiscoveredImportSource]) -> StructuredImportRows | None:
        return None


register_source_adapter(MarkdownNotesSourceAdapter())
@@ -0,0 +1,106 @@
from __future__ import annotations

import re
from pathlib import Path

from .base import DiscoveredImportSource, StructuredImportRows, register_source_adapter


TEXT_SUFFIXES = {".tex"}
EXCLUDED_NAMES = {
    ".pp-export-tmp.tex",
    "paper.woven.arxiv.tex",
    "paper.woven.test.tex",
    "paper.woven.org",
    "paper.org",
    "paper_b.org",
    "paper_c.org",
    "paper_c.bak.org",
    "paper-demo.org",
    "paper-orig.org",
    "test.output.org",
    "tex-blocks.org",
}
EXCLUDED_DIRS = {".git", "__pycache__", ".pytest_cache", "setup"}
EXCLUDED_PREFIXES = ("table-", "figure-", "fig-")
INCLUDE_RE = re.compile(r"\\(?:include|input)\{([^}]+)\}")


class PolyPaperSourceAdapter:
    name = "polypaper"

    def detect(self, root: str | Path) -> bool:
        base = Path(root)
        return (
            (base / "main.tex").exists()
            and (base / "pieces").is_dir()
            and ((base / "paper.org").exists() or (base / "README.md").exists())
        )

    def discover(self, root: str | Path) -> list[DiscoveredImportSource]:
        base = Path(root)
        allowed_paths = self._collect_reachable_tex(base)
        rows: list[DiscoveredImportSource] = []
        for path in sorted(allowed_paths):
            rows.append(
                DiscoveredImportSource(
                    path=path,
                    relative_path=path.relative_to(base).as_posix(),
                    source_kind="polypaper",
                    artifact_kind="markdown_note",
                    is_text=True,
                    metadata={},
                )
            )
        return rows

    def import_intent(self) -> str:
        return "grounded_knowledge"

    def build_rows(self, context, sources: list[DiscoveredImportSource]) -> StructuredImportRows | None:
        return None

    def _collect_reachable_tex(self, base: Path) -> set[Path]:
        entrypoint = base / "main.tex"
        reachable: set[Path] = set()
        pending: list[Path] = [entrypoint]

        while pending:
            current = pending.pop()
            if not current.exists():
                continue
            if current in reachable:
                continue
            if any(part in EXCLUDED_DIRS for part in current.relative_to(base).parts):
                continue
            if current.name in EXCLUDED_NAMES or current.suffix.lower() not in TEXT_SUFFIXES:
                continue
            if current.parent.name == "figs":
                continue
            if current.name.startswith(EXCLUDED_PREFIXES) or current.name == "tables.tex":
                continue
            text = current.read_text(encoding="utf-8")
            for raw_ref in INCLUDE_RE.findall(text):
                candidate = self._resolve_include(base, current.parent, raw_ref.strip())
                if candidate is not None and candidate not in reachable:
                    pending.append(candidate)
            if current != entrypoint:
                reachable.add(current)

        return reachable

    def _resolve_include(self, base: Path, current_dir: Path, raw_ref: str) -> Path | None:
        candidates = [current_dir / raw_ref, base / raw_ref]
        resolved: list[Path] = []
        for candidate in candidates:
            if candidate.suffix:
                resolved.append(candidate)
            else:
                resolved.append(candidate.with_suffix(".tex"))
        for candidate in resolved:
            if candidate.exists():
                return candidate
        return None


register_source_adapter(PolyPaperSourceAdapter())
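The polypaper traversal is driven entirely by `INCLUDE_RE`, which extracts the argument of every `\include{...}` or `\input{...}`. One subtlety worth noting: the raw regex also matches directives inside commented-out TeX lines, which the traversal tolerates because non-existent resolved candidates are simply skipped. A quick check of the pattern on a synthetic snippet:

```python
import re

# Same pattern as INCLUDE_RE in the polypaper adapter above.
INCLUDE_RE = re.compile(r"\\(?:include|input)\{([^}]+)\}")

tex = r"""
\input{pieces/intro}
\include{pieces/methods.tex}
% \input{pieces/old-draft}
"""

refs = INCLUDE_RE.findall(tex)
print(refs)  # ['pieces/intro', 'pieces/methods.tex', 'pieces/old-draft']
```

The commented-out reference is still captured; it is only dropped later, in `_resolve_include`, when no matching file exists on disk.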
@@ -0,0 +1,38 @@
from __future__ import annotations

from pathlib import Path

from .base import DiscoveredImportSource, StructuredImportRows, register_source_adapter


class TranscriptSourceAdapter:
    name = "transcript"

    def detect(self, root: str | Path) -> bool:
        base = Path(root)
        return any("transcript" in path.name.lower() for path in base.rglob("*") if path.is_file())

    def discover(self, root: str | Path) -> list[DiscoveredImportSource]:
        base = Path(root)
        rows: list[DiscoveredImportSource] = []
        for path in sorted(p for p in base.rglob("*") if p.is_file() and "transcript" in p.name.lower()):
            rows.append(
                DiscoveredImportSource(
                    path=path,
                    relative_path=path.relative_to(base).as_posix(),
                    source_kind="transcript",
                    artifact_kind="session_log",
                    is_text=True,
                    metadata={},
                )
            )
        return rows

    def import_intent(self) -> str:
        return "grounded_knowledge"

    def build_rows(self, context, sources: list[DiscoveredImportSource]) -> StructuredImportRows | None:
        return None


register_source_adapter(TranscriptSourceAdapter())
@@ -0,0 +1,9 @@
"""Legacy extracted GroundRecall store module.

Compatibility path retained while the standalone repo converges on the
top-level ``groundrecall.store`` module as the primary implementation.
"""

from __future__ import annotations

from .store import GroundRecallStore
@@ -0,0 +1,231 @@
from __future__ import annotations

import argparse
import json
import shutil
import socket
import subprocess
from collections import OrderedDict
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

from .groundrecall_discovery import DiscoveredArtifact
from .groundrecall_lint import lint_import_directory
from .groundrecall_normalizer import (
    ImportContext,
    build_artifact_record,
    build_claim_record,
    build_concept_records,
    build_observation_record,
    build_relation_records,
    manifest_record,
)
from .groundrecall_review_bridge import export_review_bundle_from_import
from .groundrecall_review_queue import build_review_queue
from .groundrecall_segmenter import SegmentedPage, segment_markdown_artifact
from .groundrecall_source_adapters.base import detect_source_adapter

import groundrecall.groundrecall_source_adapters  # noqa: F401


VALID_MODES = {"archive", "quick", "grounded"}


@dataclass
class ImportResult:
    manifest: dict[str, Any]
    artifacts: list[dict[str, Any]]
    observations: list[dict[str, Any]]
    claims: list[dict[str, Any]]
    concepts: list[dict[str, Any]]
    relations: list[dict[str, Any]]
    out_dir: Path


def _timestamp() -> str:
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")


def _default_import_id(source_root: Path) -> str:
    stem = source_root.name.lower().replace("_", "-")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stem}-{stamp}"


def _write_json(path: Path, payload: dict[str, Any]) -> None:
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def _write_jsonl(path: Path, rows: list[dict[str, Any]]) -> None:
    text = "\n".join(json.dumps(row, sort_keys=True) for row in rows)
    if text:
        text += "\n"
    path.write_text(text, encoding="utf-8")


def _dedupe_by_key(rows: list[dict[str, Any]], key: str) -> list[dict[str, Any]]:
    unique: OrderedDict[str, dict[str, Any]] = OrderedDict()
    for row in rows:
        unique.setdefault(str(row[key]), row)
    return list(unique.values())
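`_dedupe_by_key` keeps the first row seen for each key and preserves insertion order, which matters when an adapter emits the same concept or relation from multiple sources. A minimal sketch of those semantics (the sample rows are illustrative):

```python
from collections import OrderedDict
from typing import Any


def _dedupe_by_key(rows: list[dict[str, Any]], key: str) -> list[dict[str, Any]]:
    # First occurrence wins; OrderedDict.setdefault ignores later duplicates.
    unique: OrderedDict[str, dict[str, Any]] = OrderedDict()
    for row in rows:
        unique.setdefault(str(row[key]), row)
    return list(unique.values())


rows = [
    {"claim_id": "clm_1", "text": "kept"},
    {"claim_id": "clm_1", "text": "dropped: same id, later row"},
    {"claim_id": "clm_2", "text": "kept"},
]
deduped = _dedupe_by_key(rows, "claim_id")
print([row["claim_id"] for row in deduped])  # ['clm_1', 'clm_2']
```

First-wins is the conservative choice here: the earliest emitted row is the one closest to the original source ordering of the import.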
def _convert_tex_to_markdown(path: Path) -> str | None:
    pandoc = shutil.which("pandoc")
    if pandoc is None:
        return None
    result = subprocess.run(
        [pandoc, "-f", "latex", "-t", "gfm", str(path)],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    markdown = result.stdout.strip()
    return markdown or None


def _segment_artifact(artifact: DiscoveredArtifact) -> SegmentedPage | None:
    if not artifact.is_text:
        return None
    suffix = artifact.path.suffix.lower()
    if suffix not in {".md", ".markdown", ".txt", ".tex", ".log"}:
        return None
    if suffix == ".tex":
        converted = _convert_tex_to_markdown(artifact.path)
        if converted is not None:
            return segment_markdown_artifact(artifact, text=converted)
    return segment_markdown_artifact(artifact)


def run_groundrecall_import(
    source_root: str | Path,
    out_root: str | Path | None = None,
    mode: str = "quick",
    import_id: str | None = None,
    machine_id: str | None = None,
    agent_id: str = "groundrecall.ingest",
) -> ImportResult:
    source_path = Path(source_root).resolve()
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported import mode: {mode}")
    adapter = detect_source_adapter(source_path)
    discovered = adapter.discover(source_path)
    artifacts = [
        DiscoveredArtifact(
            path=item.path,
            relative_path=item.relative_path,
            artifact_kind=item.artifact_kind,
            is_text=item.is_text,
        )
        for item in discovered
    ]
    actual_import_id = import_id or _default_import_id(source_path)
    output_root = Path(out_root) if out_root else source_path / "imports"
    output_dir = output_root / actual_import_id
    output_dir.mkdir(parents=True, exist_ok=True)

    context = ImportContext(
        import_id=actual_import_id,
        import_mode=mode,
        machine_id=machine_id or socket.gethostname(),
        agent_id=agent_id,
        source_root=str(source_path),
        imported_at=_timestamp(),
    )

    artifact_rows: list[dict[str, Any]] = []
    observation_rows: list[dict[str, Any]] = []
    claim_rows: list[dict[str, Any]] = []
    concept_rows: list[dict[str, Any]] = []
    relation_rows: list[dict[str, Any]] = []
    structured_rows = adapter.build_rows(context, discovered)
    if structured_rows is not None:
        artifact_rows.extend(structured_rows.artifact_rows)
        observation_rows.extend(structured_rows.observation_rows)
        claim_rows.extend(structured_rows.claim_rows)
        concept_rows.extend(structured_rows.concept_rows)
        relation_rows.extend(structured_rows.relation_rows)
    else:
        for artifact in artifacts:
            page = _segment_artifact(artifact)
            artifact_row = build_artifact_record(context, artifact, page)
            artifact_rows.append(artifact_row)
            if page is None:
                continue

            concept_rows.extend(build_concept_records(context, artifact_row, page.concepts))
            relation_rows.extend(build_relation_records(context, artifact_row, page.concepts, page.links))

            for index, observation in enumerate(page.observations, start=1):
                observation_row = build_observation_record(context, artifact_row, observation, index)
                observation_rows.append(observation_row)
                if mode == "archive":
                    continue
                if observation.role not in {"claim", "summary"}:
                    continue
                claim_rows.append(build_claim_record(context, observation_row, observation, page.concepts[:3], index))

    concept_rows = _dedupe_by_key(concept_rows, "concept_id")
    relation_rows = _dedupe_by_key(relation_rows, "relation_id")
    artifact_rows = _dedupe_by_key(artifact_rows, "artifact_id")
    observation_rows = _dedupe_by_key(observation_rows, "observation_id")
    claim_rows = _dedupe_by_key(claim_rows, "claim_id")

    manifest = manifest_record(context) | {
        "source_adapter": adapter.name,
        "import_intent": adapter.import_intent(),
        "artifact_count": len(artifact_rows),
        "observation_count": len(observation_rows),
|
||||||
|
"claim_count": len(claim_rows),
|
||||||
|
"concept_count": len(concept_rows),
|
||||||
|
"relation_count": len(relation_rows),
|
||||||
|
}
|
||||||
|
|
||||||
|
_write_json(output_dir / "manifest.json", manifest)
|
||||||
|
_write_jsonl(output_dir / "artifacts.jsonl", artifact_rows)
|
||||||
|
_write_jsonl(output_dir / "observations.jsonl", observation_rows)
|
||||||
|
_write_jsonl(output_dir / "claims.jsonl", claim_rows)
|
||||||
|
_write_jsonl(output_dir / "concepts.jsonl", concept_rows)
|
||||||
|
_write_jsonl(output_dir / "relations.jsonl", relation_rows)
|
||||||
|
lint_payload = lint_import_directory(output_dir)
|
||||||
|
_write_json(output_dir / "lint_findings.json", lint_payload)
|
||||||
|
review_queue = build_review_queue(output_dir)
|
||||||
|
_write_json(output_dir / "review_queue.json", review_queue)
|
||||||
|
export_review_bundle_from_import(output_dir)
|
||||||
|
|
||||||
|
return ImportResult(
|
||||||
|
manifest=manifest,
|
||||||
|
artifacts=artifact_rows,
|
||||||
|
observations=observation_rows,
|
||||||
|
claims=claim_rows,
|
||||||
|
concepts=concept_rows,
|
||||||
|
relations=relation_rows,
|
||||||
|
out_dir=output_dir,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def build_parser() -> argparse.ArgumentParser:
|
||||||
|
parser = argparse.ArgumentParser(description="Import an llmwiki-style repository into GroundRecall import artifacts.")
|
||||||
|
parser.add_argument("source_root")
|
||||||
|
parser.add_argument("--out-root", default=None)
|
||||||
|
parser.add_argument("--mode", choices=sorted(VALID_MODES), default="quick")
|
||||||
|
parser.add_argument("--import-id", default=None)
|
||||||
|
parser.add_argument("--machine-id", default=None)
|
||||||
|
parser.add_argument("--agent-id", default="groundrecall.ingest")
|
||||||
|
return parser
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
args = build_parser().parse_args()
|
||||||
|
result = run_groundrecall_import(
|
||||||
|
source_root=args.source_root,
|
||||||
|
out_root=args.out_root,
|
||||||
|
mode=args.mode,
|
||||||
|
import_id=args.import_id,
|
||||||
|
machine_id=args.machine_id,
|
||||||
|
agent_id=args.agent_id,
|
||||||
|
)
|
||||||
|
print(f"Wrote import artifacts to {result.out_dir}")
|
||||||
|
|
@@ -0,0 +1,47 @@
from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Any

from .store import GroundRecallStore


def summarize_store(store_dir: str | Path) -> dict[str, Any]:
    store = GroundRecallStore(store_dir)
    snapshots = store.list_snapshots()
    latest_snapshot = max(snapshots, key=lambda item: item.created_at, default=None)
    return {
        "store_dir": str(Path(store_dir)),
        "source_count": len(store.list_sources()),
        "artifact_count": len(store.list_artifacts()),
        "observation_count": len(store.list_observations()),
        "claim_count": len(store.list_claims()),
        "concept_count": len(store.list_concepts()),
        "relation_count": len(store.list_relations()),
        "review_candidate_count": len(store.list_review_candidates()),
        "promotion_count": len(store.list_promotions()),
        "snapshot_count": len(snapshots),
        "latest_snapshot_id": latest_snapshot.snapshot_id if latest_snapshot is not None else "",
    }


def inspect_store(store_dir: str | Path, out_path: str | Path | None = None) -> dict[str, Any]:
    payload = summarize_store(store_dir)
    if out_path is not None:
        Path(out_path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Inspect canonical GroundRecall store contents.")
    parser.add_argument("store_dir")
    parser.add_argument("--out", default=None)
    return parser


def main() -> None:
    args = build_parser().parse_args()
    payload = inspect_store(args.store_dir, out_path=args.out)
    print(json.dumps(payload, indent=2))
@@ -0,0 +1,196 @@
from __future__ import annotations

import argparse
import json
from collections import Counter, defaultdict
from pathlib import Path
from typing import Any


def _read_json(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def _read_jsonl(path: Path) -> list[dict[str, Any]]:
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]


def lint_import_directory(import_dir: str | Path) -> dict[str, Any]:
    base = Path(import_dir)
    manifest = _read_json(base / "manifest.json")
    artifacts = _read_jsonl(base / "artifacts.jsonl")
    observations = _read_jsonl(base / "observations.jsonl")
    claims = _read_jsonl(base / "claims.jsonl")
    concepts = _read_jsonl(base / "concepts.jsonl")
    relations = _read_jsonl(base / "relations.jsonl")

    findings: list[dict[str, Any]] = []
    observation_by_id = {row["observation_id"]: row for row in observations}
    concept_ids = {row["concept_id"] for row in concepts}

    text_counter = Counter(row["claim_text"].strip().lower() for row in claims if row.get("claim_text", "").strip())
    claim_ids = {row["claim_id"] for row in claims}
    for claim in claims:
        claim_text = claim.get("claim_text", "").strip()
        if not claim.get("source_observation_ids"):
            findings.append(
                {
                    "severity": "error",
                    "code": "claim_missing_observation",
                    "target_id": claim["claim_id"],
                    "message": "Claim has no source observation ids.",
                }
            )
        if not claim.get("concept_ids"):
            findings.append(
                {
                    "severity": "warning",
                    "code": "claim_missing_concept",
                    "target_id": claim["claim_id"],
                    "message": "Claim is not associated with any concepts.",
                }
            )
        if claim.get("grounding_status") == "ungrounded":
            findings.append(
                {
                    "severity": "warning",
                    "code": "claim_ungrounded",
                    "target_id": claim["claim_id"],
                    "message": "Claim is ungrounded and should not be promoted directly.",
                }
            )
        if claim_text and text_counter[claim_text.lower()] > 1:
            findings.append(
                {
                    "severity": "warning",
                    "code": "duplicate_claim_text",
                    "target_id": claim["claim_id"],
                    "message": "Claim text duplicates another imported claim.",
                }
            )
        for obs_id in claim.get("source_observation_ids", []):
            if obs_id not in observation_by_id:
                findings.append(
                    {
                        "severity": "error",
                        "code": "claim_observation_missing",
                        "target_id": claim["claim_id"],
                        "message": f"Claim references missing observation {obs_id}.",
                    }
                )
        for target_claim_id in claim.get("contradicts_claim_ids", []):
            if target_claim_id not in claim_ids:
                findings.append(
                    {
                        "severity": "warning",
                        "code": "unresolved_contradiction_ref",
                        "target_id": claim["claim_id"],
                        "message": f"Claim references missing contradiction target {target_claim_id}.",
                    }
                )
        for target_claim_id in claim.get("supersedes_claim_ids", []):
            if target_claim_id not in claim_ids:
                findings.append(
                    {
                        "severity": "warning",
                        "code": "unresolved_supersession_ref",
                        "target_id": claim["claim_id"],
                        "message": f"Claim references missing supersession target {target_claim_id}.",
                    }
                )
        if claim.get("contradicts_claim_ids") and claim.get("supersedes_claim_ids"):
            findings.append(
                {
                    "severity": "warning",
                    "code": "claim_mixed_conflict_and_supersession",
                    "target_id": claim["claim_id"],
                    "message": "Claim marks both contradiction and supersession targets; review the intended relation.",
                }
            )

    concept_sources: defaultdict[str, set[str]] = defaultdict(set)
    for claim in claims:
        for concept_id in claim.get("concept_ids", []):
            concept_sources[concept_id].add(claim["claim_id"])
    for relation in relations:
        concept_sources[relation.get("source_id", "")].add(relation["relation_id"])
        concept_sources[relation.get("target_id", "")].add(relation["relation_id"])

    for concept in concepts:
        if not concept_sources.get(concept["concept_id"]):
            findings.append(
                {
                    "severity": "warning",
                    "code": "orphan_concept",
                    "target_id": concept["concept_id"],
                    "message": "Concept has no connected claims or relations.",
                }
            )

    for relation in relations:
        if relation.get("source_id") not in concept_ids:
            findings.append(
                {
                    "severity": "error",
                    "code": "relation_missing_source",
                    "target_id": relation["relation_id"],
                    "message": f"Relation source {relation.get('source_id')} is missing.",
                }
            )
        if relation.get("target_id") not in concept_ids:
            findings.append(
                {
                    "severity": "error",
                    "code": "relation_missing_target",
                    "target_id": relation["relation_id"],
                    "message": f"Relation target {relation.get('target_id')} is missing.",
                }
            )

    for observation in observations:
        role = observation.get("role")
        if role == "summary" and observation.get("grounding_status") == "ungrounded":
            findings.append(
                {
                    "severity": "warning",
                    "code": "ungrounded_summary",
                    "target_id": observation["observation_id"],
                    "message": "Summary observation is ungrounded.",
                }
            )

    summary = {
        "artifact_count": len(artifacts),
        "observation_count": len(observations),
        "claim_count": len(claims),
        "concept_count": len(concepts),
        "relation_count": len(relations),
        "error_count": sum(1 for item in findings if item["severity"] == "error"),
        "warning_count": sum(1 for item in findings if item["severity"] == "warning"),
    }
    return {
        "import_id": manifest["import_id"],
        "import_mode": manifest["import_mode"],
        "summary": summary,
        "findings": findings,
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Lint GroundRecall import artifacts.")
    parser.add_argument("import_dir")
    parser.add_argument("--out", default=None)
    return parser


def main() -> None:
    args = build_parser().parse_args()
    payload = lint_import_directory(args.import_dir)
    out_path = Path(args.out) if args.out else Path(args.import_dir) / "lint_findings.json"
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    print(f"Wrote {out_path}")
@@ -0,0 +1,137 @@
from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, Field


LifecycleStatus = Literal["draft", "triaged", "reviewed", "promoted", "superseded", "archived", "rejected"]
GroundingStatus = Literal["grounded", "partially_grounded", "ungrounded"]
SupportKind = Literal["direct_source", "derived_from_page", "derived_from_session", "inferred", "unknown"]


class ProvenanceRecord(BaseModel):
    origin_artifact_id: str = ""
    origin_path: str = ""
    origin_section: str = ""
    source_url: str = ""
    retrieval_date: str = ""
    machine_id: str = ""
    session_id: str = ""
    support_kind: SupportKind = "unknown"
    grounding_status: GroundingStatus = "ungrounded"


class SourceRecord(BaseModel):
    source_id: str
    title: str = ""
    source_type: str = "document"
    path: str = ""
    url: str = ""
    retrieved_at: str = ""
    metadata: dict = Field(default_factory=dict)
    current_status: LifecycleStatus = "draft"


class FragmentRecord(BaseModel):
    fragment_id: str
    source_id: str
    text: str
    section: str = ""
    line_start: int = 0
    line_end: int = 0
    metadata: dict = Field(default_factory=dict)
    current_status: LifecycleStatus = "draft"


class ArtifactRecord(BaseModel):
    artifact_id: str
    artifact_kind: str
    title: str = ""
    path: str = ""
    sha256: str = ""
    created_at: str = ""
    metadata: dict = Field(default_factory=dict)
    current_status: LifecycleStatus = "draft"


class ObservationRecord(BaseModel):
    observation_id: str
    artifact_id: str = ""
    role: str
    text: str
    provenance: ProvenanceRecord = Field(default_factory=ProvenanceRecord)
    confidence_hint: float = 0.0
    current_status: LifecycleStatus = "draft"


class ClaimRecord(BaseModel):
    claim_id: str
    claim_text: str
    claim_kind: str = "statement"
    source_observation_ids: list[str] = Field(default_factory=list)
    supporting_fragment_ids: list[str] = Field(default_factory=list)
    concept_ids: list[str] = Field(default_factory=list)
    contradicts_claim_ids: list[str] = Field(default_factory=list)
    supersedes_claim_ids: list[str] = Field(default_factory=list)
    confidence_hint: float = 0.0
    review_confidence: float = 0.0
    last_confirmed_at: str = ""
    provenance: ProvenanceRecord = Field(default_factory=ProvenanceRecord)
    current_status: LifecycleStatus = "draft"


class ConceptRecord(BaseModel):
    concept_id: str
    title: str
    aliases: list[str] = Field(default_factory=list)
    description: str = ""
    source_artifact_ids: list[str] = Field(default_factory=list)
    current_status: LifecycleStatus = "draft"


class RelationRecord(BaseModel):
    relation_id: str
    source_id: str
    target_id: str
    relation_type: str
    evidence_ids: list[str] = Field(default_factory=list)
    provenance: ProvenanceRecord = Field(default_factory=ProvenanceRecord)
    current_status: LifecycleStatus = "draft"


class ReviewCandidateRecord(BaseModel):
    review_candidate_id: str
    candidate_type: Literal["claim", "concept", "relation"]
    candidate_id: str
    triage_lane: str = "knowledge_capture"
    priority: int = 50
    finding_codes: list[str] = Field(default_factory=list)
    rationale: str = ""
    current_status: LifecycleStatus = "draft"


class PromotionRecord(BaseModel):
    promotion_id: str
    candidate_type: Literal["claim", "concept", "relation"]
    candidate_id: str
    promotion_target: str = "groundrecall_store"
    verdict: Literal["approved", "rejected", "superseded"] = "approved"
    reviewer: str = ""
    promoted_object_ids: list[str] = Field(default_factory=list)
    notes: str = ""
    promoted_at: str = ""


class GroundRecallSnapshot(BaseModel):
    snapshot_id: str
    created_at: str
    sources: list[SourceRecord] = Field(default_factory=list)
    fragments: list[FragmentRecord] = Field(default_factory=list)
    artifacts: list[ArtifactRecord] = Field(default_factory=list)
    observations: list[ObservationRecord] = Field(default_factory=list)
    claims: list[ClaimRecord] = Field(default_factory=list)
    concepts: list[ConceptRecord] = Field(default_factory=list)
    relations: list[RelationRecord] = Field(default_factory=list)
    promotions: list[PromotionRecord] = Field(default_factory=list)
    metadata: dict = Field(default_factory=dict)
@@ -0,0 +1,250 @@
from __future__ import annotations

import argparse
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

from .models import (
    ArtifactRecord,
    ClaimRecord,
    ConceptRecord,
    ObservationRecord,
    PromotionRecord,
    ProvenanceRecord,
    RelationRecord,
    ReviewCandidateRecord,
)
from .review_schema import ReviewSession
from .store import GroundRecallStore


def _read_json(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def _read_jsonl(path: Path) -> list[dict[str, Any]]:
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]


def _now() -> str:
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")


def _review_status_map(status: str) -> str:
    return {
        "trusted": "promoted",
        "provisional": "reviewed",
        "rejected": "rejected",
        "needs_review": "triaged",
    }.get(status, "triaged")


def _provenance_from_payload(payload: dict[str, Any]) -> ProvenanceRecord:
    return ProvenanceRecord(
        origin_artifact_id=payload.get("origin_artifact_id", ""),
        origin_path=payload.get("origin_path", ""),
        origin_section=payload.get("origin_section", ""),
        source_url=payload.get("source_url", ""),
        retrieval_date=payload.get("retrieval_date", ""),
        machine_id=payload.get("machine_id", ""),
        session_id=payload.get("session_id", ""),
        support_kind=payload.get("support_kind", "unknown"),
        grounding_status=payload.get("grounding_status", "ungrounded"),
    )


def promote_import_to_store(
    import_dir: str | Path,
    store_dir: str | Path,
    reviewer: str | None = None,
    snapshot_id: str | None = None,
) -> dict[str, Any]:
    base = Path(import_dir)
    manifest = _read_json(base / "manifest.json")
    review_session = ReviewSession.model_validate_json((base / "review_session.json").read_text(encoding="utf-8"))
    queue_payload = _read_json(base / "review_queue.json")
    artifacts = _read_jsonl(base / "artifacts.jsonl")
    observations = _read_jsonl(base / "observations.jsonl")
    claims = _read_jsonl(base / "claims.jsonl")
    concepts = _read_jsonl(base / "concepts.jsonl")
    relations = _read_jsonl(base / "relations.jsonl")

    store = GroundRecallStore(store_dir)
    reviewed_by_concept = {entry.concept_id: entry for entry in review_session.draft_pack.concepts}
    promoted_claim_ids: list[str] = []
    promoted_concept_ids: list[str] = []
    promoted_relation_ids: list[str] = []

    for artifact in artifacts:
        store.save_artifact(
            ArtifactRecord(
                artifact_id=artifact["artifact_id"],
                artifact_kind=artifact["artifact_kind"],
                title=artifact.get("title", ""),
                path=artifact.get("path", ""),
                sha256=artifact.get("sha256", ""),
                created_at=artifact.get("created_at", ""),
                metadata=dict(artifact.get("metadata", {})),
                current_status="reviewed",
            )
        )

    for observation in observations:
        store.save_observation(
            ObservationRecord(
                observation_id=observation["observation_id"],
                artifact_id=observation.get("artifact_id", ""),
                role=observation.get("role", "summary"),
                text=observation.get("text", ""),
                provenance=_provenance_from_payload(observation),
                confidence_hint=float(observation.get("confidence_hint", 0.0)),
                current_status="reviewed",
            )
        )

    for concept in concepts:
        short_id = concept["concept_id"].replace("concept::", "", 1)
        review_entry = reviewed_by_concept.get(short_id)
        current_status = _review_status_map(review_entry.status if review_entry else concept.get("current_status", "triaged"))
        record = store.save_concept(
            ConceptRecord(
                concept_id=concept["concept_id"],
                title=review_entry.title if review_entry else concept.get("title", concept["concept_id"]),
                aliases=list(concept.get("aliases", [])),
                description=review_entry.description if review_entry else concept.get("description", ""),
                source_artifact_ids=list(concept.get("source_artifact_ids", [])),
                current_status=current_status,  # type: ignore[arg-type]
            )
        )
        if record.current_status in {"promoted", "reviewed"}:
            promoted_concept_ids.append(record.concept_id)

    reviewed_concept_ids = set(promoted_concept_ids)
    for claim in claims:
        concept_ids = list(claim.get("concept_ids", []))
        statuses = []
        for concept_id in concept_ids:
            short_id = concept_id.replace("concept::", "", 1)
            review_entry = reviewed_by_concept.get(short_id)
            statuses.append(_review_status_map(review_entry.status) if review_entry else "triaged")
        if statuses and all(status == "rejected" for status in statuses):
            current_status = "rejected"
        elif statuses and any(status == "promoted" for status in statuses):
            current_status = "promoted"
        elif statuses and any(status == "reviewed" for status in statuses):
            current_status = "reviewed"
        else:
            current_status = "triaged"
        record = store.save_claim(
            ClaimRecord(
                claim_id=claim["claim_id"],
                claim_text=claim.get("claim_text", ""),
                claim_kind=claim.get("claim_kind", "statement"),
                source_observation_ids=list(claim.get("source_observation_ids", [])),
                supporting_fragment_ids=list(claim.get("supporting_fragment_ids", [])),
                concept_ids=concept_ids,
                contradicts_claim_ids=list(claim.get("contradicts_claim_ids", [])),
                supersedes_claim_ids=list(claim.get("supersedes_claim_ids", [])),
                confidence_hint=float(claim.get("confidence_hint", 0.0)),
                review_confidence=float(claim.get("review_confidence", 0.0)),
                last_confirmed_at=claim.get("last_confirmed_at", ""),
                provenance=_provenance_from_payload(claim),
                current_status=current_status,  # type: ignore[arg-type]
            )
        )
        if record.current_status in {"promoted", "reviewed"}:
            promoted_claim_ids.append(record.claim_id)

    for relation in relations:
        src_ok = relation.get("source_id") in reviewed_concept_ids
        tgt_ok = relation.get("target_id") in reviewed_concept_ids
        current_status = "promoted" if src_ok and tgt_ok else "triaged"
        record = store.save_relation(
            RelationRecord(
                relation_id=relation["relation_id"],
                source_id=relation.get("source_id", ""),
                target_id=relation.get("target_id", ""),
                relation_type=relation.get("relation_type", "references"),
                evidence_ids=list(relation.get("evidence_ids", [])),
                provenance=_provenance_from_payload(relation),
                current_status=current_status,  # type: ignore[arg-type]
            )
        )
        if record.current_status in {"promoted", "reviewed"}:
            promoted_relation_ids.append(record.relation_id)

    for item in queue_payload.get("items", []):
        store.save_review_candidate(
            ReviewCandidateRecord(
                review_candidate_id=item["queue_id"],
                candidate_type=item["candidate_type"],
                candidate_id=item["candidate_id"],
                triage_lane=item.get("triage_lane", "knowledge_capture"),
                priority=int(item.get("priority", 50)),
                finding_codes=list(item.get("finding_codes", [])),
                rationale=item.get("title", ""),
                current_status="reviewed" if item["candidate_id"] in set(promoted_claim_ids + promoted_concept_ids + promoted_relation_ids) else "triaged",
            )
        )

    promotion = store.save_promotion(
        PromotionRecord(
            promotion_id=f"promotion-{manifest['import_id']}",
            candidate_type="concept",
            candidate_id=manifest["import_id"],
            promotion_target="groundrecall_store",
            verdict="approved",
            reviewer=reviewer or review_session.reviewer,
            promoted_object_ids=promoted_concept_ids + promoted_claim_ids + promoted_relation_ids,
            notes=f"Promoted import {manifest['import_id']} into GroundRecallStore.",
            promoted_at=_now(),
        )
    )

    built_snapshot = store.build_snapshot(
        snapshot_id=snapshot_id or f"snapshot-{manifest['import_id']}",
        created_at=_now(),
        metadata={
            "source_import_id": manifest["import_id"],
            "reviewer": reviewer or review_session.reviewer,
            "export_kind": "canonical",
        },
    )
    store.save_snapshot(built_snapshot)

    return {
        "import_id": manifest["import_id"],
        "store_dir": str(Path(store_dir)),
        "promotion_id": promotion.promotion_id,
        "promoted_concept_count": len(promoted_concept_ids),
        "promoted_claim_count": len(promoted_claim_ids),
        "promoted_relation_count": len(promoted_relation_ids),
        "snapshot_id": built_snapshot.snapshot_id,
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Promote a GroundRecall import into canonical store objects.")
    parser.add_argument("import_dir")
    parser.add_argument("store_dir")
    parser.add_argument("--reviewer", default=None)
    parser.add_argument("--snapshot-id", default=None)
    return parser


def main() -> None:
    args = build_parser().parse_args()
    payload = promote_import_to_store(
        import_dir=args.import_dir,
        store_dir=args.store_dir,
        reviewer=args.reviewer,
        snapshot_id=args.snapshot_id,
    )
    print(json.dumps(payload, indent=2))
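The `_now()` helper above stamps promotions and snapshots with a second-resolution UTC timestamp, swapping the `+00:00` offset for a `Z` suffix. A self-contained illustration of the same expression (the function name here is only for the example):

```python
from datetime import datetime, timezone


def utc_now_z() -> str:
    """UTC timestamp, seconds precision, 'Z' suffix, e.g. '2025-01-01T12:00:00Z'."""
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")


stamp = utc_now_z()
# Always 20 characters: YYYY-MM-DDTHH:MM:SSZ, no microseconds, no numeric offset.
```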
@@ -0,0 +1,188 @@
from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Any

from .store import GroundRecallStore


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def _matches(query: str, *values: str) -> bool:
    needle = _normalize(query)
    return any(needle in _normalize(value) for value in values if value)


def query_concept(store_dir: str | Path, concept_ref: str) -> dict[str, Any] | None:
    store = GroundRecallStore(store_dir)
    concepts = store.list_concepts()
    concept = next(
        (
            item
            for item in concepts
            if concept_ref == item.concept_id
            or concept_ref == item.concept_id.replace("concept::", "", 1)
            or _matches(concept_ref, item.title, item.description, *item.aliases)
        ),
        None,
    )
    if concept is None:
        return None

    claims = [item for item in store.list_claims() if concept.concept_id in item.concept_ids and item.current_status != "rejected"]
    relations = [
        item
        for item in store.list_relations()
        if (item.source_id == concept.concept_id or item.target_id == concept.concept_id) and item.current_status != "rejected"
    ]
    artifacts = {item.artifact_id: item for item in store.list_artifacts()}
    observations = {item.observation_id: item for item in store.list_observations()}

    supporting_observations = []
    for claim in claims:
        for observation_id in claim.source_observation_ids:
            observation = observations.get(observation_id)
            if observation is not None:
                supporting_observations.append(
                    {
                        "observation_id": observation.observation_id,
                        "text": observation.text,
                        "role": observation.role,
                        "origin_path": observation.provenance.origin_path,
                        "grounding_status": observation.provenance.grounding_status,
                    }
                )

    related_concept_ids = sorted(
        {
            relation.target_id if relation.source_id == concept.concept_id else relation.source_id
            for relation in relations
            if relation.source_id != relation.target_id
        }
    )
    related_concepts = [item.model_dump() for item in concepts if item.concept_id in related_concept_ids]

    source_artifacts = [
        artifact.model_dump()
        for artifact in artifacts.values()
        if artifact.artifact_id in set(concept.source_artifact_ids)
    ]

    return {
        "query_type": "concept",
        "concept": concept.model_dump(),
        "claims": [item.model_dump() for item in claims],
        "relations": [item.model_dump() for item in relations],
        "related_concepts": related_concepts,
        "supporting_observations": supporting_observations,
        "source_artifacts": source_artifacts,
    }


def search_claims(
    store_dir: str | Path,
    text: str,
    include_rejected: bool = False,
    limit: int = 20,
) -> dict[str, Any]:
    store = GroundRecallStore(store_dir)
    concepts = {item.concept_id: item for item in store.list_concepts()}
    matches = []
    for claim in store.list_claims():
        if not include_rejected and claim.current_status == "rejected":
            continue
        concept_titles = [concepts[concept_id].title for concept_id in claim.concept_ids if concept_id in concepts]
        if _matches(text, claim.claim_text, *concept_titles):
            matches.append(
                {
                    "claim": claim.model_dump(),
                    "concept_titles": concept_titles,
                    "provenance": claim.provenance.model_dump(),
                }
            )
            if len(matches) >= limit:
                break
    return {
        "query_type": "claim_search",
        "query": text,
        "matches": matches,
    }


def query_provenance(
    store_dir: str | Path,
    origin_path: str | None = None,
    source_url: str | None = None,
) -> dict[str, Any]:
    store = GroundRecallStore(store_dir)
    claims = []
    observations = []
    for claim in store.list_claims():
        if origin_path and claim.provenance.origin_path == origin_path:
            claims.append(claim.model_dump())
            continue
        if source_url and claim.provenance.source_url == source_url:
            claims.append(claim.model_dump())
    for observation in store.list_observations():
        if origin_path and observation.provenance.origin_path == origin_path:
            observations.append(observation.model_dump())
            continue
        if source_url and observation.provenance.source_url == source_url:
            observations.append(observation.model_dump())
    return {
        "query_type": "provenance",
        "origin_path": origin_path or "",
        "source_url": source_url or "",
        "claims": claims,
        "observations": observations,
    }


def build_query_bundle_for_concept(store_dir: str | Path, concept_ref: str) -> dict[str, Any] | None:
    payload = query_concept(store_dir, concept_ref)
    if payload is None:
        return None
    claims = payload["claims"]
    contradictions = [item for item in claims if item.get("contradicts_claim_ids")]
    supersessions = [item for item in claims if item.get("supersedes_claim_ids")]
    return {
        "bundle_kind": "groundrecall_query_bundle",
        "query_type": "concept",
        "concept": payload["concept"],
        "relevant_claims": claims,
        "supporting_observations": payload["supporting_observations"],
        "related_concepts": payload["related_concepts"],
        "contradictions": contradictions,
        "supersessions": supersessions,
        "suggested_next_actions": [
            "Review promoted claims with low review confidence.",
            "Inspect supporting observations before exporting assistant context.",
            "Check related concepts for hidden prerequisite or contradiction edges.",
        ],
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Query canonical GroundRecall objects.")
    parser.add_argument("store_dir")
    parser.add_argument("query")
    parser.add_argument("--kind", choices=["concept", "claim", "provenance", "bundle"], default="concept")
    parser.add_argument("--source-url", default=None)
    return parser


def main() -> None:
    args = build_parser().parse_args()
    if args.kind == "concept":
        payload = query_concept(args.store_dir, args.query)
    elif args.kind == "claim":
        payload = search_claims(args.store_dir, args.query)
    elif args.kind == "provenance":
        payload = query_provenance(args.store_dir, origin_path=args.query, source_url=args.source_url)
    else:
        payload = build_query_bundle_for_concept(args.store_dir, args.query)
    print(json.dumps(payload, indent=2))
@@ -0,0 +1,366 @@
const state = {
  reviewData: null,
  selectedConceptId: null,
  selectedCitationId: null,
  conceptSearch: "",
  citationFilter: "all",
  message: "",
  verificationResult: null,
};
function escapeHtml(value) {
  return String(value ?? "")
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;");
}

function splitLines(value) {
  return String(value || "")
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
}

function conceptRows() {
  const data = state.reviewData;
  if (!data) return [];
  const reviewById = new Map((data.concept_reviews || []).map((item) => [item.concept_id, item]));
  return (data.draft_pack?.concepts || []).map((concept) => ({
    ...concept,
    review: reviewById.get(concept.concept_id) || null,
  }));
}

function citationRows() {
  return state.reviewData?.citation_reviews || [];
}

function selectedConcept() {
  return conceptRows().find((item) => item.concept_id === state.selectedConceptId) || conceptRows()[0] || null;
}

function selectedCitation() {
  return citationRows().find((item) => item.citation_review_id === state.selectedCitationId) || citationRows()[0] || null;
}

async function loadReviewData() {
  const response = await fetch("/api/load");
  const payload = await response.json();
  state.reviewData = payload.review_data;
  if (!state.selectedConceptId && conceptRows()[0]) {
    state.selectedConceptId = conceptRows()[0].concept_id;
  }
  if (!state.selectedCitationId && citationRows()[0]) {
    state.selectedCitationId = citationRows()[0].citation_review_id;
  }
  render();
}
async function saveConcept(form) {
  const payload = {
    concept_updates: [
      {
        concept_id: form.get("concept_id"),
        status: form.get("status"),
        description: form.get("description"),
        prerequisites: splitLines(form.get("prerequisites")),
        notes: splitLines(form.get("notes")),
      },
    ],
  };
  const response = await fetch("/api/save", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  const result = await response.json();
  state.reviewData = result.review_data;
  state.message = `Saved concept ${payload.concept_updates[0].concept_id}.`;
  render();
}

async function saveCitation(form) {
  const payload = {
    citation_updates: [
      {
        citation_review_id: form.get("citation_review_id"),
        status: form.get("status"),
        notes: splitLines(form.get("notes")),
      },
    ],
  };
  const response = await fetch("/api/save", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  const result = await response.json();
  state.reviewData = result.review_data;
  state.message = `Saved citation review ${payload.citation_updates[0].citation_review_id}.`;
  render();
}

async function verifyCitation(citationReviewId) {
  const response = await fetch("/api/citations/verify", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ citation_review_id: citationReviewId }),
  });
  state.verificationResult = await response.json();
  state.message = `Verification run for ${citationReviewId}.`;
  render();
}
function statusOptions(specs, selectedValue) {
  return (specs?.options || [])
    .map((option) => `<option value="${escapeHtml(option.value)}"${option.value === selectedValue ? " selected" : ""}>${escapeHtml(option.label)}</option>`)
    .join("");
}

function renderConceptPanel(concept) {
  if (!concept) {
    return `<section class="panel"><h2>No concept selected</h2></section>`;
  }
  const review = concept.review || {};
  const statusSpec = (state.reviewData.field_specs || []).find((item) => item.field === "status");
  const guidance = (state.reviewData.review_guidance?.priorities || []).map((item) => `<li>${escapeHtml(item)}</li>`).join("");
  const claims = (review.top_claims || []).map((claim) => `
    <article class="claim-card">
      <div class="claim-head">
        <strong>${escapeHtml(claim.claim_kind || "claim")}</strong>
        <span class="chip">${escapeHtml(claim.grounding_status || "unknown")}</span>
      </div>
      <p>${escapeHtml(claim.claim_text || "")}</p>
      <div class="tiny">Artifacts: ${escapeHtml((claim.artifact_paths || []).join(", ") || "none")}</div>
      ${(claim.supporting_observations || []).slice(0, 2).map((obs) => `
        <div class="support-block">
          <div class="tiny">${escapeHtml(obs.origin_path || "")}${obs.line_start ? `:${obs.line_start}` : ""}</div>
          <div>${escapeHtml(obs.text || "")}</div>
        </div>
      `).join("")}
    </article>
  `).join("");

  return `
    <section class="panel detail">
      <div class="panel-head">
        <div>
          <h2>${escapeHtml(concept.title)}</h2>
          <div class="muted">${escapeHtml(concept.concept_id)} · claims ${escapeHtml(review.claim_count || 0)} · grounded ${escapeHtml(review.grounded_claim_count || 0)} · warnings ${escapeHtml(review.warning_count || 0)}</div>
        </div>
        <div class="pill ${review.has_citation_support ? "pill-good" : "pill-warn"}">${review.has_citation_support ? "citation-bearing" : "no citation support"}</div>
      </div>
      <p class="help">${escapeHtml(review.review_help || "")}</p>
      <form id="concept-form">
        <input type="hidden" name="concept_id" value="${escapeHtml(concept.concept_id)}" />
        <label>
          <span>Review status</span>
          <select name="status">${statusOptions(statusSpec, concept.status)}</select>
        </label>
        <label>
          <span>Description</span>
          <textarea name="description" rows="3">${escapeHtml(concept.description || "")}</textarea>
        </label>
        <label>
          <span>Prerequisites</span>
          <textarea name="prerequisites" rows="3">${escapeHtml((concept.prerequisites || []).join("\n"))}</textarea>
        </label>
        <label>
          <span>Reviewer notes</span>
          <textarea name="notes" rows="5">${escapeHtml((concept.notes || []).join("\n"))}</textarea>
        </label>
        <div class="actions">
          <button type="submit" class="primary">Save Concept Review</button>
        </div>
      </form>
      <section class="subpanel">
        <h3>Reviewer guidance</h3>
        <ul>${guidance}</ul>
      </section>
      <section class="subpanel">
        <h3>Representative claims</h3>
        <div class="stack">${claims || "<div class=\"muted\">No representative claims available.</div>"}</div>
      </section>
    </section>
  `;
}
function renderCitationPanel(citation) {
  const statusSpec = (state.reviewData.citation_field_specs || []).find((item) => item.field === "status");
  const nextActions = (state.reviewData.citations?.next_actions || []).map((item) => `<li>${escapeHtml(item)}</li>`).join("");
  if (!citation) {
    return `<section class="panel"><h2>No citation selected</h2></section>`;
  }
  return `
    <section class="panel detail">
      <div class="panel-head">
        <div>
          <h2>Citation lane</h2>
          <div class="muted">${escapeHtml(citation.source_kind)} · ${escapeHtml(citation.artifact_path || citation.locator || "")}</div>
        </div>
        <div class="pill">${escapeHtml(citation.status)}</div>
      </div>
      <form id="citation-form">
        <input type="hidden" name="citation_review_id" value="${escapeHtml(citation.citation_review_id)}" />
        <label>
          <span>Status</span>
          <select name="status">${statusOptions(statusSpec, citation.status)}</select>
        </label>
        <label>
          <span>Citation key</span>
          <input value="${escapeHtml(citation.citation_key || "")}" disabled />
        </label>
        <label>
          <span>Reference title</span>
          <input value="${escapeHtml(citation.title || "")}" disabled />
        </label>
        <label>
          <span>Bibliography source</span>
          <input value="${escapeHtml(citation.source_bib_path || "")}" disabled />
        </label>
        <label>
          <span>Reviewer notes</span>
          <textarea name="notes" rows="5">${escapeHtml((citation.notes || []).join("\n"))}</textarea>
        </label>
        <div class="tiny">Related concepts: ${escapeHtml((citation.related_concept_ids || []).join(", ") || "none")}</div>
        <div class="tiny">Related claims: ${escapeHtml((citation.related_claim_ids || []).join(", ") || "none")}</div>
        <div class="actions">
          <button type="button" id="verify-citation" class="secondary">Verify With CiteGeist</button>
          <button type="submit" class="primary">Save Citation Review</button>
        </div>
      </form>
      <section class="subpanel">
        <h3>Citation guidance</h3>
        <ul>${(state.reviewData.review_guidance?.citation_guidance || []).map((item) => `<li>${escapeHtml(item)}</li>`).join("")}</ul>
      </section>
      <section class="subpanel">
        <h3>Next actions</h3>
        <ul>${nextActions}</ul>
      </section>
      <section class="subpanel">
        <h3>Verification</h3>
        ${
          state.verificationResult && state.verificationResult.citation_review_id === citation.citation_review_id
            ? `<pre class="json-block">${escapeHtml(JSON.stringify(state.verificationResult, null, 2))}</pre>`
            : `<div class="muted">Run CiteGeist verification to inspect the stored entry and candidate matches.</div>`
        }
      </section>
    </section>
  `;
}
function render() {
  const app = document.getElementById("app");
  if (!state.reviewData) {
    app.innerHTML = `<main class="shell"><section class="panel"><h1>Loading review data…</h1></section></main>`;
    return;
  }
  const summary = state.reviewData.import_context?.manifest || {};
  const conceptList = conceptRows().filter((item) => {
    const needle = state.conceptSearch.trim().toLowerCase();
    return !needle || item.title.toLowerCase().includes(needle) || item.concept_id.toLowerCase().includes(needle);
  });
  const citationList = citationRows().filter((item) => {
    if (state.citationFilter === "all") return true;
    return item.status === state.citationFilter;
  });
  const concept = selectedConcept();
  const citation = selectedCitation();

  app.innerHTML = `
    <main class="shell">
      <header class="hero">
        <div>
          <h1>GroundRecall Review Workbench</h1>
          <p>Concept-first review with a dedicated citation lane for academic imports.</p>
          <div class="muted">${escapeHtml(summary.import_id || "")} · ${escapeHtml(summary.source_root || "")}</div>
          ${state.message ? `<div class="message">${escapeHtml(state.message)}</div>` : ""}
        </div>
        <div class="hero-stats">
          <div class="stat"><strong>${escapeHtml(summary.artifact_count || 0)}</strong><span>artifacts</span></div>
          <div class="stat"><strong>${escapeHtml(summary.claim_count || 0)}</strong><span>claims</span></div>
          <div class="stat"><strong>${escapeHtml(summary.concept_count || 0)}</strong><span>concepts</span></div>
          <div class="stat"><strong>${escapeHtml(state.reviewData.citations?.summary?.citation_key_total || 0)}</strong><span>citation keys</span></div>
        </div>
      </header>

      <section class="workspace-grid">
        <aside class="panel list-panel">
          <div class="panel-head"><h2>Concepts</h2></div>
          <label class="search">
            <span>Search</span>
            <input id="concept-search" value="${escapeHtml(state.conceptSearch)}" />
          </label>
          <div class="stack">
            ${conceptList.map((item) => `
              <button class="list-item ${item.concept_id === concept?.concept_id ? "active" : ""}" data-concept-id="${escapeHtml(item.concept_id)}">
                <strong>${escapeHtml(item.title)}</strong>
                <span>${escapeHtml(item.status)}</span>
              </button>
            `).join("")}
          </div>
        </aside>
        ${renderConceptPanel(concept)}
      </section>

      <section class="workspace-grid">
        <aside class="panel list-panel">
          <div class="panel-head"><h2>Citation lane</h2></div>
          <label class="search">
            <span>Filter</span>
            <select id="citation-filter">
              ${["all", "unreviewed", "verified", "needs_source_check", "misleading", "irrelevant", "fabricated"].map((value) => `<option value="${value}"${value === state.citationFilter ? " selected" : ""}>${value}</option>`).join("")}
            </select>
          </label>
          <div class="stack">
            ${citationList.map((item) => `
              <button class="list-item ${item.citation_review_id === citation?.citation_review_id ? "active" : ""}" data-citation-id="${escapeHtml(item.citation_review_id)}">
                <strong>${escapeHtml(item.citation_key || item.title || item.citation_review_id)}</strong>
                <span>${escapeHtml(item.status)}</span>
              </button>
            `).join("")}
          </div>
        </aside>
        ${renderCitationPanel(citation)}
      </section>
    </main>
  `;

  document.querySelectorAll("[data-concept-id]").forEach((node) => {
    node.addEventListener("click", () => {
      state.selectedConceptId = node.getAttribute("data-concept-id");
      render();
    });
  });
  document.querySelectorAll("[data-citation-id]").forEach((node) => {
    node.addEventListener("click", () => {
      state.selectedCitationId = node.getAttribute("data-citation-id");
      render();
    });
  });
  document.getElementById("concept-search")?.addEventListener("input", (event) => {
    state.conceptSearch = event.target.value;
    render();
  });
  document.getElementById("citation-filter")?.addEventListener("change", (event) => {
    state.citationFilter = event.target.value;
    render();
  });
  document.getElementById("concept-form")?.addEventListener("submit", async (event) => {
    event.preventDefault();
    await saveConcept(new FormData(event.target));
  });
  document.getElementById("citation-form")?.addEventListener("submit", async (event) => {
    event.preventDefault();
    await saveCitation(new FormData(event.target));
  });
  document.getElementById("verify-citation")?.addEventListener("click", async () => {
    if (state.selectedCitationId) {
      await verifyCitation(state.selectedCitationId);
    }
  });
}

loadReviewData();
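The workbench interpolates review data into template strings throughout `render*`, so `escapeHtml` is the only barrier against markup injection. A standalone sketch of the escaping that helper is intended to perform (function body mirrors the frontend; the sample input is invented):

```javascript
// Mirrors the frontend's escapeHtml: neutralize &, <, > and double quotes
// before interpolating untrusted text into HTML template strings.
function escapeHtml(value) {
  return String(value ?? "")
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;");
}

// & must be escaped first, or the later entities would be double-escaped.
console.log(escapeHtml('<b title="x & y">bold</b>'));
```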
@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>GroundRecall Review Workbench</title>
  <link rel="stylesheet" href="/styles.css" />
</head>
<body>
  <div id="app"></div>
  <script src="/app.js"></script>
</body>
</html>
@@ -0,0 +1,248 @@
:root {
  --bg: #f4f0e8;
  --panel: rgba(255, 251, 245, 0.92);
  --panel-strong: #fffdfa;
  --ink: #1f2933;
  --muted: #5f6c76;
  --accent: #0f766e;
  --accent-strong: #134e4a;
  --warn: #9a3412;
  --line: rgba(31, 41, 51, 0.12);
  --shadow: 0 20px 60px rgba(31, 41, 51, 0.08);
}

* {
  box-sizing: border-box;
}

body {
  margin: 0;
  color: var(--ink);
  background:
    radial-gradient(circle at top right, rgba(15, 118, 110, 0.14), transparent 28%),
    radial-gradient(circle at top left, rgba(154, 52, 18, 0.08), transparent 24%),
    linear-gradient(180deg, #fbf8f1 0%, var(--bg) 100%);
  font-family: "Iowan Old Style", "Palatino Linotype", "Book Antiqua", serif;
}

button,
input,
select,
textarea {
  font: inherit;
}

.shell {
  width: min(1500px, calc(100vw - 32px));
  margin: 24px auto 48px;
}

.hero,
.panel {
  background: var(--panel);
  border: 1px solid var(--line);
  border-radius: 20px;
  box-shadow: var(--shadow);
}

.hero {
  display: grid;
  grid-template-columns: 1.6fr 1fr;
  gap: 20px;
  padding: 24px;
  margin-bottom: 18px;
}

.hero h1,
.panel h2,
.panel h3 {
  margin: 0 0 8px;
  font-family: Georgia, "Times New Roman", serif;
}

.hero-stats {
  display: grid;
  grid-template-columns: repeat(2, minmax(0, 1fr));
  gap: 12px;
}

.stat {
  padding: 14px;
  border-radius: 16px;
  background: var(--panel-strong);
  border: 1px solid var(--line);
}

.stat strong {
  display: block;
  font-size: 1.8rem;
}

.stat span,
.muted,
.tiny,
.help {
  color: var(--muted);
}

.message {
  margin-top: 10px;
  color: var(--accent-strong);
}

.workspace-grid {
  display: grid;
  grid-template-columns: minmax(280px, 360px) 1fr;
  gap: 18px;
  margin-bottom: 18px;
}

.panel {
  padding: 18px;
}

.panel-head {
  display: flex;
  justify-content: space-between;
  align-items: flex-start;
  gap: 12px;
  margin-bottom: 12px;
}

.list-panel {
  max-height: 78vh;
  overflow: auto;
}

.search,
label {
  display: grid;
  gap: 6px;
  margin-bottom: 12px;
}

.stack {
  display: grid;
  gap: 10px;
}

.list-item,
.primary,
.secondary {
  border: 1px solid var(--line);
  border-radius: 14px;
  background: var(--panel-strong);
}

.list-item {
  display: grid;
  gap: 4px;
  width: 100%;
  padding: 12px;
  text-align: left;
  cursor: pointer;
}

.list-item.active {
  border-color: rgba(15, 118, 110, 0.45);
  background: rgba(15, 118, 110, 0.08);
}

input,
select,
textarea {
  width: 100%;
  padding: 10px 12px;
  border: 1px solid var(--line);
  border-radius: 12px;
  background: #fff;
}

textarea {
  resize: vertical;
}

.actions {
  display: flex;
  justify-content: flex-end;
}

.primary {
  padding: 10px 16px;
  color: white;
  background: linear-gradient(135deg, var(--accent) 0%, var(--accent-strong) 100%);
  cursor: pointer;
}

.secondary {
  padding: 10px 16px;
  color: var(--accent-strong);
  cursor: pointer;
}

.subpanel {
  margin-top: 16px;
  padding-top: 14px;
  border-top: 1px solid var(--line);
}

.claim-card,
.support-block {
  padding: 12px;
  border-radius: 14px;
  background: var(--panel-strong);
  border: 1px solid var(--line);
}

.claim-head {
  display: flex;
  justify-content: space-between;
  gap: 12px;
  margin-bottom: 8px;
}

.chip,
.pill {
  display: inline-flex;
  align-items: center;
  padding: 4px 10px;
  border-radius: 999px;
  border: 1px solid var(--line);
  background: #fff;
  font-size: 0.85rem;
}

.pill-good {
  color: var(--accent-strong);
}

.pill-warn {
  color: var(--warn);
}

ul {
  margin: 0;
  padding-left: 20px;
}

.json-block {
  padding: 12px;
  overflow: auto;
  border-radius: 14px;
  background: #f8f7f3;
  border: 1px solid var(--line);
  font-family: "SFMono-Regular", Consolas, "Liberation Mono", monospace;
  font-size: 0.9rem;
  white-space: pre-wrap;
}

@media (max-width: 980px) {
  .hero,
  .workspace-grid {
    grid-template-columns: 1fr;
  }

  .list-panel {
    max-height: none;
  }
}
@ -0,0 +1,439 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
from pathlib import Path
|
||||||
|
import hashlib
|
||||||
|
import json, yaml
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from collections import defaultdict
|
||||||
|
from typing import Any, Callable
|
||||||
|
from .citation_support import bibliography_summary_payload, load_bibliography_index, serialize_bib_entry
|
||||||
|
from .review_schema import CitationReviewEntry, ReviewSession
|
||||||
|
|
||||||
|
def export_review_state_json(session: ReviewSession, path: str | Path) -> None:
    Path(path).write_text(session.model_dump_json(indent=2), encoding="utf-8")


def export_promoted_pack(session: ReviewSession, outdir: str | Path) -> None:
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    promoted_pack = dict(session.draft_pack.pack)
    promoted_pack["version"] = str(promoted_pack.get("version", "0.1.0-draft")).replace("-draft", "-reviewed")
    promoted_pack["curation"] = {"reviewer": session.reviewer, "ledger_entries": len(session.ledger)}

    concepts = []
    for concept in session.draft_pack.concepts:
        if concept.status == "rejected":
            continue
        concepts.append({
            "id": concept.concept_id,
            "title": concept.title,
            "description": concept.description,
            "prerequisites": concept.prerequisites,
            "mastery_signals": concept.mastery_signals,
            "status": concept.status,
            "notes": concept.notes,
            "mastery_profile": {},
        })

    (outdir / "pack.yaml").write_text(yaml.safe_dump(promoted_pack, sort_keys=False), encoding="utf-8")
    (outdir / "concepts.yaml").write_text(yaml.safe_dump({"concepts": concepts}, sort_keys=False), encoding="utf-8")
    (outdir / "review_ledger.json").write_text(json.dumps(session.model_dump(), indent=2), encoding="utf-8")
    (outdir / "license_attribution.json").write_text(json.dumps(session.draft_pack.attribution, indent=2), encoding="utf-8")

def export_promoted_pack_to_course_repo(session: ReviewSession, course_repo: str | Path, outdir: str | Path | None = None) -> Path:
    from .course_repo import resolve_course_repo

    resolved = resolve_course_repo(course_repo)
    target = Path(outdir) if outdir is not None else Path(resolved.generated_pack_dir or (Path(resolved.repo_root) / "generated" / "pack"))
    export_promoted_pack(session, target)
    return target

LATEX_CITE_RE = re.compile(r"\\cite[a-zA-Z*]*(?:\[[^\]]*\])?(?:\[[^\]]*\])?\{([^}]+)\}")


def _read_json(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def _read_jsonl(path: Path) -> list[dict[str, Any]]:
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]

def _status_field_spec() -> dict[str, Any]:
    return {
        "field": "status",
        "label": "Review status",
        "input": "select",
        "required": True,
        "options": [
            {
                "value": "trusted",
                "label": "Trusted",
                "help": "Promote this concept and its supported claims when the evidence and wording are ready.",
            },
            {
                "value": "provisional",
                "label": "Provisional",
                "help": "Keep this concept in reviewed state when it is promising but still needs citation or wording cleanup.",
            },
            {
                "value": "needs_review",
                "label": "Needs Review",
                "help": "Leave undecided when support, scope, or concept boundaries are still unclear.",
            },
            {
                "value": "rejected",
                "label": "Rejected",
                "help": "Exclude this concept when it is noise, unsupported, duplicated, or misleading.",
            },
        ],
    }


def _text_field_spec(field: str, label: str, help_text: str, *, multiline: bool = False) -> dict[str, Any]:
    return {
        "field": field,
        "label": label,
        "input": "textarea" if multiline else "text",
        "required": False,
        "help": help_text,
    }


def _citation_status_field_spec() -> dict[str, Any]:
    return {
        "field": "status",
        "label": "Citation review status",
        "input": "select",
        "required": True,
        "options": [
            {
                "value": "unreviewed",
                "label": "Unreviewed",
                "help": "Keep this citation candidate in triage until fit and existence are checked.",
            },
            {
                "value": "verified",
                "label": "Verified",
                "help": "The cited work exists and materially supports the associated manuscript claim.",
            },
            {
                "value": "needs_source_check",
                "label": "Needs Source Check",
                "help": "The citation may be useful but still needs direct source inspection or metadata cleanup.",
            },
            {
                "value": "misleading",
                "label": "Misleading",
                "help": "The citation exists but overstates, contradicts, or poorly fits the claim.",
            },
            {
                "value": "irrelevant",
                "label": "Irrelevant",
                "help": "The citation does not materially support the concept or claim under review.",
            },
            {
                "value": "fabricated",
                "label": "Fabricated",
                "help": "The citation appears invented, malformed, or otherwise not real.",
            },
        ],
    }

def _load_citegeist_extract() -> tuple[Callable[[str], list[Any]] | None, list[str]]:
    citegeist_src = Path("/home/netuser/bin/CiteGeist/src")
    if citegeist_src.exists():
        sys.path.insert(0, str(citegeist_src))
    try:
        from citegeist import available_extraction_backends, extract_references  # type: ignore
    except Exception:
        return None, []
    return extract_references, list(available_extraction_backends())


def _extract_citation_keys(text: str) -> list[str]:
    keys: list[str] = []
    for raw_group in LATEX_CITE_RE.findall(text):
        keys.extend(part.strip() for part in raw_group.split(",") if part.strip())
    return sorted(set(keys))

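As a quick sanity check, the key-extraction behavior above can be exercised standalone. The pattern is the same `LATEX_CITE_RE` regex, redefined here only so the snippet runs on its own:

```python
import re

# Same pattern as LATEX_CITE_RE above: \cite variants with up to two
# optional [...] arguments, capturing the comma-separated key group.
CITE_RE = re.compile(r"\\cite[a-zA-Z*]*(?:\[[^\]]*\])?(?:\[[^\]]*\])?\{([^}]+)\}")

def extract_keys(text: str) -> list[str]:
    keys: list[str] = []
    for group in CITE_RE.findall(text):
        keys.extend(part.strip() for part in group.split(",") if part.strip())
    return sorted(set(keys))

sample = r"As shown~\citep[see][p.~3]{smith2020, jones2019} and \cite*{doe2021}."
print(extract_keys(sample))  # ['doe2021', 'jones2019', 'smith2020']
```

Note the result is deduplicated and sorted, so key order in the manuscript is not preserved.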
def _artifact_citation_payloads(
    artifacts: list[dict[str, Any]],
    *,
    source_root: str,
) -> tuple[list[dict[str, Any]], dict[str, dict[str, Any]]]:
    extract_references, backends = _load_citegeist_extract()
    artifact_payloads: list[dict[str, Any]] = []
    summaries: dict[str, dict[str, Any]] = {}
    root = Path(source_root) if source_root else None
    bibliography_index = load_bibliography_index(source_root) if source_root else {}

    for artifact in artifacts:
        path = Path(source_root) / artifact["path"] if root is not None else None
        raw_text = ""
        if path is not None and path.exists():
            try:
                raw_text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                raw_text = ""
        citation_keys = _extract_citation_keys(raw_text) if raw_text else []
        extracted_refs: list[dict[str, Any]] = []
        if extract_references is not None and raw_text:
            try:
                for entry in extract_references(raw_text):
                    extracted_refs.append(
                        {
                            "citation_key": "",
                            "entry_type": entry.entry_type,
                            "title": entry.fields.get("title", ""),
                            "author": entry.fields.get("author", ""),
                            "year": entry.fields.get("year", ""),
                            "venue": entry.fields.get("journal", "") or entry.fields.get("booktitle", ""),
                        }
                    )
            except Exception:
                extracted_refs = []

        payload = {
            "artifact_id": artifact["artifact_id"],
            "path": artifact["path"],
            "title": artifact.get("title", ""),
            "citation_keys": citation_keys,
            "resolved_entries": [serialize_bib_entry(bibliography_index.get(key)) for key in citation_keys if bibliography_index.get(key)],
            "citation_key_count": len(citation_keys),
            "extracted_references": extracted_refs[:12],
            "extracted_reference_count": len(extracted_refs),
            "citegeist_backends": backends,
        }
        artifact_payloads.append(payload)
        summaries[artifact["artifact_id"]] = {
            "citation_key_count": len(citation_keys),
            "extracted_reference_count": len(extracted_refs),
            "has_citation_support": bool(citation_keys or extracted_refs),
        }
    return artifact_payloads, summaries

def build_citation_review_entries_from_import(import_dir: str | Path) -> list[CitationReviewEntry]:
    base = Path(import_dir)
    manifest = _read_json(base / "manifest.json")
    artifacts = _read_jsonl(base / "artifacts.jsonl")
    observations = _read_jsonl(base / "observations.jsonl")
    claims = _read_jsonl(base / "claims.jsonl")
    bibliography_index = load_bibliography_index(manifest.get("source_root", ""))

    artifact_payloads, _ = _artifact_citation_payloads(
        artifacts,
        source_root=manifest.get("source_root", ""),
    )
    observations_by_id = {item["observation_id"]: item for item in observations}
    artifact_claim_links: dict[str, dict[str, set[str]]] = defaultdict(lambda: {"claim_ids": set(), "concept_ids": set()})

    for claim in claims:
        artifact_ids = {
            observations_by_id[item]["artifact_id"]
            for item in claim.get("source_observation_ids", [])
            if item in observations_by_id and observations_by_id[item].get("artifact_id")
        }
        for artifact_id in artifact_ids:
            artifact_claim_links[artifact_id]["claim_ids"].add(claim["claim_id"])
            artifact_claim_links[artifact_id]["concept_ids"].update(
                concept_id.replace("concept::", "", 1) for concept_id in claim.get("concept_ids", [])
            )

    entries: list[CitationReviewEntry] = []
    for artifact in artifact_payloads:
        link_payload = artifact_claim_links.get(artifact["artifact_id"], {"claim_ids": set(), "concept_ids": set()})
        for citation_key in artifact.get("citation_keys", []):
            digest = hashlib.sha1(f"{artifact['artifact_id']}|key|{citation_key}".encode("utf-8")).hexdigest()[:12]
            bib_entry = bibliography_index.get(citation_key, {})
            fields = bib_entry.get("fields", {})
            entries.append(
                CitationReviewEntry(
                    citation_review_id=f"citrev-{digest}",
                    artifact_id=artifact["artifact_id"],
                    artifact_path=artifact.get("path", ""),
                    artifact_title=artifact.get("title", ""),
                    source_kind="citation_key",
                    locator=artifact.get("path", ""),
                    citation_key=citation_key,
                    title=str(fields.get("title", "")),
                    author=str(fields.get("author", "")),
                    year=str(fields.get("year", "")),
                    venue=str(fields.get("journal", "") or fields.get("booktitle", "") or fields.get("publisher", "")),
                    source_bib_path=str(bib_entry.get("source_bib_path", "")),
                    raw_bibtex=str(bib_entry.get("raw_bibtex", "")),
                    related_concept_ids=sorted(link_payload["concept_ids"]),
                    related_claim_ids=sorted(link_payload["claim_ids"]),
                )
            )
        for index, reference in enumerate(artifact.get("extracted_references", []), start=1):
            digest = hashlib.sha1(
                f"{artifact['artifact_id']}|ref|{reference.get('title', '')}|{reference.get('author', '')}|{index}".encode("utf-8")
            ).hexdigest()[:12]
            entries.append(
                CitationReviewEntry(
                    citation_review_id=f"citrev-{digest}",
                    artifact_id=artifact["artifact_id"],
                    artifact_path=artifact.get("path", ""),
                    artifact_title=artifact.get("title", ""),
                    source_kind="extracted_reference",
                    locator=f"{artifact.get('path', '')}#ref-{index}",
                    citation_key="",
                    title=reference.get("title", ""),
                    author=reference.get("author", ""),
                    year=reference.get("year", ""),
                    venue=reference.get("venue", ""),
                    related_concept_ids=sorted(link_payload["concept_ids"]),
                    related_claim_ids=sorted(link_payload["claim_ids"]),
                )
            )
    return entries

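The review IDs built above are deterministic: they hash the artifact id and citation key through a truncated SHA-1, so re-running the import regenerates the same IDs and reviewer state can be re-attached. A minimal sketch of the scheme (helper name is illustrative):

```python
import hashlib

def citation_review_id(artifact_id: str, citation_key: str) -> str:
    # Same scheme as above: "citrev-" plus the first 12 hex chars of
    # SHA-1 over "artifact_id|key|citation_key".
    digest = hashlib.sha1(f"{artifact_id}|key|{citation_key}".encode("utf-8")).hexdigest()[:12]
    return f"citrev-{digest}"

a = citation_review_id("art-001", "smith2020")
b = citation_review_id("art-001", "smith2020")
print(a == b, len(a))  # True 19
```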
def _build_import_review_payload(session: ReviewSession, import_dir: Path) -> dict[str, Any]:
    manifest = _read_json(import_dir / "manifest.json")
    lint_payload = _read_json(import_dir / "lint_findings.json")
    queue_payload = _read_json(import_dir / "review_queue.json")
    artifacts = _read_jsonl(import_dir / "artifacts.jsonl")
    observations = _read_jsonl(import_dir / "observations.jsonl")
    claims = _read_jsonl(import_dir / "claims.jsonl")

    observations_by_id = {item["observation_id"]: item for item in observations}
    claims_by_concept: dict[str, list[dict[str, Any]]] = defaultdict(list)
    findings_by_target: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for finding in lint_payload.get("findings", []):
        findings_by_target[finding["target_id"]].append(finding)
    for claim in claims:
        for concept_id in claim.get("concept_ids", []):
            claims_by_concept[concept_id].append(claim)

    artifact_citations, artifact_citation_summary = _artifact_citation_payloads(
        artifacts,
        source_root=manifest.get("source_root", ""),
    )
    artifact_by_id = {item["artifact_id"]: item for item in artifacts}

    concept_reviews: list[dict[str, Any]] = []
    for concept in session.draft_pack.concepts:
        full_concept_id = f"concept::{concept.concept_id}" if not concept.concept_id.startswith("concept::") else concept.concept_id
        concept_claims = claims_by_concept.get(full_concept_id, [])
        claim_payloads: list[dict[str, Any]] = []
        has_citation_support = False
        for claim in concept_claims[:25]:
            supporting_observations = [observations_by_id[item] for item in claim.get("source_observation_ids", []) if item in observations_by_id]
            artifact_ids = {item["artifact_id"] for item in supporting_observations}
            citation_support = [artifact_citation_summary.get(artifact_id, {}) for artifact_id in artifact_ids]
            has_citation_support = has_citation_support or any(item.get("has_citation_support") for item in citation_support)
            claim_payloads.append(
                {
                    "claim_id": claim["claim_id"],
                    "claim_text": claim.get("claim_text", ""),
                    "claim_kind": claim.get("claim_kind", ""),
                    "grounding_status": claim.get("grounding_status", "unknown"),
                    "supporting_observations": [
                        {
                            "observation_id": obs["observation_id"],
                            "origin_path": obs.get("origin_path", ""),
                            "origin_section": obs.get("origin_section", ""),
                            "text": obs.get("text", ""),
                            "line_start": obs.get("line_start", 0),
                            "line_end": obs.get("line_end", 0),
                        }
                        for obs in supporting_observations
                    ],
                    "citation_support": citation_support,
                    "artifact_paths": [artifact_by_id[item]["path"] for item in artifact_ids if item in artifact_by_id],
                    "finding_messages": [item["message"] for item in findings_by_target.get(claim["claim_id"], [])],
                }
            )

        concept_reviews.append(
            {
                "concept_id": concept.concept_id,
                "title": concept.title,
                "status": concept.status,
                "description": concept.description,
                "review_help": (
                    "Prefer `trusted` when claims are coherent and citation-bearing support is appropriate; "
                    "prefer `provisional` when the concept is plausible but still needs citation or wording cleanup."
                ),
                "claim_count": len(concept_claims),
                "grounded_claim_count": sum(1 for item in concept_claims if item.get("grounding_status") == "grounded"),
                "warning_count": len(findings_by_target.get(full_concept_id, [])),
                "has_citation_support": has_citation_support,
                "top_claims": claim_payloads,
                "notes": list(concept.notes),
            }
        )

    return {
        "import_context": {
            "manifest": manifest,
            "lint_summary": lint_payload.get("summary", {}),
            "queue_length": queue_payload.get("queue_length", 0),
            "source_adapter": manifest.get("source_adapter", ""),
        },
        "review_guidance": {
            "overview": (
                "Review concepts first, then inspect representative claims and their source observations before promotion."
            ),
            "priorities": [
                "Focus reviewer effort on concepts with strong grounded claims and explicit citations first.",
                "Downgrade or reject concepts whose claims are fragmented, duplicated, or missing meaningful support.",
                "For academic material, citation-bearing claims deserve special scrutiny for fit, contradiction, and fabrication risk.",
            ],
            "citation_guidance": [
                "A citation key or extracted reference is evidence of traceability, not correctness.",
                "Check whether the cited work actually supports the claim and whether the claim overstates it.",
                "Use the citation track to prioritize claims that can move into a separate citation-ingestion workflow.",
            ],
        },
        "field_specs": [
            _status_field_spec(),
            _text_field_spec("description", "Concept description", "Refine the concept summary to match the strongest supported interpretation."),
            _text_field_spec("notes", "Reviewer notes", "Record why this concept is trusted, provisional, rejected, or still unclear.", multiline=True),
            _text_field_spec("prerequisites", "Prerequisites", "List prerequisite concepts only when the manuscript support is explicit or defensible.", multiline=True),
        ],
        "citation_field_specs": [
            _citation_status_field_spec(),
            _text_field_spec("notes", "Citation notes", "Record whether the cited work exists, fits the claim, or should move into a dedicated citation-ingestion lane.", multiline=True),
        ],
        "concept_reviews": concept_reviews,
        "citation_reviews": [entry.model_dump() for entry in session.citation_reviews],
        "bibliography": bibliography_summary_payload(manifest.get("source_root", "")),
        "citations": {
            "enabled": True,
            "provider": "citegeist" if artifact_citations and artifact_citations[0].get("citegeist_backends") else "none",
            "artifacts": artifact_citations,
            "summary": {
                "artifact_count_with_citations": sum(1 for item in artifact_citations if item["citation_key_count"] or item["extracted_reference_count"]),
                "citation_key_total": sum(item["citation_key_count"] for item in artifact_citations),
                "extracted_reference_total": sum(item["extracted_reference_count"] for item in artifact_citations),
            },
            "next_actions": [
                "Promote citation-bearing claims into a dedicated citation review lane.",
                "Use CiteGeist extraction as a first pass, then verify support and metadata before trusting the citation.",
            ],
        },
    }

def export_review_ui_data(session: ReviewSession, outdir: str | Path, import_dir: str | Path | None = None) -> None:
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    payload = {
        "reviewer": session.reviewer,
        "draft_pack": session.draft_pack.model_dump(),
        "citation_reviews": [entry.model_dump() for entry in session.citation_reviews],
        "ledger": [entry.model_dump() for entry in session.ledger],
    }
    if import_dir is not None:
        payload.update(_build_import_review_payload(session, Path(import_dir)))
    (outdir / "review_data.json").write_text(json.dumps(payload, indent=2), encoding="utf-8")
@@ -0,0 +1,80 @@
from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, Field

TrustStatus = Literal["trusted", "provisional", "rejected", "needs_review"]
CitationStatus = Literal["unreviewed", "verified", "needs_source_check", "misleading", "irrelevant", "fabricated"]

class ConceptReviewEntry(BaseModel):
    concept_id: str
    title: str
    description: str = ""
    prerequisites: list[str] = Field(default_factory=list)
    mastery_signals: list[str] = Field(default_factory=list)
    status: TrustStatus = "needs_review"
    notes: list[str] = Field(default_factory=list)


class CitationReviewEntry(BaseModel):
    citation_review_id: str
    artifact_id: str
    artifact_path: str = ""
    artifact_title: str = ""
    source_kind: Literal["citation_key", "extracted_reference"] = "citation_key"
    locator: str = ""
    citation_key: str = ""
    title: str = ""
    author: str = ""
    year: str = ""
    venue: str = ""
    source_bib_path: str = ""
    raw_bibtex: str = ""
    status: CitationStatus = "unreviewed"
    notes: list[str] = Field(default_factory=list)
    related_concept_ids: list[str] = Field(default_factory=list)
    related_claim_ids: list[str] = Field(default_factory=list)


class DraftPackData(BaseModel):
    pack: dict = Field(default_factory=dict)
    concepts: list[ConceptReviewEntry] = Field(default_factory=list)
    conflicts: list[str] = Field(default_factory=list)
    review_flags: list[str] = Field(default_factory=list)
    attribution: dict = Field(default_factory=dict)


class ReviewAction(BaseModel):
    action_type: str
    target: str = ""
    payload: dict = Field(default_factory=dict)
    rationale: str = ""


class ReviewLedgerEntry(BaseModel):
    reviewer: str
    action: ReviewAction


class ReviewSession(BaseModel):
    reviewer: str
    draft_pack: DraftPackData
    citation_reviews: list[CitationReviewEntry] = Field(default_factory=list)
    ledger: list[ReviewLedgerEntry] = Field(default_factory=list)


class WorkspaceMeta(BaseModel):
    workspace_id: str
    title: str
    path: str
    created_at: str
    last_opened_at: str
    notes: str = ""


class WorkspaceRegistry(BaseModel):
    workspaces: list[WorkspaceMeta] = Field(default_factory=list)
    recent_workspace_ids: list[str] = Field(default_factory=list)


class ImportPreview(BaseModel):
    ok: bool = False
    source_dir: str
    workspace_id: str
    overwrite_required: bool = False
    errors: list[str] = Field(default_factory=list)
    warnings: list[str] = Field(default_factory=list)
    summary: dict = Field(default_factory=dict)
    semantic_warnings: list[str] = Field(default_factory=list)
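These pydantic models are what the export helpers serialize with `model_dump_json` and reload later. A minimal round-trip sketch, assuming pydantic v2 and using a trimmed stand-in for the concept model above:

```python
from typing import Literal

from pydantic import BaseModel, Field

TrustStatus = Literal["trusted", "provisional", "rejected", "needs_review"]

class ConceptReviewEntry(BaseModel):
    # Trimmed version of the schema above, for illustration only.
    concept_id: str
    title: str
    status: TrustStatus = "needs_review"
    notes: list[str] = Field(default_factory=list)

entry = ConceptReviewEntry(concept_id="c1", title="Gradient descent")
print(entry.status)  # needs_review

# Round-trip through JSON, as the export/import helpers do:
restored = ConceptReviewEntry.model_validate_json(entry.model_dump_json())
print(restored == entry)  # True
```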
@@ -0,0 +1,246 @@
from __future__ import annotations

import argparse
import json
import mimetypes
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import parse_qs, urlparse

from .citation_support import materialize_citegeist_store
from .promotion import promote_import_to_store
from .review_workspace import GroundRecallReviewWorkspace

def _json_response(handler: BaseHTTPRequestHandler, status: int, payload: dict) -> None:
    body = json.dumps(payload, indent=2).encode("utf-8")
    handler.send_response(status)
    handler.send_header("Content-Type", "application/json")
    handler.send_header("Content-Length", str(len(body)))
    handler.send_header("Access-Control-Allow-Origin", "*")
    handler.send_header("Access-Control-Allow-Methods", "GET,POST,OPTIONS")
    handler.send_header("Access-Control-Allow-Headers", "Content-Type")
    handler.end_headers()
    handler.wfile.write(body)


def _serve_static(handler: BaseHTTPRequestHandler, asset_path: Path) -> None:
    if not asset_path.exists():
        _json_response(handler, 404, {"error": "asset not found"})
        return
    body = asset_path.read_bytes()
    handler.send_response(200)
    handler.send_header("Content-Type", mimetypes.guess_type(str(asset_path))[0] or "application/octet-stream")
    handler.send_header("Content-Length", str(len(body)))
    handler.end_headers()
    handler.wfile.write(body)

def _safe_show_entry(api: object, citation_key: str) -> dict | None:
    if not citation_key:
        return None
    try:
        return api.show_entry(  # type: ignore[attr-defined]
            citation_key,
            include_provenance=True,
            include_conflicts=True,
            include_bibtex=True,
        )
    except AttributeError:
        pass

    store = getattr(api, "store", None)
    if store is None:
        return None
    entry = store.get_entry(citation_key)
    if entry is None:
        return None
    payload = dict(entry)
    if hasattr(store, "get_field_provenance"):
        try:
            payload["provenance"] = store.get_field_provenance(citation_key)
        except Exception:
            payload["provenance"] = []
    if hasattr(store, "get_conflicts"):
        try:
            payload["conflicts"] = store.get_conflicts(citation_key)
        except Exception:
            payload["conflicts"] = []
    else:
        payload["conflicts"] = []
    if hasattr(store, "get_entry_bibtex"):
        try:
            payload["bibtex"] = store.get_entry_bibtex(citation_key)
        except Exception:
            payload["bibtex"] = None
    return payload


def _safe_verify_entry(api: object, entry: object, *, context: str, limit: int) -> dict:
    if getattr(entry, "raw_bibtex", ""):
        try:
            return api.verify_bibtex(entry.raw_bibtex, context=context, limit=limit)  # type: ignore[attr-defined]
        except Exception:
            pass
    values = [item for item in [getattr(entry, "citation_key", ""), getattr(entry, "title", ""), getattr(entry, "author", ""), getattr(entry, "year", "")] if item]
    try:
        return api.verify_strings(values, context=context, limit=limit)  # type: ignore[attr-defined]
    except Exception as exc:
        return {
            "context": context,
            "results": [],
            "error": str(exc),
        }

class GroundRecallReviewHandler(BaseHTTPRequestHandler):
    workspace: GroundRecallReviewWorkspace
    default_store_dir: str | None = None
    citegeist_bundle: dict | None = None

    def do_OPTIONS(self) -> None:
        _json_response(self, 200, {"ok": True})

    def do_GET(self) -> None:
        parsed = urlparse(self.path)
        if parsed.path == "/api/healthz":
            _json_response(self, 200, {"ok": True})
            return
        if parsed.path == "/api/load":
            review_data = self.workspace.load_review_data()
            review_data["citegeist"] = {
                "enabled": bool(self.citegeist_bundle and self.citegeist_bundle.get("available")),
                "db_path": self.citegeist_bundle.get("db_path") if self.citegeist_bundle else "",
                "ingested_files": self.citegeist_bundle.get("ingested_files", []) if self.citegeist_bundle else [],
                "show_entry_endpoint": "/api/citations/show-entry",
                "verify_endpoint": "/api/citations/verify",
            }
            _json_response(
                self,
                200,
                {
                    "ok": True,
                    "import_dir": str(self.workspace.import_dir),
                    "review_data": review_data,
                },
            )
            return
        if parsed.path == "/api/citations/show-entry":
            if not self.citegeist_bundle or not self.citegeist_bundle.get("available"):
                _json_response(self, 404, {"ok": False, "error": "citegeist unavailable"})
                return
            citation_key = parse_qs(parsed.query).get("citation_key", [""])[0]
            if not citation_key:
                _json_response(self, 400, {"ok": False, "error": "citation_key is required"})
                return
            payload = _safe_show_entry(self.citegeist_bundle["api"], citation_key)
            _json_response(self, 200, {"ok": payload is not None, "entry": payload})
            return

        asset_root = Path(__file__).with_name("review_app")
        if parsed.path in {"/", "/index.html"}:
            _serve_static(self, asset_root / "index.html")
            return
        if parsed.path == "/app.js":
            _serve_static(self, asset_root / "app.js")
            return
        if parsed.path == "/styles.css":
            _serve_static(self, asset_root / "styles.css")
            return
        _json_response(self, 404, {"error": "not found"})

    def do_POST(self) -> None:
        parsed = urlparse(self.path)
        length = int(self.headers.get("Content-Length", "0"))
        raw = self.rfile.read(length) if length else b"{}"
        payload = json.loads(raw.decode("utf-8") or "{}")

        if parsed.path == "/api/save":
            self.workspace.apply_updates(
                concept_updates=payload.get("concept_updates"),
                citation_updates=payload.get("citation_updates"),
                reviewer=payload.get("reviewer"),
            )
            _json_response(
                self,
                200,
                {
                    "ok": True,
                    "import_dir": str(self.workspace.import_dir),
                    "review_data": self.workspace.load_review_data(),
                },
            )
            return

        if parsed.path == "/api/promote":
            store_dir = payload.get("store_dir") or self.default_store_dir
            if not store_dir:
                _json_response(self, 400, {"ok": False, "error": "store_dir is required"})
                return
            result = promote_import_to_store(
                import_dir=self.workspace.import_dir,
                store_dir=store_dir,
                reviewer=payload.get("reviewer"),
                snapshot_id=payload.get("snapshot_id"),
            )
            _json_response(self, 200, {"ok": True, "promotion": result})
            return
        if parsed.path == "/api/citations/verify":
            if not self.citegeist_bundle or not self.citegeist_bundle.get("available"):
                _json_response(self, 404, {"ok": False, "error": "citegeist unavailable"})
                return
            citation_review_id = str(payload.get("citation_review_id") or "").strip()
            if not citation_review_id:
                _json_response(self, 400, {"ok": False, "error": "citation_review_id is required"})
                return
            session = self.workspace.load_session()
            entry = next((item for item in session.citation_reviews if item.citation_review_id == citation_review_id), None)
            if entry is None:
                _json_response(self, 404, {"ok": False, "error": "citation review entry not found"})
                return
            api = self.citegeist_bundle["api"]
            show_entry_payload = _safe_show_entry(api, entry.citation_key) if entry.citation_key else None
            context = f"{entry.artifact_path} {entry.artifact_title}".strip()
            verification = _safe_verify_entry(api, entry, context=context, limit=int(payload.get("limit", 5)))
            _json_response(
                self,
                200,
                {
                    "ok": True,
                    "citation_review_id": citation_review_id,
                    "entry": show_entry_payload,
                    "verification": verification,
                },
            )
            return

        _json_response(self, 404, {"error": "not found"})

def build_parser() -> argparse.ArgumentParser:
|
||||||
|
parser = argparse.ArgumentParser(description="GroundRecall local review server")
|
||||||
|
parser.add_argument("import_dir")
|
||||||
|
parser.add_argument("--host", default="127.0.0.1")
|
||||||
|
parser.add_argument("--port", type=int, default=8766)
|
||||||
|
parser.add_argument("--reviewer", default="GroundRecall Import")
|
||||||
|
parser.add_argument("--store-dir", default=None)
|
||||||
|
return parser
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
args = build_parser().parse_args()
|
||||||
|
GroundRecallReviewHandler.workspace = GroundRecallReviewWorkspace(args.import_dir, reviewer=args.reviewer)
|
||||||
|
GroundRecallReviewHandler.default_store_dir = args.store_dir
|
||||||
|
GroundRecallReviewHandler.workspace.ensure_review_bundle()
|
||||||
|
session = GroundRecallReviewHandler.workspace.load_session()
|
||||||
|
GroundRecallReviewHandler.citegeist_bundle = materialize_citegeist_store(
|
||||||
|
args.import_dir,
|
||||||
|
session.draft_pack.pack.get("source_root", ""),
|
||||||
|
)
|
||||||
|
server = HTTPServer((args.host, args.port), GroundRecallReviewHandler)
|
||||||
|
print(f"GroundRecall review server listening on http://{args.host}:{args.port}")
|
||||||
|
server.serve_forever()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
|
|
@ -0,0 +1,126 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

from .groundrecall_review_bridge import export_review_bundle_from_import
from .review_export import build_citation_review_entries_from_import, export_review_state_json, export_review_ui_data
from .review_schema import ReviewAction, ReviewLedgerEntry, ReviewSession


def _normalize_lines(value: Any) -> list[str]:
    if isinstance(value, list):
        return [str(item).strip() for item in value if str(item).strip()]
    if isinstance(value, str):
        return [line.strip() for line in value.splitlines() if line.strip()]
    return []


class GroundRecallReviewWorkspace:
    def __init__(self, import_dir: str | Path, reviewer: str = "GroundRecall Import") -> None:
        self.import_dir = Path(import_dir)
        self.reviewer = reviewer

    @property
    def review_session_path(self) -> Path:
        return self.import_dir / "review_session.json"

    @property
    def review_data_path(self) -> Path:
        return self.import_dir / "review_data.json"

    def ensure_review_bundle(self) -> None:
        if not self.review_session_path.exists():
            export_review_bundle_from_import(self.import_dir, reviewer=self.reviewer)
            return
        session = ReviewSession.model_validate_json(self.review_session_path.read_text(encoding="utf-8"))
        updated = False
        if (
            not session.citation_reviews
            or any(entry.source_kind == "citation_key" and not entry.title for entry in session.citation_reviews)
            or any(entry.source_kind == "citation_key" and not entry.source_bib_path for entry in session.citation_reviews)
        ):
            session.citation_reviews = build_citation_review_entries_from_import(self.import_dir)
            updated = True
        if updated or not self.review_data_path.exists():
            self.save_session(session)

    def load_session(self) -> ReviewSession:
        self.ensure_review_bundle()
        return ReviewSession.model_validate_json(self.review_session_path.read_text(encoding="utf-8"))

    def save_session(self, session: ReviewSession) -> None:
        export_review_state_json(session, self.review_session_path)
        export_review_ui_data(session, self.import_dir, import_dir=self.import_dir)

    def load_review_data(self) -> dict[str, Any]:
        self.ensure_review_bundle()
        return json.loads(self.review_data_path.read_text(encoding="utf-8"))

    def apply_updates(
        self,
        *,
        concept_updates: list[dict[str, Any]] | None = None,
        citation_updates: list[dict[str, Any]] | None = None,
        reviewer: str | None = None,
    ) -> ReviewSession:
        session = self.load_session()
        if reviewer:
            session.reviewer = reviewer
        concept_by_id = {concept.concept_id: concept for concept in session.draft_pack.concepts}
        citation_by_id = {entry.citation_review_id: entry for entry in session.citation_reviews}

        for payload in concept_updates or []:
            concept_id = str(payload.get("concept_id", "")).strip()
            if not concept_id or concept_id not in concept_by_id:
                continue
            concept = concept_by_id[concept_id]
            if "status" in payload:
                concept.status = payload["status"]
            if "description" in payload:
                concept.description = str(payload.get("description", "")).strip()
            if "notes" in payload:
                concept.notes = _normalize_lines(payload.get("notes"))
            if "prerequisites" in payload:
                concept.prerequisites = _normalize_lines(payload.get("prerequisites"))
            session.ledger.append(
                ReviewLedgerEntry(
                    reviewer=session.reviewer,
                    action=ReviewAction(
                        action_type="edit_concept",
                        target=concept_id,
                        payload={
                            "status": concept.status,
                            "description": concept.description,
                            "notes": concept.notes,
                            "prerequisites": concept.prerequisites,
                        },
                        rationale=str(payload.get("rationale", "")).strip(),
                    ),
                )
            )

        for payload in citation_updates or []:
            citation_review_id = str(payload.get("citation_review_id", "")).strip()
            if not citation_review_id or citation_review_id not in citation_by_id:
                continue
            entry = citation_by_id[citation_review_id]
            if "status" in payload:
                entry.status = payload["status"]
            if "notes" in payload:
                entry.notes = _normalize_lines(payload.get("notes"))
            session.ledger.append(
                ReviewLedgerEntry(
                    reviewer=session.reviewer,
                    action=ReviewAction(
                        action_type="edit_citation",
                        target=citation_review_id,
                        payload={"status": entry.status, "notes": entry.notes},
                        rationale=str(payload.get("rationale", "")).strip(),
                    ),
                )
            )

        self.save_session(session)
        return session
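The review endpoints accept notes and prerequisites either as a list or as one newline-separated string, and `_normalize_lines` collapses both shapes to a clean list. A standalone copy of the helper (renamed `normalize_lines` here) demonstrates the three paths:

```python
from typing import Any


def normalize_lines(value: Any) -> list[str]:
    # List input: stringify each item and keep the non-empty ones.
    if isinstance(value, list):
        return [str(item).strip() for item in value if str(item).strip()]
    # String input: split on newlines and keep the non-empty lines.
    if isinstance(value, str):
        return [line.strip() for line in value.splitlines() if line.strip()]
    # Anything else (None, numbers, dicts) normalizes to no lines.
    return []


print(normalize_lines(["  a ", "", "b"]))  # ['a', 'b']
print(normalize_lines("a\n\n  b  \n"))     # ['a', 'b']
print(normalize_lines(None))               # []
```

Because both list and string inputs normalize to the same value, ledger payloads stay stable regardless of which shape the UI sends.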
@ -0,0 +1,3 @@
from __future__ import annotations

from .. import groundrecall_source_adapters as _legacy_source_adapters  # noqa: F401

@ -0,0 +1,3 @@
from __future__ import annotations

from ..groundrecall_source_adapters.base import *  # noqa: F403

@ -0,0 +1,3 @@
from __future__ import annotations

from ..groundrecall_source_adapters.didactopus_pack import *  # noqa: F403

@ -0,0 +1,3 @@
from __future__ import annotations

from ..groundrecall_source_adapters.llmwiki import *  # noqa: F403

@ -0,0 +1,3 @@
from __future__ import annotations

from ..groundrecall_source_adapters.markdown_notes import *  # noqa: F403

@ -0,0 +1,3 @@
from __future__ import annotations

from ..groundrecall_source_adapters.polypaper import *  # noqa: F403

@ -0,0 +1,3 @@
from __future__ import annotations

from ..groundrecall_source_adapters.transcript import *  # noqa: F403
@ -0,0 +1,203 @@
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from typing import TypeVar

from pydantic import BaseModel

from .models import (
    ArtifactRecord,
    ClaimRecord,
    ConceptRecord,
    FragmentRecord,
    GroundRecallSnapshot,
    ObservationRecord,
    PromotionRecord,
    RelationRecord,
    ReviewCandidateRecord,
    SourceRecord,
)


ModelT = TypeVar("ModelT", bound=BaseModel)


class GroundRecallStore:
    def __init__(self, base_dir: str | Path):
        self.base_dir = Path(base_dir)
        self.sources_dir = self.base_dir / "sources"
        self.fragments_dir = self.base_dir / "fragments"
        self.artifacts_dir = self.base_dir / "artifacts"
        self.observations_dir = self.base_dir / "observations"
        self.claims_dir = self.base_dir / "claims"
        self.concepts_dir = self.base_dir / "concepts"
        self.relations_dir = self.base_dir / "relations"
        self.review_candidates_dir = self.base_dir / "review_candidates"
        self.promotions_dir = self.base_dir / "promotions"
        self.snapshots_dir = self.base_dir / "snapshots"
        for path in [
            self.sources_dir,
            self.fragments_dir,
            self.artifacts_dir,
            self.observations_dir,
            self.claims_dir,
            self.concepts_dir,
            self.relations_dir,
            self.review_candidates_dir,
            self.promotions_dir,
            self.snapshots_dir,
        ]:
            path.mkdir(parents=True, exist_ok=True)

    def _save(self, directory: Path, key: str, model: BaseModel) -> None:
        target = directory / f"{key}.json"
        payload = model.model_dump_json(indent=2)
        self._write_text_atomic(target, payload)

    def _write_text_atomic(self, path: Path, text: str) -> None:
        path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(
            prefix=f".{path.name}.",
            suffix=".tmp",
            dir=path.parent,
            text=True,
        )
        tmp_path = Path(tmp_name)
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(text)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp_path, path)
        finally:
            if tmp_path.exists():
                tmp_path.unlink()

    def _load(self, directory: Path, key: str, model_type: type[ModelT]) -> ModelT | None:
        path = directory / f"{key}.json"
        if not path.exists():
            return None
        return model_type.model_validate_json(path.read_text(encoding="utf-8"))

    def _list(self, directory: Path, model_type: type[ModelT]) -> list[ModelT]:
        items: list[ModelT] = []
        for path in sorted(directory.glob("*.json")):
            items.append(model_type.model_validate_json(path.read_text(encoding="utf-8")))
        return items

    def save_source(self, record: SourceRecord) -> SourceRecord:
        self._save(self.sources_dir, record.source_id, record)
        return record

    def get_source(self, source_id: str) -> SourceRecord | None:
        return self._load(self.sources_dir, source_id, SourceRecord)

    def list_sources(self) -> list[SourceRecord]:
        return self._list(self.sources_dir, SourceRecord)

    def save_fragment(self, record: FragmentRecord) -> FragmentRecord:
        self._save(self.fragments_dir, record.fragment_id, record)
        return record

    def get_fragment(self, fragment_id: str) -> FragmentRecord | None:
        return self._load(self.fragments_dir, fragment_id, FragmentRecord)

    def list_fragments(self) -> list[FragmentRecord]:
        return self._list(self.fragments_dir, FragmentRecord)

    def save_artifact(self, record: ArtifactRecord) -> ArtifactRecord:
        self._save(self.artifacts_dir, record.artifact_id, record)
        return record

    def get_artifact(self, artifact_id: str) -> ArtifactRecord | None:
        return self._load(self.artifacts_dir, artifact_id, ArtifactRecord)

    def list_artifacts(self) -> list[ArtifactRecord]:
        return self._list(self.artifacts_dir, ArtifactRecord)

    def save_observation(self, record: ObservationRecord) -> ObservationRecord:
        self._save(self.observations_dir, record.observation_id, record)
        return record

    def get_observation(self, observation_id: str) -> ObservationRecord | None:
        return self._load(self.observations_dir, observation_id, ObservationRecord)

    def list_observations(self) -> list[ObservationRecord]:
        return self._list(self.observations_dir, ObservationRecord)

    def save_claim(self, record: ClaimRecord) -> ClaimRecord:
        self._save(self.claims_dir, record.claim_id, record)
        return record

    def get_claim(self, claim_id: str) -> ClaimRecord | None:
        return self._load(self.claims_dir, claim_id, ClaimRecord)

    def list_claims(self) -> list[ClaimRecord]:
        return self._list(self.claims_dir, ClaimRecord)

    def save_concept(self, record: ConceptRecord) -> ConceptRecord:
        self._save(self.concepts_dir, record.concept_id.replace("::", "__"), record)
        return record

    def get_concept(self, concept_id: str) -> ConceptRecord | None:
        return self._load(self.concepts_dir, concept_id.replace("::", "__"), ConceptRecord)

    def list_concepts(self) -> list[ConceptRecord]:
        return self._list(self.concepts_dir, ConceptRecord)

    def save_relation(self, record: RelationRecord) -> RelationRecord:
        self._save(self.relations_dir, record.relation_id, record)
        return record

    def get_relation(self, relation_id: str) -> RelationRecord | None:
        return self._load(self.relations_dir, relation_id, RelationRecord)

    def list_relations(self) -> list[RelationRecord]:
        return self._list(self.relations_dir, RelationRecord)

    def save_review_candidate(self, record: ReviewCandidateRecord) -> ReviewCandidateRecord:
        self._save(self.review_candidates_dir, record.review_candidate_id, record)
        return record

    def get_review_candidate(self, review_candidate_id: str) -> ReviewCandidateRecord | None:
        return self._load(self.review_candidates_dir, review_candidate_id, ReviewCandidateRecord)

    def list_review_candidates(self) -> list[ReviewCandidateRecord]:
        return self._list(self.review_candidates_dir, ReviewCandidateRecord)

    def save_promotion(self, record: PromotionRecord) -> PromotionRecord:
        self._save(self.promotions_dir, record.promotion_id, record)
        return record

    def get_promotion(self, promotion_id: str) -> PromotionRecord | None:
        return self._load(self.promotions_dir, promotion_id, PromotionRecord)

    def list_promotions(self) -> list[PromotionRecord]:
        return self._list(self.promotions_dir, PromotionRecord)

    def save_snapshot(self, snapshot: GroundRecallSnapshot) -> GroundRecallSnapshot:
        self._save(self.snapshots_dir, snapshot.snapshot_id, snapshot)
        return snapshot

    def get_snapshot(self, snapshot_id: str) -> GroundRecallSnapshot | None:
        return self._load(self.snapshots_dir, snapshot_id, GroundRecallSnapshot)

    def list_snapshots(self) -> list[GroundRecallSnapshot]:
        return self._list(self.snapshots_dir, GroundRecallSnapshot)

    def build_snapshot(self, snapshot_id: str, created_at: str, metadata: dict | None = None) -> GroundRecallSnapshot:
        return GroundRecallSnapshot(
            snapshot_id=snapshot_id,
            created_at=created_at,
            sources=self.list_sources(),
            fragments=self.list_fragments(),
            artifacts=self.list_artifacts(),
            observations=self.list_observations(),
            claims=self.list_claims(),
            concepts=self.list_concepts(),
            relations=self.list_relations(),
            promotions=self.list_promotions(),
            metadata=metadata or {},
        )
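Every record write in the store goes through `_write_text_atomic`, which uses the standard write-to-temp-then-rename pattern so a crash mid-write never leaves a half-written JSON file. A minimal standalone sketch of the same pattern (with a hypothetical target path, no pydantic) looks like:

```python
import json
import os
import tempfile
from pathlib import Path


def write_text_atomic(path: Path, text: str) -> None:
    # Create the temp file in the *same directory* as the target so the
    # final rename stays on one filesystem and os.replace is atomic.
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(prefix=f".{path.name}.", suffix=".tmp", dir=path.parent, text=True)
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # force bytes to disk before the rename
        os.replace(tmp_path, path)  # atomically swap the temp file in
    finally:
        if tmp_path.exists():  # only true if the replace never happened
            tmp_path.unlink()


target = Path(tempfile.mkdtemp()) / "records" / "rec_001.json"
write_text_atomic(target, json.dumps({"ok": True}))
print(target.read_text(encoding="utf-8"))
```

Readers therefore only ever see either the old complete file or the new complete file, which is what lets the store get away with plain JSON files instead of a database.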
@ -0,0 +1,17 @@
import shutil
import subprocess


def test_groundrecall_console_script_help() -> None:
    executable = shutil.which("groundrecall")
    assert executable is not None

    result = subprocess.run(
        [executable, "--help"],
        capture_output=True,
        text=True,
        check=False,
    )

    assert result.returncode == 0
    assert "GroundRecall command-line tools" in result.stdout
@ -0,0 +1,134 @@
from __future__ import annotations

import json
from pathlib import Path

from groundrecall.assistant_export import export_assistant_bundle
from groundrecall.assistants.base import get_assistant_adapter, list_assistant_adapters
import groundrecall.assistants.codex  # noqa: F401
import groundrecall.assistants.claude_code  # noqa: F401
from groundrecall.models import (
    ArtifactRecord,
    ClaimRecord,
    ConceptRecord,
    ObservationRecord,
    ProvenanceRecord,
    RelationRecord,
)
from groundrecall.query import build_query_bundle_for_concept
from groundrecall.store import GroundRecallStore


def _seed_store(store: GroundRecallStore) -> None:
    store.save_artifact(
        ArtifactRecord(
            artifact_id="ia_001",
            artifact_kind="compiled_page",
            title="Channel Capacity",
            path="wiki/channel-capacity.md",
            current_status="reviewed",
        )
    )
    store.save_observation(
        ObservationRecord(
            observation_id="obs_001",
            artifact_id="ia_001",
            role="claim",
            text="Reliable communication rate is bounded by channel capacity.",
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="grounded",
            ),
            current_status="reviewed",
        )
    )
    store.save_concept(
        ConceptRecord(
            concept_id="concept::channel-capacity",
            title="Channel Capacity",
            description="Reliable communication limit.",
            source_artifact_ids=["ia_001"],
            current_status="promoted",
        )
    )
    store.save_concept(
        ConceptRecord(
            concept_id="concept::shannon-entropy",
            title="Shannon Entropy",
            description="Average uncertainty.",
            current_status="promoted",
        )
    )
    store.save_claim(
        ClaimRecord(
            claim_id="clm_001",
            claim_text="Channel capacity bounds reliable communication rate.",
            concept_ids=["concept::channel-capacity"],
            source_observation_ids=["obs_001"],
            confidence_hint=0.8,
            review_confidence=0.9,
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="grounded",
            ),
            current_status="promoted",
        )
    )
    store.save_relation(
        RelationRecord(
            relation_id="rel_001",
            source_id="concept::channel-capacity",
            target_id="concept::shannon-entropy",
            relation_type="references",
            current_status="promoted",
        )
    )


def test_assistant_adapter_registry_lists_known_adapters() -> None:
    assert "codex" in list_assistant_adapters()
    assert "claude_code" in list_assistant_adapters()


def test_codex_adapter_exports_skill_and_json_bundle(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)
    manifest = export_assistant_bundle(store.base_dir, "codex", tmp_path / "codex", concept_refs=["channel-capacity"])

    assert (tmp_path / "codex" / "SKILL.md").exists()
    assert (tmp_path / "codex" / "codex_bundle.json").exists()
    assert (tmp_path / "codex" / "assistant_export_manifest.json").exists()
    assert manifest["assistant"] == "codex"


def test_claude_code_adapter_exports_memory_and_json_bundle(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)
    manifest = export_assistant_bundle(store.base_dir, "claude_code", tmp_path / "claude", concept_refs=["channel-capacity"])

    assert (tmp_path / "claude" / "CLAUDE.md").exists()
    assert (tmp_path / "claude" / "claude_code_bundle.json").exists()
    assert manifest["assistant"] == "claude_code"


def test_adapter_contexts_are_derived_from_assistant_neutral_query_bundles(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)
    query_bundle = build_query_bundle_for_concept(store.base_dir, "channel-capacity")
    assert query_bundle is not None

    codex = get_assistant_adapter("codex")
    claude = get_assistant_adapter("claude_code")
    codex_context = codex.build_context(query_bundle)
    claude_context = claude.build_context(query_bundle)

    assert codex_context["concept"]["concept_id"] == "concept::channel-capacity"
    assert claude_context["concept"]["concept_id"] == "concept::channel-capacity"
    assert codex_context["assistant"] == "codex"
    assert claude_context["assistant"] == "claude_code"
    assert "relevant_claims" in codex_context
    assert "claims" in claude_context
@ -0,0 +1,136 @@
from __future__ import annotations

import json
from pathlib import Path

from groundrecall.export import export_canonical_bundle, export_query_bundle
from groundrecall.models import (
    ArtifactRecord,
    ClaimRecord,
    ConceptRecord,
    ObservationRecord,
    ProvenanceRecord,
    RelationRecord,
    SourceRecord,
)
from groundrecall.store import GroundRecallStore


def _read_jsonl(path: Path) -> list[dict]:
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]


def _seed_store(store: GroundRecallStore) -> None:
    store.save_source(SourceRecord(source_id="src_001", title="Source", current_status="promoted"))
    store.save_artifact(
        ArtifactRecord(
            artifact_id="ia_001",
            artifact_kind="compiled_page",
            title="Channel Capacity",
            path="wiki/channel-capacity.md",
            current_status="reviewed",
        )
    )
    store.save_observation(
        ObservationRecord(
            observation_id="obs_001",
            artifact_id="ia_001",
            role="claim",
            text="Reliable communication rate is bounded by channel capacity.",
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="grounded",
            ),
            current_status="reviewed",
        )
    )
    store.save_concept(
        ConceptRecord(
            concept_id="concept::channel-capacity",
            title="Channel Capacity",
            description="Reliable communication limit.",
            source_artifact_ids=["ia_001"],
            current_status="promoted",
        )
    )
    store.save_concept(
        ConceptRecord(
            concept_id="concept::shannon-entropy",
            title="Shannon Entropy",
            description="Average uncertainty.",
            current_status="promoted",
        )
    )
    store.save_claim(
        ClaimRecord(
            claim_id="clm_001",
            claim_text="Channel capacity bounds reliable communication rate.",
            concept_ids=["concept::channel-capacity"],
            source_observation_ids=["obs_001"],
            confidence_hint=0.8,
            review_confidence=0.9,
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="grounded",
            ),
            current_status="promoted",
        )
    )
    store.save_relation(
        RelationRecord(
            relation_id="rel_001",
            source_id="concept::channel-capacity",
            target_id="concept::shannon-entropy",
            relation_type="references",
            current_status="promoted",
        )
    )


def test_export_canonical_bundle_writes_expected_files(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)

    out_dir = tmp_path / "exports"
    payload = export_canonical_bundle(
        store_dir=store.base_dir,
        out_dir=out_dir,
        concept_refs=["channel-capacity"],
        snapshot_id="snap_export_001",
    )

    assert (out_dir / "groundrecall_snapshot.json").exists()
    assert (out_dir / "claims.jsonl").exists()
    assert (out_dir / "concepts.jsonl").exists()
    assert (out_dir / "relations.jsonl").exists()
    assert (out_dir / "provenance_manifest.json").exists()
    assert (out_dir / "export_manifest.json").exists()
    assert (out_dir / "query_bundle__channel-capacity.json").exists()

    snapshot = json.loads((out_dir / "groundrecall_snapshot.json").read_text(encoding="utf-8"))
    manifest = json.loads((out_dir / "export_manifest.json").read_text(encoding="utf-8"))
    claims = _read_jsonl(out_dir / "claims.jsonl")
    assert snapshot["snapshot_id"] == "snap_export_001"
    assert manifest["export_kind"] == "canonical"
    assert len(manifest["query_bundles"]) == 1
    assert claims[0]["claim_id"] == "clm_001"
    assert payload["query_bundles"]


def test_export_query_bundle_is_assistant_neutral(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)

    out_path = tmp_path / "bundle.json"
    payload = export_query_bundle(store.base_dir, "channel capacity", out_path)
    assert out_path.exists()
    assert payload["bundle_kind"] == "groundrecall_query_bundle"
    forbidden = {"assistant", "codex", "claude", "prompt_text"}
    assert set(payload).isdisjoint(forbidden)
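The export and ingest tests both read the store's line-delimited JSON exports with the same tiny `_read_jsonl` helper. A standalone round-trip sketch (hypothetical file name and records) shows the format it expects: one JSON object per line, with an empty file mapping to an empty list:

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    # One JSON object per line; a blank file yields no records.
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]


path = Path(tempfile.mkdtemp()) / "claims.jsonl"
records = [{"claim_id": "clm_001"}, {"claim_id": "clm_002"}]
# Write each record on its own line, with a trailing newline.
path.write_text("\n".join(json.dumps(item) for item in records) + "\n", encoding="utf-8")
print(read_jsonl(path))  # [{'claim_id': 'clm_001'}, {'claim_id': 'clm_002'}]
```

Stripping before the emptiness check is what makes a file containing only a trailing newline behave the same as a truly empty one.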
@ -0,0 +1,161 @@
from __future__ import annotations

import json
from pathlib import Path

from groundrecall.ingest import run_groundrecall_import
from groundrecall.lint import lint_import_directory


def _read_jsonl(path: Path) -> list[dict]:
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    return [json.loads(line) for line in text.splitlines()]


def test_groundrecall_import_emits_normalized_artifacts(tmp_path: Path) -> None:
    root = tmp_path / "llmwiki"
    (root / "wiki").mkdir(parents=True)
    (root / "raw").mkdir()
    (root / "logs").mkdir()

    (root / "wiki" / "channel-capacity.md").write_text(
        "# Channel Capacity\n\n"
        "- Reliable rate upper bound for a noisy channel.\n\n"
        "See also [[Shannon Entropy]].\n",
        encoding="utf-8",
    )
    (root / "raw" / "notes.md").write_text(
        "Speculation: Capacity may depend on constraints.\n",
        encoding="utf-8",
    )
    (root / "logs" / "session.log").write_text(
        "Learner asked about entropy and communication limits.\n",
        encoding="utf-8",
    )

    result = run_groundrecall_import(root, mode="quick", import_id="import-test")

    assert result.out_dir == root / "imports" / "import-test"
    manifest = json.loads((result.out_dir / "manifest.json").read_text(encoding="utf-8"))
    assert manifest["source_repo_kind"] == "llmwiki"
    assert manifest["artifact_count"] == 3
    assert manifest["claim_count"] >= 1

    artifacts = _read_jsonl(result.out_dir / "artifacts.jsonl")
    assert {item["artifact_kind"] for item in artifacts} == {"compiled_page", "raw_note", "session_log"}

    claims = _read_jsonl(result.out_dir / "claims.jsonl")
    assert any("Reliable rate upper bound" in item["claim_text"] for item in claims)

    concepts = _read_jsonl(result.out_dir / "concepts.jsonl")
    concept_ids = {item["concept_id"] for item in concepts}
    assert "concept::channel-capacity" in concept_ids
    assert "concept::shannon-entropy" in concept_ids

    relations = _read_jsonl(result.out_dir / "relations.jsonl")
    assert any(item["target_id"] == "concept::shannon-entropy" for item in relations)

    lint_payload = json.loads((result.out_dir / "lint_findings.json").read_text(encoding="utf-8"))
    assert "summary" in lint_payload
    assert lint_payload["summary"]["warning_count"] >= 0

    review_queue = json.loads((result.out_dir / "review_queue.json").read_text(encoding="utf-8"))
    assert review_queue["queue_length"] >= 1
    assert any(item["candidate_type"] == "claim" for item in review_queue["items"])
    review_session = json.loads((result.out_dir / "review_session.json").read_text(encoding="utf-8"))
    assert review_session["reviewer"] == "GroundRecall Import"
    assert review_session["draft_pack"]["pack"]["source_import_id"] == "import-test"
    assert any(item["concept_id"] == "channel-capacity" for item in review_session["draft_pack"]["concepts"])
    review_data = json.loads((result.out_dir / "review_data.json").read_text(encoding="utf-8"))
    assert review_data["reviewer"] == "GroundRecall Import"
    assert "field_specs" in review_data
    assert any(item["field"] == "status" for item in review_data["field_specs"])
    assert "review_guidance" in review_data
    assert "concept_reviews" in review_data
    assert "citations" in review_data
    assert "citation_reviews" in review_data


def test_groundrecall_import_parses_explicit_claim_relations(tmp_path: Path) -> None:
    root = tmp_path / "llmwiki"
    (root / "wiki").mkdir(parents=True)
    (root / "wiki" / "notes.md").write_text(
        "# Notes\n\n"
        "- [claim_id: base] Channel capacity bounds reliable communication rate.\n"
        "- [claim_id: revised] [supersedes: base] Channel capacity bounds reliable communication rate for a specified channel model.\n"
        "- [claim_id: dissent] [contradicts: revised] Channel capacity has no stable interpretation.\n",
        encoding="utf-8",
    )

    result = run_groundrecall_import(root, mode="quick", import_id="relations-test")
    claims = _read_jsonl(result.out_dir / "claims.jsonl")
    by_id = {item["claim_id"]: item for item in claims}

    assert "clm_base" in by_id
    assert by_id["clm_revised"]["supersedes_claim_ids"] == ["clm_base"]
    assert by_id["clm_dissent"]["contradicts_claim_ids"] == ["clm_revised"]

    lint_payload = json.loads((result.out_dir / "lint_findings.json").read_text(encoding="utf-8"))
|
||||||
|
codes = {item["code"] for item in lint_payload["findings"]}
|
||||||
|
assert "unresolved_supersession_ref" not in codes
|
||||||
|
assert "unresolved_contradiction_ref" not in codes
|
||||||
|
|
||||||
|
|
||||||
|
def test_groundrecall_lint_flags_orphan_concepts_and_missing_targets(tmp_path: Path) -> None:
|
||||||
|
root = tmp_path / "llmwiki"
|
||||||
|
(root / "wiki").mkdir(parents=True)
|
||||||
|
(root / "wiki" / "solo.md").write_text(
|
||||||
|
"# Solo Concept\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
(root / "wiki" / "broken.md").write_text(
|
||||||
|
"# Broken\n\nSee also [[Missing Concept]].\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
result = run_groundrecall_import(root, mode="quick", import_id="lint-test")
|
||||||
|
lint_payload = json.loads((result.out_dir / "lint_findings.json").read_text(encoding="utf-8"))
|
||||||
|
codes = {item["code"] for item in lint_payload["findings"]}
|
||||||
|
assert "orphan_concept" in codes
|
||||||
|
|
||||||
|
|
||||||
|
def test_groundrecall_lint_detects_relation_missing_target(tmp_path: Path) -> None:
|
||||||
|
import_dir = tmp_path / "imports" / "broken-import"
|
||||||
|
import_dir.mkdir(parents=True)
|
||||||
|
(import_dir / "manifest.json").write_text(
|
||||||
|
json.dumps({"import_id": "broken-import", "import_mode": "quick"}),
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
(import_dir / "artifacts.jsonl").write_text("", encoding="utf-8")
|
||||||
|
(import_dir / "observations.jsonl").write_text("", encoding="utf-8")
|
||||||
|
(import_dir / "claims.jsonl").write_text("", encoding="utf-8")
|
||||||
|
(import_dir / "concepts.jsonl").write_text(
|
||||||
|
json.dumps(
|
||||||
|
{
|
||||||
|
"concept_id": "concept::existing",
|
||||||
|
"title": "Existing",
|
||||||
|
"current_status": "triaged",
|
||||||
|
}
|
||||||
|
)
|
||||||
|
+ "\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
(import_dir / "relations.jsonl").write_text(
|
||||||
|
json.dumps(
|
||||||
|
{
|
||||||
|
"relation_id": "rel_1",
|
||||||
|
"source_id": "concept::existing",
|
||||||
|
"target_id": "concept::missing",
|
||||||
|
"relation_type": "references",
|
||||||
|
"current_status": "draft",
|
||||||
|
}
|
||||||
|
)
|
||||||
|
+ "\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
payload = lint_import_directory(import_dir)
|
||||||
|
codes = {item["code"] for item in payload["findings"]}
|
||||||
|
assert "relation_missing_target" in codes
|
||||||
|
|
@@ -0,0 +1,70 @@
import sys
from pathlib import Path

from groundrecall.cli import main as groundrecall_cli_main
from groundrecall.export import export_canonical_bundle
from groundrecall.ingest import run_groundrecall_import
from groundrecall.inspect import inspect_store
from groundrecall.lint import lint_import_directory
from groundrecall.models import ClaimRecord
from groundrecall.promotion import promote_import_to_store
from groundrecall.query import query_concept
from groundrecall.store import GroundRecallStore


def _build_llmwiki_fixture(root: Path) -> Path:
    (root / "wiki").mkdir(parents=True)
    (root / "raw").mkdir()
    (root / "wiki" / "channel-capacity.md").write_text(
        "# Channel Capacity\n\n"
        "- Reliable rate upper bound for a noisy channel.\n\n"
        "See also [[Shannon Entropy]].\n",
        encoding="utf-8",
    )
    (root / "raw" / "notes.md").write_text(
        "Speculation: Capacity may depend on constraints.\n",
        encoding="utf-8",
    )
    return root


def test_groundrecall_namespace_reexports_core_functions() -> None:
    assert run_groundrecall_import.__module__ == "groundrecall.ingest"
    assert query_concept.__module__ == "groundrecall.query"
    assert export_canonical_bundle.__module__ == "groundrecall.export"
    assert lint_import_directory.__module__ == "groundrecall.lint"
    assert promote_import_to_store.__module__ == "groundrecall.promotion"
    assert GroundRecallStore.__module__ == "groundrecall.store"
    assert ClaimRecord.__module__ == "groundrecall.models"


def test_groundrecall_inspect_summarizes_store(tmp_path: Path) -> None:
    source_root = _build_llmwiki_fixture(tmp_path / "llmwiki")
    import_result = run_groundrecall_import(source_root, out_root=tmp_path / "imports", mode="quick", import_id="fixture-import")
    store_dir = tmp_path / "store"
    promote_import_to_store(import_result.out_dir, store_dir)

    payload = inspect_store(store_dir, out_path=tmp_path / "inspect.json")

    assert (tmp_path / "inspect.json").exists()
    assert payload["claim_count"] >= 1
    assert payload["concept_count"] >= 1
    assert payload["snapshot_count"] >= 1


def test_groundrecall_cli_inspect_dispatches(tmp_path: Path, capsys) -> None:
    source_root = _build_llmwiki_fixture(tmp_path / "llmwiki")
    import_result = run_groundrecall_import(source_root, out_root=tmp_path / "imports", mode="quick", import_id="fixture-import")
    store_dir = tmp_path / "store"
    promote_import_to_store(import_result.out_dir, store_dir)

    original_argv = sys.argv
    try:
        sys.argv = ["groundrecall.cli", "inspect", str(store_dir)]
        groundrecall_cli_main()
    finally:
        sys.argv = original_argv

    output = capsys.readouterr().out
    assert '"claim_count"' in output
    assert '"concept_count"' in output
@@ -0,0 +1,96 @@
from __future__ import annotations

import json
from pathlib import Path

from groundrecall.ingest import run_groundrecall_import
from groundrecall.promotion import promote_import_to_store
from groundrecall.store import GroundRecallStore


def test_groundrecall_promotion_writes_canonical_objects(tmp_path: Path) -> None:
    root = tmp_path / "llmwiki"
    (root / "wiki").mkdir(parents=True)
    (root / "wiki" / "channel-capacity.md").write_text(
        "# Channel Capacity\n\n"
        "- Reliable rate upper bound for a noisy channel.\n\n"
        "See also [[Shannon Entropy]].\n",
        encoding="utf-8",
    )

    result = run_groundrecall_import(root, mode="quick", import_id="promote-test")
    review_path = result.out_dir / "review_session.json"
    review_payload = json.loads(review_path.read_text(encoding="utf-8"))
    for concept in review_payload["draft_pack"]["concepts"]:
        concept["status"] = "trusted"
    review_path.write_text(json.dumps(review_payload, indent=2), encoding="utf-8")

    store_dir = tmp_path / "groundrecall-store"
    payload = promote_import_to_store(result.out_dir, store_dir, reviewer="R")

    store = GroundRecallStore(store_dir)
    concepts = store.list_concepts()
    claims = store.list_claims()
    relations = store.list_relations()
    promotions = store.list_promotions()
    snapshots = store.list_snapshots()

    assert payload["promoted_concept_count"] >= 1
    assert payload["promoted_claim_count"] >= 1
    assert len(concepts) >= 2
    assert any(item.current_status == "promoted" for item in concepts)
    assert any(item.current_status == "promoted" for item in claims)
    assert len(relations) >= 1
    assert len(promotions) == 1
    assert promotions[0].reviewer == "R"
    assert len(snapshots) == 1
    assert snapshots[0].metadata["source_import_id"] == "promote-test"


def test_groundrecall_promotion_respects_rejected_review_status(tmp_path: Path) -> None:
    root = tmp_path / "llmwiki"
    (root / "wiki").mkdir(parents=True)
    (root / "wiki" / "solo.md").write_text(
        "# Solo Concept\n\n- A solitary claim.\n",
        encoding="utf-8",
    )

    result = run_groundrecall_import(root, mode="quick", import_id="reject-test")
    review_path = result.out_dir / "review_session.json"
    review_payload = json.loads(review_path.read_text(encoding="utf-8"))
    review_payload["draft_pack"]["concepts"][0]["status"] = "rejected"
    review_path.write_text(json.dumps(review_payload, indent=2), encoding="utf-8")

    store_dir = tmp_path / "groundrecall-store"
    promote_import_to_store(result.out_dir, store_dir, reviewer="R")

    store = GroundRecallStore(store_dir)
    assert store.list_concepts()[0].current_status == "rejected"
    assert store.list_claims()[0].current_status == "rejected"


def test_groundrecall_promotion_preserves_contradiction_and_supersession_links(tmp_path: Path) -> None:
    root = tmp_path / "llmwiki"
    (root / "wiki").mkdir(parents=True)
    (root / "wiki" / "notes.md").write_text(
        "# Notes\n\n"
        "- [claim_id: base] Channel capacity bounds reliable communication rate.\n"
        "- [claim_id: revised] [supersedes: base] Channel capacity bounds reliable communication rate for a specified channel model.\n"
        "- [claim_id: dissent] [contradicts: revised] Channel capacity has no stable interpretation.\n",
        encoding="utf-8",
    )

    result = run_groundrecall_import(root, mode="quick", import_id="graph-test")
    review_path = result.out_dir / "review_session.json"
    review_payload = json.loads(review_path.read_text(encoding="utf-8"))
    for concept in review_payload["draft_pack"]["concepts"]:
        concept["status"] = "trusted"
    review_path.write_text(json.dumps(review_payload, indent=2), encoding="utf-8")

    store_dir = tmp_path / "groundrecall-store"
    promote_import_to_store(result.out_dir, store_dir, reviewer="R")

    store = GroundRecallStore(store_dir)
    claims = {item.claim_id: item for item in store.list_claims()}
    assert claims["clm_revised"].supersedes_claim_ids == ["clm_base"]
    assert claims["clm_dissent"].contradicts_claim_ids == ["clm_revised"]
@@ -0,0 +1,190 @@
from __future__ import annotations

from pathlib import Path

from groundrecall.models import (
    ArtifactRecord,
    ClaimRecord,
    ConceptRecord,
    ObservationRecord,
    ProvenanceRecord,
    RelationRecord,
)
from groundrecall.query import (
    build_query_bundle_for_concept,
    query_concept,
    query_provenance,
    search_claims,
)
from groundrecall.store import GroundRecallStore


def _seed_store(store: GroundRecallStore) -> None:
    store.save_artifact(
        ArtifactRecord(
            artifact_id="ia_001",
            artifact_kind="compiled_page",
            title="Channel Capacity",
            path="wiki/channel-capacity.md",
            current_status="reviewed",
        )
    )
    store.save_observation(
        ObservationRecord(
            observation_id="obs_001",
            artifact_id="ia_001",
            role="claim",
            text="Reliable communication rate is bounded by channel capacity.",
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="grounded",
            ),
            current_status="reviewed",
        )
    )
    store.save_concept(
        ConceptRecord(
            concept_id="concept::channel-capacity",
            title="Channel Capacity",
            description="Reliable communication limit.",
            source_artifact_ids=["ia_001"],
            current_status="promoted",
        )
    )
    store.save_concept(
        ConceptRecord(
            concept_id="concept::shannon-entropy",
            title="Shannon Entropy",
            description="Average uncertainty.",
            current_status="promoted",
        )
    )
    store.save_claim(
        ClaimRecord(
            claim_id="clm_001",
            claim_text="Channel capacity bounds reliable communication rate.",
            concept_ids=["concept::channel-capacity"],
            source_observation_ids=["obs_001"],
            confidence_hint=0.8,
            review_confidence=0.9,
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="grounded",
            ),
            current_status="promoted",
        )
    )
    store.save_claim(
        ClaimRecord(
            claim_id="clm_002",
            claim_text="Shannon entropy can inform channel coding intuition.",
            concept_ids=["concept::shannon-entropy"],
            contradicts_claim_ids=["clm_999"],
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="partially_grounded",
            ),
            current_status="reviewed",
        )
    )
    store.save_relation(
        RelationRecord(
            relation_id="rel_001",
            source_id="concept::channel-capacity",
            target_id="concept::shannon-entropy",
            relation_type="references",
            current_status="promoted",
        )
    )


def test_query_concept_returns_neighborhood_and_support(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)

    payload = query_concept(store.base_dir, "channel-capacity")
    assert payload is not None
    assert payload["concept"]["concept_id"] == "concept::channel-capacity"
    assert len(payload["claims"]) == 1
    assert len(payload["relations"]) == 1
    assert any(item["concept_id"] == "concept::shannon-entropy" for item in payload["related_concepts"])
    assert payload["supporting_observations"][0]["origin_path"] == "wiki/channel-capacity.md"


def test_search_claims_matches_text_and_concept_titles(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)

    payload = search_claims(store.base_dir, "entropy")
    assert payload["query_type"] == "claim_search"
    assert any(match["claim"]["claim_id"] == "clm_002" for match in payload["matches"])


def test_query_provenance_filters_by_origin_path(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)

    payload = query_provenance(store.base_dir, origin_path="wiki/channel-capacity.md")
    assert len(payload["claims"]) == 2
    assert len(payload["observations"]) == 1


def test_build_query_bundle_for_concept_is_assistant_neutral(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)

    payload = build_query_bundle_for_concept(store.base_dir, "channel capacity")
    assert payload is not None
    assert payload["bundle_kind"] == "groundrecall_query_bundle"
    assert payload["concept"]["concept_id"] == "concept::channel-capacity"
    assert isinstance(payload["suggested_next_actions"], list)
    forbidden = {"assistant", "codex", "claude", "prompt_text"}
    assert set(payload).isdisjoint(forbidden)


def test_query_bundle_surfaces_contradictions_and_supersessions(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    _seed_store(store)
    store.save_claim(
        ClaimRecord(
            claim_id="clm_003",
            claim_text="Channel capacity is undefined in practice.",
            concept_ids=["concept::channel-capacity"],
            contradicts_claim_ids=["clm_001"],
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="partially_grounded",
            ),
            current_status="reviewed",
        )
    )
    store.save_claim(
        ClaimRecord(
            claim_id="clm_004",
            claim_text="Channel capacity should be interpreted relative to a specific channel model.",
            concept_ids=["concept::channel-capacity"],
            supersedes_claim_ids=["clm_001"],
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="grounded",
            ),
            current_status="reviewed",
        )
    )

    payload = build_query_bundle_for_concept(store.base_dir, "channel-capacity")
    assert payload is not None
    contradiction_ids = {item["claim_id"] for item in payload["contradictions"]}
    supersession_ids = {item["claim_id"] for item in payload["supersessions"]}
    assert "clm_003" in contradiction_ids
    assert "clm_004" in supersession_ids
@@ -0,0 +1,60 @@
from __future__ import annotations

from types import SimpleNamespace

from groundrecall.review_server import _safe_show_entry, _safe_verify_entry


class _StoreWithoutConflicts:
    def get_entry(self, citation_key: str):
        if citation_key != "baum1974generalized":
            return None
        return {"citation_key": citation_key, "title": "On two types of deviation"}

    def get_field_provenance(self, citation_key: str):
        return [{"field_name": "title", "source_label": "refs.bib"}]

    def get_entry_bibtex(self, citation_key: str):
        return "@article{baum1974generalized, title={On two types of deviation}}"


class _ApiWithPartialSupport:
    def __init__(self):
        self.store = _StoreWithoutConflicts()

    def show_entry(self, citation_key: str, **kwargs):
        raise AttributeError("get_conflicts missing in underlying store")

    def verify_bibtex(self, bibtex_text: str, *, context: str = "", limit: int = 5):
        raise RuntimeError("pybtex unavailable")

    def verify_strings(self, values: list[str], *, context: str = "", limit: int = 5):
        return {"context": context, "results": [{"values": values, "limit": limit}]}


def test_safe_show_entry_falls_back_when_citegeist_show_entry_is_incompatible() -> None:
    api = _ApiWithPartialSupport()

    payload = _safe_show_entry(api, "baum1974generalized")

    assert payload is not None
    assert payload["citation_key"] == "baum1974generalized"
    assert payload["conflicts"] == []
    assert payload["provenance"][0]["source_label"] == "refs.bib"
    assert "bibtex" in payload


def test_safe_verify_entry_falls_back_to_verify_strings() -> None:
    api = _ApiWithPartialSupport()
    entry = SimpleNamespace(
        citation_key="baum1974generalized",
        title="On two types of deviation",
        author="W. M. Baum",
        year="1974",
        raw_bibtex="@article{baum1974generalized, title={On two types of deviation}}",
    )

    payload = _safe_verify_entry(api, entry, context="pieces/intro.tex Intro", limit=5)

    assert payload["results"]
    assert payload["results"][0]["values"][0] == "baum1974generalized"
@@ -0,0 +1,86 @@
from __future__ import annotations

import json
from pathlib import Path

from groundrecall.ingest import run_groundrecall_import
from groundrecall.review_workspace import GroundRecallReviewWorkspace


def _build_citation_fixture(root: Path) -> Path:
    (root / "wiki").mkdir(parents=True)
    (root / "wiki" / "learning-theory.md").write_text(
        "# Learning Theory\n\n"
        "Matching-law style regularities can be compared with machine learning optimization.\n\n"
        "See \\\\cite{herrnstein1961matching} for the classic framing.\n",
        encoding="utf-8",
    )
    return root


def test_review_workspace_populates_and_persists_citation_reviews(tmp_path: Path) -> None:
    source_root = _build_citation_fixture(tmp_path / "llmwiki")
    import_result = run_groundrecall_import(source_root, out_root=tmp_path / "imports", mode="quick", import_id="review-fixture")

    workspace = GroundRecallReviewWorkspace(import_result.out_dir)
    payload = workspace.load_review_data()
    assert payload["citation_reviews"]
    citation_review_id = payload["citation_reviews"][0]["citation_review_id"]

    workspace.apply_updates(
        concept_updates=[
            {
                "concept_id": "learning-theory",
                "status": "trusted",
                "notes": ["Strong framing concept.", "Citation support looks plausible."],
            }
        ],
        citation_updates=[
            {
                "citation_review_id": citation_review_id,
                "status": "verified",
                "notes": ["Classic matching-law citation."],
            }
        ],
        reviewer="Unit Test Reviewer",
    )

    session = json.loads((import_result.out_dir / "review_session.json").read_text(encoding="utf-8"))
    concept = next(item for item in session["draft_pack"]["concepts"] if item["concept_id"] == "learning-theory")
    citation = next(item for item in session["citation_reviews"] if item["citation_review_id"] == citation_review_id)

    assert session["reviewer"] == "Unit Test Reviewer"
    assert concept["status"] == "trusted"
    assert citation["status"] == "verified"

    review_data = json.loads((import_result.out_dir / "review_data.json").read_text(encoding="utf-8"))
    assert any(item["citation_review_id"] == citation_review_id for item in review_data["citation_reviews"])


def test_review_workspace_resolves_citation_metadata_from_bibtex(tmp_path: Path) -> None:
    root = tmp_path / "llmwiki"
    (root / "wiki").mkdir(parents=True)
    (root / "wiki" / "matching.md").write_text(
        "# Matching\n\n"
        "The manuscript cites \\\\cite{baum1974generalized} here.\n",
        encoding="utf-8",
    )
    (root / "refs.bib").write_text(
        "@article{baum1974generalized,\n"
        " author = {W. M. Baum},\n"
        " title = {On two types of deviation from the matching law: Bias and undermatching},\n"
        " journal = {Journal of the Experimental Analysis of Behavior},\n"
        " year = {1974}\n"
        "}\n",
        encoding="utf-8",
    )

    import_result = run_groundrecall_import(root, out_root=tmp_path / "imports", mode="quick", import_id="bib-fixture")
    workspace = GroundRecallReviewWorkspace(import_result.out_dir)
    payload = workspace.load_review_data()

    entry = next(item for item in payload["citation_reviews"] if item["citation_key"] == "baum1974generalized")
    assert entry["title"] == "On two types of deviation from the matching law: Bias and undermatching"
    assert entry["source_bib_path"] == "refs.bib"
    assert entry["raw_bibtex"]
    assert payload["bibliography"]["entry_count"] >= 1
@ -0,0 +1,235 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import groundrecall.ingest as ingest_module
|
||||||
|
import groundrecall.source_adapters # noqa: F401
|
||||||
|
from groundrecall.source_adapters.base import detect_source_adapter, list_source_adapters
|
||||||
|
from groundrecall.ingest import run_groundrecall_import
|
||||||
|
|
||||||
|
|
||||||
|
def test_groundrecall_source_adapter_registry_lists_expected_adapters() -> None:
|
||||||
|
names = set(list_source_adapters())
|
||||||
|
assert "llmwiki" in names
|
||||||
|
assert "polypaper" in names
|
||||||
|
assert "markdown_notes" in names
|
||||||
|
assert "transcript" in names
|
||||||
|
assert "didactopus_pack" in names
|
||||||
|
assert "doclift_bundle" in names
|
||||||
|
|
||||||
|
|
||||||
|
def test_detect_llmwiki_adapter(tmp_path: Path) -> None:
|
||||||
|
(tmp_path / "wiki").mkdir()
|
||||||
|
adapter = detect_source_adapter(tmp_path)
|
||||||
|
assert adapter.name == "llmwiki"
|
||||||
|
assert adapter.import_intent() == "grounded_knowledge"
|
||||||
|
|
||||||
|
|
||||||
|
def test_detect_didactopus_pack_adapter(tmp_path: Path) -> None:
|
||||||
|
(tmp_path / "pack.yaml").write_text("name: p\n", encoding="utf-8")
|
||||||
|
(tmp_path / "concepts.yaml").write_text("concepts: []\n", encoding="utf-8")
|
||||||
|
adapter = detect_source_adapter(tmp_path)
|
||||||
|
assert adapter.name == "didactopus_pack"
|
||||||
|
assert adapter.import_intent() == "both"
|
||||||
|
|
||||||
|
|
||||||
|
def test_detect_doclift_bundle_adapter(tmp_path: Path) -> None:
|
||||||
|
(tmp_path / "documents").mkdir()
|
||||||
|
(tmp_path / "manifest.json").write_text('{"documents": []}\n', encoding="utf-8")
|
||||||
|
adapter = detect_source_adapter(tmp_path)
|
||||||
|
assert adapter.name == "doclift_bundle"
|
||||||
|
assert adapter.import_intent() == "both"
|
||||||
|
|
||||||
|
|
||||||
|
def test_groundrecall_import_records_adapter_and_intent(tmp_path: Path) -> None:
|
||||||
|
(tmp_path / "wiki").mkdir()
|
||||||
|
(tmp_path / "wiki" / "note.md").write_text("# Title\n\n- A note.\n", encoding="utf-8")
|
||||||
|
result = run_groundrecall_import(tmp_path, mode="quick", import_id="adapter-test")
|
||||||
|
assert result.manifest["source_adapter"] == "llmwiki"
|
||||||
|
assert result.manifest["import_intent"] == "grounded_knowledge"
|
||||||
|
|
||||||
|
|
||||||
|
def test_markdown_notes_adapter_ingests_tex_files(tmp_path: Path) -> None:
|
||||||
|
(tmp_path / "draft.tex").write_text(
|
||||||
|
"\\section{Related Work}\n\n"
|
||||||
|
"We connect behaviorism and language models.\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
adapter = detect_source_adapter(tmp_path)
|
||||||
|
assert adapter.name == "markdown_notes"
|
||||||
|
|
||||||
|
result = run_groundrecall_import(tmp_path, mode="quick", import_id="tex-test")
|
||||||
|
assert result.manifest["source_adapter"] == "markdown_notes"
|
||||||
|
assert result.manifest["artifact_count"] == 1
|
||||||
|
assert result.artifacts[0]["path"] == "draft.tex"
|
||||||
|
assert result.claims
|
||||||
|
|
||||||
|
|
||||||
|
def test_tex_import_uses_pandoc_markdown_when_available(tmp_path: Path, monkeypatch) -> None:
|
||||||
|
(tmp_path / "draft.tex").write_text(
|
||||||
|
"\\section{Ignored by fallback}\n"
|
||||||
|
"\\usepackage{amsmath}\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
monkeypatch.setattr(
|
||||||
|
ingest_module,
|
||||||
|
"_convert_tex_to_markdown",
|
||||||
|
lambda path: "# Converted Draft\n\n- Converted claim from pandoc.\n",
|
||||||
|
)
|
||||||
|
|
||||||
|
result = run_groundrecall_import(tmp_path, mode="quick", import_id="tex-pandoc-test")
|
||||||
|
claim_texts = [item["claim_text"] for item in result.claims]
|
||||||
|
concept_ids = {item["concept_id"] for item in result.concepts}
|
||||||
|
|
||||||
|
assert "Converted claim from pandoc." in claim_texts
|
||||||
|
assert "concept::converted-draft" in concept_ids
|
||||||
|
|
||||||
|
|
||||||
|
def test_detect_polypaper_adapter_and_exclude_support_files(tmp_path: Path) -> None:
    (tmp_path / "pieces").mkdir()
    (tmp_path / "figs").mkdir()
    (tmp_path / "setup").mkdir()
    (tmp_path / "main.tex").write_text(
        "\\include{pieces/discussion}\n"
        "\\include{pieces/table-results}\n"
        "\\input{figs/figure-system}\n",
        encoding="utf-8",
    )
    (tmp_path / "paper.org").write_text("* draft\n", encoding="utf-8")
    (tmp_path / "pieces" / "discussion.tex").write_text("\\section{Discussion}\n\nMore text.\n", encoding="utf-8")
    (tmp_path / "pieces" / "table-results.tex").write_text("\\begin{tabular}x\\end{tabular}\n", encoding="utf-8")
    (tmp_path / "pieces" / "unused.tex").write_text("\\section{Unused}\n\nIgnore me.\n", encoding="utf-8")
    (tmp_path / "figs" / "figure-system.tex").write_text("\\begin{figure}x\\end{figure}\n", encoding="utf-8")
    (tmp_path / "setup" / "venue-arxiv.tex").write_text("\\section{Setup}\n", encoding="utf-8")
    (tmp_path / ".pp-export-tmp.tex").write_text("\\section{Tmp}\n", encoding="utf-8")

    adapter = detect_source_adapter(tmp_path)
    assert adapter.name == "polypaper"

    result = run_groundrecall_import(tmp_path, mode="quick", import_id="polypaper-test")
    paths = {item["path"] for item in result.artifacts}
    assert "main.tex" not in paths
    assert "pieces/discussion.tex" in paths
    assert "pieces/table-results.tex" not in paths
    assert "figs/figure-system.tex" not in paths
    assert "pieces/unused.tex" not in paths
    assert "setup/venue-arxiv.tex" not in paths
    assert ".pp-export-tmp.tex" not in paths


def test_tex_import_skips_table_and_figure_markup_from_pandoc(tmp_path: Path, monkeypatch) -> None:
    (tmp_path / "draft.tex").write_text("\\section{Draft}\n", encoding="utf-8")

    monkeypatch.setattr(
        ingest_module,
        "_convert_tex_to_markdown",
        lambda path: "\n".join(
            [
                "# Draft",
                "",
                "",
                "| Col A | Col B |",
                "| --- | --- |",
                "| 1 | 2 |",
                "</div>",
                "\\begin{tabular}{ll}",
                "- Real manuscript claim.",
            ]
        ),
    )

    result = run_groundrecall_import(tmp_path, mode="quick", import_id="tex-cleanup-test")
    claim_texts = [item["claim_text"] for item in result.claims]

    assert claim_texts == ["Real manuscript claim."]


def test_didactopus_pack_import_generates_structured_concepts_and_relations(tmp_path: Path) -> None:
    (tmp_path / "pack.yaml").write_text(
        "\n".join(
            [
                "name: sample-pack",
                "display_name: Sample Pack",
                "version: 0.1.0",
                "schema_version: 0.1.0",
                "didactopus_min_version: 0.1.0",
                "didactopus_max_version: 9.9.9",
            ]
        ),
        encoding="utf-8",
    )
    (tmp_path / "concepts.yaml").write_text(
        "\n".join(
            [
                "concepts:",
                "  - id: basics",
                "    title: Basics",
                "    description: Foundational concept.",
                "    mastery_signals: [Explain the foundation.]",
                "  - id: advanced",
                "    title: Advanced",
                "    description: Builds on basics.",
                "    prerequisites: [basics]",
            ]
        ),
        encoding="utf-8",
    )
    (tmp_path / "roadmap.yaml").write_text(
        "\n".join(
            [
                "stages:",
                "  - id: stage1",
                "    title: Stage One",
                "    concepts: [basics, advanced]",
            ]
        ),
        encoding="utf-8",
    )

    result = run_groundrecall_import(tmp_path, mode="quick", import_id="pack-test")
    assert result.manifest["source_adapter"] == "didactopus_pack"
    assert result.manifest["import_intent"] == "both"
    concept_ids = {item["concept_id"] for item in result.concepts}
    assert "concept::basics" in concept_ids
    assert "concept::advanced" in concept_ids
    relation_targets = {(item["source_id"], item["target_id"], item["relation_type"]) for item in result.relations}
    assert ("concept::basics", "concept::advanced", "prerequisite") in relation_targets
    claim_ids = {item["claim_id"] for item in result.claims}
    assert "clm_pack_basics" in claim_ids
    assert "clm_stage_stage1_basics" in claim_ids


def test_doclift_bundle_import_generates_structured_concepts(tmp_path: Path) -> None:
    doc_dir = tmp_path / "documents" / "lesson-a"
    doc_dir.mkdir(parents=True)
    (tmp_path / "manifest.json").write_text(
        "\n".join(
            [
                "{",
                '  "documents": [',
                "    {",
                '      "document_id": "lesson-a",',
                '      "title": "Lecture 1. Example",',
                '      "document_kind": "lecture",',
                f'      "output_dir": "{doc_dir}",',
                f'      "markdown_path": "{doc_dir / "document.md"}",',
                f'      "figures_path": "{doc_dir / "document.figures.json"}"',
                "    }",
                "  ]",
                "}",
            ]
        ),
        encoding="utf-8",
    )
    (doc_dir / "document.md").write_text("# Lecture 1. Example\n\nBody.\n", encoding="utf-8")
    (doc_dir / "document.figures.json").write_text('{"source_path": "/tmp/source.doc"}\n', encoding="utf-8")

    result = run_groundrecall_import(tmp_path, mode="quick", import_id="doclift-test")
    assert result.manifest["source_adapter"] == "doclift_bundle"
    assert result.manifest["import_intent"] == "both"
    concept_ids = {item["concept_id"] for item in result.concepts}
    assert "concept::lesson-a" in concept_ids
    claim_ids = {item["claim_id"] for item in result.claims}
    assert "clm_doclift_1" in claim_ids
@ -0,0 +1,148 @@
from __future__ import annotations

import json
from pathlib import Path

from groundrecall.models import (
    ClaimRecord,
    ConceptRecord,
    GroundRecallSnapshot,
    PromotionRecord,
    ProvenanceRecord,
    RelationRecord,
    ReviewCandidateRecord,
    SourceRecord,
)
from groundrecall.store import GroundRecallStore


def test_groundrecall_store_round_trips_canonical_objects(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")

    source = store.save_source(
        SourceRecord(
            source_id="src_001",
            title="Channel Notes",
            source_type="markdown",
            path="wiki/channel-capacity.md",
            current_status="promoted",
        )
    )
    claim = store.save_claim(
        ClaimRecord(
            claim_id="clm_001",
            claim_text="Channel capacity bounds reliable communication rate.",
            claim_kind="definition",
            concept_ids=["concept::channel-capacity"],
            confidence_hint=0.72,
            provenance=ProvenanceRecord(
                origin_artifact_id="ia_001",
                origin_path="wiki/channel-capacity.md",
                support_kind="derived_from_page",
                grounding_status="partially_grounded",
            ),
            current_status="reviewed",
        )
    )
    concept = store.save_concept(
        ConceptRecord(
            concept_id="concept::channel-capacity",
            title="Channel Capacity",
            description="Imported concept.",
            current_status="promoted",
        )
    )
    relation = store.save_relation(
        RelationRecord(
            relation_id="rel_001",
            source_id="concept::channel-capacity",
            target_id="concept::shannon-entropy",
            relation_type="references",
            current_status="draft",
        )
    )
    review_candidate = store.save_review_candidate(
        ReviewCandidateRecord(
            review_candidate_id="rc_001",
            candidate_type="claim",
            candidate_id="clm_001",
            triage_lane="knowledge_capture",
            priority=10,
            current_status="triaged",
        )
    )
    promotion = store.save_promotion(
        PromotionRecord(
            promotion_id="pr_001",
            candidate_type="claim",
            candidate_id="clm_001",
            reviewer="R",
            promoted_object_ids=["clm_001"],
            promoted_at="2026-04-17T12:00:00Z",
        )
    )

    assert store.get_source(source.source_id) is not None
    assert store.get_claim(claim.claim_id) is not None
    assert store.get_concept(concept.concept_id) is not None
    assert store.get_relation(relation.relation_id) is not None
    assert store.get_review_candidate(review_candidate.review_candidate_id) is not None
    assert store.get_promotion(promotion.promotion_id) is not None


def test_groundrecall_store_builds_and_persists_snapshot(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")
    store.save_source(SourceRecord(source_id="src_001", title="T", current_status="promoted"))
    store.save_claim(
        ClaimRecord(
            claim_id="clm_001",
            claim_text="A grounded claim.",
            concept_ids=["concept::c1"],
            current_status="promoted",
        )
    )
    store.save_concept(ConceptRecord(concept_id="concept::c1", title="C1", current_status="promoted"))

    snapshot = store.build_snapshot(
        snapshot_id="snap_001",
        created_at="2026-04-17T12:00:00Z",
        metadata={"export_kind": "canonical"},
    )
    saved = store.save_snapshot(snapshot)

    loaded = store.get_snapshot(saved.snapshot_id)
    assert loaded is not None
    assert isinstance(loaded, GroundRecallSnapshot)
    assert loaded.metadata["export_kind"] == "canonical"
    assert len(loaded.sources) == 1
    assert len(loaded.claims) == 1
    assert len(loaded.concepts) == 1


def test_groundrecall_models_remain_assistant_neutral() -> None:
    claim_fields = set(ClaimRecord.model_fields)
    concept_fields = set(ConceptRecord.model_fields)
    snapshot_fields = set(GroundRecallSnapshot.model_fields)
    forbidden = {"assistant", "assistant_name", "codex", "claude", "skill_bundle", "prompt_text"}

    assert claim_fields.isdisjoint(forbidden)
    assert concept_fields.isdisjoint(forbidden)
    assert snapshot_fields.isdisjoint(forbidden)


def test_groundrecall_store_writes_json_atomically_without_tmp_artifacts(tmp_path: Path) -> None:
    store = GroundRecallStore(tmp_path / "groundrecall")

    claim = store.save_claim(
        ClaimRecord(
            claim_id="clm_atomic",
            claim_text="Atomic writes should leave valid JSON on disk.",
            concept_ids=["concept::atomicity"],
            current_status="reviewed",
        )
    )

    claim_path = store.claims_dir / f"{claim.claim_id}.json"
    payload = json.loads(claim_path.read_text(encoding="utf-8"))
    assert payload["claim_id"] == "clm_atomic"
    assert list(store.claims_dir.glob("*.tmp")) == []
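The last test expects `GroundRecallStore` to persist valid JSON while leaving no `*.tmp` files behind. The store's actual implementation is not part of this diff; as a hedged illustration only, the behavior that test checks is commonly achieved with a write-to-temp-then-rename pattern. `write_json_atomically` below is a hypothetical helper sketching that pattern, not the project's API:

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write JSON so readers never observe a partial or missing file.

    The payload is written to a temporary file in the target directory,
    flushed and fsynced, then moved into place with os.replace(), which
    atomically swaps the file on both POSIX and Windows.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic rename: either the old content or the new content is
        # visible, never a half-written file, and the .tmp name vanishes.
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Writing to the same directory as the destination matters: `os.replace` is only atomic within one filesystem, and keeping the temp file beside the target guarantees that.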