Enrich claim analysis and review support

This commit is contained in:
welsberr 2026-05-07 23:25:26 -04:00
parent a54082141a
commit 2f7696c115
10 changed files with 458 additions and 3 deletions

View File

@ -0,0 +1,172 @@
# Evidence-Docket Claims Analysis
GroundRecall's current import/review model is good at:
- preserving provenance
- turning observations into reviewable claims
- keeping concepts, claims, relations, and citations separate
It is still weak at a different task:
- structured adversarial or forensic analysis of an argument across multiple claim lanes
That gap became clearer in the local `evolutionnews.net` evidence-docket work.
Those dockets do not just collect claims. They classify how claims function in
an argument.
## Why this matters
In the Mason design-biology work, the useful analysis was not just:
- what claim appears in the text
- what source supports it
It also depended on:
- what role the claim plays in the overall argument
- whether the burden of proof is being shifted
- whether multiple domains are being bundled rhetorically
- whether citations merely exist or actually support the claim
- what empirical gap is being asserted versus what research program already exists
GroundRecall can already hold the raw ingredients for that kind of work, but it
does not yet model them explicitly.
## Evidence-docket structure worth borrowing
The local evidence-docket workflow has a few recurring sections that map well
onto richer GroundRecall review:
1. `claim map`
The operative argument structure, not just isolated statements.
2. `primary findings`
Higher-order judgments such as burden asymmetry, domain bundling, or model
overreach.
3. `evidence cards`
Focused support packets that connect one objection or claim to a bounded
source trail.
4. `rhetorical maneuvers flagged`
Moves such as burden shift, equivocation, present-function fallacy, or
overgeneralization from a narrow model.
5. `burden check`
What the author requires from the opposing view versus what their own view
must supply.
6. `citation and source audit`
Whether named sources are real, relevant, overextended, or contradicted by
the way they are being used.
7. `research program`
What empirical work would actually reduce the leverage of the objection.
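The seven docket sections above can be pictured as one container type. This is a hypothetical sketch only, to show how the sections could map onto structured records; the class and field names are proposals, not shipped code.

```python
from dataclasses import dataclass, field


# Hypothetical container mirroring the seven docket sections above.
@dataclass
class EvidenceDocket:
    claim_map: list[str] = field(default_factory=list)          # operative argument structure
    primary_findings: list[str] = field(default_factory=list)   # higher-order judgments
    evidence_cards: list[dict] = field(default_factory=list)    # bounded support packets
    rhetorical_maneuvers_flagged: list[str] = field(default_factory=list)
    burden_check: str = ""                                      # asymmetry description
    citation_and_source_audit: list[dict] = field(default_factory=list)
    research_program: list[str] = field(default_factory=list)   # leverage-reducing work
```

Each section stays independently reviewable, which matches how the local dockets are actually read.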
## Implications for GroundRecall
GroundRecall should stay centered on grounded records, but claim analysis can be
enriched in a way that matches the docket workflow.
### 1. Expand claim kinds
Current `claim_kind` values are mostly low-level:
- `statement`
- `summary`
- adapter-specific kinds such as `mastery_signal`
Useful additions:
- `argument_step`
- `burden_check`
- `rhetorical_move`
- `citation_audit`
- `research_gap`
- `research_program`
- `counterexample`
These do not replace ordinary claims. They make higher-order analytical claims
first-class instead of burying them in reviewer notes.
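A minimal sketch of how the expanded vocabulary could sit beside the current low-level kinds; the set names and helper below are proposals, not existing code.

```python
# Hypothetical vocabulary split: existing low-level kinds versus the
# proposed higher-order analytical kinds from the list above.
CORE_CLAIM_KINDS = {"statement", "summary"}
ANALYTICAL_CLAIM_KINDS = {
    "argument_step",
    "burden_check",
    "rhetorical_move",
    "citation_audit",
    "research_gap",
    "research_program",
    "counterexample",
}


def is_analytical_claim_kind(kind: str) -> bool:
    """True for higher-order analytical kinds, False for ordinary claims."""
    return kind in ANALYTICAL_CLAIM_KINDS
```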
### 2. Add argument-lane metadata
Claims should be able to carry lightweight analytical tags such as:
- `argument_role`: premise, inference, objection, counterargument, scope note
- `analysis_lane`: empirical, rhetorical, citation, burden, research_program
- `risk_flags`: overstatement, bundling, equivocation, unsupported_generalization
This can start as claim metadata without requiring a schema break.
### 3. Model evidence cards explicitly
An evidence card is more than one claim. It is a bounded support packet that
ties together:
- one focal issue
- one or more claims
- supporting observations
- cited sources
- reviewer verdict
GroundRecall does not need a new top-level store object immediately. A first
step could be review-export grouping by:
- lane
- concept
- citation cluster
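The grouping step could be sketched as follows. This is a hypothetical illustration: the claim-row shape (a `metadata` dict plus `concept_ids`) is an assumption based on the fields discussed in this note.

```python
from collections import defaultdict


def group_into_evidence_cards(claims: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Bundle claim rows into evidence-card-like packets keyed by
    (analysis lane, concept), as a review-export grouping pass."""
    cards: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for claim in claims:
        lane = claim.get("metadata", {}).get("analysis_lane", "empirical")
        # A claim attached to several concepts appears in several packets.
        for concept_id in claim.get("concept_ids") or ["(unassigned)"]:
            cards[(lane, concept_id)].append(claim)
    return dict(cards)
```

Reviewers would then open one `(lane, concept)` packet at a time instead of paging through isolated claim rows.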
## Bibliography and abstracts
The bibliography expansion work showed that abstracts are often the fastest way
to estimate:
- whether a source is in the right domain
- whether it actually addresses the asserted mechanism or phenomenon
- whether a citation is likely to support or overstate a claim
That suggests two concrete upgrades for GroundRecall review:
1. show more than citation-key existence
Review should expose whether resolved bibliography entries have abstracts,
DOI coverage, and enough metadata depth for meaningful support judgment.
2. use abstracts as first-pass support context
Abstract snippets should be available when a reviewer is deciding whether a
cited work materially supports a claim or merely sounds adjacent.
Important boundary:
- abstracts are triage evidence, not final adjudication
- direct source reading still matters for strong or controversial claims
## Recommended implementation order
1. Enrich bibliography summary and artifact citation summaries.
Surface abstract-bearing coverage, representative titles, DOI coverage, and
short abstract snippets in review payloads.
2. Add analytical claim metadata.
Start with optional metadata fields in claim rows and review exports.
3. Add review lanes mirroring the evidence-docket workflow.
Separate empirical support review from rhetorical and burden-check review.
4. Add evidence-card grouping in review UI/export.
Let reviewers inspect a bounded packet instead of isolated claim rows.
5. Add a bibliography-assisted claim-support pass.
Reuse CiteGeist support/verification capabilities so GroundRecall can move
from “citation exists” toward “citation probably supports this claim because…”
## Practical near-term change
The smallest worthwhile next step is:
- improve GroundRecall review payloads so bibliography strength is visible
- especially abstract-bearing resolved entries and representative titles
That does not solve richer claim analysis by itself, but it gives reviewers a
better support surface and aligns GroundRecall with the successful parts of the
evidence-docket workflow.

View File

@ -122,10 +122,30 @@ def materialize_citegeist_store(import_dir: str | Path, source_root: str | Path)
def bibliography_summary_payload(source_root: str | Path) -> dict[str, Any]:
    index = load_bibliography_index(source_root)
    source_files = discover_bib_files(source_root)
    abstract_entry_count = 0
    doi_entry_count = 0
    years: list[int] = []
    representative_titles: list[str] = []
    for payload in index.values():
        fields = payload.get("fields", {})
        if str(fields.get("abstract", "")).strip():
            abstract_entry_count += 1
        if str(fields.get("doi", "")).strip():
            doi_entry_count += 1
        year_text = str(fields.get("year", "")).strip()
        if year_text.isdigit():
            years.append(int(year_text))
        title = str(fields.get("title", "")).strip()
        if title and len(representative_titles) < 5:
            representative_titles.append(title)
    return {
        "enabled": bool(index),
        "entry_count": len(index),
        "source_files": [str(path.relative_to(Path(source_root))) for path in source_files],
        "abstract_entry_count": abstract_entry_count,
        "doi_entry_count": doi_entry_count,
        "year_range": [min(years), max(years)] if years else [],
        "representative_titles": representative_titles,
    }
@ -151,6 +171,80 @@ def serialize_citegeist_entry_payload(payload: dict[str, Any] | None) -> dict[st
    return json.loads(json.dumps(result))


_SUPPORT_TOKEN_RE = re.compile(r"[a-z0-9]{4,}")


def build_local_claim_support_suggestions(
    bibliography_index: dict[str, dict[str, Any]],
    claim_text: str,
    *,
    context: str = "",
    limit: int = 3,
    exclude_keys: set[str] | None = None,
) -> list[dict[str, Any]]:
    claim_tokens = _support_tokens(claim_text)
    context_tokens = _support_tokens(context)
    combined_tokens = claim_tokens | context_tokens
    exclude = {item for item in (exclude_keys or set()) if item}
    scored: list[tuple[float, dict[str, Any]]] = []
    for citation_key, payload in bibliography_index.items():
        if citation_key in exclude:
            continue
        fields = payload.get("fields", {})
        title = str(fields.get("title", "")).strip()
        abstract = str(fields.get("abstract", "")).strip()
        venue = str(fields.get("journal", "") or fields.get("booktitle", "") or fields.get("publisher", "")).strip()
        doi = str(fields.get("doi", "")).strip()
        haystack_tokens = _support_tokens(" ".join(part for part in (title, abstract, venue) if part))
        if not haystack_tokens:
            continue
        overlap = combined_tokens & haystack_tokens
        if not overlap:
            continue
        title_tokens = _support_tokens(title)
        abstract_tokens = _support_tokens(abstract)
        title_overlap = claim_tokens & title_tokens
        abstract_overlap = claim_tokens & abstract_tokens
        score = (len(title_overlap) * 2.0) + len(abstract_overlap) + (0.5 if abstract else 0.0) + (0.25 if doi else 0.0)
        scored.append(
            (
                score,
                {
                    "citation_key": citation_key,
                    "title": title,
                    "year": str(fields.get("year", "")).strip(),
                    "authors": str(fields.get("author", "")).strip(),
                    "venue": venue,
                    "doi": doi,
                    "score": round(score, 3),
                    "reason": _support_reason(title_overlap, abstract_overlap, abstract=bool(abstract), context_overlap=bool(context_tokens & haystack_tokens)),
                    "abstract_snippet": abstract.replace("\n", " ")[:280] if abstract else "",
                },
            )
        )
    scored.sort(key=lambda item: (-item[0], item[1]["year"], item[1]["title"]))
    return [item[1] for item in scored[: max(0, limit)]]


def _support_tokens(text: str) -> set[str]:
    return {match.group(0) for match in _SUPPORT_TOKEN_RE.finditer(text.lower())}


def _support_reason(title_overlap: set[str], abstract_overlap: set[str], *, abstract: bool, context_overlap: bool) -> str:
    reasons: list[str] = []
    if title_overlap:
        reasons.append("title overlap")
    if abstract_overlap:
        reasons.append("abstract overlap")
    if context_overlap:
        reasons.append("context overlap")
    if abstract and not abstract_overlap:
        reasons.append("abstract available")
    return ", ".join(reasons) if reasons else "local bibliography match"
def _parse_bib_entries(text: str, *, symbols: dict[str, Any] | None) -> list[Any]:
    if symbols is not None:
        try:

View File

@ -101,11 +101,25 @@ def build_claim_record(
    index: int,
    fragment_ids: list[str] | None = None,
) -> dict[str, Any]:
    claim_kind = "statement" if observation_record["role"] == "claim" else "summary"
    argument_role = "premise" if claim_kind == "statement" else "context"
    risk_flags: list[str] = []
    if observation.contradict_keys:
        argument_role = "counterargument"
        risk_flags.append("contradiction_linked")
    if observation.supersede_keys:
        argument_role = "revision"
        risk_flags.append("supersession_linked")
    return {
        "claim_id": _claim_id_for_observation(observation_record, observation, index),
        "import_id": context.import_id,
        "claim_text": observation_record["text"],
        "claim_kind": claim_kind,
        "metadata": {
            "analysis_lane": "empirical",
            "argument_role": argument_role,
            "risk_flags": risk_flags,
        },
        "source_observation_ids": [observation_record["observation_id"]],
        "supporting_fragment_ids": list(fragment_ids or []),
        "concept_ids": [f"concept::{concept_id}" for concept_id in concept_ids],

View File

@ -69,6 +69,7 @@ class ClaimRecord(BaseModel):
    claim_id: str
    claim_text: str
    claim_kind: str = "statement"
    metadata: dict = Field(default_factory=dict)
    source_observation_ids: list[str] = Field(default_factory=list)
    supporting_fragment_ids: list[str] = Field(default_factory=list)
    concept_ids: list[str] = Field(default_factory=list)

View File

@ -209,6 +209,7 @@ def promote_import_to_store(
        claim_id=claim["claim_id"],
        claim_text=claim.get("claim_text", ""),
        claim_kind=claim.get("claim_kind", "statement"),
        metadata=dict(claim.get("metadata", {})),
        source_observation_ids=list(claim.get("source_observation_ids", [])),
        supporting_fragment_ids=list(claim.get("supporting_fragment_ids", [])),
        concept_ids=concept_ids,

View File

@ -126,12 +126,21 @@ function renderConceptPanel(concept) {
  const review = concept.review || {};
  const statusSpec = (state.reviewData.field_specs || []).find((item) => item.field === "status");
  const guidance = (state.reviewData.review_guidance?.priorities || []).map((item) => `<li>${escapeHtml(item)}</li>`).join("");
  const laneGuidance = (state.reviewData.review_guidance?.analysis_lanes || []).map((item) => `<li>${escapeHtml(item)}</li>`).join("");
  const laneSummary = Object.entries(review.analysis_lanes || {}).map(([lane, count]) => `
    <span class="chip">${escapeHtml(lane)} · ${escapeHtml(count)}</span>
  `).join("");
  const claims = (review.top_claims || []).map((claim) => `
    <article class="claim-card">
      <div class="claim-head">
        <strong>${escapeHtml(claim.claim_kind || "claim")}</strong>
        <span class="chip">${escapeHtml(claim.grounding_status || "unknown")}</span>
      </div>
      <div class="meta-row">
        <span class="chip">${escapeHtml(claim.analysis_lane || "empirical")}</span>
        <span class="chip">${escapeHtml(claim.argument_role || "premise")}</span>
        ${(claim.risk_flags || []).map((flag) => `<span class="chip chip-warn">${escapeHtml(flag)}</span>`).join("")}
      </div>
      <p>${escapeHtml(claim.claim_text || "")}</p>
      <div class="tiny">Artifacts: ${escapeHtml((claim.artifact_paths || []).join(", ") || "none")}</div>
      ${(claim.supporting_observations || []).slice(0, 2).map((obs) => `
@ -140,6 +149,19 @@ function renderConceptPanel(concept) {
          <div>${escapeHtml(obs.text || "")}</div>
        </div>
      `).join("")}
      ${(claim.support_suggestions || []).length ? `
        <div class="suggestion-block">
          <div class="tiny"><strong>Local support suggestions</strong></div>
          ${(claim.support_suggestions || []).map((item) => `
            <div class="support-block">
              <div><strong>${escapeHtml(item.title || item.citation_key || "candidate source")}</strong></div>
              <div class="tiny">${escapeHtml(item.citation_key || "")}${item.year ? ` · ${escapeHtml(item.year)}` : ""}${item.venue ? ` · ${escapeHtml(item.venue)}` : ""}</div>
              <div class="tiny">${escapeHtml(item.reason || "")}${item.score !== undefined ? ` · score ${escapeHtml(item.score)}` : ""}</div>
              ${item.abstract_snippet ? `<div>${escapeHtml(item.abstract_snippet)}</div>` : ""}
            </div>
          `).join("")}
        </div>
      ` : ""}
    </article>
  `).join("");
@ -179,6 +201,11 @@ function renderConceptPanel(concept) {
      <h3>Reviewer guidance</h3>
      <ul>${guidance}</ul>
    </section>
    <section class="subpanel">
      <h3>Analysis lanes</h3>
      <div class="chip-row">${laneSummary || "<span class=\"muted\">No analytical lane summary available.</span>"}</div>
      <ul>${laneGuidance}</ul>
    </section>
    <section class="subpanel">
      <h3>Representative claims</h3>
      <div class="stack">${claims || "<div class=\"muted\">No representative claims available.</div>"}</div>

View File

@ -194,6 +194,12 @@ textarea {
  border: 1px solid var(--line);
}

.suggestion-block {
  margin-top: 12px;
  display: grid;
  gap: 10px;
}

.claim-head {
  display: flex;
  justify-content: space-between;
@ -201,6 +207,14 @@ textarea {
  margin-bottom: 8px;
}

.meta-row,
.chip-row {
  display: flex;
  flex-wrap: wrap;
  gap: 8px;
  margin-bottom: 8px;
}

.chip,
.pill {
  display: inline-flex;
@ -220,6 +234,10 @@ textarea {
  color: var(--warn);
}

.chip-warn {
  color: var(--warn);
}

ul {
  margin: 0;
  padding-left: 20px;

View File

@ -6,7 +6,12 @@ import re
import sys
from collections import defaultdict
from typing import Any, Callable

from .citation_support import (
    bibliography_summary_payload,
    build_local_claim_support_suggestions,
    load_bibliography_index,
    serialize_bib_entry,
)
from .review_schema import CitationReviewEntry, ReviewSession


def export_review_state_json(session: ReviewSession, path: str | Path) -> None:
@ -220,15 +225,57 @@ def _artifact_citation_payloads(
            "extracted_reference_count": len(extracted_refs),
            "citegeist_backends": backends,
        }
        resolved_entries = [entry for entry in payload["resolved_entries"] if entry]
        abstract_entries = [
            entry
            for entry in resolved_entries
            if str(entry.get("fields", {}).get("abstract", "")).strip()
        ]
        artifact_payloads.append(payload)
        summaries[artifact["artifact_id"]] = {
            "citation_key_count": len(citation_keys),
            "extracted_reference_count": len(extracted_refs),
            "resolved_entry_count": len(resolved_entries),
            "abstract_entry_count": len(abstract_entries),
            "title_samples": [
                str(entry.get("fields", {}).get("title", "")).strip()
                for entry in resolved_entries[:3]
                if str(entry.get("fields", {}).get("title", "")).strip()
            ],
            "abstract_snippets": [
                str(entry.get("fields", {}).get("abstract", "")).strip().replace("\n", " ")[:280]
                for entry in abstract_entries[:2]
            ],
            "has_citation_support": bool(citation_keys or extracted_refs),
        }
    return artifact_payloads, summaries


def _claim_analysis_metadata(claim: dict[str, Any]) -> dict[str, Any]:
    metadata = dict(claim.get("metadata", {}))
    lane = str(metadata.get("analysis_lane", "")).strip() or "empirical"
    argument_role = str(metadata.get("argument_role", "")).strip()
    if not argument_role:
        if claim.get("contradicts_claim_ids"):
            argument_role = "counterargument"
        elif claim.get("supersedes_claim_ids"):
            argument_role = "revision"
        elif claim.get("claim_kind") == "summary":
            argument_role = "context"
        else:
            argument_role = "premise"
    risk_flags = [str(item) for item in metadata.get("risk_flags", []) if str(item).strip()]
    if claim.get("contradicts_claim_ids") and "contradiction_linked" not in risk_flags:
        risk_flags.append("contradiction_linked")
    if claim.get("supersedes_claim_ids") and "supersession_linked" not in risk_flags:
        risk_flags.append("supersession_linked")
    return {
        "analysis_lane": lane,
        "argument_role": argument_role,
        "risk_flags": risk_flags,
    }


def build_citation_review_entries_from_import(import_dir: str | Path) -> list[CitationReviewEntry]:
    base = Path(import_dir)
    manifest = _read_json(base / "manifest.json")
@ -310,6 +357,7 @@ def build_citation_review_entries_from_import(import_dir: str | Path) -> list[Ci
def _build_import_review_payload(session: ReviewSession, import_dir: Path) -> dict[str, Any]:
    manifest = _read_json(import_dir / "manifest.json")
    resolved_source_root = _resolve_source_root(import_dir, manifest.get("source_root", ""))
    bibliography_index = load_bibliography_index(resolved_source_root) if resolved_source_root else {}
    lint_payload = _read_json(import_dir / "lint_findings.json")
    queue_payload = _read_json(import_dir / "review_queue.json")
    graph_payload = _read_json(import_dir / "graph_diagnostics.json")
@ -344,16 +392,37 @@ def _build_import_review_payload(session: ReviewSession, import_dir: Path) -> di
        queue_entry = queue_by_candidate_id.get(full_concept_id, {})
        claim_payloads: list[dict[str, Any]] = []
        has_citation_support = False
        lane_counts: dict[str, int] = defaultdict(int)
        for claim in concept_claims[:25]:
            supporting_observations = [observations_by_id[item] for item in claim.get("source_observation_ids", []) if item in observations_by_id]
            artifact_ids = {item["artifact_id"] for item in supporting_observations}
            citation_support = [artifact_citation_summary.get(artifact_id, {}) for artifact_id in artifact_ids]
            has_citation_support = has_citation_support or any(item.get("has_citation_support") for item in citation_support)
            analysis = _claim_analysis_metadata(claim)
            lane_counts[analysis["analysis_lane"]] += 1
            cited_keys = {
                key
                for artifact_id in artifact_ids
                for key in next(
                    (item.get("citation_keys", []) for item in artifact_citations if item.get("artifact_id") == artifact_id),
                    [],
                )
            }
            support_suggestions = build_local_claim_support_suggestions(
                bibliography_index,
                claim.get("claim_text", ""),
                context=concept.title,
                limit=3,
                exclude_keys=cited_keys,
            )
            claim_payloads.append(
                {
                    "claim_id": claim["claim_id"],
                    "claim_text": claim.get("claim_text", ""),
                    "claim_kind": claim.get("claim_kind", ""),
                    "analysis_lane": analysis["analysis_lane"],
                    "argument_role": analysis["argument_role"],
                    "risk_flags": analysis["risk_flags"],
                    "grounding_status": claim.get("grounding_status", "unknown"),
                    "supporting_observations": [
                        {
@ -367,6 +436,7 @@ def _build_import_review_payload(session: ReviewSession, import_dir: Path) -> di
                        for obs in supporting_observations
                    ],
                    "citation_support": citation_support,
                    "support_suggestions": support_suggestions,
                    "artifact_paths": [artifact_by_id[item]["path"] for item in artifact_ids if item in artifact_by_id],
                    "finding_messages": [item["message"] for item in findings_by_target.get(claim["claim_id"], [])],
                }
@ -390,6 +460,7 @@ def _build_import_review_payload(session: ReviewSession, import_dir: Path) -> di
            "triage_lane": str(queue_entry.get("triage_lane", "knowledge_capture")),
            "finding_codes": list(queue_entry.get("finding_codes", [])),
            "graph_codes": list(queue_entry.get("graph_codes", [])),
            "analysis_lanes": dict(sorted(lane_counts.items())),
            "top_claims": claim_payloads,
            "notes": list(concept.notes),
        }
@ -413,10 +484,18 @@ def _build_import_review_payload(session: ReviewSession, import_dir: Path) -> di
            "Downgrade or reject concepts whose claims are fragmented, duplicated, or missing meaningful support.",
            "For academic material, citation-bearing claims deserve special scrutiny for fit, contradiction, and fabrication risk.",
        ],
        "analysis_lanes": [
            "Empirical lane: what the source directly supports.",
            "Citation lane: whether cited work exists and materially fits the claim.",
            "Burden lane: what explanatory burden is being imposed or evaded.",
            "Rhetorical lane: bundling, overstatement, equivocation, or burden shifting.",
            "Research-program lane: what evidence or experiments would reduce the objection.",
        ],
        "citation_guidance": [
            "A citation key or extracted reference is evidence of traceability, not correctness.",
            "Check whether the cited work actually supports the claim and whether the claim overstates it.",
            "Use the citation track to prioritize claims that can move into a separate citation-ingestion workflow.",
            "Treat abstract-based support suggestions as triage help, not as a substitute for direct source inspection.",
        ],
    },
    "field_specs": [

View File

@ -55,6 +55,8 @@ def test_groundrecall_import_emits_normalized_artifacts(tmp_path: Path) -> None:
    claims = _read_jsonl(result.out_dir / "claims.jsonl")
    assert any("Reliable rate upper bound" in item["claim_text"] for item in claims)
    assert any(item["supporting_fragment_ids"] for item in claims)
    assert all("metadata" in item for item in claims)
    assert any(item["metadata"].get("analysis_lane") == "empirical" for item in claims)

    concepts = _read_jsonl(result.out_dir / "concepts.jsonl")
    concept_ids = {item["concept_id"] for item in concepts}
@ -87,6 +89,7 @@ def test_groundrecall_import_emits_normalized_artifacts(tmp_path: Path) -> None:
    assert "concept_reviews" in review_data
    assert "citations" in review_data
    assert "citation_reviews" in review_data
    assert "analysis_lanes" in review_data["review_guidance"]


def test_concept_standardization_merges_duplicate_titles_into_aliases() -> None:

View File

@ -79,7 +79,9 @@ def test_review_workspace_resolves_citation_metadata_from_bibtex(tmp_path: Path)
        " author = {W. M. Baum},\n"
        " title = {On two types of deviation from the matching law: Bias and undermatching},\n"
        " journal = {Journal of the Experimental Analysis of Behavior},\n"
        " year = {1974},\n"
        " doi = {10.1901/jeab.1974.22-231},\n"
        " abstract = {Classic analysis of deviations from the matching law in operant choice experiments.}\n"
        "}\n",
        encoding="utf-8",
    )
@ -93,3 +95,47 @@ def test_review_workspace_resolves_citation_metadata_from_bibtex(tmp_path: Path)
    assert entry["source_bib_path"] == "refs.bib"
    assert entry["raw_bibtex"]
    assert payload["bibliography"]["entry_count"] >= 1
    assert payload["bibliography"]["abstract_entry_count"] == 1
    assert payload["bibliography"]["doi_entry_count"] == 1
    assert payload["bibliography"]["year_range"] == [1974, 1974]
    concept_review = next(item for item in payload["concept_reviews"] if item["concept_id"] == "matching")
    assert "analysis_lanes" in concept_review
    citation_support = concept_review["top_claims"][0]["citation_support"][0]
    assert concept_review["top_claims"][0]["analysis_lane"] == "empirical"
    assert concept_review["top_claims"][0]["argument_role"] in {"premise", "context"}
    assert citation_support["resolved_entry_count"] == 1
    assert citation_support["abstract_entry_count"] == 1
    assert "matching law" in citation_support["abstract_snippets"][0].lower()
    suggestions = concept_review["top_claims"][0]["support_suggestions"]
    assert suggestions == []


def test_review_workspace_surfaces_local_bibliography_support_suggestions(tmp_path: Path) -> None:
    root = tmp_path / "llmwiki"
    (root / "wiki").mkdir(parents=True)
    (root / "wiki" / "drift.md").write_text(
        "# Drift\n\n"
        "- Random genetic drift can dominate allele-frequency change in small populations.\n",
        encoding="utf-8",
    )
    (root / "refs.bib").write_text(
        "@article{kimura1968evolutionary,\n"
        " author = {Motoo Kimura},\n"
        " title = {Evolutionary Rate at the Molecular Level},\n"
        " journal = {Nature},\n"
        " year = {1968},\n"
        " abstract = {The rate of molecular evolution is compatible with neutral changes driven by random genetic drift in populations.}\n"
        "}\n",
        encoding="utf-8",
    )
    import_result = run_groundrecall_import(root, out_root=tmp_path / "imports", mode="quick", import_id="support-suggestions")
    workspace = GroundRecallReviewWorkspace(import_result.out_dir)
    payload = workspace.load_review_data()
    concept_review = next(item for item in payload["concept_reviews"] if item["concept_id"] == "drift")
    suggestions = concept_review["top_claims"][0]["support_suggestions"]
    assert concept_review["analysis_lanes"]["empirical"] >= 1
    assert suggestions
    assert suggestions[0]["citation_key"] == "kimura1968evolutionary"
    assert "abstract" in suggestions[0]["reason"].lower() or "title" in suggestions[0]["reason"].lower()