GroundRecall `llmwiki` Import Specification

This document defines the first-pass import path for users who already have an llmwiki-style repository and want to migrate it into the broader GroundRecall substrate while staying compatible with Didactopus review and promotion flows.

Goal

The import path should let an existing llmwiki corpus become:

searchable without immediate manual cleanup
reviewable rather than blindly trusted
grounded in explicit provenance
promotable into durable structured knowledge objects
exportable back into compiled wiki pages, assistant adapter bundles, and queryable graph artifacts

The key rule is:

Imported wiki pages are derived artifacts, not automatic source truth.

Import philosophy

Users coming from llmwiki often have a mixture of:

raw notes
compiled markdown pages
local source files
generated summaries
ad hoc link graphs
session transcripts
speculative or weakly-supported synthesis

GroundRecall should preserve that work without pretending all of it is already promoted knowledge.

The import pipeline therefore has two responsibilities:

Preserve the original material with minimal loss.
Reify explicit structured objects that can later be reviewed and promoted.

Scope of the first implementation

The first implementation should support common llmwiki layouts such as:

raw/
wiki/
schema.*
logs/
sources/
top-level markdown pages

The importer should not require a canonical upstream schema. It should operate from directory conventions plus simple heuristics.
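
As a rough illustration of discovery by directory convention, the sketch below classifies paths using only the layout names listed above. The kind labels (`raw_note`, `session_log`, and so on) and the function names are assumptions made for this example, not part of the specification.

```python
from pathlib import Path, PurePosixPath

# Hypothetical mapping from top-level directory names to path kinds.
# Directory names come from the layouts above; the kind labels are assumptions.
DIR_KINDS = {
    "raw": "raw_note",
    "wiki": "compiled_page",
    "logs": "session_log",
    "sources": "source_document",
}


def classify_path(rel_path: str) -> str:
    """Classify a path (relative to the llmwiki root) by directory convention."""
    path = PurePosixPath(rel_path)
    parts = path.parts
    if len(parts) > 1 and parts[0] in DIR_KINDS:
        return DIR_KINDS[parts[0]]
    if len(parts) == 1 and path.name.startswith("schema."):
        return "schema"
    if len(parts) == 1 and path.suffix.lower() in {".md", ".markdown"}:
        return "compiled_page"  # top-level markdown pages
    return "unknown"


def discover(root: str) -> list[tuple[str, str]]:
    """Walk the tree and return sorted (relative_path, kind) pairs for every file."""
    base = Path(root)
    found = []
    for p in sorted(base.rglob("*")):
        if p.is_file():
            rel = p.relative_to(base).as_posix()
            found.append((rel, classify_path(rel)))
    return found
```

Files that match no convention are reported as `unknown` rather than dropped, so nothing in the original tree is silently lost.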

Import modes

1. `archive`

Purpose:

preserve an existing llmwiki tree as read-only imported artifacts
index it for search and later review

Behavior:

no claim promotion
minimal extraction
all compiled pages remain draft

Use when:

the user wants backward compatibility first
the corpus quality is unknown

2. `quick`

Purpose:

bootstrap usable structured objects fast

Behavior:

import pages and raw sources
extract candidate claims and concepts heuristically
attach lightweight provenance
queue uncertain items for review

Use when:

the user wants early utility and accepts heuristic noise

3. `grounded`

Purpose:

perform a migration suitable for long-lived shared knowledge

Behavior:

require provenance for promoted claims
mark unsupported statements explicitly
produce review records and lint findings
populate promotion queues rather than auto-promoting

Use when:

the imported corpus will be shared across machines or agents

Pipeline stages

1. Capture

The importer records the source repository as an import artifact.

Required metadata:

import_id
import_mode
source_root
imported_at
machine_id
agent_id
source_repo_kind=llmwiki

Outputs:

import manifest
artifact records for all discovered files
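
A minimal sketch of the capture stage, assuming a plain directory walk and SHA-256 content hashes. The manifest keys are the required metadata fields above; the function names and the sequential `ia_NNN` ID scheme (mirroring the examples later in this document) are assumptions.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash file contents in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture(source_root: str, import_id: str, import_mode: str,
            machine_id: str, agent_id: str) -> tuple[dict, list[dict]]:
    """Return (manifest, artifact_records) for every file under source_root."""
    root = Path(source_root)
    manifest = {
        "import_id": import_id,
        "import_mode": import_mode,
        "source_root": str(root),
        "imported_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "machine_id": machine_id,
        "agent_id": agent_id,
        "source_repo_kind": "llmwiki",
    }
    artifacts = []
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for n, p in enumerate(files, start=1):
        artifacts.append({
            "artifact_id": f"ia_{n:03d}",  # assumed ID scheme, for illustration only
            "import_id": import_id,
            "path": p.relative_to(root).as_posix(),
            "sha256": sha256_of(p),
            "current_status": "draft",
        })
    return manifest, artifacts
```

Sorting the file list before assigning IDs keeps the output deterministic for a given tree, which helps when an import is re-run.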

2. Segment

Imported content is split into stable units.

Primary segment types:

source_document
source_fragment
compiled_page
section_summary
candidate_claim
candidate_concept
candidate_relation
session_observation

Segmentation should preserve:

original path
section heading
line or byte offsets when possible
page title
frontmatter fields
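
One way to segment a markdown page by headings while keeping line offsets is sketched below. It assumes ATX-style (`#`) headings and does not handle frontmatter or `#` lines inside fenced code blocks; the function names are hypothetical.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def _emit(segments, path, section, start, end, body):
    """Append a segment if the section has any non-blank body text."""
    text = "\n".join(body).strip()
    if text:
        segments.append({
            "origin_path": path,
            "origin_section": section,
            "line_start": start,  # 1-based; the heading line when a heading exists
            "line_end": end,      # 1-based, inclusive
            "text": text,
        })


def segment_by_headings(text: str, path: str) -> list[dict]:
    """Split markdown into per-heading segments with line offsets."""
    lines = text.splitlines()
    segments: list[dict] = []
    section, start, body = None, 1, []
    for lineno, line in enumerate(lines, start=1):
        match = HEADING.match(line)
        if match:
            _emit(segments, path, section, start, lineno - 1, body)
            section, start, body = match.group(2), lineno, []
        else:
            body.append(line)
    _emit(segments, path, section, start, len(lines), body)
    return segments
```

Sections with no body text (for example, a page title followed directly by a subheading) produce no segment in this sketch.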

3. Classify

Each segment gets a semantic role.

Recommended roles:

source
derivation
claim
summary
question
todo
speculation
obsolete
transcript

This prevents unsupported prose from being confused with grounded knowledge.

4. Ground

Each imported segment gets provenance and support metadata.

Required grounding fields:

origin_artifact_id
origin_path
origin_section
source_url when known
retrieval_date when known
machine_id
session_id when known
support_kind
grounding_status

Suggested values:

support_kind: direct_source, derived_from_page, derived_from_session, inferred, unknown
grounding_status: grounded, partially_grounded, ungrounded
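
The grounding fields could be carried in a small validated record. The sketch below is one possible shape, assuming the "when known" fields are optional and the suggested values are treated as closed sets; the class name `Grounding` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORT_KINDS = {
    "direct_source", "derived_from_page", "derived_from_session", "inferred", "unknown",
}
GROUNDING_STATUSES = {"grounded", "partially_grounded", "ungrounded"}


@dataclass(frozen=True)
class Grounding:
    origin_artifact_id: str
    origin_path: str
    origin_section: str
    machine_id: str
    support_kind: str
    grounding_status: str
    # "When known" fields default to None rather than a guessed value.
    source_url: Optional[str] = None
    retrieval_date: Optional[str] = None
    session_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the suggested sets (an assumption of this sketch).
        if self.support_kind not in SUPPORT_KINDS:
            raise ValueError(f"unknown support_kind: {self.support_kind!r}")
        if self.grounding_status not in GROUNDING_STATUSES:
            raise ValueError(f"unknown grounding_status: {self.grounding_status!r}")
```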

5. Normalize

The importer emits explicit GroundRecall objects.

Minimum object set:

Source
Fragment
Artifact
Observation
Claim
Concept
Relation

6. Lint

The importer produces machine-readable findings before promotion.

Required lint checks:

claim has no supporting fragment
multiple claims have identical text
concept is orphaned
relation points to missing concept
page summary has no cited support
imported item marked obsolete still linked as current
same claim imported with conflicting confidence or polarity
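
Four of these checks can be sketched over the object shapes shown under "Object contracts". The finding names are assumptions, as are two interpretive choices: "text-identical" is taken to mean identical after case and whitespace normalization, and a concept is "orphaned" when no claim or relation refers to it.

```python
from collections import defaultdict


def lint(claims: list[dict], concepts: list[dict], relations: list[dict]) -> list[dict]:
    """Return machine-readable findings for a subset of the required checks."""
    findings = []

    # Claim has no supporting fragment.
    for claim in claims:
        if not claim.get("supporting_fragment_ids"):
            findings.append({"check": "claim_without_support",
                             "object_id": claim["claim_id"]})

    # Multiple claims share the same text (after normalization; an assumption).
    by_text = defaultdict(list)
    for claim in claims:
        normalized = " ".join(claim["claim_text"].lower().split())
        by_text[normalized].append(claim["claim_id"])
    for ids in by_text.values():
        if len(ids) > 1:
            findings.append({"check": "duplicate_claim_text", "object_ids": ids})

    # Relation points to a missing concept.
    concept_ids = {c["concept_id"] for c in concepts}
    for rel in relations:
        for end in ("source_id", "target_id"):
            if rel[end] not in concept_ids:
                findings.append({"check": "relation_missing_concept",
                                 "object_id": rel["relation_id"],
                                 "missing": rel[end]})

    # Concept is orphaned: referenced by no claim and no relation.
    referenced = {cid for claim in claims for cid in claim.get("concept_ids", [])}
    referenced |= {rel[end] for rel in relations for end in ("source_id", "target_id")}
    for cid in sorted(concept_ids - referenced):
        findings.append({"check": "orphaned_concept", "object_id": cid})

    return findings
```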

7. Promote

Imported objects enter existing Didactopus review/promotion lanes rather than becoming trusted immediately.

Recommended states:

draft
triaged
reviewed
promoted
superseded
archived

8. Export

Promoted objects can then be rendered back out as:

compiled wiki pages
graph snapshots
assistant adapter bundles
review reports
query bundles for assistant-facing use

Object contracts

`ImportedArtifact`

{
  "artifact_id": "ia_001",
  "import_id": "imp_2026_04_16_a",
  "artifact_kind": "compiled_page",
  "path": "wiki/channel-capacity.md",
  "title": "Channel Capacity",
  "sha256": "abc123",
  "created_at": "2026-04-16T14:00:00Z",
  "metadata": {
    "frontmatter": {},
    "headings": ["Definition", "Examples"]
  },
  "current_status": "draft"
}

`ImportedObservation`

{
  "observation_id": "obs_001",
  "import_id": "imp_2026_04_16_a",
  "artifact_id": "ia_001",
  "role": "summary",
  "text": "Capacity bounds reliable communication over a noisy channel.",
  "origin_path": "wiki/channel-capacity.md",
  "origin_section": "Definition",
  "line_start": 12,
  "line_end": 14,
  "grounding_status": "partially_grounded",
  "support_kind": "derived_from_page",
  "confidence_hint": 0.63,
  "current_status": "draft"
}

`ImportedClaim`

{
  "claim_id": "clm_001",
  "import_id": "imp_2026_04_16_a",
  "claim_text": "Channel capacity is the maximum reliable communication rate for a channel model.",
  "claim_kind": "definition",
  "source_observation_ids": ["obs_001"],
  "supporting_fragment_ids": ["frag_014"],
  "concept_ids": ["concept::channel-capacity"],
  "confidence_hint": 0.74,
  "grounding_status": "grounded",
  "current_status": "triaged"
}

`ImportedConcept`

{
  "concept_id": "concept::channel-capacity",
  "import_id": "imp_2026_04_16_a",
  "title": "Channel Capacity",
  "aliases": [],
  "description": "Imported concept from llmwiki corpus.",
  "source_artifact_ids": ["ia_001"],
  "current_status": "triaged"
}

`ImportedRelation`

{
  "relation_id": "rel_001",
  "import_id": "imp_2026_04_16_a",
  "source_id": "concept::shannon-entropy",
  "target_id": "concept::channel-capacity",
  "relation_type": "supports_understanding_of",
  "evidence_ids": ["obs_015"],
  "current_status": "draft"
}

Mapping from `llmwiki` into GroundRecall

Recommended first-pass mapping:

raw/* -> Source or Artifact(kind=raw_note)
wiki/*.md -> Artifact(kind=compiled_page)
frontmatter -> artifact metadata
headings -> section boundaries
linked page names -> candidate Concept and Relation
bullet or sentence extraction -> candidate Observation and Claim
chat or session logs -> Observation(kind=session_note)
schema files -> import metadata only unless a future adapter exists

Confidence and trust policy

Imported confidence must remain clearly separate from reviewed confidence.

Recommended fields:

confidence_hint
review_confidence
grounding_status
review_verdict

Policy:

confidence_hint comes from heuristic import scoring
review_confidence exists only after review
promotion requires at least partially_grounded
fully ungrounded claims can be stored, but only as draft or archived
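
The two grounding rules in this policy can be expressed as simple predicates. This is a sketch under the assumption that the three `grounding_status` values are ordered from `ungrounded` up to `grounded`; the function names are hypothetical, and real promotion would also depend on review verdicts not modeled here.

```python
# Assumed ordering of grounding levels, weakest first.
GROUNDING_RANK = {"ungrounded": 0, "partially_grounded": 1, "grounded": 2}

ALL_STATUSES = {"draft", "triaged", "reviewed", "promoted", "superseded", "archived"}


def can_promote(claim: dict) -> bool:
    """Promotion requires at least partially_grounded."""
    rank = GROUNDING_RANK[claim["grounding_status"]]
    return rank >= GROUNDING_RANK["partially_grounded"]


def allowed_statuses(claim: dict) -> set[str]:
    """Fully ungrounded claims may be stored only as draft or archived."""
    if claim["grounding_status"] == "ungrounded":
        return {"draft", "archived"}
    return set(ALL_STATUSES)
```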

Provenance policy

The importer should follow the existing Didactopus provenance direction:

preserve source identity
preserve retrieval date when available
preserve adaptation status
keep both human-readable and machine-readable provenance

When only a compiled wiki page exists and the original source is missing:

the compiled page becomes the immediate origin artifact
all extracted claims must be marked derived_from_page
such claims should not auto-promote in grounded mode

Review and promotion integration

Imported Claim and Concept objects should feed into the same general review machinery already used for pack-oriented promotion:

create candidate records
attach lint findings
route to a triage lane
collect review verdicts
emit promotion records

Suggested triage lanes:

knowledge_capture
pack_improvement
skill_export
source_cleanup
conflict_resolution

Module layout

First-pass module layout:

didactopus.groundrecall_import: Entry points and top-level orchestration.
didactopus.groundrecall_discovery: Finds llmwiki-style files and classifies paths.
didactopus.groundrecall_segmenter: Splits pages and logs into stable observations and candidate claims.
didactopus.groundrecall_normalizer: Emits normalized import objects.
didactopus.groundrecall_lint: Import-time lint checks.
didactopus.groundrecall_review_bridge: Converts imported objects into review candidates and promotion records.
didactopus.groundrecall_export: Renders promoted objects back to wiki, graph, and skill artifacts.

CLI shape

Suggested CLI:

python -m didactopus.groundrecall.cli import /path/to/llmwiki --mode archive
python -m didactopus.groundrecall.cli import /path/to/llmwiki --mode quick
python -m didactopus.groundrecall.cli import /path/to/llmwiki --mode grounded
python -m didactopus.groundrecall.cli lint imports/<import-id>
python -m didactopus.groundrecall.cli promote imports/<import-id> /path/to/store
python -m didactopus.groundrecall.cli export /path/to/store exports/groundrecall --concept channel-capacity

Compatibility wrappers still exist during migration:

python -m didactopus.groundrecall_import /path/to/llmwiki --mode grounded
python -m didactopus.groundrecall_lint imports/<import-id>
python -m didactopus.groundrecall_export /path/to/store exports/groundrecall --concept channel-capacity
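
The suggested command shape maps naturally onto `argparse` subcommands. The sketch below only parses arguments and dispatches nothing; the argument names (`source_root`, `import_dir`, `store`, `out_dir`) are assumptions, as is making `--mode` required, since no default is specified.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the suggested import/lint/promote/export commands."""
    parser = argparse.ArgumentParser(prog="didactopus.groundrecall.cli")
    sub = parser.add_subparsers(dest="command", required=True)

    p_import = sub.add_parser("import", help="import an llmwiki tree")
    p_import.add_argument("source_root")
    # Assumption: --mode is required because the spec names no default mode.
    p_import.add_argument("--mode", choices=["archive", "quick", "grounded"],
                          required=True)

    p_lint = sub.add_parser("lint", help="lint a completed import")
    p_lint.add_argument("import_dir")

    p_promote = sub.add_parser("promote", help="promote reviewed objects into a store")
    p_promote.add_argument("import_dir")
    p_promote.add_argument("store")

    p_export = sub.add_parser("export", help="render promoted objects")
    p_export.add_argument("store")
    p_export.add_argument("out_dir")
    p_export.add_argument("--concept")

    return parser
```

For example, `build_parser().parse_args(["import", "/path/to/llmwiki", "--mode", "quick"])` yields a namespace with `command="import"` and `mode="quick"`.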

Filesystem layout

Suggested repository-local layout:

imports/<import-id>/manifest.json
imports/<import-id>/artifacts.jsonl
imports/<import-id>/observations.jsonl
imports/<import-id>/claims.jsonl
imports/<import-id>/concepts.jsonl
imports/<import-id>/relations.jsonl
imports/<import-id>/lint_findings.json
imports/<import-id>/review_queue.json

This keeps imported state auditable and easy to sync across machines.
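
Writing this layout needs little more than the standard `json` module. The following sketch assumes one JSON object per line for the `.jsonl` streams and sorted keys for stable diffs; the function names are hypothetical.

```python
import json
from pathlib import Path


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line; sorted keys keep diffs stable."""
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, sort_keys=True, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a jsonl stream back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def write_import(base_dir: str, manifest: dict, streams: dict[str, list[dict]]) -> Path:
    """Write manifest.json plus one <name>.jsonl per object stream."""
    out = Path(base_dir) / "imports" / manifest["import_id"]
    out.mkdir(parents=True, exist_ok=True)
    (out / "manifest.json").write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    for name, records in streams.items():
        write_jsonl(out / f"{name}.jsonl", records)
    return out
```

Here `streams` would be keyed by names such as `artifacts`, `observations`, `claims`, `concepts`, and `relations`.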

Multi-machine sync implication

For distributed assistant use, imported state should be append-oriented and rebuildable.

Recommended sync primitives:

import manifests
normalized jsonl object streams
review records
promotion records

Non-authoritative derived artifacts:

rendered wiki pages
local indexes
embeddings
cache files

This allows multiple machines to contribute import events without making the compiled page tree the merge primitive.

First implementation milestones

Milestone 1

discover raw/ and wiki/
import artifacts
segment markdown by headings
emit observations and candidate claims
write import manifest and jsonl outputs

Milestone 2

add grounding metadata
add lint checks
add triage lanes and review queue output

Milestone 3

map promoted claims into assistant-neutral exports plus assistant adapter bundles
render compiled wiki views from promoted objects
support multi-machine import manifests and merge-safe event storage

Non-goals for the first pass

perfect semantic claim extraction
automatic trust assignment
full upstream llmwiki schema compatibility
lossless import of every custom plugin or script
embeddings-first retrieval

The first pass should be conservative, inspectable, and easy to improve.
