DiffSeeker
DiffSeeker scans directory trees, records file metadata plus content hashes, and supports cross-volume comparison for:
- duplicates (same hash + size) across volumes
- missing files (present on one volume but absent, matched by hash + size, from the others)
- suspicious divergences (same name, different size)
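The three comparisons above can be sketched over an in-memory list of records. This is an illustration under assumptions, not DiffSeeker's own implementation: `Rec` is a hypothetical stand-in holding only the fields the comparisons need, and "missing" is simplified to a pairwise source/target check.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Rec(NamedTuple):
    """Hypothetical subset of a scanned file record."""
    volume_name: str
    name: str
    relative_path: str
    size: int
    hash_value: str


def duplicates_across_volumes(records: Iterable[Rec]) -> dict[tuple[str, int], list[Rec]]:
    """Group by (hash, size) and keep only groups that span two or more volumes."""
    groups: dict[tuple[str, int], list[Rec]] = defaultdict(list)
    for r in records:
        groups[(r.hash_value, r.size)].append(r)
    return {k: v for k, v in groups.items() if len({r.volume_name for r in v}) > 1}


def missing_on(records: Iterable[Rec], source: str, target: str) -> list[Rec]:
    """Records on `source` whose (hash, size) pair never appears on `target`."""
    records = list(records)
    on_target = {(r.hash_value, r.size) for r in records if r.volume_name == target}
    return [r for r in records
            if r.volume_name == source and (r.hash_value, r.size) not in on_target]


def suspicious_divergences(records: Iterable[Rec]) -> dict[str, set[int]]:
    """File names that occur with more than one distinct size."""
    sizes: dict[str, set[int]] = defaultdict(set)
    for r in records:
        sizes[r.name].add(r.size)
    return {name: s for name, s in sizes.items() if len(s) > 1}
```

Matching on the (hash, size) pair rather than the hash alone mirrors the README's definition and cheaply guards against the rare case of a hash collision between files of different lengths.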
Python CLI (mpchunkcfa compatible)
Install (editable dev install):
pip install -e .
Scan a directory and emit CSV:
mpchunkcfa --walk /path/to/root -V "VOL_A" -c vol_a.csv
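A minimal sketch of CSV emission with the standard `csv` module, assuming the column order follows the data-model field list in this README; whether `mpchunkcfa -c` writes exactly this header is an assumption, and `records_to_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Iterable, Mapping

# Column order taken from the README's data-model list (assumed to match the CLI's header).
FIELDS = [
    "name", "relative_path", "extension", "size", "creation_date",
    "modified_date", "hash_value", "file_type", "number_of_files", "volume_name",
]


def records_to_csv(records: Iterable[Mapping[str, object]]) -> str:
    """Render records as CSV text; fields missing from a record are left empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

When writing to a real file instead of a string buffer, open it with `newline=""` so the `csv` module controls line endings.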
Scan and ingest into SQLite:
mpchunkcfa --walk /path/to/root -V "VOL_A" --db diffseeker.db
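Once records from several volumes share one database, cross-volume duplicates reduce to a `GROUP BY` query. The table name `files` and its columns below are a hypothetical schema derived from the data-model list, not necessarily the layout `--db` actually creates.

```python
import sqlite3
from typing import Iterable, Mapping

# Hypothetical schema; the real --db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    name TEXT, relative_path TEXT, extension TEXT, size INTEGER,
    creation_date TEXT, modified_date TEXT, hash_value TEXT,
    file_type TEXT, number_of_files INTEGER, volume_name TEXT
)
"""

# (hash, size) groups present on more than one volume.
DUPLICATES_SQL = """
SELECT hash_value, size, COUNT(DISTINCT volume_name) AS volumes
FROM files
GROUP BY hash_value, size
HAVING COUNT(DISTINCT volume_name) > 1
ORDER BY hash_value
"""


def ingest(conn: sqlite3.Connection, rows: Iterable[Mapping[str, object]]) -> None:
    """Create the table (plus a lookup index) and insert the given rows."""
    conn.execute(SCHEMA)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_files_hash_size ON files (hash_value, size)")
    conn.executemany(
        "INSERT INTO files (name, relative_path, size, hash_value, volume_name) "
        "VALUES (:name, :relative_path, :size, :hash_value, :volume_name)",
        rows,
    )
    conn.commit()


def cross_volume_duplicates(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(DUPLICATES_SQL).fetchall()
```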
Exclude directory elements (repeatable):
mpchunkcfa --walk . -V "VOL_A" --exclude .git --exclude .svn
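Exclusion is described in this README as path-element based: a name matches only a whole directory or file name, never a prefix. A sketch of that rule with `os.walk` and `pathlib`, where `is_excluded` and `walk_files` are hypothetical helpers rather than the CLI's internals:

```python
import os
from pathlib import PurePath
from typing import Iterable, Iterator


def is_excluded(rel_path: str, excludes: Iterable[str]) -> bool:
    """True if any whole path element of rel_path equals an excluded name."""
    excluded = set(excludes)
    return any(part in excluded for part in PurePath(rel_path).parts)


def walk_files(root: str, excludes: Iterable[str]) -> Iterator[str]:
    """Yield file paths relative to root, skipping excluded elements."""
    excluded = set(excludes)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for fname in filenames:
            if fname not in excluded:
                yield os.path.relpath(os.path.join(dirpath, fname), root)
```

Pruning `dirnames` in place avoids walking large excluded trees such as `.git` at all, instead of filtering their files afterwards.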
Data model
Each file record includes: name, relative_path, extension, size, creation_date, modified_date, hash_value, file_type, number_of_files, volume_name.
Notes on compatibility and correctness
- A previous runtime error (`root` undefined in the worker) is eliminated by passing `root` and `directory` into `_compute_record`.
- Exclusion matching is path-element based, so excluding `.git` does not exclude `.gitignore`.
- `creation_date` has OS-dependent semantics: it comes from `ctime`, which on Unix is the last inode (metadata) change time rather than the file's creation time. Portable "birth time" would require explicit per-platform normalization.
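One possible per-platform normalization, sketched as a hypothetical `birth_time` helper: use `st_birthtime` where the stat result provides it (macOS and the BSDs, and Windows on recent Python versions), fall back to `st_ctime` on Windows where it has historically meant creation time, and return `None` elsewhere rather than reporting a metadata-change time as a creation date.

```python
import os
import sys
from typing import Optional


def birth_time(st: os.stat_result) -> Optional[float]:
    """Best-effort file creation ("birth") time as a POSIX timestamp, or None."""
    # Present on macOS/BSD stat results (and on Windows in newer Pythons).
    bt = getattr(st, "st_birthtime", None)
    if bt is not None:
        return bt
    # On Windows, st_ctime has historically been the creation time.
    if sys.platform == "win32":
        return st.st_ctime
    # On Linux, st_ctime is the inode change time; report "unknown" instead.
    return None
```

Returning `None` makes the unknown case explicit in the data instead of silently mixing two different timestamp meanings in one `creation_date` column.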