# DiffSeeker

DiffSeeker scans directory trees, records file metadata plus content hashes, and supports cross-volume comparison for:

- duplicates (same hash + size) across volumes
- missing files (present on one volume, absent on others by hash + size)
- suspicious divergences (same name, different size)

## Python CLI (mpchunkcfa compatible)

Install (editable dev install):

```bash
pip install -e .
```

Scan a directory and emit CSV:

```bash
mpchunkcfa --walk /path/to/root -V "VOL_A" -c vol_a.csv
```

Scan and ingest into SQLite:

```bash
mpchunkcfa --walk /path/to/root -V "VOL_A" --db diffseeker.db
```

Exclude directory elements (repeatable):

```bash
mpchunkcfa --walk . -V "VOL_A" --exclude .git --exclude .svn
```

## Data model

Each file record includes: `name`, `relative_path`, `extension`, `size`, `creation_date`, `modified_date`, `hash_value`, `file_type`, `number_of_files`, `volume_name`.

## Notes on compatibility and correctness

- The earlier runtime error (`root` undefined in worker) is eliminated by passing `root` and `directory` into `_compute_record`.
- The exclusion logic is **path-element based**, so `.gitignore` is not excluded when excluding `.git`.
- `creation_date` has OS-dependent semantics (ctime on Unix, which is the inode change time rather than a true creation time). Portable "birth time" would require normalizing explicitly per platform.
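
The path-element rule from the notes above can be illustrated with a minimal sketch. This is not DiffSeeker's actual implementation: the function name `is_excluded` and its parameters are hypothetical and chosen only for illustration. The idea is that a path is excluded only when one of its whole elements equals an excluded name, so a substring match such as `.git` inside `.gitignore` does not count.

```python
from pathlib import PurePath


def is_excluded(relative_path, excluded_elements):
    """Return True if any whole path element equals an excluded name.

    Hypothetical helper for illustration; DiffSeeker's own function
    names and signatures may differ.
    """
    parts = PurePath(relative_path).parts
    return any(part in excluded_elements for part in parts)


if __name__ == "__main__":
    excluded = {".git", ".svn"}
    for candidate in (".git/config", "src/.gitignore", "src/.svn/entries"):
        print(candidate, is_excluded(candidate, excluded))
```

Comparing whole elements rather than substrings keeps files like `.gitignore` or `.github/workflows/ci.yml` in the scan while still skipping everything under `.git`.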