diff --git a/README.md b/README.md
index 95c3967..2ab574f 100644
--- a/README.md
+++ b/README.md
@@ -10,4 +10,36 @@ DiffSeeker scans directory trees, records file metadata plus content hashes, and
 Install (editable dev install):
 ```bash
 pip install -e .
+```
+
+Scan a directory and emit CSV:
+
+```bash
+mpchunkcfa --walk /path/to/root -V "VOL_A" -c vol_a.csv
+```
+
+Scan and ingest into SQLite:
+
+```bash
+mpchunkcfa --walk /path/to/root -V "VOL_A" --db diffseeker.db
+```
+
+Exclude directory elements (repeatable):
+
+```bash
+mpchunkcfa --walk . -V "VOL_A" --exclude .git --exclude .svn
+```
+
+## Data model
+
+Each file record includes:
+name, relative_path, extension, size, creation_date, modified_date,
+hash_value, file_type, number_of_files, volume_name.
+
+
+## 5) Notes on compatibility and correctness
+
+- Your earlier runtime error (`root` undefined in worker) is eliminated by passing `root` and `directory` into `_compute_record`.
+- The exclusion logic is **path-element based**, so `.gitignore` is not excluded when excluding `.git`.
+- `creation_date` is OS-dependent semantics (ctime on Unix). If you later want “birth time” portability, we can normalize explicitly per platform.
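
The path-element rule described in the notes can be sketched as follows. This is a minimal illustration, not DiffSeeker's actual code: `is_excluded` is a hypothetical helper name, and the sketch assumes that exclusion compares each component of the relative path for exact equality with the excluded names.

```python
from pathlib import PurePosixPath


def is_excluded(relative_path: str, excluded: set[str]) -> bool:
    """Return True if any path component exactly equals an excluded name.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not taken from the DiffSeeker source.
    """
    # Split the path into its components, e.g. ".git/config" -> (".git", "config").
    parts = PurePosixPath(relative_path).parts
    # Whole components are compared, never substrings or prefixes.
    return any(part in excluded for part in parts)


excluded = {".git", ".svn"}
print(is_excluded(".git/config", excluded))       # True: ".git" is a component
print(is_excluded("src/.svn/entries", excluded))  # True: ".svn" is a component
print(is_excluded(".gitignore", excluded))        # False: no component equals ".git"
```

Matching whole components rather than prefixes is what keeps `.gitignore` in the scan while everything under `.git/` is skipped.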