[macOS] Detecting files that have changed in folders (i.e. length changed to zero, contents changed, etc.)

Detecting changed files

DEVONthink’s Verify Database command checks only the database itself. It does not check indexed files, and (I don’t think) it checks the files contained in a database either. So I went looking for a utility that would detect changed files. This is more broadly applicable than chasing DT problems, though.

binsnitch is a Python program that creates a SHA256 hash for every file in a folder and its subfolders, then tells you which files have changed when you run it again. It can either keep running and check the folder periodically, or be run once on an as-needed basis.
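To illustrate the general technique (not binsnitch's actual code), here is a minimal sketch of hash-based change detection using only the Python standard library. The function names `sha256_of`, `snapshot` and `compare` are made up for this example. It also records file sizes so that files whose length dropped to zero can be singled out.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each regular file's path (relative to root) to (size, sha256)."""
    result = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            if not os.path.isfile(full):
                continue  # skip broken symlinks and other non-regular entries
            rel = os.path.relpath(full, root)
            result[rel] = (os.path.getsize(full), sha256_of(full))
    return result


def compare(old, new):
    """Compare two snapshots and return sorted lists of affected paths."""
    added = sorted(p for p in new if p not in old)
    missing = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p][1] != old[p][1])
    # Files that changed and are now zero bytes long
    emptied = sorted(p for p in modified if new[p][0] == 0)
    return {"new": added, "modified": modified,
            "missing": missing, "emptied": emptied}
```

Take one `snapshot()` as the baseline, store it (for example as JSON), and pass it to `compare()` together with a fresh snapshot on each later run.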

Installation

Download the code from here and put it somewhere like /usr/local/bin.

Initial run in a directory:

-s = run a single time (i.e. don’t monitor the folder)
-a = do all files, not just dangerous ones like .exe, .dmg, etc.
-b = create a baseline for the folder, and don’t generate alerts for new files (since they’re all new at this point)

/usr/local/bin/binsnitch.py -s -a -b dirname

Checking a directory later:

-s = run a single time (i.e. don’t monitor the folder)
-a = do all files, not just dangerous ones like .exe, .dmg, etc.
-n = let me know about new files that have appeared since the last run

/usr/local/bin/binsnitch.py -s -a -n dirname

Miscellaneous

If binsnitch is run as a cron job, its output could probably be piped to a notifier, or code could be added to use pync (something I might do later).
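As a rough sketch of the piping idea, the script below reads lines from whatever is piped into it and posts a macOS notification through `osascript` (swapped in here for pync so that nothing beyond the standard library is needed). It rests on an untested assumption: that the piped text contains the same "Modified file detected" wording that binsnitch writes to its alert log. All function names are hypothetical.

```python
import subprocess
import sys


def applescript_quote(text):
    """Quote a string for use as an AppleScript string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def notification_command(message, title="binsnitch"):
    """Build the osascript command that displays a macOS notification."""
    script = "display notification {} with title {}".format(
        applescript_quote(message), applescript_quote(title))
    return ["osascript", "-e", script]


def summarize(lines):
    """Count lines reporting a modified file; return None if there are none.

    Assumption: the input uses the 'Modified file detected' wording.
    """
    count = sum(1 for line in lines if "Modified file detected" in line)
    if count == 0:
        return None
    return "{} modified file(s) detected".format(count)


def main(lines):
    """Post a notification only when at least one modification was reported."""
    message = summarize(lines)
    if message is not None:
        subprocess.run(notification_command(message), check=False)
```

Saved as, say, `notify_changes.py` (a made-up name) with a final line `main(sys.stdin)`, it could sit at the end of the cron command: `/usr/local/bin/binsnitch.py -s -a -n dirname | python3 notify_changes.py`.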

Hashes are stored in a JSON database in a folder named binsnitch_data, which is created automatically within the monitored folder. Alerts and information are logged in binsnitch_data/alerts.log and look like this:

08/18/2021 10:44:09 AM - INFO - New file detected: ./Notes/_Archived Items/.obsidian/workspace_v023 - hash: 19cc4f9065551ec538c3b4d66f776530fb6501a69d3b3ffd898668032dc04577
08/18/2021 10:44:09 AM - INFO - New file detected: ./Notes/_Archived Items/.git/logs/refs/heads/main_v334 - hash: 3f6db7347059d1268bf400b97246702aabe0d3a5c1189726eb90b8a73dfaf7b9
08/18/2021 10:44:09 AM - INFO - New file detected: ./Notes/_Archived Items/.git/logs/HEAD_v334 - hash: f81a570043ba40b3ffd7b18d7176343766a4685ba396847131c301bf12a81ee5
08/18/2021 10:44:10 AM - INFO - New file detected: ./Notes/_Archived Items/.git/refs/heads/main_v334 - hash: 2766281ed2fb420ea0f32e225c47ae142e74bafa7afc9b14057f26d3512b8414
08/18/2021 10:44:10 AM - INFO - New file detected: ./Notes/_Archived Items/.git/COMMIT_EDITMSG_v335 - hash: 839e77ed3d63b0a195230c4c2b273e702e651fc60616b876ba9a4f09fd8ae348
08/18/2021 10:44:10 AM - INFO - New file detected: ./Notes/_Archived Items/.git/index_v335 - hash: 5c8d1c31bab1420e82c7c06644b2600210803b6464fa3afc0e014d533b01b2ca
08/18/2021 10:44:11 AM - INFO - New file detected: ./Notes/_Archived Items/.obsidian-git-data_v335 - hash: 580fa6b993e29580def0adf7b9ccc43a5cbc15357b55f822d95e096aa71a84be
08/18/2021 10:44:11 AM - INFO - New file detected: ./Notes/sensory demand.md - hash: 78ba2c0f91c999f2076313cca5af545a0d8391c1903e42bc58cb97bf94fba7c9
08/18/2021 10:44:12 AM - INFO - Modified file detected: ./Notes/.obsidian-git-data - new hash: 35333ab602265c598a9b175b553862e03e06cefbf7e31e1d27713c45a0b07c72
08/18/2021 10:44:12 AM - INFO - Finished!
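For anyone who wants to post-process this log, here is a small sketch of a parser. The pattern is inferred only from the sample lines above, so other message types binsnitch may write are simply skipped. `parse_alert` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Pattern inferred from the sample log lines; not taken from binsnitch itself.
ALERT_LINE = re.compile(
    r"^(?P<when>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) - (?P<level>\w+) - "
    r"(?P<kind>New|Modified) file detected: (?P<path>.+) - "
    r"(?:new )?hash: (?P<hash>[0-9a-f]+)$"
)


def parse_alert(line):
    """Return a dict for a 'file detected' line, or None for anything else."""
    match = ALERT_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps in the sample are month/day/year with a 12-hour clock
    fields["when"] = datetime.strptime(fields["when"], "%m/%d/%Y %I:%M:%S %p")
    return fields
```

Filtering the parsed entries for `kind == "Modified"` gives just the files whose contents changed, which is usually the interesting part once the baseline exists.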

So you can see ChronoSync has been moving files into the _Archived Items folder, which binsnitch detected.

The folder I’m running it in at the moment is my folder of safe copies for DT to index (so that it doesn’t access my original files). It’s about 11GB, and has about 19,000 files. It takes about a minute for binsnitch to run.

Neat! macOS also has a built-in API for filesystem changes, and this project uses that instead of periodic scanning: https://github.com/emcrisostomo/fswatch
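A possible way to consume fswatch from Python, sketched under the assumption that `fswatch` is installed and on the PATH: run it with `-0` (NUL-separated output) and `-r` (recursive) and split the stream into paths. The helper names `split_nul_records` and `watch` are invented for this example.

```python
import subprocess


def split_nul_records(stream, chunk_size=4096):
    """Yield NUL-terminated records from a binary stream as text."""
    pending = b""
    while True:
        # read1 returns as soon as some data is available
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        pending += chunk
        *complete, pending = pending.split(b"\0")
        for record in complete:
            yield record.decode("utf-8", errors="replace")


def watch(root):
    """Yield the path of each change fswatch reports under root."""
    proc = subprocess.Popen(["fswatch", "-0", "-r", root],
                            stdout=subprocess.PIPE)
    try:
        yield from split_nul_records(proc.stdout)
    finally:
        proc.terminate()
```

Looping over `watch("dirname")` then gives one path per reported change, so a file could be re-hashed as soon as it is touched rather than on the next scheduled scan.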

Cool! I feel a mashup coming on.

Right, one of my favorite technical parts of macOS going back to Tiger is the Spotlight firehose of fsevents. We don’t think much of it now, but at the time it was a marvel. Every change to a file on the OS has to go through the kernel, so the kernel started writing a log of those changes, which is then read by Spotlight and by whatever other applications need to keep up to date. I’m writing this from memory, so there might be areas where I’m wrong, but that’s how I recall it. Siracusa’s Ars Technica review of Tiger is still great reading: Mac OS X 10.4 Tiger | Ars Technica

Had to go back to my database. The public fsevents framework debuted in Leopard, not Tiger. But there was a private API in Tiger that made Spotlight work.