Hazel Alternative?

ibuys · May 8, 2024, 6:09pm

Is there anything that can do what Hazel can do?

The matching rule inconsistency with Hazel is so frustrating. I’m trying to build out a system where Hazel matches particular bank account statements by the account number in a scanned PDF, but Hazel just randomly fails to match the number.

I’ve copy/pasted the number out of the PDF, and I’ve copy/pasted the number out of Hazel’s window that shows you what Hazel can see. Both have the exact same number, but Hazel refuses to match on it.

WayneG · May 8, 2024, 6:35pm

I’ve had the same problem and never found a solution. Hazel will be able to read an account number in one statement and fail to do it in another, from the same bank.

My “solution” is to download statements on my iPad Pro to specific iCloud folders and let the folder determine where Hazel files the document.

rms · May 8, 2024, 6:42pm

Frankly, I do not think the root cause is Hazel. In my experience, the cause is that the OCR text layer in the PDF (which I assume you are using) is inconsistent or changing, or that the pattern you specified is incorrect.

I have found the Hazel forum very helpful in sorting these things out.

I have found good results with Hazel and have found nothing better.

WayneG · May 8, 2024, 6:47pm

I agree. I failed to explain why I mentioned statements “from the same bank”.

ibuys · May 8, 2024, 6:59pm

I’d be very happy to find out that it’s something I’m doing wrong. Here’s what I’m doing:

So, in here let’s say my bank account number is a bunch of 5s. I know that’s right because I copied it straight out of the PDF statement that I scanned and OCR’d with my ScanSnap. Older statements have dashes, newer ones do not. When that doesn’t match, if you click on that red X you can get to the text that Hazel can see:

So, I copy everything out of there and paste into BBEdit, find my account number, copy it from BBEdit back into Hazel, and it still doesn’t recognize the number. Not sure what else I could do.
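For anyone who wants to rule out hidden characters or stray separators in the OCR text, here is a minimal Python sketch that runs outside Hazel. It assumes you have pasted the text Hazel shows into a string. The names `find_account`, `describe`, and the `SEPARATORS` list are made up for illustration and are not part of Hazel. It looks for the account digits while tolerating dashes, spaces, and zero-width characters between them, then lists what non-digit characters were actually found.

```python
import re
import unicodedata

# Characters that may sit between digits in an OCR text layer: whitespace,
# ASCII and Unicode dashes, soft hyphen, zero-width characters.
# This list is an assumption; extend it if your scans contain something else.
SEPARATORS = r"[\s\-\u2010-\u2015\u00ad\u200b-\u200d\ufeff]*"


def find_account(text, account):
    """Return the matched span for `account` in `text`, or None.

    Separators in `account` are ignored, so the dashed and undashed
    forms of the same number are treated alike.
    """
    digits = re.sub(r"\D", "", account)
    if not digits:
        raise ValueError("account must contain at least one digit")
    # Allow optional separators between digits, and require that the
    # match is not embedded in a longer run of digits.
    pattern = r"(?<!\d)" + SEPARATORS.join(digits) + r"(?!\d)"
    match = re.search(pattern, text)
    return match.group(0) if match else None


def describe(span):
    """List every non-digit character in `span` by code point and name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in span
        if not ch.isdigit()
    ]


if __name__ == "__main__":
    pasted = "Account Number: 5555\u200b-5555-55  Statement Period ..."
    found = find_account(pasted, "5555555555")
    print(repr(found))
    if found is not None:
        for line in describe(found):
            print(line)
```

If `find_account` returns a span and `describe` reports something like a zero-width space, the number in the PDF is not byte-for-byte what was typed into the rule. If it returns `None`, the OCR layer may have misread a digit, which this sketch does not try to detect.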

cornchip · May 8, 2024, 7:21pm

Your structure looks good. Does “contain match” work any better than “contain”? Supposedly it can fix situations where something is wrong with Spotlight indexing, because Hazel does its own slower crawl of the document.

Brisbane · May 8, 2024, 9:13pm

There are two solutions. You can OCR again to a better standard, or instead of just using “contents contain”, use “contents contain match”. Hazel seems a little more reliable on “contents contain match”.

liminal · May 9, 2024, 12:15am

I also find that contains match works much better than a simple contains in Hazel. YMMV

tomalmy · May 9, 2024, 1:26am

Yep. Just use “contents contain match”. Then Hazel will scan the actual file contents rather than just using the Spotlight data. It’s slower but far more accurate, and since it’s working in the background if it takes a little longer it doesn’t matter.

ibuys · May 9, 2024, 1:39pm

I’ve switched over to “contains match” and you are all correct, it works much, much better. For some reason I guess I thought that what “contains match” does is what “contains” did the entire time. I didn’t realize it was relying on Spotlight.

Thanks everyone!

tomalmy · October 23, 2024, 3:58pm

Revisiting this, the just-released Hazel version 6 lists this under Core changes:

“Contents contain” should now be more reliable. While it still uses Spotlight, if it fails, it will fall back to using the same mechanism that “Contents contain match” does.

Also of interest, Hazel now appears to have built-in OCR on the fly for text matching.
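To illustrate the fallback idea in the quoted release note, here is a rough Python sketch. Hazel’s internals are not public, so this is not its implementation. The function name and both callables are hypothetical stand-ins: one for a fast index lookup (such as Spotlight), one for a slower scan of the real file contents.

```python
from typing import Callable, Optional


def contains_with_fallback(
    needle: str,
    index_lookup: Callable[[str], Optional[bool]],
    full_scan: Callable[[str], bool],
) -> bool:
    """Try the fast index first; if it does not report a hit, scan the file.

    `index_lookup` may return None when the index has no usable answer.
    Both callables are hypothetical placeholders for illustration only.
    """
    if index_lookup(needle):
        return True
    # The index missed or was unavailable, so check the actual contents.
    return full_scan(needle)


if __name__ == "__main__":
    file_text = "Account Number: 5555555555"
    indexed_text = "Account Number:"  # stale or incomplete index entry

    result = contains_with_fallback(
        "5555555555",
        index_lookup=lambda n: n in indexed_text,
        full_scan=lambda n: n in file_text,
    )
    print(result)
```

The trade-off matches what was said earlier in the thread: the slow path only runs when the fast one fails, so the cost of the full scan is paid only for files the index gets wrong.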

DEVONtech_Jim · October 24, 2024, 8:59pm

Be aware it’s technically OCR (as characters are recognized), but not traditional OCR, which actually creates a proper text layer or generates a document of the text. This is mentioned in the release notes.

tomalmy · October 24, 2024, 11:16pm

Yep, for text matching. It seems to have always done this for JPEGs, but not for PDFs until the new version.