Hazel Rule Issue

OogieM · November 4, 2020, 2:50pm

So I have this Hazel rule to rename and move a downloaded electric statement.

The filename is 2020_11_2_9804847301.pdf

And the first date in the contents is

[Screenshot: Screen Shot 2020-11-04 at 7.54.16 AM]

But somehow the rename always uses the current date, the day I downloaded the file, not the date found in the document. I tried the preview, and all it says is that the rule matches; I can’t see where it’s picking up today’s date under the latest Hazel version.

Any suggestions as to where to look or how to fix this?

memex · November 4, 2020, 3:22pm

Is the text layer / OCR of the statement OK?

Some time ago my bank somehow managed to send me PDF monthly statements with the text layer all messed up: if I tried to select text, some weird region of the page got highlighted, usually a white one with no text at all, and copy/pasting resulted in some random and partial text…

Long story short, my Hazel rules similar to yours stopped working (they selected the third or fourth date, because they couldn’t read the first one) and never started working again.

ChrisUpchurch · November 4, 2020, 3:41pm

Is the date that it is matching and renaming the file with somewhere in the document?

One thing I’ve found is that the “1st” date in a PDF document isn’t always what a person reading the document would think is first. My assumption is that Hazel is looking at the first date in the order of the text in the file on disk, not what appears closest to the top as the PDF is laid out on screen in a PDF reader.

WayneG · November 4, 2020, 4:33pm

I had a similar problem and found it by copying the text layer of the PDF and pasting it into BBEdit.

In my case the info I wanted wasn’t legible in the text layer.

r2d2 · November 4, 2020, 6:07pm

Does anybody know how to fix the text layer, if somebody wanted to do it manually?

rms · November 4, 2020, 6:07pm

Try reprinting to create a new PDF then do a new OCR.

r2d2 · November 4, 2020, 6:11pm

I check the preview to see what date it’s picking up and, like people have said, look at the text layer.

JKoopmans · November 4, 2020, 10:22pm

If you open the statement and do a search for the date: does it find it where you expect it to be?

nlippman · November 5, 2020, 1:08am

I have had the same problem with some PDFs, especially ones that have been scanned vs downloaded (although I am not sure why that part makes a difference).

If you double click the rule to edit it, you can see a button on the top right called “preview” which lets you see how the rule will match a selected file. You will get a dialog to select the file that you want the rule to parse. Next to each rule criterion there will appear a red X or green checkmark indicating if that rule was matched or not. If you click the X or check you get a pop-up showing you details of the match, and there is an icon with three dots which you can click to see the entire match data.

Sometimes scanning through that will show you what the actual data that Hazel is seeing will look like, and you might be able to figure out how to modify your rule accordingly.

I have found this helps quite a bit, but there are still a few files that I just haven’t been able to make Hazel figure out. Sometimes, as others have noted, having an OCR’d text layer helps. Sometimes it doesn’t.

I have about 20 rules in my downloads folder that automagically rename scans and downloads, but I have another couple that I am still working on and haven’t been able to solve (yet).

OogieM · November 5, 2020, 1:38pm

Not anywhere that I can see. When I check in Preview, I see the date it’s supposed to find first, and I don’t see the date it is finding at all.

Yes

JKoopmans · November 5, 2020, 2:19pm

You have automatic date format set, could that be the issue?
Does it know what date format is parsed?

icolomby · November 6, 2020, 3:25am

What I find helps when I’m having issues matching PDF content is to use hazelimporter from the Terminal. This is the tool that Hazel uses to get the PDF’s text. It’s part of the Hazel application bundle.

This command will dump the PDF text out to the terminal:
~/Library/PreferencePanes/Hazel.prefPane/Contents/MacOS/hazelimporter [PDF Filename]

Alternatively, this command will save the PDF text to a file:
~/Library/PreferencePanes/Hazel.prefPane/Contents/MacOS/hazelimporter [PDF Filename] > [Text Filename]

I find the 2nd command more useful, as I can then open the text file in a text editor and search for the pieces of text I want to use in my Hazel rules.
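Once the text is saved to a file, a quick way to see which date-like strings come first is to list them in file order. This is only a rough sketch: `list_dates` and `statement.txt` are hypothetical names, the pattern covers numeric dates only (such as 11/02/2020 or 2020-11-02), and Hazel's own date detection may recognize other forms.

```shell
# list_dates: print every numeric date-like string in a text file,
# prefixed with its line number, in the order it occurs in the file.
# Assumption: dates look like 11/02/2020, 11-2-20, or 2020-11-02.
list_dates() {
    grep -noE '[0-9]{1,4}[/-][0-9]{1,2}[/-][0-9]{1,4}' "$1"
}

# Example (statement.txt is a hypothetical name for the saved text):
# list_dates statement.txt | head -n 5
```

If the date you expect is not near the top of that list, the text order in the file probably differs from the visual layout of the page.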

Ian

r2d2 · November 6, 2020, 10:19pm

That’s really cool. I typically open the PDF and then select all and copy

OogieM · November 7, 2020, 3:04pm

So far I keep getting “command not found” no matter which version of that idea I try. I have verified that I am in the top-level Library PreferencePanes folder, and I put the full pathnames for both the input and output files.

It’s probably something I’m doing wrong in Terminal.

Dealing with urgent farm work, I will revisit this as soon as I handle the stuff that has to be done before it starts snowing later today.

nlippman · November 8, 2020, 2:10am

@OogieM: It would be helpful if you posted the Terminal command you are using and the output.

However, if you have cd’d to ~/Library/PreferencePanes… but the Hazel command is not found there, it could be because you installed Hazel for all users of your Mac and not just for yourself. In that case the needed app, hazelimporter, and the Hazel preference pane itself will be located in the system library folder at /Library (no initial ~).
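A rough sketch for checking both locations from Terminal. The function name `find_hazelimporter` is made up for illustration, and the /Library path is inferred from the per-user path earlier in the thread, so treat both as assumptions.

```shell
# find_hazelimporter: print the first candidate path that is an executable file.
# With no arguments it checks the per-user location from this thread and the
# assumed all-users location under /Library.
find_hazelimporter() {
    if [ "$#" -eq 0 ]; then
        set -- \
            "$HOME/Library/PreferencePanes/Hazel.prefPane/Contents/MacOS/hazelimporter" \
            "/Library/PreferencePanes/Hazel.prefPane/Contents/MacOS/hazelimporter"
    fi
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "hazelimporter not found in the checked locations" >&2
    return 1
}

# Example: run the tool by its full path once it is found.
# "$(find_hazelimporter)" statement.pdf > statement.txt
```

Also note that after cd'ing into the folder, typing just hazelimporter can still give "command not found", because the shell does not search the current directory by default. Use ./hazelimporter or the full path instead.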

OogieM · November 8, 2020, 4:11am

Let me go try to see. I know I installed Hazel for all users, but I thought I had tried it without the ~ and got the same error.

icolomby · November 8, 2020, 5:49pm

I have Hazel 4.4.5 installed. If you have a different version, find the Hazel.prefPane in Finder, right-click on it, and choose Show Package Contents. See if the hazelimporter program is in the Contents/MacOS folder.

OogieM · November 8, 2020, 11:11pm

OK, got the commands to work and found the date in the file. It is the first date, and I did have automatic date detection set up in my rule. In preview mode the underlined yellow section (which I presume is what Hazel thinks is the date) is actually a time from the office’s opening hours. It’s not a date at all. I’m playing with making it a later occurrence to see if that helps.

JKoopmans · November 9, 2020, 9:30pm

That is awesome! Thanks for this one