Understanding Hazel

joemcgrath · October 24, 2018, 4:35am

I’m considering purchasing and using Hazel to help automate my paperless files. I’m confused about the rules you set in Hazel. I understand it is a process as the software is powerful. What concerns me is that if I scan using OCR, am I using Hazel to read OCR to take action, or do I rename the file somewhat myself and then let Hazel do further modifications and move the document to the appropriate folder?

dfay · October 24, 2018, 4:45am

Either way works. I typically have Hazel rename files but include the existing file name in the new name. So if I download a file and the name is fjfirjsbsjekr3453.pdf I’ll rename it manually but if it’s something human readable I won’t.

Re OCR - Hazel won’t do the OCR itself - it will call whatever third-party OCR software you’re using, once you set up rules to do so of course.

rlamarch · October 24, 2018, 3:23pm

When I started my paperless workflows, my main computer at home was a PC. I never found a program that would watch a directory and have the ability to read the OCR text. So I built text expansion rules for each type of document and used them to rename my files as they were scanned. The PC software keyed on the file names to move the files to the appropriate directory.

Today my main computer is an iMac and I have Hazel rules that look at the OCRed text, then rename and move the file to the appropriate directory. The OCR isn’t 100% accurate 100% of the time, so sometimes Hazel doesn’t pick up on the file. My text expansion rules carried over from the PC, so I use TextExpander to name the files and then manually file those documents from my action folder.

tomalmy · October 24, 2018, 7:01pm

My rules OCR the document if it isn’t already OCRed and then move, rename, and tag the document based on its contents. It’s actually pretty straightforward but ends up with many rules because of all the various documents that are ingested. 30 rules for downloaded files, 42 rules for scanned files, 3 to keep my desktop clean, and 4 more for special purposes, so 79 total.

Jonathan_Davis · October 24, 2018, 11:02pm

Do you have a list of the rules or where you were inspired to make all of them? That sounds like an awesome system!

Bill_Aus · October 25, 2018, 12:10am

I have some time today, so I’m going to try this and see how much automation I can get out of Hazel.

Noerah · October 25, 2018, 12:59am

How do you get Hazel to check if something has already been OCR’d?

Woteva · October 25, 2018, 9:52am

There are a number of different ways that you can check the OCR status:

Contents do not contain a,e,i,o,u
Content Creator does not contain ScanSnap (or similar)
File passes this shell script

if [ "$(grep -ci "Font" "$1")" -gt 0 ]; then
    echo 1
else
    echo 0
fi
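
For anyone adapting the script above: here is a rough sketch of the same "Font" check that reports its result through the exit status instead of printing 1 or 0. It assumes Hazel's shell script condition treats an exit status of 0 as a pass, and it assumes you want the condition to match files that still need OCR (flip the two return values for the opposite). The function name needs_ocr is made up for illustration, and the "Font" test is only a heuristic, since a PDF with compressed contents may hide the string.

```shell
#!/bin/sh
# Sketch only: succeed (status 0) when the PDF appears to lack a text layer.
# Assumption: an OCRed PDF embeds at least one font, so the string "Font"
# appears somewhere in the raw file, while an image-only scan has none.
needs_ocr() {
    # grep -q prints nothing and exits 0 as soon as it finds a match;
    # -i makes the match case-insensitive, as in the original script.
    if grep -qi "Font" "$1"; then
        return 1    # font reference found: probably already OCRed
    else
        return 0    # no font reference: probably still needs OCR
    fi
}
```

Inside a Hazel embedded script you would end with a call such as needs_ocr "$1", so that the function's status becomes the script's exit status.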

Noerah · October 25, 2018, 1:57pm

Thanks @Woteva. Do you also use PDFpen Pro for OCRing through Hazel?

tomalmy · October 25, 2018, 11:10pm

I use this as the last rule (I don’t check for OCRing until all the other rules have failed):
[Screenshot of the Hazel rule]
The AppleScript does the OCR using PDFpenPro and is:

tell application "PDFpenPro"
	open theFile as alias
	tell document 1
		ocr
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
	tell application "PDFpenPro"
		quit
	end tell
end tell

This is based on a Hazel rule I got from Katie Floyd’s website.

Woteva · October 25, 2018, 11:35pm

I used to use PDFpenPro primarily just for OCR, but I wanted to reduce the software on my system. Now I either OCR via ScanSnap Home when scanning documents or, if I obtain a document from another source that requires OCRing, I use the ABBYY FineReader for ScanSnap software that came with my iX500 ScanSnap scanner.

The ABBYY software is restricted to working only with documents that were scanned with the ScanSnap, so my routine is: I determine whether the document needs to be OCRed, then run an Automator routine to set the Content Creator to ScanSnap, then an AppleScript to process the document.

[Screenshots not preserved]

Noerah · October 26, 2018, 12:27am

Sneaky!! I love it. I was trying to avoid spending $100+ on PDFpen Pro.

Noerah · October 26, 2018, 12:29am

Thanks @tomalmy! I wish PDFpen Pro was less expensive. I tried the script with a free trial of PDFpen (not Pro) but it doesn’t work.

Woteva · October 26, 2018, 1:25am

Apart from reducing software bloat on my iMac, saving money by using the supplied ScanSnap software was another goal.

I had been stuck on the annual upgrade cycle (more akin to a subscription) with PDFpen Pro, but my current routine breaks that shackle. I also especially loathed PDFpen’s interface, which I found most antiquated and un-Mac-like. Now if I need to edit a PDF I just use either Preview or Readdle’s PDF Expert.

dfay · October 26, 2018, 2:10am

OCRKit is US $40 and works great for me in similar scripts.

Noerah · October 27, 2018, 2:27pm

@Woteva I tried your AppleScript, and it works great, but after ABBYY is done with the OCR the program gives me a dialog box asking me to click Save. Is there a way to bypass that last manual action?

joshsullivan · October 28, 2018, 10:51am

tell application "ABBYY FineReader for ScanSnap"
	recognize theFile and export to theFile as pdf with silent mode
end tell

That will OCR the document and save it with its own name.

Noerah · October 28, 2018, 12:30pm

Thanks so much @joshsullivan! This works perfectly.

rebornrock · October 29, 2018, 1:21am

My knowledge of AppleScript is non-existent. I copied the script that uses ABBYY to OCR the file but it’s returning an error: “The variable theFile is not defined”.
I put it in a Quick Action (macOS Mojave) and tried to run it from the Finder.

Woteva · October 29, 2018, 3:12am

theFile is the file reference used by Hazel. Create a quick action to move the file to a Hazel monitored folder and run the AppleScript via Hazel.