Understanding Hazel

Either way works. I typically have Hazel rename files but include the existing file name in the new name. So if I download a file and the name is fjfirjsbsjekr3453.pdf I’ll rename it manually but if it’s something human readable I won’t.

Re OCR - Hazel won’t do the OCR - it will call whatever third party OCR software you’re using, once you set up rules to do so of course.

When I started my paperless workflows, my main computer at home was a PC. I never found a program that would watch a directory and have the ability to read the OCR text. So I built text expansion rules for each type of document and used them to rename my files as they were scanned. The PC software keyed on the file names to move the files to the appropriate directory.
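For anyone wanting to try the same file-name approach, here is a minimal shell sketch of it rather than the PC software described above. The keywords, the folder names, and the function names `destination_for` and `file_by_name` are all made up for illustration; swap in whatever your text expansion snippets actually produce.

```shell
# Hypothetical sketch: pick a destination folder from keywords in a file name.
# The keywords and folder names below are made-up examples.
destination_for() {
    # Lowercase the name so matching is case-insensitive.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *invoice*)   echo "Finances/Invoices" ;;
        *statement*) echo "Finances/Statements" ;;
        *receipt*)   echo "Finances/Receipts" ;;
        *)           echo "Action" ;;   # no keyword matched: leave for manual filing
    esac
}

# Move one file into the folder chosen from its name, under a given root.
file_by_name() {
    root=$1
    file=$2
    dest="$root/$(destination_for "$(basename "$file")")"
    mkdir -p "$dest" && mv "$file" "$dest/"
}
```

Calling `file_by_name ~/Documents ~/Scans/"ACME Invoice 42.pdf"` would move that file into `~/Documents/Finances/Invoices`, and anything without a known keyword ends up in an `Action` folder for manual filing.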

Today my main computer is an iMac and I have Hazel rules that look at the OCRed text, then rename and move the file to the appropriate directory. The OCR isn’t 100% accurate 100% of the time, so sometimes Hazel doesn’t pick up on the file. My text expansion rules carried over from the PC, so I use TextExpander to name the files and then manually file those documents from my action folder.

My rules OCR the document if it isn’t already OCRed and then move, rename, and tag the document based on its contents. It’s actually pretty straightforward but ends up with many rules because of all the various documents that are ingested. 30 rules for downloaded files, 42 rules for scanned files, 3 to keep my desktop clean, and 4 more for special purposes, so 79 total.

Do you have a list of the rules or where you were inspired to make all of them? That sounds like an awesome system!

I have some time today, I’m going to try this and see how much automation I can get out of Hazel. :slight_smile:

How do you get Hazel to check if something has already been OCR’d?

There are a number of different ways you can check the OCR status:

  1. Contents do not contain a,e,i,o,u
  2. Content Creator does not contain ScanSnap (or similar)
  3. File passes this shell script
if [ $(grep -ci "Font" "$1") -gt 0 ]; then 
     echo 1; 
else 
     echo 0; 
fi
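A variation on option 3, as a sketch only: as far as I know, Hazel's "Passes shell script" condition goes by the script's exit status (0 means the file matches) rather than by what it prints, so it may be safer to signal the result that way. The function name `has_text_layer` is made up here. The check itself is the same heuristic as above (a PDF with a text layer normally references a font), and it can miss PDFs whose objects are stored compressed.

```shell
# Hypothetical helper: succeed (exit status 0) if the file mentions a font,
# which usually means the PDF already has a text layer.
has_text_layer() {
    # LC_ALL=C makes grep treat the PDF as plain bytes.
    # -q prints nothing and returns 0 on the first match, 1 on no match.
    # -i ignores case, as in the original script.
    LC_ALL=C grep -qi "Font" "$1"
}
```

In an embedded Hazel script the last line would be `has_text_layer "$1"`, so the script's exit status is the result of the check. Use "passes" to match files that already look OCRed, or "does not pass" to catch the ones that still need it.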

Thanks @Woteva. Do you also use PdfPen Pro for OCRing through Hazel?

I use this as the last rule (I don’t check for OCRing until all the other rules have failed):
[Screenshot of the Hazel rule]
The AppleScript does the OCR using PDFpenPro and is:

tell application "PDFpenPro"
	open theFile as alias
	tell document 1
		ocr
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
	tell application "PDFpenPro"
		quit
	end tell
end tell

This is based on a Hazel rule I got from Katie Floyd’s website.


I used to use PDFpenPro primarily just for OCR, but I wanted to reduce the software on my system. Now I either OCR via ScanSnap Home when scanning documents or, if I obtain a document from another source that requires OCRing, I use the ABBYY FineReader for ScanSnap software that came with my iX500 ScanSnap scanner.

The ABBYY software is restricted to working only with documents that were scanned with the ScanSnap, so my routine is: I determine whether the document needs to be OCRed, then run an Automator routine to set the Content Creator to ScanSnap, then run an AppleScript to process the document.

[Two screenshots not preserved]

Sneaky!! I love it. I was trying to avoid spending $100+ on PdfPen Pro.

Thanks @tomalmy! I wish pdfPen Pro was less expensive. I tried the script with a free trial of pdfPen (not pro) but it doesn’t work.

Apart from reducing software bloat on my iMac, saving money by using the supplied ScanSnap software was another goal.

I had been stuck on the annual upgrade cycle (more akin to a subscription) with PDFpen Pro, but my current routine breaks that shackle. I also especially loathed PDFpen’s interface, which I found most antiquated and un-Mac-like. Now if I need to edit a PDF I just use either Preview or Readdle’s PDF Expert.

OCRKit is US $40 and works great for me in similar scripts.

@Woteva I tried your AppleScript, and it works great, but after ABBYY is done with the OCR the program gives me a dialog box asking me to click Save. Is there a way to bypass that last manual action?

tell application "ABBYY FineReader for ScanSnap"
	recognize theFile and export to theFile as pdf with silent mode
end tell

That will OCR the document and save it with its own name.


Thanks so much @joshsullivan! This works perfectly.

My knowledge of AppleScript is non-existent. I copied the script that uses ABBYY to OCR the file but it’s returning an error: “The variable theFile is not defined”.
I put it in a quick action (macOS Mojave) and tried to run it from the Finder.

theFile is the file reference used by Hazel. Create a quick action to move the file to a Hazel monitored folder and run the AppleScript via Hazel.

I think that’s what I’m doing wrong then. I’m trying to run it as an Automator workflow via quick actions on the Finder.
How can I fix the script to run it manually from the Finder instead of going through Hazel?