Either way works. I typically have Hazel rename files but include the existing file name in the new name. So if I download a file and the name is fjfirjsbsjekr3453.pdf, I’ll rename it manually, but if it’s something human-readable I won’t.
Re OCR: Hazel won’t do the OCR itself. It will call whatever third-party OCR software you’re using, once you set up rules to do so, of course.
When I started my paperless workflows, my main computer at home was a PC. I never found a program that could watch a directory and read the OCR text. So I built text expansion rules for each type of document and used them to rename my files as they were scanned. The PC software keyed on the file names to move the files to the appropriate directory.
Today my main computer is an iMac and I have Hazel rules that look at the OCRed text, then rename and move the file to the appropriate directory. The OCR isn’t 100% accurate 100% of the time, so sometimes Hazel doesn’t pick up on the file. My text expansion rules carried over from the PC, so I use TextExpander to name those files and then manually file the documents from my action folder.
My rules OCR the document if it isn’t already OCRed and then move, rename, and tag the document based on its contents. It’s actually pretty straightforward but ends up with many rules because of all the various documents that are ingested: 30 rules for downloaded files, 42 rules for scanned files, 3 to keep my desktop clean, and 4 more for special purposes, so 79 total.
I use this as the last rule (I don’t check for OCRing until all the other rules have failed):
The AppleScript does the OCR using PDFpenPro and is:
tell application "PDFpenPro"
    -- theFile is supplied by Hazel when this runs as part of a rule
    open theFile as alias
    tell document 1
        ocr
        -- Poll until PDFpenPro reports that OCR has finished
        repeat while performing ocr
            delay 1
        end repeat
        delay 1
        close with saving
    end tell
    quit
end tell
This is based on a Hazel rule I got from Katie Floyd’s website.
I used to use PDFpenPro primarily just for OCR, but I wanted to reduce the software on my system. Now I either OCR via ScanSnap Home when scanning documents or, if I obtain a document from another source that requires OCRing, I use the ABBYY FineReader for ScanSnap software that came with my iX500 ScanSnap scanner.
The ABBYY software only works with documents that were scanned with the ScanSnap, so my routine is: I determine whether the document needs to be OCRed, run an Automator routine to set the Content Creator to ScanSnap, and then run an AppleScript to process the document.
Apart from reducing software bloat on my iMac, saving money by using the supplied ScanSnap software was another goal.
I had been stuck on the annual upgrade cycle (more akin to a subscription) with PDFpenPro, but my current routine breaks that shackle. I also especially loathed PDFpen’s interface, which I found most antiquated and un-Mac-like. Now if I need to edit a PDF I just use either Preview or Readdle’s PDF Expert.
@Woteva I tried your AppleScript, and it works great, but after ABBYY is done with the OCR the program gives me a dialog box asking me to click Save. Is there a way to bypass that last manual action?
My knowledge of AppleScript is non-existent. I copied the script that uses ABBYY to OCR the file, but it’s returning an error: “The variable theFile is not defined.”
I put it in a Quick Action (macOS Mojave) and tried to run it from the Finder.
I think that’s what I’m doing wrong then. I’m trying to run it as an Automator workflow via quick actions on the Finder.
How can I fix the script so I can run it manually in the Finder instead of through Hazel?
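A possible starting point, offered as an untested sketch rather than a confirmed fix: the "theFile is not defined" error is what you would expect outside Hazel, because Hazel sets theFile for the script itself, whereas an Automator "Run AppleScript" action hands the selected files to the script as the input list of its run handler. The sketch below assumes the Quick Action is set to receive PDF files in Finder, and it wraps the PDFpenPro script from earlier in this thread. The same wrapper idea should apply to the ABBYY script, but that is an assumption, since that script's body isn't shown here.

```applescript
-- Paste into a "Run AppleScript" action inside an Automator Quick Action.
-- Assumption: the workflow is configured to receive PDF files in Finder.
on run {input, parameters}
	-- input is the list of files selected in the Finder
	repeat with selectedItem in input
		-- Define theFile ourselves, since Hazel is not here to do it
		set theFile to selectedItem as alias
		tell application "PDFpenPro"
			open theFile
			tell document 1
				ocr
				-- Poll until PDFpenPro reports that OCR has finished
				repeat while performing ocr
					delay 1
				end repeat
				delay 1
				close with saving
			end tell
		end tell
	end repeat
	tell application "PDFpenPro" to quit
	-- Pass the files along in case another action follows
	return input
end run
```

To try it, select one or more PDFs in the Finder and pick the Quick Action from the right-click menu. The loop is there because the Finder can pass several files at once, where Hazel passes a single file per rule match.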