Hazel OCR AppleScript for ABBYY FineReader PDF

NotAClue72 · August 31, 2023, 4:22pm

I used to have an excellent Hazel workflow using PDFpen Pro and an AppleScript (I think it originally came from Katie Floyd?) that would automatically OCR PDFs added to a folder. PDFpen Pro has not been working the same since it was taken over by Nitro, so I’m looking for an alternative. I’ve settled on ABBYY FineReader PDF, but I can’t for the life of me work out how to create an AppleScript that Hazel can run to OCR PDFs automatically with ABBYY FineReader PDF. I have tried scripts I’ve found on the Internet, but Hazel keeps giving errors, and as I have only very basic coding skills I’m at a loss.
Here’s my current iteration, which just isn’t working. Could someone please tell me where I’m going wrong? Thank you!

tell application "System Events"
	tell disk item (theFile as text)
		set {theName, theExtension} to {name, name extension}
		if theExtension is not "" then set theName to text 1 thru -((count theExtension) + 2) of theName -- the name part
	end tell
	tell application "Finder"
		set hazelPath to (container of alias (theFile as string)) as text
		set pdfPath to hazelPath & "(OCR) " & theName & ".pdf"
	end tell
	tell application "ABBYY FineReader PDF"
		repeat while is busy
			delay 1
		end repeat
		export to pdf from file
		repeat while is busy
			delay 1
		end repeat
	end tell
end tell

KVZ · August 31, 2023, 5:53pm

Have you run this through Script Editor (or Script Debugger)? I have, and numerous errors are reported. It would probably be more efficient, and a better learning experience, to run it through Script Editor yourself and fix the errors it reports.

Katie

DannyR · August 31, 2023, 6:21pm

If you get it working, please post your solution. I tried to find something like that but failed. Good luck!

NotAClue72 · August 31, 2023, 7:42pm

Thanks for your replies. I went back to basics and found a version of the original Katie Floyd AppleScript posted by Rosemary Orchard, and I’ve got it working so that AppleScript will now open the PDF in ABBYY FineReader and run OCR on it. What I can’t get it to do is automatically save the OCR version of the PDF and close ABBYY FineReader; I have to do that part manually. That isn’t the end of the world, but in an ideal world it would just open, run, save the OCR PDF and close like it used to with PDFpen Pro.

tell application "ABBYY FineReader PDF"
	open theFile
	tell document 1
		ocr
		delay 1
		export to pdf from file theFile
		WaitUntilDone()
		close with saving
	end tell
	quit
end tell

NotAClue72 · August 31, 2023, 8:03pm

Actually - this script doesn’t work - if there are multiple PDF files it adds them all at once to ABBYY FineReader to make one enormous PDF.
I’ve seen there is a ‘hot folder’ for ABBYY FineReader but I can’t find that option - I’ve got the ABBYY FineReader Premium subscription through the App Store.

Percussor · September 1, 2023, 6:00am

Might also be worth looking at this approach which works well for me

vco1 · September 2, 2023, 7:10am

This script is working for me. It’s slightly longer (more complex) than the other examples, which may explain the difference. Some settings (e.g. langList) may need changing for your particular use case.

on hazelProcessFile(theFile, inputAttributes)
	
	using terms from application "FineReader"
		set langList to {Dutch, English}
		set saveType to single file
		set keepPageNumberHeadersAndFootersBoolean to yes
		set pageSizePageSizeEnum to automatic
		set keepPicturesBoolean to yes
		set imageOptionsImageQualityEnum to balanced quality
		set keepTextAndBackgroundColorsBoolean to yes
		set makePDFABoolean to yes
	end using terms from
	
	WaitWhileBusy()
	
	tell application "FineReader"
		export to pdf theFile ¬
			from file theFile ¬
			ocr languages enum langList ¬
			page size pageSizePageSizeEnum ¬
			saving type saveType ¬
			keep page numbers headers and footers keepPageNumberHeadersAndFootersBoolean ¬
			keep pictures keepPicturesBoolean ¬
			image quality imageOptionsImageQualityEnum ¬
			keep text and background colors keepTextAndBackgroundColorsBoolean ¬
			make pdfa makePDFABoolean
	end tell
	
	WaitWhileBusy()
	
	tell application "FineReader"
		quit
	end tell
	
end hazelProcessFile

on IsMainApplicationBusy()
	tell application "FineReader"
		set resultBoolean to is busy
	end tell
	return resultBoolean
end IsMainApplicationBusy

on WaitWhileBusy()
	repeat while IsMainApplicationBusy()
		delay 1 -- pause between polls instead of spinning the CPU
	end repeat
end WaitWhileBusy

NotAClue72 · September 2, 2023, 10:50am

Thank you so much! This is exactly what I’m looking for. I just have a couple of problems - Hazel doesn’t like the script starting with ‘on’ (it reports an error saying “Expected “end” but found “on””).

And both ScriptEditor and Hazel don’t like the word ‘file’ in
set SaveType to single file
They both say “Expected end of line, etc. but found class name”.

I’m using Hazel v.5.1.1 on Ventura 13.5.1, if that’s any help?

vco1 · September 2, 2023, 12:30pm

Are you sure you have the ABBYY version that still supports AppleScript?

Not the cause of the issue, but still noteworthy: Hazel 5.2.2 is currently the latest version.

NotAClue72 · September 2, 2023, 12:46pm

Ah maybe that’s the problem - the version of ABBYY. Which version of ABBYY is the one I should be using? I’ve only just downloaded it from the App Store so I’ve probably got a newer version…

vco1 · September 2, 2023, 12:56pm

AFAIK the last version to support AppleScript was 12.x.x
I am (or rather was) using version 12.1.14.

ABBYY said for a long time that they would add AppleScript support again. But unfortunately FineReader for Mac is basically abandonware (apart from some very minor updates, none of them related to automation).

Even though their OCR engine is the best one out there, I moved over to OCRmyPDF (based on Tesseract).

NotAClue72 · September 2, 2023, 1:55pm

Interesting - I might do the same, just go to OCRmyPDF. I’m by no means very experienced in scripting and coding so the whole Brew aspect is a bit daunting but I’ll make it a project to figure it out!

vco1 · September 4, 2023, 7:29am

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

and

brew install ocrmypdf

That’s basically all there is to do.
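
To tie this back to Hazel, here is a minimal sketch of what an embedded "Run shell script" action could look like once ocrmypdf is installed. It is an illustration under assumptions, not a tested drop-in: it assumes Hazel hands the matched file's path to the script as $1, that Homebrew lives in one of its two usual locations, and that English is the OCR language. The function names ocr_output_path and ocr_one_file are made up for this example, and the "(OCR) " prefix simply mirrors the naming used in the first script in this thread.

```shell
#!/bin/bash
# Sketch of a Hazel embedded shell script (assumption: Hazel passes the
# matched file's path as the first argument, $1).

# Hazel runs scripts with a minimal PATH, so add the usual Homebrew
# locations (Apple Silicon first, then Intel).
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

# Hypothetical helper: "/dir/scan.pdf" -> "/dir/(OCR) scan.pdf"
ocr_output_path() {
    local input="$1"
    local dir name
    dir="$(dirname "$input")"
    name="$(basename "$input")"
    printf '%s/(OCR) %s\n' "$dir" "$name"
}

# Hypothetical helper: OCR one PDF into a new file next to the original.
ocr_one_file() {
    local input="$1"
    local output
    output="$(ocr_output_path "$input")"
    # --skip-text : leave pages that already have a text layer alone
    # -l eng      : OCR language (assumption: English)
    ocrmypdf --skip-text -l eng "$input" "$output"
}

# Only do work when a file path was actually passed in.
if [ -n "${1:-}" ]; then
    ocr_one_file "$1"
fi
```

Because the output lands in the same folder, the Hazel rule would need a condition that skips names starting with "(OCR) ", otherwise it may pick up its own output again. Since each file is handled in its own run, this also sidesteps the problem of several PDFs being merged into one.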

rlivingston · September 5, 2023, 4:44am

Why do you say that? I am curious just because I am very dependent on their OCR engine for one of my projects.

vco1 · September 5, 2023, 6:12am

Just look at the facts. There have been a few updates, but all are minor. They don’t listen to user requests. They removed basically all automation options, which are quite essential for an OCR engine.
And there is still no Apple Silicon support. The reply on their website is bonkers:

Starting from [Release 1 Update 2] FineReader PDF for Mac can work properly on computers powered by the Apple M1 chips. It runs using Rosetta 2 technology.

That comment is from about 2 years ago. And as far as I know there’s still no ARM version of FineReader available.

Applegeek · September 5, 2023, 6:39am

I’m going to throw out a recommendation for OwlOCR on the Mac App Store. I’ve found it to be one of the fastest and most reliable ways to OCR stuff using Hazel.

rlivingston · September 6, 2023, 2:46am

Thanks.

Sad.

I wish companies were more straightforward about abandonware. I tend to support subscription apps in that they are less likely to stop communicating with users and disappear into the fog without any real explanation.