PDF Expert 3 OCR performance (okay, could be better)

Just upgraded to PDF Expert 3. I’ve been using it on iOS and macOS for some time. The addition of OCR is welcome. However, while it works well on clear scanned documents, performance with lower-quality scans is poor compared with ABBYY FineReader.

Take, for instance, this text. It’s from an academic journal, printed in the 1970s and digitised in the 2000s:

FineReader:

Designed for sixth-formers and first-year students, this is Volume I of 'The Making of the Modem World* edited by the Professor of French History at University College, London, The three volumes - on the voyages of discovery, the colonial period and “the end of Europe” - will be of introductory interest to historians, geographers, anthropologists and internationalists especially since their American text-books tend to be littered with allusions to non-European systems.

PDF Expert:

De8i8ned for 8ixth-former8 aiid fir8t-year 8tudents. th16 L8 Volume I of •The Makins of the Modern World. edited by the Profe68or of French Hi8tory at Unlver8lty College, IA)ndon. The

three volume8 on the voya8e8 of di8coveryg the colonial period and l•the end of Europe will be of introductory intere6t to hlatorian8 • seo8rapher8, anthropolo818ts and internationali8t8 e8peclally 8lnce their American text-book8 tend to be littered with allu8ion8 to non-buropean 8y8tell!.
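For anyone who wants to put a number on the gap between the two outputs above, character error rate (CER) is a common rough measure: the edit distance between the OCR output and a known-good transcription, divided by the length of the transcription. Below is a minimal sketch in plain Python. The `reference` string is my own hand-corrected reading of the opening words (an assumption, not ground truth from the journal), and the function names are just illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance relative to reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hand-corrected opening words (assumption: this is what the page says).
reference = "Designed for sixth-formers and first-year students, this is Volume I"

# The same span as produced by each engine in the samples above.
outputs = {
    "FineReader": "Designed for sixth-formers and first-year students, this is Volume I",
    "PDF Expert": "De8i8ned for 8ixth-former8 aiid fir8t-year 8tudents. th16 L8 Volume I",
}

for engine, text in outputs.items():
    print(f"{engine}: CER = {cer(reference, text):.3f}")
```

A short span like this only gives a rough indication; a fair comparison would need a full page or more of hand-checked text.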

I also tried the command line application ocrmypdf (which uses the Tesseract OCR engine) and the result was basically gibberish.

I believe that PDF Expert is using the underlying Apple OCR engine that was introduced to the OS recently. Obviously that wasn’t trained or designed for documents such as the above (and probably won’t be any time soon).

In sum: ABBYY FineReader is still unrivalled for document OCR, which is a shame as version 13.0 of their Mac app (which supports Apple Silicon) killed a huge amount of functionality, including all automation capability. PDF Expert is nice but still a long way from being an ABBYY/Adobe killer. It seems likely that more apps will integrate Apple’s OCR capabilities; however, it seems to be a limited tool at this point.


Interesting. I’ve been very impressed with OCR in PDF Expert 3 so far – especially how fast it is. Out of curiosity, are you using it on an Apple Silicon Mac (seems like yes?), and do you have the OCR set to “Fast” or “Accurate”?

Y’know, I totally didn’t notice that fast/accurate setting (silly me)! Changing it to ‘accurate’ helps a lot, although ABBYY still has the edge. Mostly because PDF Expert OCR doesn’t handle the line breaks properly. Here’s how it comes out:

Designed for sixth-formers and first-year students, this 18

Volume I of ‘The Making of the Modern World’ edited by the Professor of French History at University College, London.

The

three volumes on the voyages of discovery, the colonial period and “the end of Europe” will be of introductory interest to historians, geographers, anthropologists and internationalists

especially since their American text-books tend to be littered with allusions to non-European systems.

So, consider my previous opinion significantly revised 🙂

P.S. I’m on Intel but planning to upgrade soon… A speed test would also be interesting.
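The stray line breaks in the “Accurate” output can be patched up after the fact. Here is a crude sketch, assuming the text has been copied out of the PDF as plain text: it glues a fragment onto the previous one unless the previous one already ends a sentence. The `reflow` name and the sentence-end pattern are my own choices, and the sample is an abridged version of the output above. It will wrongly split a genuine paragraph wherever a line happens to end on a full stop, so treat it as a starting point only.

```python
import re

# A fragment "ends a sentence" if it finishes with . ! or ?,
# optionally followed by closing quotes or brackets.
_SENTENCE_END = re.compile(r"[.!?][\"'\u201d\u2019)\]]*$")


def reflow(text: str) -> str:
    """Join fragments broken mid-sentence; keep breaks after sentence ends."""
    paragraphs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information in this output
        if paragraphs and not _SENTENCE_END.search(paragraphs[-1]):
            paragraphs[-1] += " " + line  # previous fragment was cut mid-sentence
        else:
            paragraphs.append(line)
    return "\n\n".join(paragraphs)


# Abridged from the "Accurate" output above.
sample = (
    "Designed for sixth-formers and first-year students, this 18\n\n"
    "Volume I, edited by the Professor of French History at University College, London.\n\n"
    "The\n\n"
    "three volumes on the voyages of discovery will be of introductory interest to internationalists\n\n"
    "especially since their American text-books tend to be littered with allusions to non-European systems.\n"
)

print(reflow(sample))
```

On the sample this leaves two blocks, one per sentence, instead of five fragments. It does nothing about misreadings such as “18” for “is”.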


Personally, I have yet to find anything better than ABBYY, which is unfortunate because I don’t love the interface and it’s expensive. Interestingly, the engine is built into DEVONthink, but with no real options to adjust it. I generally use the defaults anyway, so this has been totally fine for my use, and has largely replaced using the stand-alone ABBYY.

As an experiment I tried the OCR option in the CleanShot app. Here is the result. I have no idea what OCR engine they use.

Designed for sixth-formers and first-year students, this 18
Volume I of *The Making of the Modern World’ edited by the
Professor of French History at University College, London. The
three volumes on the voyages of discovery, the colonial period
and “the end of Europe” will be of introductory interest to
historians, geographers, anthropologists and internationalists
especially since their American text-books tend to be littered
with allusions to non-European systems.

The killer benefit of PDF Expert 3 OCR for me so far is the speed. It seems to use all cores of Apple Silicon Macs, and whatever magic ML is built into those chips. ABBYY and Acrobat both seem to be single-threaded and quite slow. When I’ve tested longer documents (e.g., 500+ pages), the speed difference is astounding.

@prc, you’re a godsend! Thanks also for posting an example.

You might be amused to know that I googled “PDF expert ocr accuracy comparison,” to which this post from two days ago was a top hit. This compelled me to make an account here 🙂.

I’m in the middle of OCRing 90,000+ pages of a history journal that launched in the 1840s, and have been using the ABBYY FineReader API that is integrated into DEVONthink.

I did, however, just load PDF Expert 3. Note: perpetual license holders for PDF Expert 2 can transfer their license to the new v.3 without issue (I emailed Readdle to confirm this was the case).

About two years ago, I did a fairly involved comparison of OCR results between ABBYY, Adobe, Tesseract and Google Drive, using images from a book published around 1900 (English, standard serif typeface as source). ABBYY outperformed the others by a landslide.
