Easy-but-solid one-off OCR ...?

I have a scanned document that was saved as a PDF. It’s readable to me, but Preview’s “automatic text selection” functionality doesn’t seem to work.

The text is basically an Excel spreadsheet. Is there a simple-but-reliable piece of OCR software that could convert this? I just need this as a one-off, and I don’t have Acrobat Pro.

Thoughts?

What happens if you open the PDF in Safari?

Doesn’t let me highlight the text.

Never tried it, but isn’t that something you can do with a shortcut on iOS?
Or what about downloading one of those scanner apps from the App Store, like “Scanner Pro”, and trying it?

I wouldn’t be without Jeff Johnson’s Stop the Madness browser extension which often fixes the problem of not being able to select text. But I’m not sure it works with PDF. Especially not scanned PDF.

I just tried selecting text in a number of apps from a PDF I scanned using my iPhone in continuity camera mode and I was not able to select text.

Does the method you used to scan the document mention anything about also OCR-ing it? Somewhere in your process the graphic image has to be converted to text. Didn’t that feature (with limitations) get added to macOS? Although I’m still on Monterey.

You can spin up OCRmyPDF in a Docker container and run it as a command-line front end for the Tesseract OCR engine. I believe there is an older thread in the Automators forum that had step-by-step directions, if you are not familiar with it.
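For anyone who wants to try the Docker route, here is a rough sketch of what the invocation could look like, wrapped in a small Python helper that only builds the shell command. The image name `jbarlow83/ocrmypdf`, the `-l` and `--deskew` options, and the `- -` stdin/stdout convention are what I remember from the OCRmyPDF documentation, so treat them as assumptions and check them against the current docs. The helper name `ocrmypdf_docker_command` and the file names are made up for illustration.

```python
import shlex


def ocrmypdf_docker_command(input_pdf, output_pdf, language="eng",
                            image="jbarlow83/ocrmypdf"):
    """Build a shell command that pipes a PDF through OCRmyPDF in Docker.

    Assumptions (verify against the OCRmyPDF docs): the image name,
    the -l and --deskew options, and "- -" meaning "read the PDF from
    stdin, write the result to stdout".
    """
    args = [
        "docker", "run", "--rm", "-i", image,
        "-l", language,   # Tesseract language code
        "--deskew",       # straighten slightly rotated scans
        "-", "-",         # stdin -> stdout
    ]
    quoted = " ".join(shlex.quote(a) for a in args)
    # Shell redirection feeds the scan in and captures the OCR'd PDF.
    return f"{quoted} < {shlex.quote(input_pdf)} > {shlex.quote(output_pdf)}"


if __name__ == "__main__":
    # Hypothetical file names; paste the printed command into a terminal.
    print(ocrmypdf_docker_command("scan.pdf", "scan-ocr.pdf"))
```

Note that this only adds a selectable text layer to the PDF. It does not rebuild the table as a spreadsheet, so getting the data into Excel would still be a separate step.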

I didn’t scan it, and I can’t ask for it to be re-scanned - that’s part of the problem. It’s also a little bit low-resolution (not unreadable).

This might be worth looking into. Thanks!

Would Textsniper do the job?

Here’s a link to a PDF software comparison spreadsheet that someone did. I don’t know the last date it was updated:

Owl OCR looks like what you want, and the price is great: it is on Black Friday sale for $7.

Edit: But I don’t know for sure that it will retain the table formatting or how it deals with spreadsheets.

Thanks for sharing. Wow, this spreadsheet is pretty thorough. Thanks to whoever pulled this together.

OCR tools to create Excel Spreadsheet

  1. FineReader is the best of the OCR engines available for Mac, IMO. It will create an Excel file.
  2. PDFpen Pro (newer name: Nitro PDF Pro) will also try to create an Excel file.
  3. PDF Expert (Premium) will convert to Excel.

I have used 1. & 2. to convert PDF content to Excel with some success. Not perfect.


To rant a bit while on the topic: I am frankly amazed, in an era when AI is supposed to be so awesome, that OCR is not better. When you are just getting a couple of pages “read”, it is fine. There are a couple of errors that you can see and correct, and you move on without thinking much about it.

But I have a workflow that requires OCR of 1,000+ pages. The original scan quality is pretty good, and the text is easily read by a human. However, a huge number of errors are created. In an attempt to end up with a “perfect” product, I take the same document, submit it to two OCR engines, and then resolve the “differences” between the two outputs. This experience is the basis for my statement that FineReader does the “best” job. However, it is far from perfect. Using FineReader and PDFpen Pro, I end up with more than 10,000 discrepancies in a 1,000-page document. Each discrepancy means that one or the other engine made an error at that location. Cleaning up this many “differences” is a brutal project.
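To give an idea of what “resolving the differences” looks like, here is a minimal sketch of a word-level comparison between two OCR outputs using Python's standard `difflib`. This is not the tool I actually use, just an illustration. The function name `ocr_discrepancies` and the sample strings are made up, and it assumes both engines' output has already been exported as plain text.

```python
import difflib


def ocr_discrepancies(text_a, text_b):
    """Return (from_a, from_b) pairs where two OCR outputs disagree.

    Both texts are split on whitespace and aligned word by word;
    every non-matching stretch is reported as one discrepancy.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps difflib from ignoring very frequent words
    # in long documents.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(words_a[i1:i2]),
                          " ".join(words_b[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Hypothetical output of two engines for the same line.
    engine_a = "Total revenue for 2019 was 1,204 units."
    engine_b = "Total revenue for 2O19 was 1,204 units"
    for a, b in ocr_discrepancies(engine_a, engine_b):
        print(f"engine A: {a!r}   engine B: {b!r}")
```

A comparison like this only flags where the engines disagree. It cannot tell you which one is right, and it misses any spot where both made the same mistake, so a human still has to look at every flagged location.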

That said, digits are generally read very accurately. Depending on the content of the scanned Excel-like document, you will probably get good results. In ordinary text, punctuation is a major source of errors, but a spreadsheet generally contains little of it.

I have an old version of Acrobat Pro. The OCR there is good but, at least in that version, it does not try to resolve line endings as anything more than the line endings seen in the original document. Dealing with this is a big problem. Because of the expense, I do not have experience with the newer (subscription) version of Acrobat Pro.
