OCR Produces Duplicate Text

gkgriffith · April 6, 2026, 3:28pm

When I perform OCR on a PDF, the results frequently duplicate words and/or phrases. This has been happening more and more often. What can be done?

I’m using Nitro PDF Pro (26.0 in Setapp).

Also, what are other recommended PDF apps that do good OCR?

LisaSpangenberg · April 6, 2026, 8:34pm

It may be a PDF that also contains a text layer, in addition to the image layer.

I no longer have it, but the PDF Pro app used to be able to extract the text from the document via the text layer.

KVZ · April 6, 2026, 9:39pm

Are you perhaps OCRing a document that has already been OCR’d? Getting a second text layer is not unusual if the document already has a text layer.

If you open an OCR’d PDF in macOS Preview, choose Edit > Select All, then Copy, and paste the clipboard contents into a plain text document in TextEdit or another editor, you’ll get the text layer. It will NOT be formatted and may be ugly.
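Once the pasted text is saved as plain text, a small script can give a rough idea of how much duplication the text layer contains. This is only a sketch in Python: the function name `find_adjacent_repeats` and the `max_phrase_len` parameter are made up for illustration, and it only catches a word or phrase that is immediately repeated, which is one pattern a doubled text layer might produce.

```python
def find_adjacent_repeats(text, max_phrase_len=5):
    """Return (word_index, phrase) for each phrase that is immediately repeated.

    Hypothetical helper for inspecting pasted OCR text; it compares words
    exactly (case-sensitive) and does not touch the PDF itself.
    """
    words = text.split()
    hits = []
    i = 0
    while i < len(words):
        matched = False
        # Try the longest phrase first so "a b a b" is reported once as "a b".
        for n in range(max_phrase_len, 0, -1):
            if i + 2 * n <= len(words) and words[i:i + n] == words[i + n:i + 2 * n]:
                hits.append((i, " ".join(words[i:i + n])))
                i += 2 * n  # skip past both copies
                matched = True
                break
        if not matched:
            i += 1
    return hits


if __name__ == "__main__":
    sample = "the quick quick brown fox"
    for index, phrase in find_adjacent_repeats(sample):
        print(f"word {index}: '{phrase}' appears twice in a row")
```

If this reports many hits on text copied from a PDF, that would be consistent with the file carrying more than one text layer, though it does not prove it.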

Katie