PDF automation - splitting two-page scans

Years ago, I scanned a bunch of magazines for the sake of preservation. At the time, I wasn’t thinking about OCRing the text. Unfortunately, I scanned two pages at a time, which I believe will wreak havoc during OCR.

I’m looking for suggestions on whether it’s possible to automate (partially or entirely) splitting a two-page scan into two individual pages.

Below is an example of a two-page scan.
[Image: two_pages]

I use an Alfred workflow for this but the post below also has a link to the underlying script:

Nice, will try this. I wrote my own PDF joiner/splitter, using PyPDF or a similar package, to separate standard (single-page) PDF pages into individual files. I didn’t know I could also split double scans, though. I’m going to be busy the next few days trying this out.

Thank you!
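
For anyone who would rather stay in Python, here is a minimal sketch of the double-page split using the pypdf package. It is not the Alfred workflow script mentioned above. It assumes the gutter sits exactly in the middle of each scan, that the pages carry no rotation flag, and that a recent pypdf release is installed in which `add_page()` returns a separate copy of the page on every call. The function names `half_boxes` and `split_double_pages` are made up for this example.

```python
def half_boxes(left, bottom, right, top):
    """Split a page box down the middle into (left_half, right_half).

    Each box is an (x0, y0, x1, y1) tuple in PDF points.
    """
    mid = (left + right) / 2
    return (left, bottom, mid, top), (mid, bottom, right, top)


def split_double_pages(src_path, dst_path):
    """Write a copy of src_path in which every page becomes two half pages."""
    # Third-party dependency (pip install pypdf), imported here so that
    # half_boxes() stays usable without it.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        box = page.mediabox
        halves = half_boxes(
            float(box.left), float(box.bottom), float(box.right), float(box.top)
        )
        for half in halves:
            # Assumption: add_page() hands back the writer's own copy of the
            # page, so each copy can be given a different visible area.
            new_page = writer.add_page(page)
            new_page.mediabox = RectangleObject(half)
            new_page.cropbox = RectangleObject(half)

    with open(dst_path, "wb") as fh:
        writer.write(fh)
```

Calling `split_double_pages("magazine.pdf", "magazine_split.pdf")` (both file names are placeholders) should produce a file with twice as many pages, left half first. Note that this only changes the visible area of each page; the full scanned image is still embedded, so it is worth checking that your OCR tool respects the crop before running a whole batch.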

And having tried it, you definitely want to split before you OCR.

Yes, of course. This is why I posted my question.

I just wish PDFpenPro was easier to automate/script. Unfortunately, you can’t call it from the command line to do the OCR. I couldn’t get David/Katie’s PDF automation PDFpen script working, but I will tackle that issue again once I get the double pages split and back into a PDF.

I use OCRKit which has quite good scripting support.

http://ocrkit.com/help/

Thanks for this link. I have been looking for something like OCRKit for a while. It does just what I want.

If you have PDFpenPro, you can use the selection tool and then either crop the page or the whole document.

To do it right, you need to make two copies of the original, one for left/odd pages and another for right/even pages.

  1. Keep the left/odds by cropping out the right/even pages.
  2. Repeat for right/evens by cropping out the left/odd pages.
  3. For lots of pages, I use PDFGenius (Mac App Store) to merge and interweave the left/odds with the right/evens into the proper order.
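
If you end up with two cropped files as in the steps above, the interweaving in step 3 can also be sketched in Python with the pypdf package. This is only an illustration of the idea, not how PDFGenius works. The function names `interleave` and `interleave_pdfs` are made up for this example, and it assumes the left/odd file holds the first page of each pair.

```python
from itertools import chain, zip_longest

_MISSING = object()  # marks the gap when one file has fewer pages


def interleave(left_pages, right_pages):
    """Alternate items: left[0], right[0], left[1], right[1], ..."""
    paired = zip_longest(left_pages, right_pages, fillvalue=_MISSING)
    return [p for p in chain.from_iterable(paired) if p is not _MISSING]


def interleave_pdfs(left_path, right_path, dst_path):
    """Merge the left/odd and right/even files into reading order."""
    # Third-party dependency (pip install pypdf), imported here so that
    # interleave() stays usable without it.
    from pypdf import PdfReader, PdfWriter

    left = PdfReader(left_path)
    right = PdfReader(right_path)
    writer = PdfWriter()
    for page in interleave(list(left.pages), list(right.pages)):
        writer.add_page(page)

    with open(dst_path, "wb") as fh:
        writer.write(fh)
```

For example, `interleave_pdfs("lefts.pdf", "rights.pdf", "merged.pdf")` (placeholder file names) would write the pages as left 1, right 1, left 2, right 2, and so on. If one file has extra pages, they simply end up at the end rather than raising an error.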