Years ago, I scanned a bunch of magazines for the sake of preservation. At the time, I wasn’t thinking about OCRing the text. Unfortunately, I scanned two pages at a time, which I believe will wreak havoc during OCR.
I’m looking for suggestions on whether it’s possible to automate (partially or entirely) splitting a two-page scan into two individual pages.
Nice, will try this. I wrote my own PDF joiner/splitter to separate standard (single-page) PDF pages into separate files, using PyPDF or a similar package. I didn’t know I could also split double scans, though. I’m going to be busy the next few days trying this out.
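For reference, here is a minimal sketch of how a double-page split might look in Python. It is not necessarily the approach suggested above. It assumes a recent `pypdf` release (where `PdfWriter.add_page` returns the copy it stored), that the spine sits exactly in the middle of each scan, that the left half is the first page, and that the scans carry no page rotation. The function names and file names are made up for illustration.

```python
def split_box(left, bottom, right, top):
    """Return (left_half, right_half) rectangles for a two-page spread.

    Each rectangle is (x0, y0, x1, y1) in PDF points. Assumes the spine
    is exactly at the horizontal midpoint of the scan.
    """
    mid = (left + right) / 2
    return (left, bottom, mid, top), (mid, bottom, right, top)


def split_double_pages(src_path, dst_path):
    """Write a copy of src_path in which every page is cut into two halves."""
    # Imported here so split_box() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    reader = PdfReader(src_path)
    writer = PdfWriter()

    for page in reader.pages:
        box = page.mediabox
        halves = split_box(
            float(box.left), float(box.bottom),
            float(box.right), float(box.top),
        )
        for rect in halves:
            # Assumption: add_page() clones the page into the writer and
            # returns that clone, so each half gets its own boxes.
            half = writer.add_page(page)
            half.mediabox = RectangleObject(rect)
            half.cropbox = RectangleObject(rect)

    with open(dst_path, "wb") as fh:
        writer.write(fh)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    split_double_pages("magazine_double.pdf", "magazine_single.pdf")
```

This only narrows the visible area of each page; the full scanned image is still embedded in both halves, so the output file will not be smaller. If the spine is off-centre in some scans, the midpoint in `split_box` would need adjusting per page.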
I just wish PDFPenPro was easier to automate/script. Unfortunately, you can’t call it from the command line to do the OCR. I couldn’t get David/Katie’s PDF automation script for PDFPen working, but I’ll tackle that issue again once I get the double pages split and back into a PDF.