Split scanned PDFs by blank pages...automatically?


#1

Hi guys - new here and first post so Hello :grinning:

After spending an entire day searching for a specific letter from my bank, I’m currently taking early steps into scanning and archiving home documents. Initially, I thought great - dump a sheaf of papers in the feeder, press ‘Go’ and have Hazel sort out the rest (I’m also new to Hazel, but that’s another post…).

Alas - it seems I can scan each piece of paper as its own PDF (not ideal when a statement is several pages for example) or the whole sheaf as a single PDF.

I thought it may be possible to split by inserting a blank page between different documents to act as a trigger to split them but can’t find any software that might automate this.
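To make the blank-page idea concrete, here is a minimal sketch of just the grouping step in Python. It assumes you already have one True/False flag per scanned page saying whether that page is a blank separator (how you detect that is a separate problem), and the function name `split_on_blank_pages` is made up for illustration, not taken from any existing tool.

```python
from typing import List, Sequence


def split_on_blank_pages(is_blank: Sequence[bool]) -> List[List[int]]:
    """Group page indices into documents, using blank pages as separators.

    is_blank[i] is True when page i is a separator sheet (an assumption:
    something else has already decided which pages are blank).
    Blank pages are dropped, and runs of consecutive blanks (for example
    both sides of a duplex-scanned separator) do not create empty documents.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for index, blank in enumerate(is_blank):
        if blank:
            # A separator closes the document collected so far, if any.
            if current:
                documents.append(current)
                current = []
        else:
            current.append(index)
    # Whatever follows the last separator is the final document.
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    # Nine scanned pages; pages 2, 4 and 5 are blank separators.
    flags = [False, False, True, False, True, True, False, False, False]
    print(split_on_blank_pages(flags))  # [[0, 1], [3], [6, 7, 8]]
```

Each inner list is the set of page numbers (counting from 0) that would go into one output PDF. Writing those pages out would still need a PDF library or app on top of this.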

Any suggestions/help in this regard would be greatly appreciated.

If it helps, I have a Brother ADS3600 scanner (got it new, for a steal, and didn’t have the budget for a ScanSnap) and PDF Expert / DTP / Hazel (but I’m happy to get additional software if it fixes things).


#2

This is something I had to deal with too. Unfortunately, I don’t have any amazing solutions for you.

I split my documents into two stacks: single-page documents and multi-page documents. The multi-page ones were stacked alternating landscape and portrait (one document portrait, the next landscape, to make it easy to grab one document at a time later). Then I put the single-page ones through, having the scanner split on each page, got my noise-cancelling headphones and a good podcast playlist, and did the rest document by document. It wasn’t a very efficient workflow, but I got plenty of podcast listening done and, as a result, didn’t scan many documents I didn’t really need.


#3

Not a solution, but fortunately you have PDF Expert, and copying and pasting pages goes quickly.


#4

Why bother separating the files? Just OCR them and search. You can separate individual files if needed later. Most won’t be needed, and you can probably put your time to better use.


#5

Thanks guys - I figured I’d split into multi-page and single-page documents as a start. Also, yes, splitting out of PDF Expert is easy (if manual). The one-big-bucket-of-all-sorts-of-PDFs approach is appealing though, so I may give that a try (at least for the documents I’m pretty sure I won’t need often).

If I get a chance I might have a play with Python and see if there is an automated solution with its PDF modules.
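One thing to watch on the Python route: a scanned "blank" sheet is rarely pure white, because of dust, paper texture and show-through. Below is a rough sketch of one possible test, assuming each page has already been rendered to grayscale pixel values from 0 (black) to 255 (white) by some other tool. The function name `looks_blank` and both thresholds are guesses for illustration and would need tuning against real scans.

```python
from typing import Iterable


def looks_blank(
    gray_pixels: Iterable[int],
    dark_below: int = 128,              # assumed cut-off for a "dark" pixel
    max_dark_fraction: float = 0.005,   # assumed tolerance for specks and noise
) -> bool:
    """Return True when at most max_dark_fraction of the pixels are dark.

    gray_pixels holds grayscale values from 0 (black) to 255 (white).
    A page with no pixels at all is treated as blank.
    """
    total = 0
    dark = 0
    for value in gray_pixels:
        total += 1
        if value < dark_below:
            dark += 1
    if total == 0:
        return True
    return dark / total <= max_dark_fraction


if __name__ == "__main__":
    clean_sheet = [255] * 1000
    speckled_sheet = [255] * 999 + [0]        # one dark speck in 1000 pixels
    printed_page = [255] * 900 + [0] * 100    # 10% dark pixels
    print(looks_blank(clean_sheet), looks_blank(speckled_sheet), looks_blank(printed_page))
```

Counting a fraction of dark pixels, rather than demanding an all-white page, is what lets a slightly dirty separator sheet still count as blank.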


#6

I’m not familiar with PDF Expert, but perhaps it includes a tool similar to the “split” tool in Adobe Pro. I frequently have PDFs in excess of 1000 pages. I run through it and add bookmarks at the start of each new document. Then I use the split tool to separate them into individual PDFs at each bookmark. Set your options to “split by top level bookmark” and “use bookmark for file name,” and voila! Instant individual files! If you name your bookmarks efficiently, your files will automatically sort in the order desired. Yes it takes time to set up the bookmarks (be sure to avoid using characters that are not permitted in file names and you also don’t want to have identical bookmarks), but it’s all automatic after that.
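The two bookmark caveats (no forbidden characters, no duplicates) can be checked before splitting. This is only an illustrative Python sketch and not how the Adobe split tool behaves: the helper name `bookmark_titles_to_filenames`, the set of characters it replaces, and the " (2)" suffix scheme are all assumptions.

```python
import re
from typing import Dict, List


def bookmark_titles_to_filenames(titles: List[str]) -> List[str]:
    """Turn bookmark titles into file names that are safe and unique.

    Assumed rules: characters commonly rejected in file names become "-",
    and a repeated title gets " (2)", " (3)", ... appended.
    """
    seen: Dict[str, int] = {}
    names: List[str] = []
    for title in titles:
        # Replace \ / : * ? " < > | and trim stray spaces and dots.
        stem = re.sub(r'[\\/:*?"<>|]', "-", title).strip(" .") or "untitled"
        count = seen.get(stem, 0) + 1
        seen[stem] = count
        names.append(f"{stem}.pdf" if count == 1 else f"{stem} ({count}).pdf")
    return names


if __name__ == "__main__":
    titles = ["2019-03 Bank: statement", "2019-03 Bank: statement", "Tax/2018"]
    for name in bookmark_titles_to_filenames(titles):
        print(name)
```

For the three sample titles this prints `2019-03 Bank- statement.pdf`, `2019-03 Bank- statement (2).pdf` and `Tax-2018.pdf`. It does not handle every edge case, for instance titles that differ only in upper and lower case on a case-insensitive disk.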


#7

Dragging pages, individually or in groups, from the thumbnail view in Apple Preview on the Mac to the desktop works quite well for creating new PDFs from a big scanned stack. Totally manual but quite quick. Press Option+Command+2 to show thumbnails.


#8

Hi @Nick73! How au fait are you with using the Terminal?

For fun, I have written a Python script which I think will do what you want.

I’m trying to figure out how to make it into a standalone .app application — and I’ll let you know if I do! — but in the meantime if you are comfortable installing Python 3.7 (python.org or https://www.anaconda.com/distribution/) and running a few commands from the Terminal, I can walk you through it.


#9

@Nick73 - I’ve got the same setup and am wondering what the most efficient way to get the docs in is. Do you use the touch screen on the scanner to initiate the scan to a folder? Or Image Capture on the Mac, or other software? When does it get OCR’d?


#10

Well, here is the Python script to do the PDF splitting, in case it’s useful for anyone.