I’m not sure if what I’m about to ask is possible or not but here goes.
Essentially I’m digitising my paper piano sheet music books by running them through my ScanSnap and saving them as PDFs. The printed text should be available via the OCR’d layer.
My plan is to add them to the iOS music app ForScore, so I can simply place my iPad on the piano and bring up any piece of music from my collection. As you can see, there’s a lot to get through! I’ll just do a bit here and there whenever I can.
Luckily the de-binding and scanning of the books is actually quite quick, but it’s the metadata input into the PDFs afterwards that’s time-consuming!
So, I was wondering whether any clever person out there can think of an automated way to extract the Title, Composer, etc. from each PDF and write it into that PDF’s metadata.
The text in the scans comes in a few different layouts, which only makes the problem more difficult, as these screenshots show.
If I can find a way to extract that text data I can then use it to name the files and map it to certain metadata within the PDF for filing away into ForScore.
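To make that last step concrete, here is a minimal sketch of the sort of mapping I have in mind. It is only an illustration, not a working solution: it assumes the first page’s OCR text has already been pulled out as plain text by some other tool, that the title is the first non-empty line, and that the composer sits on a line starting with something like “Composed by” or “Music by”. The function names and the sample text are made up, and the real scans vary a lot more than this.

```python
import re

# Hypothetical credit prefixes; real scores will need a longer, messier list.
CREDIT = re.compile(
    r"^(?:music by|composed by|words and music by|by)\s+(.+)$",
    re.IGNORECASE,
)

def guess_title_and_composer(first_page_text):
    """Guess (title, composer) from the OCR text of a first page.

    Assumption: the title is the first non-empty line, and the composer
    is on the first later line that starts with a credit prefix.
    Returns None for anything that cannot be guessed.
    """
    lines = [ln.strip() for ln in first_page_text.splitlines() if ln.strip()]
    if not lines:
        return None, None
    title = lines[0]
    composer = None
    for ln in lines[1:]:
        match = CREDIT.match(ln)
        if match:
            composer = match.group(1).strip()
            break
    return title, composer

def safe_filename(title, composer=None):
    """Build a 'Composer - Title.pdf' name, dropping characters that are
    awkward in file names and collapsing repeated whitespace."""
    base = f"{composer} - {title}" if composer else title
    base = re.sub(r'[\\/:*?"<>|]', "", base)
    base = re.sub(r"\s+", " ", base).strip()
    return base + ".pdf"

# Made-up sample standing in for OCR output.
sample = "Clair de Lune\n\nComposed by Claude Debussy\nAndante\n"
title, composer = guess_title_and_composer(sample)
print(safe_filename(title, composer))
```

The same title and composer values could then be handed to whatever tool writes the PDF’s metadata fields; that part is not shown here.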
As far as I’m aware the available metadata in a PDF is (which is perfect for piano music)
I already own Hazel, which I use quite a bit, and Keyboard Maestro, which I use far less.
I’m more than happy to purchase further software if someone recommends it.
Basically, if someone can solve the text data extraction, I’m happy to use any combination of software/scripts to get the job done!