I’m trying to get started organizing a digital filing system for my home documents. I have some scanned PDFs. I think most are already OCR’d, but some may not be. Two questions:
- Is there any harm in simply batch re-OCR’ing them all, or does OCR’ing a file a second time wreak havoc?
- Is there a way to batch-check a folder to quickly determine which files are and are not OCR’d? Right now, all I know to do is open each file individually in a PDF application and try searching for text. Surely there is a better way? (In tinkering with the trial of DEVONthink Pro, it seems there is a “PDF+text” attribute that does this, but on reflection I don’t think I am taking the DEVONthink plunge for now; simple file/folder management and OCR text search should be sufficient for me.)
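
To make the second question concrete, here is a rough Python sketch of the kind of batch check I have in mind. It rests on an assumption I have not verified against real scans: a PDF with a text layer has to reference fonts, so a file whose raw bytes never contain `/Font` is probably image-only. The function names `looks_ocrd` and `sort_folder` are made up for illustration. The heuristic can be wrong in both directions (for example, newer PDFs can store their dictionaries in compressed object streams, which hides the `/Font` key), so treat the result as a first pass, not a verdict.

```python
from pathlib import Path


def looks_ocrd(pdf_path):
    """Crude heuristic (an assumption, not a guarantee): a text layer needs
    font resources, so look for a /Font key in the file's raw bytes."""
    data = Path(pdf_path).read_bytes()
    return b"/Font" in data


def sort_folder(folder):
    """Walk a folder recursively and split its PDFs into two sorted lists:
    files that look like they have a text layer, and files that do not."""
    with_text, without_text = [], []
    for path in sorted(Path(folder).rglob("*")):
        # Match the extension case-insensitively (.pdf, .PDF, ...).
        if path.is_file() and path.suffix.lower() == ".pdf":
            if looks_ocrd(path):
                with_text.append(path)
            else:
                without_text.append(path)
    return with_text, without_text
```

Calling `sort_folder("/path/to/scans")` (a placeholder path) would return the two lists, and the second list would be the candidates for OCR. A proper text-extraction library would presumably be more reliable than scanning raw bytes.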