I finally decided to move from Evernote to Devonthink because keeping my content in the cloud felt too risky for me…and the annual price keeps going up. So here is my journey migrating my content to Devonthink.
It was no small task getting my Evernote contents into Devonthink while keeping them fully searchable. One of the big benefits of Evernote was its OCR of attachment contents. I found it to be one of the best. Google has pretty good OCR but has a 2 MB file limit. PDFPen has a great product; I found it was outstanding with documents but hit or miss on images. Devonthink has an okay OCR feature. I tried several others and came to the conclusion that FineReader by ABBYY had the best results. I had to research these OCR tools because the notes coming out of Evernote had only half of the OCR text. I tried several ways to get the data out, and here are my results.
Imported EN notebooks directly into Devonthink Pro
The notes come into DTP as formatted text and cannot be converted to searchable PDF. Anything typed into your Evernote note will come up in a search but nothing else.
EN Note printed to PDF and dragged into DTP
Notes come into DTP as PDF+Text but only for the text you actually typed into the note. Nothing in images is searchable…web clipper captures, anyone?
Even running ‘convert to searchable PDF’ in DTP did not capture text in images with any degree of accuracy.
EN Note Save PDF to DevonThink Pro
This had the same result as printing to PDF, and you also lose the title of the note and any tags.
The note comes in as untitled.
EN Note printed to PDF and OCR’d with FineReader
Notes come into DTP as PDF+Text, and the text in images is captured and searchable as well.
There is no need to run ‘convert to searchable PDF’ in DTP.
I’m sure there is a way to do this as a single script but I needed to do this in bulk for many EN notes.
This is my setup:
- I created one folder on my desktop to receive my EN PDFs. I called it Fish Tank1…I have to have some fun. I spent a whole Saturday testing this out.
- I created a Services > Folder Action calling FineReader by ABBYY to convert the doc to PDF and save the output to another folder called Fish Tank2.
- Another folder action watches Fish Tank2 and takes any file, adds it to DTP and then deletes it.
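For anyone who would rather script the Fish Tank2 step than build a folder action, here is a rough Python sketch of the "take each file, hand it off, then delete it" idea. This is not what my folder action actually runs. The function name `drain_folder` and the `handler` callback are made up for illustration, and the handler you pass in would have to do the real work of adding the PDF to DTP (I have not written that part here).

```python
from pathlib import Path
from typing import Callable, List


def drain_folder(folder: Path, handler: Callable[[Path], None]) -> List[str]:
    """Pass every PDF in `folder` to `handler`, then delete it.

    `handler` is a hypothetical stand-in for "add this file to DTP".
    If the handler raises, the file is left in place so nothing is lost.
    Returns the names of the files that were handled and removed.
    """
    handled = []
    for entry in sorted(folder.iterdir()):
        # Only touch regular files that look like PDFs (any letter case).
        if not entry.is_file() or entry.suffix.lower() != ".pdf":
            continue
        handler(entry)   # hand the file off first...
        entry.unlink()   # ...and delete it only after the hand-off succeeded
        handled.append(entry.name)
    return handled
```

You would call it with something like `drain_folder(Path.home() / "Desktop" / "Fish Tank2", my_import_function)`, where `my_import_function` is whatever you write to get a file into DTP. Deleting only after the handler returns means a failed import never costs you the PDF.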
Once my fully OCR’d PDF document was in DTP, I would review the title, add any tags and move it to the appropriate database.
I don’t have to worry about database placement too much since the new doc is fully searchable.
My database structure in DTP is pretty sparse as I rely mostly on the search ability.
I also have a Hazel rule that removes any files older than a day from my Fish Tank directories so I don’t have any duplication.
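If you don’t have Hazel, the same "older than a day" cleanup can be approximated with a few lines of Python. This is only a sketch and not how Hazel works internally. I am assuming "older" means the file’s last-modified time, and the name `remove_stale_files` and its parameters are my own invention.

```python
import time
from pathlib import Path
from typing import List, Optional

ONE_DAY_SECONDS = 24 * 60 * 60


def remove_stale_files(
    folder: Path,
    max_age_seconds: float = ONE_DAY_SECONDS,
    now: Optional[float] = None,
) -> List[str]:
    """Delete regular files in `folder` last modified more than
    `max_age_seconds` ago and return their names, sorted.

    Assumption: age is judged by modification time (st_mtime).
    Subfolders are left alone.
    """
    current = time.time() if now is None else now
    removed = []
    for entry in folder.iterdir():
        if entry.is_file() and current - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Run it once per Fish Tank folder, for example from a daily scheduled job. The `now` parameter is only there so the function can be tried out without waiting a day.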
Some lessons learned.
In FineReader, set image quality to high. Even with this setting I found some degradation in picture quality but it was manageable.
Every method I tried lost tags. I just add them back again in DTP.
The number one factor in successful OCR is image quality…300 DPI or better will get you great results.
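To check whether a scan clears that 300 DPI bar, divide its pixel dimensions by the physical size of the page in inches. Here is a small sketch of that arithmetic. The function names are mine, and the 300 default simply mirrors the rule of thumb above rather than any setting from FineReader.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Dots per inch along one side: pixel count divided by length in inches."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixels / inches


def meets_ocr_target(
    pixel_width: int,
    pixel_height: int,
    page_width_in: float,
    page_height_in: float,
    target_dpi: float = 300.0,
) -> bool:
    """True when both the horizontal and vertical resolution reach the target."""
    horizontal = effective_dpi(pixel_width, page_width_in)
    vertical = effective_dpi(pixel_height, page_height_in)
    return min(horizontal, vertical) >= target_dpi
```

For example, a US Letter page (8.5 x 11 inches) scanned at 2550 x 3300 pixels works out to exactly 300 DPI in both directions, while the same page at 1275 x 1650 pixels is only 150 DPI and is likely to OCR poorly.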