There has been considerable discussion about using DEVONthink to store and manage files. Much of the discussion has centered on sync issues and lost and/or corrupted files. As a consequence, many of us have moved to programs like Obsidian to manage our research files and notes. I have.
But I use DT extensively as a utility. One of my primary utility uses is file conversion and OCR. I recently read a series of HBR articles dealing with conflict and polarization in the workplace (one of my book chapters deals with conflict and criticism). I had my EA scan the selected articles and send them to me for reading over the weekend (I do my book research and writing over the weekend). The copier-scanner at work does not OCR scans. So, when I started the reading and note taking, I took those files and uploaded them to DT, which automatically OCRs the PDFs. After I’d highlighted the key points from the articles, I selected and copied the highlights from the DT OCRed PDFs and pasted them into my Obsidian research folder.
This works extremely well when this process is needed. There may be a more efficient way (I’M OPEN TO SUGGESTIONS) but this works. Below is an example of the result as shown in DT and my notes. Note, my notes are copied and pasted, not yet summarized in my own words, which is my next task. I like to extract selected material word for word so that I have the option to insert an extended quote in the book or other articles. I use Bookends for the citations.
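On the "more efficient way" question, one option is to script the paste step. Below is a minimal sketch, assuming the highlights have already been copied out of the OCRed PDF as plain strings. The function name `highlights_to_markdown` and the note layout are my own invention for illustration, not anything DT or Obsidian provides.

```python
from datetime import date

def highlights_to_markdown(title, highlights, source="", on=None):
    """Format copied highlights as a Markdown note, one blockquote per highlight."""
    on = on or date.today()
    lines = [f"# {title}", ""]
    if source:
        lines += [f"Source: {source}", ""]
    lines += [f"Extracted: {on.isoformat()}", ""]
    for text in highlights:
        # Collapse line breaks and repeated spaces left over from the OCR layer.
        cleaned = " ".join(text.split())
        if cleaned:
            lines += [f"> {cleaned}", ""]
    return "\n".join(lines).rstrip() + "\n"
```

Saving the returned text as a `.md` file in the research folder keeps each quote word for word, so it can still be lifted as an extended quote later.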
Sorry @JohnAtl, EA is Executive Assistant. As to the digital version, I read the articles from the print version because even when I log into my HBR account, there is a charge for the digital version–unless I’m missing something. Below is a screenshot AFTER I’ve logged in.
I’m using DEVONthink’s OCR and conversion tools a lot here. The ability to locate and perform OCR on PDFs that require it (some ebooks, a lot of financial documents) has been a game-changer for me, as has the ability to merge email conversations into one RTF document, trim out the quoted text, then convert it to a PDF for archiving.
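For anyone who wants to do the trimming step outside DEVONthink, here is a rough sketch of stripping quoted text from a plain-text message body. It assumes the common conventions of `>` prefixed quote lines and an "On ... wrote:" attribution line. The function name `strip_quoted_text` is hypothetical, and real mail clients vary enough that this will miss some cases.

```python
import re

# Attribution lines such as "On Mon, 3 Jan 2022, Jane wrote:" that introduce a quote.
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")

def strip_quoted_text(body):
    """Return the message body without '>' quoted lines or attribution lines."""
    kept = []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted reply text
        if ATTRIBUTION.match(line.strip()):
            continue  # the line that introduces the quote
        kept.append(line.rstrip())
    # Collapse the runs of blank lines left behind by the removed quotes.
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(kept))
    return text.strip() + "\n"
```

Note that this also drops any line the sender deliberately started with `>`, so it is worth a quick check before archiving.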
I’m still working out how to use things like classification, annotations and linking in DEVONthink, as I feel that would help me retrieve information and improve my understanding of various subjects.
That, and there’s at least one limitation: ABBYY restricts multicore in third-party apps using the engine. OCRing a few hundred or more pages at once should be several times faster in ABBYY FineReader on an M1 Pro.
Quite possibly. Those GUIs are obviously rough compared to ABBYY FineReader and don’t have the other features, and most users don’t have the patience or ability to always work from the CLI. But why shouldn’t someone like Devon use Tesseract instead? Users wouldn’t notice.
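For those who don't mind the CLI, here is a minimal sketch of batch OCR with Tesseract using several cores at once. It assumes `tesseract` is installed and on the PATH, and that the pages are already image files (PNG, TIFF, JPEG), since Tesseract reads images rather than PDFs. The helper names `tesseract_command` and `ocr_images` and the default of four workers are my own choices, not part of any tool mentioned here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def tesseract_command(image, out_dir, lang="eng"):
    """Build the command that turns one page image into a searchable PDF."""
    image = Path(image)
    out_base = Path(out_dir) / image.stem  # Tesseract appends ".pdf" itself
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]

def ocr_images(images, out_dir, workers=4):
    """OCR page images concurrently, one Tesseract process per image."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = [tesseract_command(img, out_dir) for img in images]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The threads only wait on child processes, which do the real work.
        results = list(pool.map(lambda c: subprocess.run(c, check=False), commands))
    return [r.returncode for r in results]
```

Calling `ocr_images(sorted(Path("scans").glob("*.png")), "ocr_out")` would write one PDF per page into `ocr_out` (both folder names are placeholders) and return the exit codes, where 0 means the page was processed.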
As for accuracy, I came across an interesting study a month or so ago that suggests Tesseract and ABBYY desktop are about equal in performance.
ABBYY Cloud competes against Google Cloud Vision, Rekognition AI et al., and they’re all trying to sell metered use for enterprise/SaaS applications. I don’t have any experience with them, but they all supposedly can outperform all desktop options with training/weighting. ABBYY has a lot of patents and I could definitely see them falling behind if they couldn’t keep their larger competitors from using certain techniques.
I use my ScanSnap iX1600 to scan and OCR documents. It comes with ABBYY FineReader, which is licensed to the scanner (you need a scanner serial number to install). This works great.
To OCR existing documents, PDFpenPro works great. You can even convert PDFs to Word, Excel and other formats by exporting them. The application sends the file to the cloud and sends back an editable document in the selected format. Very cool.
DEVONthink works great too, but if you are only using it to OCR documents, you’re wasting the real strength of the app.
I agree that using DT as I am is underutilizing it. However, I ran into syncing issues, and others have reported corrupted files, missing files, etc. The problem seems to arise primarily with indexed files (at least that is my impression). I could import my files to DT, but I don’t want them residing exclusively in a proprietary database and I don’t want duplicate files (in DT and in Finder), so I’ve elected to keep all of my files in Finder and I use Obsidian to access, annotate, and link my files.
How does Obsidian work with non-native file formats (basically Office files)? Last time I checked, it did not even include them in the file tree view. So basically you’re using Obsidian for Markdown and Finder for the rest?
That is correct. In my opinion Obsidian is neither a good writing environment nor a good repository for files unless they are all in text form–which the majority are not. The majority of my research tends to be highlights from books or research articles, essays, webpages, etc., saved as PDFs. However, I’ve solved this problem in part by using DEVONthink to convert files, including PDFs, webpages, and Word docs, to markdown so I can use Obsidian as needed to annotate and link notes. The fact is, if I fully trusted DT to not lose or corrupt files I’d just use DT for the research because it has good linking capabilities combined with a robust feature set. But, the research is so important that after reading the problems with file syncing and file corruption reported in this forum I’m hesitant, perhaps unfairly, to fully trust DT.
I looked at Obsidian, very briefly. I quickly decided that it adds another layer of complexity and learning curve that I chose not to tackle.
I try to use as much as I can of the free stuff that Apple provides, supplemented with DT3, Notability, my business CRM, RedTail, and our business file sharing system, Box.com.
I would not describe my use of DT as “a document inbox”. I use DT more as an “as needed” utility, e.g., when I need to OCR a file(s) or when I need to convert a file(s). I have not been routinely putting files in DT. That said, I’m still considering using DT to once again index my Obsidian research folder (so I have the advantage of both systems) but I’m monitoring developments regarding DT sync issues and corrupted files before I try again.