Advice please: index or not in DEVONthink through the Finder?

I need advice from experienced DT users.

My Current Setup

  • I have three databases: Personal, Research, and a database for the graduate course I teach.
  • Everything in the research and college databases is in DT directly.

My Problem
My personal database is where my problem exists. I have many personal files, e.g., tax papers, receipts, archived papers, manuals, etc., on my Mac AND many others in DT. I want to consolidate all of these files. I’m trying to decide whether I should move all of my personal files onto the Mac and then index them into DT OR move all of them directly into DT. I am perfectly willing to use DT, as it has a lot of power and lots of connections with other apps. But either approach will require some juggling of files and folders to get everything set up.

I assume that having everything in DT increases storage needs, if for no other reason than backing up the DT database(s).

Any advice on best practice regarding this? IF the consensus advice is to index the files to DT rather than move all of them directly into DT, then I may decide to take the same approach with the other two databases–move all files/folders to the Mac and index them in DT.

Advantages/disadvantages to either approach? Advice?

Thanks in advance for your help!

Indexing adds a bit of overhead and thus consumes a bit more space. So a 100MB file that is indexed in DEVONthink will add overhead for the concordance (list of words found in the file) that DEVONthink creates in the background to aid search and other features. Size of the concordance depends on content of the file. DEVONthink makes the same concordance for imported files, so the net difference from a storage perspective should be close to zero. Of course, if you do not use DEVONthink at all, then there is no increased storage cost.

I suggest creating a small test database, indexing a subset of your folders of known storage size (using Finder’s Get Info), and seeing how big the database is. Then create a second test database, import those same files, and see how big the second test database is. This will give you an idea of what will happen.
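
If you are comfortable running a small script, the comparison above can also be sketched in Python. This is only a rough illustration, not anything DEVONthink provides: the function names are my own, and it reports logical file sizes, which can differ a little from the "on disk" figure Finder shows. It relies on the fact that a database is a package folder, so its size is just the sum of the files inside it.

```python
import os

def folder_size(path):
    """Return the total logical size, in bytes, of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted as a second copy.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def megabytes(num_bytes):
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000
```

Run `folder_size` once on the source folder, once on the indexed test database, and once on the imported test database. For the indexed case the total footprint is the source folder plus the database; for the imported case it is the database alone once the originals are removed.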

BUT – a few caveats. If you index files and then start moving them around in the database, or in the filesystem with Finder or something like it, DEVONthink can lose track of them. I find the best practice is as follows: before you index, move all the files that are going to be indexed into a single parent folder and index that parent. The indexing will cover all the child folders and files, and as long as you keep your reorganizing within that single parent folder, you will not encounter the “missing files” syndrome in DEVONthink.

On the other hand, if your collection of files is not going to be referenced frequently, then putting them into a database is probably a waste of your time. Spotlight – or more sophisticated tools like FoxTrot – will serve you just fine.

The “index vs import” quandary has occupied hundreds of threads in the DEVONthink forum – which is a good place to go if you want from-the-horse’s-mouth advice. Indexing is a bit more reliable in DEVONthink 3 than in DEVONthink 2, and is a very valid way of using DEVONthink. The discussion on which to use becomes rather talmudic and arcane – bottom line, again, is that if you will not actually use those files within DEVONthink, then do not bother.

IMO, DEVONthink makes a lousy archive if all you want is an archive – because you’re adding overhead that you will not be using.

Finally, some folks worry about lock-in with DEVONthink for files stored internally. Files are always stored in their native format; DEVONthink does not change files. And a hierarchy of DEVONthink groups, including child groups and files, can be exported at any time, with the same structure in the file system that the hierarchy has in a DEVONthink database. Even in the worst case, where DEVONthink hasn’t been maintained in years and macOS for some reason cannot run the software, the database itself is nothing but a bunch of files in a package folder, and your files can be retrieved – although the internal folder structure will not be meaningful to you.
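
To make that worst case concrete, here is a hedged Python sketch of pulling documents out of a package folder without the app. It assumes nothing about the package’s internal layout; it just walks every subfolder and copies out files with the extensions you ask for. The function name, the default extension list, and the renaming scheme are my own choices for illustration, and the destination folder is assumed to sit outside the package. Work on a copy of the database, never the original.

```python
import os
import shutil

def recover_files(package_path, dest_dir, extensions=(".pdf", ".rtf", ".txt")):
    """Copy files with the given extensions out of a package folder into dest_dir.

    The internal folder structure is ignored; name clashes get a numeric suffix.
    Returns the list of paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for dirpath, _dirs, files in os.walk(package_path):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() not in extensions:
                continue
            target = os.path.join(dest_dir, name)
            counter = 1
            # Avoid overwriting an earlier file that had the same name.
            while os.path.exists(target):
                target = os.path.join(dest_dir, f"{stem}-{counter}{ext}")
                counter += 1
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(target)
    return copied
```

The result is a flat folder of documents, which matches the caveat above: you get the files back, but not your group hierarchy.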

Thank you for such a helpful response! My main use of DT is for archiving so you probably have answered my question—just put everything in Finder and use Spotlight.

I have a follow-up. If I decide to export all of my personal files (perhaps from all three databases) from DT to the Mac and use Finder, I will end up with a lot of duplicates and a mess to clean up.

Is there a way to find file duplicates on the Mac? I don’t use scripts but I’m perfectly willing to buy and use a third party app to find and delete duplicates. Any recommended Mac apps for this if it can’t be done with Finder?

Thanks again!

There’s Gemini 2 and other options like Beyond Compare for finding dupes. Gemini is probably your best choice. Be wary, though: it’s easy with this sort of software to zap something you didn’t mean to zap. Don’t do this on the day you’re prepping to testify before the Senate or some other stressful exercise.

The best way to avoid duplicates is first to find them in the file system, before exporting from DEVONthink. Then after exporting, zip up the database and move it off your machine to some other storage device for off-machine archiving.
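
For anyone who would rather see what a duplicate finder does under the hood, here is a minimal Python sketch. It is not how Gemini 2 or Beyond Compare work internally (I don’t know their methods); it is a common approach where files are grouped by size first and then by a SHA-256 hash of their contents. The function names are mine. It only reports duplicates and deletes nothing, which keeps the zap risk at zero.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # ignore symlinks; they are not real copies
            by_size[os.path.getsize(full)].append(full)

    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        for path in paths:
            by_content[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_content.values() if len(paths) > 1]
```

Each returned group is a set of identical files; which copy to keep is still a decision for a human to make.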

I have eliminated the duplicates within DT but I just know that I’ll have duplicates on my Mac once I export several thousand files from DT to the Mac. I’ll try Gemini 2 and Beyond Compare. Have you tried both and do you have one you recommend over the other?

I think it’s a draw. Different features, different fit and finish, but about the same outcomes.

Thanks again for your kind help.

DT is my favorite archiving tool. It far surpasses Spotlight, to the point that I never use Spotlight. The majority of my stuff is indexed in a DT database, where I index my top folder, called File_Cabinet.

I do have a DT database that has smaller items, but most documents, PDFs, receipts, etc. are stored in a Finder folder and I use DT to index it.

That way I can add stuff to a folder in the File_Cabinet folder and then update indexed items in DT easily, and get all the benefits of robust search in DT and all the ease of directly going to things in Finder.

But the whole purpose of an archive is to access and search it and extract useful information from it when needed. So IMO DEVONthink is the superior tool for that job. :)

What do you mean by archiving?

To me that means a collection of stuff that is easily searched and used as a reference. An archive in the library sense. So for me DEVONthink is the best tool for creating a robust searchable archive.

Do you use “see also” or any of DEVONthink’s non-Boolean, AI search and classification features?

Thank you for this topic. I’m currently struggling with this very question — and whether to use DEVONthink at all. And if so, how to use it.

I’ve been going back and forth on this for two years!

That is essentially what I mean. I should add that I seldom access the archived documents. They are there when needed, e.g., tax papers, receipts, research documents (active research documents related to writing are kept in the research folder of Scrivener and then removed when I’m finished).

True, but that’s not what I said. If “all you want is an archive” (a virtual box-o-stuff) then “you’re adding overhead that you will not be using”. If you want to use the archive for data mining, possibly to create information, then the DEVONthink overhead can be very useful.

Two different use cases.

As @quorm said, there are many tales of woe on the DT forums, including mine. I’ve blocked that out of my memory now, but I recall one woe involved moving a folder that DT had indexed, and the other woe (possibly combined with the first woe to make a third woe), was that I had some items “in” a database, and some indexed in the same database. It can get ugly, and (as I know you are aware) backups are essential before doing anything invasive.

I think the difference is based on what we think archive means. I use the definition that an archive is an easily accessible collection of materials about a person, place, business, subject or group. In other words a living collection of useful stuff. Not some dusty place files go to die.

Things not accessed often are not technically to me an archive. They may be archived but they are not an archive. But I have a hard time making such distinctions. To me it’s all useful and must be readily accessible and searchable.

After all, I’m the person working GTD projects that have been going on for 30+ years, and I regularly use and reference emails I sent or received 20+ years ago and more. I have hundreds of documents that are no longer available on the net that I found there originally 35 years ago.

Very rarely. Only when boolean logic doesn’t find what I am looking for. Usually that works best for me.

Suggests to me that maybe your DEVONthink use predates HoudahSpot and other modern Mac search tools, as well as modern iCloud sync to iPad/iPhone and multiple Macs. And that you stick with DEVONthink because it works for you, and that you might not use it if you were starting today?

Or do you get value from DT that is still not available from other tools?

Forgive me if you’ve already covered this — I know you’ve discussed DT here and elsewhere. I’m trying to figure out whether to continue my on-and-off relationship with DT or quit for good.

DEVONthink is not only a means to search for documents (similar to the others you mentioned) but also a platform for creating new content in ways that are much more convenient than those alternatives.

If you only collect documents and never annotate them, create notes about them, or create metadata about them, then maybe the others would suffice. But if note-taking, creation of related documents, annotation, or creation of custom metadata about the documents are important to you, then DT3 has no true competition.