Tags - what are they good for?

I rarely use tags. I’d be likely to micromanage.

I try to get a maximum amount of info in the file folders’ names.

I did start using them with Things 3 and that’s about it.

But now I know why other people use them.

Absolutely agree. I think in tabs for sure. But don’t use them on IT. I have DEVONthink 3 and more or less just look for stuff from my own mind and via Houdah Spot now. I have folders for some things out of neatness and I like a demarcation between areas of my life, not for ‘finding things’ though.

Using both groups and tags is definitely acceptable.
They aren’t contrary to each other and both are useful, even together.

As an example, say I own a rental property.
I could create a group for 123 Any Street.
Files in that group could be tagged real estate, investments, or retirement, etc.
You could even tag the files with a city, state, or country name, e.g., Anytown, FL, US.

You could then search for tags:FL to look for any files - in this case rental properties - in Florida. Or create a smart group to display rentals in Anytown.
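To make the idea concrete, here is a minimal Python sketch of tag-based filtering. It is not DEVONthink's API: the in-memory `files` table, the second property and the `with_tags` helper are all made up for illustration.

```python
# Hypothetical model: file name -> set of tags (an assumption, not DEVONthink's storage).
files = {
    "123 Any Street/lease.pdf": {"real estate", "investments", "Anytown", "FL", "US"},
    "123 Any Street/insurance.pdf": {"real estate", "Anytown", "FL", "US"},
    "456 Other Road/lease.pdf": {"real estate", "Othertown", "GA", "US"},
}


def with_tags(files, *wanted):
    """Return the names of files that carry every tag in `wanted`, sorted."""
    required = set(wanted)
    return sorted(name for name, tags in files.items() if required <= tags)


# Roughly what a search for the FL tag would return.
print(with_tags(files, "FL"))
# Roughly what a smart group for rentals in Anytown would show.
print(with_tags(files, "real estate", "Anytown"))
```

Requiring several tags at once is just a subset test, which is why tags combine so easily in searches and smart groups.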

Do bear in mind, you should be judicious with your tagging.
If you’re generating thousands of tags, this could affect the performance of apps on the machine, not just DEVONthink.

Also another downside of huge tagging structures (say more than one hundred!): there is definitely a mental overhead when entering stuff into the system, because every time you move something out of the Inbox you need to remember your taxonomy of tags and set them accordingly.

I used to have lots of tags in Evernote because you could not nest Notebooks in it, and I finally ended up resorting to Evernote’s powerful search. Now in DEVONthink I can nest Groups, so the structure becomes visually more obvious (and I have the powerful search feature too =) ).

So, to answer @TudorEynon’s point: yes, you can mix folders/groups with tags, but be liberal with groups, and conservative with tags.

Thanks for the follow-up and yes, I’d agree with your general assessment. :)

Don’t you find you duplicate your renaming rules across the different rules?

No - the renaming rules generate different names for things like energy invoices, bank statements, investment reports, service charges, receipts, etc. All have date matching in common (so that some form of the date of the invoice will appear in the renamed file) but the name generated by each rule is unique.

The “move” part of the rules is common because all documents are moved to the DEVONthink global inbox for automatic filing by a number of DEVONthink smart rules (assisted by the unique names of the documents being filed).

Stephen

Tags and folders are the best we have for file management, but every implementation that I have encountered – be that in the file system or document management systems like DEVONthink – is fundamentally flawed.

As soon as you have more than a handful of files, you start needing some form of file management to locate and retrieve them. The strategy for which we reach is that of divide and conquer; we break things into groups of like objects and, if any group is too large to be easily comprehended, subdivide into smaller groups as needed. In doing this you are forming sets and proper subsets within them. In the file system, sets are folders and subsets nested folders.

Here lies the first problem. File system search operations, like listing a folder (directory), should by default be recursive and they are not. If I have a folder structure for, say, years and months (2021/January, 2021/February etc.) and I request a list of the 2021 files, what I get is the folders January, February … not the files within those folders. This is counterintuitive. If I have a set, its content is its closure: all of its members, whether or not they are also members of a proper subset.
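As a rough illustration of the difference, here is a small Python sketch that contrasts a plain folder listing with a recursive one. The `files_in` helper and the sample file names are invented for the example; only the 2021/January and 2021/February layout comes from the post.

```python
import os
import tempfile


def files_in(folder):
    """List every file under `folder`, however deeply nested, as bare names."""
    found = []
    for _dirpath, _dirnames, filenames in os.walk(folder):
        found.extend(filenames)
    return sorted(found)


# Build a throwaway 2021/January, 2021/February tree to try it on.
with tempfile.TemporaryDirectory() as root:
    for month, name in [("January", "invoice.pdf"), ("February", "statement.pdf")]:
        os.makedirs(os.path.join(root, "2021", month))
        with open(os.path.join(root, "2021", month, name), "w"):
            pass
    year = os.path.join(root, "2021")
    print(sorted(os.listdir(year)))  # default listing: only the month folders
    print(files_in(year))            # the "closure": the files inside them
```

The recursive version treats the year folder as a set whose members include everything in its subsets, which is the behaviour argued for above.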

The second problem is that, despite the fact that Unix-like file systems have since their inception supported hard links – i.e. the ability for an inode (an actual “disk” object) to be referenced from more than one directory – it remains spectacularly difficult, if not impossible, to create such links in the Finder, or indeed any other file manager of which I know.

Hard links are, however, conceptually powerful. They allow you to locate a file within multiple, independent classifications. I can have a client/project folder structure alongside my year/month structure and the same file can be saved into or found by navigating either. The spectacular difficulty lies, I suspect, in designing an intuitive file save dialogue that lets you specify not one but a list of folders into which a new file should be saved.
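Outside the Finder this is easy enough to do. A minimal Python sketch, assuming a file system that supports hard links; the folder names follow the post, while `file_in_both` and `report.txt` are made up for the example:

```python
import os
import tempfile


def file_in_both(root):
    """Create one file and hard-link it into a second folder structure."""
    by_client = os.path.join(root, "Bob", "Project X")
    by_date = os.path.join(root, "2021", "January")
    os.makedirs(by_client)
    os.makedirs(by_date)
    first = os.path.join(by_client, "report.txt")
    second = os.path.join(by_date, "report.txt")
    with open(first, "w") as f:
        f.write("draft")
    os.link(first, second)  # a second directory entry for the same inode
    return first, second


with tempfile.TemporaryDirectory() as root:
    first, second = file_in_both(root)
    # Both names refer to the same object on disk.
    print(os.stat(first).st_ino == os.stat(second).st_ino)
    # The link count records how many names the file has.
    print(os.stat(first).st_nlink)
```

Neither path is the "real" one; both directory entries are equal names for the same data.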

Even that, however, would not be enough. Although hard links let you assert that a given file belongs to multiple sets, including nested sets, when searching you also need to be able to find the intersection of sets. Again this is a file management application/user interface problem. I want only the files for Bob’s project X from January 2021; the file system alone lets me get either project X or January 2021, but not the intersection.
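If files were hard-linked into both structures, the intersection could in principle be computed by comparing inodes. The following Python sketch does that under stated assumptions: both folders are reachable from the script, and the helper names `inodes_under` and `intersection` are hypothetical.

```python
import os
import tempfile


def inodes_under(folder):
    """Map (device, inode) -> one path, for every file below `folder`."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            seen[(st.st_dev, st.st_ino)] = path
    return seen


def intersection(folder_a, folder_b):
    """Paths under folder_a whose file is also linked somewhere under folder_b."""
    a, b = inodes_under(folder_a), inodes_under(folder_b)
    return sorted(a[key] for key in a.keys() & b.keys())


with tempfile.TemporaryDirectory() as root:
    project = os.path.join(root, "Bob", "Project X")
    january = os.path.join(root, "2021", "January")
    os.makedirs(project)
    os.makedirs(january)
    for name in ("report.txt", "notes.txt"):
        with open(os.path.join(project, name), "w"):
            pass
    # Only report.txt is also filed under January 2021.
    os.link(os.path.join(project, "report.txt"), os.path.join(january, "report.txt"))
    print([os.path.basename(p) for p in intersection(project, january)])
```

The set operation itself is trivial; the missing piece is a file manager that exposes it.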

To address this many seem to suggest that search, not structure, is the solution. Rather than place a given file within some fixed structure, just search by date or other item of metadata. A tag is just an item of metadata attached to a file that represents membership of a set, so the intersection required is a matter of filtering the January 2021 folder by the Project X tag. If need be, one can embed that search as a “smart folder”, which would be fine if you could also save files to smart folders and have them tagged to meet the search criteria.

The problem with tags, however, is that there is no way of imposing structure on them. There can be a tag for “Bob” the client and a tag for project “X”, but one can’t establish that project X was one of Bob’s projects. At a more general level, even if one could relate tags to one another, you also need to be able to state the nature of that relationship. To understand this better a good place to look is Princeton’s WordNet. Tags are words, typically nouns; and nouns can be related to one another in many ways – hyper- and hyponyms, synonyms and homonyms to name the most common.
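To sketch what typed relationships between tags might look like, here is a small Python example. It is purely illustrative: the relation names (`belongs_to`, `kind_of`, `same_as`), the extra tags and both helper functions are assumptions, not something WordNet or any tagging system defines in this form.

```python
# Hypothetical relations between tags, stored as (tag, relation, other tag).
RELATIONS = [
    ("Project X", "belongs_to", "Bob"),
    ("Project Y", "belongs_to", "Bob"),
    ("invoice", "kind_of", "document"),  # hyponym -> hypernym
    ("bill", "same_as", "invoice"),      # synonym
]


def related(tag, relation, relations=RELATIONS):
    """Tags that `tag` points to through `relation`."""
    return sorted(o for s, r, o in relations if s == tag and r == relation)


def members(parent, relation, relations=RELATIONS):
    """Tags that point to `parent` through `relation`."""
    return sorted(s for s, r, o in relations if o == parent and r == relation)


print(members("Bob", "belongs_to"))    # which projects are Bob's
print(related("invoice", "kind_of"))   # what an invoice is a kind of
```

Naming the relation is the point: "Project X belongs to Bob" and "an invoice is a kind of document" are different statements, and a flat tag list can express neither.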

As an aside, the metadata problem goes deeper in that as far as I know you can’t declare the type of, or value constraints for, metadata values; I can’t, for example, enforce that a metadata item I expect to be a date is in fact a date within some legitimate range, or that a “contract type” value can accept only one of a predetermined set of legitimate values.
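For illustration, this is roughly what typed and constrained metadata could look like if an application enforced it. A Python sketch only: the `ContractMetadata` class, the allowed contract types and the 1990 lower bound are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical set of legitimate values for a "contract type" field.
CONTRACT_TYPES = {"lease", "service", "employment"}


@dataclass
class ContractMetadata:
    signed: date
    contract_type: str

    def __post_init__(self):
        # Type constraint: the value must really be a date, not a string.
        if not isinstance(self.signed, date):
            raise TypeError("signed must be a date")
        # Range constraint: reject dates outside an assumed legitimate window.
        if not date(1990, 1, 1) <= self.signed <= date.today():
            raise ValueError("signed is outside the accepted range")
        # Enumeration constraint: only predetermined values are allowed.
        if self.contract_type not in CONTRACT_TYPES:
            raise ValueError(f"unknown contract type: {self.contract_type!r}")


print(ContractMetadata(date(2021, 1, 15), "lease"))
```

With checks like these, a mistyped date or an unknown contract type would be rejected when the metadata is entered rather than discovered later during a search.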

The infrastructure – the file systems under our feet, so to speak – is what it is. Whilst the file system would be the best place to structure metadata, that’s not going to happen. What’s needed, or perhaps I should say what I’d like rather than need, is a file management application – a new and better Finder replacement – that (i) treats sets properly and understands set operations; (ii) supports set “meta-relationships” in the manner of WordNet; and (iii) allows types and constraints to be imposed on metadata.

Comments and thoughts welcome!

The difficulties listed here exemplify why I think the operating system needs to do a better job at being the external brain. There are several third-party solutions that attempt to make data storage, classification, and retrieval easier, but to me it really seems like it should be part of the OS. I wrote about this years ago on my blog, but basically, I’d like to see macOS (and perhaps iPadOS?) incorporate the abilities of Hazel with the AI classification and “see also” capabilities of DEVONthink.

There are workaround ways to address specific issues that you list, but the overall problem is that computers today simply aren’t good enough with large amounts of unstructured data. What I’d like is to be able to dump whatever I want into it, and let the system classify, name, tag, and store the information in the appropriate place and make it easy for me to find it later.

The combination of Hazel rules, a folder hierarchy, and Finder tags does a decent job for me, but it’s still got a long, long ways to go.

As soon as you have more than a handful of files, you start needing some form of file management to locate and retrieve them.

Actually, tags are one way of relieving the need for a hierarchical filing structure.
Files can be searched for by tags.
Data segregation can be done in the Finder via smart folders and in DEVONthink via smart groups, creating virtual containers of related items regardless of where the file is actually located.

That being said, I am not suggesting people give up folder structures unless it makes sense to them. A combination of the two can be a useful and powerful thing.


Re: Hardlinks. Yes, they are a powerful concept but their utility is reduced by the way most applications save their files. On saving, a new file, hence a new inode, is created so the hardlink is broken.

tritonx@TritonX test for smartrule % ls -li e.txt
40835827 -rw-r--r--  1 tritonx  staff  5 Sep  2 09:36 e.txt
# Inode: 40835827
tritonx@TritonX test for smartrule % open e.txt
# Open the file and make a small change.
tritonx@TritonX test for smartrule % ls -li e.txt
40839353 -rw-r--r--@ 1 tritonx  staff  22 Sep  2 10:07 e.txt
# Inode after save: 40839353

On a side note: DEVONthink has replicants, which function like hardlinks. Changes to any instance of a file propagate to the others.

Thanks and to @DEVONtech_Jim too. I might try them again on that basis.

This is why in the QDA world, it’s recommended that you keep a code (i.e. tag) book with definitions from the outset. See Codebook with Category Definitions - MAXQDA for a fairly sophisticated implementation.

Another thing that really helps is smart prompting with existing tags - a lot of my tagging is done by copying keywords (i.e. tags) to BibDesk file attachments, and I’ve put together a script which will prompt me with a list of keywords when I add a new publication. It’s not terribly sophisticated - it assumes authors write about the same topics and journals publish on the same topics, but that’s enough to get me about 40% accuracy & I edit tags from there.
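This is not the script described above (that one works against BibDesk), but the underlying heuristic can be sketched in a few lines of Python. Everything here is hypothetical: the `LIBRARY` entries, the field names and the `suggest_keywords` function are invented to show the idea of ranking existing keywords by shared authors or journal.

```python
from collections import Counter

# Hypothetical library of publications that already carry keywords.
LIBRARY = [
    {"authors": ["Smith"], "journal": "Journal A", "keywords": ["tagging", "search"]},
    {"authors": ["Smith", "Lee"], "journal": "Journal B", "keywords": ["tagging"]},
    {"authors": ["Jones"], "journal": "Journal A", "keywords": ["search", "metadata"]},
]


def suggest_keywords(authors, journal, library=LIBRARY, limit=5):
    """Rank existing keywords by how often they appear on entries that
    share an author or the journal with the new publication."""
    counts = Counter()
    for entry in library:
        if journal == entry["journal"] or set(authors) & set(entry["authors"]):
            counts.update(entry["keywords"])
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _count in ranked][:limit]


print(suggest_keywords(["Smith"], "Journal A"))
```

The suggestions are only a starting list to edit, in line with the rough accuracy mentioned above.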

What’s so hard about using ls -R? Or column view in Finder? Not to be flip but this seems very easy to overcome.

Nothing per se, I’ll just comment that ls -R is not the default behaviour for ls and that it reports full pathnames relative to the current working directory, not filenames (and yes, I know I could pipe the results through grep to strip them).

If you view subdirectories as proper subsets, the pathname conflates the name of the file (object) with the name of the sets (and supersets) containing it. That is clearly valid behaviour for defining an object locator (address), but the initial problem – the problem that folders and tags and metadata all attempt to solve (badly, I suggest) – is actually knowing what we have. In the parlance of the Web, a filename is a URL, but when we’re searching for something we’re likely more interested in its URI.

@DEVONtech_Jim Hmmm… is that a side effect of copy on write? When I used to write Unix software (for Sun workstations, which I guess dates it) we used hard links all over the place and I don’t recall that being a problem. Moving to Mac OS X I stopped relying on them because you couldn’t create them in Finder and I got too lazy to keep doing everything in a terminal window!

It is due to the use of atomic saves, which save changes to a new file and then replace the original with it. The new file obviously gets a new inode. The benefit of an atomic save is that if there is a crash while saving, at least the contents of the old file are still preserved. The downside for us nerdier folk ;) is that it makes hardlinks useless regarding file changes.
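The effect is easy to reproduce. Here is a minimal Python sketch of an atomic save and what it does to a hard link; the `atomic_save` helper and the `.tmp` suffix are assumptions for illustration, not how any particular macOS application implements saving.

```python
import os
import tempfile


def atomic_save(path, text):
    """Write `text` to a temporary file, then swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # the name now points at the new file's inode


with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "e.txt")
    link = os.path.join(root, "e-link.txt")
    with open(original, "w") as f:
        f.write("old")
    os.link(original, link)
    before = os.stat(original).st_ino

    atomic_save(original, "new contents")

    # The saved name now refers to a different inode than before.
    print(os.stat(original).st_ino != before)
    # The other hard link still refers to the old inode and the old text.
    with open(link) as f:
        print(f.read())
```

After the save the two names no longer share data, which is exactly the broken link shown in the terminal session earlier in the thread.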

@DEVONtech_Jim Ahh, thanks. When you were working off a 100Mb (yes, Mb, not Gb) spinning disk…

Also, removing a hard linked file… does not recover disk space until the other hard links are removed.
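A short Python sketch of that behaviour, using the link count the file system keeps for each file. The `link_count` helper and the file names are made up for the example.

```python
import os
import tempfile


def link_count(path):
    """Number of directory entries that refer to this file's data."""
    return os.stat(path).st_nlink


with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "a.txt")
    second = os.path.join(root, "b.txt")
    with open(first, "w") as f:
        f.write("data")
    os.link(first, second)
    print(link_count(first))   # two names for one file

    os.remove(first)           # removes one name, not the data
    print(link_count(second))  # the count drops by one
    with open(second) as f:
        print(f.read())        # the contents are still on disk
```

The space is only released once the last remaining name is removed and the count reaches zero.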

Hard links can be confusing for regular users, which is why it’s an operating system feature that Apple decided not to put into the Finder.

Two other problems with hard links – they don’t cross file systems and you can’t hard link folders.

True story

you can’t hard link folders

I sometimes wished Apple had let us in on the black magic they use in Time Machine to hardlink directories in their backups.