Thoughts on file naming and organization

nlippman · May 11, 2019, 2:49pm

Advance warning: this post is probably going to be too long.

I have been thinking about how I name and organize my files. I wanted to throw out a collection of my thoughts to see what others think and might suggest. Like probably everyone else here, I have a huge number of files which I have tried to organize in some fashion that allows me to find things, but I often find myself searching (sometimes unsuccessfully) for what I need. So here’s what I have been trying to think through.

My current scheme is basically to try to name files with something relatively descriptive, and then to put files in some fashion into a rather complex folder structure that in theory lets me chase down what I need to by drilling down through the folders. I often append a date (in yyyy.mmdd format) to files when “relevant.” For example, my checking account statement for April 30th would be called “checking statement 2019.0430.pdf” and would eventually be filed somewhere like Scans/Bank Statements/Checking Account/checking statement 2019.0430.pdf.

I made the decision to append the date rather than prefix it in the filename so that if I had multiple different types of files in a given folder they would sort by the type of file, eg all the “checking account” files would sort together as would all of the “savings account” files if they were in the same folder, while if I prepended the date, then they would be intermixed. This is, in retrospect, unnecessary if these files are in different folders, of course.

I use tagging in a fairly limited fashion so far, primarily as a trigger for Hazel to be able to sort files. For example, I have a single folder called Dispatch where all scans wind up, and once I add a tag “personal” then Hazel can move the file into the folder tree for my personal files, where it can then be further sorted based on filename or content to the proper folder, while files tagged “business” can be similarly sorted by Hazel into business folders. I haven’t adopted a more extensive tagging system beyond a few special cases to assist in sorting and finding.

Part of the problem here is that a) my rules for Hazel are incomplete, so the folder of things that need sorting is generally packed with files, b) many things that are scanned or otherwise received or created are one-offs that don’t necessarily fall into the preset Hazel rules, and c) a mistake in naming the file causes it to remain unfiled.

Recently there was a posting here referring to an old blog post by “Dr. Bunsen” where he talked about how he handled files, and I found that post very interesting. He names all of his files with a prefix that starts with a date time (yyyymmdd.hhmmss) followed by his own sort of two-character coding scheme to indicate the type of file (he has a series of categories and urgency levels), followed by a filename that is descriptive and then an extension which sets the file type (eg .pdf).

I was struck by a few things here. Firstly, he is essentially encoding what would otherwise be considered metadata into the filename (eg the creation date/time). Secondly, by doing it this way, the files in a folder are by default represented in a chronological fashion (with alphabetic sorting, not depending on setting the sort order to “by date”), which has some appeal to me given that it may be more common to remember roughly when I created a file compared to what I might have named it. Arguably, his encoding of the file designator is basically putting a single tag into the filename.

He basically uses only three folders: an Inbox where everything created is initially swept, an Active folder for files he is working on, and an Archive folder for things he is no longer actively working with.

I would not necessarily adopt the three folder approach, as I still think there is value in having to some extent a folder structure to group files related to a specific activity or task, but I am less convinced that the extensive folder tree that I have put together is useful either given the need to search and drill through so many levels to find things. For example, it may be easier to search for all files in a folder named “All Bank Statements” for any files with “checking account” in the name, especially with advanced search tools including Spotlight indexing and HoudahSpot.

Further, the three folder paradigm could be imposed as a meta-folder structure. For example, using tags “active” and “archive” I could have all files swept daily into an Inbox folder, then go through them and add active and archive tags. I would then move the files into a shallower folder structure as may be relevant for filing away (bank statements) or into a working folder tree for things I am working on, and have smart folders based on files with these respective tags, creating a meta-folder structure. It seems to me this might be a good working paradigm.

In contrast, Brett Terpstra has posted on his blog about a much more complex tagging scheme that he uses with his “tagfiler” script, which auto-files things into what sounds like a fairly complex folder structure based on a hierarchical tagging scheme, which he then depends upon for complex searching. While intriguing, I suspect this will be too complex for me.

I have also debated the idea of incorporating file metadata into the filename (eg the file creation date and a category designator as Dr. Bunsen does) vs using filesystem metadata for this purpose. Using filesystem metadata works better with the Spotlight-based indexing system of macOS, but makes the files less portable to non-macOS systems. That is not a major impediment, though, as it would be easy to create a script to rename files by inserting metadata into the filename, or conversely to extract metadata from the filename and create tags with it.

Not every file I create would necessarily need such a complex name. For example, my photography library would probably not benefit from adding the date/time stamp into the filename, because those files are not really organized and utilized the same way as other files in my file system.

I am not sure I would necessarily put a category designation into the filename because I am not sure I could easily create a comprehensive listing of categories I would use a priori, vs simply adding tags as need arises.

So, what I am leaning towards is the idea of naming all files in a format with a leading date time stamp (yyyy.mmdd-hhmmss), an underscore, then descriptive words for the file content, and the file extension. I would add tags as I presently do to aid Hazel in filing, recognizing that a lot might wind up unfiled and simply be stored in an unfiled folder unless/until a purpose arose for creating a folder to group some files together. Additional tags would be added as appropriate, but the number of tags would be kept relatively small to make it easy to find and search using built-in tools and/or HoudahSpot. My current archive of files would be rearranged (over time, as I had free time for this) to reduce the huge number of folders presently there, largely based on need. For example, the frequency with which I need to pull out the past 2 years of statements for my checking account is very, very low, so searching is a better solution than putting in the effort to sort them into folders, and searching still makes them easy to find as needed.
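As a rough illustration of the format described above (date time stamp, underscore, descriptive words, extension), here is a minimal Python sketch. The function name `stamped_name` and the choice to join the descriptive words with underscores are assumptions made for the example, not something settled in this post.

```python
from datetime import datetime


def stamped_name(description: str, extension: str, when: datetime) -> str:
    """Build a name like 'yyyy.mmdd-hhmmss_descriptive_words.ext' (hypothetical helper)."""
    stamp = when.strftime("%Y.%m%d-%H%M%S")
    # Assumption: whitespace in the description becomes single underscores.
    words = "_".join(description.split())
    return f"{stamp}_{words}.{extension.lstrip('.')}"


example = stamped_name("checking statement", "pdf", datetime(2019, 4, 30, 9, 15, 0))
print(example)  # 2019.0430-091500_checking_statement.pdf
```

Because the stamp is zero-padded and ordered from year down to second, a plain alphabetical sort of such names is also a chronological sort.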

I hope that some of you have made it this far and can provide me with some thoughts and ideas as I start to think about how I will improve my filing system and access to files.

Joost · May 11, 2019, 2:54pm

I’d say, spend a few bucks on DevonThink and do all your organizing there. Much more flexible, much more capable than a coded approach as you describe.

JohnAtl · May 11, 2019, 4:00pm

Chiming in on DEVONthink, it has the ability to automatically categorize documents after it sees where you’ve put similar documents. It also has smart rules to automate processing of files. more here

I don’t use these features (yet), perhaps someone else could elaborate on their workflows?

ddunbar3 · May 11, 2019, 4:17pm

If you are mentoring or collaborating and have to trade files back and forth, include a revision number and perhaps the initials in the file naming scheme. Being able to find the last edited copy that you made can save you when a junior staffer makes a major formatting/editing/deletion mistake.

JohnAtl · May 11, 2019, 6:36pm

This tutorial on ScreenCastsOnline goes into the details of DEVONthink’s Auto Group and Auto Classify.
Parts one and three of the tutorial are worth a watch too.

dfay · May 11, 2019, 8:37pm

For most purposes a time stamp in the file name is probably overkill - the date should suffice and looks cleaner.

OogieM · May 12, 2019, 1:00pm

The long form is:

I am extremely cloud averse. Not just for security but that is now my main reason. I do my digital filing almost entirely on my own machines. The one exception is starting to move some old archive files to cold storage on Amazon Glacier.

First off, file for retrieval, not for storage. I.e., file things based on how you will search for them, not to be efficient. If you are looking for a file and it’s not in the first place you look, when you eventually find it move it and its brethren to the place you first looked.

Define a clear naming scheme that is platform independent in case you switch operating systems. Specifically no spaces in file names for example so I could switch to UNIX if necessary. Define the naming system so that looking at a name will get you most of the way to knowing what’s inside the folder or the file.

Standardize how you will use dates, if any, and in what formats, keeping in mind how computers sort things. Dated files are in the format YYYY-MM-DD_filename.extension. Circa dates use -00- in place of any missing data. Dates that are not circa but known only to a single level just go that far, i.e. 2016_filename.extension or 2016-01_filename.extension. Range dates use _ between the two dates, i.e. 2014-10-05_2015-01-01_filename.extension.
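The date conventions above can be sketched in Python roughly as follows. This is only one reading of the scheme: the helper names `date_prefix` and `dated_name` are made up for the example, and treating "circa" as "pad every missing part with 00" is an assumption about what is meant.

```python
from typing import Optional


def date_prefix(year: int, month: Optional[int] = None,
                day: Optional[int] = None, circa: bool = False) -> str:
    """Format a full or partial date as YYYY, YYYY-MM or YYYY-MM-DD (hypothetical helper)."""
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
        if day is not None:
            parts.append(f"{day:02d}")
    if circa:
        # Assumption: circa dates fill each unknown part with 00.
        parts += ["00"] * (3 - len(parts))
    return "-".join(parts)


def dated_name(filename: str, *prefixes: str) -> str:
    """Join one date (or two, for a range) and the filename with underscores."""
    return "_".join([*prefixes, filename])


print(dated_name("filename.extension", date_prefix(2019, 4, 30)))
print(dated_name("filename.extension", date_prefix(2016, 1)))
print(dated_name("filename.extension", date_prefix(2016, circa=True)))
print(dated_name("filename.extension", date_prefix(2014, 10, 5), date_prefix(2015, 1, 1)))
```

All of these forms sort sensibly with a plain alphabetical sort because every part is zero-padded and ordered from largest unit to smallest.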

Use standard file formats that are open source or ubiquitous for my system as much as possible. (ODT, ODS, PNG, TIFF, CSV, SQLITE, JPEG, PDF, ZIP, DNG etc.)

Define a very flat filing system that mimics a flat paper system. I have 5 digital “file cabinets” as folders on my main computer. They are labeled Active_Projects, File_Cabinet and then 3, one for each organization where I am a current officer. Within those I have a single layer of folders that sort by name. In my system all someday/maybe folders are kept in my main file cabinet folder, not separate. I don’t tag and I prefer a large number of folders in a flatter structure rather than nested folders. If I have collections of files with lots of metadata, as in my photos, those data are cataloged in something else (in my case LightRoom) and not in the filing structure. I’ve been burned by Finder file metadata being lost through transitions into and out of systems so I never depend on it. If the file needs metadata attached to it I make sure it’s in its own separate database somehow.

Decide whether to lump all someday/maybe and waiting for files into the main system or into separate folders. I ended up putting them all into my File_Cabinet folder and do not segregate reference from project support for currently inactive files.

Decide what parts of the system need to be mobile. I use DEVONThink for all electronic filing I need to have with me all the time. I am experimenting with using DEVONThink to index my main filing cabinet folders so I can take advantage of the search functions.

Plan for a robust backup system that you also test regularly. (An untested backup is worthless. Make sure you can actually retrieve files from your backup by testing it.)

Plan for how to review the filing system to remove unneeded files on a regular basis.

And from experience:

When switching from whatever you are using now to a more complete and well defined system dump everything into a new folder called backlog and explicitly move files out as you rename them and decide where they should go. I didn’t and I’m still weeding out the junk within my otherwise nice clean system and I redid mine almost 10 years ago!

Edited to remove caret symbols causing all sorts of weird formatting.

anon41602260 · May 12, 2019, 1:39pm

I used Dr Bunsen’s method for many years after his original post. (His blog was great; too bad he stopped.) I have TextExpander snippets and DEVONthink scripts with fill-in forms to make the method easy.

But in recent years I have abandoned the whole practice of encoding meaning into names — other than using yyyymmdd in “important” files.

The reason: search on macOS is excellent. Spotlight, DEVONthink, etc. make fancy naming methods pointless. It is the content of files I need to search, and the name is superfluous. I rarely cannot find what I’m looking for.

nlippman · May 13, 2019, 6:34pm

Thanks to all who responded. Great input and very helpful.

I am leaning towards prefixing files with a date/time stamp, in the format yyyy.mmdd (which has been my longstanding date format) or yyyy.mmdd.hhmmss if I want the time included as well, with both formats being used as needed. Using ‘00’ for missing elements makes sense as well. For future parsing needs I will probably not truncate (eg yyyy.mm vs yyyy.mm00) because the former means more work parsing later, but I’m still thinking that through.

I have been a DevonThink user in the past, got away from it, but I may go back to it with version 3 - I downloaded the beta but have not had time to work with it as of yet.

I too store all of my data locally on DAS storage on my iMac Pro, with the data distributed between a folder structure only available on the iMac Pro (but shared via smb for remote mounting to my laptop) and a folder sync’d to laptop, iPhone and iPad via ResilioSync. I do have a backup strategy, which is based on daily Carbon Copy Cloner clones to a second DAS device (both are Drobos); TimeMachine to a separate drive; a daily bootable clone to another drive; and offsite via Arq to BackBlaze B2. It has been a while since I had to restore but I have done so from the Arq remote storage, and I check the CCC clones periodically to ensure they are working. Plus, both CCC and Arq email me on completion, which I always look for and check for success or failure.

Part of my interest in incorporating metadata into the filename is for platform portability. I have not, as @OogieM mentioned, had a Finder failure and lost metadata, but I did run into an issue where Sync.com (which I was using for a while) turns out not to sync Finder extended attributes, which severely messed up my Hazel-based workflow and other things. ResilioSync, fortunately, does not share this problem.

That being said, while I don’t plan to leave the Mac ecosystem any time soon, having documents portable to whatever the future may bring is not a bad idea. It would not be hard to script grabbing Finder metadata and adding it to file names if I really had to.

Previously, I tended to use a post-fix date format, eg files were named " yyyy.mmdd.extension". That actually led to a few issues: a typo in the descriptive information (say for a recurring document like a monthly checking account statement) would leave files out of sort order in a way that was not evident. For example, ‘chcking 2019.0430’ would not sort between ‘checking 2019.0331’ and ‘checking 2019.0531’ in an alpha sorting order as desired. Prefixing the date/time would fix that, and is probably worthwhile for me to pursue.
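The sorting problem is easy to reproduce with a plain alphabetical sort. The short Python sketch below uses made-up file names modeled on the checking statement example, with the same typo in both naming styles.

```python
# Date at the end: the misspelled name jumps ahead of its neighbours.
postfix_names = [
    "checking 2019.0331.pdf",
    "chcking 2019.0430.pdf",   # typo in the description
    "checking 2019.0531.pdf",
]

# Date at the front: the same typo no longer affects the order.
prefix_names = [
    "2019.0331_checking.pdf",
    "2019.0430_chcking.pdf",   # same typo
    "2019.0531_checking.pdf",
]

postfix_sorted = sorted(postfix_names)
prefix_sorted = sorted(prefix_names)

print(postfix_sorted)  # the April file sorts first, out of sequence
print(prefix_sorted)   # March, April, May stay in date order
```

With the date in front, the description only breaks ties between files that share the same stamp, so a misspelling cannot move a file out of chronological order.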

I also tended to use spaces as separators in descriptive naming which is actually a pain when scripting or working at the command line, so I am moving to the use of underscores.

Rather than moving the old files into a location and pulling them out as needed, I have written a script that can parse my old filename format and rearrange the elements to the new format. It replaces spaces with underscores, handles some common typos (eg date> (missing space between the elements), - (dash instead of space), etc.). I will probably run the script on folders as needed until everything has been converted over.
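The actual script is not shown in this thread, so the following Python is only a guess at what such a conversion might look like. The regular expression, the name `convert`, and the assumption that old names look like "description yyyy.mmdd.ext" (with the separator before the date possibly missing or typed as a dash) are all illustrative.

```python
import re
from typing import Optional

# Assumed old format: free-text description, optional space or dash,
# a yyyy.mmdd date, then the extension.
OLD_NAME = re.compile(r"^(?P<desc>.+?)[ \-]?(?P<date>\d{4}\.\d{4})\.(?P<ext>[^.]+)$")


def convert(old_name: str) -> Optional[str]:
    """Return the date-prefixed name, or None if the old name does not fit the pattern."""
    match = OLD_NAME.match(old_name)
    if match is None:
        return None
    # Replace runs of spaces in the description with single underscores.
    desc = "_".join(match.group("desc").split())
    return f"{match.group('date')}_{desc}.{match.group('ext')}"


for name in ("checking statement 2019.0430.pdf",
             "checking statement2019.0430.pdf",   # missing space
             "checking statement-2019.0430.pdf",  # dash instead of space
             "notes.txt"):                        # no date: left alone
    print(name, "->", convert(name))
```

The sketch only computes new names. Actually renaming on disk (for example with `pathlib.Path.rename`) is left out so a dry run can be reviewed first.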

Still a bit of an exploration for me, but getting there I think.

OogieM · May 13, 2019, 7:59pm

Might want to exclude the . from within the filename, or else lots of the rest will be considered part of the extension/filetype. I use - between elements of the same field and _ to separate fields.

nlippman · May 13, 2019, 8:57pm

@Oogie:

That’s a good point, and I have considered it. There has been many a battle on MPU about . vs - for date separator…

I might well decide to switch to a different delimiter. I won’t use underscore as that is the field delimiter between the date and the rest of the filename and also inside elements within the filename as well.

I have used the . character in my dates for years without running into issues with the extension, and parsing is easy by just pulling off the last field after splitting on the . character, but I can see the value in switching to a - for that purpose. Since I am using a script to rename the files anyway, and will be using some snippets (in Alfred as I have not gone back to TextExpander as of yet) to generate formatted datetimes, I might well just switch over to the dash character.

It will certainly make parsing easier in the future. Maybe I will have the script change all . characters in the body of the filename to a _ as well…
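For what it is worth, both ideas (pulling the extension off after the last . and replacing the remaining . characters) fit in a few lines of Python. The helper names below are hypothetical, and the sketch assumes the name starts with a date field that ends at the first underscore.

```python
from typing import Tuple


def split_extension(name: str) -> Tuple[str, str]:
    """Split on the last '.' only, so dots inside the date do not matter."""
    stem, dot, ext = name.rpartition(".")
    return (stem, ext) if dot else (name, "")


def normalize_dots(name: str) -> str:
    """Use '-' inside the leading date field and '_' in the rest of the name."""
    stem, ext = split_extension(name)
    # Assumption: everything before the first underscore is the date field.
    date, underscore, body = stem.partition("_")
    new_stem = date.replace(".", "-") + underscore + body.replace(".", "_")
    return f"{new_stem}.{ext}" if ext else new_stem


print(split_extension("2019.0430_checking_statement.pdf"))
print(normalize_dots("2019.0430_report_v1.2.pdf"))  # 2019-0430_report_v1_2.pdf
```

The standard library's `os.path.splitext` does a similar split and also copes with names that begin with a dot, which this sketch does not try to handle.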