How to organize files, redux

nlippman · June 1, 2019, 1:17am

Recently, I posted asking for suggestions on naming and organization of files, with my proposed file naming scheme. I got a lot of useful feedback which helped my thinking quite a bit, so I thought (since I always enjoy reading about how others have approached this problem) I would do a follow-up post on where I am in my organization and thinking so far.

I have decided on a scheme for file and folder naming that is a big change from my previous approach. I used to name many files in the format “some text date.ext”, where the date was in the format yyyy.mmdd. I do a reasonable amount of work in the terminal, and those of you who also do know that spaces in filenames can be a real problem in bash scripts. I was also thinking about prefixing files with the date rather than suffixing them, for sorting purposes, although I was on the fence about that. I also tended to name folders haphazardly, with spaces and other punctuation in the names.

After some consideration, I decided to go with a naming scheme for files in the format “yyyy-mmdd-hhmmss_filename.ext”. The hhmmss part is optional, and the filename can be multi-word, with all words separated by underscores. The helpful part here is that parsing filenames in scripts becomes much easier with a standard format: the date/time can be parsed off at the first underscore and the extension at the only period in the name. Filenames are all lowercase.
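To make the parsing claim concrete, here is a minimal Python sketch (not the author's script) that splits a name at its last period for the extension and at its first underscore for the date/time prefix. `parse_name` and `ParsedName` are hypothetical names. The prefix is only accepted when it looks like `yyyy-mmdd` or `yyyy-mmdd-hhmmss`, so an undated name such as `house_manual.pdf` keeps its whole stem.

```python
import re
from typing import NamedTuple, Optional

# Date prefix: "yyyy-mmdd", optionally followed by "-hhmmss".
PREFIX_RE = re.compile(r"(\d{4}-\d{4})(?:-(\d{6}))?")


class ParsedName(NamedTuple):
    date: Optional[str]  # "yyyy-mmdd", or None for undated names
    time: Optional[str]  # "hhmmss", or None
    stem: str            # words joined by underscores
    ext: str             # extension without the period ("" if none)


def parse_name(filename: str) -> ParsedName:
    # The extension starts after the last (by convention, the only) period.
    base, dot, ext = filename.rpartition(".")
    if not dot:
        base, ext = filename, ""
    # The date/time prefix ends at the first underscore, if it is a valid prefix.
    head, sep, tail = base.partition("_")
    m = PREFIX_RE.fullmatch(head)
    if sep and m:
        return ParsedName(m.group(1), m.group(2), tail, ext)
    return ParsedName(None, None, base, ext)
```

For example, `parse_name("2019-0604-093000_scan.pdf")` yields the date `2019-0604`, the time `093000`, the stem `scan`, and the extension `pdf`.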

This is not a hard and fast rule; some files have no date/time prefix depending on location and purpose, but I will follow the rule of no embedded spaces or periods (except before the extension) so parsing can still be done easily if needed.

Folders will all be done in CamelCase (but with the first C in Camel capitalized in contrast to the usual programming convention) and with no embedded spaces or punctuation.

I find that old habits die hard, so I still wind up creating a lot of files with names in my old format, and messing up folder names as well. As a result, I wrote two scripts: one fixes filenames and the other fixes folder names. The filename fixer is smart enough to detect a date at the end of the filename and move it to the front, replace spaces and other punctuation, etc. It has switches that tell it, when there is no date in the filename, to add a specified date (with “today” and “now” as keyword shortcuts) or to pull the creation date from the file’s metadata instead. Both scripts take a filename and output a filename, so they basically serve as “filters” that can be embedded in shell scripts, called by other programs, or used in Keyboard Maestro macros or Hazel rules, which makes them versatile. In this manner I have put the “smarts” in one script that everything else can use in common. Hazel can watch a folder, for example, and rename files to the correct format, getting the fixed filename from the script.
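A rough Python sketch of the filter idea, assuming the old style puts a `yyyy.mmdd` date just before the extension. `fix_filename` and its `fallback_date` parameter are hypothetical; `fallback_date` stands in for the specified-date and metadata-date switches described above, and a real script would handle more cases. Names that already carry a new-style prefix are left alone, so running the filter twice is harmless.

```python
import re
import sys
from typing import Optional

# Old style: "some text yyyy.mmdd.ext", e.g. "my bill 2019.0604.pdf".
TRAILING_DATE_RE = re.compile(
    r"^(?P<text>.*?)[\s._-]*(?P<yyyy>\d{4})\.(?P<mmdd>\d{4})\.(?P<ext>[^.]+)$"
)
# New style: "yyyy-mmdd[-hhmmss]_rest".
PREFIX_RE = re.compile(r"^(\d{4}-\d{4}(?:-\d{6})?)_(.*)$")


def normalize_words(text: str) -> str:
    # Lowercase, then turn every run of spaces, periods or other punctuation into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def fix_filename(name: str, fallback_date: Optional[str] = None) -> str:
    m = TRAILING_DATE_RE.match(name)
    if m:
        # Move the trailing date to the front in the new yyyy-mmdd form.
        date: Optional[str] = f"{m['yyyy']}-{m['mmdd']}"
        text, ext = m["text"], m["ext"]
    else:
        text, dot, ext = name.rpartition(".")
        if not dot:
            text, ext = name, ""
        prefix = PREFIX_RE.match(text)
        if prefix:
            date, text = prefix.group(1), prefix.group(2)  # already in the new style
        else:
            date = fallback_date  # hypothetical stand-in for the date switches
    stem = normalize_words(text)
    new = "_".join(part for part in (date, stem) if part)
    return f"{new}.{ext.lower()}" if ext else new


if __name__ == "__main__":
    # Filter: each argument is a filename; print the fixed name for each.
    for arg in sys.argv[1:]:
        print(fix_filename(arg))
```

Saved as a hypothetical `fixname.py`, running `python fixname.py "my bill 2019.0604.pdf"` prints `2019-0604_my_bill.pdf`, which a shell script, Keyboard Maestro, or Hazel can then use as the new name.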

I adopted one suggestion from the earlier thread to keep my folder hierarchy much shallower than I had before, and have restructured folders quite a bit. This is working well; it’s much easier to see what is there with a longer list at the top level rather than having things buried (and so hidden) multiple levels deep.

I also spent some time rereading the old posts from Brett Terpstra on his “tagfiler” program, and after a lot of experimentation and trial and error, found that his scheme was better than any alternative I tried. Under this concept, I have designated a few “top level” folders where I store things. He calls them contexts; I just see them as areas where I file things for different purposes. Those folders are tagged with a “context tag” starting with an equal sign. Under each area, folders can have a tag whose first character is an “@” sign, which is basically a shortcut down to that part of the folder hierarchy.

Files get a context tag which tells which folder hierarchy they belong to, and what I call a “path tag” which is colon separated tags that basically chain the file down the folder hierarchy to its desired place. These two tags allow automatically filing into the folder tree.
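As an illustration of how the two tags can drive filing, here is a hedged Python sketch, not Brett's or the author's code. It assumes a hypothetical `contexts` mapping from lowercase context names to their root folders (the real scheme finds these via the `=` tags on folders), a context tag such as `#work`, and a path tag written with a leading colon such as `:taxes:2018`; shortcut (`@`) folders are ignored. Folder names are matched case-insensitively because file tags are lowercase while folders are CamelCase.

```python
from pathlib import Path
from typing import Dict, List, Optional


def find_child(parent: Path, name: str) -> Path:
    """Return parent's subfolder matching name case-insensitively, or parent/name if none exists yet."""
    if parent.is_dir():
        for child in parent.iterdir():
            if child.is_dir() and child.name.lower() == name.lower():
                return child
    return parent / name


def resolve_destination(tags: List[str], contexts: Dict[str, Path]) -> Optional[Path]:
    """Map a context tag ('#work') plus an optional path tag (':taxes:2018') to a folder."""
    context = next((t[1:].lower() for t in tags if t.startswith("#")), None)
    if context is None or context not in contexts:
        return None  # no known context tag: leave the file where it is
    dest = contexts[context]
    path_tag = next((t for t in tags if t.startswith(":")), "")
    for part in path_tag.split(":"):
        if part:  # skip the empty piece before the leading colon
            dest = find_child(dest, part)
    return dest
```

Ordinary topic tags like `house` are simply ignored when choosing the destination, which matches their role as search aids.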

I won’t go further into how this all works as I wound up very closely following Brett’s approach so you can, if interested, read it all there. I did write my own filing script to do all this, partly because I wanted to fine tune or adjust a few things, and partly because I just like to do that. This filing script again can be used by Hazel to gather up files from a common “Dispatch” folder for automatic filing once the tags have been applied.

I also use a relatively sparse set of single-word tags that add context, such as my name, my wife’s name, the kids’ names, etc., and other topics (like house, manual, photography, etc.) to help with searching.

I adopted a convention that “context” tags start with an equal sign and are a single word with the first letter capitalized, just as Brett specified, and the context tag applied to a file to get it filed away is the same word but starting with # instead of =. Tags on folders that create shortcuts start with an @ and have the first letter capitalized as well. When a tag is created on a file, however, it is all lowercase. This is, I think, identical to what Brett specified, so I followed his scheme pretty closely. There are a lot of advantages to this scheme when it comes to searching for things with Spotlight, which he describes very well.

I realized that one useful feature is that when you are tagging a file, autocomplete in Finder is very helpful when you are trying to get a file to go to a common location. So if I had a file with a path tag of a:b:c because it was in the subfolder path Context/a/b/c, typing “:a” and being able to autocomplete is very helpful. However, sometimes the first file to arrive in a folder was simply dragged there. If I could easily create the tags that would have automatically moved the file there with the auto-filing script, then autocomplete could be used for future files that also need to go there. Accordingly, I have a script that looks at a given file, figures out the tags that would have gotten the file there (going up the folder tree and checking folder names and the tags on folders that indicate shortcuts and context), and then creates those tags for the file. This is handy for creating a folder somewhere, dragging a file to it, and using this script to create the tags that are later applied to other files using autocomplete.
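The reverse direction, deriving tags from where a file already sits, can be sketched the same way. This hypothetical `tags_for_location` function turns the context folder's name into a lowercase `#` tag and the intermediate folders into a colon-joined path tag. It assumes path tags carry a leading colon (as in typing “:a”) and, unlike the author's script, ignores `@` shortcut folders entirely.

```python
from pathlib import PurePath
from typing import List


def tags_for_location(file_path: PurePath, context_root: PurePath) -> List[str]:
    """Rebuild the tags that would have filed file_path under context_root."""
    # Folders between the context root and the file, e.g. ("Taxes", "2018").
    rel_dirs = file_path.parent.relative_to(context_root).parts
    # Context tag: the context folder's name, lowercased, with "#" instead of "=".
    tags = ["#" + context_root.name.lower()]
    if rel_dirs:
        # Path tag: lowercase folder names joined by colons, with a leading colon.
        tags.append(":" + ":".join(part.lower() for part in rel_dirs))
    return tags
```

Because it works on pure paths, the function never touches the disk; applying the resulting tags to the file would be a separate step.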

Having just watched David’s Keyboard Maestro field guide, I have developed a renewed interest in playing with Keyboard Maestro, so I now have a palette that is opened with the “META” key and lets me apply the autofile, tagger, file renamer, and folder renamer scripts to the files or folders selected in Finder, making it very fast and convenient. For example, if I need to create several subfolders, I just create them, type each name in all lowercase with spaces between words, select them all, and run the macro from the palette to convert them all to CamelCase. That makes typing much easier but keeps my system consistent.
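The folder-name conversion can be approximated in a few lines of Python. `fix_folder_name` is a hypothetical stand-in for the author's folder renamer: it splits on anything that is not an ASCII letter or digit, uppercases the first letter of each word, and leaves the rest of each word untouched so an existing name like RSync survives.

```python
import re


def fix_folder_name(name: str) -> str:
    """'tax returns 2018' -> 'TaxReturns2018' (CamelCase with a capital first letter)."""
    # Spaces, hyphens, periods and other punctuation all act as word breaks.
    words = re.split(r"[^0-9A-Za-z]+", name)
    # Capitalize only the first character so existing capitals are preserved.
    return "".join(w[:1].upper() + w[1:] for w in words if w)
```

For instance, `fix_folder_name("tax returns 2018")` returns `TaxReturns2018`. Note that non-ASCII letters are treated as separators in this sketch.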

I am now working on renaming files throughout my filing system, piece by piece as I come across areas to work on. I have another script that can recursively go down an entire folder tree and rename all of the files with the new scheme, but will probably run it very selectively over time as I see how all this works out.
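A cautious version of such a recursive renamer might look like the following Python sketch. `rename_tree` is a hypothetical name; it accepts any name-fixing function (a filename filter, for example), defaults to a dry run that only reports the planned renames, skips hidden files, and refuses to overwrite an existing file. The `samefile` check lets case-only renames through on case-insensitive volumes, which is the macOS default.

```python
import os
from pathlib import Path
from typing import Callable, List, Tuple


def rename_tree(root: Path, fix: Callable[[str], str], dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Rename every file under root whose name `fix` changes; return (old, new) pairs."""
    planned: List[Tuple[Path, Path]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.startswith("."):
                continue  # leave hidden files such as .DS_Store alone
            new_name = fix(name)
            if new_name == name:
                continue
            src, dst = Path(dirpath) / name, Path(dirpath) / new_name
            # Never overwrite another file; a case-only rename on a
            # case-insensitive volume reports dst as existing but as the same file.
            if dst.exists() and not src.samefile(dst):
                continue
            planned.append((src, dst))
            if not dry_run:
                src.rename(dst)
    return planned
```

Running it once with the default `dry_run=True` and reviewing the returned list fits the plan of applying the new scheme selectively.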

And that, after far too long a post, is where I am and what I have been working on since my last post on the topic. At the least I had fun writing these various scripts over the past few nights, and I feel that consistency of organization and naming will be a benefit in the long run.

Now to think about whether I want to fire up DEVONthink again after a long absence and index some of these folders …

Jonathan_Davis · June 1, 2019, 1:28am

Can’t wait to look at this thread later this weekend. Good stuff @nlippman

r2d2 · June 1, 2019, 4:12pm

What a useful way of thinking about directory structure and file names. Can you share screenshots of your system?

Where are the tagfiler posts?

I use date time stamps in two different ways.

For files that are date-specific like receipts, letters, and statements, I use dates as a prefix yyyy-mm-dd because of the importance of the date and sorting. Ex: 2019-05-30 amazon $254.99 purchase.pdf
Files that are documents like proposals or models, I want multiple versions and thus I use dates as the suffix of yyyymmdd followed by a letter signifying the version on that date. Ex: acme proposal 20190530c.doc
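The version-letter suffix lends itself to a small helper. Here is a Python sketch, using a hypothetical `next_version_name` function, that picks the first unused letter for a given date by scanning existing names in the space-separated format of the example above.

```python
import re
import string
from typing import Iterable


def next_version_name(stem: str, date: str, ext: str, existing: Iterable[str]) -> str:
    """Return 'stem yyyymmddX.ext' with the first unused version letter for that date."""
    pattern = re.compile(re.escape(f"{stem} {date}") + r"([a-z])" + re.escape(f".{ext}"))
    used = set()
    for name in existing:
        m = pattern.fullmatch(name)
        if m:
            used.add(m.group(1))  # the version letter already taken
    for letter in string.ascii_lowercase:
        if letter not in used:
            return f"{stem} {date}{letter}.{ext}"
    raise ValueError(f"all 26 version letters are used for {date}")
```

With `acme proposal 20190530a.doc` and `acme proposal 20190530b.doc` already present, the next name for that date would be `acme proposal 20190530c.doc`.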

nlippman · June 3, 2019, 12:23am

@Artoo:

I would be happy to supply screenshots, but I’m not totally sure what to show. Let me know what you might be interested in and I will try to comply.

I agree that not all file naming falls into the scheme I have outlined. My thinking is that for files going through multiple versions, I will likely rely on the prefixed date to indicate the date/time of creation/editing, and perhaps append _version_1.0 etc. to the end. My current scheme is to post-pend a date, so pre-pending the date works just as well. By searching for just the “filename” part, e.g. searching for “this_is_a_file” to find 2019.0602_this_is_a_file_version_1.0.doc, I would get the various versions in sorted order, and that would be true even if I didn’t append the version part.
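Because the prefix is a fixed-width date, a plain string sort is also a chronological sort. A minimal Python sketch of that search, with a hypothetical `versions_of` helper that assumes every name has a date prefix ending at the first underscore:

```python
from typing import Iterable, List


def versions_of(stem: str, names: Iterable[str]) -> List[str]:
    """Names whose part after the first underscore starts with stem, oldest first."""
    # The fixed-width yyyy-mmdd prefix means string order equals date order.
    return sorted(n for n in names if n.partition("_")[2].startswith(stem))
```

Undated names would need to be filtered out first, since their first underscore is not a date boundary.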

The “tagfiler” posts are all on Brett Terpstra’s website at https://brettterpstra.com/; use the built-in search box to search for tagfiler. I think this: https://brettterpstra.com/2013/12/20/automatic-filing-with-hazel-and-mavericks-tags/ is the most useful to describe what he does and you can follow other links from there.

His tagfiler script (in Ruby) is available from GitHub, and he has various posts that explain how to use it from the command line and/or Hazel. If anyone is particularly interested, I can post my equivalent Python script (there are a few differences, but they do the same thing) and how I use it in KM and Hazel.

r2d2 · June 3, 2019, 12:41am

nlippman:

I adopted one suggestion from the earlier thread to keep my folder hierarchy much shallower than I had before, and have restructured folders quite a bit. This is working well; it’s much easier to see what is there with a longer list at the top level rather than having things buried (and so hidden) multiple levels deep.

I also spent some time rereading the old posts from Brett Terpstra on his “tagfiler” program, and after a lot of experimentation and trial and error, found that his scheme was better than any alternative I tried. Under this concept, I have designated a few “top level” folders where I store things. He calls them contexts; I just see them as areas where I file things for different purposes. Those folders are tagged with a “context tag” starting with an equal sign. Under each area, folders can have a tag whose first character is an “@” sign, which is basically a shortcut down to that part of the folder hierarchy.

Files get a context tag which tells which folder hierarchy they belong to, and what I call a “path tag” which is colon separated tags that basically chain the file down the folder hierarchy to its desired place. These two tags allow automatically filing into the folder tree.

The screenshots were meant to help me understand and see examples of this. I like the idea of having a shallower hierarchy. I’m trying to train my brain to think in certain ways so that I only have one path for a file to fit.

OogieM · June 4, 2019, 12:38pm

Not the OP, but I also use a very shallow filing structure. Here is an example; the top level is TDRC_Filing_Cabinet:

[Screenshot: folder structure under TDRC_Filing_Cabinet]

I have filing “cabinets” for each organization I am a board member or officer in and one for myself.

r2d2 · June 4, 2019, 7:08pm

What do you mean by filing cabinet? Is that your high level folders?

What’s the best way to rename files with spaces to underscores?

OogieM · June 4, 2019, 8:21pm

Yes I use Filing_Cabinet in the folder names for my top level folders.

I would use either NameChanger or A Better Finder Rename 10 to do the renaming. NameChanger does fine for simple replacements like that.

A Better Finder Rename 10 is better for complex prefixes, suffixes, adding sequential numbers and the like. I use it to change image filenames from scanners and digital cameras to my official naming scheme in my picture archives.
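For anyone who prefers a script to an app for the simple spaces-to-underscores case, here is a minimal Python sketch (the `underscore_names` function is hypothetical). It only touches files directly inside one folder and skips any rename whose target name already exists.

```python
from pathlib import Path
from typing import List


def underscore_names(folder: Path) -> List[Path]:
    """Replace spaces with underscores in the names of files directly in folder."""
    renamed: List[Path] = []
    for p in sorted(folder.iterdir()):
        if p.is_file() and " " in p.name:
            target = p.with_name(p.name.replace(" ", "_"))
            if not target.exists():  # never overwrite an existing file
                renamed.append(p.rename(target))  # rename() returns the new path
    return renamed
```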

memex · June 4, 2019, 9:09pm

ForkLift has a powerful renaming tool too.

r2d2 · June 4, 2019, 9:09pm

Cool. Also do you use dash or hyphens or only underscores?

OogieM · June 5, 2019, 1:48am

I use hyphens to separate date tokens, i.e. yyyy-mm-dd_rest_of_the_filename_with_all_spaces_as_underscores.suffix.

I also use hyphens to separate larger groups that have subsequently been subdivided into smaller sections, i.e. Sheep_Diseases eventually became Sheep_Diseases-OPP, Sheep_Diseases-Scrapie, Sheep_Diseases-Overeating_Disease etc., but only when there were so many items at the top level that I needed to subdivide them.

One segment eventually became Sheep_Diseases-Scrapie-Genetics, Sheep_Diseases-Scrapie-Causes, Sheep_Diseases-Scrapie-Mad_Cow_Link, Sheep_Diseases-Scrapie-Kuru.

r2d2 · June 5, 2019, 1:52am

Cool. I’ve struggled with different ways to set up and create file folders.

Why not use CamelCase?

nlippman · June 5, 2019, 1:59am

@r2d2: @OogieM has given a great example of shallow folders; she was the one who made the suggestion in response to my earlier post that made me rethink the shallow vs deep organization for my folders.

In my case, I had, for example, a top-level folder called RSync (because it is sync’d between computers using Resilio Sync). Under that I had a folder called “Database” which had a bunch of subfolders, and also a folder called “Work” for work-related folders, and a folder called “Documents” for … well, documents. I was always drilling down into subfolder after subfolder for no good reason, so I moved all the folders under “Database” to be directly under RSync, for example. Yes, there’s more in a single Finder window, but I can see everything and drill down to it.

If you want to do file renaming, there have been a couple of good suggestions for renaming utilities. I found my needs a bit more complex, so I wrote my own. The issue for me is that I have files typically named “some text with spaces or . maybe periods.ext”, often with a date at the end as in “this is a file 2019.0430.pdf” and so forth. I wanted a way to rename files to the format “yyyy-mmdd-hhmmss_file_name_with_spaces_and_periods_changed_to_underscore.pdf” instead.

The idea is that, in the most general sense, feeding in a filename with a date at the end causes that date to be moved to the beginning. If there is no such date, then prepend EITHER a specified date OR the file’s creation date taken from the Spotlight metadata, first using the content creation date and, if there is none, using the date the file itself was created. Sometimes I want the time included (hhmmss) but usually I don’t. There may be times when I want the date NOT included. Spaces and periods (except the one separating the filename from the extension) should be replaced with underscores.
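The precedence rules above fit in a small pure function. This Python sketch is hypothetical: `choose_prefix` takes the two metadata dates as arguments rather than reading them (on macOS they could come from the Spotlight attribute `kMDItemContentCreationDate` and from `os.stat(...).st_birthtime`), and it assumes the `now` keyword always adds the time while `today` does not.

```python
from datetime import datetime
from typing import Callable, Optional


def choose_prefix(
    name_date: Optional[str],                  # "yyyy-mmdd" found in the filename, if any
    specified: Optional[str],                  # "yyyy-mmdd", "today" or "now"
    content_created: Optional[datetime],       # content creation date from metadata
    file_created: Optional[datetime],          # file system creation date
    include_time: bool = False,
    clock: Callable[[], datetime] = datetime.now,
) -> Optional[str]:
    """Filename date, then specified date, then content creation, then file creation."""
    if name_date:
        return name_date  # a date already in the filename always wins
    if specified in ("today", "now"):
        when: Optional[datetime] = clock()
        include_time = include_time or specified == "now"  # assumption: "now" adds the time
    elif specified:
        return specified  # an explicit "yyyy-mmdd" is used as-is
    else:
        when = content_created or file_created
    if when is None:
        return None  # no date available: leave the name undated
    return when.strftime("%Y-%m%d-%H%M%S" if include_time else "%Y-%m%d")
```

Keeping the date lookup outside the function makes the precedence easy to test without real files.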

I wrote a Python script that basically takes in a filename and emits a corrected filename. I use that script both within Keyboard Maestro and from Hazel to rename files.

Both uses are convoluted, unfortunately. In Hazel (used to automatically rename files in a particular set of folders), when a file meets the right criteria (I use a tag to tell Hazel to do the rename), the Hazel action has to run an AppleScript, which runs a bash script, which runs the Python script (yeah, I know), because only AppleScript can return a value to Hazel to use to rename the file.

From Keyboard Maestro, the macro executes on all files currently selected in the Finder and uses a bash script to run the Python script to get the new filename.

For the most part these scripts apply the date criteria but do not add a time; that’s my preference.

For renaming folders, I have another Python script to fix folder names, and that runs via KM as well.

I did it this way because even though I want my filenames to be uniform as posted in the OP of this thread, I don’t find typing them that way to be very convenient. For instance, typing a space is much faster than typing an underscore, and typing the date (yes, I have text expansions to add the current date quickly, but still) is sort of inconvenient, and often the date (for example, of a scanned bill) is NOT the date I am doing the scan and hence not today’s date. I find it faster to type “my bill 2019.0604.pdf” than to type “2019-0604_my_bill.pdf”, and then the script grabs the file and fixes the name.

The script is smart enough to know that a) a date in the typed filename takes precedence over the file creation date, and b) a specified date takes precedence over the creation date, which means it now pretty much “does the right thing” nearly all of the time.

And yes, I realize this is a rather complex scheme, but once implemented, it makes a lot happen automatically. I also have a program that can recursively scan a folder tree and use the renaming script to rename all the files in the tree if I need to equally apply these rules to an old folder of files of arbitrary depth.