Tag or identify the parent folder of a file

I have to tag, or otherwise identify, a large number of parent folders that contain a PDF with “5077” in it. The screenshot I attached should help show what I mean.

We have 5,800 Transaction folders, each with some subfolders and PDFs. I query Finder and see that 250 PDFs contain “5077”. Now I need to identify the parent folders of those PDFs with a tag, or rename them, or somehow mark them. Then I need to repeat that process a few more times, replacing “5077” with a different query. Any ideas?

I also have apps like Hazel and Keyboard Maestro if that helps. I’m not opposed to buying an app that helps me. Thanks for the advice!

I suggest Hazel, which is more adept than Keyboard Maestro at watching folders and changes within them. Try posting this question at Noodlesoft’s forum as well.

This might be a job for A Better Finder Rename. I believe this is the correct link, but on my phone, so hard to tell.

I would suggest Hazel, which you already own. Note that I just checked this in Hazel 5, which I installed tonight. I don’t recall if this can be done in Hazel 4.

You can create a rule matching ALL of the following. Make the first element that the “kind” is “Folder”.

If you then option-click the plus button to add a new condition, you get the “subrule” construct. On the first line of the subrule, you can read “If … of the following conditions are met for …”, where each “…” is a dropdown.

When you click the second dropdown, you can set it to “any of its subfiles or subfolders”. In the subrule, add the condition “name contains 5077” or “contents contains 5077”, depending on which you need, AND a condition that the kind is PDF.

This should match any folder that contains a file or subfolder that is a PDF and has 5077 in it (or in its name depending on how you need to create the rule).

For your action you can do whatever you want to accumulate these folders, eg apply a tag “5077” to the folder.

Then you can use Spotlight to find all of the folders tagged with 5077, or create a saved Finder SmartFolder to keep track of them.

Tell Hazel to run the rule to get everything populated, and of course apply this rule to the top level folder that contains all of the folders you need to search.

Note that there is a hitch to this: if you have a PDF two subfolders deep, every folder “above” that PDF is going to get the tag. It’s much harder if you only want to tag the top level folder of each tree.

In that case you might need to play with extra criteria in Hazel to ensure that a folder is NOT a subfolder before tagging it. I haven’t played with how to do that easily in Hazel.

If all of the folders you want to do this for are under one top level folder, let’s call it “A”, then you could do something in a command line construct, sort of like:

for f in A/*  # Go through everything in folder A
do
  if [[ -d "$f" ]]  # Only process if this is a subfolder of A
  then
    # grep -l lists the PDFs that contain 5077; a non-empty list means a hit
    if [[ -n $(find "$f" -type f -name "*.pdf" -exec grep -l 5077 {} + ) ]]
    then
      # This folder - $f - contains a pdf that has 5077 in it
      # use a utility like tags to add a tag to the folder $f, or use xattr to write the tags
      echo "$f"  # placeholder so the block is not empty: just print the folder
    fi
  fi
done

Please note that I HAVE NOT tested the command line approach in detail and there may be some syntax errors; I just typed it straight into here. If you want to pursue that approach, make sure it works on a test folder (like a copy of the folder tree you want to use it on) before turning it loose on real data!! Obviously it identifies pdfs by having a .pdf extension (but not if it’s .PDF!), so you might have to tweak it to suit your actual data.
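For what it’s worth, here is a slightly fuller sketch of the same loop, wrapped in a function so the search string can be swapped out for each query. The function name folders_with_match is made up for illustration. It only prints the matching folders rather than tagging them, it uses -iname so that .PDF is caught too, and it still relies on grep finding the text in the raw bytes of the PDF, which will not hold for every PDF.

```shell
# Hypothetical helper: print each first-level subfolder of TOP that holds,
# at any depth, a PDF whose raw bytes contain QUERY.
folders_with_match() {
  local top="$1" query="$2" f
  for f in "$top"/*/; do
    [ -d "$f" ] || continue      # skip if the glob matched nothing
    f="${f%/}"                   # drop the trailing slash
    # -iname matches .pdf and .PDF; grep -l lists files that match,
    # -a reads binary files as text, -F treats QUERY as a literal string.
    if [ -n "$(find "$f" -type f -iname '*.pdf' -exec grep -laF -- "$query" {} +)" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running folders_with_match A 5077 prints one folder per line, and that list can then be handed to whatever tagging tool you settle on. For the repeat runs, call it again with the next search string in place of 5077.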

You could replace find with mdfind, which will be faster and you can use mdfind easily with the filesystem metadata to find pdfs, BUT mdfind only finds files that have already been indexed with Spotlight, and so can miss unindexed files - which might include anything recently created. If this is something you will only run once or infrequently, I would go with find which is slower but more comprehensive as it scans every file in the provided folder.
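If you do go the mdfind route, the missing piece is turning its list of file paths back into the first-level folders. A rough sketch of that step follows. The function name first_level_parents is made up, and the mdfind query in the comment is my best guess at the Spotlight attributes, so check it on your own system before relying on it.

```shell
# Hypothetical helper: read file paths on stdin and print, once each, the
# first-level folder under TOP that each path lives in.
first_level_parents() {
  local top="${1%/}" path rest
  while IFS= read -r path; do
    case "$path" in
      "$top"/*/*) ;;             # keep only files inside a subfolder of TOP
      *) continue ;;             # ignore files directly in TOP or elsewhere
    esac
    rest=${path#"$top"/}         # e.g. "T1/sub/doc.pdf"
    printf '%s\n' "$top/${rest%%/*}"
  done | LC_ALL=C sort -u
}

# Assumed usage on macOS (the query is unverified; mdfind prints absolute
# paths, so TOP must be absolute too):
# mdfind -onlyin "$PWD/A" 'kMDItemContentType == "com.adobe.pdf" && kMDItemTextContent == "*5077*"' \
#   | first_level_parents "$PWD/A"
```

The same function works with find as the source of paths, so you can compare the two lists and see whether Spotlight is missing anything.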

Hope this helps.

It occurred to me: if your scenario is such that you have a top level folder, and under that folder are all of the folders you would want identified in some manner, and the PDF files of interest could be at any depth below that, then you could use the Hazel “Subfolder Depth” criterion to only apply the tag to the first level under the top level. That would potentially solve your problem without resorting to command line hacking.

The nice thing about the Hazel approach is that it will automate the process, so anytime you drop a PDF that meets the criterion (eg contains 5077 or whatever else) into a subfolder that is not already tagged, Hazel will take care of it for you.

It would be cool to use the new list and table matching criteria in Hazel to put all of the criteria for your PDFs (you mentioned there were several contents you wanted to search for, not just 5077) into one rule. You could use the criteria “Contents” “Contain Match” and then add the new List or Table match criteria from the pop-up. Using Table, you could specify a tag for each criterion (eg if 5077 is found in the file, the Table can let you set the tag to “this_folder_is_tagged_5077” if you wanted). Since you can set these match criteria to read the list or table from an external file, you could keep the match criteria in a separate location that is easy to update without editing the Hazel rule.

I’ve been thinking about how to use these new match criteria for my own Hazel rules, and you have just provided a great use case. Just a thought.

Caveat emptor.

This terminal command will create a batch file of rename commands (mv commands) that should do what you want. Just change into the top folder, then execute the command:

find . -type f -name "*5077*" -exec bash -c 'for f; do new=$(printf "%s" "$f" | sed -e "s@^\.@@" -e "s@/@_@g"); printf "mv \"%s\" \"%s\"\n" "$f" "$new"; done' _ {} + >rename_batch

Then open rename_batch in an editor and check things out. If it’s okay, you can:

sh rename_batch

and the files will be renamed and moved into the current folder.

The batch file will look something like this:

cat rename_batch
mv "./parent 1/parent 2/doc named 5077 something copy 2.txt" "_parent 1_parent 2_doc named 5077 something copy 2.txt"
mv "./parent 1/parent 2/doc named 5077 something copy 3.txt" "_parent 1_parent 2_doc named 5077 something copy 3.txt"
mv "./parent 1/parent 2/doc named 5077 something copy 7.txt" "_parent 1_parent 2_doc named 5077 something copy 7.txt"
mv "./parent 1/parent 2/doc named 5077 something copy 6.txt" "_parent 1_parent 2_doc named 5077 something copy 6.txt"
mv "./parent 1/parent 2/doc named 5077 something copy 4.txt" "_parent 1_parent 2_doc named 5077 something copy 4.txt"
mv "./parent 1/parent 2/doc named 5077 something copy 5.txt" "_parent 1_parent 2_doc named 5077 something copy 5.txt"
mv "./parent 1/parent 2/doc named 5077 something.txt" "_parent 1_parent 2_doc named 5077 something.txt"
mv "./parent 1/parent 2/doc named 5077 something copy.txt" "_parent 1_parent 2_doc named 5077 something copy.txt"

Change the *5077* in the first command to find the files you’re interested in.

Thanks for the reply! Seems Hazel is the best bet. Upgrading to version 5.

Thanks for the reply. I have to rename files a lot and batches of them too. Will check this out for sure!

Thanks for the comprehensive reply! I’m going to try this today and I bet it will work. Fortunately, it doesn’t matter if all the folders under the parent folder are tagged too.

Thanks for this. I’ve never used the terminal before. I’m a little scared of it, but mostly too lazy to learn about it. You’ve inspired me to finally get around to learning it. I see so many problems that it solves, so it’s time.

Bart’s explanations are awesome!
I recommend listening to the series

@jec0047 pointed out you were looking for pdf files with 5077 inside them, rather than in the file name. This is a bit more complicated than the solution I posted above, as getting text from inside pdf files can be challenging.
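One possible way to cope with that, sketched below, is to run the PDF through a text extractor first and only fall back to searching the raw bytes. This assumes the pdftotext tool from Poppler, which nobody in this thread has mentioned and which is not installed on a Mac by default; the function name pdf_contains is made up. Scanned PDFs with no text layer would still need OCR and will not match either way.

```shell
# Hypothetical helper: succeed if FILE contains QUERY, either in the text
# that pdftotext extracts or, failing that, in the raw bytes of the file.
pdf_contains() {
  local query="$1" file="$2"
  # Try the extracted text first, but only if pdftotext is installed.
  if command -v pdftotext >/dev/null 2>&1 &&
     pdftotext "$file" - 2>/dev/null | grep -qF -- "$query"
  then
    return 0
  fi
  # Fallback: literal search of the raw bytes, reading binary as text.
  grep -qaF -- "$query" "$file"
}
```

It can be used as the test inside a loop, for example: pdf_contains 5077 "some file.pdf" && echo "match".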