Rename PDFs to include their Creation Date (via PDF metadata, not Spotlight)

tjluoma · December 26, 2019, 10:45pm

Over on the Hazel forum, someone was trying to rename PDFs based on the original creation date of the PDF as found in the PDF metadata, which is not necessarily the same date that Spotlight or Finder will show.

Although the OP was making the case that Hazel should include this by default, it does not (as far as I am aware, at least), so I did what I do, and wrote a script.

I called it pdf-rename-by-cdate.sh because I am very clever.

Note that the script requires the tool pdfinfo from poppler, which you can install via brew install poppler. I was going to try to parse the info myself, but realized very quickly that the handful of PDFs I tested used multiple date formats, and pdfinfo worked with them all, so rather than reinvent the tool, I’m just building around it.
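
For anyone curious what building around pdfinfo can look like, here is a minimal sketch, not the actual script. It assumes poppler's pdfinfo is on the PATH and uses its -isodates option so that the CreationDate line is printed in ISO 8601 form; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not from pdf-rename-by-cdate.sh.

# Read pdfinfo output on stdin and print the creation date as YYYY-MM-DD.
# Assumes the ISO 8601 form printed by `pdfinfo -isodates`.
creation_day_from_info() {
  sed -n 's/^CreationDate:[[:space:]]*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\).*$/\1/p' |
    head -n 1
}

# Print the embedded creation date of one PDF, or fail with a message.
pdf_creation_day() {
  local day
  if ! command -v pdfinfo >/dev/null 2>&1; then
    echo "pdfinfo not found; try: brew install poppler" >&2
    return 2
  fi
  day=$(pdfinfo -isodates "$1" 2>/dev/null | creation_day_from_info)
  if [ -z "$day" ]; then
    echo "No CreationDate found in: $1" >&2
    return 1
  fi
  printf '%s\n' "$day"
}
```

Calling pdf_creation_day some-file.pdf should print a single YYYY-MM-DD line, or an error on stderr if the PDF carries no creation date.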

Anyway, it occurred to me that some folks here might be able to use this too.

The script attempts to be smart:

If you give it “filename.pdf” it will rename it to “filename (YYYY-MM-DD).pdf”
If the original filename already has “YYYY-MM-DD” in it, then the script won’t rename it.
If “filename (YYYY-MM-DD).pdf” already exists, it will try “filename (YYYY-MM-DD) 1.pdf” or “filename (YYYY-MM-DD) 2.pdf” etc until it does not find a conflicting file. I think this is basically what Finder does.
If you ask it to work on a file that is not a PDF, it will say “Hey, this isn’t a PDF” and skip it.
If pdfinfo is unable to find a Creation Date, it will report an error but not rename the file.
If pdfinfo is not found, it won’t continue, but will tell you how to install it.
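
The naming rules above can be sketched roughly as follows. This is an illustration rather than the script itself: the helper names are hypothetical, the creation date is passed in as an argument, and checking for a PDF by file extension is an assumption (the real script may test the file differently).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not from pdf-rename-by-cdate.sh.

# Succeed if the name already contains something shaped like YYYY-MM-DD.
has_date() {
  case "$1" in
    *[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the first unused path: "stem (date).pdf", then "stem (date) 1.pdf", ...
next_free_name() {
  local dir="$1" stem="$2" day="$3"
  local candidate="$dir/$stem ($day).pdf"
  local n=1
  while [ -e "$candidate" ]; do
    candidate="$dir/$stem ($day) $n.pdf"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Rename one PDF given its creation day (YYYY-MM-DD).
rename_with_day() {
  local file="$1" day="$2" name stem
  case "$file" in
    *.[Pp][Dd][Ff]) ;;   # assumption: extension check stands in for a real PDF test
    *) echo "Hey, this isn't a PDF: $file" >&2; return 1 ;;
  esac
  name=$(basename "$file")
  if has_date "$name"; then
    return 0             # already dated, leave it alone
  fi
  stem="${name%.*}"
  mv -n -- "$file" "$(next_free_name "$(dirname "$file")" "$stem" "$day")"
}
```

The counter loop only checks for existing files at the moment it runs, so it is not safe against two renames racing each other; mv -n is there as a last guard against overwriting.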

To rename all the PDFs in a given folder, you can do:

pdf-rename-by-cdate.sh *.pdf

It should also work with Hazel or Keyboard Maestro.

nlippman · December 28, 2019, 10:46pm

@tjluoma:

Thanks for this interesting posting!

I also have a script that I use for file renaming. It is written in Python and works a bit differently from yours: it accepts a single filename on stdin, reformats it, and writes the resulting new filename to stdout. I use it from other scripts, Keyboard Maestro, and Hazel to generate replacement filenames, with the calling script, KM macro, or Hazel rule deciding when and how to use the script and the changed filenames.

It supports a few command line flags: to specify a date, or a date and time, to use in creating the new filename; to use the file’s creation date instead; to lowercase the entire filename; and to generate an error rather than a new filename if the file in question does not exist (or is not a regular file). All of these options were created for various workflows where I wanted to use this script.

However, the most useful role for this script is this: typically when I create a new file, especially when naming a file as part of my scan processing, it’s fastest for me to use a name in the format “this is the file name yyyy.ddmm.ext”, where yyyy.ddmm is the date I want assigned to the file. However, I want the filename to be yyyy-ddmm_this_is_the_file_name.ext, and this script will detect a filename in the former format and convert it to the latter.
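
A rough sketch of that conversion, written in shell to match the other snippets in this thread rather than in Python, so it is not the script described above. The function name reformat_name is hypothetical, and the pattern assumes the date is the last space-separated piece before the extension.

```shell
#!/usr/bin/env bash
# Sketch only: `reformat_name` is a hypothetical name, and this is shell,
# not the Python script described in the post.

# Read filenames on stdin, one per line; write converted names to stdout.
# "some name yyyy.ddmm.ext" becomes "yyyy-ddmm_some_name.ext";
# anything that does not match passes through unchanged.
reformat_name() {
  local name converted
  while IFS= read -r name || [ -n "$name" ]; do
    converted=$(printf '%s\n' "$name" |
      sed -n 's/^\(.*\) \([0-9]\{4\}\)\.\([0-9]\{4\}\)\.\([^.]*\)$/\2-\3_\1.\4/p')
    if [ -n "$converted" ]; then
      # Only the moved name part still contains spaces at this point.
      printf '%s\n' "$converted" | tr ' ' '_'
    else
      printf '%s\n' "$name"
    fi
  done
}
```

For example, echo "this is the file name 2019.2612.pdf" | reformat_name prints 2019-2612_this_is_the_file_name.pdf.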

I am happy to post my code, but the main reason I am replying to your post is to mention that if you don’t want to install pdfinfo, you can also use Spotlight to get the creation dates. mdls foo.pdf will show all of the data accumulated by Spotlight on file foo.pdf, and you can use various shell tools to parse out the data you want, for example:

theDate=$(mdls foo.pdf | grep 'kMDItemContentCreationDate ' | sed -e 's/^.*= //')

tjluoma · December 29, 2019, 12:52am

That’s not quite accurate.

There is a difference between the metadata that mdls or Spotlight will find and the metadata that exists in the PDF itself. They could be identical, but they very well might not be.

For example, if I made a PDF in 2013, but then emailed it to you in 2019, your Mac would see the creation date as 2019 if you checked it via Spotlight/mdls. But if you want to know when the PDF was originally created, that information is still available in the PDF file itself.

The pdfinfo tool will extract that information.
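
One way to see the two values side by side for a given file is sketched below. It assumes macOS (for mdls) with poppler installed; show_both_dates is a hypothetical helper name. Whether the two lines actually differ depends on the file and its history.

```shell
#!/usr/bin/env bash
# Sketch only: `show_both_dates` is a hypothetical helper.
# Needs macOS (mdls) and poppler (pdfinfo).

show_both_dates() {
  local file="$1"
  # Date as reported by Spotlight for this copy of the file.
  printf 'Spotlight: %s\n' \
    "$(mdls -raw -name kMDItemContentCreationDate "$file")"
  # Date stored inside the PDF itself, in ISO 8601 form.
  printf 'PDF:       %s\n' \
    "$(pdfinfo -isodates "$file" | sed -n 's/^CreationDate:[[:space:]]*//p')"
}
```

Run it as show_both_dates foo.pdf. Note that mdls reports its timestamp in UTC while pdfinfo keeps the offset recorded in the PDF, so compare the two with that in mind.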

MartinPacker · December 29, 2019, 9:45am

So essentially poppler is parsing the data stream - to get metadata.

(Off topic but I’ve wanted to set bookmarks based on heading/font criteria. poppler might help me do that.)

tjluoma · December 29, 2019, 8:03pm

The parsing is done by pdfinfo, which is part of the poppler package.

nlippman · December 29, 2019, 8:15pm

@tjluoma: Thanks. That’s very helpful to know! I will have to relook at pulling the content creation date out of the PDF data itself.