Question about photos, duplicates, and which ones to keep

I’ve been trying to clean up my Photos library following a number of years of neglect, multiple imports, and all of that.

I’m looking at it again today, and I see a number of duplicate photos with unmatched metadata, so Gemini doesn’t pick them up. I’m going to include two “Get Info” panels of an identical photo. One, though, is about half the size of the other. I’m guessing I keep the larger one, but I’m not sure how to automate that for 1000s of photos, unless I just do it once a day for a year and see what happens.

Thanks for any advice!

I don’t have the app in front of me but can’t you get Gemini to do a less rigorous matching?

I’m currently doing the same task. It is my understanding that the file size of the photos is different because there is additional metadata associated with the photo with a larger file size (see backup - Why do copies of the same photo have slightly different file sizes? - Photography Stack Exchange). The images are the same. I have been keeping the photos with the smaller file size when cleaning up my library.

In terms of automating, I have Gemini, PowerPhotos and PhotoSweeper. Each has its strengths and weaknesses. I choose which one to use depending on the task I’m doing. For what you are describing (removing duplicate images with different file sizes), you’ll find PhotoSweeper really useful as it displays a histogram and the metadata for each photo. That info gives me a quick way to confirm that the two images are indeed identical except for the file size.

Maybe? I need to give it a look. When I ran a sweep earlier, it said, “no duplicates found.”

Interesting. I wonder what metadata that might be. Any idea how to check it?

As I understand it, it can be metadata added by photo management software (e.g. facial recognition). My photos were just the usual snapshots, so I didn’t look too deeply into it.
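
For JPEGs, one rough way to check is to add up the bytes in the file's metadata segments and compare that with the total file size. Here is a minimal Python sketch, standard library only, that walks the segment headers at the start of a JPEG and sums the APPn segments (where EXIF, XMP, ICC profiles and similar data normally live) plus any comment segments. The function names are my own, it assumes a well-formed file, and it does not handle HEIC or RAW.

```python
import struct
from pathlib import Path


def jpeg_metadata_bytes(data: bytes) -> int:
    """Count bytes used by APPn and COM segments before the image scan starts."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2          # skip the 2-byte SOI marker
    total = 0
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:       # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:       # SOS: compressed image data follows, stop here
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xE0 <= marker <= 0xEF or marker == 0xFE:   # APP0..APP15 or COM
            total += 2 + length
        pos += 2 + length
    return total


def describe(path):
    """Return (file size, metadata bytes, everything else) for one JPEG file."""
    data = Path(path).read_bytes()
    meta = jpeg_metadata_bytes(data)
    return len(data), meta, len(data) - meta
```

Run `describe()` on both copies of a photo. If the metadata byte counts are close but the totals are far apart, the difference is in the compressed image data rather than in the metadata.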

I haven’t used PhotoSweeper like @annee, but I have used PowerPhotos (and its predecessor, iPhoto Library Manager) for several years. Fat Cat Software offers a free trial if you are interested.

FWIW I don’t think metadata alone will cause that big of a file size difference. The jpg compression used in photo apps is more likely the cause. Try opening and re-saving a jpg file in a separate folder a number of times (with different apps) and you will see its size change over time.

Also, some apps use save settings that reduce quality to 70-80%. If this is done multiple times over years or by different apps, it can significantly reduce the file size. Both photos in your example are 12 years old, so this is a likely scenario. I always keep a copy of the original jpg on a backup drive and use the “working copy” in Photos or Lightroom.

As storage is cheap, I suggest keeping the larger version of the file.

I’m a wimp when it comes to relying on automation to pare document and photo libraries. My only trustworthy method is to look at the assets, compare, and decide to keep or delete. Storage is cheap.

Thanks for all the help, everyone.

Now I’ve got another issue: I’m seeing some similar photos, but some are JPEG and some are HEIC. Obviously, the JPEGs are bigger in file size, but I wonder which ones I should keep in the library.

@simonsmark raises a good point. When I wrote off the difference to extra metadata, the difference in file size in my case was nowhere near the +/- 2 MB in @monster94’s example. I also had not done anything to the photos except either copy the files (i.e. no opening the file and then saving) or import them once into Photos. And, as I said, the photos weren’t extremely important to me. I also made multiple backups of my unsorted photos before I started culling them so that I could get images back if I had second thoughts about my decision.

Compression is certainly worth considering. It is my understanding that Photos doesn’t compress photos on import. But it’s not clear if @monster94 has had the photos imported into any other digital asset management/photo preview software at some point.

One way to determine if compression, metadata, or both is the culprit may be to look at the average size of your photo files from that camera. Perhaps one file size is obviously an outlier. I also relied on the histogram feature in PhotoSweeper to help me tell if duplicate photos had meaningful differences. As I understand it, compression will show up in the histogram.
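
If you want to try the "look for an outlier" idea without eyeballing every file, here is a rough Python sketch. It assumes you have copied the JPEGs from a single camera into one folder (it does not read the camera model from the metadata), and the 50% tolerance is an arbitrary starting point, not a recommendation. The function names are mine.

```python
from pathlib import Path
from statistics import median


def sizes_in_folder(folder):
    """Map file name -> size in bytes for the JPEG files directly inside `folder`."""
    return {
        p.name: p.stat().st_size
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    }


def flag_size_outliers(sizes, tolerance=0.5):
    """Return the names whose size is more than `tolerance` (as a fraction) away from the median."""
    typical = median(sizes.values())
    return sorted(
        name for name, size in sizes.items()
        if abs(size - typical) > tolerance * typical
    )
```

Something like `flag_size_outliers(sizes_in_folder("camera_2012"))` (a made-up folder name) would then list the files worth a closer look. It only tells you a size is unusual; it cannot tell you whether compression or metadata is the reason.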

Many years ago I had a hard disk failure and used recovery programs (can’t remember which ones I used - I was in “panic mode” at the time) to recover my “lost” photos. There were a few thousand photos. I ended up with many copies of some photos. As an example, I might have 5 copies of the same photo with the same obscure name but with a sequential number appended. The photo file sizes may have ranged from 400 KB up to 25 MB (like 400 KB, 900 KB, 1.1 MB, 9 MB, 25 MB). There were no differences in quality as they appeared on the screen (even when enlarged) nor when printing. Metadata all looked the same except for the file size. I used a combination of Gemini and PowerPhotos to remove the duplicates. I looked at the typical file sizes for the camera that was used to take the photos and kept the copies whose sizes were closest to that. At that time the appropriate file size of photos for the camera I was using ranged from about 1.1 to 1.4 MB. Two last comments: 1) Before I started removing the duplicates, I duplicated the folder and used one for removing duplicates and one as the archive folder. I deleted the archived folder a few years later. 2) I now have a good backup system.
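
The "keep the copy closest to the camera's normal size" rule can be written down as a small Python sketch. This is only an illustration of the rule described above, not what Gemini or PowerPhotos do internally. It assumes the copies of one photo have already been grouped, and the file names are made up.

```python
def pick_keeper(copies, low, high):
    """Pick the copy whose size is inside [low, high] bytes, or nearest to that range.

    `copies` maps file name -> size in bytes for copies of ONE photo.
    """
    def distance_to_range(size):
        if size < low:
            return low - size
        if size > high:
            return size - high
        return 0  # inside the expected range

    # Ties are broken by file name so the result is repeatable.
    return min(copies, key=lambda name: (distance_to_range(copies[name]), name))


# Sizes from the example above, with hypothetical file names.
copies = {
    "recovered_1.jpg": 400_000,
    "recovered_2.jpg": 900_000,
    "recovered_3.jpg": 1_100_000,
    "recovered_4.jpg": 9_000_000,
    "recovered_5.jpg": 25_000_000,
}
keeper = pick_keeper(copies, low=1_100_000, high=1_400_000)
```

With a 1.1 to 1.4 MB range, `keeper` is the 1.1 MB copy. I would still run anything like this against a duplicate of the folder first, as described above, rather than against the only copy.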


I’d keep the original photos. Assuming these are iPhone photos, that would be the HEIC photos.

I have a question I have often wondered about.

I take a photo and it goes into the Photos app. Then I make a duplicate (in the same program, Photos). Is the duplicate “as good as” the original?

And I cannot take a jpeg and make it RAW, right? (Wishful thinking on my part!)


