Is there a way to identify the "original" and the "clones" when it comes to APFS?

I believe @alltiagocom understands this. I also believe the question he is asking is how to identify a file that if it were to be deleted would not free up any space (as well as the converse).

That is, how to identify if a file can be considered one of a set of clones, versus being a unique file (the “original”).

There is a way to tell, linked in my reply above, but sadly OP is on Catalina and Big Sur is required for the utility.

@alltiagocom I now see that there’s also Precize, which runs on Catalina and can tell you whether a file started its life as a clone, because a special flag is set. The trouble seems to be that when clones start to diverge, macOS and APFS know this, but apps have no way to check whether the files are still exact clones, probably because Apple does not provide an API for it.

Please read the reply by @MevetS because he got it right.

Thank you. It seems that my explanation was clear enough.
As you said, I do understand how APFS works (at least the basic concept, because I understand that there are other things that go into it that are more complex than the average user would comprehend, the technical side of things).

And yes, the goal is to identify if a file will indeed give me X amount of space back or if it’s irrelevant. Again, deleting a file is not a big issue, because it’s fast and doesn’t require any effort. The issue is when you spend 2 hours cleaning up a music project in Logic to then look at the available space and it’s the same, because those files were all duplicated from another file that I don’t remember which one it is.

I will check that, thanks.
As long as I can see that a file is not 100% unique (unique meaning there’s no other copy of it anywhere else), that’s a first step, because I can then look for files that are 100% unique and start by deleting those instead (assuming that those aren’t really necessary anymore, of course).
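
To make that deletion order concrete, here is a minimal sketch in Python. It assumes some tool can already report a per-file clone identifier; the `(path, size, clone_id)` records and the function names are hypothetical, made up for illustration. It only separates files that share an identifier from files that do not.

```python
from collections import defaultdict


def split_by_sharing(records):
    """Partition (path, size, clone_id) records into unique files and clone groups.

    Assumption: files that share a clone_id share storage. A file whose
    clone_id appears only once is treated as unique, which is only true
    if the scan covered every place a clone could live.
    """
    groups = defaultdict(list)
    for path, size, clone_id in records:
        groups[clone_id].append((path, size))

    unique = []
    shared = {}
    for clone_id, members in groups.items():
        if len(members) == 1:
            unique.append(members[0])
        else:
            shared[clone_id] = members
    return unique, shared


def reclaimable_bytes(unique):
    """Sum the sizes of files that are not known to share storage."""
    return sum(size for _, size in unique)
```

Deleting the files in `unique` first is the "start with the 100% unique ones" idea. The groups in `shared` are the ones where deleting a single member may free little or nothing.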

Thanks for sharing!

Yes you do!

I understand your pain, but it’s the reverse side of not having duplicate files take any additional space at the beginning. I totally understand your use case as a fellow Logic user myself, though. I think what you experience through the Finder is classical Apple trying to hide as much technical complexity as possible (copying in Finder creates clones by default), but your use case is not solved.

This is an interesting problem. I think I could write a Terminal tool to find clones, but as has already been suggested, DaisyDisk (or GrandPerspective, which is the tool I use) may already be clone-aware and, for a small one-time price, would give you a nice interface to locate the original files. I will report back on this.

No, I cannot. I am not smarter than hoakley.

My counter to this is that the user either wants to retain a file or doesn’t.

I appreciate that the OP is trying to gain spare disk space, but if the disk is full and the user needs to retain all files then they need extra disk space.

If you need a file, retain it. If you don’t need a file, delete it. If you’ve run out of space and there’s nothing left to delete, either archive some stuff or add more disk space (not always easily possible these days).

I’ve never looked at a file and only wanted to delete it if I’m running out of space.

I really appreciate Apple introducing the APFS system, even if it’s not as transparent or accessible to developers as it could be.
Being able to duplicate a file a million times without seeing the disk space vanish is amazing!

I’m lucky that I don’t work with video, for example, where you need way more space than with music (at least the way I approach my work). But when the time comes I will definitely buy new backup disks that are at least twice the size of my main disk to avoid this kind of issue and having to constantly think that I need to delete files or clean them up, etc.

I will check that GrandPerspective app you mentioned. Thanks for sharing!
Appreciate your help :muscle:

I was thinking it would have to be a compiled app to make the system call to check the filesystem status. It could hook into Finder Sync to mark the clones.

Agreed! Seems someone already did that.

You can skip that: I copied some files with the Finder and it does not seem to detect clones. Perhaps DaisyDisk does.

I wrote this tool, which I believe solves @alltiagocom’s use case. It requires some Terminal judo, but I can confirm it identifies exact clones (at least).

Sorry for the late reply.
I appreciate the tool. Unfortunately, that’s too complex for me, but I’m sure others will find it useful. As you said, it requires some Terminal Judo and I’m still a transparent belt when it comes to that :wink:

Well, things escalated a little bit on my side and I am currently working on a native SwiftUI wrapper that seems to work, but I am hitting a wall trying to distribute it in a .dmg file for download (damned Xcode sandboxing entitlements!). I have never written a native Mac app so this is an alien martial art to me.

Oh wow, that seems like a great tool indeed! :muscle:

So looking at the screenshot, the ref. count is how many files are either exactly the same as that one or have similarities?
For example if I just duplicate a document.txt 3 times (so 4 files total) will the ref. count show 4? And then if I change one of those files, it will still show 4, because it’s still retaining information from the original file?

If so, that’s a great tool and I can see how other people will benefit from that when trying to get some disk space back.

I wish I could help more than just being available to test it once it’s done. That’s too technical for me, unfortunately.

But if you got this far, I don’t think that that issue will be a block for you :wink: You got this!

Yes, it would show 4. I think that is the idea behind what the operating system is reporting. In the screenshot, the entries with clone ID 35373790 are the same file copied with ‘cp -c’, which guarantees a clone file (there was a third clone but I deleted it while testing). So pure clones are detected. There is also a handy “Show in Finder” popup menu.
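
For anyone who wants to reproduce a test clone like that from a script, here is a small sketch. The `cp -c` flag is the macOS one mentioned above, which asks `cp` to copy via clonefile(2). The helper names are made up for illustration. As far as I know, cloning only applies on the same APFS volume, and `cp` may fall back to a regular copy where cloning is not supported.

```python
import platform
import subprocess
from pathlib import Path


def clone_command(src, dst):
    """Build the `cp -c` invocation (macOS: copy using clonefile(2))."""
    return ["cp", "-c", str(src), str(dst)]


def make_clone(src, dst):
    """Run `cp -c src dst`; refuses to run anywhere but macOS."""
    if platform.system() != "Darwin":
        raise RuntimeError("cp -c relies on clonefile(2), which is macOS-only")
    subprocess.run(clone_command(src, dst), check=True)
    return Path(dst)
```

Calling `make_clone("take1.aiff", "take1 copy.aiff")` on an APFS volume should give a pair of files to test a clone detector against (the file names are just examples).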

But then you can see the entries with clone ID 638712, which are .aiff files from a Logic session with me destroying “Let It Be” by The Beatles. They show a reference count of 3, which makes sense because there are three copies, but there should be one more reference: the original .aiff file that I copied with the Finder to my test/ folder. It isn’t reported, so I am at a loss here (perhaps it is not reported because the origin is iCloud Drive).

Then you have the other IDs, which belong to the source code of the little app itself. They report at least one clone each, but I didn’t clone them (perhaps it’s Xcode doing some magic while building the app for debugging).

So yes, that’s the idea but I am not really sure what would be reported if you edited a clone. In my tests the modified clones are reported the same as a pristine clone, which is no good. Need to do more testing.
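
One possible workaround for the modified-clone case, sketched in Python under the assumption that you already have the paths of one clone group (the function name is hypothetical): compare the contents byte for byte. This is not what APFS does internally and says nothing certain about block sharing. Identical contents do not prove the blocks are still shared, and files that differ may still share their unmodified blocks. It only flags which group members have diverged.

```python
import filecmp
from itertools import combinations


def diverged_pairs(paths):
    """Return the pairs of files in one clone group whose contents now differ.

    shallow=False forces a byte-by-byte comparison instead of trusting
    size and modification time alone.
    """
    return [
        (a, b)
        for a, b in combinations(paths, 2)
        if not filecmp.cmp(a, b, shallow=False)
    ]
```

An empty result means every file in the group still has the same contents. Any pair returned contains at least one edited copy.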

Likely a bug somewhere in Apple’s file handling. In my experience, systems that use a reference count model often get out of sync.

Or it could be that the copying semantics are a little different there, given that the original file was inside a Logic project and I store those in iCloud Drive. My guess is that Finder would not create a clone when dragging and dropping a file from iCloud Drive, because the file could potentially be evicted or “optimised” by iCloud, so the copy would not be a clone even though both files really are in the same physical filesystem. But that’s only a hypothesis.

That’s probably it, as I understood the reference count only applied to a single physical volume and would not be synced across to remote storage. iCloud Drive would have its own logic for the file copy it is synced to.