Is there a way to identify the "original" and the "clones" when it comes to APFS?

I found this thread while searching Google for the same thing. I was interested in this and wrote a quick program to see what physical blocks are used by a file. Here’s what it looks like the OS is doing:

  1. You have a file which is, for instance, 100 blocks long (so 409600 bytes total).

  2. Suppose you make 2 clones of it.

  3. You modify just the first block of the original file. Even though the original file is the “original,” the OS, at least on my system (Monterey), performs a copy-on-write here. It spawns a new block, writes the modified data to that, and changes the original file’s pointer table to point to the new location on disk instead. The two clones are still pointing to the original block.

  4. You modify the first block of the first clone. The same thing happens. The OS allocates a new physical block for the clone, writes the modified data there, and changes the file to point to the new block. The second clone is the only thing that is still pointing to the original physical block.

  5. You now modify the first block of the second clone. At this point, the OS seems to recognize that no other files are referring to this block, as the original file and first clone now have their first block pointing elsewhere. So, it just modifies the block in-place and doesn’t change anything for the second clone.

To do the above, I would guess that there is some kind of internal reference count for each physical location on disk, not just for each file inode indicating if it’s been cloned or not. This would probably be a different refcount from the ones that others have talked about above, in that it is indeed decremented when clones are no longer pointing to it. I would guess something like this must exist or else the OS wouldn’t know when it’s safe to deallocate blocks at all.
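The behaviour in steps 1 to 5 can be reproduced with a toy model. This is only a sketch of my guess, assuming one reference count per physical block; the `Volume` class and its method names are made up for illustration and say nothing about how APFS actually stores this on disk.

```python
from collections import Counter


class Volume:
    """Toy copy-on-write volume; NOT the real APFS on-disk format."""

    def __init__(self):
        self.refcount = Counter()  # physical block id -> number of files pointing at it
        self.files = {}            # file name -> list of physical block ids
        self.next_block = 0

    def _alloc(self):
        block = self.next_block
        self.next_block += 1
        self.refcount[block] = 1
        return block

    def create(self, name, n_blocks):
        self.files[name] = [self._alloc() for _ in range(n_blocks)]

    def clone(self, src, dst):
        # A clone copies only the pointer table and bumps each block's refcount.
        self.files[dst] = list(self.files[src])
        for block in self.files[dst]:
            self.refcount[block] += 1

    def write(self, name, index):
        """Modify one logical block; return the physical block that now holds it."""
        old = self.files[name][index]
        if self.refcount[old] == 1:
            return old               # sole owner: modify in place
        self.refcount[old] -= 1      # shared: copy-on-write into a fresh block
        new = self._alloc()
        self.files[name][index] = new
        return new


vol = Volume()
vol.create("original", 100)          # step 1: a 100-block file
vol.clone("original", "clone1")      # step 2: two clones
vol.clone("original", "clone2")
first = vol.files["original"][0]

print(vol.write("original", 0) != first)  # True: the original gets a new block
print(vol.write("clone1", 0) != first)    # True: the first clone gets a new block
print(vol.write("clone2", 0) == first)    # True: the last referrer writes in place
```

The point of the model is that the "original" gets no special treatment: whichever file writes last to a block nobody else references is the one that gets the in-place write.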

If that picture is correct - and maybe the APFS experts can explain - one could build a very easy and quick probabilistic algorithm, to whatever precision you want, to estimate the “true disk space freed.” Just get the physical disk locations pointed to by N random blocks from the file, get the refcounts for each, and see what proportion of those are 1, and use that as an estimate for the entire thing. It would be very fast, and you could even do this for entire folders very quickly.
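Here is a minimal sketch of that sampling idea. It assumes you already have the file's list of physical blocks and some way to look up a block's reference count; both `blocks` and `refcount_of` are hypothetical inputs, since I don't know of a public API that exposes per-block refcounts. The 4096-byte block size comes from the example above.

```python
import random


def estimate_freed_bytes(blocks, refcount_of, block_size=4096, samples=200, seed=None):
    """Estimate the bytes freed by deleting a file from a random sample of its blocks.

    blocks      -- list of physical block ids the file points to (hypothetical input)
    refcount_of -- hypothetical callable: block id -> reference count
    """
    if not blocks:
        return 0.0
    rng = random.Random(seed)
    n = min(samples, len(blocks))
    sample = rng.sample(blocks, n)
    # Only blocks with a refcount of 1 would be released by deleting this file.
    exclusive = sum(1 for b in sample if refcount_of(b) == 1)
    return exclusive / n * len(blocks) * block_size


# Made-up file: 1000 blocks, of which the first 250 are not shared with anything.
blocks = list(range(1000))
fake_refcount = lambda b: 1 if b < 250 else 3

exact = estimate_freed_bytes(blocks, fake_refcount, samples=1000)
rough = estimate_freed_bytes(blocks, fake_refcount, samples=200, seed=1)
print(exact)  # 1024000.0 (all 1000 blocks sampled, so this is exact: 250 * 4096)
print(rough)  # an estimate near the exact value; the sample size controls the precision
```

Raising `samples` tightens the estimate at the cost of more lookups, which is the "to whatever precision you want" part.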

Thank you for the detailed replies @vaya
This is too complex and advanced for me to understand, unfortunately.

I guess my goal is more like finding out if a file is the only file in the system or if there are clones of it, or if that was a clone of another file?

For example let’s say I create file1.txt
This file is unique, so if I delete it, it makes that space available again.
Now, if I duplicate that file and create file1 copy.txt, even if I delete file1.txt, the space is not recovered, because file1 copy.txt still exists.

So, it would be good to just have a look at our disk and see that files X, Y, Z are 100% unique, and if I delete them I will get that space back, and that files A, B, C are not unique. They were cloned, or they are clones of other files.

That would help us get an idea if we are indeed getting space back when deleting certain files, or we are just wasting time. Been there, done that. I spent time going over dozens of files and deleted them, only to see that the free space hadn’t increased.

Hope it makes sense

Thank you for replying, but as I previously stated, I understand that. I’m not saying that file A is the original and file B is the clone, as if they share different blocks.
When I say “original” I just mean that file A was where it all started and then file B was created after that, from A. For example if I have “File 1.txt” and I duplicate it, it automatically renames it “File 1 copy.txt”. Even though they both share the same blocks, I would call “File 1.txt” the original, and “File 1 copy.txt” the clone, just because it’s easier to explain what is happening visually. For example, in Logic Pro when I create “version 2” of a song and I add “v2” to the name, even though they share some blocks, I would call “v1” the original and “v2” the clone, just to make it clear what’s happening. That’s all.

Even though I’m not deep into the technical side of it, I understand the concept. They both share whatever is the same for both, and whatever is unique to each is saved as different blocks.

That said, my issue is still the same. I would love to have some kind of way to identify if a file I’m about to delete will indeed free the space it says it’s using (because it’s a unique file, no clones were created from it), or not.

I don’t wanna sound “arrogant” or anything. It’s just that I already explained a few times that I understand the concept of how things work. That’s not what I would like other users here to clarify, again and again, because it shifts the focus of what is being asked.

If file A is unique, I would like to be able to see that information. So if it’s 10GB and I delete it, I get 10GB back. If it’s not unique, whether it is the “original” file or a “clone” (meaning it was created after file A has been created and used file A as the source material, so to speak), I would like to see that information as well. What happens right now is that sometimes I need to free 50GB, spend 2 hours deleting files carefully to then look and see that I got 1GB back, because most of those files were not unique.

As much as I love APFS, it can be frustrating when you want to free some space.

Hope that clears things up… again.

Way back up thread, I made it clear that I understood what you are asking for. And I still do. But you are still confusing things by saying you want to be able to identify the “original”. Identifying the original is irrelevant to determining if deleting a file will free up space.

As has been pointed out multiple times, the only way to free up space is to delete all the copies of the file.

So yes it would be very useful to know if there are multiple copies and where they are. But identifying one as the “original” and others as “clones” is not useful for the purpose of freeing up space.

Not to sound “arrogant” or anything, but continuing to think of these as “copies” and not just pointers to the one and only copy of the data shows that you don’t really understand, despite your claiming otherwise.

I was thinking about it and I guess the issue is that I sometimes refer to “original” as one thing, and then to another thing, so let me clarify.

Depending on context, I may say “original” or “clone” when I talk about 2 files that are related. Example:
music.txt
playlists.txt (which was created from duplicating music and renaming it).

I will consider music.txt as the “original” and playlists.txt as “clone”. And why? Simple. If I have a Logic Pro template and I duplicate it to create a new song, in my mind I will always refer to the template as the “original” and every other file derived from it, the “clone”. Yes, they share some of the blocks, and then some blocks will be unique to each “clone” (each file derived from the template file) or even if I change the “original” template.

So, this is one case where I use the term “original” and “clone” here in this thread. Just to explain which file is which.

The other time I refer to “original” and “clone” (and maybe this is why some people get confused) is when I refer to “unique” files vs files that are using the APFS “feature/benefit” (whatever technical term needs to be used). For example, if photos.txt is unique on disk, I will call it “original” in a certain context.
If music.txt, songs.txt, and soundtracks.txt are all sharing the same blocks, because they were duplicates from each other, I will call those “clone” files, all 3 of them.

Yes, maybe the issue is the wording. I totally get that. Maybe I should use other terms for this case, or avoid using “original” and “clone” completely, but again, not being an expert in this, I use the words that make sense to me to illustrate my issue.

All that being said, regardless of this wording issue, my issue with new replies is not tied to this. It’s tied to getting replies trying to explain how APFS works, when I already explained I do understand the concept and I shared examples. So, even if the wording is wrong, it’s already established that I understand the concept. That’s all.

Again, the wording may be wrong here. What I mean by “original” is if the file is “unique on disk”. Maybe I should just say “unique”. No worries.

It’s not even remotely related. I can explain a concept without using the right wording, and still understand it. I could call it “cats” and “dogs” and still perfectly explain the process.

I do understand the process:
I create a file “music”. It uses blocks A-B-C.
I duplicate that file and name it “songs”. This new file uses blocks A-B-C as well.
I modify “songs”. It now uses blocks A-B-C and D
If I delete “music”, “songs” still uses blocks A-B-C (plus D, which is unique to that file).
So deleting “music” will not free any space used by A-B-C (unless I also modified the file and it was using blocks A-B-C and R, and in this case it would free the space used by the R block).
Only when I delete all files using blocks A-B-C will I get that space used by those 3 blocks (which initially was the unchanged “music” file).
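The same example as a few lines of Python, just to make it concrete. This is a toy model where a file is a list of block names and a block is freed when nothing references it any more; the `delete` helper is made up for illustration and is not an APFS API.

```python
from collections import Counter

files = {
    "music": ["A", "B", "C"],
    "songs": ["A", "B", "C", "D"],  # duplicated from music, then modified (adds D)
}


def delete(files, name):
    """Remove a file and return the blocks whose last reference it held."""
    refcount = Counter(b for blocks in files.values() for b in blocks)
    removed = files.pop(name)
    return sorted(b for b in removed if refcount[b] == 1)


freed_by_music = delete(files, "music")
freed_by_songs = delete(files, "songs")
print(freed_by_music)  # []  (A, B, C are still referenced by songs)
print(freed_by_songs)  # ['A', 'B', 'C', 'D']  (the last references are gone)
```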

So, if you still believe I don’t understand the concept, I don’t know what else I can say? Call it original, clone, copies, unique, cats, dogs, carrots. If you are focused on the names and not on what I have already shown with examples, you are cherrypicking what to focus on.

But at the same time, you already said I understand and you clearly explained what I was looking for. So you DO understand what I’m trying to achieve. You are just focused on the wording being used, not that I don’t understand how it works. That’s what’s tiring. You are not making a point about correcting the wording, you are now trying to make it sound like I don’t understand the process.

Again, I know I come off as “arrogant”. It’s ok. I just find it a bit tiring trying to find answers to A, and people being focused on B all the time.

The core of the issue is simple:
I wish I could select a file and know if more files on disk are using all or some of the blocks being used by that file. And if so, how much space would I actually free by deleting that file (in case there are more files using some of the blocks, how much space the unique blocks on that file are actually using on disk).

Example: if 50 1GB files show that 50GB is being used on disk (assuming, for the sake of example, that they were all duplicated from the same file), I want to know how much space is being used by unique blocks, per file. So, if I delete 25 files, instead of thinking I’m freeing 25GB on disk, maybe I’m just freeing 3GB, because that’s the amount taken by unique blocks.
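To put numbers on that example, here is a rough sketch of the calculation I wish a tool would do for me. It assumes the tool could list the blocks behind each file; the `files` mapping, the file names, and the split of 880 shared plus 120 unique blocks per file are all invented for illustration (think of each block as 1MB, so 1000 blocks is roughly a 1GB file).

```python
def reclaimable_blocks(files, to_delete):
    """Count the blocks referenced only by the files in `to_delete`."""
    doomed = set(to_delete)
    # Blocks still referenced by files that are staying on disk.
    kept = {b for name, blocks in files.items() if name not in doomed for b in blocks}
    # Blocks referenced by the files about to be deleted.
    gone = {b for name in doomed for b in files[name]}
    return len(gone - kept)


shared = [("shared", k) for k in range(880)]
files = {}
for i in range(50):
    name = f"file{i:02d}"
    files[name] = shared + [(name, k) for k in range(120)]

first_25 = [f"file{i:02d}" for i in range(25)]
freed_25 = reclaimable_blocks(files, first_25)
freed_all = reclaimable_blocks(files, list(files))
print(freed_25)   # 3000 (only the unique blocks of those 25 files)
print(freed_all)  # 6880 (every unique block plus the 880 shared ones)
```

The shared blocks only come back once the last of the 50 files is gone, which is why deleting half of them frees far less than half of the reported size.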

Hope this clears things up once and for all. Can we move on from this?

A very smart fellow, Richard Feynman, relates a story about a group of physics students he was talking to.

He asked them (not exact quotes), “When a stream of photons interacts with a dielectric, what is the result?” They responded correctly, “The light becomes polarized.”

He then asked, “Looking out the window, what can you say about the sunlight reflecting off the ocean?” And they were dumbfounded.

So while they could quote the theory, they didn’t actually understand it.

I’ve understood the core of the issue from the get go. But reading your repeated desire to be able to distinguish between the original and the clones, I see you as these students. You can quote the theory, but not apply it. Which shows a lack of understanding.

You are the one who keeps mixing A and B.

I find it “funny” that the parts you quote from my replies are always simple sentences to prove you are “right” about something, yet when I give a detailed explanation of how things work, you can’t say anything about it. Interesting…

Your example, once again, proves nothing. Those students didn’t explain anything. They answered a question with a sentence. They didn’t explain how “the light becomes polarized”. Anyone can say that sentence when they hear a certain question. So, again, you are trying to prove me wrong using examples that make no sense.

So, to end this nonsense discussion you and I are having, I just ask you: where is the flaw here:

I will be happy to learn something new and I’m ok with being proven wrong. So far, you have been trying to prove something (I don’t know exactly what…), without success. You fail to show me where my example above proves I don’t understand how APFS works.

And YES, I am a student and will always be. You are trying to sound like a teacher, without any skills to be one. Don’t get confused.

I know ChatGPT is not the master of all knowledge, but I decided to give it a try with my example and here’s what I got, in case there’s something else you want to debate:

Your understanding is essentially correct. A few clarifications and precise terminology will make it fully accurate for APFS.

What you are describing is APFS copy-on-write (CoW) with block sharing, which is used for file cloning (e.g., Finder “Duplicate”) and for snapshots.

Step by step, aligned to APFS behavior:

  1. Create file music

    • Logical file music references physical blocks A–B–C.
    • Reference count for A–B–C = 1.
  2. Duplicate music to songs

    • APFS creates a clone, not a physical copy.
    • Both music and songs reference the same physical blocks A–B–C.
    • Reference count for A–B–C = 2.
    • No additional space is consumed.
  3. Modify songs

    • APFS performs copy-on-write only for the modified blocks.

    • If the modification affects one block, APFS allocates a new block (D) and updates songs to reference it.

    • Result:

      • music → A–B–C
      • songs → A–B–C–D (where D replaces or extends a modified portion)
    • Reference counts:

      • A–B–C = still ≥1
      • D = 1
  4. Delete music

    • Reference count for A–B–C is decremented but not zero, because songs still references them.
    • No space is freed for A–B–C.
    • Block D remains exclusively owned by songs.
  5. Space reclamation

    • A block is freed only when its reference count drops to zero.

    • Therefore:

      • Deleting music alone does not free A–B–C.
      • Deleting or modifying all files that reference A–B–C is required before those blocks are reclaimed.

Your parenthetical example is also correct:

  • If music had unique blocks (e.g., R) due to prior modifications, deleting music would free only those unique blocks.

Important nuances that are often missed:

  • This behavior applies to:

    • Finder duplicates
    • cp -c
    • Time Machine local snapshots
  • APFS tracks references at the block level, not the file level.

  • Modifications can cause partial block duplication, not necessarily whole-file duplication.

  • Disk space may still appear “used” until snapshots referencing those blocks are also removed.

Conclusion

Your model is correct:

Space is only reclaimed when the last reference to a block is removed.

You are not missing anything fundamental; your understanding matches how APFS actually works internally.


I rest my case on this 👨🏻‍⚖️

I find it funny that even ChatGPT says that: APFS creates a clone
I guess “clone” is indeed a thing when talking about this?

Ok.

The word “clone” in this case is functionally a term of art used to aid in user understanding of the process. Telling users that you’re duplicating file pointers rather than copying files gets confusing. 🙂 Although that’s exactly what’s happening under the hood - APFS is creating a second file pointer.

That’s different from long-established usage of “clone” in the context of filesystems, which has come to mean “a second copy which is completely identical to the first.” “Clone backup,” “bootable clone,” etc. If you tell somebody that you made a clone of your data, they won’t be thinking of APFS file pointers.

It’s even simpler. macOS reports if a file has clones at the filesystem system call level. The “check for clones” tools just read that information and then act if the number of clones of a file is >0. But for this use case you just have to print the file if it doesn’t have any clones!


I thought I should also mention John Siracusa’s Hypercritical: Hyperspace, which is relevant to the APFS clones discussion in this thread.

While this does not directly solve @alltiagocom’s use case, it seems to me that it could be a very valid way to optimize disk space: replace fully identical files with APFS clones. I haven’t tested it myself, but the principle looks solid, and one could recoup a significant amount of disk space this way, although it will vary a lot depending on the apps and files in the system. Still, the tool does things at a deep level and I would not recommend it for casual users.
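For the curious, the first half of that principle (finding files with identical content) is easy to sketch. This is not how Hyperspace works internally, which I don't know; it is just a generic size-then-hash grouping, and `find_identical_files` is a name I made up. It only reports candidates and never modifies anything. Note that files which are already clones of each other will show up as identical too, so it cannot tell you how much space is still recoverable.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def find_identical_files(root):
    """Group regular files under `root` by size, then by SHA-256 of their content."""
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            groups[(size, digest.hexdigest())].append(path)

    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name, data in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"diff")]:
        with open(os.path.join(root, name), "wb") as fh:
            fh.write(data)
    found = [[os.path.basename(p) for p in group] for group in find_identical_files(root)]

print(found)  # [['a.txt', 'b.txt']]
```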
