Is there a way to identify the "original" and the "clones" when it comes to APFS?

I have a file and I make 10 copies. Is there a way to identify which one is the original file that was used to create all the 10 copies?

The reason I’m asking is that sometimes I have a folder that, for example, says it is taking 10GB of space, and I need free space on my disk. Then I go and delete that folder, but because the files in there are a copy of another folder, the free disk space doesn’t go up.

I would like to have a way to identify which files I can indeed delete to get free space back. Is there a way to do this programmatically or with an app?

I believe there’s no way to detect this easily as cloning is done by APFS and is quite low-level as far as macOS is concerned. Perhaps you may try Sparsity, linked in the article below, but given the posts below, Apple’s documentation on this seems to be scarce so it’s unlikely there are any other utilities out there.

Also, if the file/folder you deleted was indeed still a clone, you would not get any disk space back as none was used for the clone to begin with. As the cloned file begins to change and diverge from the original, APFS starts using more space for the changes.


Thanks for the link. Yes I was Googling about this and saw some comments about Apple not providing information about this.

Yes, I just wanted to know for example if the folder and files I’m about to delete are original files or clones. For example I copied all Apple Loops folder to a different location and I remember renaming them, which by default doesn’t make a difference, meaning it will still be a clone, right? Just adds a bit more space being used to indicate that the file changed slightly, right?
Now the thing is that I don’t remember if some of the files there were actually new files created by me that I added to the folders, so when I want to free some space up, I don’t want to be deleting folders and files thinking I’m getting that space back, when in fact it won’t make any difference.

But if this is not possible, I guess I will have to learn how to live with it :wink:
Thanks for clarifying.

Does it matter which one is the original? APFS should be (and I believe that it is) clever enough that if the original is deleted, one of the other 9 copies is marked as the original which the remaining hard links point to. If this wasn’t the case you could accidentally delete the “original” and you’d be left with 9 pointless hard links and your data would be gone.

It’s only if you delete all copies on APFS that you get the space back.


I suppose Apple does not want us digging that deep (and probably for the right reasons when the file system is concerned). :slight_smile:

However would it help you if you were able to compare files and folder structures? To compare two files quickly, you can use the Terminal:

diff file1 file2

If there’s no output, the files are the same (i.e. it will not list any differences); otherwise diff will say that they differ. diff is usually used to compare text files, but it also works on binary files like the ones you are dealing with; it then only reports whether they differ.
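For binary files such as audio projects, a byte-for-byte check can also be scripted. Here is a minimal Python sketch using the standard `filecmp` module; the function name `same_content` is made up for illustration. Keep in mind that identical content only tells you two files could be clones, not that they actually share blocks on disk.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if both files hold exactly the same bytes."""
    # shallow=False forces an actual read of both files instead of
    # trusting the size and modification time reported by os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)
```

You would call it as `same_content("file1", "file2")`, which mirrors the `diff file1 file2` check above.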

To compare the whole folder structures and files within, you can use tools like Meld. There’s a macOS GUI variant of Meld here:

(I haven’t used it in a while though. I see that it has not been updated in a while.)

And if that was the case, they would not be clones or hard links, they would be symbolic links. From what I remember from older Unix filesystems, when doing hard links there is no difference between originals and clones. A hard link is not a reference to another file in another directory. It’s the file that manifests itself in several places at the same time. This explains why only when you remove the last hard link you free space: because the duplicates were not occupying any disk space in the first place.

APFS may add more complexity to this (in case you add changes to a clone), but I don’t think the basics changed that much.

Edit: to add that this is all created with ‘ln’ and ‘ln -s’. If @alltiagocom used the Finder to copy the files from one place to the other, I am not so sure what magic APFS does. I guess the journalling capability makes it basically unnecessary to mess with hard links; it is smart enough not to use additional space unless you change the copy.
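The hard link versus symbolic link behaviour described above can be checked with a small Python sketch. `os.link` and `os.symlink` do the same job as `ln` and `ln -s`; the function name `describe_links` and the dictionary keys are just illustrative. This says nothing about APFS clones, which are a separate mechanism.

```python
import os


def describe_links(original, hard, soft):
    """Create a hard link and a symlink to `original` and report inode facts."""
    os.link(original, hard)      # equivalent to: ln original hard
    os.symlink(original, soft)   # equivalent to: ln -s original soft
    st_orig = os.stat(original)
    st_hard = os.stat(hard)
    st_soft = os.lstat(soft)     # lstat looks at the link itself, not its target
    return {
        # A hard link is the same file under another name: same inode.
        "hard_shares_inode": st_orig.st_ino == st_hard.st_ino,
        # A symlink is a small separate file that stores a path.
        "soft_shares_inode": st_orig.st_ino == st_soft.st_ino,
        # Number of names currently pointing at the inode.
        "link_count": st_orig.st_nlink,
    }
```

If you then remove `original`, the hard link still opens the data, while the symlink is left dangling.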

hoakley points out the differences here: “APFS hard links, symlinks, aliases and clone files: a summary” (The Eclectic Light Company). Clone files start out exactly the same because they point to the same exact set of file locations on the disk!


All of what you have said is true (at least up to the point of saying “APFS may add more complexity to this”) but I’m not sure I understand the point you’re trying to make (sorry if I’m being dense).

It depends on the scenario, but to me, it does.
Let me give you an example:

Right now I need to have extra free space and Finder says I have 10GB left.
So I look at a folder that says it’s taking 100GB (Logic Pro files). Let’s say each one is 1GB. So I go over 100 files, open them, optimize them (in Logic it would mean using the Clean Up option to delete files no longer needed in the current project) and I can go from 1GB down to 5MB on each file. Awesome!
Now, after spending 2 hours working on this, I look at Finder and the available space… still 10GB… all because the files I deleted are most likely clones. So I just wasted 2 hours doing something that did absolutely nothing.

So yes, knowing if a file is the original or a clone can help the user save time and organize things. So for example if we could run a script or open an app that would say: if you want to free up space, delete any of these files (shows list), along with how much free space it would create, that would be awesome.

I love the APFS system, but in this case, it’s a bit ambiguous. You end up not being able to really trust when it says “X amount of available space”, because you can’t just delete a file and get that space back.

I guess in certain scenarios when you know which files to compare, yes.
But for example let’s say you have a document.txt file and you duplicate it 100 times, then rename them all with completely different names. After a while you have no idea which ones to use to compare. So it’s not helpful all the time, unfortunately.
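For the renamed-duplicates case, one option is to ignore names entirely and group files by a hash of their content. This is only a rough Python sketch (the function name `group_identical_files` is hypothetical), and it comes with a big caveat: files with identical content are merely candidates for being clones. The hash cannot tell you whether they actually share blocks on disk or are full independent copies.

```python
import hashlib
import os
from collections import defaultdict


def group_identical_files(root):
    """Map content digest -> sorted paths under `root` that share that content."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large projects do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            groups[digest.hexdigest()].append(path)
    # Keep only digests that two or more files have in common.
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}
```

Hashing every file reads the whole folder, so on 100GB of projects this will take a while.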

For certain scenarios, it actually does.
Read this reply to understand when it does, in a particular context:

Did you end up trying Sparsity? Seems to work fine for verifying if files in a directory are clones. I seem to average about 1% in user folders. If you’re having trouble freeing up disk space because you’re always copying your projects to new folders and not going back, try an app like DaisyDisk to get a big picture of where you’ve put original files you’ve copied and forgotten about.

If your big files are mostly shared block clones and you’re low on disk space, then you probably need a bigger drive because of all your non-clone files that are crowding your audio folders.

I’m still on Catalina and it seems that it doesn’t support that?

Would this be similar to Sparsity?
I’m a newbie when it comes to this subject, so I apologize if that’s a silly question…

What do you mean by this?

No problem. DaisyDisk is different. It gives you a nice overview of where your disk space is going and some tools to clean it up. It makes a chore pretty fun. One of the best $10 I’ve ever spent on an app.

It would show you if you had a forgotten folder somewhere with gigs of files you’re no longer using.

I mean, it sounds like your goal of freeing up a lot of space may be impossible. If you have low free space and your audio files you’re optimizing are mostly clones or sharing blocks, then you have a bunch of other non-clone files that are eating your space. So you’d do best to find those other files first, hence the DaisyDisk suggestion, but ultimately you probably just need a bigger drive so you can keep everything without having to clean up as often.


You’re thinking about this incorrectly, probably because the language you’re using isn’t appropriate for how APFS works. There is no original file and there are no clones; that’s not how APFS works. But explaining it requires an understanding of how data is stored on a disk. That’s not about files; it’s about blocks of data and tables which link those blocks to a filename.

Think of it this way: if you create 9 “copies” of a file, what you actually have is one set of data saved on the disk and 10 links to it. It’s not quite this simple, but it’s close enough.

It doesn’t matter which file (i.e. link) you delete; the data will still be there for the other 9 links. Pick 9 of the 10 “files” and delete them, and the data still exists on the disk and is referenced by the 10th “file”. It’s only if you delete all 10 links that the data is then marked for deletion.

In contrast, by making changes to one of the “copies” of the “original” you still have the original in its entirety, but the data you changed takes up MORE space as the changes you made to the copy need to be recorded.

If you really want to understand what’s happening you need to read a lot more about APFS and how it handles data. I’ve tried to find a video explaining it, but can’t. It’s hard to explain in text but would be easier in images. This is the sort of thing where (in my IT career) I would have found the nearest whiteboard to draw.


You’d probably end up with something similar to this on the whiteboard… :smiley:

Modifications to the data are written elsewhere, and both files continue to share the unmodified blocks. You can use this behavior, for example, to reduce storage space required for document revisions and copies. The figure below shows a file named “My file” and its copy “My file copy” that have two blocks in common and one block that varies between them. On file systems like HFS Plus, they’d each need three on-disk blocks, but on an Apple File System volume, the two common blocks are shared.
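The “My file” / “My file copy” picture can be mimicked with a toy model in Python. To be clear, this is only an illustration of the sharing idea: the class `ToyVolume` and all of its methods are invented here and do not reflect how APFS is implemented internally.

```python
class ToyVolume:
    """Toy model of block sharing between files; not real APFS internals."""

    def __init__(self):
        self.blocks = {}   # block id -> data stored in that block
        self.files = {}    # file name -> list of block ids
        self._next_id = 0

    def _new_block(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write_file(self, name, chunks):
        # A brand-new file gets its own block for every chunk.
        self.files[name] = [self._new_block(chunk) for chunk in chunks]

    def clone(self, src, dst):
        # A clone copies the list of block references, not the data.
        self.files[dst] = list(self.files[src])

    def modify(self, name, index, data):
        # Copy-on-write: changed data goes to a fresh block, and the old
        # block stays in place for any other file that still uses it.
        self.files[name][index] = self._new_block(data)
        self._collect()

    def delete(self, name):
        del self.files[name]
        self._collect()

    def _collect(self):
        # Drop blocks that no remaining file references.
        live = {b for ids in self.files.values() for b in ids}
        for block_id in list(self.blocks):
            if block_id not in live:
                del self.blocks[block_id]

    def blocks_in_use(self):
        return len(self.blocks)
```

In this model, writing a three-block file and cloning it still uses three blocks; changing one block in the copy brings the total to four, with two blocks shared, which matches the figure described above.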


Well, mine would have been a bit more artistic (read “wiggly”) :wink: but yeah, that’s the gist. Thanks @dario


I did some tests yesterday to check what has been reported here, using the Finder to create copies while running df from Terminal, and the new copies didn’t take any space. It’s not a Finder issue; it’s at a lower level.
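The same before-and-after check can be scripted instead of reading df by eye. A minimal Python sketch, assuming you pass in whatever copy or delete step you want to measure; the function name `free_space_delta` is made up for illustration.

```python
import shutil


def free_space_delta(path, action):
    """Run `action()` and return the change in free bytes on the volume holding `path`."""
    before = shutil.disk_usage(path).free
    action()
    after = shutil.disk_usage(path).free
    # Negative means the action consumed space, positive means it freed some.
    return after - before
```

On macOS, `action` could for instance run `cp -c source destination`, where `-c` asks cp to copy using clonefile(2). Other processes write to the disk too, so treat small differences as noise rather than proof.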

I believe we are all talking about the same thing, but maybe the way I’m expressing myself is hard to understand, or maybe you are looking at it differently, so I will try to explain the best I can and why this topic started and why it’s important for me to know what I want to know. I will keep it as simple as I can and if this is still “confusing”, I will just leave it at that, because there’s really no other way to explain it and we will all be wasting our time debating it.

Let’s use a 5GB disk, formatted as APFS, with just 3 files on it.
File A - 1GB
File B - 1GB
File C - This is a copy of file A, so on an APFS disk, this won’t take extra space, correct?
So on disk, physically speaking, they’re only taking 2GB, even though, when I look at Finder, file C shows me 1GB so when I see them side by side I can see that all 3 have 1GB associated with it, but the disk is not using 3GB, it’s using 2GB. At least for now, because I haven’t made any changes to file C.
That to me, is my understanding of APFS. And if someone here still thinks I don’t understand what APFS does, then I guess the ones to blame are the people online who share this exact same information. But from what I read in many different places, this is universally accepted. An exact copy of a file (without any saved changes), will not take extra space (sure, it will maybe use a few KBs so the OS knows that there’s an extra file, but if the original file (File A) is 1GB, the copy will not be 1GB).
So to me, I call File C a “clone”. If that’s not the right term, I hope you can at least understand the concept and why I call it a clone. Call it whatever appropriate name is supposed to be called.

Now that we got that out of the way…
Why would it be important for me to know which files are “original” and “clones”?
To be clear: I call “original” when a file is unique, no copies. “Clone” would be either File A or File C, even if File C was created from File A. “Clone” as in “not-unique”.

Let’s say that along the way I have other files and I get to a point where my 5GB disk is full, meaning I have files that take 5GB as well.
Thing is, I forgot that File C (without any changes) is a copy (aka “clone”) of File A. It’s been a long time and I don’t track which files I duplicate or anything.
So I look at a list of files and I see File B (which is “original”, meaning it’s the only file for a particular content, it was never duplicated) and since it’s taking 1GB I decide to delete it, and now my disk will have 4GB of used space and 1GB free. Right?

Now I keep looking at the list and I see File C, also showing that’s taking 1GB (it’s not, because it’s an exact copy of File A, but I don’t know that, it’s been a while). So I go ahead and I delete the file and to my surprise, my disk is still using 4GB and still only 1GB free.

Can you understand why it would be important to know if a particular file will in fact create more free space?
Now, in this hyper simple example, I am just talking about deleting a file, which takes 1 second, but when I look at 100 music projects that can be modified (meaning, delete unnecessary files) to show 5MB instead of 1GB, the process of making 100 files go from 1GB to 5MB doesn’t take 1 second or even 100 seconds. It takes 2 hours or more. Now if all of those 100 files are exact copies of an original file (the one they were duplicated from), it doesn’t really make a huge difference if those files show 1GB or 5MB, the same way that them showing 1GB doesn’t mean they are indeed using 1GB if they are an exact copy of another file that takes 1GB.
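The File A / File B / File C scenario boils down to a small calculation: deleting a file on its own frees only the blocks that no other file references. Here is a Python sketch of that arithmetic. It assumes you somehow already know which blocks each file uses, which is exactly the information that is hard to get in practice; the function name and the block labels are hypothetical.

```python
from collections import Counter


def reclaimable_sizes(files, block_size):
    """files maps name -> list of block ids; return bytes freed by deleting each file alone."""
    # Count how many files reference each block.
    refs = Counter(block for ids in files.values() for block in set(ids))
    return {
        name: block_size * sum(1 for block in set(ids) if refs[block] == 1)
        for name, ids in files.items()
    }


GB = 10**9
# File C is an unmodified copy of File A, so both point at the same block.
scenario = {"File A": ["a1"], "File B": ["b1"], "File C": ["a1"]}
print(reclaimable_sizes(scenario, GB))
```

In this model only File B gives 1GB back when deleted by itself; File A and File C each report 0 until the other one is gone as well.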

I understand that if I have 1 original file and 10 copies and I delete them all, that’s when the whole space gets freed. I get that. Again, I’m not an expert when it comes to APFS, but I understand the basic concept. If File A (1GB) is duplicated (File B), it will not take 2GB. When I make changes, maybe I will be using 1.02GB, because those changes are now saved, but everything else that’s common to those 2 files is only using 1GB, by both files. Then I make another change and now I’m maybe using 1.3GB, etc.

So, to finish this already lengthy post, I understand the basics of APFS (I’m aware that there’s a whole deeper level, but that’s not the point here).
My goal was to just look at a file and be able to know if that file is an original (unique) file, meaning no other copies were made, OR if the file was a copy of another file (or if it’s the original file where other(s) were duplicated from).

Again, if this isn’t clear enough for some of you guys, I’m sorry, I can’t make it any clearer than this, and I believe we would be debating infinitely and it would get us nowhere.

I appreciate you all for trying to explain other things (that I think are not related to my original question), but I see that what I was trying to achieve is not possible. And that’s ok.

Thanks!


I see File C, also showing that’s taking 1GB (it’s not, because it’s an exact copy of File A, but I don’t know that, it’s been a while). So I go ahead and I delete the file and to my surprise, my disk is still using 4GB and still only 1GB free. Can you understand why it would be important to know if a particular file will in fact create more free space?

No, I don’t understand. You could delete File A and get the same result as you would by deleting File C. If you want to reclaim space you must delete all of the files (in this case, File A and File C), as you posited above.

If your issue is that there’s currently no way to find all of those files, then I understand that, and hopefully that will be addressed some day (but don’t hold your breath waiting for that fix).
