654: Data Storage in 2022

Lots of good questions to think about. One quibble, the DiskFresh app may be a good idea but its website looks really old and seems to be Windows-only. Anybody know a Mac utility with similar functionality?

Yeah, I think there’s some new API that allows something like Dropbox to be notified by macOS: “hey, the user is asking for the badgers.jpg file, which I don’t seem to have, so you had better fetch that right quick or I’m gonna be blamed!”
No idea if it works. I always keep everything stored on device on my computers, though I do appreciate not having to do so on my phone.


Just a thought: when you’re (royal ‘you’) making a long-term backup, you’ll want to ensure that you’re doing a complete backup that writes everything to the drive. If you’re making an incremental backup, some portion of the data will not be written again, thus it can still degrade over time.

In different words, if you have a backup that you made 5 years ago, and have been doing incremental backups since then, parts of the data on the drive were written 5 years ago, and have been fading since.

Doing a full backup like this negates the need for DiskFresh-type utilities to read and rewrite the data.
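For anyone who would rather refresh an existing backup than redo it, here is a rough file-level sketch of the read-and-rewrite idea in Python. It is not how DiskFresh works (as far as I know that tool operates below the file level); the function names `sha256_of` and `refresh_file` are made up for illustration. It also assumes the file system really writes new blocks on a copy, which may not hold on copy-on-write file systems that clone instead.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_file(path):
    """Rewrite one file by copying it and swapping the copy into place.

    Hypothetical helper: it only confirms that the copy matches what was
    read, so it cannot detect data that had already degraded.
    """
    before = sha256_of(path)
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary copy on the same volume so os.replace is a rename.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        shutil.copy2(path, temp_path)  # copies data plus timestamps and mode
        if sha256_of(temp_path) != before:
            raise IOError("copy does not match original: " + path)
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return before
```

Calling `refresh_file` on each path from `os.walk` would cover a whole drive, though a fresh full backup does the same job with less risk.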

As far as SSDs go, the Wikipedia page is pretty good. One of the interesting things is that Intel’s Optane SSDs use a different technology (a resistive, rather than capacitive, change to store info). However, there doesn’t seem to be definitive longevity data on either.


Looking around for Mac repair utilities, I haven’t found any that specifically attempt to cope with existing data that may be fading on disk. For general Mac repair software, I see little mention of Apple Silicon Macs or APFS disk drives. Apple gets the blame for providing insufficient APFS documentation and for APFS still being a work in progress.

Backblaze drive shipments are great. My little anecdotal advice is to do a test request ahead of when you might really need one, if you have a lot of files. Drive provisioning time scaled by number of files more than by total size, and I had ~25x the average number of files on the drive. I made some changes to the backup config in the interest of recovery speed, needless to say…


Great discussion, really helpful episode.

You’ve inspired me to upgrade my external drives to SSD @MacSparky.

Do you (or anyone else!) have recommended resources for how to connect a drive (e.g. the Samsung T7) to a network so it can be accessed over wi-fi? I’ve had a look around and can’t find much, any pointers would be really appreciated.

Thanks :v:


I’m still not sure who I heard mention it (but still think it was Federighi) nor where it was mentioned, but I have tracked down the tech behind what I was referring to (and somewhat confirmed by @Shruggie).

The new technology is called File Provider Extensions and was introduced at WWDC 2021 (so no excuses for Dropbox or Microsoft).

Some key passages from the transcript:

What isn’t clear to me from the transcript is whether a third party, such as a backup tool, could similarly trigger eviction, or whether the File Provider Extension could itself determine that the file was only needed temporarily. But that aside, the system absolutely allows for the download of files when demanded simply by an attempt to read. That much, at least, favours backup solutions. It’s just a case of whether that will start filling your disk.

In fact, if a system like Dropbox explicitly allows marking files as “Offline”, this alone would be information enough to trigger the eviction at some point.


This makes me uncomfortable.
Files that look like files, but aren’t really files.
It seems there is a lot of potential here for backing up these shadow files, only to find that your local backup only has a shadow file and not the actual content.

So far, Arq is complaining about these files, such as in Box, which I then set to Make Available Offline. Box also places a cloud symbol next to any file that’s in the cloud, but not local. If these things are made more transparent (e.g. blindly backing up shadow files, or removing the cloud symbol), things could go sideways in a hurry.
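If you want to check for yourself which files are placeholders, here is a hedged Python sketch. It assumes that macOS marks online-only files with the `SF_DATALESS` flag in the file's stat flags and that the flag's value is `0x40000000` (taken from the macOS `sys/stat.h` header, not from anything in this thread). The helper names are invented, and I haven't verified that walking a folder this way never triggers a download.

```python
import os

# Assumption: value of SF_DATALESS from the macOS <sys/stat.h> header.
SF_DATALESS = 0x40000000


def is_dataless(st_flags):
    """Return True if the stat flags carry the (assumed) dataless bit."""
    return bool(st_flags & SF_DATALESS)


def find_dataless(root):
    """List files under root whose stat flags mark them as dataless.

    On systems without st_flags (e.g. Linux) this returns an empty list.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads metadata only; it does not open the file.
                flags = getattr(os.lstat(path), "st_flags", 0)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if is_dataless(flags):
                found.append(path)
    return sorted(found)
```

Running `find_dataless` on a Box or iCloud folder before a backup would at least tell you which files the backup tool is likely to complain about.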

I’m still not sure if my Documents folder is being backed up, as it’s stored in iCloud. Something I need to check. Everything important there I back up in other ways, such as to a Git repo.


This sounds like it could possibly use a ton of data. That wouldn’t be good for those of us with data caps.

This article by Ivan Drucker is an excellent resource, although you’ll find the process not as straightforward as you might hope. Network Time Machine Backups: Moving on from the Time Capsule - TidBITS

This episode came too soon after my recent experience trying to resize my own RAID5 array on an OWC ThunderBay with SoftRAID. The episode almost triggered a little PTSD. :sweat_smile:

For people interested in my recent adventures with SoftRAID, I wrote up a blog entry about it.

Sorry about that @James1. Grin.


Remember when the rumor mill said a “new file system for OS X was coming” and ZFS was often mentioned? It would have solved so many issues… Or something akin to Linux Software RAID would also be nice…


Yes, ZFS. It automatically takes care of things like bit rot.
But no. Thanks Oracle.

It’s still around, and used by e.g. TrueNAS (I use it on my server), OpenZFS does development, and it seems to be possible to use it on macOS.
I’m trying to use it as the root filesystem on Manjaro (which is quite nice) at the moment, but so far no luck.

I have a question regarding ransomware and backups. I thought I heard that ransomware is now sophisticated enough to infect backup drives connected to a computer. If so, would a rotation strategy be best where there’s always a backup drive in a drawer that has been disconnected for a week?

Something like that, for sure! I have two external drives that I use for daily backups but one spends two months unused “in a drawer” before I swap the drives and “rest” the other one for two months.

I also have four external drives that I use for weekly backups so each drive sits unused for three weeks at a time.

You can make up your own rules. (I also back up offsite using the Arq Premium service and my Apple Time Capsule continues to function as well.)
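If it helps, the rotation can be written down as a tiny Python sketch that tells you which drive should be plugged in on a given day. The drive labels, the start date, and the idea of a fixed period in days are my assumptions for illustration; "two months" would need an approximate value such as 60 days.

```python
from datetime import date


def drive_for(day, start, drives, period_days):
    """Return the drive that is in use on `day`.

    Drives take turns in list order, each for `period_days` days,
    counting from `start` (all parameters are hypothetical).
    """
    periods_elapsed = (day - start).days // period_days
    return drives[periods_elapsed % len(drives)]


# Four weekly drives: each one works for a week, then rests for three.
weekly = ["A", "B", "C", "D"]
start = date(2022, 1, 3)
print(drive_for(date(2022, 1, 10), start, weekly, 7))
```

With four drives and a seven-day period, each drive comes back into use every 28 days, which matches the three weeks in the drawer.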

Importantly this episode allowed me to understand an episode of “Silicon Valley” when Gilfoyle exclaimed, “We’re going to have to go RAID zero.” I had no idea what that meant… now I do. I love that they use actual tech concepts in this show.


The whole point of the new system is that you can’t back up the shadow files because the OS sits in front of them. Assuming that Box uses this system and that Apple’s bits are working as advertised, you cannot possibly back up “shadow files”. In fact, this is a prominent advantage over the old way of “trusting” the providers to do the right thing with online-only files. This is why, I am sure, Federighi mentioned it, wherever he did that.

Whether Apple can make this work is open for debate, but the design 100% drives at the fact that such files “are fully transparent to processes who happen upon them unprepared”.

The data cap question is a good one, but there are already tons of processes, first party and third party, that, on a Mac, just assume it’s open season on data. When Mac laptops get cellular modems, perhaps that will change.

I’m sure I remember David saying Backblaze will back up your Time Machine data, and that you could even back up a family member’s backup if their Time Machine is backed up to a disk connected to your Backblaze account.

Is this true? If so I’d become a Backblaze customer instantly.

Their website suggests not…

To avoid duplicating data, Backblaze will not back up any drive that contains Time Machine data on it. The drive will be listed in Backblaze’s “Select Hard Drives to Backup” as Time Machine and will not be selectable.

If you’d like to back up a Time Machine drive to Backblaze, you have two options:

1. Turn off Time Machine, delete the Time Machine data from the drive, and select the drive for backup in Backblaze.

2. Split the drive into two partitions, one for Time Machine, the other for data. The data drive can be backed up to Backblaze.


Let’s hope.
Currently, shadow files in Box, iCloud, etc. fail on attempted backup.

Another concern (you may have mentioned earlier): in order to back up the data that is shadowed to the cloud, it will all need to be downloaded, backed up, then deleted. For drives low on space, there will need to be some sort of round-robin process of downloading, backing up, then deleting, all within the constraints of the space available on the drive.
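To make the space constraint concrete, here is a minimal Python sketch of how such a process might group files into batches that each fit in the free space. This is purely illustrative and not how any backup tool or macOS is known to do it; `plan_batches` and its inputs are hypothetical.

```python
def plan_batches(files, free_bytes):
    """Group (name, size) pairs into batches that each fit in free_bytes.

    Each batch would be downloaded, backed up, and evicted before the
    next one starts. Files are taken in the order given (simple greedy
    grouping, not an optimal packing).
    """
    batches = []
    current, used = [], 0
    for name, size in files:
        if size > free_bytes:
            raise ValueError(name + " is larger than the available space")
        if used + size > free_bytes:
            # The current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

For example, files of sizes 4, 5, 3 and 6 with 10 units free would be handled as two batches of two files each. A single file bigger than the free space cannot be handled at all, which is the real limit of this approach.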
