654: Data Storage in 2022

This makes me uncomfortable.
Files that look like files, but aren’t really files.
There seems to be a lot of potential here for backing up these shadow files, only to find that your local backup holds a shadow file and not the actual content.

So far, Arq is complaining about these files, such as those in Box, which I then set to Make Available Offline. Box also places a cloud symbol next to any file that’s in the cloud but not local. If these things are made more transparent (e.g. Arq blindly backing up shadow files, or Box removing the cloud symbol), things could go sideways in a hurry.

I’m still not sure whether my Documents folder is being backed up, since it’s stored in iCloud. Something I need to check. Everything important there I back up in other ways, such as to a Git repo.

This sounds like it could possibly use a ton of data. That wouldn’t be good for those of us with data caps.

This article by Ivan Drucker is an excellent resource, although you’ll find the process not as straightforward as you might hope: Network Time Machine Backups: Moving on from the Time Capsule - TidBITS

This episode came too soon after my recent experience trying to resize my own RAID5 array on an OWC ThunderBay with SoftRAID. The episode almost triggered a little PTSD. :sweat_smile:

For people interested in my recent adventures with SoftRAID, I wrote up a blog entry about it.

Sorry about that @James1. Grin.

Remember when the rumor mill said a “new file system for OS X was coming” and ZFS was often mentioned? It would have solved so many issues… Or something akin to Linux software RAID would also be nice…

Yes, ZFS. It checksums everything, so it automatically detects (and, with redundancy, repairs) things like bit rot.
But no. Thanks, Oracle.

It’s still around and used by, e.g., TrueNAS (I use it on my server); OpenZFS carries on development, and it seems to be possible to use it on macOS.
I’m trying to use it as the root filesystem on Manjaro (which is quite nice) at the moment, but so far no luck.

I have a question regarding ransomware and backups. I thought I heard that ransomware is now sophisticated enough to infect backup drives connected to a computer. If so, would a rotation strategy be best where there’s always a backup drive in a drawer that has been disconnected for a week?

Something like that, for sure! I have two external drives that I use for daily backups but one spends two months unused “in a drawer” before I swap the drives and “rest” the other one for two months.

I also have four external drives that I use for weekly backups so each drive sits unused for three weeks at a time.
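A rotation like this is easy to lose track of, so here is a minimal sketch of the bookkeeping in Python, assuming a fixed swap period and drives used in a fixed order. The function name, drive names, start dates, and the 60-day stand-in for “two months” are all hypothetical; the code only says which drive should be connected on a given day and does not run any backup.

```python
from datetime import date


def active_drive(today: date, drives: list[str], period_days: int, start: date) -> str:
    """Return the drive that should be connected on `today`.

    Hypothetical helper: drives take turns in list order, each staying in
    service for `period_days` and then resting while the others are used.
    """
    if today < start:
        raise ValueError("today is before the rotation start date")
    # Count how many whole periods have elapsed since the rotation began,
    # then cycle through the drives in order.
    elapsed_periods = (today - start).days // period_days
    return drives[elapsed_periods % len(drives)]


# Example: a daily pair swapped roughly every two months (assumed 60 days),
# and a weekly set of four drives swapped every 7 days.
daily = ["Daily-A", "Daily-B"]
weekly = ["Weekly-1", "Weekly-2", "Weekly-3", "Weekly-4"]
print(active_drive(date(2022, 3, 2), daily, 60, date(2022, 1, 1)))    # Daily-B
print(active_drive(date(2022, 1, 24), weekly, 7, date(2022, 1, 3)))   # Weekly-4
```

With two drives and a 60-day period, each drive rests for about two months at a time; with four drives and a 7-day period, each one sits unused for three weeks, which matches the weekly setup described above.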

You can make up your own rules. (I also back up offsite using the Arq Premium service and my Apple Time Capsule continues to function as well.)

Importantly this episode allowed me to understand an episode of “Silicon Valley” when Gilfoyle exclaimed, “We’re going to have to go RAID zero.” I had no idea what that meant… now I do. I love that they use actual tech concepts in this show.

The whole point of the new system is that you can’t back up the shadow files because the OS sits in front of them. Assuming that Box uses this system and that Apple’s bits are working as advertised, then you cannot possibly back up “shadow files”. In fact, this is a prominent advantage over the old way of “trusting” the providers to do the right thing with online-only files. This is why, I am sure, Federighi mentioned it, wherever he did that.

Whether Apple can make this work is open for debate, but the design is squarely aimed at ensuring that such files “are fully transparent to processes who happen upon them unprepared”.

The data cap question is a good one, but there are already tons of processes, first party and third party, that, on a Mac, just assume it’s open season on data. When Mac laptops get cellular modems, perhaps that will change.

I’m sure I remember David saying Backblaze will back up your Time Machine data, and that you could even back up a family member’s backup if their Time Machine backs up to a disk connected to your Backblaze account.

Is this true? If so, I’d become a Backblaze customer instantly.

Their website suggests not…

To avoid duplicating data, Backblaze will not back up any drive that contains Time Machine data on it. The drive will be listed in Backblaze’s “Select Hard Drives to Backup” as Time Machine and will not be selectable.

If you’d like to back up a Time Machine drive to Backblaze, you have two options:

1. Turn off Time Machine, delete the Time Machine data from the drive, and select the drive for backup in Backblaze.

2. Split the drive into two partitions, one for Time Machine, the other for data. The data drive can be backed up to Backblaze.

Let’s hope.
Currently, shadow files in Box, iCloud, etc. fail on attempted backup.

Another concern (you may have mentioned it earlier): in order to back up the data that is shadowed to the cloud, it will all need to be downloaded, backed up, then deleted locally. For drives low on space, there will need to be some sort of round-robin process of downloading, backing up, then deleting, all within the constraints of the space available on the drive.
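One way to picture that round-robin is as a planning step: split the cloud-only files into batches whose combined size fits in the free space, then download, back up, and evict one batch at a time. This is a minimal sketch assuming file sizes are known in advance; it only does the planning, and the function name, file names, and sizes are hypothetical. It does not call any real cloud or backup API, and a real tool would also need to leave headroom and cope with files added mid-run.

```python
def plan_batches(file_sizes: dict[str, int], free_bytes: int) -> list[list[str]]:
    """Group cloud-only files into batches that each fit in `free_bytes`.

    Hypothetical planner: each batch would be downloaded, backed up, and
    evicted before the next one starts. Files are taken in the order given.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, size in file_sizes.items():
        if size > free_bytes:
            # A single file larger than the free space can never be staged.
            raise ValueError(f"{name} ({size} bytes) cannot fit in {free_bytes} bytes")
        if used + size > free_bytes:
            # Close the current batch and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


# Example with made-up sizes (in GB for readability) and 100 GB free.
print(plan_batches({"photos": 40, "video": 50, "music": 30, "archive": 70}, 100))
# [['photos', 'video'], ['music', 'archive']]
```

Keeping files in their original order makes progress easy to resume; packing them more tightly (for example, largest first) would need fewer rounds at the cost of a less predictable order.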

Indeed. Technically solvable, I think, but whether anyone will deliver it is another question.

Thinking about Stephen’s anecdote about the client whose RAID drive failed, I found three critical mistakes:

  1. Not paying attention to the messages from Backblaze.
  2. Not checking status of backups on a regular basis.
  3. Not testing the backups.

Number 3 is the most critical, in my opinion. If a backup isn’t tested, it doesn’t really exist. While I haven’t tried Backblaze’s hard drive recovery yet, I do go online and download a small set of files to verify that I can restore them.
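Part of a restore test can be automated. This minimal sketch assumes you have restored a folder to a scratch location and that the originals have not changed since the backup ran; it compares SHA-256 hashes file by file and lists anything missing or different. The function names and directory layout are hypothetical, and it does not talk to Backblaze or any other service.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, restored_dir."""
    problems: list[str] = []
    for src in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        # A file counts as bad if it was not restored or its contents changed.
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems


# Hypothetical usage:
# print(verify_restore(Path("~/Documents/Taxes").expanduser(), Path("/tmp/restore-test/Taxes")))
```

An empty list means every original file came back byte-for-byte; anything listed is worth investigating before you need that backup for real.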

Also, I agree with the episode that few users need RAID. It’s mainly for fault tolerance, where you can’t afford the downtime if a drive fails. I have a number of drives in JBOD enclosures.

That is very true in the data world. If you haven’t tested your backup, you don’t have a backup!

This.

A tale I recounted on another thread about backups:

As others have noted, an untested backup is no backup at all.

I’m making backups to two external drives (one stored at home, the other at the office) using Time Machine. I’m also using Backblaze to back up. Additionally, I store all files in iCloud (I know it’s not a backup).

Should I also be using Carbon Copy Cloner, and if so, why?

And it’s a mess. When OneDrive migrated to the new File Provider extension, it broke a lot of workflows, backups, etc. Users were pretty upset with the change.

This was enough for me to move all my files far away from OneDrive and any other FPE solution (I tried Box and iCloud).

I now use a local server with a Nextcloud setup. It’s a little laborious to set up, but worth it in my experience.
All the files sync back and forth to the Mac and iPhone with no problems. Real files, not placeholders.
The Mac backups are guaranteed to always catch all files. I back up the server to an external drive too.

You seem to be in good shape, but a couple of things give me pause:

Time Machine always failed for me at some point (after 6 to 9 months or so) and required starting backups over. This didn’t result in my losing data, but I lost the history I was trying to build and keep. (To head off a bunch of replies: other people have used it for years without any trouble.)

If I were in your shoes, I would keep a couple of “shelf drives” that you back up to in alternation every, say, six months or a year. This adds protection against things like ransomware attacks. Let’s say you’re connected to your backup drive at home. Unbeknownst to you, some malware begins encrypting your drive while Time Machine is doing its hourly backups. You move to your office, connect the drive there, and continue working. Time Machine begins backing up to the office drive. The malware continues encrypting, and you potentially have corrupt or encrypted data backed up in both places. In the event of a high ransom, bad actors not decrypting, etc., a shelf drive would at least let you restore to a fairly recent point. Backblaze might provide some coverage here if you pay for their extended retention service. I don’t have the details of their service at hand.