Backblaze just threw an error today. It says my bzfileids.dat file is too large, and therefore Backblaze has completely stopped working.
Their website says I need to install the most current version over the old version, and do a few steps to rebuild it. So I did. Still no success.
Reached out to support. Their solution is for me to completely abandon the current backup and re-upload all of my data. Which I could do, but with over 10 TB of data that’s going to take quite a while. And, of course, since the issue seems to be the number of files in the backup, I don’t necessarily have confidence that the new backup will be any more reliable.
Looks like I have some research to do regarding how many files exactly I have, and some thinking about whether or not Backblaze is the best option moving forward.
I just can’t believe that commercially-available backup software doesn’t have a way to recover from this sort of thing without abandoning the current backup and re-uploading all of the data. Apparently the current file is 20 GB. But on my Mac Mini, that actually fits in RAM if I shut down everything else - and I would think there’d be a way to say “prune version history” or something…even if it would take a long time to do.
That sucks @webwalrus. I had a similar issue not long ago and the instructions I found on Backblaze’s website resolved it very quickly.
I had the same fears as you, though: with a backup that was approximately 16 TB at the time (now down to just over 10 TB), I suspected it was a way to discourage heavier users.
The majority of my files are media, so larger files rather than lots of tiny ones.
Are there file-heavy areas you could exclude from the BB backup and then (for example) run a nightly zip of those files, copied to an area that is backed up? To the backup it would then look like one file.
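Something along these lines would do it. This is just a sketch, with placeholder paths ("Mail Archives", "/Volumes/External/Backed Up"), that you’d schedule nightly with cron or launchd:

```python
#!/usr/bin/env python3
"""Nightly job: roll a file-heavy folder into one zip that Backblaze sees as a single file.
The paths are placeholders -- point them at your own folders."""
import shutil
from datetime import date
from pathlib import Path

SOURCE = Path.home() / "Mail Archives"        # file-heavy folder excluded from Backblaze
DEST = Path("/Volumes/External/Backed Up")    # folder Backblaze *does* back up

DEST.mkdir(parents=True, exist_ok=True)
archive_base = DEST / f"mail-archive-{date.today():%Y-%m-%d}"
# make_archive appends ".zip" and walks the whole source tree for us
shutil.make_archive(str(archive_base), "zip", root_dir=SOURCE)

# keep only the 7 most recent archives so the backup doesn't balloon
for old in sorted(DEST.glob("mail-archive-*.zip"))[:-7]:
    old.unlink()
```

Keeping a handful of dated archives gives you a little version history without the file-count explosion.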
I’m looking into that. For a while this past year I was figuring out how to handle email archives, so there was a stretch where I had multiple folders with 100,000+ files each. They were the same files, just in different configurations, so I’m not sure whether they took up archive space by themselves or not.
I think my strategy is going to be to have an “@ Not Backed Up” folder on my external storage where I can put stuff that I’m working with, and to aggressively prune stuff from ~/Library. I was backing up that whole folder, because there’s important stuff in there (email, etc.). But upon further review, it looks like there are subfolders like /Metadata and /Caches in there that could probably be safely excluded.
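To figure out what’s actually worth excluding, I’m just counting files per top-level ~/Library subfolder with a quick throwaway script. Rough sketch, read-only, and nothing gets excluded automatically:

```python
#!/usr/bin/env python3
"""Count files under each top-level ~/Library subfolder to see what's worth excluding."""
import os
from pathlib import Path

library = Path.home() / "Library"
counts = {}

for sub in library.iterdir():
    if not sub.is_dir():
        continue
    total = 0
    # onerror swallows permission errors on protected folders
    for _root, _dirs, files in os.walk(sub, onerror=lambda e: None):
        total += len(files)
    counts[sub.name] = total

# show the 15 heaviest subfolders
for name, total in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:15]:
    print(f"{total:>10,}  {name}")
```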
Backblaze is still the best value for money, and I’ve never had issues restoring - so I’m tempted to keep it and re-work my archive. It’s just kind of ridiculous that “redo your backup from scratch” is a proposed solution to what, under the hood, is ultimately a file indexing issue.
I don’t have a fix for Backblaze, but I do have a suggestion.
For greater flexibility, I switched over to using Backblaze B2 and Arq for my backups. Arq is, IMO, much better than the Backblaze client and gives you more control over your backups. It also lets you back up to multiple providers, so you can diversify your backup targets and hopefully avoid this type of issue in the future.
You’ll pay by data storage with B2, but the costs are low. I have 400GB in mine right now and the cost is about $1.50/month.
I have a NAS at home that I back up to B2, as well as my Mac through Arq. You can back up as many devices as you want this way; you’ll just pay for what you consume.
Have you tried to restore a few files since you encountered this problem? That would be the first thing I wanted to know because it could put a clock on finding a solution.
The entire restore process seems to have succeeded. I picked a folder that would have a significant amount of data (about 10 GB) split across a bunch of files. Downloaded, unpacked, and verified a few random files - things look good.
Just got a reply from support regarding whether there was any other way to address the problem. Completely abandoning the previous backup and re-uploading all the data isn’t just the preferred way to fix this problem - it’s the only way.
If Backblaze B2 is $6/TB/month, then 500 GB would be $3/month, so wouldn’t 400 GB be about $2.34 per month? How do you get it for $1.50 per month? Are you grandfathered in at a lower rate?
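Here’s how I’m figuring it, assuming the current $6/TB/month list price (the second line is the binary-units version, which is where my $2.34 comes from):

```python
# Back-of-envelope B2 storage cost, assuming the current $6/TB/month list price.
RATE_PER_TB = 6.00
STORED_GB = 400

print(f"decimal (GB/TB):  ${STORED_GB / 1000 * RATE_PER_TB:.2f}/month")  # $2.40
print(f"binary (GiB/TiB): ${STORED_GB / 1024 * RATE_PER_TB:.2f}/month")  # $2.34
```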
I use B2 for very specific purposes. Would I consider backing up 10s of TB from my NAS to it? No way. It’d be expensive! So… consider how much the regular consumer Backblaze product is priced and you’ll probably see why they don’t really want to help you succeed.
The thing that occurs to me though is that the issue seems to be the number of files and revisions – not the volume of data.
Looking at the file list, it says that I have about 4 million files tagged for backup. The vast, vast majority of that is very small code files, website folder trees that I’ve downloaded, copies of nightly data imports, etc.
So while the volume of data may be why they don’t want to help me, this feels like the sort of thing that a user with far less data volume could easily run into.
For example, Siracusa was talking on ATP about a folder in iCloud that his kid had, containing something like hundreds of thousands of files’ worth of Node.js dependencies. And when I completely deleted Backblaze to rebuild my backups, it scanned my home directory and found over 700,000 files just in my user Library cache folder.
Needless to say, I am aggressively analyzing everything that goes into this backup and trying to make sure that I avoid folders with huge numbers of files wherever possible.
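In case it’s useful to anyone else doing the same audit, this is roughly the kind of scan I’m running: it walks a folder bottom-up and flags any tree holding more than some threshold of files. The threshold and starting path here are arbitrary placeholders.

```python
#!/usr/bin/env python3
"""Walk a folder bottom-up and flag any folder tree holding more than THRESHOLD files."""
import os
import sys
from pathlib import Path

THRESHOLD = 50_000   # arbitrary cutoff -- tune to taste
start = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.home()

tree_counts = {}  # dir path -> total files in that dir and everything below it

# topdown=False visits children before parents, so subtotals exist when we need them
for root, dirs, files in os.walk(start, topdown=False, onerror=lambda e: None):
    subtotal = sum(tree_counts.get(os.path.join(root, d), 0) for d in dirs)
    tree_counts[root] = len(files) + subtotal

flagged = [(count, path) for path, count in tree_counts.items() if count > THRESHOLD]
for count, path in sorted(flagged, reverse=True)[:25]:
    print(f"{count:>10,}  {path}")
```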
AFAIK no file system is happy with 100,000 files per folder. HFS+ could handle around 2 billion files per volume and we had to stay around 10,000 to get “acceptable” performance over a network. But apparently you are doing OK.
How often do you access most of these files? B2 is currently $6/TB per month, so $60 per month for 10 TB would be too expensive for me. But 10 TB in Amazon S3 Glacier Deep Archive would only be about $10/month, if I rarely needed most of that data.
Could you possibly find a solution with Amazon, or perhaps Wasabi, etc., using multiple storage tiers for your data?
Sorry, that was confusing: not per folder, per folder tree. For instance, I might download a site to migrate for a client, or be developing a new WordPress site on a local web server. Relatively pedestrian WordPress installations can easily contain 30,000 or so files by themselves, although obviously not all in a single folder.
I could probably hand-roll a solution with Glacier or Wasabi (which I use for backups on a production server, actually). I do wind up doing restores a few times per year though for various things - so Glacier’s recovery costs would probably get a bit crazy.
My strategy at this point is that I’m going to give Backblaze another try, being a bit more intentional about what goes over there. And I’ll look into other redundancy options for the most critical data going forward.
I agree with excluding those vendor/dependency folders. You’d find they make restoration too slow if you ever need to recover; we learned that doing drive-shipment restore tests from Backblaze with ~3 million files. Reinstalling from package.json/composer/rubygems/cargo etc. should be good enough, maybe keeping single copies of the core software versions you could re-apply during a restoration.
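If it helps, a quick scan for the usual dependency folder names makes it easy to keep them out of the backup set. A rough sketch; the folder list is just the common offenders (not exhaustive), and the "Projects" path is a placeholder:

```python
#!/usr/bin/env python3
"""Find common dependency/vendor folders under a projects directory so they can be excluded."""
import os
import sys
from pathlib import Path

# The usual offenders -- extend for whatever your toolchains generate.
DEPENDENCY_DIRS = {"node_modules", "vendor", "bower_components", ".bundle", "target", ".venv"}

start = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.home() / "Projects"

for root, dirs, _files in os.walk(start):
    for hit in [d for d in dirs if d in DEPENDENCY_DIRS]:
        print(Path(root) / hit)
    # don't bother descending into the folders we just flagged
    dirs[:] = [d for d in dirs if d not in DEPENDENCY_DIRS]
```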