Technical question somewhat theoretical

I have a technical question that is not directly related to any issue I’m having but could inform minor tweaks to my workflow.

Over time are files, especially but not exclusively plain text files, in a database more likely to be corrupted than those residing only on the hard drive?

Adding layers of complexity always adds more ways to go wrong.
But with regular backups and proper maintenance, it shouldn’t be an issue.

No.

No, although it depends on the database software’s reliability vs the OS’s own reliability. As @aardy says, it’s another layer (albeit potentially a very reliable one.)

Backing up may also be an issue, although it depends on the software. For example, if the backup software sees the whole database as a single file, it can become difficult to restore an individual PDF or MD file from within that database.

Lots of ifs and maybes!

Data in a database is still bits on the disk, no different than “files”. A database system merely adds an additional level of complexity between you and your data. More complexity equates to more points of failure, thus more risk. But, on the other hand, bit rot is bit rot – the “database” doesn’t stop the eventual deterioration of your disk.

BTW, if the “database” you’re thinking of is DEVONthink, be aware that that software doesn’t change your files, it just stores them away in folders inside of another specialized folder. The “database” part of DEVONthink is a concordance of the words in your files and some specialized pointers (external to your files) to help find files and file content.

Short story, IMO, the “database” adds a few percentage points of risk, which are easily erased by the benefits of data management the database provides.

Nothing worth losing sleep over as long as you keep good backups, on and off site.

Katie

Unless you’re using it to sync.

Guess what! A file system is a database, and can suffer from the same corruption risks that a “database” has. Only since the database is built on top of a file system (database), it is more problematic. Luckily systems have become more reliable over the years and if you systematically make backups you are in good shape.

True. Didn’t want to resurrect sad thoughts.

Katie

For those who brought the issue up, yes, I am referring to DEVONthink, and I do sync with my iPad and Mac. However, my data is imported into DEVONthink, not indexed.

With that more specific information, does that change anyone’s perspective at all?

When you import your files they, along with the DEVONthink database file that keeps track of your files, all get rolled into a single package file. A package is a collection of files that your Mac reads as one file. Right-click and select Show Package Contents. But you’re not supposed to poke around inside.

So if one file changes, is incremental backup software smart enough to update just that one file? If bit rot hits does it ruin the entire DEVONthink database file or just a single corrupted document?

That is why I much prefer apps like Obsidian, EagleFiler, and NotePlan, among others, that specifically work with a normal folder full of normal files and subfolders – which is kind of what you have when you index rather than import with DEVONthink.

A package is a file system directory that appears to the end user as a single file – but it is still a directory of files. For instance, all .app “files” are actually packages. The Mac doesn’t “read” the package as a single file, it is managed by the file system as a directory.

Katie

Packages are just folders. There’s nothing mystical about how they are stored.
You can rename, say, Typora.app to Typora and it appears as a folder. Rename it back, and it’s an app again.
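
To make the "packages are just folders" point concrete, here is a minimal Python sketch. It builds a stand-in directory called `Demo.app` (a hypothetical name, not a real application) and shows that ordinary directory calls can walk straight into it. The helper names `describe_package` and `make_demo_package` are made up for this illustration.

```python
import os
import tempfile


def describe_package(path):
    """Report whether `path` is a directory and list the files inside it."""
    is_dir = os.path.isdir(path)
    contents = []
    if is_dir:
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                contents.append(os.path.relpath(full, path))
    return {"is_directory": is_dir, "files": sorted(contents)}


def make_demo_package(parent):
    """Create a stand-in package (hypothetical layout) so the sketch runs anywhere."""
    pkg = os.path.join(parent, "Demo.app")
    os.makedirs(os.path.join(pkg, "Contents", "Resources"))
    with open(os.path.join(pkg, "Contents", "Info.plist"), "w") as f:
        f.write("<plist/>")
    with open(os.path.join(pkg, "Contents", "Resources", "notes.md"), "w") as f:
        f.write("# notes\n")
    return pkg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(describe_package(make_demo_package(tmp)))
```

On a Mac you could point `describe_package` at a real `.app` path instead. The only thing that makes Finder show it as one item is how the directory is flagged or named, not how it is stored.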

From the developer documentation:

  • A package is any directory that the Finder presents to the user as if it were a single file.
  • A bundle is a directory with a standardized hierarchical structure that holds executable code and the resources used by that code.

Back to the question at hand:

As you’re probably aware, @OogieM lost a lot of information, and was given the brushoff by DEVONtechnologies. I think it was sync related, but I don’t think any conclusions were reached. They added a check so you can see if they corrupted your files, which I think is telling. Needless to say, this is very much closing the barn door after the horse is out, and is minimally helpful. Then again, they are still selling DEVONthink, and people are using it.

So, caveat emptor.

No. They are just files. They are subject to the edge cases that people detailed above, such as bit rot. Bit rot is where data spontaneously change as the magnetization of the particles on the disk fades, or the electrical charge in an SSD fades. There are also mechanisms built into disk drives (hard disk or SSD) to correct some of these errors without our even knowing about them. As long as your backups are good, there is little to worry about from these rare events.
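
If you want to notice silent changes rather than rely on the drive's own error correction, one common approach is to record a checksum for every file and compare later. The following is a rough sketch under that assumption, not a description of what DEVONthink or any backup tool does internally; the function names are invented for the example.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def find_mismatches(root, manifest):
    """Return paths that are missing or whose hash no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

A mismatch only says the bytes changed. It cannot tell bit rot apart from a deliberate edit, so this is most useful for files you do not expect to change.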

I’m aware of that. But the question I posed was how does your backup software see it.

Did you answer my question: “Is incremental backup software smart enough to update just one of the files in a package if that is all that changed?”

The way I would answer your question is that if a byte or two of a plain text file somehow got corrupted, you would still be able to read the rest of the file and make sense of it, because it is just ASCII. Even if some word becomes unintelligible, you can probably figure out what the sentence means.

When a database gets corrupted, it could make the entire record or table unreadable since you have no way of understanding the format of the closed source system they are using. You are then at the mercy of the company to try and recover your data.
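
A small Python sketch of that difference. It damages one byte of a plain ASCII note and one byte of the same note stored as a zlib-compressed stream. The compressed stream is only a stand-in for "a format with internal structure". It is an assumption made for illustration and says nothing about how DEVONthink or any particular database stores data.

```python
import zlib


def flip_byte(data: bytes, offset: int) -> bytes:
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


def read_plain_text(data: bytes) -> str:
    """Decode ASCII, substituting a marker for any byte that is not valid."""
    return data.decode("ascii", errors="replace")


def read_compressed(data: bytes):
    """Return the decompressed text, or None if the stream cannot be read."""
    try:
        return zlib.decompress(data).decode("ascii")
    except (zlib.error, UnicodeDecodeError):
        return None


note = "Back up your notes regularly, on site and off site."
plain = note.encode("ascii")
packed = zlib.compress(plain)

# One damaged byte in each representation of the same note.
damaged_plain = read_plain_text(flip_byte(plain, 8))
damaged_packed = read_compressed(flip_byte(packed, len(packed) // 2))

if __name__ == "__main__":
    print(damaged_plain)   # one character replaced, the rest still legible
    print(damaged_packed)  # the whole note is unavailable
```

The plain text loses one character. The compressed copy fails its integrity check, so nothing comes back without a repair tool or a backup.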

And yet, for a number of years, Box couldn’t or wouldn’t sync OS X packages – it didn’t see them as folders of files, but as files it couldn’t sync. So the distinction matters. (I assume Box has fixed this by now, but I haven’t checked in a long time.)

You said, “A package is a collection of files that your Mac reads as one file.”
Which isn’t the case, as detailed in the developer docs I linked.

The backup software I use sees them as they are, folders, some with package bits set, just as other macOS files and folders have extended attributes.
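
As a rough illustration of why that matters for incremental backups: a tool that treats the package as a folder can walk it and copy only the files that changed. This Python sketch compares size and modification time, which is one common heuristic. It is an assumption for the example, not how any specific backup product is known to work, and it does not handle deleted files.

```python
import os
import shutil


def _needs_copy(src, dst):
    """A file needs copying if the backup lacks it or size/mtime differ."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def incremental_copy(source, backup):
    """Copy only new or changed files; return their paths relative to source."""
    copied = []
    for dirpath, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(backup, rel)
            if _needs_copy(src, dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves the modification time
                copied.append(rel)
    return sorted(copied)
```

Run it twice against an unchanged package and the second pass copies nothing. Edit one document inside the package and only that document is copied. A tool that sees the package as one opaque file would have to copy all of it.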

Regarding DT and imports, the answer is “no” - the data isn’t inherently any more likely to be corrupted other than the sync thing others have mentioned.

But it will be disorganized at the filesystem level to the point where you’ll likely have a hard time making sense of it without DT.

Imagine you had a beautiful collection of file cabinets (your disk), with everything filed “just so”. You then hire an assistant (DEVONthink) to manage all of your filing and retrieval. Down the road, you fire that assistant (delete the app) and go take a look at your files.

You’ll discover that your assistant has (to your view) not maintained your system, but rather created their own undocumented system, intelligible only to them. Everything is in a dozen separate filing cabinets, filed by some largely-inscrutable criteria. Your assistant could retrieve files flawlessly, but without them you’re out of luck.

That’s DEVONthink.

There’s nothing wrong with that, as long as you have appropriate backups and can put your database back, or as long as you export the files out of DEVONthink before you get rid of the app. But if you’re assuming it’s going to be like Obsidian (“delete the app and you’re left with a nicely-organized folder of files”), you’re in for a surprise. 🙂

Just be aware of what you’re getting into, and you’re fine.

DEVONthink sync files are NOT backups and are not usable by anything but the DEVONthink app. Rinse and repeat. DEVONthink sync files are not backups and are not useful to users.

DEVONthink has menu commands to produce archive backups in ZIP file format, and Time Machine (or similar) should be used as well.

This is a good analogy.
To expand it, the assistant also mails copies of documents to your other office, and receives mailed documents from the other office.
Sometime later, you discover some of the envelopes are empty…
