Best database program?

David_Roper · August 23, 2022, 3:35pm

Views and triggers are part of SQLite, so should be available on Android as well as other platforms. I don’t know much about Android, but from a quick Google it looks like Room is a database interface layer and I’d be very surprised it there aren’t others.

I’d seriously suggest taking a look at the SQLite documentation for views and triggers as, despite views being almost, but not quite the same as tables and hence somewhat quirky, they can be very useful. What the documentation doesn’t describe is how best to use views, which is really more to do with data modelling than programming or databases per se.

For what it’s worth, my general approach is start by looking for “real world things” – physical or conceptual entities – and work out what tables are needed to store them. If you look through introductory texts you’ll often see examples involving students and professors and courses, or departments and employees. Generally these, at least in my view although obviously not the author’s, are at best misleading or at worst utterly wrong. Please take them as lessons in SQL programming, not database design.

A student isn’t a real world thing, he or she is a person. The “student” part of it arises because a person can be registered at an academic institution, i.e. it’s the relationship person and the institution that establishes that the person is a student. In fact it’s the relationship that carries them some typical “student attributes” (like a student number). It’s also the presence of the relationship that allows the student to enrol on a course. Similarly the professor is a person who is engaged by the academic institution. Being a professor, or a reader or a lecturer or a tutor is an attribute not of the person, but of that relationship.

Students and professors are the same, basic, “real world” entity; a person who happen to participate in one of two different relationships, perhaps even both at the same time. Customers, clients, vets, farmers, cut them and they bleed, they’re all the same; what differentiated them is not what they are, but how they relate to the world around them.

Even then, you shouldn’t throw everything about a person into a Person table. Students and Professors both have a date-of-birth, but they don’t have telephone numbers; telephones have numbers, not people. Neither do they have addresses, houses do. And they change with time.

Relational databases don’t do time variant data as a fundamental structure (you can’t do something like CREATE TABLE Animal ( ... weight *TIME VARIANT* NUMBER ... ); and have the database do the hard work). But Animals grow, so if you want to record their weight it needs to be in a separate Weight table. In data modelling parlance Weight is a weak entity; the rows of the table have no identifier of their own, just that of the animal to which they relate. If you delete the animal, you want all its weight records to be deleted too (see referential constraints).

Similarly – as you clearly already know – reference data like sex should be in a separate table (by the way, if it’s not already, you might want to make that relationship many to one, I presume your wethers weren’t born that way!).

Anyway, the point is that even what looks like a simple concept – student, professor, person, animal – will inevitably normalise into a collection of tables; on top of which the application really does want to know about students and professors, sheep and goats and pigs.

This is where views come in.

Personally I build layers of views that culminate in the application level object that I want. For example, I might well build a Person view that pulls together a useful definition of a person that includes their *current *contact details, their current weight (or whatever) as well as their date-of birth. I would then use that view to create further views for Student and Professor by joining against the relationships Persons have. These views would bring in relevant information from the relationships (associations) that define the person as being a student or a professor.

Views are also very useful for isolating sub-sets, for example the subset of Animals that are Sheep and the subsets of Sheep that are Lambs or Ewes or Wethers or Tups. SQLite was designed to be an alternative to building complex runtime data models and serialising/ deserialising them to files. Because it runs in-process SQLite is very fast (certainly compared to a conventional database, no network latency or marshalling I/O ), so using it to answer even very simple questions can make sense. Views can make applications simpler even for apparently straightforward questions like “What percentage of my flock are ewes that are more than 5 years old?”.

I don’t wish to give the impression that database views are a panacea for all ills, they come with problems that shouldn’t be understated. It’s easy to use views to pull data together and retrieve them from the database. It can be a real problem reversing that process, inserting into or updating the database. The underlying problem is that queries (and that’s all views are) don’t have to include the primary key(s) of the table rows from which they pull the data in their result set. That means that if you then want to update a row or insert a new one through the view, the database doesn’t have a primary key by which to identify it. Having a primary key is mandatory. In consequence the basic position of SQL databases is, I can’t insert or update views in the general case, so I won’t let you do it at at all.

Now there are circumstances where it could be done, typically if the view includes the PK or (for inserts) it is autogenerated. INSTEAD OF INSERT and INSTEAD OF UPDATE triggers on views pass the buck to the programmer, but they are a starting point. The principal problem is, what do I use as a primary key? Using a natural key, i.e. some combination of mandatory attributes that is guaranteed to be unique, sounds like it should be easy, but in practice it often isn’t. Think, for example, of an animal’s ear tag number. Definitely unique, but to use it as a natural key is to DEMAND it’s available before you can enter any data about that animal. Might be Ok, might not. On the other hand using a surrogate key, e.g. a row count or a uuid, imposes one of two problems: if the database allocates the key, you’ve somehow got to pass the keys of newly created rows back to the application; conversely, if the application allocates keys, you have to make sure that they are consistently created by all applications that may ever use the database and can’t clash.

By now you’ll be screaming enough, enough, so I’ll stop.

webwalrus · August 23, 2022, 4:49pm

So are you effectively saying you’d store phone numbers in a separate table, and one-to-many link the students with the phones, because “phone” is a real-world entity?

I don’t think it’s inherently useful to think of “phone” as a separate entity in this case, because unless you have some need to store additional properties of the phone itself, or require historical tracking of phone numbers over time, it adds a layer of indirection where one might not be necessary.

If I need one phone number for a given person, or if I’m only allowing one home phone and one work phone, it’s far more direct to just store the info in the same row. The utility of knowing somebody’s previous phone number in many cases is minimal, so it can be omitted.

But if I need to store n phone numbers for a student, where n can be any number from 0 to (theoretically) infinity, that’s where I’d break “phone” out into a separate table - with a one-to-many link from students to phones.

And obviously sheep are a different thing than phones, but just noting that the business logic for data can sometimes influence how much abstraction is necessary.

David_Roper · August 23, 2022, 8:28pm

@webwalrus I absolutely agree, what you’re planning to do with the information can strongly influence the degree of abstraction. I should perhaps have said that I look for the principal and subsidiary entities with the domain of discourse.

My view, however, is that I can’t know today what might what direction any given application, let alone datastore, will be asked to do in the future. @OogieM’s LambTracker is a case in point; I bet when that what she thought it would do when she started out ten years ago bears little resemblance to what it’s now being asked to do. Storage is cheap and fast, memory plentiful and processors unbelievably so; this makes abstraction relatively cheap, the more so for very efficient, single application datastores like SQLite. So yes, I would most likely start by using a highly normalised model and only optimise it (by denormalisation) if it turned out to be too slow in practice. I forget who said that premature optimisation was the root of all evil, Donald Knuth? Anyway, somebody way more skilled than I am!

Altering a data schema after it’s been released and lots of people have lots of data stored is a process fraught with risk. Trying to restructure a denormalised table into a more normalised form is not something to be undertaken lightly, especially when compared with throwing in a handful of “spare” columns “just in case” or adding altering a table definition in place (we can have three phone numbers now, is that enough?). This probably goes a long way to explains why technical debt mounts up in long lived applications.

Would I take the same view for a centralised, corporate datastore with hundreds to thousands of simultaneous users? Probably not. I’d design the logical data model at a high level of abstraction, then denormalise it; but then I’d also do things like using materialised views, stored procedures, data types, domain checks and anything else the RDBMS had to help maintain quality and integrity at speed and scale.

On a more Apple topic, I’ve often wondered why macOS built in app don’t have a unified datastore; it would make integrating them and linking data (eg. open a contact, see a list of recent emails; link a reminder to a note; connect a reminder to a file) so much easier. If it was properly structured third party apps could extend it, which would also be nice.

tf2 · August 23, 2022, 11:48pm

But of course, a person is just a mammal – and a mammal is a subset of animals, which are members of the kingdom of Animalia. At least, on earth; we may soon know of other kingdoms of life on other planets, so probably best to start with the planet…

And then it’s turtles all the way down!

More seriously, I used to worry a lot about normalizing everything. Then I talked with some people who used databases much more extensively. They argued that at a certain point, for most projects – especially relatively small-scale projects that aren’t likely to be expanded dramatically over time – the mental and design overhead just isn’t worth it.

Where that point is – well, that’s where expertise and experience come in!

OogieM · August 24, 2022, 3:38am

Yep, I just spent several hours to try to find the details but that is my memory too.

FWIW If anyone wants to continue this conversation (which I would LOVE to do) I will give access to the details of my current (still in flux) database design docs. I have no one I can talk to this stuff about in a civil debate where we hash out the pros and cons of any particular structure and would really benefit from more conversations about this.

Right now the number of flocks using the old Lambtracker database schema and code can be counted on one hand. I already have more AnimalTrakker users in 9 months than I had of LambTracker users in 10 years.

Clearly AnimalTrakker is solving a need that LambTracker did not.

I need to make it better.

cornchip · August 24, 2022, 4:12am

I think David_Roper might be one of those experts.

ThatNerd · August 24, 2022, 4:51am

Somebody’s about to step into the thread with a “I helped build X DBMS”, I just know it

Lars · August 24, 2022, 5:00am

Which should be followed by an immediate confession “and made many mistakes in the data scheme, which we are still ironing out after years”.

Zathras · September 15, 2022, 5:00pm

It appears that Apple is going to give us a freemium version of Claris Pro (formerly FileMaker Pro) that is good for one user. This is exactly what I want, since I spent well over a decade at work with Filemaker, AppleScript, Photoshop, etc. automating my photo workflows. When I retired, I just couldn’t justify keeping FileMaker Pro up-to-date. Can’t wait for this new version to arrive this Fall.

tomtom · September 16, 2022, 6:05am

At the other end of the cost spectrum, folks on this thread might be interested in Collections Database, which is free to download. The free tier is generous, then it is a $4.99 one off in app purchase to unlock certain advanced features.

I found it when I was struggling with AirTable’s clunky user interface. It is a universal app across iPhone, iPad and Mac. The UI is the same on all platforms so things like the settings menu look much more at home on the iPad, but even on the Mac it is a vast improvement over AirTable.

While the focus is obviously at a consumer level, the features list is impressive (including formulas, cross linking etc). There’s a full list in this review. Syncing is via iCloud and I have had no issues.

It is a new app by a single developer, and I have found him to be incredibly responsive to requests for support or features.

dfay · September 16, 2022, 7:40am

Has this gone anywhere? I’d love to join the group as I’m in the early stages of a database / CoreData based project that could use another set of eyes.

OogieM · September 16, 2022, 12:06pm

Yes, lots and lots of useful converatiosn abut UUIDs instead of primary keys, moving to an Entity, Attribute, Relationship model and more. I sent a PM to you.

OogieM · September 16, 2022, 12:15pm

Yes, lots and lots of useful converations abut UUIDs instead of primary keys, moving to an Entity, Attribute, Relationship model and more. I sent a PM to you.

Kraftwerk · October 2, 2022, 4:34pm

a welcome addition, thanks

01001001 · October 2, 2022, 5:40pm

May someone please educate me: what are the use cases for database software? I understand how things like MySQL and SQLite are valuable as additions to certain software, say Wordpress for example. But how could they be useful to non-software developer types? Growing up I fiddled with Microsoft Office on the family desktop, but understood Access to just be “the purple one”.

rms · October 2, 2022, 6:04pm

Frankly, not useful unless you want to step out and learn.

Software dev skills are not necessarily important but understanding data structures is.

Microsoft Access is good with plenty of learning resources, books, courses, etc. It can be used once experienced gained to work with MySQL or other SQL server databases, leave that to later.

karlnyhus · October 2, 2022, 6:10pm

I got the impression in other posts that you use spreadsheets. So you are halfway there in understanding that a database is software that you use when your data is too complicated for a spreadsheet to handle.

tf2 · October 2, 2022, 6:30pm

One use case is when your spreadsheets get too big or too complex to work well. For example, you’re trying to pull data from multiple sheets, or the numbe of rows and columns gets so big that it slows the spreadsheet app down.

Another benefit is reproducibility. Spreadsheets are fast and capable, but changes and errors can be hard to spot – if someone sorts wrong (selecting only some of the columns) or copies and pasted carelessly, or just fat-fingers a cell (adding a zero or two?), it can easily go unnoticed until too late. For example, a faulty cell range in an average() formula helped convince policymakers around the world that austerity was the solution to slow growth.

Many databases, by contrast, use SQL or another kind of written instructions to modify or analyze data. Those instructions get executed exactly the same way, as written, each time, and are fairly readable (once you learn SQL). It’s easier to see what you’ve done to your data, roll back changes, etc.

01001001 · October 3, 2022, 12:43am

Okay now I’m intrigued. I’ve got a few projects in my mind that are relatively far off, but can benefit from the relationship between a spreadsheet and a database.

I’ve also got a copy of a book titled “Data Structures Using Pascal”. I don’t know Pascal. But it’s a whiff that I can start from, I reckon. Thank you to you, @karlnyhus and @tf2 for the explanations.

rms · October 3, 2022, 5:15am

I’ve heard of that book and probably gives a great start. But I would recommend not getting too deep into using Pascal for lots of reasons. Getting to understand one-to-many relationships is a great start with data analysis, and also keeping in mind your objective, e.g. as explained many places… I found A Step-by-Step Guide to the Data Analysis Process [2022] at the top of a “interweb” search.

In the 90’s I used Microsoft Access extensively as it has it all for quering and reporting (but I made little use of their VB all that time), then after a sojourn from data related work, when I came back to having a business need I parlayed that database structure experience into still using Access for the front-end but kept data in SQL servers (or wherever it was). Following that, started using Python with Pandas to automate working with the data. Now I’m finding that I’ve again moved on and for analaysis on data that I get from out on the “interweb”, I still rely on Python/Pandas but “R” taking on a role too. Both are supurb tools unavailable when I started doing data number crunching. So much to chose from. Best not to conflate “database structure” with “programming tools”, though.

To start, learn about relational data modelling (one-to-many, lookup tables, many-to-many, inner and outer joins, foreign keys, etc.). Probably covered in your book you have, but many books and “interweb” sites available to help with that.