6

I’ve been working on a project for a little while and I’m unsure which is the better architecture. I’m interested in the consensus. The answer to me seems fairly obvious but something about it is digging at me and I can't pick out what.

The TL;DR is: how do you handle a program with application data and user data in the same DB which needs to be able to receive updates to the application data periodically? One database for user data and one for application, or both in one?

The detailed version is.. if an application has a database which needs to maintain application data AND user data, and the user data all references application data, it feels more natural to me to store them in the same database.

But if there exists a need to be able to update the application data within this database periodically, should this be stripped into two databases so that one can simply download the updated application data database file as an update and replace the old one? Or should they remain as one database, and the application data be updated via a script which inserts the new data into the existing database? The second sounds clearly preferable to me... but for some reason just doesn’t feel right, and I can't pick out quite why.

trycatch
  • 1,080
  • 2
  • 12
  • 18
  • Does the application data get modified during normal execution? – Winston Ewert Mar 20 '12 at 03:25
  • @WinstonEwert - no, the application is preloaded and should only be updated during a software update. – trycatch Mar 20 '12 at 03:46
  • Does the application run on a central server or in the users device/computer? – David Andersson Mar 20 '12 at 05:18
  • @DavidAndersson - On a user's PC. – trycatch Mar 20 '12 at 13:58
  • @trycatch, can you be more specific then about the database system being used. I assumed you were talking about a db server, but not it sounds like we're talking about a desktop application. What specific dbms are you using? Are we talking something like a parts database thats used in a desktop app? – GrandmasterB Mar 21 '12 at 04:44
  • @GrandmasterB In this case it's a SQLite DB on a user's desktop. A parts database is perhaps a good example of the situation, but add in an inventory table that's FK'd on the parts table or something like that. I need to be able to push out updates to the parts table (the application data) on without affecting the user generated data (ie. the inventory data). – trycatch Mar 21 '12 at 10:04

3 Answers3

4

If you ever need to move the application, independently of the user database, then you need a separate database for the application (in whatever form that takes), so that the database can travel with the application, leaving the user data intact in its original location.

It therefore follows that, if the application database is updated periodically from the vendor (that's you), then it needs to be kept separate from the user's database, so that you can distribute changes to the application database without affecting the user database.

Now, if you need to add fields or tables to the user database, that's a different story. For that, you need a module that can accept as input a table of changes from the application database, to be applied to the user database. Some programs do this by "converting" the user database to the new format.

Data conversion can be done by using SQL DDL to apply the field and table updates to the user's database, in a way that doesn't negatively affect the user's data. In some advanced scenarios, data transformations might actually take place; normalization or denormalization, for example.

If you need to provide users with the ability to do a data transfer, you should use some other mechanism such as a communications conduit, or an import/export file containing the data to be transferred (perhaps in XML).

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • I believe the converse (or is it convex? I can never remember..) is more necessary for this particular application; the user needs the ability to export some of his data to give to another user of the application who can read it in. I don't foresee a need for needing to move the application data separately. – trycatch Mar 20 '12 at 03:44
  • I wrote the first comment before your edit, but yes, XML is the clear choice to me for import/export. I'm curious though about your mentioning of the need to move the database - can you suggest to me an example of when I would need to move the application database and not the user data? I'm having a hard time wrapping my head around coming up with a specific example. – trycatch Mar 20 '12 at 04:14
  • @trycatch, is this data users are exporting, application or user data? – Winston Ewert Mar 20 '12 at 04:39
  • @WinstonEwert - User data, although some of the user data may reference values from application data tables. I think that may be part of what's concerning me.. the possibility that somehow my updates may result in different key IDs to application data tables (though it really shouldn't if I'm doing the same operations on everyone's) and imports may go awry if so. – trycatch Mar 20 '12 at 13:59
  • @trycatch: If everyone's data is in a single, large database, then ID's shouldn't be a problem. If you need an ID to refer back to the source database, you can store it in another field. If the ID's need to be globally unique across all users' individual databases, you should be using GUIDs for IDs. – Robert Harvey Mar 20 '12 at 15:13
  • @RobertHarvey - it's a single database per installation to a PC. I may need to be able to push out a global update to all users to add new application data occasionally. For purposes of exporting user data and importing it to another user's application, it is pretty essential that the application data PKIDs be identical across all machines. – trycatch Mar 20 '12 at 15:27
3

I dont think it really matters. Barring other requirements, as long as you can update the application data as needed, whether its in a separate database or the same one doesnt seem too pertinent. SQL scripts can drop/reload tables in either case.

GrandmasterB
  • 37,990
  • 7
  • 78
  • 131
  • Just use different schemas, Application.Table.Column vs. User.Table.Column . One database, two schemas, then you have defined boundary between application and user data. – Jon Raynor Mar 20 '12 at 17:15
  • @JonRaynor Well, the tables are already separated, so the boundary already exists, it's just a matter of patch-ability. I'd have to update via a script that does a bunch of inserts, rather than be able to just copy over an updated database file. – trycatch Mar 21 '12 at 00:50
  • I think the single database is fine with the update scripts. One case to make for a new file is if the update is massive, say 1 million inserts, then its easier to just lay down a new file. If you updates are not that massive, update scripts will work fine. – Jon Raynor Mar 21 '12 at 14:06
2

I would look at the coupling. Does all your data belong together, ie is all the data in the same bounded context (to use domain-driven design terminology)? If not then you may want to split it out.

For an application the physical location doesn't matter as much as the logical but for maintenance / performance the physical is rather important.

The fact that this bothers you on some level means that the structure may not be comfortable. Analyse the relationships between the data more closely and make a call.

  • the data is all tightly coupled.. aside from "leaf" tables, there are no tables which do not reference another table. The "it bothers me" aspect is more somewhere along the lines of being worried about somehow mangling user data with an update somehow, but I can't imagine how if I'm only inserting new data into application data tables. The IDs should never be changing of any of the data in those tables, so consistency shouldn't be an issue.. is this just paranoia..? – trycatch Mar 20 '12 at 04:19