13

In discussions about NoSQL vs SQL databases, I sometimes hear that companies prefer to use schemaless NoSQL databases because it is problematic to migrate the schema to a new version. But is that really a big problem when doing upgrades? Are relational databases bad for such operations?

I read this blog post on the MongoDB blog: Why Schemaless?

Jonas
  • 14,867
  • 9
  • 69
  • 102

4 Answers4

21

Just because your NoSql database doesn't have a schema in a traditional sense doesn't mean there isn't a logical schema you need to deal with as it changes. In the case of a typical app using MongoDb, most likely your code expects certain fields of the json object to behave in certain ways. If you change the behavior, it follows you might want to update the already existing data in the database. Now, with traditional RDBMS this was a largely solved problem -- you just had to ALTER the underlying tables. But with these newfangled NoSQL databases, you have a decision -- do you write a script to munge and update all your objects? Or do you add code to convert between versions on the fly? If so, how long do you support v1 objects? Forever? Until v3?

I'll add that the example used in the MongoDb blog post is a bit simplistic and a very easy case to handle if you've got a decent update process no matter what the RDBMS is; adding a field rarely hurts. It is when you decide to split your Name field into FirstName and LastName that things get exciting.

callum
  • 10,377
  • 9
  • 30
  • 33
Wyatt Barnett
  • 20,685
  • 50
  • 69
  • With traditional RDBMS, this is NOT a solved problem. You still have to update all of the data besides updating the schema. This part is common for both SQL and NoSQL. – kawing-chiu Dec 05 '17 at 02:09
  • 3
    @kawing-chiu RDBMSes worth their salt have transactional DDL, which makes it a solved problem. Schema modifications and correction of the data to be done in a single transaction that can be rolled back. – Blrfl Sep 24 '18 at 10:38
19

But is that really a big problem when doing upgrades?

It can be.

Some organizations are -- well -- disorganized, and do a very bad job of schema migration.

  1. "Migration Weekend". Stop the servers. Back up and export all the data. Build the new schema (often by modifying the existing schema). Reload data or attempt to restructure in place.

  2. "Continuous Tweaking". Alter tables to the extent permitted by SQL. Without tracking the sequence of ALTER's performed. With no way to go back to a previous schema version. Where necessary, create new tables from existing tables, hopefully adjusting all applications to use the new tables. But -- lacking good QA -- leaving the old tables in place "just in case".

  3. "Full Panic". Simply prevent schema modifications. Make a big stink. Claim the risk is too high. Block all efforts in this direction. Take the schema hostage until forced to adopt some more sensible approach.

Are relational databases bad for such operations?

Any schema is a pain to migrate.

The largest issue is not technical.

It's semantic.

One principle reason for a schema change is that the previous schema doesn't match the problem domain very well. Since the semantics have changed, the database (and applications) need to change. Sometimes these are profound changes that require rethinking the way the applications work with the data.

Revising semantics of database can be very difficult.

What folks do instead of schema changes is just misuse the physical schema. They start loading wrong data into existing fields because they can. A "comment" field suddenly starts having an important piece of customer management information followed by "//" followed by the real comment. That grows to have to pieces of data "field 1 - field 2 // comment". The users have a spreadsheet that extracts this additional data from the comment field because the "real" application software had a schema to hard to change that IT refused to change it.

S.Lott
  • 45,264
  • 6
  • 90
  • 154
  • 9
    I feel dirty after reading this. – Michael Borgwardt Sep 19 '11 at 16:12
  • 3
    +1 for a great turn of phrase; "taking the schema hostage". A good analogy. Been there, experienced that. – Warren P Sep 19 '11 at 16:32
  • 1
    But the application also has to be upgraded anyway, so doeas really a Schemaless database help much? – Jonas Sep 19 '11 at 17:48
  • 1
    @Jonas: Your question is vague. But. Removing the restrictive SQL schema means you have one less thing to struggle with. So, trivially, "Yes, it helps." You **always** have application changes. Application changes without schema changes would be less work. Right? Or are you asking something different? – S.Lott Sep 19 '11 at 17:59
4

We upgrade production databases adding tables and (nullable) columns with no problems. Previous versions of the application work fine with the upgraded database, they just don't reference the new stuff. We avoid removing tables or columns, or changing how existing data is stored, though when this is necessary we produce appropiate conversion scripts. Whether your database has a declared type safe schema or not, changes in data structure require data conversion and application upgrades to interact with the new structure.

JGWeissman
  • 1,061
  • 8
  • 12
1

It depends.

First, if you have a really big database spanning multiple machines, then everything (not just database update) is going to be pain. (no matter how much you planned ahead of time).

Second, updating a database is NOT just a database thing -- it also depends on the bigger system of which the DB is part. This also includes the database deployment (many database servers, multiple data-centers, master-slave setups, etc.)

The pain can be eased by architecturing you system components such that they all have some sort of 'cognizance' of the DB schema change event. This means that the entire system should be tolerant of schema changes and can respond to it in a 'sane' way.

You can check out a utility developed by Facebook for tackling MySQL schema updates.

Also, there are standard best practices like turning you master read-only, making changes to slaves or on development copy, etc.

In any case, having a full backup and extensive test suite is a MUST. Only then, can you do any changes confidently and safely.

treecoder
  • 9,475
  • 10
  • 47
  • 84
  • But the application also has to be upgraded anyway, so doeas really a Schemaless database help much? – Jonas Sep 19 '11 at 17:48