How to safely run database migrations with multiple app instances?

Question

We have an application that has a mix of both fast (< 1 second) and slow database migrations (> 30 seconds). Right now, we're running database migrations as a part of CI, but then our CI tool has to know all of the database connection strings for our app (across multiple environments) which isn't ideal. We want to change this process so that the application runs its own database migrations when it starts up.

Here's the situation:

We have multiple instances of this application - around 5 in production. Let's call them node1, ..., node5. Each app connects to a single SQL Server instance, and we're not using rolling deployments (all apps are deployed simultaneously as far as I know)

Problem: say we have a long running migration. In this case, node1 starts, then begins executing the migration. Now, node4 starts, and the long-running migration hasn't finished yet, so node4 also starts running the migration -> possible data corruption? How would you prevent against this problem or is the problem even important enough to worry about?

I was thinking of solving this problem with a a distributed lock (using etcd or something along those lines). Basically, all apps try to acquire the lock, only one of them gets it and runs the migrations, then unlocks. When the rest of the apps start up and enter the critical section, all the migrations have already been run so the migration script just exits.

However, my gut is saying "this is overkill, there must be a simpler solution," so I figured I'd ask here to see if anyone else has any better ideas.

How about using a "migration status" table as your global/distributed lock? The single row would indicate if a migration is currently active and possibly what migration was executed last. — Bart van Ingen Schenau, Dec 24 '17 at 08:02

Doc Brown · Accepted Answer · 2017-12-29T23:02:37.543

Since you mentioned SQL server: according to this former DBA.SE post, schema changes can (and should) be put into transactions. This gives you the ability to design your migrations just like any other form of concurrent writes to your DB - you start a transaction, and when it fails, you roll it back. That prevents at least some of the worst database corruption scenarios (though transactions alone will not prevent from data loss when there are destructive migration steps like deleting a column or table).

So far, I am sure you will also need some migrations table where already applied migrations are registered, so an application process can check if a specific migration was already applied or not. Then utilize "SELECT FOR UPDATE" to implement your migrations like this (pseudo code):

Start a transaction
SELECT FROM Migrations FOR UPDATE WHERE MigrationLabel='MyMigration42'
if the former statement returns a value, end the transaction
apply the migration (roll back if it fails, log the failure and end the transaction)
INSERT 'MyMigration42' INTO Migrations(MigrationLabel)
end the transaction

That builds the locking mechanism directly into the "was the migration already applied" test.

Note this design will - in theory - allow to let your migration steps be unaware of which application actually applies it - it can be possible that step 1 is applied by app1, step 2 by app2, step 3 by app 3, step 4 by app1 again, and so on. However, it is also a good idea not to apply migrations as long as other app instances are in usage. Parallel deployment, as mentioned in your question, may already care for this constraint.

score 1 · Answer 2 · answered Dec 29 '17 at 21:23

Maybe you can find a library that supports database migration with multiple nodes.

I know about two libraries in the Java world, both of them support what you need:

Liquibase: From their FAQ: Liquibase uses a distributed locking system to only allow one process to update the database at a time. The other processes will simply wait until the lock has been released.
Flyway: From their download page: Safe for multiple nodes in parallel ✓

There are probably other tools for Java and other languages as well.

If you are unable (or don't want) to use such a tool, a table can be used as a lock or even as a migration log, see Doc Browns answer for an example.

How to safely run database migrations with multiple app instances?

2 Answers2