We have an application with a mix of fast (< 1 second) and slow (> 30 seconds) database migrations. Right now, we're running database migrations as part of CI, but that means our CI tool has to know all of the database connection strings for our app (across multiple environments), which isn't ideal. We want to change this process so that the application runs its own database migrations when it starts up.
Here's the situation:
We have multiple instances of this application, around 5 in production. Let's call them node1, ..., node5. Each app connects to a single SQL Server instance, and we're not using rolling deployments (all apps are deployed simultaneously, as far as I know).
Problem: say we have a long-running migration. In this case, node1 starts and begins executing the migration. Now node4 starts, and the long-running migration hasn't finished yet, so node4 also starts running the migration -> possible data corruption? How would you prevent this problem, or is it even important enough to worry about?
I was thinking of solving this problem with a distributed lock (using etcd or something along those lines). Basically, all apps try to acquire the lock, only one of them gets it and runs the migrations, then releases it. By the time the rest of the apps enter the critical section, all the migrations have already been run, so the migration script just exits.
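Roughly what I had in mind, as a minimal sketch, assuming the python-etcd3 client; the etcd endpoint, the lock name/TTL, and the run_pending_migrations() helper are all placeholders for whatever we'd actually use:

```python
import etcd3

def run_pending_migrations():
    """Placeholder for the real migration runner: apply any unapplied
    scripts and return immediately if the schema is already current."""
    pass

def migrate_on_startup():
    # Placeholder endpoint; assumes an etcd cluster reachable from the app nodes.
    client = etcd3.client(host="etcd.internal", port=2379)

    # The TTL has to comfortably exceed the longest migration, otherwise the
    # lease expires mid-migration and a second node could acquire the lock.
    lock = client.lock("app-db-migrations", ttl=300)

    # Only one node holds the lock at a time; the others block here until
    # it's released, then find nothing left to migrate and move on.
    with lock:
        run_pending_migrations()
```

The part I'm least sure about is the TTL: it has to outlive the slowest migration (or the lease has to be refreshed while the migration runs), otherwise we're back to two nodes migrating at once.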
However, my gut is saying "this is overkill, there must be a simpler solution," so I figured I'd ask here to see if anyone else has any better ideas.