
Currently my database is not under any kind of version control; we can get deltas, but that's about it. I would like to make product deployments more automated and less time-consuming. I understand that placing a database's schema files under source control allows you to manage versions, and that these files are basically for dropping the old tables/indexes/etc. and then adding the new versions.

My question is, what about the data that's already there? By dropping everything we'd lose all of the data. So we would have to do a data dump before updating the database, and then reload the data after the update has been done. The problem is, some of our largest databases have 80+ GB of data, and we have a total of roughly 20 sets of databases (6 DBs per set).

I'm sure that this would work, but given the size of everything, is there a simpler solution that would cut out the need to dump and reload everything each time a schema update takes place? And if not, wouldn't we have to dump the data in such a way that reloading it took the new schema into account?

SolomonS
    What language is your project in? Several languages/frameworks have premade solutions for schema management – Daenyth Aug 14 '14 at 16:33
  • Progress OpenEdge. – SolomonS Aug 14 '14 at 16:41
  • Progress? You have my deepest sympathies. You may wish to consider getting a consultant out there, since most people are unaware of the implications of such a solution - this isn't something where one can take a MySQL solution and reapply it to Postgres or Oracle or Informix... this is something quite different (even though it's a database too). You might consider asking over at [DBA.SE](http://dba.stackexchange.com/questions/tagged/progress-database), as there are people there who are more familiar with that particular product... and again, consider a consultant. –  Aug 14 '14 at 19:28
  • ... and when I said asking, if you want to ask *this* question, consider flagging it for migration since it has an answer already rather than cross posting it to another site. Stack Exchange works best when there is only one copy of the question on the network rather than multiple copies of it with different answers. –  Aug 14 '14 at 19:58

1 Answer


What we used at a company I used to work at is Liquibase. It is a Java tool, but since you only ever need to run it once per DB update, it will likely work no matter what language your project is in. I'm pretty sure there are other solutions that work in a similar way, so I'll just describe how it functions in general so you can search for something that works for you:

You have a file (or set of files) containing changelogs - the commands that need to be run on a DB to update it from version A to version B. The DB itself then has a special table that tracks which of these commands have already been executed against it, along with checksums, timestamps and other miscellaneous info.
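To make that concrete, here is a rough sketch of what a Liquibase "formatted SQL" changelog looks like. The author IDs, table and columns are made up for illustration, and Progress OpenEdge's SQL dialect may differ, so treat this as the general shape rather than something to run as-is:

```sql
--liquibase formatted sql

--changeset solomons:1
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100)
);

--changeset solomons:2
ALTER TABLE customer ADD email VARCHAR(255);
```

Each `--changeset author:id` block is recorded in the tracking table once it has run, so a DB that already has changeset 1 applied will only execute changeset 2.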

Then, when you need to make a change to the schema, you add the commands that update the DB in place without dropping the data (likely ALTER statements rather than DROP/CREATE). When you actually run the update, the updater determines the current version of the DB schema, finds all the changesets needed to get from that version to the latest, and runs only those, in the correct order.
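The core mechanics are simple enough that, if Liquibase turns out not to support your DB, you could build the same behaviour yourself. A minimal Python sketch of the runner logic (the function and names are made up; in a real implementation `applied` would be read from, and written back to, the tracking table):

```python
def run_pending(changesets, applied, execute):
    """Run each not-yet-applied changeset exactly once, in order.

    changesets: ordered list of (changeset_id, sql) pairs.
    applied: set of changeset IDs already recorded in the tracking table.
    execute: callback that runs one SQL command against the DB.
    Returns the list of changeset IDs that were run.
    """
    ran = []
    for cs_id, sql in changesets:
        if cs_id in applied:
            continue  # already executed against this DB; skip it
        execute(sql)
        applied.add(cs_id)  # in reality: INSERT a row into the tracking table
        ran.append(cs_id)
    return ran

# Example: a DB that already has changeset 1 applied only runs 2 and 3.
log = []
changesets = [(1, "CREATE TABLE t (id INT)"),
              (2, "ALTER TABLE t ADD name VARCHAR(50)"),
              (3, "ALTER TABLE t ADD email VARCHAR(100)")]
print(run_pending(changesets, applied={1}, execute=log.append))  # [2, 3]
```

Running it a second time with all three IDs in `applied` is a no-op, which is what makes the update safe to automate across all 20 sets of databases.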

That way you do not recreate the schema every time and don't need to reload the data. It's also automated and fairly quick compared to doing it manually or recreating the DB completely.

Ordous
  • +1, unfortunately I don't have enough reputation on Stack Exchange to do it officially. Not sure why I didn't think of doing an update as opposed to an add/drop. Will need to see if Liquibase works with Progress OpenEdge databases, and/or if we could automate the functionality of Liquibase ourselves. – SolomonS Aug 14 '14 at 16:55