How to handle unexpected schema changes to production database

Question

Is it a common or normal practice to encounter schema changes, especially renamed or removed columns, in a production database, without having been notified of the change? If unexpected schema changes are anticipated in the production database, how are they guarded against, other than basic exception handling?

I've worked mainly with credit union software, DoD software, high volume e-commerce web applications, real estate systems, robotics, and more recently CMMS, GIS, and SCADA systems. Although the '90s and early 2000s were more of the wild west, everything I have been involved with over the last 10 years or so has required change management. Modifications to a production database without change control is not allowed by policy and by security controls.

Comments in the accepted answer to this question - Why is “Select * from table” considered bad practice make me wonder how much of this is academic and how much is reality.

score 5 · Accepted Answer · answered May 10 '14 at 15:07

5

If I may be so bold ;)

The problem isn't really about unexpected schema changes to production, though that's certainly a symptom.

In reality it is:

"Why aren't different members of my organization communicating about changes that affect each other"

Once you view it that way then it makes more sense to look at:

current processes of code and database changes
current process for notifying others
current process for running tests for such changes
current methods used to communicate (email, IM, Project tracker, etc.)

etc.

answered May 10 '14 at 15:07

Michael Durrant

13,101
5
34
60

Our changes are communicated and planned. I've heard others say that unplanned changes in production happen all the time, and that it's normal, and that we should plan for it, but I don't consider it "normal" and am not sure about the best way to plan for it, other than minimizing the occurrences. Saying it’s normal and happens all the time is almost like saying there is no communication in the organization, or that people make a lot of mistakes. – Bratch May 20 '14 at 18:26
If you changes are communicated and planned, are you experiencing unplanned changes? Are you saying the issue is changes not made by developers locally? Please add some clarity on the actual problem you are facing rather than academic/theoretical – Michael Durrant May 24 '14 at 12:47
At work I have not faced this kind of problem for many years. The issue I have is hearing other experienced developers say that this happens all the time in real world companies. I completely agree with you that the root of the problem is poor communication, after that I would put the focus on preventing it from happening rather than planning for when it does happen. I didn't want to further clutter up the comments in the linked question about "select *" being a bad practice, so I asked a new question. – Bratch May 28 '14 at 17:44
re-reading I think that your "our" means just your group. I am guessing that changes are coming from other groups that are not as disciplined as you. Those changes should be addressed by better communication. – Michael Durrant Aug 04 '14 at 11:02

score 3 · Answer 2 · answered Apr 12 '14 at 19:29

The short answer: Log it, maybe notify the user, and maybe continue execution of your application.

Changes happen all the time. In production. Without warning. Often by mistake.

Even with change control policies carved in stone, mistakes happen. A DBA that is tuning a query drops a table. A developer with too much access accidentally removes a column. One of the DBAs moves your tables to a different schema. A database becomes corrupted.

These are some of the things that I have seen occur in live production databases over the years. Each one made me very glad there were backups of the production applications.

There will always be something somewhere that goes wrong. That is one of the reasons that you have error handling code in production applications: to prevent the failure of one section of code from causing the overall failure of all of it.

Errors

Log everything that is an error with the most details you can. Include the full error message (where possible). Make sure that your code can gracefully shutdown when an error is sufficiently detrimental to require it. And log that your app forcefully terminated itself. Displaying errors to the user is something that you should do in most, but not all circumstances.

Warnings

Log warnings, but only grab the relevant minimum information you need to diagnose and correct the issue. You may want to set some flag that will conditionally allow you to log warnings (and/or informational messages). This will help in keeping the log file small. Warning messages are mostly something that you will want to show to the user, especially when the user can make changes to correct the issue.

Informational Messages

Only log informational items when you need normally unimportant information. In production code, I have found it useful for informational messages to include when you enter and leave a function, method, properties (when they do more that get/put values), start up and shut down singleton instances (e.g., SerialPort). These informational messages do not need to be shown to the user.

I've seen some of what you describe happen, but it's rare, or not part of the normal process. We use INFO/WARN/ERROR, which helps find and fix after it happens, and of course we have backups and/or snapshots, also useful after it happens. — Bratch, Apr 16 '14 at 15:55

score 1 · Answer 3 · answered Apr 12 '14 at 20:52

Ideally, NO unexpected changes happen to a production system, EVER.
In reality of course, they happen all the time.

The way you handle it is by having proper procedures in place to minimise them, AND having proper procedures in place to deal with them when they happen (lists of people to call, things to check, rollback processes, etc. etc..
Risk can also be reduced by properly testing all changes that are scheduled as good as you can, in a situation as close as possible to the life system (identical hardware and software configuration for example), then having procedures in place that make it imperative that no manual changes are performed to production systems except those listed explicitly in the upgrade/installation manuals for your product.

Notice there's not a single piece of monitoring software, no coding practice, nothing technical in there. It's all procedural and documentation.

Combine that with proper access control to production systems (basically only the people who do installations and disaster recovery have write access there, and by procedure they only log in with those specific accounts when performing those specific tasks), and you've done all you can.
Only thing to remain to do in your code is to make it as robust as you can towards schema changes.
Of course you can't guard against a column you need being deleted or having its data type made narrower. But you can guard against a column changing to NOT NULL by ensuring all columns are always written with values, and you can guard against most column additions (obviously not NOT NULL columns) by writing your code so as to only retrieve and write explicit columns, only the minimum set you need for your operations.

We follow a dev-test-stage-production process for migrating changes, which are planned and expected, and we use access controls, which means unexpected changes in production should be very rare. We do these things and have very, very few surprises. — Bratch, Apr 16 '14 at 16:07

oɔɯǝɹ · Answer 4 · 2014-05-10T20:03:22.633

1

You don't.

Database / schema changes are not things that "just happen". Changes like this are not desired or supported.

Any change that may be needed will have to go through the regular release management and change management process, which includes testing the existing functionality (nothing is broken) as well proving that the change has the desired effect. We have a dev, test, staging and production environment just for this. We do not write code in production.

It may take a bit longer than just changing a database in production, but it is the only way to make sure users don't get confronted with random errors because "someone" changed "something".

You cannot code against a changing database schema. What if the DBA renames a table? What if the DBA drops related records? What if the DBA disables validation, or adds extra validation?

The only valid option is to log errors, and to let the application fail.

I suggest making clear rules with the operations team about these kinds of things. How would they like it if you start renaming servers for them?

In my environment we make schema changes only as part of a release, and we do that with tooling that transforms the schema to the exact form as designed. other changes that may be made in an environment will be undone by this.

edited May 10 '14 at 20:03

answered May 10 '14 at 19:54

oɔɯǝɹ

303
3
12

Ah, such wonderful idealism, "the only valid option". Try that on the execs of an e-commerce website losing millions per hour or the manager of a hospital with patient record access issues or the CEO of a startup with an investor demo in 10 minutes ;) – Michael Durrant May 11 '14 at 13:34
@MichaelDurrant pray tell me, who is responsible for the outage, when a DBA makes an untested change to a database? – oɔɯǝɹ May 11 '14 at 17:03
1

What i'm trying to say is, that this isn't idealism, or naive optimism, but experienced pragmatism. You can only build your house on a solid foundation. When you shake the foundation the house collapses. No building structure can prevent that, unless you decide to build in space, but that comes at some cost and loss of functionality. Now to address your startup CEO scenario, when a demo is that important, you wouldn't let anyone make changes to the application to be demostrated. Not the programmers, but (certainly) not the DBA. – oɔɯǝɹ May 11 '14 at 19:41
1

This answer is closest to what I have experienced. In the comments under an answer in the linked question, my view was considered "not real life production" or purely "academic" and that nobody does it the correct way, and a few others agreed with this view, but I do not. So on the ecommerce website, or while showing the hospital manager how to access records, or in the investor demo, someone makes a DB schema change and its okay? Seems like this is what could cause the ecommerce website to start losing money. – Bratch May 20 '14 at 17:57

How to handle unexpected schema changes to production database

4 Answers4