10

Several responses to a database schema question suggested an additional table to normalize the database for a feature that is not part of the current requirements (a UserDepartment table to allow a many-to-many relationship between employees/users and the departments they may belong to).

I'm not against normalization. It just seems that when it comes to database design, there is a strong push to include features people are 'sure' someone will want in the future. Is it so difficult to add tables/fields to a database to accommodate new features that there is a tendency to over-engineer? Wouldn't the schema be refactored or upgraded just like the rest of the app when needed? Redoing things is never fun, but moving data from one table to a new one can be done. I'm just not sure where this line of thinking ends.

Edit: There is so much aversion to this that I wonder how many projects end up either not adding a feature that requires a drastic database change, or taking a non-normalized approach like adding a DepartmentID2 field instead of a new table. The need for multiple departments per employee is a common domain problem, yet I haven't noticed many database schemas littered with many-to-many relationships.
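To make the contrast concrete, here is a rough sketch of the two approaches (table and column names are illustrative, not from the original question):

    -- Non-normalized workaround: bolt a second FK column onto the employee row
    ALTER TABLE Employee ADD DepartmentID2 INT NULL
        REFERENCES Department(DepartmentID);

    -- Normalized alternative: a junction table for the many-to-many case
    CREATE TABLE UserDepartment (
        EmployeeID   INT NOT NULL REFERENCES Employee(EmployeeID),
        DepartmentID INT NOT NULL REFERENCES Department(DepartmentID),
        PRIMARY KEY (EmployeeID, DepartmentID)
    );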

JeffO
  • 36,816
  • 2
  • 57
  • 124
  • 1
    +1 Thanks for asking this. I learned a lot reading the responses to my original question, and this is an insightful thread as well. – Jim Oct 01 '11 at 03:17

8 Answers

14

Refactoring code is easy--you simply change the code and run your regression tests.

Refactoring databases is hard--you've got to move (a potentially huge amount of) data around, make sure none of it is dropped, and make sure the constraints are maintained in the new schema. And if you have audit requirements on the data, you've got to be able to explain why it's organized differently and to match pre-refactor data to post-refactor data. Also, none of your old backups will match the new schema, which is yet another risk.
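To give a feel for it, here is a hedged sketch of even a 'simple' one-to-many to many-to-many migration (names assumed; whether DDL can run inside a transaction depends on the platform):

    BEGIN TRANSACTION;

    -- 1. Create the new junction table alongside the old FK column
    CREATE TABLE UserDepartment (
        EmployeeID   INT NOT NULL REFERENCES Employee(EmployeeID),
        DepartmentID INT NOT NULL REFERENCES Department(DepartmentID),
        PRIMARY KEY (EmployeeID, DepartmentID)
    );

    -- 2. Copy every existing assignment across
    INSERT INTO UserDepartment (EmployeeID, DepartmentID)
    SELECT EmployeeID, DepartmentID
    FROM Employee
    WHERE DepartmentID IS NOT NULL;

    -- 3. Verify nothing was dropped before committing:
    --    compare row counts, log the mapping for the auditors, etc.

    COMMIT;

And that still leaves dropping the old column, updating every query that used it, and reconciling the old backups.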

Scary stuff.

Matthew Flynn
  • 13,345
  • 2
  • 38
  • 57
  • Database tests shouldn't be any different. All changes require an audit and affect backups. How much data are you going to accumulate before recognizing this need? If you've converted data, this feature would be even more obvious. – JeffO Sep 29 '11 at 19:53
  • 8
    +1 for @Matthew Flynn. How much data are you going to accumulate before recognizing this need? MILLIONS of rows. Another problem is that many times YOUR app isn't the only thing using the database. The database could have many apps working with it and you might not even know they exist (e.g. wild "BI" apps). Changes in database schemas _are_ scary. – Angelo Sep 29 '11 at 20:21
  • 2
    Sometimes billions of rows – HLGEM Sep 29 '11 at 20:55
  • 1
    If you're dealing with billions of rows, you'd better know how to move them – JeffO Sep 29 '11 at 23:21
4

There is an entire book written about database refactoring. Just like with code refactoring, there are standard ways to do database refactoring. The only difference is that when doing code refactoring, you don't have to consider the state of the object/code, while in databases you have to consider the data, because losing data is not good for the users (or for anyone, actually).

You can read more about database refactoring here.
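As a taste of what those standard refactorings look like, one common pattern is a transition period in which the old and new schema coexist and are kept synchronized, roughly like this (a SQL Server-style sketch with assumed names):

    -- Old clients keep reading Employee.DepartmentID during the transition,
    -- so mirror new junction-table rows back into the legacy column
    CREATE TRIGGER trg_UserDepartment_Sync
    ON UserDepartment AFTER INSERT
    AS
        UPDATE e
        SET e.DepartmentID = i.DepartmentID
        FROM Employee e
        JOIN inserted i ON i.EmployeeID = e.EmployeeID;

Once every client has moved to the new structure, the trigger and the legacy column are dropped.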

Pramod
  • 56
  • 1
3

There is a fine line between spending lots of time over-engineering and investing a bit of your time to add just enough features to save you a considerable amount of time in the future.

0x4B1D
  • 241
  • 1
  • 3
  • 1
    You could make this argument for an isolated instance or two, but when do the 'bits' of time add up to too much? – JeffO Sep 29 '11 at 19:13
  • From my own experience, it's actually the case for the vast majority of the projects. But I'd also guess that it comes with experience and is highly subjective :) I would be surprised if someone can give you an exact recipe (hence the 'fine line'). – 0x4B1D Sep 29 '11 at 19:26
  • @Jeff O: It is not going to be 'bits'. A 10% or 20% investment of development time in hardening is necessary because the system may outlast both the originally envisioned timeframe and your employment. – rwong Sep 30 '11 at 13:16
3

I think the theory is that if you include a link table to support a many-to-many relationship between two tables, then even if only many-to-one relationships actually exist in the data, everybody will write their SQL in such a way that if a many-to-many is ever needed, everything will "just work".

In practice I have not usually found this to be true, but I suppose the SQL ends up closer to what it needs to be to support the many-to-many than it otherwise would have.

But to get specifically to your question: there actually is a fair amount of pain in converting a relationship from one-to-many to many-to-many. The reason is that SQL is not designed with the same kinds of encapsulation goals as object-oriented code, and most queries touch more tables at the database layer than people would be comfortable letting a single object in the business layer see.

Therefore a change to a many-to-many relationship will impact every query that involves the original two tables, often a much wider cascading effect than the same change causes at the business layer. So people go to significant lengths to prevent it from happening.
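For example, every query of the first form below has to become the second once the relationship goes through a junction table (names are illustrative):

    -- Before: direct FK, at most one department per employee
    SELECT e.Name AS Employee, d.Name AS Department
    FROM Employee e
    JOIN Department d ON d.DepartmentID = e.DepartmentID;

    -- After: joins through the junction table and can now return
    -- several rows per employee, which callers must be ready for
    SELECT e.Name AS Employee, d.Name AS Department
    FROM Employee e
    JOIN UserDepartment ud ON ud.EmployeeID = e.EmployeeID
    JOIN Department d ON d.DepartmentID = ud.DepartmentID;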

IMHO this wouldn't be needed if we had a better language than SQL for specifying the relational algebra. If it were feasible to build up a SQL query piece by piece from objects that didn't need visibility into every table in the query, this wouldn't happen. Things like LINQ (to SQL or to Entities) attempt to solve this, but it is a very complex solution and tough to optimize (I've been to DBA user groups where a collective groan goes up every time LINQ is mentioned). I dream of a database language that is universally supported and has first-class relational algebra functions...

In the meantime, yes, you can refactor from 1-to-many to many-to-many, but it can be a lot of work.

psr
  • 12,846
  • 5
  • 39
  • 67
  • You're not going to turn every relationship into a many-to-many? – JeffO Oct 07 '11 at 07:41
  • @Jeff O - Not sure I understand your question. When in doubt I model as many-to-many to avoid the pitfalls mentioned in various answers to your original question. I've grown a little more wary of that after maintaining databases that really did make almost all relationships many to many, because they had ended up doing things like creating views that made the relationships appear 1-to-many (which, in practice, they all were). So they had the worst of both worlds. I've never had that happen on my own designs, but it's out there as a cautionary tale. – psr Oct 07 '11 at 17:32
3

I usually explain it this way to the PHBs - code is the walls and roof, the database is the foundation.

Moving the walls and changing the roof can be done. Changing the foundation around requires a lot of digging and rebuilding the walls and roof.

What inexperienced developers (and college professors) call "over-engineering" is what experienced developers call "future-proofing". Despite what the spec says, you know what will probably change over the application's life cycle and where the performance problems will occur, so you want to get your table structure right from the start.

Rolling out update scripts to customer servers is a non-trivial project, and each customer's DBAs will be all over you, wanting to triple-check it all. Some extra columns and tables aren't so bad after all.
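Which is why those scripts tend to be written defensively, so a nervous DBA can run them twice without harm: something along these lines (a T-SQL-style sketch; the guard syntax varies by platform):

    -- Guarded, re-runnable change script
    IF NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES
                   WHERE TABLE_NAME = 'UserDepartment')
    BEGIN
        CREATE TABLE UserDepartment (
            EmployeeID   INT NOT NULL,
            DepartmentID INT NOT NULL,
            PRIMARY KEY (EmployeeID, DepartmentID)
        );
    END;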

jqa
  • 1,410
  • 10
  • 13
1

The general rule is: if a relationship is one-to-many now but may in the future become many-to-many, make it many-to-many.

Employee/department is the classic example. In most small companies it is effectively a one-to-many relationship most of the time. However, there is almost always a situation where it becomes many-to-many: one of your engineers moves up into management but is still responsible for supporting a product he developed while he was in engineering, or one of your salespeople moves to product development but, because he has a close relationship with an important customer, is still the lead salesman for that customer.

It doesn't cost much more to implement a one-to-many as a many-to-many, but refactoring a database and application to support a many-to-many is expensive and fraught with difficulty.
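One way to get the best of both (a sketch with assumed names): model the junction table from day one, but add a unique constraint so each employee can still have only one department. When the many-to-many day arrives, you drop one constraint instead of restructuring the schema and every query against it.

    CREATE TABLE UserDepartment (
        EmployeeID   INT NOT NULL REFERENCES Employee(EmployeeID),
        DepartmentID INT NOT NULL REFERENCES Department(DepartmentID),
        PRIMARY KEY (EmployeeID, DepartmentID),
        CONSTRAINT UQ_OneDepartmentPerEmployee UNIQUE (EmployeeID)
    );

    -- Later, when many-to-many is genuinely needed:
    -- ALTER TABLE UserDepartment DROP CONSTRAINT UQ_OneDepartmentPerEmployee;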

James Anderson
  • 18,049
  • 1
  • 42
  • 72
  • I agree there are many mature domains (like HR) where the client doesn't anticipate the need, but you're aware that it's bound to happen. – JeffO Oct 07 '11 at 07:43
0

There are two ways to look at the design of software (and probably of many other things): a tactical view or a strategic view. Each has its own advantages and drawbacks.

Even with OO software, modification is still a pain: not only is the coding part difficult, but the process of promoting a change to production in compliance-controlled environments (given the current state of the technology) is unreal for large systems that are supposed to be running 24/7.

I follow my own principle: "When possible, design shared software artifacts strategically." This may sound like it goes against the YAGNI principle in some ways, but that is my view. The approach buys you less rework at the cost of added complexity and resources.

In your case, the activities required to add a new junction table would include: design, design approval, changing the schema, rewriting several CRUD methods across the three tables (with the exception of some reads), building indexes, and creating a GUI for the new table's CRUD so the user can select the PKs on create and update. Oh, and by the way, don't forget unit testing, user acceptance testing, system testing, and production promotion.

If this is not enough, the real nightmare comes from information loss. If you did not have the junction table to begin with and you later decide to capture the dates when the association or separation between an employee and a department occurred, you won't be able to populate those dates on the junction table automatically. You'd have to enter them manually (if you have the data at all).
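A sketch of the backfill makes the loss obvious (assumed names): the association rows can be generated from the old foreign key, but the historical dates cannot.

    INSERT INTO UserDepartment (EmployeeID, DepartmentID, JoinedOn)
    SELECT EmployeeID,
           DepartmentID,
           NULL           -- the real join date was never captured,
    FROM Employee         -- so this column starts empty or gets a guess
    WHERE DepartmentID IS NOT NULL;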

So, it is better to foresee this from the start.

NoChance
  • 12,412
  • 1
  • 22
  • 39
0

As Matthew said above, refactoring/changing databases is often more involved than changing software, because managing the data also has to be taken into consideration. There are techniques that can help, e.g. ensuring you have an appropriate suite of database unit tests, and decoupling client applications from your base schema with a 'DB API' of stored procedures and views.
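For example, if client applications had only ever read through a view like the one below, moving to a junction table would mean changing the view's definition rather than the clients (a sketch with assumed names):

    -- Clients query the view, never the base tables, so the base
    -- schema can be refactored behind it
    CREATE VIEW EmployeeDepartments AS
    SELECT e.EmployeeID, e.Name AS EmployeeName,
           d.DepartmentID, d.Name AS DepartmentName
    FROM Employee e
    JOIN UserDepartment ud ON ud.EmployeeID = e.EmployeeID
    JOIN Department d ON d.DepartmentID = ud.DepartmentID;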

mbaylon
  • 1
  • 2