62

I have done a fair bit of work with relational databases, and think I understand the basic concepts of good schema design pretty well. I recently was tasked with taking over a project where the DB was designed by a highly-paid consultant. Please let me know if my gut intinct - "WTF??!?" - is warranted, or is this guy such a genius that he's operating out of my realm?

DB in question is an in-house app used to enter requests from employees. Just looking at a small section of it, you have information on the users, and information on the request being made. I would design this like so:

User table:

UserID (primary Key, indexed, no dupes)
FirstName
LastName
Department

Request table

RequestID (primary Key, indexed, no dupes)
<...> various data fields containing request details
UserID -- foreign key associated with User table

Simple, right?

Consultant designed it like this (with sample data):

UsersTable

UserID  FirstName   LastName
234     John        Doe
516     Jane        Doe
123     Foo         Bar

DepartmentsTable

DepartmentID   Name
1              Sales
2              HR
3              IT

UserDepartmentTable

UserDepartmentID   UserID   Department
1                  234      2
2                  516      2
3                  123      1

RequestTable

RequestID   UserID   <...>
1           516      blah
2           516      blah
3           234      blah

The entire database is constructed like this, with every piece of data encapsulated in its own table, with numeric IDs linking everything together. Apparently the consultant had read about OLAP and wanted the 'speed of integer lookups'

He also has a large number of stored procedures to cross reference all of these tables.

Is this valid design for a small to mid-sized SQL DB?

Thanks for comments/answers...

Jim
  • 1,997
  • 3
  • 20
  • 25
  • 12
    Oh boy, if this makes you say WTF, then you probably have not seen tables with 200+ columns and stored procedures more than 1000 lines long. – Job Sep 29 '11 at 18:02
  • 43
    +1 for not deleting after feeling embarassed. Thanks for leaving this so others can learn. – Wayne Koorts Sep 29 '11 at 18:09
  • 2
    @Job - actually, I haven't - I'm not a DBA by trade (pretty obvious by now! lol), so my SQL WTF threshold is fairly low. Although, my completely missing the point of the consultant's design has me WTF'ing my own abilities. Ever have a day where you just feel *dumb*? – Jim Sep 29 '11 at 18:11
  • 10
    @Jim: Congratulations, you've turned a _dumb_ day into an _enlightened_ day. – Wayne Koorts Sep 29 '11 at 18:14
  • @Jim, I feel pretty smart on the weekends. – Job Sep 29 '11 at 18:15
  • And for any future random questions, the Stack Overflow chatroom for MySQL is here for you! http://chat.stackoverflow.com/rooms/592/mysql – TehShrike Sep 30 '11 at 04:28
  • 3
    Curse those highly paid consultants! – davidsleeps Sep 30 '11 at 04:59
  • A great way to learn is by working daily with other people's code/design. – coder Sep 30 '11 at 14:47
  • @Job then have 5000 SP's in a db that has 200 tables like that all done with no naming conventions or data standardization... Yes I have had a db like that and management had no desire to fix it because its not broken. – SoylentGray Sep 30 '11 at 16:34

6 Answers6

73

Makes perfect sense to me. It's just very normalized, which imparts a lot of flexibility that you wouldn't have otherwise. De-normalized data is a pain in the butt.

Blrfl
  • 20,235
  • 2
  • 49
  • 75
  • your answer makes perfect sense, and looking over my question and the schema maybe it is just the sheer number of tables he is using that confused me. I greatly simplified the example for my question, but I see how the concept is sound - he is just splitting things off much more than I would. Sigh, I guess it's a good thing I'm not a DBA! :) – Jim Sep 29 '11 at 17:12
  • Learn to design by the ten-minute rule: "What holds true now probably won't in ten minutes." Make sure your designs can deal with change. – Blrfl Sep 29 '11 at 17:14
  • 1
    This schema actually has the advantage that when an employee is inserted, their department has to exist. – Simon Richter Sep 30 '11 at 14:57
  • @SimonRichter: It is not true. The Employee can be created without any Department existing, and also the reverse. – Daniel Dinnyes Jul 01 '12 at 20:31
  • @SimonRichter The benefit of this design is, firstly, that the Department is a separate entity, and secondly, that there is a many-to-many relationship between Department and Employee, as opposed to the OPs example, where it was "many-to-one-ish" (could not say many-to-one, as no separate Department entity it was referencing to be called a relationship). – Daniel Dinnyes Jul 01 '12 at 20:34
  • Future proofing database and code architectures (to a sane degree, of course) cannot be understated... I'm currently dealing with a total lack of any solid architecting or future proofing in a project I had to take over. It's not fun. – Nick Bedford Jun 20 '19 at 02:46
48

I don't think that either a WTF is warranted or that the guy is doing any mad genius kind of design - it's pretty standard database normalization.

The reason for the department table is that if you don't put the departments into a separate table you will have to deal with users in "Sales", "sales", "Salesmen", "Sails", and "Selling" departments, unless you do something to prevent it. And having the extra table is (part of) the best way I know to do that.

Whether there should be a UserDepartment table is a tougher call, which of course means neither decision is way out and crazy. On the one hand it's a pain when all your table design and logic had assumed one department per user and then that changes, on the other hand doing an extra join for no reason for years and years is a real possibility and also a pain.

I personally would agree with you that the UserDepartment table is probably overkill. Even if it's included, the chances are that over time people will write queries that assume there is only one user per department, so you will end up with the worst of both worlds - an extra join for no reason prior to needing the table, and code not working anyway once more than one department per user becomes allowed.

EDIT - A key driver of whether the many to many relationship should be supported is if the business rules are clear. If you have no idea how a user in multiple departments would even work there isn't much point in adding the table, since your code can't possibly correctly handle the cases where a user is in multiple departments.

Imagine that you did allow many departments per user, just in case. You then implemented a business rule for assigning commissions, based on department. Then multiple departments were allowed. Fortunately, you also had the foresight to write your commission code in a way that took this into account. Unfortunately, you added the commissions from each department for users in both. Management wanted you to base in on the persons role for each sale. So, how much good was having the table in advance? What about the other tables you had "just in case" that are never needed at all?

LATER EDIT - Another reason the consultant might have wanted to add all those intermediary tables is addressed in this follow-up question, the answers to which give some reasons why refactoring a database is usually harder than refactoring code, which would tend to push you towards the "put in all the tables you might ever need" approach.

psr
  • 12,846
  • 5
  • 39
  • 67
  • I think you put into words what my WTF was - the guy is using TONS of these intemediary tables, and it just seemed so stupid to me. Now that I've broken it down into a much smaller example for this question, I feel rather stupid for posting it since it doesn't seem that bad. – Jim Sep 29 '11 at 17:18
  • 5
    As you can see by lots of the comments, there is a healthy skepticism about "there will only ever be one X per Y" comments. The consultant is covering himself from "how come there can only be one X per Y" complaints. Some of which will probably come up. But he won't be responsible for maintaining code that has many joins (not too bad, but harder) and that has to be correct against business rules that don't yet exist (bad) - imagine the question "why do users get ALL the permissions from each department, they should get the LOWEST of each permission" or some such. – psr Sep 29 '11 at 17:31
  • @psr I think there's a typo: shouldn't "queries that assume there is only one user per department" be "queries that assume a user is in only one departement" ? – BiAiB Sep 30 '11 at 12:19
  • @BiAiB - you are correct, that is what I meant to say. – psr Sep 30 '11 at 17:44
14

If the requirement is to have multiple departments per user, this design makes sense. The only gripe about it is the UserDepartmentTable having a surrogate key UserDepartmentID that is not needed (just make the UserId and DepartmentId a composite primary key).

If a user only ever belongs to a single department, your design makes sense (though a department lookup table would still be a good thing).

Oded
  • 53,326
  • 19
  • 166
  • 181
  • No, only one department is ever possible per user. – Jim Sep 29 '11 at 16:52
  • 18
    ...Until more than one department is possible per user. – Blrfl Sep 29 '11 at 16:53
  • 1
    Exactly, @Blrfl. Today's not-ever-going-to-happen is tomorrow's the-CEO-is-having-an-aneurysm-because-it-doesn't-do-it. – Adam Crossland Sep 29 '11 at 17:09
  • @blrfl - good point. – Jim Sep 29 '11 at 17:14
  • 1
    @Blrfl: That doesn't make much sense. Anything can be changed in the future. Why do departments get a special treatment? Tomorrow the CEO could want each user to have multiple first/last name pairs just as well. – configurator Sep 29 '11 at 17:41
  • 2
    Part of deciding what's worthy of that kind of treatment is understanding the problem domain. In some applications, it might be important to know that employee #3804 has been known to the company as Ann Smith and Ann Jones (after getting married), which would make normalizing the name out of the table of employees a sane thing to do. In Jim's case, it might be worth expanding the breaker table to keep a history so that if Ann moves from HR to IT, the fact that an old request tied to her could reflect that it was really HR's request and not IT's. – Blrfl Sep 29 '11 at 18:01
  • 8
    YAGNI - databases can be refactored. – JeffO Sep 29 '11 at 18:26
  • 2
    @Oded, Some ORM mappers like Entity Framework do not work well with tables that have a composite primary key. – maple_shaft Sep 29 '11 at 19:21
  • @maple_shaft - I guess some ORMs are sub standard ;) – Oded Sep 29 '11 at 19:27
  • 1
    I must be an ORM mapper because I hate composite keys too. Doesn't mean I won't use them. – JeffO Sep 29 '11 at 19:56
  • @Oded: ORMs are expected to catch up to SQL 92 in five to ten years. – Mike Sherrill 'Cat Recall' Oct 01 '11 at 00:59
  • @Catcall - When are they expected to catch up to SQL 99? 2040? – Oded Oct 01 '11 at 07:00
  • @Oded: No. Developers will stop using ORMs in six years. – Mike Sherrill 'Cat Recall' Oct 01 '11 at 10:00
  • 1
    @MikeSherrill'CatRecall' we're still there :) – Walfrat Aug 23 '17 at 11:24
5

Some requirements are not clear in your question. The correct answer depends on what your customer wants - If I were you, I would ask the customer about this:

0-What is the difference between a user and an employee?

1-Assuming an employee=user, what if an employee changes departments?

2-Can a group of employees make 1 request?

3-Could an employee belong to more than one department? What about the CEO

4-Is there a subset of employees that are allowed to make requests?

5-What happens to the request when an employee record is deleted (if ever)?

6-Could you delete a request? What happens when the request is delted (make sure you don't delete the employee record by RI)

7-Can the employee make the "same" request more than one time (define the "same")

8-How to handle requests for employees when they leave the company (cancel their requests or delete the requests?)

There may be more questions, but my point is that the solution depends on exact requirements and project scope. Once that is determined the schema can be derived directly. Accordingly both solutions presented may be correct.

NoChance
  • 12,412
  • 1
  • 22
  • 39
1

I'd like to add a couple point form notes talking explicitly about some of the potential advantages of using a join table in the fashion that your highly-paid consultant did.

  • Properly indexed (e.g., if UserDepartmentTable indexes the two foreign keys), there is only a small performance loss of a join table like this due to the foreign keys not being unique. If the foreign keys are guaranteed to be unique, the by the little database theory I know, looking up UserDepartmentTable.Department is no "harder" than looking up any other column in the User table.
  • The join table gives you more flexibility to set up other information about the association between the user and the department (e.g., timestamps on creation).
  • The join table allows you to "version" the association fairly easily (e.g., when a user changes departments, trigger some index boolean flag like UserDepartmentTable.Active to false and create a new association that is active). It's also possible to have department association versioning with the two-table model (just User and Department), but it's tougher and necessitates adding at least one more column or doing database acrobatics in order to avoid duplicating primary keys.
  • It allows you to assign one-to-many or many-to-one or many-to-many associations pretty easily.

That being said, there are several reasons NOT to do what your highly-paid consultant did.

  • All the above benefits are all anticipatory of possible future needs, overcomplicating things for the present day. It is not YAGNI-compliant. It's of trivial difficulty later to write a migration that moves from your two-table model to a join table model. You can do that when the business need emerges. To do it before that can be confusing.
  • It confuses other developers. While, yes, I would say the expectation for a web developer of your stature (where you're reviewing consultants' decisions) would be to understand and recognize a join table, it's still more complicated than necessary and considering the lack of business need, it causes confusion.
Steven
  • 356
  • 1
  • 6
  • nice analysis - however, I wouldn't say I have any stature as a dev in my day job, except that I am the only one here who knows *anything* about db/c#/vb/etc... so guess I'm part time dev by default. this is a fairly small project, so the consultants sheer number of tables and joins left me saying "wtf" (but thanks to you fine folk I am now saying "oic...") – Jim Sep 30 '11 at 18:09
  • Quite an old topic, but still relevant... refactoring may be very hard, imagine you need multiple departments in the future instead of one, but have only a department ID in Users as FK. You will likely end up with duplicate referendes (Users.DeptID and UsersDepartmentsTable) or complete garbage, like comma-separated lists in Users.DeptID, or XML. The correct solution could not be easily added, as suggested by YAGNI or KISS, but would be obstructed. – Erik Hart Oct 05 '15 at 09:45
0

Without full structure of needed information I can't say is it terrible or not. But at least shown piece is not of "WTF" designs. It's just seems as 3-rd Normal Form of data-structure (well, we have theoretically also 4-rd and 5-th also)

Some talks can have place for UserDepartmentTable between two schools of "natural" and "artificial" keys in shown piece. Nothing more, as I can see

Normalization is rule of good DB-developer/designer for a lot of reasons, *de*normalizations are used sometimes in the middle of developments for speed-win mostly

Lazy Badger
  • 1,935
  • 12
  • 16