Suppose I have 4 types of services I offer (they are unlikely to change often):
- Testing
- Design
- Programming
- Other
Suppose I have 60-80 of actual services that each fall into one of the above categories. For example, 'a service' can be "Test Program using technique A" and it is of type "Testing".
I want to encode them into a database. I came up with a few options:
Option 0:
Use VARCHAR
directly to encode service type directly as a string
Option 1:
Use database enum
. But, enum is evil
Option 2:
use two tables:
service_line_item (id, service_type_id INT, description VARCHAR);
service_type (id, service_type VARCHAR);
I can even enjoy referential integrity:
ALTER service_line_item
ADD FOREIGN KEY (service_type_id) REFERENCES service_type (id);
Sounds good, yes?
But I still have to encode things and deal with integers, i.e when populating the table. Or I have to create elaborate programming or DB constructs when populating or dealing with the table. Namely, JOINs when dealing with the database directly, or creating new object oriented entities on the programming side, and making sure I operate them correctly.
Option 3:
Don't use enum
, do not use two tables, but just use an integer column
service_line_item (
id,
service_type INT, -- use 0, 1, 2, 3 (for service types)
description VARCHAR
);
This is like a 'fake enum' that requires more overhead on the code side of things, like i.e. knowing that {2 == 'Programming'}
and dealing with it appropriately.
Question:
Currently I have implemented it using Option 2, guided under concepts
- do not use enum (option 1)
- avoid using a database as a spreadsheet (option 0)
But I can't help to feel that seems wasteful to me in terms of programming and cognitive overhead -- I have to be aware of two tables, and deal with two tables, vs one.
For a 'less wasteful way', I am looking at Option 3
. IT is lighter and requires essentially the same code constructs to operate (with slight modifications but complexity and structure is basically the same but with a single table)
I suppose ideally it is not always wasteful, and there are good cases for either option, but is there a good guideline as to when one should use Option 2 and when Option 3?
When there are only two types (binary)
To add a bit more to this question... in the same venue, I have a binary option of "Standard" or "Exception" Service, which can apply to the service line item. I have encoded that using Option 3.
I chose not to create a new table just to hold values {"Standard", "Exception"}. So my column just holds {0, 1} and my column name is called exception
, and my code is doing a translation from {0, 1} => {STANDARD, EXCEPTION}
(which I encoded as constants in programming language)
So far not liking that way either..... (not liking option 2 nor option 3). I do find option 2 superior to 3, but with more overhead, and still I cannot escape encoding things as integers no matter which option I use out of 2, and 3.
ORM
To add some context, after reading answers - I have just started using an ORM again (recently), in my case Doctrine 2. After defining DB schema via Annotations, I wanted to populate the database. Since my entire data set is relatively small, I wanted to try using programming constructs to see how it works.
I first populated service_type
s, and then service_line_item
s, as there was an existing list from an actual spreadsheet. So things like 'standard/exception' and 'Testing' are all strings on the spreadsheet, and they have to be encoded into proper types before storing them in DB.
I found this SO answer: What do you use instead of ENUM in doctrine2?,
which suggested to not use DB's enum construct, but to use an INT
field and to encode the types using 'const' construct of the programming language.
But as pointed out in the above SO question, I can avoid using integers directly and use language constructs -- constants -- once they are defined....
But still .... no matter how you turn it, if I am starting with string
as a type, I have to first convert it to a proper type, even when using an ORM.
So if say $str = 'Testing';
, I still need to have a block somewhere that does something like:
switch($str):
{
case 'Testing': $type = MyEntity::TESTING; break;
case 'Other': $type = MyEntity::OTHER; break;
}
The good thing is you are not dealing with integers/magic numbers [instead, dealing with encoded constant quantities], but the bad thing is you can't auto-magically pull things in and out of the database without this conversion step, to my knowledge.
And that's what I meant, in part, by saying things like "still have to encode things and deal with integers". (Granted, now, after Ocramius' comment, I won't have to deal directly with integers, but deal with named constants and some conversion to/from constants, as needed).