46

I've been browsing SQL dumps of some famous CMSes, including Drupal 7, Wordpress (some quite very old version), and some custom application based on Python.

All of these dumps contained data with string flags instead of integer ones. For example, a post's status was represented as published, closed, or inherit rather than 1, 2, or 3.

I have quite limited experience in design of databases and I have never went past simple SQLs, but I was always taught that I should use numeric/integer flags for data like this. It's obvious that tinyint consumes much less space in a database than, for example, varchar(9).

So what am I missing? Isn't this a waste of data storage and a data redundancy? Wouldn't browsing, searching and indexing be a bit faster if these columns used integers instead of strings?

gnat
  • 21,442
  • 29
  • 112
  • 288
trejder
  • 2,386
  • 3
  • 19
  • 39
  • 7
    Are you sure they don't actually use https://dev.mysql.com/doc/refman/5.0/en/enum.html which will look like a string in dump. Either way I think these days it almost counts as a micro optimisation. – Esben Skov Pedersen May 21 '15 at 09:25
  • 2
    Somewhat related: [Best way to store enum values in database - String or Int](http://stackoverflow.com/q/1612319/3235496) – manlio May 21 '15 at 09:28
  • 2
    This question is fundamentally an appeal to authority. – DeadMG May 21 '15 at 19:52
  • 3
    Not a full, answer, but...you know the scripting language Lua? Renowned for being direct and high-performance, used to write entire game engines, etc? Surprisingly enough...they never bothered having a number type at all. Their string handling code is so effective, they can add numbers together that are actually strings, in time-sensitive game engine code. Like JavaScript, they don't even have objects - just very fancy hash tables. The C programmer's view of "a huge array of `char`s? How inefficient!" is outdated compared to 2015. – Katana314 May 21 '15 at 20:30
  • 2
    Edited to remove the "appeal to authority" part and reopen-voted, since the question about using strings rather than ints is perfectly on-topic as long as it's not specifically about those "authorities". – Ixrec May 23 '15 at 10:36
  • Why would you want to? – user16217248 Mar 24 '23 at 05:05
  • @user16217248 Eee... Because others does like that? Did you read the question? Did you understand it? "_I've been browsing **SQL dumps of some famous CMSes**, including Drupal 7, Wordpress (...) All of these dumps **contained data with string flags instead of integer ones**. For example, a post's status was represented as `published`, `closed`, or `inherit` rather than `1`, `2`, or `3`._". – trejder Mar 24 '23 at 22:37

2 Answers2

66

Yes, storing strings instead of numbers can use more space. The reason that high-profile pltforms are doing it anyway is that they think the benefits of that solution are greater than the cost.

What are the benefits? You can easily read a database dump and understand what it's about without memorizing the enum tables, and even semi-official GUIs might simply use the values themeselves rather than transform the record they get. (This is a basic form of disk space/processing time tradeoff.)

What about the cost? Data storage capacity hasn't been the bottleneck in CMS for a long time, since disks have gotten so large and so cheap. Programmer time, on the other hand, usually becomes more expensive - so anything that trades development effort for disk space is also a good thing, from a business perspective.

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
10

Yes, storing things such as yes or true will take more space than a tinyint. This should not be surprising. It also makes indexing and thus joins less efficient for the database. It also has the penalty of possible confusion for what is the correct value (yes vs y).

However, there are many approaches that look similar to storing strings in the database (in particular MySQL) that are efficient.

First, MySQL has an enum type (docs) which can look very much like a Boolean or restricted set of strings when set up that way. It also enforces only valid values are entered. This is often much more useful than storing 1, 2 or 3 as a value as the meaning is conveyed with the information. The enum comes with the penalty that a schema change is required to add or remove types.

This brings us to a child table and foreign keys (applicable to all databases). Yes, you are storing some value as a key (back to the 1, 2 or 3) and the value of published, closed, and inherit are stored in another table. Using a view (docs) it is then possible to make it look like the table contains the string rather than the key. This has the advantage that no schema change is required to add or remove entries from the child table.

Exactly how things are stored would require one to look at the actual DDL of the schema to determine what method is used and get some hint of which trade offs they have selected.