- complex means that there are many parts;
- complicated means that it is difficult to understand;
Complex tables but simple design
Complexity is not bad. A table with many columns may be perfectly legitimate and simply represent items / facts / events with many attributes:
- The number of columns just show the wealth of data collected on a complex reality;
- The table could nevertheless stay simple to understand: just focus on the columns of interest, select the relevant rows, aggregate them and/or sort them.
Complex table and complicated use
When complexity starts to be an issue, it may just be because the number of parts exceeds our capacity. In this case, the most suitable strategy is “Divide et impera” as the Roman emperor Julius Caesar used to say:
- If many queries appear similar, just create views. Views store a base query and allows you to use it as if they were smaller tables. This makes repetitive querying much simpler. The real storage of data remains the unchanged big table.
- If groups of column are used very differently, the table could be divide into several lighter tables with less columns. The only constraint is that you find some redundant data stored in both tables that allows to reassemble the rows (with a join). This kind of splitting is delicate and you'll also need to adapt the data feeding into your database.
Too complicated to use: bad design?
Often, though, the large number of columns appears to reveal a bad design that was made unnecessarily complicated. Generally this situation grows historically: initially the design and the table is simple. Columns are added over and over again without too much thought. And one day the table's a monster.
The two main issues are:
Repetitive columns: this happens when data is squeezed into a single table instead of being better structured as separate but related tables. Data is simple to feed. Some queries are very simple. But many queries become overly complex due to a long enumeration of columns. And advanced queries become difficult to automate
Merging unrelated data: a table is meant for storing related data. If a part of the column is independent of another part, you have mixed appels and oranges and some data is repeated unnecessarily.
In both cases, the cure is simple: put together what belongs together, but keep separate independent things. In database jargon, this exercise is called normalisation.
How to do it?
As explained, just identify the independent data to put it in separate tables:
- for repetitive columns, the trick is to find a missing parameter explaining the repetition, and extract the relevant data into a separate table with where the repetitive columns become lines (e.g. 12 lines instead with a month number, instead of 12 columns that corresepond each to a specific month).
- for independent data, the trick is to map the columns to independent things (e.g. product and product name belong together, product and monthly result also become together, but product name is not directly related to monthly results, so keep it in a separate table).
Changing a database scheme is not easy. ANd you need to completely overhall the way the data is captured in the database. Carrying out the change can be done in steps according to the following genral approach (to be tailored for your needs):
- Create views for the independent things.
- Create a join on all these views to verify that you are able to find back the same data as in the big table.
- Create some more advanced views for all the views contain repetitive columns to show the data as lines (hint: requires often some SQL UNION).
- Create some queries to verify that you can assemble aggregates on these advanced views (e.g. total per product) with the rest of the data (e.g. product table), and check if you can find the same aggregates from the big table.
- If all these verifications are successful, you should set your DB offline, and copy the data of each view into its own table, for example using
CREATE TABLE AS SELECT
.
- Rename your big table, and once you are sure that everything is fine, and no query non longer rely on it, drop it.