3

The situation

Our system has a MySQL database with a product table whose parameters can't be modified.

In order to add optional extra parameters, a developer designed a key-value pair table which looks a bit like this:

+----+--------------------+----------+------------+
| Id |        key         |  value   | product_id |
+----+--------------------+----------+------------+
|  1 | provider_id        | 3        |          2 |
|  2 | appliable_discount | 20%      |          2 |
|  3 | pay_mode           | pre-pay  |          2 |
|  4 | pay_mode           | post-pay |          3 |
|  5 | appliable_discount | 15%      |          3 |
|  6 | provider_id        | 4        |          4 |
|  7 | provider_id        | 3        |          5 |
|  8 | expires            | 1        |          2 |
|  9 | expires            | 0        |          4 |
| 10 | color              | red      |          2 |
| 11 | color              | blue     |          4 |
| 12 | foo                | 1        |          2 |
+----+--------------------+----------+------------+
  • id: auto-incremental id.
  • key: the name of the key (string).
  • value: the value for that key (string).
  • product_id: Implicit Foreign Key to the Product.

[key, product_id] pairs are unique. There can't be 2 different values for the same key for the same product.

Changes on the value columns are rare. Once set, they usually stay the same.

Problem

In one part of our application there is a query to extract all products and a subset of the extra parameters that look like this:

SELECT p.*, sp.value AS hours, cp.value AS appliable_discount, cp2.value AS pay_mode, cp3.value AS provider_id, cp4.value AS expires, cp5.value AS foo, cp6.value AS bar, cp7.value AS etc
FROM products AS p
  LEFT JOIN product_extra_parameters AS cp ON cp.product_id = p.id AND cp.key = 'appliable_discount'
  LEFT JOIN product_extra_parameters AS cp2 ON cp2.product_id = p.id AND cp2.key = 'pay_mode'
  LEFT JOIN product_extra_parameters AS cp3 ON cp3.product_id = p.id AND cp3.key = 'provider_id'
  LEFT JOIN product_extra_parameters AS cp4 ON cp4.product_id = p.id AND cp4.key = 'expires'
  LEFT JOIN product_extra_parameters AS cp5 ON cp5.product_id = p.id AND cp5.key = 'foo'
  LEFT JOIN product_extra_parameters AS cp6 ON cp6.product_id = p.id AND cp6.key = 'bar'
  LEFT JOIN product_extra_parameters AS cp6 ON cp7.product_id = p.id AND cp7.key = 'etc'
WHERE p.id > 1

As you may imagine, performance of this query is really low and we are trying to improve it. We suspect the abuse of left join is the main issue.

Note: To the effect of this question, ignore the fact that it's fetching all products.

What I am looking for

Short term: If it exists, a new strategy to query this table, to get the same information with better performance.

Long term: A better strategy to store this information in a relational database, or to cache that information for better performance.

Since 24 hours old information is acceptable, a solution on the lines of "a cron that updates a table with key: [product_id], name: [json with all extra_parameters]" would be acceptable.

An answer doesn't have to provide both a short term and a long term solution to be accepted.


EDIT: For the long term solution, I've found some valuable insight in this stackoverflow question: https://stackoverflow.com/questions/2909448/best-pattern-for-storing-product-attributes-in-sql-server

xDaizu
  • 247
  • 2
  • 8
  • What datsbase platform are you using? Have you tried a pivot query? – Sir Swears-a-lot Jun 26 '18 at 10:00
  • @Peter It's over MySQL, I'll add it to the question. Also, no, I have not tried a pivot query. Could you add that as an answer? – xDaizu Jun 26 '18 at 11:42
  • 2
    Why are you joining on the same table 7 times!? The only difference in the conditions for the JOINs is the key ... – Greg Burghardt Jun 26 '18 at 11:54
  • This is probably a better fit for StackOverflow. – Greg Burghardt Jun 26 '18 at 11:56
  • 1
    @GregBurghardt **I'm** not joining 7 times. Someone else did! It's an abomination I've found in the codebase. I have a couple ideas of how to do it better, but since databases is not my area of expertise I came to ask what would be the ideal best way of doing that query, in order to learn, and for other to learn as well. – xDaizu Jun 26 '18 at 13:54
  • @GregBurghardt Would it? I considered both SO and CodeReview.SE and I decided on this site at the end. If the community agrees that it's a best fit for SO, I don't mind moving it. – xDaizu Jun 26 '18 at 13:56
  • Basically, the reason for the joining is to be able to gather all the data for a given product in one line. If you don't mind dropping that and having some extra stuff to do on the backend, you could have one JOIN only by chaining the key with multiple OR condition (or IN('...','...')). – Walfrat Oct 25 '18 at 06:53

4 Answers4

1

How about just getting all parameters and do any filtering logic on the application side. It would cut joins to a single one in the given query, but you'll have to handle the name-value assignment in the application instead of the query (this is an upside actually, it's faster and easier).

The attribute table idea is pretty worn out (but that doesn't mean every generation won't reinvent it), but implementing logic based on those in the database queries is just a really really bad idea.

Kayaman
  • 1,930
  • 12
  • 16
  • `The attribute table idea is pretty worn out` I know. If I had a nickel for every time I've seen this used... it starts looking simple enough, but it doesn't scale well, because it ends up being about MxN, where N es the number of products and M is the number of differents keys devs can (and will) come up with. – xDaizu Jun 26 '18 at 14:00
  • It's worth considering that others may need to be able read/report/query those values directly from the database in the future. And so it may be worth thinking beyond the immediate problem, and who or how this might be used in the future. – Sir Swears-a-lot Jun 27 '18 at 03:17
  • 3
    If the filtering is slow in a database server, it's going to be even slower in application code... Databases are highly optimised for this sort of work. – Maybe_Factor Jul 27 '18 at 04:37
  • @Maybe_Factor databases work in the relational model whereas application code doesn't. There are many other reasons besides performance where moving logic into application code can be useful. It can be simpler, easier to change when needed (imagine rewriting all the relevant queries because you need another join) etc. This particular use case is very poorly suited for databases, which is why it comes up so often. – Kayaman Aug 01 '18 at 05:34
1

This is a relatively a common problem with relational db's. It gets annoying adding columns for each parameter/value.

One solution is to use a Pivot Query. Unfortunatley pivot isn't possible in MySQL. But you can use a case statement.

Another option is to hold the key value pairs in XML or JSON. Most db's can index and query these. reasonably efficiently. However other users may find this dificult to work with if they need to query/report using these values.

0

As you may imagine, performance of this query is really low and we are trying to improve it. We suspect the abuse of left join is the main issue.

What indexing do you have on the "product_extra_parameters" table?
The proliferation of joins, left or otherwise, should not be a major issue unless this table is significant in size.

Phill W.
  • 11,891
  • 4
  • 21
  • 36
  • Since the table `products` is significant in size (N in the thousands), the `product_extra_parameters` is even more significant: close to MxN, where M are all the possible values the column "key" may take. It currently has indexes over the columns "key" and "product_id" – xDaizu Jun 26 '18 at 15:21
0

You could horizontally partition your product_extra_parameters table. i.e. product id 1 in one table, product id 2 in another (obviously some sort of rule is needed, not just arbitrary). This can be handled by either the DBMS or even triggers on the relevant core table.

A quick google search found the following regarding horizontal partitioning on MySQL: https://dev.mysql.com/doc/refman/8.0/en/partitioning-overview.html

Maybe_Factor
  • 1,381
  • 11
  • 12