Is it ever okay to use lists in a relational database?

Question

I've been trying to design a database to go with a project concept and ran into what seems like a hotly debated issue. I've read a few articles and some Stack Overflow answers that state it's never (or almost never) okay to store a list of IDs or the like in a field -- all data should be relational, etc.

The problem I'm running into, though, is that I'm trying to make a task assigner. People will create tasks, assign them to multiple people, and it will save to the database.

Of course, if I save these tasks individually in "Person", I'll have to have dozens of dummy "TaskID" columns and micro-manage them because there can be 0 to 100 tasks assigned to one person, say.

Then again, if I save the tasks in a "Tasks" table, I'll have to have dozens of dummy "PersonID" columns and micro-manage them -- same problem as before.

For a problem like this, is it okay to save a list of IDs taking one form or another or am I just not thinking of another way this is achievable without breaking principles?

I realize this is tagged "relational database" so I'll just leave it as a comment not an answer, but in other types of databases it *does* make sense to store lists. Cassandra comes to mind since it has no joins. — Captain Man, Nov 14 '18 at 15:17
Good job in researching and then asking here! Indeed, the 'recommendation' to never violate the 1st normal form did really well for you, because you really should come up with another, relational approach, namely a "many-to-many" relation, for which there is a standard pattern in relational databases which should be used. — JimmyB, Nov 15 '18 at 13:59
"Is it ever okay" yes.... whatever follows, the answer is yes. As long as you have a valid reason. There's always a use case that compels you to violate best practices because it makes sense to do so. (In your case, though, you definitely shouldn't) — xyious, Nov 15 '18 at 17:26
Totally agree with xyious on this one. The one time I've stored a delimited list of ints in a database table was when I didn't need to do anything with this list inside the database - only to store and retrieve it - and therefor it made sense to store it as a list. But that's a single incident over a professional programming career that spans over two decades, so you can understand it's a rare case. — Zohar Peled, Nov 15 '18 at 19:04
I'm currently using an array (_not_ a delimited string -- a `VARCHAR ARRAY`) to store a list of tags. That's probably not how they'll end up being stored later down the line, but lists can be extremely useful during the prototyping stages, when you have nothing else to point to and don't want to build out the entire database schema before you can do anything else. — Nic, Nov 15 '18 at 20:25
Most dbms now can store Json and XML. If the quantity is small, and you don't need to relate the contents to other tables, there is no harm in throwing it into an XML or Json column. You can even still write queries against it (though they won't be indexable). That said, in your case you *do* need to relate the contents to other tables, so as @GrandmasterB and whatsisname say, tables are the way to go. — Ben, Nov 15 '18 at 20:42
If you do this then at some point, somewhere down the line, you'll regret it — Caius Jard, Nov 15 '18 at 20:51
@Ben "*(though they won't be indexable)*" -- in Postgres, several queries against JSON columns (and probably XML, though I haven't checked) _are_ indexable. — Nic, Nov 15 '18 at 21:27
Anything that requires string manipulation to do actual work is making it harder to fully utilize the database. Some databases can do some string manipulation but then you suddenly have vendor lock-in. — Thorbjørn Ravn Andersen, Nov 17 '18 at 15:08
@ThorbjørnRavnAndersen But to be fair, vendor lock-in is inevitable unless you do everything in your application code. Data types, case sensitivity, indexing options, function availability (such as date manipulation), etc. are all dependent on the database you use. Database portability is incredibly difficult, doubly so if you actually want to use the database's capabilities. — mu is too short, Nov 17 '18 at 21:17
Your title is "Is it ever OK", but in fact the body only asks whether it's OK in this one specific case. I'd say one of those should be edited. I'm also a bit confused by how your searches through articles and SO didn't throw up any trails of breadcrumbs to many-to-many relationships and how to implement them, especially as this is a fundamental of relational databases and normalisation. — underscore_d, Nov 17 '18 at 22:24
@muistooshort Depends on your needs. Of course you need vendor specific configuration to get up and running, but then you do normal SQL queries from there (perhaps using Hibernate to abstract away the dialect). I am talking about having to need to use a vendor specific database capability to get any work done at all. Selecting a string, splitting it in id's and sending it back to look up items must be slower (given proper indexes) than telling the database to look up id's in a detail table as part of the original query. — Thorbjørn Ravn Andersen, Nov 18 '18 at 12:05

score 254 · Accepted Answer · edited Nov 19 '18 at 14:35

254

The key word and key concept you need to investigate is database normalization.

What you would do, is rather than adding info about the assignments to the person or tasks tables, is you add a new table with that assignment info, with relevant relationships.

Example, you have the following tables:

Persons:

+−−−−+−−−−−−−−−−−+
| ID |    Name   |
+====+===========+
|  1 |  Alfred   |
|  2 |  Jebediah |
|  3 |  Jacob    |
|  4 |  Ezekiel  |
+−−−−+−−−−−−−−−−−+

Tasks:

+−−−−+−−−−−−−−−−−−−−−−−−−−+
| ID |        Name        |
+====+====================+
|  1 |  Feed the Chickens |
|  2 |  Plow              |
|  3 |  Milking Cows      |
|  4 |  Raise a barn      |
+−−−−+−−−−−−−−−−−−−−−−−−−−+

You would then create a third table with Assignments. This table would model the relationship between the people and the tasks:

+−−−−+−−−−−−−−−−−+−−−−−−−−−+
| ID |  PersonId |  TaskId |
+====+===========+=========+
|  1 |         1 |       3 |
|  2 |         3 |       2 |
|  3 |         2 |       1 |
|  4 |         1 |       4 |
+−−−−+−−−−−−−−−−−+−−−−−−−−−+

We would then have a Foreign Key constraint, such that the database will enforce that the PersonId and TaskIds have to be valid IDs for those foreign items. For the first row, we can see PersonId is 1, so Alfred, is assigned to TaskId 3, Milking cows.

What you should be able to see here is that you could have as few or as many assignments per task or per person as you want. In this example, Ezekiel isn't assigned any tasks, and Alfred is assigned 2. If you have one task with 100 people, doing SELECT PersonId from Assignments WHERE TaskId=<whatever>; will yield 100 rows, with a variety of different Persons assigned. You can WHERE on the PersonId to find all of the tasks assigned to that person.

If you want to return queries replacing the Ids with the Names and the tasks, then you get to learn how to JOIN tables.

edited Nov 19 '18 at 14:35

T.J. Crowder

981
6
8

answered Nov 14 '18 at 04:47

whatsisname

27,463
14
73
93

89

The keyword you want to search to learn more is "[many-to-many relationship](https://en.wikipedia.org/wiki/Many-to-many_(data_model))" – BlueRaja - Danny Pflughoeft Nov 14 '18 at 09:38
4

And now that this table is there linking Person to Tasks, you will be able to add really easily new concepts : audit : when was it added by who (with a date column and a foreign key to a user table), perhaps a status (is the assignement still valid (and until when ?)), etc... Pure join tables are fairly rare, because they tend to become an entity on their own given enough time. – Thierry Nov 14 '18 at 09:40
35

To elaborate a little on Thierrys comment: You may think that you do not need to normalize *because I only need X and it is very simple to store the ID list*, but for any system that may get extended later you will regret not having normalized it earlier. **Always normalize**; the only question is to what [normal form](https://en.wikipedia.org/wiki/Database_normalization#Normal_forms) – Jan Doggen Nov 14 '18 at 10:18
2

Ah yeah, *normalization* is a word I learned in my database class in university, which never made any sense to me when I learned it, but it’s something I deal with practically every day in my web app development. I should read up on normal forms again, I think it would be useful to have the theory again and not just the practical knowledge. – Chris Cirefice Nov 14 '18 at 10:47
9

Agreed with @Jan - against my better judgement I permitted my team to take a design shortcut a while back, storing JSON instead for something that "won't need to be extended". That lasted like six months FML. Our upgrader then had a nasty fight on its hands to migrate the JSON to the scheme we should have started with. I really should have known better. – Lightness Races in Orbit Nov 14 '18 at 11:57
@JanDoggen - thanks for posting that link: I'd completely forgotten how old I am and seeing evidence that there was only third form when I was learning this stuff has just made me want to retire! :-D – Spratty Nov 14 '18 at 15:40
2

Just a quick point: if we're really talking about a list (i.e. ordered), there would also need to be an extra column in the join table to rank the items. – Bruno Nov 14 '18 at 17:51
6

The ID-column in the Assignment-table looks very artificial, and superfluous. – Deduplicator Nov 14 '18 at 20:00
13

@Deduplicator: it's just a representation of a garden-variety, auto-increment integer primary key column. Pretty typical stuff. – whatsisname Nov 14 '18 at 20:21
9

@whatsisname On the Persons or Tasks table, I'd agree with you. On a bridge table where the sole purpose is to represent the many-to-many relationship between two other tables that already have surrogate keys? I wouldn't add one without a good reason. It's just overhead as it will never be used in queries or relationships. – jpmc26 Nov 14 '18 at 22:00
6

@jpmc26 I would probably use a composite primary key for the association table too. – RubberDuck Nov 14 '18 at 23:09
5

@jpmc26: I think you two are overthinking this. It's a simple example. – whatsisname Nov 15 '18 at 00:02
7

@whatsisname regardless of whether you choose to keep your auto-incremented ID column (there are often good reasons for having such an immutable single integer primary key from an application perspective), you should still preserve a key constraint on Assignment(PersonID, TaksID) through use of UNIQUE. Not doing so just leads to later pain. – beldaz Nov 15 '18 at 00:41
8

@beldaz: I probably would write CREATE TABLE statements instead of typing the tables out with ASCII art too. This is a **simple example** to illustrate a basic concept, not a comprehensive treatise of how to write a commercial database driven application. – whatsisname Nov 15 '18 at 02:46
5

@whatsisname then get rid of your extra column since it makes it less simple. But there's little point normalising a schema if you don't enforce your keys. – beldaz Nov 15 '18 at 05:21
1

@LightnessRacesinOrbit the fact that you used JSON would've made that conversion the perfect opportunity to look at NoSQL databases... any chance someone on your team considered that route instead? – Nelson Nov 15 '18 at 06:02
6

@Nelson It wouldn't have been appropriate to fundamentally restructure our entire database for the sake of a single field. Relational data suggests a relational database. NoSQL is a huge paradigm shift, more so than many people seem to think, and is not a drop-in replacement in any case, certainly not when it's only a response to lazily dumping a JSON array in a column rather than normalising it :P – Lightness Races in Orbit Nov 15 '18 at 10:49
@Nelson In fact the specific choice of JSON was merely practical due to the ready availability of a decoding library; a comma-delimited list of escaped values would probably have been more appropriate (insomuch as putting _any_ flat list in an SQL column is ever appropriate) – Lightness Races in Orbit Nov 15 '18 at 10:50
@Nelson (It was a list of element<->group allocations - we found ourselves needing to transform group names when we relaxed the encoding restrictions for element group names .. and in any case the database couldn't do any correlation/lookup for us because our version of MySQL didn't support JSON natively - overall a poor decision for many reasons! IIRC the original rationale that was [successfully] sold to me was simplifying the SQL and reducing number of queries required) – Lightness Races in Orbit Nov 15 '18 at 10:51
2

Generally better to think of an "assignment" as an object in its own right rather than an awkward necessity created by a shortcoming of the RDBMS. The real-life object it represents is "That time Alfred assigned Jeb to feed the chickens" `TaskPersonAssignment(TaskId, PersonAssignedToId, PersonWhoAssignedItId, DateItWasAssigned, Comment)` = `(1, 2, 1, '2018-11-15', 'Hey Jeb, feed the chickens, will ya?' )` – Ben Nov 15 '18 at 20:48
That's weird, you missed one : Alfred is also harvesting the grain ;) – njzk2 Nov 16 '18 at 19:29
3

@JanDoggen Actually, that's not true. You should normalize when you're optimizing for edits. Warehousing/reporting databases are typically optimized for reads because they're usually append only or are reloaded in bulk, and they often aren't organized according to any of the normal forms. Duplication is often desirable. That's what "OLTP" vs. "OLAP" is about. Some years ago, I made the mistake of normalizing the DB for an application primarily designed for reporting data from other sources and still regret it completely. – jpmc26 Nov 16 '18 at 22:36
@jpmc26 Good point, but the OP's question was not about a read-only DB. – Jan Doggen Nov 17 '18 at 17:01
2

@JanDoggen No, but it's still very important to be aware that normalization is *not* an absolute and that in many situations it's actively detrimental. My concern is that your comment will mislead readers into thinking that's not the case. The fact your comment got so many upvotes suggests quite a few developers are unaware of it. – jpmc26 Nov 17 '18 at 20:08

score 36 · Answer 2 · answered Nov 14 '18 at 04:48

36

You're asking two questions here.

First, you ask if its ok to store lists serialized in a column. Yes, its fine. If your project calls for it. An example might be product ingredients for a catalog page, where you have no desire to try to track each ingredient individually.

Unfortunately your second question describes a scenario where you should opt for a more relational approach. You'll need 3 tables. One for the people, one for the tasks, and one that maintains the list of which task is assigned to which people. That last one would be vertical, one row per person/task combination, with columns for your primary key, task id, and person id.

answered Nov 14 '18 at 04:48

GrandmasterB

37,990
7
78
131

9

The ingredient example you reference is correct on the surface; but it would be plaintext in that case. It is not a list in the programming sense (unless you mean that the string is a list of characters which you obviously don't). OP describing their data as "a list of IDs" (or even just "a list of [..]") implies that they are at some point handling this data as individual objects. – Flater Nov 14 '18 at 11:11
1

@Flater the example is correct, period. Its a list that a developer may not want to store in a relational manner. Suggestion: rather than nitpicking people's answers, add your own answer with better examples. – GrandmasterB Nov 14 '18 at 17:55
10

@Flater: But it is a list. You need to be able to reformat it as (variously) an HTML list, a Markdown list, a JSON list, etc. in order to ensure the items are displayed properly in (variously) a web page, a plain text document, a mobile app... and you can't really do that with plain text. – Kevin Nov 14 '18 at 18:48
13

@Kevin If that is your goal, then it is much more readily and easily achieved by storing the ingredients in a table! Not to mention if, later, people would ... oh, I don't know, say, wish for *recommended substitutes*, or something silly like look for *all recipes without* any peanuts, or gluten, or animal proteins... – Dan Bron Nov 14 '18 at 20:49
10

@DanBron: YAGNI. Right now we're only using a list because it makes the UI logic easier. If we need or will need list-like behavior in the business logic layer, *then* it should be normalized into a separate table. Tables and joins are not necessarily expensive, but they're not free, and they bring in questions about element order ("Do we care about the order of ingredients?") and further normalization ("Are you going to turn '3 eggs' into ('eggs', 3)? What about 'Salt, to taste', is that ('salt', NULL)?"). – Kevin Nov 14 '18 at 20:54
1

@Kevin: If you expect this data to be reshaped in different types of lists they you are in fact, as per my comment, **at some point handling this data as individual objects**. Therefore, you should store the individual object. The data in a given column should never be a composite of multiple individual pieces of information, as this laziness (to create fewer database columns) leads to the additional cost of having to write and depend on a parser to translate the data to the individual components you need. – Flater Nov 15 '18 at 06:42
7

@Kevin: YAGNI is quite wrong here. You yourself argued the necessity of being able to transform the list in many ways (HTML, markdown, JSON) and thus are arguing that **you need the individual elements of the list**. Unless the data storage and "list handling" applications are two applications that are developed independently (and do note that separate application layers != separate applications), the database structure should always be created to store the data in a format that leaves it readily available - while avoiding additional parsing/conversion logic. – Flater Nov 15 '18 at 06:45
2

@Flater: If not YAGNI, then KISS, but actually both apply, as well as a common sense. Relational databases are _horrible_ at keeping arbitrarily ordered lists, so if you have this kind of list, but don't have a very specific need to keep stuff in a separate table (Kevin clearly doesn't), you just shouldn't do that. Also, using a list _is_ storing individual objects. Your argument comes from a religious/ideological reasoning, not practical one. – Frax Nov 16 '18 at 20:01
1

@Frax You're shifting the effort towards parsing/conversion at runtime, instead of normalizing the database at design time. Even ignoring simple good practice, the conversion will be a constant performance hit, to what benefit? Being able to avoid doing something just because you can technically get away with it? I agree with you if there were no beed to transform the data at any point, but Kevin specifically pointed out the need to transform it and thus the separation i to separate elements is well justified and necessary to not have to use additional conversions. – Flater Nov 17 '18 at 06:34
@Flater Man, this is wrong on so many levels. From performance perspective using a list, even JSON-based and not native one, is a _huge win_ in the given case. It's like the worst argument you could make. Making a join (either on DB side or in the app) means _time complexity_ hit, and probably even bigger constant penalty. And in the first place, your code gets more complex, and very brittle: as I said, implementing arbitrarily ordered lists on top of relational DB is a nightmare. – Frax Nov 17 '18 at 13:52
@Flater Even if there was need to access the list content in queries (which is a real issue that may happen), I would likely opt for denormalization and keeping data in both forms, just because of how much better and simpler a list/array is in any other part of this job. – Frax Nov 17 '18 at 14:02
@DanBron Actually, if you want to filter recipes by allergens, you really should keep the list of allergens separately, or have explicit boolean fields for that. Relying on some mapping between ingredients and allergens just isn't safe. I mean, at some point someone will add "medium eggs" instead of "eggs", or what not, and your filter will fail. But that's a good point in general, that any queries over the list content are to be avoided, and should never happen in frequent requests. – Frax Nov 17 '18 at 14:15
@Frax Advocating database denormalization is a patently absurd idea. Also, you're contradicting yourself about the allergens - splitting allergens from non-allergens but not splitting the sublists means that you need to reshuffle the lists when an allergen is added to or removed from the system. You're adding even more complexity and workload because you're refusing to normalize your database. – Flater Nov 17 '18 at 14:29
@Frax How are you going to handle your indexes if you want to search for recipes with a particular allergen? You're going to be stuck doing a contains search on the allergen list. How are you going to store quantities of the ingredients, all concatenated into the same list blob? There are a lot of bad consequences from trying to push it in one column amd then still expecting to run any sort of logic against the individual items on the list. – Flater Nov 17 '18 at 14:35
@Flater No, in this case I just used "list" in a more general sense. My focus was on keeping it a separate thing from the list of ingredients. Yes, alergens should be kept as a separate table, that's what meant. Sorry for imprecise wording. – Frax Nov 17 '18 at 14:49
@Frax Thus completely negating your earlier argument that avoiding joins is better than normalizing your database. Because then you're either going to be stuck having to join half of the data and convert the other half; or duplicate every allergen as both an allergen and an ingredient. Both are considerably worse options compared to normalizing your database. – Flater Nov 17 '18 at 15:15
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/85882/discussion-between-frax-and-flater). – Frax Nov 17 '18 at 15:23

score 22 · Answer 3 · answered Nov 14 '18 at 04:46

22

What you're describing is known as a "many to many" relationship, in your case between Person and Task. It's typically implemented using a third table, sometimes called a "link" or "cross-reference" table. For example:

create table person (
    person_id integer primary key,
    ...
);

create table task (
    task_id integer primary key,
    ...
);

create table person_task_xref (
    person_id integer not null,
    task_id integer not null,
    primary key (person_id, task_id),
    foreign key (person_id) references person (person_id),
    foreign key (task_id) references task (task_id)
);

answered Nov 14 '18 at 04:46

Mike Partridge

6,587
1
25
39

2

You may also want to add an index with `task_id` first, if you might be doing queries filtered by task. – jpmc26 Nov 16 '18 at 22:40
1

Also know as a bridge table. Also, wish I could give you an extra plus for not having an identity column, although I would recommend an index on each column. – jmoreno Nov 18 '18 at 02:06

score 14 · Answer 4 · answered Nov 14 '18 at 12:02

... it's never (or almost never) okay to store a list of IDs or the like in a field

The only time you might store more than one data item in a single field is when that field is only ever used as a single entity and is never considered as being made up of those smaller elements. An example might be an image, stored in a BLOB field. It's made up of lots and lots of smaller elements (bytes) but these that mean nothing to the database and can only be used all together (and look pretty to an End User).

Since a "list" is, by definition, made up of smaller elements (items), this isn't the case here and you should normalise the data.

... if I save these tasks individually in "Person", I'll have to have dozens of dummy "TaskID" columns ...

No. You'll have a few rows in an Intersection Table (a.k.a. Weak Entity) between Person and Task. Databases are really good at working with lots of rows; they're actually pretty rubbish at working with lots of [repeated] columns.

Nice clear example given by whatsisname.

When creating real life systems "never say never" is a very good rule to live by. — l0b0, Nov 14 '18 at 21:31
In many cases, the per-element cost of maintaining or retrieving a list in normalized form may vastly exceed the cost of keeping the items as a blob, since each item of the list would have to hold the identity of the master item with which it is associated and its location within the list in addition to the actual data. Even in cases where code might benefit from being able to update some list elements without updating the entire list, it might be cheaper to store everything as a blob and rewrite everything whenever one has to rewrite anything. — supercat, Nov 16 '18 at 21:29

miroxlav · Answer 5 · 2018-11-15T07:52:00.713

It may be legitimate in certain pre-calculated fields.

If some of your queries are expensive and you decide to go with pre-calculated fields updated automatically using database triggers, then it may be legitimate to keep the lists inside a column.

For example, in the UI you want to show this list using grid view, where each row can open full details (with complete lists) after double-clicking:

REGISTERED USER LIST
+------------------+----------------------------------------------------+
|Name              |Top 3 most visited tags                             |
+==================+====================================================+
|Peter             |Design, Fitness, Gifts                              |
+------------------+----------------------------------------------------+
|Lucy              |Fashion, Gifts, Lifestyle                           |
+------------------+----------------------------------------------------+

You are keeping the second column updated by trigger when client visits new article or by scheduled task.

You can make such a field available even for searching (as normal text).

For such cases, keeping lists is legitimate. You just need to consider case of possibly exceeding maximum field length.

Also, if you are using Microsoft Access, offered multivalued fields are another special use case. They handle your lists in a field automatically.

But you can always fall back to standard normalized form shown in other answers.

Summary: Normal forms of database are theoretical model required for understanding important aspects of data modeling. But of course normalization does not take into account performance or other cost of retrieving the data. It is out of scope of that theoretical model. But storing lists or other pre-calculated (and controlled) duplicates is often required by practical implementation.

In the light of the above, in practical implementation, would we prefer query relying on perfect normal form and running 20 seconds or equivalent query relying on pre-calculated values which takes 0.08 s? No one likes their software product to be accused of slowness.

It can be legitimate even without precalculated stuff. I've done it a couple of times where the data is stored properly but for performance reasons it's useful to stuff a few cached results in the main records. — Loren Pechtel, Nov 15 '18 at 04:10
@LorenPechtel – Yes, thanks, in my use of term *pre-calculated* I also include cases of cached values stored where needed. In systems with complex dependencies, they are the way to keep the performance normal. And if programmed with adequate know-how, these values are reliable and always-in-sync. I just did not want to add case of caching into the answer to keep the answer simple and on safe side. It got downvoted anyway. :) — miroxlav, Nov 15 '18 at 09:52
@LorenPechtel Actually, that would still be a bad reason... cache data should be kept in an intermediate cache store, and while the cache is still valid, that query should never hit the main DB. — Tezra, Nov 16 '18 at 20:18
@Tezra No, I'm saying that sometimes a piece of data from a secondary table is needed often enough to make it make sense to put a copy in the main record. (Example that I have done--the employee table includes the last time in and the last time out. They are used only for display purposes, any actual calculation comes from the table with the clock-in/clock-out records.) — Loren Pechtel, Nov 17 '18 at 02:21

score 1 · Answer 6 · answered Nov 18 '18 at 06:27

If it is "not ok" then it is fairly bad that every Wordpress site ever has a list in wp_usermeta with wp_capabilities in one row, dismissed_wp_pointers list in one row, and others...

In fact in cases like this it might be better for speed as you will almost always want the list. But Wordpress is not known to be the perfect example of best practices.

score 0 · Answer 7 · answered Nov 14 '18 at 19:26

Given two tables; we'll call them Person and Task, each with it's own ID (PersonID, TaskID)... the basic idea is to create a third table to bind them together. We'll call this table PersonToTask. At the minimum it should have it's own ID, as well as the two others So when it comes to assigning someone to a task; you will no longer need to UPDATE the Person table, you just need to INSERT a new line into the PersonToTaskTable. And maintenance becomes easier- need to delete a task just becomes a DELETE based on TaskID, no more updating the Person table and it's associated parsing

CREATE TABLE dbo.PersonToTask (
    pttID INT IDENTITY(1,1) NOT NULL,
    PersonID INT NULL,
    TaskID   INT NULL
)

CREATE PROCEDURE dbo.Task_Assigned (@PersonID INT, @TaskID INT)
AS
BEGIN
    INSERT PersonToTask (PersonID, TaskID)
    VALUES (@PersonID, @TaskID)
END

CREATE PROCEDURE dbo.Task_Deleted (@TaskID INT)
AS
BEGIN
    DELETE PersonToTask  WHERE TaskID = @TaskID
    DELETE Task          WHERE TaskID = @TaskID
END

How about a simple report or who's all assigned to a task?

CREATE PROCEDURE dbo.Task_CurrentAssigned (@TaskID INT)
AS
BEGIN
    SELECT PersonName
    FROM   dbo.Person
    WHERE  PersonID IN (SELECT PersonID FROM dbo.PersonToTask WHERE TaskID = @TaskID)
END

You of course could do a lot more; a TimeReport could be done if you added DateTime fields for TaskAssigned and TaskCompleted. It's all up to you

score 0 · Answer 8 · answered Nov 14 '18 at 19:46

It may work if say you have human readable Primary keys and want a list of task #'s without having to deal with vertical nature of a table structure. i.e. much easier to read first table.

------------------------  
Employee Name | Task 
Jack          |  1,2,5
Jill          |  4,6,7
------------------------

------------------------  
Employee Name | Task 
Jack          |  1
Jack          |  2
Jack          |  5
Jill          |  4
Jill          |  6
Jill          |  7
------------------------

The question would then be: should the task list be stored or generated on demand, which largely would depend on requirements such as: how often the list are needed, how accurate how many data rows exist, how the data will be used, etc... after which analyzing the trade offs to user experience and meeting requirements should be done.

For example comparing the time it would take to recall the 2 rows vs running a query that would generate the 2 rows. If it takes long and the user does not need the most up to date list(*expecting less than 1 change per day) then it could be stored.

Or if the user needs a historical record of tasks assigned to them it would also make sense if the list was stored. So it really depends on what you are doing, never say never.

As you say, it all depends on how the data is to be retrieved. If you /only/ ever query this table by User Name, then the "list" field is perfectly adequate. However, how can you query such a table to find out who is working on Task #1234567 and still keep it performant? Just about every kind of "find-X-anywhere-in-the-field" String function will cause such a query to /Table Scan/, slowing things to a crawl. With properly normalised, properly indexed data, that just doesn't happen. — Phill W., Nov 15 '18 at 15:52

score 0 · Answer 9 · answered Nov 14 '18 at 23:06

You're taking what should be another table, turning it through 90 degrees and shoehorning it into another table.

It's like having an order table where you have itemProdcode1, itemQuantity1, itemPrice1 ... itemProdcode37, itemQuantity37, itemPrice37. Apart from being awkward to handle programmatically you can guarantee that tomorrow someone will want to order 38 things.

I'd only do it your way if the 'list' isn't really a list, i.e. where it stands as a whole and each individual line item doesn't refer to some clear and independent entity. In that case just stuff it all in some data type that's big enough.

So an order is a list, a Bill Of Materials is a list (or a list of lists, which would be even more of a nightmare to implement "sideways"). But a note/comment and a poem aren't.

Is it ever okay to use lists in a relational database?

9 Answers9

It may be legitimate in certain pre-calculated fields.