Best Practice regarding table schema

Question

We are discussing in team regarding best practices.

For eg. there are three tables

Users
Groups
Teams

A user will create a request to join a Team or Group or to follow a Private User.

A request actions will be:

Create
Approve
Reject
Cancel

This process will be almost same for all kind of requests.

The question is: Is it a good approach to have all kind of requests in one table in database?

i.e.

+----+----------+--------+-----------------+---------+----------+----------+-----------+----------+-----------+-------------+
| id | uid_from | uid_to |      type       | created | accepted | rejected | cancelled |  status  | parent_id | parent_type |
+----+----------+--------+-----------------+---------+----------+----------+-----------+----------+-----------+-------------+
|  1 |        1 |      2 | GROUP_MEMBER    |         |          |          |           | ACTIVE   | group_id  | GOAL        |
|  2 |        1 |      2 | GROUP_MODERATOR |         |          |          |           | REJECTED | group_id  | GOAL        |
|  3 |        1 |      3 | PROFILE_FOLLOW  |         |          |          |           | ACTIVE   | -         | USER        |
|  4 |        1 |      2 | TEAM_MEMBER     |         |          |          |           | APPROVED | team_id   | TEAM        |
|  5 |        1 |      4 | GROUP_INVITE    |         |          |          |           | ACTIVE   | group_id  | GOAL        |
+----+----------+--------+-----------------+---------+----------+----------+-----------+----------+-----------+-------------+

Or User, Team, Group all should have separate tables for request handling?

Are you sure that the process for joining a Team or a Group or to follow a User will always be the same ? — Walfrat, Nov 17 '16 at 13:20
This question is not mysql-dependant. It applies to any RDBMS. So I'll delete the mysql tag. — Tulains Córdova, Nov 17 '16 at 15:39

score 6 · Answer 1 · edited Apr 13 '17 at 12:45

6

One of the drawbacks of having a single table is that you can't have foreign keys. In other words, you don't benefit from referential integrity.

This means that the IDs you specify there will eventually get out of sync: you may end up with entities referring to users, groups or teams which don't exist any longer, and in a case of bugs in the code, with entities referencing users, groups or teams which didn't exist in the first place. Needless to say that if you reuse the same IDs over time (something you should practically never do, by the way), you'll have a complete mess.

An elegant solution suggested by Walfrat is to use a parent table. In other words, you'll have a common table containing a single column Id set as an auto-incremented primary key, and the tables User, Team and Group will have an Id column which is at the same time a primary key and a foreign key pointing to the common table. Given the uniqueness of IDs in the common table, an ID used for a user won't be reused for a team or a group (however, at the schema level, nothing prevents the rogue Ids where the corresponding user, team or group were removed.)

Then, the table Action you described in your question will point to the common table. From there, you have a choice:

Either you specify the type of the referenced entity right in the Action table: it gives you good performance with fewer joins, but the data integrity is not guaranteed, since the type you specify in the Action table is not checked/enforced.
Or you do it at the level of the common table, which still has the same problem as the previous solution.
Or you do a additional selects to determine the type of the entity, which ensures you always get the right data, but would negatively impact the performance and make your code overly-complex.

edited Apr 13 '17 at 12:45

Community

1

answered Nov 17 '16 at 13:18

Arseni Mourzenko

134,780
31
343
513

Unless the tables User, Team, Group have a common "parent" table which their PK come from, inheritancy in database basically. But yes that's definitively a good point. – Walfrat Nov 17 '16 at 13:20
@Walfrat no the parent tables are different – Faizan Hasan Nov 17 '16 at 13:23
@Walfrat: this isn't something supported by commonly used RDMS such as Microsoft SQL Server or PostgreSQL, is it? Also, how would one know that the FK points to the user 123, and not a team or a group with the same ID? – Arseni Mourzenko Nov 17 '16 at 13:25
@ArseniMourzenko this is exactly how inheritancy works with ORM. Child tables have a primary key which is also a foreign key which point to the parent table. Which mean when you need to create a new object, you need first to create the entry in the parent table, keep the id given (either from last_insert_id() or using a sequence and keep the value) and put it in the child table. This mean ther eis no duplicate id between child tables. – Walfrat Nov 17 '16 at 13:28
@Walfrat: oh, I see now what you were talking about. – Arseni Mourzenko Nov 17 '16 at 13:29
THe only problem with that approach, is that you get more complicated query if you want to query from the parent. For instance Hibernate using case switch and check which id from the join is not null to affect an identifier to identify the subclass. But if you only query from child table and only use the parent table only for reference, you're fine. – Walfrat Nov 17 '16 at 13:39
Your answer is more about validation of data *(which is the part of code)* rather than the database design. Why IDs will be out of sync? Only if developer hasn't handled every thing properly. But combining will be a better approach I guess because data fetching will much more easy (off course proper indexing required) and in case of a new entity you won't need a brand new table. – Shaharyar Nov 17 '16 at 13:40
@Walfrat: would you mind writing your own answer? Right now, I edited mine based on your comments, but I feel like stealing your idea. – Arseni Mourzenko Nov 17 '16 at 15:31
I'll try, I will focus on the application side as an addendum to the database side. This solution can only be optimal if the design of the application that use it match to it, which is not simple as it seems. OP talk about best practices, but the best practices applied on a database only can be suboptimal if the application layer and the frameworks involved (if there is one) can't handle it properly. – Walfrat Nov 17 '16 at 15:39
@Shaharyar: so your approach is to ignore referential integrity. This might be a valid choice in some cases, but it's usually not the case. Types, foreign keys, unique constraints, other constraints stored procedures with validation phase, etc. are here for a reason, because in most cases, we don't want random apps to write random data to the database. So no, validation of data is not exclusively done in code. – Arseni Mourzenko Nov 17 '16 at 15:44
Furthermore it's only declarative SQL line in the database instead of checking manually in each application that use the said database. – Walfrat Nov 17 '16 at 17:54

score 0 · Answer 2 · answered Nov 17 '16 at 13:02

0

It looks like all your requests basically have the same attributes and generally that's a good reason to keep them all in the same table.

Good indexing can mitigate the effects of the table becoming overly large, but since there are only three types of request the size of the combined table wouldn't be that much more than the size of the split tables.

answered Nov 17 '16 at 13:02

mendosi

101
2

Should we separate tables for request and invite or combined ones are fine? – Faizan Hasan Nov 17 '16 at 13:14
Hard to say, what is an invite? Maybe modify your question with additional info. – mendosi Nov 17 '16 at 13:16
Invite is similar to request, I forgot to add it. Anyone can invite a user to be a part of `Group` or `Team`. – Faizan Hasan Nov 17 '16 at 13:23
@FaizanHasan invites are going to be a far more ephemeral kind of data than group membership IMO. There might be benefits to keeping them in the same table, but it's likely worth separating them. YMMV. – RubberDuck Nov 18 '16 at 11:24

Walfrat · Answer 3 · 2016-11-17T22:42:15.690

To answer to your question first before developping : there is no best practices, just what fit the best for your database and the application that will use it. I will present here some solutions more elaborated with their pros and cons.

As I suggested in the comment you can handle foreign key to multiple table to make them stand in one table which the other will depend.

I will analyze how does it works.

This would mean that table will be a parent table of User, Team, Group.

This give the following SQL (Postgres valid) :

CREATE SEQUENCE parent_id_seq;
CREATE TABLE parent(
    id integer primary key default nextval('parent_id_seq')
);


CREATE TABLE "user"(
    id integer primary key NOT NULL,
    firstName character varying NOT NULL
);


CREATE TABLE "group"(
   id integer primary key NOT NULL, 
   "name" character varying NOT NULL
);

CREATE TABLE team(
   id integer primary key NOT NULL, 
   "name" character varying NOT NULL
);
// Relation to the parent table here
ALTER TABLE "user" ADD CONSTRAINT FK_USER FOREIGN KEY(id) REFERENCES parent(id);
ALTER TABLE "group" ADD CONSTRAINT FK_USER FOREIGN KEY(id) REFERENCES parent(id);
ALTER TABLE team ADD CONSTRAINT FK_USER FOREIGN KEY(id) REFERENCES parent(id);
CREATE SEQUENCE request_id_seq;
CREATE TABLE request(
  id integer primary key default nextval('request_id_seq'),  
  uid_from integer not null,  
  uid_to integer not null, 
  "type" character varying, 
  status character varying,
  request_type character varying  
);
ALTER TABLE request ADD CONSTRAINT FK_REQUEST_FROM FOREIGN KEY(uid_from) REFERENCES "user"(id);
ALTER TABLE request ADD CONSTRAINT FK_REQUEST_TO FOREIGN KEY(uid_to) REFERENCES parent(id);

I removed created/approved/rejected/cancelled, since you have a status column i don't any point of having those. The status should be enough. If you're a bit extrem about validation, you can use a trigger to check that the state follow the steps for the process.

I changed the type for request_type since i will use the type column as a discriminator to differentiate the type of request. Note that if you're still want to be extrem about validation, you can add check constraint to ensure that the request_type match to the type.

Pure Database

I will focus for this part as we want everything to stand as SQL statements.

As you can see, our child tables depends on our parent, but what makes impossible to have two child with the same id, well nothing, unless you check it with a trigger. Since it's fairly simple I let you figures out the SQL for that. Note that for table like "user" such trigger aren't heavy.

To insert you need first to use the parent table and store the id given by the sequence to reuse it in the user... but how do we do that in raw SQL ?

Well you can't not in raw SQL without any application stuff. You need to use some SQL specific to the RDBMS like this :

CREATE OR REPLACE FUNCTION user_add(varchar) RETURNS void AS
'DECLARE
v_nextSeq int ;
BEGIN
select nextval(''parent_id_seq'') INTO v_nextSeq ;

INSERT INTO parent(id) VALUES(v_nextSeq);
INSERT INTO "user" (id, firstname) VALUES(v_nextSeq, $1);
END;'
LANGUAGE plpgsql;

Which means you need to be sure that your database user will have access to those functions too.

Now : how do we query our super request table ?

Let's suppose our applications have no general views off the requests. We have one view for groups requests and so on.

This mean we can use pretty simple SQL to handle this :

select * from request r inner join "user" u on u.id=r.uid_from
inner join parent p on p.id = u.id inner join "group" g on g.id=uid_to
inner join parent p2 on p2.id = g.id
where r."type"='GROUP';

Note : the join on parent is not necessary here as it only hold the id, it would if hold some shared fields.

But now what if we need a general view with some informations from the tables users/group/team :

We add the column in the parent table if we consider it common. Since it's easy I pass on that point.
We use this kind of query :

select * from request r left join "user" u on u.id=r.uid_from inner join parent p on p.id = r.uid_to left join "user" u2 on u2.id=p.id left join "group" g on g.id=p.id left join "team" t on t.id=p.id;

But wait, what are thos horrible empty column ? can't we do better than this ? Of course we can but it's getting trickier :

   select r.id, r.type, r.status, r.request_type, 
      CASE 
         WHEN r.type='GROUP' then g.name
         WHEN r.type='USER' then u.firstName
         WHEN r.type='TEAM' then t.name 
       END as target_identifier 
   from request r 
   left join "user" u on u.id=r.uid_from
   inner join parent p on p.id = r.uid_to
   left join "user" u2 on u2.id=p.id
   left join "group" g on g.id=p.id
   left join "team" t on t.id=p.id;

Note : this can seems to be a lot of join, the join between parent and subtables are oneToOne, so they're still very fast you don't need to consider them.

Of course you can just request each table individually to get each relation, this make more query for the database to handle, but the query are simpler to write.

ORM Based solution

Writing SQL queries can be really a pain, but without a little effort of doing things well. But we don't write query for the fun of manipulating data, we write them to be used by applications. So let's see if we can take advantages of application framework that handle database for us aka ... ORM!!

I will present here a solution with Java/JPA, but I'm pretty sure this can be used with other languages/ORM.

JPA

First let initialize our model (I removed all getter/setter but don't forget them if you reproduce) :

@Inheritance(strategy=InheritanceType.JOINED)
@Entity(name="Parent")
@Table(name="parent")
public abstract class Parent {

    @Id
    @GeneratedValue(generator="parent_seq_id")
    @SequenceGenerator(name="parent_seq_id",sequenceName="parent_seq_id", allocationSize=1)
    @Column(name="id")
    private Integer id;
}

@Entity(name="User")
@Table(name="\"user\"")
public class User extends Parent{

    @Column(name="firstname")
    private String firstName;

} 
.. same for team and group

@Entity(name="Request")
@Table(name="Request")
@Inheritance(strategy=InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name="type", discriminatorType=DiscriminatorType.STRING)
public abstract class Request {
    @Id
    @GeneratedValue(generator="parent_seq_id")
    @SequenceGenerator(name="parent_seq_id",sequenceName="parent_seq_id", allocationSize=1)
    @Column(name="id")
    private Integer id;
    @Column(name="status")
    @Enumerated(EnumType.STRING)
    private StatusEnum status;
    @Column(name="request_type")
    private String requestType;
    @JoinColumn(name="uid_from")
    @ManyToOne
    private User from;
    @JoinColumn(name="uid_to")
    @ManyToOne
    private Parent to;
}

@Entity(name="TeamRequest")
@DiscriminatorValue("TEAM")
public class TeamRequest extends Request{
}
... same for UserRequest and Group Request just change Discriminator value...

As you can see I choose specific type of request, of course this isn't required, it works perfectly fine with a single type of request, you just need to filter yourself things after. This is why I will analyze what if I want a specific type for each request

(And yes I was lazy with the enums)

Let's query all our request :

entityManager.createQuery("Select r from Request r", Request.class).getResultList()

SQL generated
    select
        parent0_.id as id1_1_0_,
        parent0_1_.name as name1_0_0_,
        parent0_2_.firstname as firstnam1_4_0_,
        parent0_3_.name as name1_3_0_,
        case 
            when parent0_1_.id is not null then 1 
            when parent0_2_.id is not null then 2 
            when parent0_3_.id is not null then 3 
            when parent0_.id is not null then 0 
        end as clazz_0_ 
    from
        parent parent0_ 
    left outer join
        "group" parent0_1_ 
            on parent0_.id=parent0_1_.id 
    left outer join
        "user" parent0_2_ 
            on parent0_.id=parent0_2_.id 
    left outer join
        team parent0_3_ 
            on parent0_.id=parent0_3_.id 
    where
        parent0_.id=?
[UserRequest [getId()=1, getStatus()=ACCEPTED, getRequestType()=USER_FOLLOW, getFrom()=User [firstName=user_1], getTo()=User [firstName=user_2]], GroupRequest [getId()=2, getStatus()=REJECTED, getRequestType()=GROUP_MEMBER, getFrom()=User [firstName=user_1], getTo()=Group [name=group_1]], TeamRequest [getId()=3, getStatus()=CANCELLED, getRequestType()=TEAM_MEMBER, getFrom()=User [firstName=user_1], getTo()=Team [name=team_2]]]

The SLQ generated is pretty clean, nothing to say.

The only point I cannot treat here with single table with JPA/Hibernate (but maybe some others ORM in others languages can do it) :

Use specific enum type according to the request type (TEAM_MEMBER for TEAM, ...)
Instead of using a field to of type Parent use the according type (genericity doesn't with this kind of configuration).

Note that in JPA I can treat those with a

 @MappedSuperclass Request<P extends Person, E extends Enum>
 //override those in inherited classes and add annotation of hibernate on the field or the getter of the subclass
 // remove the fields To and requestType from request they will be declared on the subclass
 abstract P getTo();//and setter if you want 
 abstract E getRequestType();// and setter if you want

Which will require to have each type of request to have its own table.

So basically the best with JPA is to have :

If one table for request : one type of Request, just use filters however you will have to apply yourself (either in the application or in the database) constraint like "I can only aim a Group for a Group request".
If three table for request : you can have a generic declaration of the Request type that centralize quite a lot. You can restrit the status stype according to the type of request and the type of the to fields without anything more than configuration (= no additional logic to implement).

Sql server would allow you to create filtered unique index which would prevent duplicates. Also, you could create a view for each type. — jmoreno, Oct 18 '21 at 00:14
Yes unique index does also exists in postgres btw. Creating view for each type will also work, however it will means we can only map for reading, we will need a different mapping to write data in order to write on the real tables. So that's some extra works, I'm no expert to determinate if it really worth or not. — Walfrat, Oct 20 '21 at 08:59

Best Practice regarding table schema

3 Answers3

Linked