42

I am trying to understand how to properly store ordered information in a relational database.

An example:

Say I have a Playlist, consisting of Songs. Inside my Relational Database, I have a table of Playlists, containing some metadata (name, creator, etc). I also have a table called Songs, containing a the playlist_id, as well as song-specific info (name, artist, duration, etc).

By default, when a new Song is added to a Playlist, it is appended to the end. When ordering on Song-ID (ascending), the order will be the order of addition. But what if a user should be able to re-order songs in the playlist?

I came up with a couple of ideas, each with their advantages and disadvantages:

  1. A column called order, which is an integer. When a song is moved, the order of all songs between its old and new position are changed, to reflect the change. The drawback of this is that a lot of queries need to be done each time a song is moved, and the moving algorithm is not as trivial as with the other options.
  2. A column called order, which is a decimal (NUMERIC). When a song is moved, it is assigned the floating point value between the two adjacent numbers. Drawback: Decimal fields take more space, and it might be possible to run out of precision, unless care is taken to re-distribute the range after every few changes.
  3. Another way would be to have a previous and a next field that reference other Songs. (or are NULL in the case of the first, resp. last song in the playlist right now; Basically you create a linked-list). Drawback: Queries like 'find the Xth Song in the list' are no longer constant-time, but instead linear-time.

Which of these procedures is most often used in practice? Which of these procedures is fastest on medium to large databases? Are there any other ways to archieve this?

EDIT: For simplicities sake, in the example a Song only belongs to one Playlist (a many-to-one relationship). Of course, one could also use a Junction Table so song⟷playlist is a many-to-many relationship (and apply one of above strategies on that table).

Qqwy
  • 4,709
  • 4
  • 31
  • 45
  • 1
    You could use option one (order as Integer) with 100-steps. Then you need no re-order if you move one song, just take a value between the 100. From time to time you may need a new renumbering to get again gaps between the songs. – knut Dec 08 '15 at 22:02
  • 7
    "The drawback of this is that a lot of queries need to be done each time a song is moved" ?! - `update songorder set order = order - 1 where order >= 12 & order <= 42; update songorder set order = 42 where id = 123;` - that's two updates - not thirty. Three if you want to put a unique constraint on order. –  Dec 08 '15 at 22:03
  • 2
    Use option one unless you know for a fact you need something else. One problem programmers new to databases encounter is not understanding that databases are very, very good at this sort of thing. Don't be afraid to put your db to work. – GrandmasterB Dec 08 '15 at 22:09
  • @MichaelT: I stand corrected. – Qqwy Dec 08 '15 at 22:09
  • @MichaelT thank you for the syntax hint that you added to my answer. Unfortunately, by looking at the edit I was unable to figure out how you did it. Can you please explain to me how you did it, or point me to a document explaining how? – Mike Nakis Dec 08 '15 at 22:11
  • @MikeNakis Look at the side by side markdown in [the revision history](http://programmers.stackexchange.com/posts/304594/revisions) - you will see I added ``. The complete list of supported tags and some of the nuances that don't fit into a comment is on [MSE](http://meta.stackexchange.com/a/184109/213963). C# gets a 'funny' name as 'cs'. –  Dec 08 '15 at 22:15
  • 3
    `Queries like 'find the Xth Song in the list' are no longer constant-time` is also true for option 2. – Doc Brown Dec 08 '15 at 22:17
  • 1
    Guys, the only thing I have against the first approach is that although *it seems* like it is cheap, because it only takes a couple of queries, it performs an update which modifies every row in the table, so it is **an extremely expensive query**. – Mike Nakis Dec 08 '15 at 22:21
  • @DocBrown can it not be done in constant time (or near constant time) with SQL `LIMIT 1 OFFSET N`? – Mike Nakis Dec 08 '15 at 22:22
  • 2
    @MikeNakis: It seems expensive, but all the work is being done on the server, which is (usually) optimized for this kind of work. I wouldn't use this technique on a table with millions of rows, but I wouldn't discount it for a table with only a couple thousand. – TMN Dec 08 '15 at 22:25
  • @MikeNakis: ok, seems the OP should have written "Queries like 'find the Xth Song in the list' are no longer possible with a simple, single SELECT statement". Let's assume that is what he really meant ;-) – Doc Brown Dec 09 '15 at 01:20

3 Answers3

54

Databases are optimized for certain things. Updating lots of rows quickly is one of them. This becomes especially true when you let the database do its work.

Consider:

order song
1     Happy Birthday
2     Beat It
3     Never Gonna Give You Up
4     Safety Dance
5     Imperial March

And you want to move Beat It to the end, you would have two queries:

update table 
  set order = order - 1
  where order >= 2 and order <= 5;

update table
  set order = 5
  where song = 'Beat It'

And that's it. This scales up very well with very large numbers. Try putting a few thousand songs in a hypothetical playlist in your database and see how long it takes to move a song from one location to another. As these have very standardized forms:

update table 
  set order = order - 1
  where order >= ? and order <= ?;

update table
  set order = ?
  where song = ?

You have two prepared statements that you can reuse very efficiently.

This provides some significant advantages - the order of the table is something that you can reason about. The third song has an order of 3, always. The only way to guarantee this is to use consecutive integers as the order. Using pseudo-linked lists or decimal numbers or integers with gaps won't let you guarantee this property; in these cases the only way to get the nth song is to sort the entire table and get the nth record.

And really, this is a lot easier than you think it is. It is simple to figure out what you want to do, to generate the two update statements and for other people to look at those two update statements and realize what is being done.

Vedant Agarwala
  • 775
  • 1
  • 8
  • 16
  • 3
    I am beginning to like this approach. – Mike Nakis Dec 08 '15 at 22:29
  • 3
    @MikeNakis it works well. There's also a binary tree that is based on a similar idea - the [modified preorder tree](http://www.sitepoint.com/hierarchical-data-database-2/). It takes a little bit more to get your head around, but it lets you do some very nice queries for hierarchical data. I've never had performance problems with it, even in large trees. Being able to reason about the code is something I put great emphasis on until it is shown that the simple code lacks the performance needed (and that has only been in extreme situations). –  Dec 08 '15 at 22:34
  • Will there be any issues with using `order` since `order by` is a key word? – kojow7 Feb 20 '18 at 23:48
  • @kojow7, if your fields have names conflicting with keywords, you should wrap them in tickmarks " ` ". – Andri Nov 06 '18 at 20:11
  • This approach makes sense, but what is the best way to get the `order` value when adding a new song to a playlist. Say it's the 9th song, is there any better way to insert 9 into `order` than doing a `COUNT` prior to adding the record? – delashum Oct 15 '19 at 17:08
  • @MikeNakis sorry to ping you about this, but is it correct to assume that the method described here only works if moving a track from higher to lower on the list? I assume you have to adjust the first `update` query to `increment` the order between the two ranges (like so: `set order = order + 1`) if going from lower in the playlist to higher in the playlist? – Jordan Lewallen Dec 31 '19 at 08:53
  • 2
    @JordanLewallen I am not sure why you are asking me in particular, but I guess the OP is gone, so why not. Yes, it seems to me that you'd need to have a total of 3 queries defined, not 2 as the OP suggested. You would need a query with `set order = order + 1` when moving an item from a higher to a lower order. – Mike Nakis Dec 31 '19 at 11:33
  • @MikeNakis yea OP is gone and it appeared you had understood his method. Thanks for being my sanity check on this old question. I appreciate it :) – Jordan Lewallen Dec 31 '19 at 16:20
  • Should "order" be indexed here? which would mean the entire table would be rescanned for every update/insert on the table + extra memory overhead. Also, this solution entails time complexity of O(N). Perhaps storing a top-level JSON representation of the order of the list is more efficient. For updating order it's simply updating the json (one record -> O(1).) (Of course, in case of deleting an item, it also means deleting the record itself) – BinaryVeil Nov 11 '22 at 08:50
3

First of all, it is not clear from your description of what you have done, but you need a PlaylistSongs table which contains a PlaylistId and a SongId, describing which songs belong to which playlists.

It is in this table that you have to add ordering information.

My favorite mechanism is with real numbers. I implemented it recently, and it worked like a charm. When you want to move a song to a specific position, you calculate its new Ordering value as the average of the Ordering values of the previous song and the next song. If you use an 64-bit real number, you will run out of precision at about the same time that hell will freeze over, but if you are really writing your software for posterity, then consider reassigning nice rounded integer Ordering values to all songs in each playlist every once in a while.

As an added bonus, here is the code that I have written which implements this. Of course you cannot use it as it is, and it would be too much work for me right now to sanitize it for you, so I am only posting it for you to get ideas from it.

The class is ParameterTemplate (whatever, don't ask!) The method gets the list of parameter templates to which this template belongs from its parent ActivityTemplate. (Whatever, don't ask!) The code contains some guard against running out of precision. The divisor is used for testing: the unit test uses a large divisor so as to run out of precision quickly, and thus trigger the precision guarding code. The second method is public and "for internal use only; do not invoke" so that the testing code can invoke it. (It could not be package-private because my testing code is not in the same package as the code it tests.) The field which controls the ordering is called Ordering, accessed via getOrdering() and setOrdering(). You don't see any SQL because I am using Object-Relational Mapping via Hibernate.

/**
 * Moves this {@link ParameterTemplate} to the given index in the list of {@link ParameterTemplate}s of the parent {@link ActivityTemplate}.
 *
 * The index must be greater than or equal to zero, and less than or equal to the number of entries in the list.  Specifying an index of zero will move this item to the top of
 * the list. Specifying an index which is equal to the number of entries will move this item to the end of the list.  Any other index will move this item to the position
 * specified, also moving other items in the list as necessary. The given index cannot be equal to the current index of the item, nor can it be equal to the current index plus
 * one.  If the given index is below the current index of the item, then the item will be moved so that its new index will be equal to the given index.  If the given index is
 * above the current index, then the new index of the item will be the given index minus one.
 *
 * NOTE: this method flushes the persistor and refreshes the parent node so as to guarantee that the changes will be immediately visible in the list of {@link
 * ParameterTemplate}s of the parent {@link ActivityTemplate}.
 *
 * @param toIndex the desired new index of this {@link ParameterTemplate} in the list of {@link ParameterTemplate}s of the parent {@link ActivityTemplate}.
 */
public void moveAt( int toIndex )
{
    moveAt( toIndex, 2.0 );
}

/**
 * For internal use only; do not invoke.
 */
public boolean moveAt( int toIndex, double divisor )
{
    MutableList<ParameterTemplate<?>> parameterTemplates = getLogicDomain().getMutableCollections().newArrayList();
    parameterTemplates.addAll( getParentActivityTemplate().getParameterTemplates() );
    assert parameterTemplates.getLength() >= 1; //guaranteed since at the very least, this parameter template must be in the list.
    int fromIndex = parameterTemplates.indexOf( this );
    assert 0 <= toIndex;
    assert toIndex <= parameterTemplates.getLength();
    assert 0 <= fromIndex;
    assert fromIndex < parameterTemplates.getLength();
    assert fromIndex != toIndex;
    assert fromIndex != toIndex - 1;

    double order;
    if( toIndex == 0 )
    {
        order = parameterTemplates.fetchFirstElement().getOrdering() - 1.0;
    }
    else if( toIndex == parameterTemplates.getLength() )
    {
        order = parameterTemplates.fetchLastElement().getOrdering() + 1.0;
    }
    else
    {
        double prevOrder = parameterTemplates.get( toIndex - 1 ).getOrdering();
        parameterTemplates.moveAt( fromIndex, toIndex );
        double nextOrder = parameterTemplates.get( toIndex + (toIndex > fromIndex ? 0 : 1) ).getOrdering();
        assert prevOrder <= nextOrder;
        order = (prevOrder + nextOrder) / divisor;
        if( order <= prevOrder || order >= nextOrder ) //if the accuracy of the double has been exceeded
        {
            parameterTemplates.clear();
            parameterTemplates.addAll( getParentActivityTemplate().getParameterTemplates() );
            for( int i = 0; i < parameterTemplates.getLength(); i++ )
                parameterTemplates.get( i ).setOrdering( i * 1.0 );
            rocs3dDomain.getPersistor().flush();
            rocs3dDomain.getPersistor().refresh( getParentActivityTemplate() );
            moveAt( toIndex );
            return true;
        }
    }
    setOrdering( order );
    rocs3dDomain.getPersistor().flush();
    rocs3dDomain.getPersistor().refresh( getParentActivityTemplate() );
    assert getParentActivityTemplate().getParameterTemplates().indexOf( this ) == (toIndex > fromIndex ? toIndex - 1 : toIndex);
    return false;
}
Mike Nakis
  • 32,003
  • 7
  • 76
  • 111
  • I would use an integer ordering and if I felt that reordering was too expensive, I'd just reduce the number of reorders, by having each one jump by X, where X is the amount I need to reduce reordering by, say 20, which should be fine as a starter. – Warren P Dec 08 '15 at 22:03
  • 1
    @WarrenP yes, I know, it can also be done this way, that's why I just called this "my favorite" approach instead of "the best" or "the one" approach. – Mike Nakis Dec 08 '15 at 22:12
  • I like this answer, but you could improve it by adding an estimate for when one can expect to run out of precision, depending on the number of items and operations. – Vincent Jan 19 '21 at 09:57
  • @Vincent it has been a long time, and I do not remember, but as I wrote, "The code contains some guard against running out of precision". Specifically, that is the `assert prevOrder <= nextOrder` part. You can give it a try and see how many insertions you need to do before you will run out of precision. I remember it was so many that I did not want to do them all one by one from the testing code, that's why I needed to be able to specify from the tests a divisor which is much greater than 2.0. – Mike Nakis Jan 19 '21 at 10:57
0

What worked for me, for a small list on the order of 100 items was to take a hybrid approach:

  1. Decimal SortOrder column, but with only enough precision to store 0.5 difference (i.e. Decimal(8,2) or something).
  2. When sorting, grab the PKs of the row above and below where the current row was just moved to, if they exist. (You wont have a row above if you move the item to the first position, for example)
  3. Post the PKs of the current, previous and next row to the server to perform the sort.
  4. If you have a prev row, set the current row's position to prev + 0.5. If you only have a next, set the current row's position to next - 0.5.
  5. Next I have a Stored proc that updates all of the positions using the SQL Server Row_Number function, ordering by the new sort order. This will transform the ordering from 1,1.5,2,3,4,6 to 1,2,3,4,5,6, since the row_number function gives you integer ordinals.

So you end up with an integer order with no gaps, stored in a decimal column. It's fairly clean, I feel. But it may not scale up extremely well once you have hundreds of thousands of rows that you need to update, all at once. But if you do, why are you using a user defined sort in the first place? (Note: if you have a large table with millions of users but each user only has a few hundred items to sort by, you can use the above approach just fine since you will use a where clause anyway to limit the changes to just one user)

John
  • 113
  • 6