6

Me and a couple of colleagues are having a discussion regarding the following case:

In an OrderStatus table we're keeping track of all the statuses an order goes through in time, including "Pending", "Available", "Returned", etc. There's a timestamp field in the table to keep track of when each status was added. However, there are two statuses that may be added exactly at the same time, one after the other, in which case both may have exactly the same timestamp (including milliseconds). In another part of the code, we need to check which is the last status an order was set to, and since we're ordering by the timestamp we can easily get the wrong record from the database.

The table has an IDENTITY(1,1) primary key, and my colleagues think we should order using that column, since that will always necessarily give you the latest record added to the table.

I feel like an IDENTITY column should be voided of any meaning and thus should not be used for any sorting, and would rather do a Thread.Sleep(1) to make sure the timestamps are different and we can keep sorting using that field.

What do you think is the best practice in this case?

erictrigo
  • 315
  • 5
  • 8
  • Relying on time fields for differentiating entries is problematic due to granularity as you've already seen. See [this](http://programmers.stackexchange.com/questions/207726/do-i-need-ids-in-my-database-if-the-records-could-be-identified-by-the-date/207734#207734) related question. – Robbie Dee Jan 15 '16 at 16:01
  • @RobbieDee I can see how it _can_ be problematic, for example in this case. But isn't it _better_ than sorting using an `IDENTITY` column, which I can also see being problematic in certain cases? – erictrigo Jan 15 '16 at 16:06
  • Identity fields do has certain implementation problems, but which particular issues did you have in mind? – Robbie Dee Jan 15 '16 at 16:08
  • 1
    @RobbieDee For example, a case in which different applications which are accessing the database insert different statuses at almost the same time, one actually supposed to precede the other one, but having been inserted in the DB later. Very unlikely, but I think it _could_ happen. – erictrigo Jan 15 '16 at 16:13
  • 1
    How's a thread.sleep(1) going to help? If you wait a second and then the next second you can have a collision as well. Use the identity column, it's guaranteed to be unique and auto increments. Dates not so much. You could put a unique constraint on the date column, but you will run the risk of constraint violation and then re-running the transaction again back to the database if that occurs. Avoid that complexity. – Jon Raynor Jan 15 '16 at 16:17
  • @JonRaynor Assuming we're talking SQL Server, IDENTITY columns certainly are NOT guaranteed to be unique. They're easily broken by IDENTITY_INSERT and DBCC CHECKIDENT. They don't work in the same way as say, Oracle sequences. – Robbie Dee Jan 15 '16 at 16:38
  • If you want to fool around and turn off identity inserts or run a DBCC CHECKIDENT to reset the identify column one can insert the same number more than once. If inserting records in a table, the identity column will work nicely because the database will automatically increment the identity without any additional effort. The comment should be amended to say. the next number will not be the same as the previous. But for practical purposes, it will be unique provided it is not tampered with. – Jon Raynor Jan 15 '16 at 19:08
  • @RobbieDee Identity only breaks if you break it. And if you put a unique index on it then it is guaranteed to be unique. And in the question it is stated to be the PK. – paparazzo Jan 17 '16 at 17:44
  • The stated question is "last record inserted". How is Identity void of meaning when that is exactly what it does? Other record is *supposed* to get there first? How does timestamp or sleep do anything to fix that? – paparazzo Jan 17 '16 at 17:53
  • You can't ever possibly state "100% guaranteed" on any database for any given tipe. That's the general principle of both data security and data management. For instance, [you can reset Oracle sequences too](http://stackoverflow.com/questions/51470/how-do-i-reset-a-sequence-in-oracle). That being said and considering you're using the Identity column as PK (and thus, it will be sequential both logical and physically and also there will a key to enforce uniqueness), you should order by that key and get the last record. It's your best option, IMHO. – Anderson Matos Jan 20 '16 at 19:27
  • See also http://dba.stackexchange.com/questions/133556/can-i-rely-on-reading-sql-server-identity-values-in-order/135138#135138 – Michael Green Apr 21 '16 at 03:01

4 Answers4

5

Use numeric status values where the higher value represents the higher state. Add that to your index in descending order. Then pull back the result ordered by time stamp and the status in descending order. This way the entry with the farthest progression is always returned.

GrandmasterB
  • 37,990
  • 7
  • 78
  • 131
1

It sounds like an event sourcing approach would be helpful here

http://martinfowler.com/eaaDev/EventSourcing.html

essentially, each time you update the object, you load it, add the event to the history and save everything.

When you save the object, you check to see that nothing else has updated the history in the meantime. If they have, you rebuild the object from the new history and reattempt adding your event

Alternatively you can lock the object to cause other events to wait for you to finish.

This is superior to things like auto-inc columns because it ensures that the events are processed in order, not just saved in an order.

In your particular case however, it seems to me that your status is a function of the events which have happened. Instead of having the status read from the last event make it a calculation based on all the events which have occured.

In this case, you don't really mind which event happened first, or if two happened at the same time.

ie

say you have 'order picked' and 'customer credit check passed'

conceptually both of these can happen at the same time, but the status might be

'not picked' - no picked event

'pending credit check' - picked event but no credit check event

'ready for delivery' - both picked AND credit checked

Ewan
  • 70,664
  • 5
  • 76
  • 161
1

A better approach would be two keep two tables, current and history. The current table would always be the latest update. The other table would be a history table, which would always be just inserts and reads. If you want to get the latest record, query the current table. If your not allowing any dirty reads it will always be the latest record. If you need the history, query the history table and sort by the PK.

For the current table you want to keep a timestamp so if someone updates the record after your read it you can handle the conflict appropriately before making the update. For the history table you are always inserting so no check is necessary. You could handle the history inserts with a trigger off the current table if desired.

In your example:

  • ORDER (CURRENT)
  • ORDER_HISTORY (HISTORY)

That would make the logic straight forward.

Jon Raynor
  • 10,905
  • 29
  • 47
  • 1
    If you need history, then have a history table. Adding another table and introducing redundancy is a woeful solution. Implement proper keys. – Robbie Dee Jan 15 '16 at 16:14
1

Your colleague is right. Order by the primary key. The primary key is not devoid of meaning. It identifies each row in the table. It is numeric, which implies the order in which records were created.

It doesn't fit perfectly into the terminology of your business domain, but don't overlook the simple and obvious solution for ideological reasons, especially when this solution requires little more than a sort clause in a SQL query.

Greg Burghardt
  • 34,276
  • 8
  • 63
  • 114