1

A textbook I am teaching from about ASP.Net using ADO.Net in C# says:

Note that concurrency checking isn't necessary here, because a concurrency error can't occur for an insert operation.

The textbook emphasizes using Optimistic Concurrency (as extensively described in the Microsoft online documentation): Update and Delete statements include a Where clause that provides all the original values, and the number of rows affected should be one. If the original row is not found, no exception is thrown; the operation just returns zero for the number of rows affected.
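That Update/Delete pattern is easy to see in a few lines. The book's examples use ADO.NET and C# against SQL Server; the sketch below is an illustrative translation using Python's built-in sqlite3 module, with a made-up `Category` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (Id TEXT PRIMARY KEY, LongName TEXT)")
conn.execute("INSERT INTO Category VALUES ('BOOK', 'Books')")

# Optimistic-concurrency update: the WHERE clause repeats ALL the
# original values, so the row is only updated if nobody changed it
# since we read it. No exception on a miss -- just zero rows affected.
def update_category(conn, orig_id, orig_name, new_name):
    cur = conn.execute(
        "UPDATE Category SET LongName = ? WHERE Id = ? AND LongName = ?",
        (new_name, orig_id, orig_name))
    return cur.rowcount  # 1 = success, 0 = someone changed/deleted the row

# First writer wins, using the values it originally read.
assert update_category(conn, "BOOK", "Books", "Printed books") == 1
# Second writer still holds the stale original value "Books":
# no exception is thrown, the call just reports zero rows affected.
assert update_category(conn, "BOOK", "Books", "E-books") == 0
```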

But for Inserts, normally a duplicate row exception would be thrown on unique index values. So the Insert throws an exception and the Update / Delete does not. But it still amounts to "concurrent changes to the data" no matter how they might have come about. It could even be the same user attempting to add the same data again.

It seems to me that an update or delete that affects zero rows ought to throw an exception. But equally, the Insert could simply return zero rows affected. Regardless, I do not see why Inserts cannot be considered as a source of concurrency errors. Is this just the Microsoft take on these terms?

  • It would appear that the textbook in question defines "concurrency" as "two users attempting to edit the same row at the same time." This is consistent with "Optimistic Concurrency," which basically says "do not lock the row while someone else is editing," and "Pessimistic Concurrency," which says "lock the row when someone begins editing it." Neither condition can occur for inserts, because the row doesn't exist yet in the database. – Robert Harvey Feb 26 '18 at 17:38
  • @RobertHarvey Thank you. Is there a name for the situation where duplicate inserts cause a failure? If we look at it logically, Update and Delete both assert that "this row does exist" and Insert asserts that "this row does not exist". It seems a bit flaky to consider one case to be an error and the other not - or a very different kind of error. It amounts to a temporal anomaly, so why piddle over which direction time flowed? As the *Gita* says: at first something is unmanifested, then it manifests, then becomes unmanifested again. "What is there to lament in this?" –  Feb 26 '18 at 18:26
  • This is what happens when we get tangled in word definitions. During the first day of class I told my students that they would be getting vocabulary quizzes, but I also warned them that it is more important to understand the underlying ideas than it is to memorize the word's precise textbook definition, as that definition won't help you when a system goes down unless you have a firm grip on the underlying principles that the word embodies. At the end of the day, what you really want is a sensible approach to handling insertion and deletion problems, not precise word definitions. – Robert Harvey Feb 26 '18 at 18:34
  • That said, I believe that the best vocabulary for your insert and delete conditions is "Record already exists" and "Record not found," implying that someone else already inserted or deleted the record in question. – Robert Harvey Feb 26 '18 at 18:36
  • I was thinking that a friendly error message could say: "Someone else beat you to it." –  Feb 26 '18 at 19:18
  • Apparently I am thinking of the high-level User view of "I am Adding / Changing / Deleting records" where two people can cause colliding changes (the first one submitted succeeds and the second gets an error). But the phrase "Optimistic Concurrency" is about the low-level, row-at-a-time events occurring. Using the Where clause to detect that the row has changed since it was read is **a stroke of genius**, but I think if my code asks to Update or Delete a row and it was not found, that should be an error. Why would I try to change a row that isn't there? I guess SQL can't know if my Where clause is correct. –  Feb 27 '18 at 16:28
  • Very likely, the condition you describe on "first one submitted" is detected by a [row-version](https://docs.microsoft.com/en-us/aspnet/mvc/overview/getting-started/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application#add-an-optimistic-concurrency-property-to-the-department-entity) field. The second person cannot submit an edit because the row-version no longer agrees with the record that he retrieved. Trying to change a record that's already been deleted is... well, obvious. And yes, those are both error conditions. – Robert Harvey Feb 27 '18 at 19:05
  • One solution is for all complex records to have an associated "header" row that indicates the internal and external primary keys, the datetime created and updated, possibly by whom, and also a column to indicate if the record has been 'deleted'. Deletes cannot be distributed, so this is the only way to handle that. With this header idea in place, then yes, the timestamp will allow detecting that the version being submitted does not match the one in the database. –  Feb 27 '18 at 20:30
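The row-version idea discussed in the comments above can be sketched the same way: compare a single version (or timestamp) column instead of every original value. Again an illustrative Python/SQLite sketch, not the ADO.NET or EF code from the book; the `Doc` table and `Version` column are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Doc (Id INTEGER PRIMARY KEY, Body TEXT, Version INTEGER)")
conn.execute("INSERT INTO Doc VALUES (1, 'draft', 1)")

def update_doc(conn, doc_id, read_version, new_body):
    # Compare only the version column instead of every original value,
    # and bump it on every successful write.
    cur = conn.execute(
        "UPDATE Doc SET Body = ?, Version = Version + 1 "
        "WHERE Id = ? AND Version = ?",
        (new_body, doc_id, read_version))
    return cur.rowcount == 1

# Both users read version 1; only the first submit succeeds.
assert update_doc(conn, 1, 1, "first edit") is True
assert update_doc(conn, 1, 1, "second edit") is False
```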

3 Answers

3

There are several statements in your question that are not correct; once clarified, they should answer your question:

But for Inserts, normally a duplicate row exception would be thrown on unique index values. So the Insert throws an exception and the Update / Delete does not. But it still amounts to "concurrent changes to the data" no matter how they might have come about

No. An insert that generates a unique key exception is unrelated to concurrent changes. You can try to insert that row one year later and it will still throw the same exception.

It seems to me that an update or delete that affects zero rows ought to throw an exception

No, why? An update or delete query can affect zero to all of the records of a table, depending on your where clause. Affecting zero rows is not an error, it's just that no rows satisfied your where clause.

But equally, the Insert could simply return zero rows affected.

No. An insert is an operation that inserts one row (no more, no less). If the operation cannot be executed, it's an error. Note that several different errors could happen (trying to insert a duplicate value in a unique key, or providing null for a non-nullable column, for example).

Regardless, I do not see why Inserts cannot be considered as a source of concurrency errors

As I mentioned before, errors during inserts are unrelated to concurrency. You can insert a row with key 100, try to do it again an hour later, and it will fail. No concurrency involved whatsoever.

On the other hand, to do an update or delete with an optimistic concurrency check, you read a row and then submit the update or delete with the original values in the where clause. The only reason the update or delete would not modify any row is that "someone" modified or deleted that row after you read it. In other words, while you were (concurrently) doing your read/update or read/delete, someone else changed that row. That is why this pattern detects concurrency issues. If the other process modifies the data before you read it or after you write it, your update or delete will affect the right number of rows; but if it modifies the row in between, the operation will affect zero rows.
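The asymmetry this answer describes can be seen directly: the duplicate insert fails no matter how much time has passed, while the update that matches nothing is not an error at all. An illustrative Python/SQLite sketch with an invented table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (Id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO Category VALUES ('BOOK')")

# A duplicate insert fails whether it happens a millisecond or a year
# after the first one -- no concurrent access is required at all.
try:
    conn.execute("INSERT INTO Category VALUES ('BOOK')")
    duplicate_raised = False
except sqlite3.IntegrityError:
    duplicate_raised = True
assert duplicate_raised

# An update whose WHERE clause matches nothing is not an error:
cur = conn.execute("UPDATE Category SET Id = 'BOOKS' WHERE Id = 'MOVIE'")
assert cur.rowcount == 0  # zero rows affected, no exception raised
```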

  • If I try to update or delete a row using a unique index to locate that row, it indicates quite clearly that my *intent* was to modify exactly one row, just like my insert intended to insert exactly one row. "Sauce for the goose is sauce for the gander": if a unique-identifying clause is included, it should work consistently. And just like trying to do a duplicate insert could happen hours later, I could view a screen, go to lunch, then submit an update and it could fail. If one is concurrency, the other should be also. I am thinking of a user's experience: "I tried to do this and it failed." –  Feb 28 '18 at 14:33
  • Probably we just need a TryInsert() and TryUpdate() and TryDelete() to indicate the intention, just like TryParse() in C#.Net. I don't make calls to the database in vain, I want them to succeed, or say clearly why they did not. "0 rows affected" is not good enough. –  Feb 28 '18 at 14:35
  • I still disagree. The different behaviour of inserts and updates/deletes is what it is. The fact that you use an update with a unique ID won't change its behaviour: that unique ID may or may not exist, so the update can succeed while affecting either 0 rows or 1. It is not an exception. The insert succeeds when it inserts 1 row and fails when it doesn't. That's it. – Francesc Castells Feb 28 '18 at 15:02
  • Regarding the concurrency, again, I disagree. What you describe about going to lunch is by definition concurrency, and it's precisely why the concurrency check was invented. The concurrent operations are not the bare updates; what is concurrent is the whole read/update sequence, not only the update. That's what optimistic concurrency is checking: that nobody changed the data since you read it, because that would mean your user made a decision based on stale information. Therefore you get a concurrency error instead of overwriting the latest data. – Francesc Castells Feb 28 '18 at 15:08
  • Regarding the "0 rows affected" not being good enough, note that if you use Entity Framework a delete or update with optimistic concurrency that affects 0 rows will cause a ConcurrencyException. – Francesc Castells Feb 28 '18 at 15:10
  • Right, but only one person on Earth really understands EF (Julie Lerman). I guess that guy I worked with 3 decades ago who basically invented the same thing without SQL (PC databases didn't have it back then) was pretty smart, too. I keep trying to introduce the same idea and it is not understood. Oh, well. –  Feb 28 '18 at 15:28
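For what it's worth, the `TryInsert()`/`TryUpdate()` idea floated in the comments could look something like the following. These helper names are hypothetical (borrowed from the `TryParse()` convention), and the sketch uses Python/SQLite rather than C#/ADO.NET:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Item VALUES (1, 'widget')")

# Hypothetical TryParse-style wrappers: succeed, or say clearly why not,
# instead of silently reporting "0 rows affected" or leaking exceptions.
def try_insert(conn, item_id, name):
    try:
        conn.execute("INSERT INTO Item VALUES (?, ?)", (item_id, name))
        return True
    except sqlite3.IntegrityError:
        return False  # "record already exists"

def try_update(conn, item_id, orig_name, new_name):
    cur = conn.execute(
        "UPDATE Item SET Name = ? WHERE Id = ? AND Name = ?",
        (new_name, item_id, orig_name))
    return cur.rowcount == 1  # False: "record not found (or changed)"

assert try_insert(conn, 1, "widget") is False   # already there
assert try_update(conn, 1, "widget", "gadget") is True
assert try_update(conn, 1, "widget", "gizmo") is False  # stale original
```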
1

I don't think a return of zero rows is necessarily an exception or an error in any case of insert, update, delete or select. I've never had a need to treat it that way.

In the case of cleaning up data, I may want to update a field wherever it is null; if it turns out they all already have a value, why is that an exception? It could be a good query or a bad query as far as my domain logic is concerned, but there's no rule violation here. In fact, everything is great.

An insert can also return zero rows, especially when you are transferring data from one table to another, or from the result of a select statement into a table. Take the example of the null values getting updated: suppose I also want to log the changed rows in another table. If zero rows were an error, I would now have to handle two exceptions even though everything worked as planned.
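A minimal demonstration of that `INSERT ... SELECT` case (an illustrative Python/SQLite sketch with invented table names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Source (Id INTEGER, Val TEXT)")
conn.execute("CREATE TABLE Log (Id INTEGER, Val TEXT)")
conn.execute("INSERT INTO Source VALUES (1, 'present')")

# INSERT ... SELECT copies only the rows that match the SELECT; if
# nothing matches, zero rows are inserted and no error is raised.
cur = conn.execute(
    "INSERT INTO Log SELECT Id, Val FROM Source WHERE Val IS NULL")
assert cur.rowcount == 0  # no nulls to log -- everything worked as planned
```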

Most relational databases manage transactions; that is how they can be rolled back, or restored from a backup to a particular point in time. Many batch processes work in an all-or-none fashion, so thinking about concurrency too literally isn't much help when who knows what is happening behind the curtain of an RDBMS.

Using record locking or some other mechanism to handle concurrency, as far as your application or domain is concerned, is a different set of issues, as others have pointed out.

JeffO
  • "After all that typing, you added zero rows! Congratulations!" –  Feb 26 '18 at 22:14
  • @nocomprende - That's better than the messages most devs come up with. – JeffO Feb 27 '18 at 13:33
  • I used to be evaluated on "Understanding the User's view of the system." I guess no one worries about that these days, what with immutable data and 400 high level languages to play around in. –  Feb 27 '18 at 13:53
  • @nocomprende - Most users care about the database about as much as the database cares about them. – JeffO Mar 01 '18 at 15:51
  • So, should programmers take the user's view, or the hardware / software's view? Which is more important? –  Mar 01 '18 at 19:54
  • @nocomprende - When it comes to databases, there are many different types of users. You're looking at it from an application's perspective. I've inherited databases. When cleaning up the data, I may run an update statement to account for missing data (They had no default value.). If none of the records are missing data, why would I want the database to tell me that's an exception because zero rows were updated? That's what I had hoped for. When you work with other systems like databases, you take what you can get and make life easier for your users. – JeffO Mar 06 '18 at 19:33
0

The book may quite reasonably state that concurrent inserts do not cause concurrency errors, if it assumes that the concurrent operations are performed by processes/threads that are independent from each other.

In your example with the unique code, the error is not caused by the concurrency of the insert but by the inconsistency of the data that you attempt to insert (i.e., two records with the same ID in a column that should be unique).

This would not happen if you let some database feature generate a unique ID, or if you used a UUID algorithm to generate one.

Ultimately, the issue could be caused by another concurrency issue in the algorithm that generates the unique ID (but not in the insert itself). This can happen, for example, if you read the last sequential ID, add 1, and assume the result is unique. Without any synchronisation operation, that will not be the case, as two different processes could obtain the same result.
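That generator race, and the database-generated alternative, can be shown in a few lines (an illustrative Python/SQLite sketch; the `Invoice` table is invented, and the "two processes" are simulated sequentially to make the race deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (Id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO Invoice VALUES (1)")

# Two processes both compute "last id + 1" before either inserts --
# a race in the ID *generator*, not in the INSERT itself.
def next_id(conn):
    return conn.execute("SELECT MAX(Id) FROM Invoice").fetchone()[0] + 1

id_a = next_id(conn)   # process A reads 2
id_b = next_id(conn)   # process B also reads 2
conn.execute("INSERT INTO Invoice VALUES (?)", (id_a,))      # A wins
try:
    conn.execute("INSERT INTO Invoice VALUES (?)", (id_b,))  # B collides
    collided = False
except sqlite3.IntegrityError:
    collided = True
assert collided

# Letting the database generate the key avoids the race entirely:
cur = conn.execute("INSERT INTO Invoice VALUES (NULL)")  # auto-assigned rowid
assert cur.lastrowid == 3
```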

Christophe
  • Unique IDs are not always necessary or even appropriate. The example in the book is a list of "Categories" where the first column is the unique key and it is a human-readable value. Even if some autonumber column was added, we still do not want the displayable values to be duplicated. Two people could very well both be adding rows and try to add the same row. Or, a user might forget that they already did it and try again. I do not see that these cases differ meaningfully from two users both deleting or updating the same row, or the same user trying it again on a different device. –  Feb 26 '18 at 19:23
  • @nocomprende: I think the differentiator is that, in Optimistic or Pessimistic concurrency, an indeterminate window of time is opened between the start of an edit and its completion. No such window exists for deletions and insertions. – Robert Harvey Feb 26 '18 at 19:25
  • @RobertHarvey Trying to grok... I must have some idiosyncratic view of data systems in terms of human beings looking at a form on a screen and submitting changes. Thoroughly inappropriate, I guess. Anyway, two users can have a record on a screen at around the same time, and if one user hits Submit before the other, the second one "loses out". It does not matter if the request Adds, Changes or Deletes the data. So I don't see that the window of time is different in these cases. Am I missing something? –  Feb 26 '18 at 19:30
  • @nocomprende: Actually, my experience with optimistic concurrency is that the last save wins. Anyway, I suppose that time window is only opened with pessimistic concurrency; a record lock exists while the user is editing the record, preventing anyone else from starting an edit on the same record. – Robert Harvey Feb 26 '18 at 19:38
  • @nocomprende The cause of the error is still not the concurrency, but the inconsistency of the data: if one user changed an existing human-readable ID to a value X, and another user entered the same value X for a new record to be inserted, the error you'd observe is not due to concurrency. In fact, the users do not have to press Enter at the same time for the error to happen. One user could do his operation in the morning and the other in the afternoon, and one of them would still get the error message. – Christophe Feb 26 '18 at 19:39
  • @RobertHarvey Wow, last save wins. I hope you are not using my bank, because if I take out money and then you hack my account and take it out too, I don't want those greenbacks vanishing from my wallet. Oh, I guess we have to consider side effects. Banks are not Functional, after all. But they do like ACID as you mentioned. –  Feb 26 '18 at 22:12
  • That's a different thing entirely. Banks don't allow those kinds of edits to an account. Everything in a bank is transaction-based; if money is transferred from one account to another, a debit is made to the source account, a credit is made to the target account, and the whole thing is wrapped in a transaction scope so that either the whole thing succeeds and gets committed, or it fails and gets rolled back to its original state. – Robert Harvey Feb 26 '18 at 22:26
  • In any case, I did review [optimistic concurrency,](https://en.wikipedia.org/wiki/Optimistic_concurrency_control) and apparently the only real difference between that and pessimistic concurrency is that pessimistic concurrency checks for conflicts *before editing begins,* and optimistic concurrency checks for conflicts *when a save is attempted.* In either case, an arbitrary time window is opened between two competing users, which is not what happens with inserts or deletes, which merely check for the presence or absence of a record. – Robert Harvey Feb 27 '18 at 00:55