8

We currently have a store/shopping cart system that uses a single database. We have products with a field for the number we have in inventory (say 100 widgets). We have a customer table. When someone adds a widget to their cart, we insert a record in a join table between the customer and the product which represents intent to purchase. That customer_product record has a status indicating that it's either in the cart or that the purchase has been completed ('Pending','Purchased').

When a customer request hits the system to add a product to their cart, we count the number of purchased and pending customer_product records for that product and disallow it if the number is equal to the total (100). This way, we ensure that we don't allow 101 people to have 100 items.

The database is our system bottleneck, and the join table gets hit a lot. I suspect row and page locks affect performance under load. I would guess systems like Amazon's/eBay's must have a distributed db architecture, and yet somehow manage the problem of 2 people wanting to put the last item in their cart at the same time. I'd like to rearchitect our store/cart to alleviate the db constraint.

With a single database, we can do something in our join record insert WHERE clause to include a subquery count so that if two db transactions are trying to do the "last widget" insert concurrently that whichever tries to commit second will fail because the count will prevent it after the 2nd-to-last transaction takes the last widget and changes the count. But in a distributed database, I'm guessing that trick won't work.

What general system architecture guiding principles or patterns apply when addressing such concurrency and shared resource challenges in a distributed system?

Note: I'm aware of similar questions (like Best-practice to manage concurrency into a basket in a e-commerce website). This question is specifically about how to handle it in a distributed architecture where every db instance has a copy of the tables and changes in one propogate to the others only every so often (at least that's how I imagine it - I haven't actually set up a distributed db system before).

jinglesthula
  • 305
  • 1
  • 8
  • 4
    You don't mention the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem) which is definitely relevant. Searching on this will yield a lot of information about strategies around this problem. – JimmyJames Jan 13 '17 at 20:53
  • A distributed relational database can still manage a transaction. – JeffO Jan 13 '17 at 21:27
  • Transaction processing is only one part of the overall system. Ultimately, you need a person watching the rates of change, prompted by analysis software. – SDsolar Jan 14 '17 at 14:24
  • 1
    Just create a queue an process each job started by event, and there will be no chance of overselling – Ere Oct 11 '18 at 15:01
  • @Ere when you add it in Queue even at that time also it would be from distributed system and there would be common queue to handle it. Then how it can consider which one to consider as last and second last if it has came at same time. Asking for curosity – insoftservice Sep 05 '22 at 10:32
  • @insoftservice If there's a central queue with (for example) an auto-increment id field, the db will have to create the queue records sequentially in order to produce ids. Processing in order of creation will guarantee a deterministic order, even if the requests arrive at the "same time" as far as the db's time precision measures it. – jinglesthula Sep 06 '22 at 18:25
  • Thans @jinglesthula but then similar can be achieved even by deadlock of db , then why we require queue for it. Is there some another option beside queue. Is there some link steps to implement queue – insoftservice Sep 07 '22 at 19:17

2 Answers2

12

It depends on the widget.

If the widget is rare and expensive (exactly 10 Ferraris), then the approach you're following is correct. Of course, you also need to account for inventory that's being returned but hasn't been restocked yet, inventory that's out for repair, etc.

If the widget is a bit more common (5,000 wrenches) then the usual approach is to:

  • Accept all orders. It's perfectly fine for 5,500 people to order your wrenches. When it comes time to ship them out, 5,000 will be shipped and the rest will be placed on back order. Typically your order volume will be so low that you won't need a distributed database.
  • Define a "low quantity" trigger along with a reorder amount. For example, you might have a rule that says "whenever the number of wrenches on hand gets below 1,000, call the vendor and order another 5,000". This rule is rarely automated - it's better for the system to send a notification to a human to make the final decision.
Dan Pichelman
  • 13,773
  • 8
  • 42
  • 73
5

You can use a separate database for users with their carts than for inventory tallies, by using simple id's instead of foreign keys, and making up the non-null requirements by the application.

This will reduce some contention as compared with a single database.


The inventory database can store the total available inventory count for each item, and also in that inventory database (as you suggest) store/cache the computed value that is the total cart-claimed count from the all of the users/carts database, which will need to be updated as items are claimed/released by carts.

This will reduce some load on the user/cart database at the expense of managing the cached value by the application (caching/denormalized for performance).


Both the user/cart database and the inventory database can be sharded across numerous databases.

Sharding stores the same tables in multiple databases, though not the same data, as specifically chosen different rows go in each of the databases to spread the various access and modification loads across those databases. Sharding works well for things like users and inventories that don't need to be all accessed at the same time / in the same query (we don't often need to query all users (e.g. count of all users) or all items of inventory at the same time, e.g total of all inventories).

If the sharding strategy is simple (e.g. for inventory, the inventory id modulo the number of shards), it is relatively easy to identify which inventory shard has that inventory item.

The combination of the above should significantly reduce contention for database services.


Orthogonal to some of the above, you can also distribute the inventory count among inventory replicas, where if you have 5000 widgets and 2 replicas, each is given a tally of 2500 to sell.

No coordination is needed until some minimum threshold is reached (say one replica sells 2400 and is now down to 100).

At that point then the system may request a rebalancing of inventory from the other replica, so if the other replica still has 2000 remaining, then maybe half of that can be taken by the other.


The replication/distribution of inventory method can be combined with the sharding method, in that replicas can be sharded.

Erik Eidt
  • 33,282
  • 5
  • 57
  • 91