How does a company like Amazon avoid bottlenecks accessing the database layer?

Question

If you imagine a company like Amazon (or any other large e-commerce web application), that is operating an online store at massive scale and only has limited quantity of physical items in its warehouses, how can they optimize this such that there is no single bottleneck? Of course, they must have a number of databases with replication, and many servers that are handling the load independently. However, if multiple users are being served by separate servers and both try to add the same item to their cart, for which there is only one remaining, there must be some "source of truth" for the quantity left for that item. Wouldn't this mean that at the very least, all users accessing product info for a single item must be querying the same database in serial?

I would like to understand how you can operate a store that large using distributed computing and not create a huge bottleneck on a single DB containing inventory information.

Amazon architecture in the mid-2000's (still relevant to your question): http://highscalability.com/amazon-architecture — Joeri Sebrechts, Dec 12 '16 at 09:09
This also happens with seats in airplanes (or for e.g. packaged holidays where one item in the shopping cart represents a flight there, a rental car, a hotel stay and a flight back), with many different agencies selling the same seats on their respective sites. Solutions are myriad but they all come down to having one final truth database with the actual status for each part somewhere. — RemcoGerlich, Dec 12 '16 at 09:50
@RemcoGerlich: the way you say "one final truth database" makes me think of a single machine with *the big sacred database* on it. In reality, what happens for critical data is rather that all the transactions reach multiple servers at once, ensuring that all those databases are in sync at all times. — Arseni Mourzenko, Dec 12 '16 at 11:01

score 27 · Accepted Answer · answered Dec 12 '16 at 10:50

27

However, if multiple users are being served by separate servers and both try to add the same item to their cart, for which there is only one remaining, there must be some "source of truth" for the quantity left for that item.

Not really. This is not a problem that requires a 100% perfect technical solution, because both error cases have a business solution that is not very expensive:

If you incorrectly tell a user an item is sold out, you lose a sale. If you sell millions of items every day and this happens maybe once or twice a day, it gets lost in the noise.
If you accept an order and while processing it find that you've run out of the item, you just tell the customer so and give them the choice of waiting until you can restock, or cancelling the order. You have one slightly annoyed customer. Again not a huge problem when 99.99% of orders work fine.

In fact, I recently experienced the second case myself, so it's not hypothetical: that is what happens and how Amazon handles it.

It's a concept that applies often when you have problem that is theoretically very hard to solve (be it in terms of performance, optimization, or whatever): you can often live with a solution that works really well for most cases and accept that it sometimes fails, as long as you can detect and handle the failures when they occur.

answered Dec 12 '16 at 10:50

Michael Borgwardt

51,037
13
124
176

2

Pat Helland's [Memories, Guesses, and Apologies](https://blogs.msdn.microsoft.com/pathelland/2007/05/15/memories-guesses-and-apologies/) also covered in [Building on Quicksand](http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf) and [compensating transactions](https://msdn.microsoft.com/en-us/library/dn589804.aspx) are relevant ideas here. – Derek Elkins left SE Dec 13 '16 at 03:47
1

You said "not really" but I feel like you're agreeing with what I suggested. It sounds like what you're saying is that when the user is just browsing, we give a cached approximation of the remaining inventory, but only when they actually try to complete the purchase do we do the write to decrement the remaining inventory. The DB containing that value will execute each transaction atomically, and if two users try at the same time, we show an error message for the second, since this is unlikely to happen. So, there eventually is one integer on a single machine that contains "the truth." – mattgmg1990 Dec 13 '16 at 05:46
2

@mattgmg1990: correct, eventually you of course have to know "the truth" somewhere, but the important difference is that the processing of orders can be done in a queue so you don't need concurrent atomic write access at all. In my case, the "error message" actually came hours after I finished the order on the Amazon website - I got an email saying that they had problems with the supply of that item and I could choose to cancel the order or do nothing and wait for them to fulfill it. I did the latter since I didn't need the item immediately, and they actually delivered it several weeks later. – Michael Borgwardt Dec 13 '16 at 08:29
@DerekElkins that's a great article, especially the point about digital data being a representation of reality that is unavoidably imperfect because reality can always have changes your system cannot automatically know about. – Michael Borgwardt Dec 13 '16 at 08:34

Ewan · Answer 2 · 2016-12-12T10:38:20.823

I have seen the 'Last Item In Stock' problem solved in the following way:

Update all the stock levels daily and flag products as high, low, on order or out of stock categories according to threshold levels.

Obviously its the 'low stock' items which are problematic

Items with high stock levels

Don't bother checking the stock level. Just place the order

Items with low stock levels

Warn the user when browsing 'Last few left!'. when they go to pay, check and decrement the stock level. If its out of stock, Update the item status.

This way you only hit the database for the 'low stock' items and you only do that when the customer is quite far down the process of buying. The cost is that some customers will not be able to complete their purchase.

However, In most cases 'out of stock' really just means you are waiting for another delivery, so you want to accept the order anyway and maybe just pop up a warning or restrict the delivery options. So those customers arent lost.

During high load times such as sales, you might even turn the stock checking off and just email customers later, 'sorry we ran out of X, would you like Y'

Essentially the aim of any ecommerce platform is never read from the database. Always serve cached pages and do everything client side.

Michael Durrant · Answer 3 · 2016-12-12T10:28:25.530

6

A combination of

hashing
sharding
replication
distribution
high fail-over
key-value stores

There's no magic, just more and more complex situations. Just like DNS, it is made to scale.

The 'single version of the truth' is part of such systems. Generating a new key becomes a more complex operation than just generating the next number in the sequence. For example other sequences exist. This is the sort of complexity that distributed database systems can handle and they do it by making several operation to and from components when making new objects, making them available to others, ensuring that sequences are unique when they need to be, composite keys, etc.

edited Dec 12 '16 at 10:28

answered Dec 11 '16 at 23:09

Michael Durrant

13,101
5
34
60

I've read about each of these concepts but the part I keep getting stuck on is the specific scenario of remaining inventory. If there's only 5 books left, and users making requests on multiple servers, do they always resolve to a single database table when it comes time to query the remaining inventory to ensure no two users can get the last book at the same time? What specific use of the above is making it so that this isn't slowing down the whole system and replication can still be useful with multiple DB instances? – mattgmg1990 Dec 12 '16 at 01:18
Added a bit more. i can't really explain all the complexity involved in this format, sorry. – Michael Durrant Dec 12 '16 at 01:30
1

Only some people are interested in the any given book, this means, book can be handled by a shard with relatively small load. – Basilevs Dec 12 '16 at 02:02
Thanks Micheal I don't have the complete picture yet but getting there. @Basilevs that makes a lot of sense to me, any individual book can be served by just one DB. What about some extremely popular product though, like Amazon Echo? There could be thousands of people browsing that page every second. – mattgmg1990 Dec 12 '16 at 05:38
@mattgmg1990, that's handled by caching and replication - front-end provides a cached, less consistent copy of a page generated by back-end occasionally. Front-end is multiplexed. With greate load growth exact consistency is not so important - risks of unsynchronized stock data are negligible. – Basilevs Dec 12 '16 at 05:55
6

I think in the scenario you're describing the system just has to apologize to the user that someone else bought the last copy. I imagine this does occur from time to time. – Matthew James Briggs Dec 12 '16 at 08:55
1

I bet that _there's only 5 books left_ indicator is less computing and more marketing. – mouviciel Dec 12 '16 at 08:57
If they really only have one left, you are probably one of the few people on earth considering buying it. I have seen the "one left" notice persist for days. No one else wanted that artist / CD combination, or whatever. This is a frequent even in my life. Also that my desired products are discontinued for lack of interest. – Apr 10 '17 at 14:29

score 2 · Answer 4 · answered Dec 12 '16 at 20:41

In this video, Martin Fowler discusses NoSQL databases:

https://www.youtube.com/watch?v=qI_g07C_Q5I

One of the points (somewhere in there), is that places like Amazon would rather keep 99% of people happy by accepting their order without being able to check "for sure" whether it's actually available, and maybe irritate a very small percentage by having to say "sorry, looks like someone beat you to it."

Which is to say, there's no real handling for the scenario you describe, just that Amazon takes the benefit of the doubt based on the last successful inventory read, and if a concurrent transaction slipped in between - oopsie.

(btw, that's a great video if you're curious about NoSQL)

How does a company like Amazon avoid bottlenecks accessing the database layer?

4 Answers4

Linked