From this answer about latencies, we have some numbers (yes, caveat caveat) for latencies when coding (slightly edited):

  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 7 ns
  • Main memory reference 100 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Netherlands->CA 150,000,000 ns

Do you have a metaphor that puts this in human terms? For example, a human might find a value in a table on their desk (L1) in 15s, so the equivalent time to go find the same thing if the page is still in the book on the shelf (L2) would be 14x longer, or 3.5 minutes.

I know this seems like an ELI5, but I think most programmers (including me) lack a real grasp of how the orders of magnitude vary, making it hard to reason about whether to memoize some function or pre-cache some data. I'll expand the example in an answer.
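
The conversion itself is simple. Here's a quick sketch (my own illustration, not from the original table) using the scaling from the example above, where an L1 reference of 0.5 ns becomes 15 s, i.e. 1 ns of machine time is about 30 s of human time:

```python
# Convert real latencies (in ns) to human-scale times, using the
# scaling from the question: 0.5 ns (L1 reference) -> 15 s,
# so 1 ns of machine time ~ 30 s of human time.
LATENCIES_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from network": 10_000_000,
    "Read 1 MB sequentially from disk": 30_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}

SCALE = 15 / 0.5  # human seconds per real nanosecond

def human_time(ns):
    """Render a scaled latency in the largest sensible unit."""
    seconds = ns * SCALE
    for unit, size in [("years", 365 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)]:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"

for name, ns in LATENCIES_NS.items():
    print(f"{name}: {human_time(ns)}")
```

Running this reproduces the metaphor's numbers: a branch mispredict comes out at 2.5 minutes, a main memory reference at 50 minutes, a disk seek at about 9.5 years.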

If you have a handle on CPU pipelining and cycles instead, that would be a great boon for understanding CPUs. Also mutex performance, SIMD parallelism, multi-threading, multi-core...

Phil H
    This doesn't seem objectively answerable, related to any conceptual programming problems, and is a slightly different take on "name that thing" questions. – Doval May 15 '14 at 11:51
  • @Doval: Hmm. I considered putting that on SO, but it seemed likely to be closed as subjective. Since this is really for programmers, I thought Programmers.SE was more appropriate. If you have a better place for it, let me know, or delete. – Phil H May 15 '14 at 12:38
  • Sorry, I don't know where's the best place to put this question. I do think it's a fairly interesting one, but I'm almost sure it'll eventually be closed as off-topic. – Doval May 15 '14 at 12:41
  • http://meta.programmers.stackexchange.com/questions/6483/why-was-my-question-closed-or-down-voted/6491#6491 – gnat May 15 '14 at 12:50

1 Answer


Expanding the example:

You need the result of a log calculation, log(7.2), but you have only a slide rule and no computer. It takes you:

  • 15s [1x] to look up the value for 7.2 in a table on your desk (L1), because you have the page of 7.1 to 7.2 photocopied.
  • 3.5 mins [14x] for your secretary to go find the same thing from a book on the office shelf (L2), though this is shared with your colleagues (cores) so you might have to wait.
  • 50 mins [200x] for your secretary's assistant to go find the appropriate book from the campus library (Main memory) if you don't have a copy on the office shelf.
  • 9.5 years [2e7 x] for your value to be found in the national archive, which involves setting up a human chain all the way to a remote mountain village (disk seek).
  • 143 years [3e8 x] for your value to be found by sending a rocket to a neighbouring star system and back again (packet request with a 150ms round trip).

As it goes, the galactic gods are happy to wait about 250 years if you have just the one answer to get, but they expect to see some sign of progress (the user expects a response in 250ms).

If you had been doing a few of these and thought the result was going to come out below 0.85 (allowing you to carry on with your work on that basis), then when you realise it's actually 0.857, it will take you 2.5 minutes to erase all the stuff you wrote since that mistake (branch mispredict).

Should you need more than 1 value, say a lot of values from the same book:

  • 87 days [5e5 x] to receive a million or so of these from your campus library sequentially; that works out to about 7.5s each, compared to 50 minutes for getting 1.
  • 7 days [4e4 x] to send a large request for a load of values internationally (local network request).
  • 9.5 years [2e7 x] to receive a million or so of these (local network receive), assuming they have them all to hand and don't need to look in their own national archive (disk seek), which works out to about 5 minutes each.
  • 29 years [6e7 x] to receive a million or so of these from the national archive sequentially; that works out to about 15 minutes each, compared to 9.5 years for getting 1.
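
The amortised per-value numbers above fall out of the same scaling; a hypothetical check (variable names are mine):

```python
SCALE = 30.0        # human seconds per real nanosecond (0.5 ns -> 15 s)
VALUES = 1_000_000  # treat a 1 MB sequential read as roughly a million values

def per_value_seconds(bulk_read_ns):
    """Human-scale cost per value when a bulk read is amortised."""
    return bulk_read_ns * SCALE / VALUES

# 1 MB from memory: 250,000 ns -> 7.5 s per value (vs 50 min for one fetch)
# 1 MB from disk: 30,000,000 ns -> 900 s = 15 min per value (vs 9.5 years
# for one seek)
print(per_value_seconds(250_000), per_value_seconds(30_000_000))
```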

If it seems odd that the national archive takes longer than an international request, then yes, it is odd. But that's how slow hard disks are: it is quicker to ask the next machine whether it has the values to hand (i.e. cached) than to scan the head to the appropriate bit of disk ready to start receiving values.

Hopefully, if the metaphor isn't too tortuous, you can start to see how you might use caching, pre-fetching and so on to avoid having to ask the national archive (disk), or heaven forbid make a network request (Proxima Centauri, prepare for a generational ship in several decades' time). For example, if someone said 'why don't I just pre-calculate all the log results and put them on disk so I can find one if I need it', it should be pretty apparent why it might be quicker to just calculate it every time (hint: 9.5 years!). It should also be clear why it's sometimes still quicker to cache stuff for a big sequential batch (7.5s each on average from RAM (the campus library) instead of, say, 30s to calculate).
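
To make the 'just recalculate it' trade-off concrete, here is a minimal sketch (my own illustration, not from the original answer) of memoising in RAM rather than looking results up on disk:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_log(x):
    # First call computes the value; repeat calls hit the in-process
    # cache -- the photocopied page on your desk, not the archive.
    return math.log(x)

# Recomputing math.log costs tens of nanoseconds, while a disk seek to a
# precomputed table costs ~10,000,000 ns. A cache (or recomputation) only
# pays off when it is cheaper than the lookup it replaces.
value = cached_log(7.2)  # computed
value = cached_log(7.2)  # served from the in-memory cache
```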

Phil H