CPU clocks have gotten faster than RAM, so a computer has to wait a certain number of clock cycles between sending an address and receiving the data. But how does the computer know how many clock cycles to wait? What does it do during those in-between cycles? And how does the computer manage the timing if you change the clock speed (overclocking)? Wouldn't changing the clock speed completely mess up your computer, since the RAM only works at a particular speed?
-
This question is essentially a copy of the question "What is the precise use of a memory controller and ram latency" which has already been answered. It may answer much of this question. – May 04 '16 at 20:02
-
https://en.wikipedia.org/wiki/Clock_domain_crossing This is the main thing you're concerned with. Other than that, there's prefetching and caching that attempt to limit the latency of memory to the cores. In the interim, the cores can task switch to something else or go to sleep for power savings. – horta May 04 '16 at 20:33
-
CPU clocks have been faster than RAM for almost ever. DRAM is about 133 MHz (last time I checked, it hadn't gotten faster in about ten years); the 2133 MHz and similar figures are not the RAM speed but the bus speed. The turnaround time is still an eternity. How do computers deal with it? With caches, or they simply stall for hundreds of clock cycles while they wait. – old_timer May 05 '16 at 01:24
-
In the x86 world, I think the last time the clocks matched was in the 386 days, when you bought an SRAM chip and had to buy one to match the CPU speed. After that, I think, the CPU was faster and always will be: we can always run faster inside the package than the I/O can tolerate, so you would have to clock way down inside the package just to deal with the I/O, let alone the slowness of DRAM. – old_timer May 05 '16 at 01:25
-
Busses haven't been the old-fashioned read/write, chip-select, address, data style for a long time either. Even accesses to the caches take a number of clock cycles, so even if the cache responds in one cycle, it is still several at best. – old_timer May 05 '16 at 01:28
-
Overclocking just means you adjust the PLL for the CPU clock; there are other PLLs for the other interfaces. If it were one global clock or clock multiplier, PCIe wouldn't work, which means nothing else would either: video, USB, disks, etc., not just RAM. – old_timer May 05 '16 at 01:29
-
CPUs have some local cache (L1, L2, L3) to tide the processor over until the data arrives. As you go down in levels, the cache gets progressively smaller but insanely fast (some L1 caches on big CPUs scream along at many terabytes per second). The CPU often grabs big chunks of RAM in one hit anyway and dumps them in its own cache; that way it has enough data to last until the next external memory access. – Sam May 05 '16 at 02:04
-
One thing to consider: RAM *can* be as fast as a CPU, but the I/O constraints of economical parts hurt you. The fastest parts are not made on Si substrates, and don't use voltage for I/O but ECL-type current couplers. If you have dataflow applications, where cache does not help you, you can throw 100k at the problem and get CPUs that operate at clock parity with memory. – b degnan May 05 '16 at 10:50
4 Answers
how does a computer know how many clock cycles to wait?
It is hardcoded somewhere. On embedded hardware, it is hardcoded in the firmware. On older computers, it was hardcoded in the BIOS, because all DIMM modules behaved the same. At some point, it became configurable on some computers, in some obscure option menu within the BIOS configuration utility. Now, each DIMM module contains a small serial ROM called the Serial Presence Detect (SPD), so the BIOS can obtain the latency configuration from the DIMM module itself. It is now dynamic, but still hardcoded in the DIMM module.
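For the curious: the SPD ROM is an ordinary I2C EEPROM, and on Linux you can poke at it directly (the decode-dimms tool from i2c-tools does this properly). A minimal sketch, assuming a DDR2/DDR3-style SPD part answering at the conventional address 0x50 on bus /dev/i2c-0; both vary by machine, and DDR4 SPD chips add a paging scheme not handled here:

    /* Hypothetical sketch of reading one SPD byte on Linux. Assumes a
     * DDR2/DDR3-style SPD EEPROM at the conventional I2C address 0x50
     * on bus /dev/i2c-0, and that the i2c-dev driver is loaded. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/i2c-dev.h>

    int main(void)
    {
        int fd = open("/dev/i2c-0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        if (ioctl(fd, I2C_SLAVE, 0x50) < 0) {  /* first DIMM slot, usually */
            perror("ioctl"); return 1;
        }

        unsigned char reg = 2;   /* SPD byte 2: memory type per JEDEC  */
        unsigned char val;       /* (e.g. 0x0B = DDR3, 0x0C = DDR4)    */
        if (write(fd, &reg, 1) != 1 || read(fd, &val, 1) != 1) {
            perror("spd"); return 1;
        }
        printf("SPD byte 2 (memory type) = 0x%02x\n", val);
        close(fd);
        return 0;
    }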
What does the computer do during the in between clock cycles?
You mean during the DRAM latency clock cycles? If it absolutely requires the data to proceed, it waits. But processors are now smart enough to do instruction prefetch (they request the opcodes from RAM a few cycles in advance to compensate for the latency), and sometimes even data prefetch (when sequentially accessing a big RAM area; this may require that the developer explicitly request it in code, using special instructions). Moreover, the cache hierarchy between the RAM and the processor most often hides this latency. But when prefetch fails (for example, because of a conditional jump that was not predicted correctly) and the data is not available in cache, the processor has to wait. Hopefully, this does not happen too often.
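On mainstream compilers, those "special instructions" are usually reached through an intrinsic. A minimal sketch using the GCC/Clang __builtin_prefetch builtin; the distance of 16 elements ahead is an arbitrary illustrative value, and the right distance depends on the machine's memory latency and the work done per element:

    #include <stddef.h>

    long sum(const long *a, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i++) {
            /* Ask for a[i + 16] now so it is (hopefully) in cache by
             * the time the loop gets there. Prefetching past the end
             * of the array is harmless: it is only a hint. */
            __builtin_prefetch(&a[i + 16], 0 /* read */, 0 /* low locality */);
            s += a[i];
        }
        return s;
    }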
And how does your computer manage the timing if you change the clock speed (overclocking)? Wouldn't changing the clock speed completely mess up your computer, since the RAM only works at a particular speed?
I don't know the details of overclocking in the PC world very well, so I'll try not to say anything wrong. But I think the RAM and the CPU cores do not share the same clock base, so when you overclock the cores, the RAM controller keeps working at its original speed and there is no problem (there are asynchronous interconnects between the two, so they can work with independent clocks). Now, I think you can also overclock the RAM itself. In that case, past a certain limit, you may have to increase the latency settings to compensate, so the RAM still has enough time to get the bits out.
In general, the CPU simply waits until it gets a positive reply from the memory subsystem: "here is the data you wanted earlier." It will usually wait a fairly long time before flagging an error, or even wait indefinitely.
You still get sensible performance out of the system by a combination of multithreading, prefetching and caching.
Multithreading, for example Intel's "HyperThreading", means that the CPU will execute two programs on one CPU core, and when one is stuck waiting for memory, the other can execute. Of course, if both are running at full speed, each gets only half the regular CPU speed; but for workloads with lots of unpredictable RAM accesses, this helps increase CPU utilization.
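The latency-hiding idea can be sketched in ordinary single-threaded code too (an analogy for SMT, not how it is implemented): one dependent chain of loads pays every cache miss in full, while two independent chains, interleaved, let the memory system service misses in parallel. Assumes POSIX clock_gettime; the sizes are arbitrary:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1L << 22)   /* 4M entries, ~32 MiB: big enough to miss */

    static void make_cycle(size_t *next, size_t n)
    {
        for (size_t i = 0; i < n; i++) next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {  /* Sattolo: one full cycle */
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
    }

    static double seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        size_t *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
        make_cycle(a, N);
        make_cycle(b, N);

        double t0 = seconds();
        size_t p = 0;
        for (long k = 0; k < 2 * N; k++) p = a[p];           /* one chain  */
        double t1 = seconds();
        size_t q = 0, r = 0;
        for (long k = 0; k < N; k++) { q = a[q]; r = b[r]; } /* two chains */
        double t2 = seconds();

        volatile size_t sink = p + q + r;  /* keep the loads alive */
        (void)sink;
        /* Same total number of loads in both runs. */
        printf("one chain : %.2f s\ntwo chains: %.2f s\n", t1 - t0, t2 - t1);
        free(a); free(b);
        return 0;
    }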
Prefetching works by either guessing or explicitly telling the memory controller what memory will be read soon, and issuing the read requests in advance in the hope that the reply will be there in time. The most primitive implementation assumes that if a program is executing, it will most likely continue to read instructions, so large blocks can be read in advance. Some architectures, like the Itanium, have explicit prefetch instructions.
Caching keeps frequently accessed data in faster, nearer memory; alongside it, statistics are kept in order to retain the most relevant data while evicting other stuff that is no longer needed.
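To make the eviction idea concrete, here is a toy sketch, not any real CPU's policy: a four-entry, fully associative "cache" with least-recently-used replacement, where the statistics are just a last-use timestamp per line (real hardware approximates LRU with a few bits per set):

    #include <stdio.h>

    #define WAYS 4

    struct line { int valid; unsigned tag; unsigned long last_use; };

    static struct line cache[WAYS];
    static unsigned long now;   /* logical time, bumped on every access */

    /* Returns 1 on a hit, 0 on a miss (after installing the new tag). */
    int access(unsigned tag)
    {
        now++;
        int victim = 0;
        for (int i = 0; i < WAYS; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                cache[i].last_use = now;          /* refresh recency */
                return 1;
            }
            /* track the least recently used (or still invalid) entry */
            if (!cache[i].valid ||
                cache[i].last_use < cache[victim].last_use)
                victim = i;
        }
        cache[victim] = (struct line){1, tag, now};  /* evict + fill */
        return 0;
    }

    int main(void)
    {
        unsigned refs[] = {1, 2, 3, 1, 4, 5, 1, 2};
        for (int i = 0; i < 8; i++)
            printf("tag %u: %s\n", refs[i], access(refs[i]) ? "hit" : "miss");
        return 0;
    }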
Multithreading, prefetching and caching are generally combined; for example, the prefetcher will generally allocate cache entries, so there is only one piece of logic that handles memory accesses from the CPU core and decides what to send to the memory bus. There is also a primitive processor that interprets the instruction stream and tries to predict jumps: short backward jumps become hints to keep the cache entries containing the jump target, while longer jumps set a new target for the instruction prefetch mechanism.
The actual implementation contains thousands of other small optimizations to make these processes interact well. Quite often there is dedicated logic to handle C++ virtual tables efficiently: here, the jump target address is itself read from memory, so the prefetch logic needs to resolve that load before the instruction stream actually reaches the CPU.
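A C sketch of what a virtual table boils down to (hypothetical types, purely for illustration); the point is that the call target comes out of two dependent memory loads, so the front end cannot know where the jump goes until those loads finish:

    #include <stdio.h>

    struct animal;
    struct vtable { void (*speak)(const struct animal *); };
    struct animal { const struct vtable *vt; const char *name; };

    static void bark(const struct animal *a) { printf("%s: woof\n", a->name); }
    static void meow(const struct animal *a) { printf("%s: meow\n", a->name); }

    static const struct vtable dog_vt = { bark };
    static const struct vtable cat_vt = { meow };

    int main(void)
    {
        struct animal pets[] = { { &dog_vt, "Rex" }, { &cat_vt, "Tom" } };
        for (int i = 0; i < 2; i++)
            /* Two dependent loads (pets[i].vt, then ->speak) must finish
             * before the indirect jump target is known. */
            pets[i].vt->speak(&pets[i]);
        return 0;
    }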

Most memory accesses will be answered from the L1/L2/L3 caches.
As for overclocking, systems that allow it have separate settings for the RAM. You can also overclock the RAM, which may or may not work, or leave it running at a lower speed. This setting is often called the "front side bus" speed, although that term relates to an earlier motherboard design.
A PC BIOS will have detailed RAM timings, which it initialises by asking a small I2C chip attached to each RAM module. Non-PC systems work similarly, or if the RAM is soldered to the motherboard will hardcode the timings in the firmware.

High-speed CPUs typically have on-board cache that can operate at the CPU's own clock speed. "High-speed" here refers to CPUs that operate at much higher clock speeds than their main memory.
This cache can be divided into several levels, with each lower level sitting closer to the CPU. The closer the cache is to the CPU, the faster it is; and the faster it is, the smaller it is. That's always the trade-off with memory: for the same price, you can get big, or you can get fast. If you want big and fast, you'll have to shell out some major dollars to get it.
On your typical x86 system, the memory hierarchy looks something like this:
CPU
|
L1 Cache
|
L2 Cache
|
L3 Cache
|
Main RAM
|
Persistent Storage (HDD, SDD, etc)
L1 through L3 are RAM physically on the processor die, and their contents are copies of small sections of main RAM.
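You can actually watch these levels from software. A rough sketch, assuming a POSIX environment: chase a pointer chain laid out as one random cycle over working sets of growing size and time the average access. The sizes and iteration count are arbitrary, and you should see latency steps roughly where the working set outgrows each cache:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double chase_ns(size_t n, long iters)
    {
        size_t *next = malloc(n * sizeof *next);
        for (size_t i = 0; i < n; i++) next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {  /* Sattolo: one full cycle */
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t p = 0;
        for (long k = 0; k < iters; k++) p = next[p];  /* dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        volatile size_t sink = p;  /* keep the chase from being removed */
        (void)sink;
        free(next);
        return ((t1.tv_sec - t0.tv_sec) * 1e9
                + (t1.tv_nsec - t0.tv_nsec)) / iters;
    }

    int main(void)
    {
        /* 16 KiB fits in a typical L1; 64 MiB spills well past L3. */
        for (size_t kb = 16; kb <= 64 * 1024; kb *= 4)
            printf("%6zu KiB: %5.1f ns/access\n",
                   kb, chase_ns(kb * 1024 / sizeof(size_t), 5000000L));
        return 0;
    }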
There is also hardware that, among other things, moves data between the various caches and RAM: the cache controllers, helped by the prefetcher, plus the Memory Management Unit (MMU), which translates the virtual addresses programs use into physical RAM addresses. Usually this machinery does a good job of predicting what the CPU is going to ask for next, so when the CPU asks for some piece of data, it is already in the L1 cache and can be immediately read into the correct register.
However, there are cases where the hardware gets caught off guard and doesn't have the data in any cache. This is known as a "cache miss", and it is handled entirely in hardware: the pipeline simply stalls (or executes other, independent instructions) while the data is fetched from the next level down; no interrupt is raised. An interrupt (a page fault) is raised only when the MMU finds that the data is not in RAM at all, for example because it has been swapped out to disk. That lets the software/firmware/operating system know it's going to be a while before the data for this process is available.
At that point, it's up to the operating system to decide what to do next. Typically, the OS will switch processes and come back to the one that caused the page fault once the needed data is available. All of this is transparent to the application.
And how does your computer manage the timing if you change the clock speed (overclocking)? Wouldn't changing the clock speed completely mess up your computer, since the RAM only works at a particular speed?
Kind of. RAM can be overclocked or underclocked just like a CPU. However, the RAM and the CPU (along with some other pieces) each derive their individual clocks from a common master clock, through their own multipliers and dividers. This helps ensure that all the pieces stay synchronized. (This is a very simplified explanation.)
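To put numbers on that, a back-of-the-envelope sketch; the 100 MHz base clock and the ratios are illustrative values typical of recent Intel-style platforms, not a description of any particular board:

    #include <stdio.h>

    int main(void)
    {
        double bclk_mhz  = 100.0;  /* base clock (assumed value)     */
        double cpu_mult  = 40.0;   /* CPU multiplier -> 4.0 GHz      */
        double mem_ratio = 16.0;   /* memory ratio -> 1600 MHz clock */

        double cpu_mhz = bclk_mhz * cpu_mult;
        double mem_mhz = bclk_mhz * mem_ratio;

        printf("CPU core clock: %.0f MHz\n", cpu_mhz);
        printf("Memory clock  : %.0f MHz (DDR transfers at %.0f MT/s)\n",
               mem_mhz, 2 * mem_mhz);
        /* Raising only cpu_mult overclocks the cores while the memory
         * clock, derived separately, stays put. Raising bclk_mhz drags
         * every derived clock up with it, which is why aggressive base
         * clock overclocks destabilize more than just the CPU. */
        return 0;
    }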
