Back in the early 2000s I remember asking about why it was so important that servers use ECC memory. The prevailing wisdom at the time was that systems with lots of RAM would be, statistically, more likely to suffer bitflips. This makes sense - if each cell has a 10-20 probability of suffering a bitflip per second, then 109 cells have a 10-11 probability per second. The more cells you have, the higher the probability of a bitflip in any given time period.
Back then we would be looking at a ballpark of 128MB to 1GB of RAM. These days we regularly put 16GB or more in laptops and desktops, with workstations commonly having 64GB or more. For argument's sake, let's say we've increased total RAM amounts by two orders of magnitude. So we should see a hundred times or so more bitflips, on average, in any given system, assuming that nothing else changed.
The more I thought about it, though, the more I realised that the random bitflip rate should be much higher in newer systems:
- Lower operating voltages means lower distinction between a 0 and a 1.
- Lower gate charge means less overall energy required to flip a bit.
- More densely packed gates increases the likelihood of being affected by cosmic rays.
- Refresh times don't seem to have gone anywhere. DDR2 tRFC was 40-60 clocks, DDR3 tRFC was more like 90-130 clocks, and DDR4 tRFC is more like 200-450 clocks. When you divide by the internal memory clock rates to get a wall time for each refresh timing it doesn't really show much of a trend - it's effectively flat but with a higher margin either way as time goes on.
But, as far as I know, we're not seeing bitflips everywhere on non-ECC RAM, at least within the confines of our atmosphere.
So, what's the deal? Why aren't we seeing endless bitflips everywhere, at least 100x if not 10000x more frequently than two decades ago? Is ECC actually important in the context of growing RAM sizes, or do the stats not back it up? Or is there some other technology advance that is mitigating bitflip problems in non-ECC memory? I'm particularly interested in answers with authoritative references rather than speculation about error rates.