5

The problem of heat dissipation in high-performance, small form-factor SSDs is well known. For example, the paper "Transient Thermal Analysis for M.2 SSD Thermal Throttling", published at the 2018 17th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, states:

Solid State Drive (SSD) technology continues to advance toward smaller footprints with higher bandwidth and adoption of new I/O interfaces in the PC market segment. Power performance requirements are tightening in the design process to address specific requirement along with the development of SSD technology. To meet this aggressive requirement of performance, one major issue is thermal throttling. As the NAND and ASIC junction temperatures approach their safe operating limits, performance throttling is triggered and thus power consumption would drop accordingly.

Naturally, if space allows, adding a large heatsink is one solution to this problem, and many such products are available on the PC gaming market. I also see that many passive M.2 to PCI-E adapters have a built-in heatsink in the form of a large copper pour under the M.2 connector, connected to the ground plane.

But one can find many unsourced posts on computer hardware forums which claim that the NAND chips should never be cooled. It is claimed that they are actually designed to heat themselves up to an optimum operating temperature, and that adding a heatsink to the NAND chips adversely affects their reliability. Here are some examples.

One claim reads,

Don't cool the NAND dies themselves!

They heat themselves up to operating temperature by design, cooling them means they just continually dump out power trying to hit temperature, and will be operating with a lower endurance (simplified: higher operating temperature = lower energy input to set/erase cells = less degradation of each cell per write/erase cycle).

Another claim reads,

Cooling the NAND is bad. You want the NAND to run warm and stay warm. As its temperature fluctuates, and as it cools down, if you suddenly transfer a large file (read or write, I can't remember) while the NAND hasn't had time to warm back up first, it can significantly reduce the life of the NAND.

It doesn't sound right to me. It suggests that the NAND chips depend on self-heating to reach an optimum operating temperature, which is something I've never heard of before. The only chips I know of that use self-heating are National's LM199/299/399 "Super Zener" voltage references and Linear Technology's LT1088 Thermal RMS-DC Converter. I don't believe NAND chips have anything to do with self-heating.

I tried to fact-check and/or debunk these statements, starting by looking for the datasheet of a NAND chip found in some recent SSDs. I went to Digikey and Mouser, set the filter to the highest storage density, and sorted the results by price. Unfortunately, it seems that datasheets are not available (all under NDA? Am I looking in the wrong place?).

Do these strange statements have any factual basis?

比尔盖子
  • 2
    Please cite the reference for the claim you posted - add a hyperlink. – Andy aka Jun 10 '20 at 12:08
  • https://www.anandtech.com/show/15182/gelid-unveils-subzero-m2-xl-a-diy-cooling-system-for-m222110-ssds It is a user contribution. – Bart Jun 10 '20 at 12:40

3 Answers

8

The 2018 paper "Influence of temperature of storage, write and read operations on multiple level cells NAND flash memories" shows the following graph, which suggests that writing to flash cells at a temperature of 25°C or lower results in read problems appearing earlier than when the cells are written at 85°C.

In their discussion, the authors give the following reasoning:

Most NAND Flash memories implement the Fowler-Nordheim tunneling effect 1 in order to inject charges through the floating gate [7] during write operation. During write cycles, the programming circuit controls the charge of cells to ensure a sufficient margin of voltage threshold. It is assumed that the writing management circuit probably drifts with low temperatures. Indeed, transistor parameters (threshold voltage and gain) vary with temperature which in turn induces drain current shifts.

And in the conclusion they summarize:

Write operations at low temperatures lead to a decrease in data retention time, probably not due to a degradation of the cell but due to parametric drifts of the die embedded electronics dedicated to write operations.

This may explain why the comment cited in the question says that.
But in practice I would assume that this effect is not relevant, because better cooling of the flash will simply give the flash controller more headroom for higher performance while keeping the same temperature (assuming cooling with a traditional heatsink). After seeing the above measurements I would NOT cool my SSD with LN2, though.

[Image: graph from the paper, not reproduced here]
https://doi.org/10.1016/j.microrel.2018.06.088

jusaca
  • 1
    Thanks for the post! Adding a copper pour connected to the ground plane on a PCI-E adapter is extremely unlikely to cool a NAND chip to 25 °C (whether it's junction or ambient temperature), I'd say the case is closed and they're perfectly safe designs. – 比尔盖子 Jun 10 '20 at 13:09
  • That's an interesting find there. It clearly shows that the effect of high temperature dominates over the stress induced by writing at lower temperatures. Exactly the paper I was looking for but didn't find. – Arsenal Jun 10 '20 at 13:12
2

First, I don't think flash chips actively heat themselves (at least, I haven't encountered any that do). They just get hot when they are used, as a CPU does. Power consumption and temperature measurements of SSDs also seem to indicate this (otherwise they would draw quite a bit of power at idle to stay at the optimal temperature).

I think the claim looks at only one thing and not at the whole picture. This article on EEWEB shows two things to consider: Raw Bit Error Rate (RBER) and data retention. There is a definite increase in the RBER with decreasing temperature, but at the same time data retention decreases significantly at higher temperatures.

There are probably different failure modes at work which contribute differently at different temperatures. Seeing that data retention is significantly reduced at elevated temperatures (above 55°C), I'd try to keep it relatively cool.

The data suggest there is no negative effect from staying at room temperature, where data retention is still a lot better than at higher temperatures.
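To get a feel for how strongly temperature can affect retention, here is a minimal sketch using an Arrhenius acceleration factor, which is one common way to model thermally activated charge loss. Neither the EEWEB article nor this answer states that this model applies to any particular NAND part. The function name and the 1.1 eV activation energy are assumptions chosen for illustration only.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_cool_c, t_hot_c, activation_energy_ev):
    """Factor by which a thermally activated process runs faster
    at t_hot_c than at t_cool_c (both in degrees Celsius)."""
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_cool_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


# Assumed activation energy (hypothetical, for illustration only).
ASSUMED_EA_EV = 1.1

factor = arrhenius_acceleration(25.0, 55.0, ASSUMED_EA_EV)
print(f"Charge loss at 55 C vs 25 C: about {factor:.0f}x faster")
```

Under these assumptions, a modest rise in storage temperature shortens retention by a large factor, which is consistent with the preference for keeping the drive relatively cool.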

It's surprisingly hard to find anything on retention and endurance at temperatures below 25 °C.

Arsenal
-1

How true is this claim?

In my opinion: this is completely untrue.

As far as I know there is no "optimum" temperature; the highest temperature at which the chips will work is limited by the temperature control on the chip. The chips slow themselves down when the maximum operating temperature is reached. Slowing down means less power consumption and therefore less heat generation, so the temperature of the chips stabilizes at a maximum value that is controlled by the chip.

When a heatsink is attached, more power can be dissipated in the chips while they are at that same temperature. A heatsink will help to remove the heat from the chips. As more power can be dissipated the chips can run at a higher speed.
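The argument above can be sketched with a simple lumped thermal-resistance model, where the steady-state temperature is the ambient temperature plus power times thermal resistance. This is only a rough illustration: the function names and every numeric value below (throttle limit, ambient temperature, thermal resistances) are hypothetical and not taken from any datasheet.

```python
def steady_state_temp(power_w, t_ambient_c, r_theta_c_per_w):
    """Steady-state chip temperature in a lumped thermal model."""
    return t_ambient_c + power_w * r_theta_c_per_w


def max_sustained_power(t_limit_c, t_ambient_c, r_theta_c_per_w):
    """Largest power the chip can dissipate without exceeding
    its throttle limit."""
    return (t_limit_c - t_ambient_c) / r_theta_c_per_w


# Hypothetical values, chosen only for illustration.
T_LIMIT_C = 70.0     # assumed throttle threshold
T_AMBIENT_C = 40.0   # assumed air temperature inside the case
R_BARE = 15.0        # assumed thermal resistance without heatsink, C/W
R_HEATSINK = 7.5     # assumed thermal resistance with heatsink, C/W

p_bare = max_sustained_power(T_LIMIT_C, T_AMBIENT_C, R_BARE)
p_sink = max_sustained_power(T_LIMIT_C, T_AMBIENT_C, R_HEATSINK)

print(f"bare drive: {p_bare:.1f} W sustained at {T_LIMIT_C:.0f} C")
print(f"with heatsink: {p_sink:.1f} W sustained at {T_LIMIT_C:.0f} C")
```

In this model the throttled chip sits at the same limit temperature in both cases; halving the thermal resistance only doubles the power it can sustain there.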

higher operating temperature = lower energy input to set/erase cells = less degradation of each cell per write/erase cycle

I doubt that the amount of energy needed to write to a cell depends so much on temperature. Also, this seems to relate only to writing to cells and not reading. It has been proven that cooling an SSD increases its read performance.

The writer of the article (you should include a link to where you found it) does not seem to understand very well how flash memory works.

Bimpelrekkie
  • Your personal opinion doesn't matter at all and it's terribly wrong. You're mixing up what the controller and the actual NANDs do. If you would've read the paper you would know your last phrase is just wrong too. You are filling your lack of knowledge with "opinions" and then even dare to discredit the op? – duketwo May 20 '21 at 01:08