After reading this excellent question, it seems that thermal issues are currently one of the largest limiting factors in CPU design. What specifically is preventing forward progress with heat removal? Shouldn’t it be relatively simple to build better cooling mechanisms?

Why is it so hard to build cooler CPUs to begin with?

dalearn
  • Energy density inside a 120 W CPU with, say, a 1 mm square die is comparable to the energy density inside a fission reactor. – crasic Jul 29 '20 at 21:30
  • Die more like 20mm sq than 1mm sq. (Mostly cache, but...) –  Jul 29 '20 at 21:40
  • Maybe it is not a technology issue, but an economics issue? Surely technology exists to conduct heat away from CPU cores to the heatsink more efficiently (e.g. using artificial diamond as a substrate), but maybe manufacturing it would make that single CPU so expensive that it's just cheaper to buy a rack filled with computers that have matching processing power. It also needs other infrastructure and standards around it, e.g. motherboards that can provide 240W to a CPU instead of 120W, heatsinks that can dissipate double the power, and power supplies that can provide double the power on the CPU rails. – Justme Jul 29 '20 at 21:52
  • Infrastructure - imagine a few million gamers, who don't know which end of a screwdriver to hold in their hand, having to change the radiator fluid in their tower PC ... – AnalogKid Jul 29 '20 at 21:59
  • The performance increase from TSMC's 7 nm litho has given AMD an advantage over Intel; it reduces Miller capacitance, which affects the heat dissipated per switched gate. Intel's Coffee Lake S is 14 nm litho with a 149 mm² die and unknown layer thickness, both of which affect the thermal conductivity limit of ~1 °C/mm²-W ±50% (est.), depending on whether it is water or air cooled. AMD's Ryzen 3900X (12C, 7 nm) now has the advantage with 105 W max and 142 W peak. – Tony Stewart EE75 Jul 29 '20 at 22:14
  • It's not so much that CPUs produce heat that is hard to remove; it's that they waste energy, and that costs money. The more you have to cool a CPU, the more money you need to spend (LN2 gets expensive). – Voltage Spike Jul 29 '20 at 22:19
  • By cooling mechanisms you are probably thinking of heatsinks, fans, and water cooling, but those all go on the outside of the chip and it takes time for heat to travel from inside to outside. If heat is being generated in the chip faster than it can be removed, it accumulates inside the chip until the chip heats up enough so the increased thermal gradient is shoving heat out of the chip as fast as it is being generated. How do you fit those cooling mechanisms where they need to go which is basically everywhere inside the chip next to every heat source? And how do you lead the heat out? – DKNguyen Jul 29 '20 at 22:21
  • The alternative is to increase surface area, but then everything is farther apart, so distances are larger and everything is slower. Or you fit more stuff into the same area without increasing power density, but when you're down to layers only atoms thick, with increasing leakage, density, and capacitance, that's kind of difficult to do. – DKNguyen Jul 29 '20 at 22:22
  • It's easy to build a cooler CPU. Use a larger gate size, run it at a lower clock rate, and generally reduce the die size by simplifying the design. You end up with something like the Microchip PIC range that can run at 1W. – Simon B Jul 29 '20 at 22:22
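
To put numbers on the point made in the comments above (heat accumulates until the thermal gradient pushes it out as fast as it is generated), here is a minimal lumped-model sketch. It treats the whole chip-plus-cooler path as a single thermal resistance and a single thermal capacitance, which is a strong simplification. The 120 W figure comes from the comments; the thermal resistance, thermal capacitance, and ambient temperature are hypothetical values chosen only for illustration.

```python
def steady_state_temp_c(power_w, r_th_k_per_w, t_ambient_c):
    """Temperature at which heat flowing out equals heat generated."""
    return t_ambient_c + power_w * r_th_k_per_w


def simulate_temp_c(power_w, r_th_k_per_w, c_th_j_per_k, t_ambient_c,
                    dt_s=0.01, duration_s=200.0):
    """Forward-Euler integration of C * dT/dt = P - (T - T_ambient) / R."""
    temp_c = t_ambient_c
    for _ in range(int(duration_s / dt_s)):
        # Heat leaving grows as the chip gets hotter than its surroundings.
        heat_out_w = (temp_c - t_ambient_c) / r_th_k_per_w
        # The imbalance between heat generated and heat removed raises the temperature.
        temp_c += (power_w - heat_out_w) * dt_s / c_th_j_per_k
    return temp_c


# Assumed values (not from the thread, except the 120 W power figure).
POWER_W = 120.0        # from the comments
R_TH_K_PER_W = 0.3     # hypothetical junction-to-ambient thermal resistance
C_TH_J_PER_K = 50.0    # hypothetical lumped thermal capacitance
T_AMBIENT_C = 25.0     # hypothetical ambient temperature

if __name__ == "__main__":
    final = simulate_temp_c(POWER_W, R_TH_K_PER_W, C_TH_J_PER_K, T_AMBIENT_C)
    limit = steady_state_temp_c(POWER_W, R_TH_K_PER_W, T_AMBIENT_C)
    print(f"simulated: {final:.2f} C, steady state: {limit:.2f} C")
```

Under these assumed values the temperature settles at about 61 °C. A better external cooler only lowers the resistance on the outside of the package; the part of the path inside the die stays put, which is the difficulty the comments describe.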

1 Answer

It's hard to build cooler CPUs because, to work reliably, the gates, flip-flops, static RAM, and data-movement buses are all

OVERDESIGNED

for performance margin.

You may need a 7 ps NAND gate. But because of variations in the doping (already a problem in CPUs from 2020 and earlier), you design it for, say, 4 ps ± 1 ps. The gate has to charge and discharge the metal-to-metal capacitance of its output metallization, which runs very near other pieces of metal that may be switching in the OPPOSITE DIRECTION. Our little gate therefore has to do the work of two gates as far as handling parasitics is concerned (this is a matter of routing, correctable if the human or tool bothers to detect the timing challenge).

And the GROUND metal bounces up and down, because OTHER gates are also busy at the same time (or within 10 or 20 picoseconds of the same time), and their demand for charge causes ground transients.

Ditto for the VDD metal.

This bouncing of GND and VDD requires MORE OVERDESIGN.
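
As a rough sketch of how that margin gets eaten up, the arithmetic can be written out as below. The 7 ps requirement and the 4 ps ± 1 ps design target are the example figures from above; the crosstalk and supply-bounce derating factors are hypothetical numbers picked only to illustrate the idea, not values from any real process or tool.

```python
def worst_case_delay_ps(nominal_ps, tolerance_ps,
                        crosstalk_derate=1.0, supply_derate=1.0):
    """Slowest plausible gate delay: slow process corner times each derating factor."""
    return (nominal_ps + tolerance_ps) * crosstalk_derate * supply_derate


def slack_ps(required_ps, nominal_ps, tolerance_ps,
             crosstalk_derate=1.0, supply_derate=1.0):
    """Positive slack means the gate still meets timing in the worst case."""
    return required_ps - worst_case_delay_ps(
        nominal_ps, tolerance_ps, crosstalk_derate, supply_derate)


REQUIRED_PS = 7.0        # from the example above
NOMINAL_PS = 4.0         # from the example above
TOLERANCE_PS = 1.0       # from the example above
CROSSTALK_DERATE = 1.2   # hypothetical: neighbor switching the opposite way
SUPPLY_DERATE = 1.1      # hypothetical: GND/VDD bounce

if __name__ == "__main__":
    # Process variation alone leaves comfortable slack.
    print(slack_ps(REQUIRED_PS, NOMINAL_PS, TOLERANCE_PS))
    # Crosstalk and supply bounce consume most of it.
    print(slack_ps(REQUIRED_PS, NOMINAL_PS, TOLERANCE_PS,
                   CROSSTALK_DERATE, SUPPLY_DERATE))
    # A gate designed for exactly 7 ps nominal would fail in the worst case.
    print(slack_ps(REQUIRED_PS, 7.0, TOLERANCE_PS,
                   CROSSTALK_DERATE, SUPPLY_DERATE))
```

With these assumed factors the 4 ps gate is left with roughly 0.4 ps of slack, while a gate sized for exactly 7 ps would miss timing. The faster gate needs more drive, and that extra drive is where the extra power and heat come from.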

The result is an MCU that is reliable for many years and tolerates some variation in supply voltage, but may be 5:1 overdesigned. But the MCU is dependable in its state-machine behavior.

But it is overdesigned.

But the state changes can be trusted.

But it is overdesigned.

analogsystemsrf
  • I take it that they have to be intentionally overdesigned just to have good yields that can run for decades with adequate cooling. –  Jul 29 '20 at 23:55