
I have a commercially-produced board with an Altera EPF10K30-series FPGA. There is an abnormally high current draw on the board. Where +5V should be present, there is only +2.6V.

When comparing thermal images of a known good, working board to the defective board, it is clear the FPGA is getting much warmer on the defective board than it should. But interestingly enough, the heat bloom is around the entire perimeter of the FPGA. What could be causing something like this to happen? If one or two I/O's were shorted, for example, wouldn't the heat flare be around the pin driver and not the whole chip?

I realize the metal piece in the center of the package could be hiding the fact that the whole chip may be getting hot, but it sure seems to be an even heat all the way around it. I'm used to seeing heat from the center where the die is, not around the entire device.

EDIT 08/30/23

I covered the chip with electrical tape to even out the emissivity, and now I can see the whole device heating up nearly evenly (animation added below). The mystery now is why this is happening, rather than a single hot spot as would normally be the case with a shorted I/O.

Link to follow-on question (and answer)

FPGA hot around perimeter

Comparison of a working board and overheating FPGA:

Comparison of two identical FPGAs, one working, one overheating

Altera FPGA

FPGA covered with electrical tape heating up over 52 seconds:

Animation of FPGA heating up

colintd
jfriend
  • 2
    The I/O pads (blocks) are located around the periphery of the chip. They could all be damaged, though not likely, unless something bad happened to the I/O bank voltage. – SteveSh Aug 29 '23 at 21:46
  • I agree, all I/O’s damaged is highly unlikely. So what else can cause this? – jfriend Aug 29 '23 at 23:24
  • Can you verify the clock input to the device? Those FPGAs have a fairly linear relationship between Icc (internal) and Fclk, so current draw and power dissipation would be a lot higher if the clock frequency were higher than anticipated for some reason. – SteveSh Aug 29 '23 at 23:35
  • 1
    Based on previous experience in a normal MCU you could definitely see the pins with a short with the thermal camera as distinct hot spot, which could be used to find the root cause much quicker. So this is looking quite odd. – Arsenal Aug 30 '23 at 07:55
  • I'd say that your pictures look out of focus (or the resolution is quite low). After you tried painting / taping the FPGA, you could also try and play around with the scale, sometimes something important is hidden by a bad auto scale. – Arsenal Aug 30 '23 at 07:57
  • 11
    With the number of circular marks on the chip, **triple-check that it is mounted correct way around**. Compare the text orientation against datasheet specification. I have had even professional PCBA companies fail this. – jpa Aug 30 '23 at 07:58
  • Have you ruled out something simple like a partial short? – Mast Aug 30 '23 at 13:44
  • 2
    My FLIR is low-res at 320x240, so the images will not be sharp. The board has been operating perfectly for over 20 years, so I can confirm the chip is not installed incorrectly! The clock is stable at 10.00MHz, the same as it has been for two decades. – jfriend Aug 30 '23 at 15:26
  • 1
    This would have been good to know at the beginning. – SteveSh Aug 30 '23 at 17:35
  • Could the power supply be damaged? Is it plausible that with only a 2.6V supply, the chip draws *more* current than it would at the 5V it's designed for? Or maybe the chip failed somehow, from some old-age effect? – Peter Cordes Aug 30 '23 at 18:02
  • The P/S is fine. The board is a plug-in card for a larger machine. Swapping out the board with a good one works perfectly; +5V and no heat flare from the FPGA. With the board removed, there is 4.2 ohms between +5V rail and ground on the board. – jfriend Aug 30 '23 at 19:09
  • 2
    As has been suggested (and I fully agree), we're now onto a different problem from the original question about the heating around the perimeter. As such, I'll open a new question as to why the FPGA is not working. – jfriend Aug 30 '23 at 19:11
  • 1
    Just a guess (as to the future question) ... I did S/W for some FPGAs at a company, so I saw some of the H/W issues. [I was told that] a good board design has one voltage regulator chip physically near the FPGA (along with the deglitching caps). Maybe the regulator and/or the cap is (partially) failing (i.e. scope/probe the power on both sides of the regulator). In our case, bad/slow voltage regulation caused the chip to reset frequently (or it seemed like the chip's logic was incorrect). When many gates flipped at once, it needed extra power that couldn't be delivered fast enough – Craig Estey Aug 30 '23 at 22:28
  • That's an interesting idea to consider, thanks for the insight. But in my case there is 4.2 ohms between +5V and GND. It seems the chip is internally shorted, it's not a program hangup or continuous reset. But good point, I like that. – jfriend Aug 31 '23 at 13:21
  • The question related to the heating of the chip in general is here: https://electronics.stackexchange.com/questions/679672/entire-fpga-getting-hot-evenly-not-just-a-single-hot-spot – jfriend Aug 31 '23 at 13:56
  • 1
    Confirmed the chip is shorted, upon removal a blowout was found on the bottom. – jfriend Aug 31 '23 at 21:46
  • 1
    Glad you were able to track this down, and thanks for reporting back. The feedback makes things much more useful to others with similar problems. – colintd Aug 31 '23 at 23:06
  • 1
    X-ray probably would have shown that. If your company doesn't have that capability in house, there are failure analysis labs that do. – SteveSh Sep 01 '23 at 01:58
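
As a rough sanity check on the figures reported in the comments above (a rail sagging to +2.6V and 4.2 ohms measured between +5V and ground), the arithmetic can be sketched as below. This assumes the rail-to-ground path behaves like a fixed 4.2 ohm resistor, which a damaged chip almost certainly does not, and it ignores every other load on the board. The function name `rail_load` is made up for this illustration.

```python
def rail_load(voltage_v, resistance_ohm):
    """Return (current in A, power in W) for an ideal resistive load.

    Assumption: the load is a plain resistor (Ohm's law), which is only
    a first-order stand-in for a shorted or damaged device.
    """
    current_a = voltage_v / resistance_ohm
    power_w = voltage_v * current_a
    return current_a, power_w


# Values taken from the thread: 4.2 ohms rail to ground, rail sagging to 2.6 V.
sagged_current, sagged_power = rail_load(2.6, 4.2)
# Hypothetical case: what the same resistance would draw if the rail held 5 V.
nominal_current, nominal_power = rail_load(5.0, 4.2)

print(f"At 2.6 V: {sagged_current:.2f} A, {sagged_power:.2f} W")
print(f"At 5.0 V: {nominal_current:.2f} A, {nominal_power:.2f} W")
```

Under that assumption the board would be dissipating on the order of a watt and a half at the sagged voltage, and would need well over an amp to hold the rail at 5V, which is at least consistent with a supply that cannot keep the rail up.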

1 Answer


The shiny heat spreader/transfer section in the middle of the chip almost certainly has a much lower emissivity than the rest of the body, so it will read cooler on the IR camera. If you used thermocouples, I suspect you would see a similar temperature across the whole chip.

You can also see how the darker lettering shows as higher temperature, confirming the effect of emissivity.

(This is also why copper busbars run cooler after a couple of years' use. Initially they are shiny, so there is less radiative cooling. After a while they oxidize and darken, and their emissivity goes up significantly, reducing their equilibrium temperature.)
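
To put rough numbers on the emissivity effect, here is a minimal sketch of how a thermal camera configured for one emissivity would misread a surface with a different one. It uses a simplified broadband (Stefan-Boltzmann, T⁴) model and ignores the camera's actual spectral band and the atmosphere, so treat the results as indicative only. The emissivity values (0.10 for a shiny metal slug, 0.95 for tape or a dark package), the 70 °C die temperature, and the function name `apparent_temp_c` are all assumptions made for illustration.

```python
def apparent_temp_c(true_temp_c, surface_emissivity,
                    camera_emissivity=0.95, reflected_temp_c=25.0):
    """Temperature an IR camera would report, in degrees C.

    Simplified model: the camera sees emitted radiation plus reflected
    ambient radiation, then inverts that using its configured emissivity.
    The Stefan-Boltzmann constant cancels, so it is omitted.
    """
    t_obj = true_temp_c + 273.15
    t_refl = reflected_temp_c + 273.15

    # Radiation actually leaving the surface toward the camera.
    signal = (surface_emissivity * t_obj ** 4
              + (1.0 - surface_emissivity) * t_refl ** 4)

    # The camera removes the reflected part it *assumes* and rescales.
    t_app4 = (signal - (1.0 - camera_emissivity) * t_refl ** 4) / camera_emissivity
    return t_app4 ** 0.25 - 273.15


# Same true temperature, two different surface finishes (assumed values).
shiny_slug = apparent_temp_c(70.0, surface_emissivity=0.10)
taped_slug = apparent_temp_c(70.0, surface_emissivity=0.95)

print(f"Shiny metal reads about {shiny_slug:.0f} C")
print(f"Taped surface reads about {taped_slug:.0f} C")
```

In this model the shiny surface reads close to ambient even though it is as hot as the rest of the package, while the taped surface reads the true temperature. That matches the perimeter-only bloom disappearing once the chip was covered.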

colintd
  • Try colouring the top of the chip with a black spirit marker, and I suspect you will see the IR image will even out. – colintd Aug 29 '23 at 22:01
  • 1
    I will do this and report back with the results. – jfriend Aug 29 '23 at 23:25
  • 6
    You may find a bit of thin black tape works even better – colintd Aug 29 '23 at 23:28
  • If it does explain the image, please do mark this as an answer to help others. – colintd Aug 29 '23 at 23:38
  • 1
    Tipp-ex has good emissivity and a nice dull appearance to not reflect too. – winny Aug 30 '23 at 09:43
  • 1
    The emissivity was definitely the reason for the "edge warming" appearance. See the animation I added. I'll mark this as an answer to the original question, although I'm now faced with the question as to why the whole chip is heating up and not just a single point on the die. – jfriend Aug 30 '23 at 15:29
  • 3
    @jfriend - Hi, "*I'm now faced with the question as to why the whole chip is heating up and not just a single point on the die.*" That's a new / different question. Please don't continue with that question *here* as it's different from your original question, answered above, about why the FPGA's perimeter appeared to be the only part getting hot. You should include a link back to this question in that new question, for context but without *relying* on people reading this one. Thanks. – SamGibson Aug 30 '23 at 15:31
  • 2
    Agree there should be a new question. Would be useful to know whether any functions of board still work, or if whole thing dead. Also good to know if current draw has gone up despite voltage going down. Also useful to know exact part number, and whether you have verified program integrity? – colintd Aug 30 '23 at 18:48
  • 4
    I will open a new question about the FPGA's lack of functionality. This "perimeter" question has been solved; it was only uneven emissivity. – jfriend Aug 30 '23 at 19:14