What are the failure mechanisms in an integrated circuit?

Question

Context

I've always tried to design my circuits so that the chips operate well within their absolute maximum ratings. However, understanding the failure mechanisms is vital for debugging a chip when something does go wrong (usually I end up trying another chip, which costs time).

Question

Hence my question: what are the most common failure mechanisms due to incorrect use (i.e. input/output voltages and currents) of integrated circuits? Details and diagrams are a big plus for this complex question; the ideal answer would link the most common families of inputs/outputs (CMOS etc.) and bad uses (reverse voltage, overcurrent...) to failure effects.

Example

This is a general question, but what triggered it is that, in order to protect an asymmetrically powered device against negative input voltages, it seems that current-limiting resistors are enough (I had parallel Schottky diodes before). Does that mean there is no such thing as a voltage failure, and every failure is current-related (to some extent)? How exactly?

There are so many interesting failure modes for electrical devices: http://en.wikipedia.org/wiki/Electromigration being just one of them. — HL-SDK, Jun 02 '14 at 15:31

horta · Answer 1 · 2014-06-05T14:05:03.167

Two immediate modes of failure are over-voltage and over-current.

If you have a high-impedance input like the gate of a MOSFET, then a high voltage (even at very low current) will puncture the capacitive gate of the MOSFET, as the electrons have enough energy to cause the dielectric to break down. Once this occurs, the resistance of the input drops to near nothing, and a later low-voltage, high-current condition will further heat up and destroy the MOSFET. This mode of failure is why there is ESD protection on many chips.
Over-current causes overheating of the device. Once temperatures get high enough to start changing the structure of, or burning, the internal semiconductors, the device will start acting funny, working less efficiently, or failing completely as an open or a short.
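
To put a rough number on how current turns into heat, a common first-order estimate is T_j = T_a + P * theta_JA (junction temperature from ambient temperature, dissipated power, and junction-to-ambient thermal resistance). The sketch below is only an illustration: the function names and every value used (100 °C/W, a 150 °C limit) are assumptions for the example, not figures from any particular datasheet.

```python
def junction_temperature_c(ambient_c, power_w, theta_ja_c_per_w):
    """Steady-state estimate: T_j = T_a + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w


def exceeds_rating(ambient_c, power_w, theta_ja_c_per_w, tj_max_c):
    """True if the estimated junction temperature is above the rated maximum."""
    return junction_temperature_c(ambient_c, power_w, theta_ja_c_per_w) > tj_max_c


if __name__ == "__main__":
    # Hypothetical part: theta_JA = 100 C/W, rated T_j(max) = 150 C.
    for power_w in (0.1, 0.5, 1.0, 1.5):
        tj = junction_temperature_c(25.0, power_w, 100.0)
        flag = "over" if exceeds_rating(25.0, power_w, 100.0, 150.0) else "ok"
        print(f"P = {power_w:.1f} W -> T_j ~ {tj:.0f} C ({flag})")
```

Real parts have thermal time constants and localized hot spots, so this only says whether the average dissipation is plausible, not whether a short current spike is safe.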

You could think of reverse voltage as another failure mechanism, but generally it still falls under one of the other two categories; it's just a different way to think about it. For instance, if someone reverses a power supply on a circuit with a diode in it, they may expect no current through the diode and instead get an over-current condition because the diode is now forward biased.
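
As a rough illustration of how a reverse or below-ground voltage turns into a current problem, here is a sketch that estimates the current through a forward-biased diode (for example an input protection diode to ground) behind a series resistance. It models the diode as a fixed drop; the 0.6 V drop, the resistor values and the function name are all assumptions for the example.

```python
def negative_input_clamp_current(v_in, v_forward, r_series):
    """Current (A) through a diode to ground when v_in is pulled below ground.

    The diode is treated as a fixed forward drop, so it only conducts once
    the input is more than v_forward below 0 V.
    """
    overdrive = -v_in - v_forward
    if overdrive <= 0:
        return 0.0
    return overdrive / r_series


if __name__ == "__main__":
    # Hypothetical case: input pulled to -5 V, 0.6 V assumed diode drop.
    for r_series in (1.0, 100.0, 10e3):
        i = negative_input_clamp_current(-5.0, 0.6, r_series)
        print(f"R = {r_series:>7.0f} ohm -> I ~ {i * 1e3:.2f} mA")
```

The result would then be compared with whatever injection or clamp current the datasheet allows for that pin, if it specifies one.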

Note that capacitors, resistors and inductors (and any other circuit element) are likely to be damaged in similar ways to transistor ICs, i.e. by over-current and/or over-voltage.

Other failure modes of electronics in general may be found here: http://en.wikipedia.org/wiki/Failure_modes_of_electronics

Additionally (and often overlooked) operating outside specified temperature ranges results in undefined performance, possibly failure. — HL-SDK, Jun 02 '14 at 15:29
Thanks for explaining this. I always thought of over-currents as the cause of overtemperature failures (melting the junction) instead of the other way around. So to protect against reverse voltage, is a simple series resistor enough to protect the chip (even if it would not work that way)? — Mister Mystère, Jun 05 '14 at 09:08
@MisterMystère You're more correct in general: over-current causes overheating. I'll edit my answer to make that clearer. You can externally heat a chip up to cause it to fail, but that's usually not the failure mode. To prevent reverse voltage, a diode with a low forward voltage drop can be used. A resistor would likely cause the circuit to act differently than expected. — horta, Jun 05 '14 at 14:03

score 3 · Answer 2 · edited May 18 '16 at 01:48

Short Answer: Temperature is the biggest issue in an otherwise properly designed circuit.

This is a pretty broad question that spans an entire field of engineering. Some useful references can be found by browsing the JEDEC specs (free). JEDEC is a standards body that helps improve quality throughout the semiconductor industry. Pretty much every company follows JEDEC criteria for qualification to prevent latent, factory, or systematic defects from reaching the field.

Back to your question: some primary failure mechanisms of ICs that I have observed, or worked with the factory to improve, are:

MOS Gate Oxide Integrity: Contaminated oxide can alter the threshold voltage (VT), and temperature or voltage can cause a conduction path through the oxide (punch-through). A great deal of attention is usually placed here during device qualifications.
- Temperature: This is the #1 acceleration factor in the Arrhenius model that describes semiconductor reliability. If an IC designed to operate at 60 °C ambient is actually operated at 100 °C ambient, the lifetime of the device will be shortened dramatically (by several years).
- Voltage: ESD, voltage transients, etc. can weaken the oxide. ESD is usually specified and well controlled in the factory. VGS transients sometimes need to be considered in the design.
Latchup: Parasitic thyristor triggered by either over-voltage or current injection. Most devices are specified with some level of latchup immunity. Isolation resistors and clamp diodes can help mitigate this depending on the type of latchup.
- Overvoltage: Pulling a voltage node above its positive supply rail or below its negative supply rail could trigger a parasitic that damages the device.
  - Your example of the isolation resistor sounds like a latchup event waiting to happen. The clamp diodes might be more prudent, especially if you are going to mass produce the design.
- Current injection: Transient current spikes, especially on IO pins, can trigger latchup as well.
- Temperature: Typically reduces the latchup immunity.
Temperature Cycling: Expansion and contraction due to rapid power-up/power-down cycles, high-current or high-voltage devices, etc. can cause the internal and external device layers/connections to wear.
- Voids/Fill: IC internal metal layers can 'buckle', causing inter-layer voids (opens) or fill (shorts), delamination, etc. Obviously, this is the same for PCBs.
Temperature/Pressure/Humidity: Can cause galvanic corrosion, or even memory failure, when moisture permeates the plastic packaging of most popular ICs. This is usually mitigated through material selection and 'baking' the moisture out of the device.
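
To give a feel for the Arrhenius point above, here is a minimal sketch of the usual acceleration-factor formula, AF = exp((Ea / k) * (1/T_use - 1/T_stress)), with temperatures in kelvin. The activation energy of 0.7 eV is an assumption picked for illustration (the real value depends on the failure mechanism), and the function name is made up for this example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev):
    """Factor by which a thermally activated mechanism speeds up at t_stress_c
    relative to t_use_c (both given in degrees Celsius)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # 60 C design ambient vs. 100 C actual ambient, with an ASSUMED Ea of 0.7 eV.
    af = arrhenius_acceleration(60.0, 100.0, 0.7)
    print(f"Acceleration factor ~ {af:.1f}")
```

With these assumed numbers the factor comes out at roughly an order of magnitude, which is why a 40 °C increase can take years off the expected lifetime. Strictly, the junction temperature rather than the ambient should be used, so treat this as a rough comparison only.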

Thanks for your extensive answer. I don't fully get the latchup though; could you explain it further as it applies to my example (simple annotated schematics would be very welcome)? — Mister Mystère, Jun 05 '14 at 09:13

score 1 · Answer 3 · answered Jun 05 '14 at 15:13

This is kind of a silly question. A complete answer would fill a book, and in the context of a board design the answer is irrelevant (I'll explain later). But it's still a useful question.

Follow the datasheet, especially in the notable areas: if the designer has decided to tell you NOT to ramp the voltage too fast on first power-up, don't get creative and decide you know better, even if it is a hassle to design a power supply that ramps slowly. Most indications given will be subtle; if the datasheet is strident, then really pay attention.

There are many failure mechanisms in ICs, but working out their subsequent impact on the chip requires intimate knowledge of its internal workings. You can have a failed transistor that may never be noticed, or it may take out the whole chip. Even the designer may not know: FMEA (Failure Mode and Effects Analysis) is performed, but usually only in key areas.

A lot of the constraints on the chip are imposed by the process. The foundry dictates design practices, placement etc., and they have software that checks for violations. Other limitations are operational, and these translate directly into maximum voltages, heat-sinking requirements, dV/dt on signals etc.

Because of the increased cost, the complexity of the design and the knock-on effects of getting it wrong (schedule delays etc.), most chip designs will have most of the bases covered. But there are notable exceptions where the process failed.

So, asking about chip design failure modes with respect to board design is kind of like opening up your car hood, removing a random bolt and asking "will this stop the car from working?" If you remove the oil pan plug: yes. If it is one of several redundant bolts on the valve covers: no. The real answer is "don't be removing bolts!".

For some designs, the chip designer will sometimes put "gotchas" into the design that, if the board designer is clever, will cause them to pay attention. The best interactions are when the board guy says "why did you put that signal out on that pin over there, it messes up the blah blah etc." These are almost always the start of wonderful conversations and usually of a long-term interaction and friendship. The response back is usually "very good, however, this is the signal flow in the chip, and if you did it that way ....", and the response to that is always a "Eureka".

An IC will not over-current unless you have damaged it or are operating it outside of its specifications. Over-current means bad things are already happening, which only makes sense. A chip will only present a certain-sized load to a voltage source, which means a fixed current. Saying that over-current is what kills chips is like saying a fixed voltage source driving a fixed resistor might fail because the resistor might go over-current. The over-current is a fault/effect; it is NOT a cause.

Interesting, thanks. However I know we must follow the datasheet guidelines (if I hadn't I would know it for sure today), I am rather asking what happens generally when the precautions we take go wrong (or if we overlooked one). It is for certain chip-specific, but since I hear that lots of the chips' inputs/outputs are similar maybe it can be explained what happens when wrong voltages/currents are applied (e.g. CMOS input etc.)? I have updated my post to show that. — Mister Mystère, Jun 05 '14 at 16:08
