1

I am using a custom-built Arduino Uno for a home automation project. Sometimes the Arduino locks up due to electrical noise or sparks on the AC lines. More details here.

I ran some controlled tests to learn more about this. I left the module powered up and introduced as much noise as I could by flipping switches on and off and turning fan regulator knobs. I also zapped the live circuit with a gas lighter, using a setup that let me make sparks jump onto any part of the PCB I wanted. I did this because I couldn't afford an ESD simulator gun.

After all this, I was able to make my microcontroller hang after torturing it for a few minutes, and I could repeat the hang. Success till this part.

Once my micro controller freezes, sometimes I am able to bring it back to normal process by pulling the reset pin LOW but mostly I can't. It's as if reset pin is not working at all. However I am able to bring it back to normal operation by power cycling it.

(Reset pin works fine otherwise when normal code execution is going on. Reset pin has a pullup and protection diode as recommended by atmega datasheet. Decoupling taken care of. Filter caps very close to power pins. Reset pin going LOW has been verified by an oscilloscope as well.)

Is power cycle reset different as compared to reset pin reset? Is this an expected behavior? Can I do something to ensure my reset pin always works in such conditions?

Whiskeyjack
  • 1
    The difference between reset and power cycling on an arduino depends very much on how the Atmel reset pin is treated by the Arduino Uno circuit, how the Atmel is configured and what the arduino boot-loader is programmed to do before jumping to the sketch. For most people there is no difference. For some it helps to know a reset is very much like an interrupt where (certain) Atmel registers are cleared/set. Chances are a reset like activity or a reset is part of a normal Arduino power on cycle. – st2000 May 17 '16 at 12:17
  • There are sparks coming out of your ac power supply...? You mean transients, I hope. – Lundin May 18 '16 at 06:22
  • Anyway, to design a PCB which can handle "sparks anywhere on the PCB" is quite a challenge. You wouldn't exactly pick a hobbyist microcontroller for such tough requirements. Normal ESD tests involve sparks on metal cases, shields and connectors. Counter-measures are good EMC design and a resistor in series from each connector pin. Needless to say, the MCU needs a pull-resistor on the reset pin, and also a cap, with a value as recommended by the manufacturer. – Lundin May 18 '16 at 06:42
  • @Lundin - Sparks were not coming from ac power supply but I thought it could be occurring inside appliances like refrigerators and fan regulators where there is a possibility of contact arcing. Most likely it was transients. However I wanted to replicate the hang-up on table and this seemed to be the only option which could force the mcu into a locked state. As pointed out by others, this might not be the same thing as what was causing lockups and hence it's difficult to conclude anything from this. – Whiskeyjack May 18 '16 at 06:53

4 Answers

7

Is power cycle reset different as compared to reset pin reset?

Yes. There are some semiconductor behaviours where only loss of power will allow previous behaviour to resume. An SCR is a type of component where this behaviour (i.e. only loss of power allows previous state to resume) is normal, and SCR-like structures exist within most ICs.
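On the firmware side, the two kinds of reset can at least be told apart after the fact. The sketch below is a minimal illustration, assuming the reset-flag layout of the ATmega328P `MCUSR` register (PORF, EXTRF, BORF, WDRF in bits 0 to 3). The names `decodeResetCause` and `ResetCause` are made up for this example. It only reports why the chip last restarted; it cannot detect or cure latch-up.

```cpp
#include <cassert>
#include <stdint.h>

// Reset flags in the ATmega328P MCUSR register (bit layout assumed from the datasheet).
constexpr uint8_t kPowerOnFlag  = 1u << 0;  // PORF: power-on reset
constexpr uint8_t kExternalFlag = 1u << 1;  // EXTRF: reset pin pulled low
constexpr uint8_t kBrownOutFlag = 1u << 2;  // BORF: supply dipped below the brown-out level
constexpr uint8_t kWatchdogFlag = 1u << 3;  // WDRF: watchdog timer expired

enum class ResetCause { PowerOn, BrownOut, Watchdog, ExternalPin, Unknown };

// Hypothetical helper: map a saved MCUSR value to a single cause.
// Power-on is checked first because other flags may also be set after a power cycle.
ResetCause decodeResetCause(uint8_t mcusr) {
    if (mcusr & kPowerOnFlag)  return ResetCause::PowerOn;
    if (mcusr & kBrownOutFlag) return ResetCause::BrownOut;
    if (mcusr & kWatchdogFlag) return ResetCause::Watchdog;
    if (mcusr & kExternalFlag) return ResetCause::ExternalPin;
    return ResetCause::Unknown;
}
```

On the real chip the value would come from reading `MCUSR` early in `setup()` and then clearing it. Some bootloaders clear the register before the sketch runs, in which case the sketch will only ever see `Unknown`.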

Is this an expected behavior?

Depends what you mean by "this". Is it expected behaviour that you can provoke an extreme situation using unknown voltage ESD discharges on a live circuit, where only a power-cycle will recover? Yes

Can I do something to ensure my reset pin always works in such conditions?

Always, for the conditions of that test? Realistically, no - without extreme measures.

Based on your description, I suspect your ESD injection is triggering latch-up, sometimes called SCR latch-up (or a latch-up like behaviour) inside the MCU. Texas Instruments have a nice app note called Latch-Up, ESD, and Other Phenomena which contains information very relevant to your questions.

I was going to reply to your earlier question which you linked, but too many details were unclear to me, some decisions were based on assumptions that I disagreed with, and experience tells me it's unlikely I can add much in that situation. I also don't believe that your current test is realistic, in the context of your problems described in the previous question.

After all this, I was able to make my microcontroller hang after torturing it for a few minutes, and I could repeat the hang. Success till this part.

I respectfully disagree that this is "success". What you have done is provoke a symptom similar to the one in your earlier question (i.e. the MCU locks up). But there are several possible root causes for that symptom. How do you know that your improvised ESD test triggers the same cause as the one acting on your device when it is installed in its planned location? There was no evidence in your earlier question of direct ESD sparks onto the live PCB (which is an extreme test), yet that is what you are now testing.

Therefore I see your current ESD test results as a bit like an XY problem. You have the original problem with MCU lock-up, which you weren't yet able to diagnose. Now you have found another way to trigger MCU lock-up (using direct ESD sparks). But that does not mean that direct ESD sparks are the cause of your original problem (therefore I wouldn't call ESD triggering an MCU lock-up a "success"), nor does it mean that any fixes designed to help with ESD protection would help with your original problem either.

I realise this sounds negative, sorry about that. I'm just suggesting that, based on many years' experience of troubleshooting complex systems, you are quite likely to be "going down a rat hole" by following your current testing, as it may not help with your original problem.

Personally, if I was in your situation, I would be focussing on better diagnosing the original problem, rather than introducing a new (and, I believe unrealistic) test case. Best of luck!

SamGibson
  • 2
    To add an off-topic personal experience of just how bad an ESD latch-up issue can be: I once interned at a company that produced power line switches. The product needed to pass a 12 kV ESD injection at a connector. Occasionally we would get incredible behavior, like fake serial data. Once, we even got the independent watchdog timer and supervisory circuit to lock up. Even watchdogs aren't foolproof in ESD situations. In the end, after running >2000 tests, we gave up. – 0xDBFB7 May 17 '16 at 15:26
  • Thanks a lot for this beautiful answer. I agree that it was a bit extreme. I believed noise to be the root cause of my issues and thought esd sparks are the best form of noise. However regarding the diagnosis, what should I do? Do I install and wait for a week (which might extend up to a month) for it to freeze? The problem with this is, I won't be able to know what was the exact issue. I have already tried WDT and esd suppression devices. Like @DC177E mentioned, I have experienced the same. I am just wondering how others do it. Are there any pre-defined methods? – Whiskeyjack May 17 '16 at 15:34
  • Better ESD and transmitted-EMI protection is really the only route. Metal cases, ferrite beads, snubbers, TVS diodes etc. – pjc50 May 17 '16 at 15:47
  • 1
    Again off topic but @pjc50 The device we were testing had: A direct route to the ground plane with a thick cable, a metal shield over most circuitry, a 0.25 in thick steel case, a ferrite around the shielded connector, a TVS on every data line, and a tuned inductor-capacitor setup on every data line to attenuate some of the hf. We still got lockups every few discharges. – 0xDBFB7 May 17 '16 at 16:45
  • 1
    @Whiskeyjack - "I believed noise to be the root cause" - [of your original problem] - quite likely, but noise != ESD sparks. || "Are there any pre-defined methods?" - For troubleshooting? Or noise suppression/tolerance? Yes to both, but there are whole books on each subject, so too much to fit here. || "regarding the diagnosis, what should I do?" - Again, too much to fit into a comment, and there are many unknowns about your constraints. However I see gaps [previous questions] in the process of reproducing the problem *under controlled conditions* and correlating failure timing with events. – SamGibson May 18 '16 at 00:43
  • Thanks Sam. You gave me some real good info to start off. I'll try to enhance the circuit as much as I could using the theory available. This thing seems like an ever-going learning and doing process. – Whiskeyjack May 18 '16 at 05:03
1

As an Arduino user I suspect you're way off in left field. The Arduino is remarkably INsensitive to transients. And if you're concerned about transients coming through the power supply anyway, you could put a surge suppressor (or a UPS) on the power input. If you're really really concerned about transients coming through the air, you should simply shield the Arduino (maybe with a foil roasting pan).

From my experience what's far more likely though is that occasionally all your outputs happen at the same time, and the total current draw is too much and so it causes the regulated voltage to sag. Dropping to 4V rather than 5V for just a few milliseconds is almost guaranteed to hang the Arduino Uno. One way to fix this is look carefully at all your outputs and see if any of them either draw a lot of current or have an inductive lag, and rearrange the offenders. Another way to fix it is to add a little complexity to your software by making a queue of "desired" changes, then making only one of these changes at a time every nth call to loop(). That way you'll never have too much happening at the same time.
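The queue idea above can be sketched roughly as follows. This is only an illustration under assumed names (`PinChange`, `ChangeQueue`, and `tick` are not from any library): desired output changes go into a small fixed-size FIFO, and at most one of them is applied on every nth pass through `loop()`.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// One pending output change: which pin and the level it should take.
struct PinChange {
    uint8_t pin;
    bool level;
};

// Fixed-size FIFO of desired output changes (no heap use, so it suits a small MCU).
class ChangeQueue {
public:
    // Queue a change; returns false if the queue is already full.
    bool push(PinChange c) {
        if (count_ == kCapacity) return false;
        buf_[(head_ + count_) % kCapacity] = c;
        ++count_;
        return true;
    }

    // Call once per loop() pass. On every `interval`-th call the oldest change
    // is handed to `apply`. Returns true if a change was applied on this call.
    template <typename Apply>
    bool tick(unsigned interval, Apply apply) {
        if (++calls_ < interval) return false;
        calls_ = 0;
        if (count_ == 0) return false;
        PinChange c = buf_[head_];
        head_ = (head_ + 1) % kCapacity;
        --count_;
        apply(c);
        return true;
    }

    size_t pending() const { return count_; }

private:
    static constexpr size_t kCapacity = 8;  // arbitrary size chosen for the example
    PinChange buf_[kCapacity] = {};
    size_t head_ = 0;
    size_t count_ = 0;
    unsigned calls_ = 0;
};
```

In a sketch, `apply` would typically be a small function that calls `digitalWrite(c.pin, c.level ? HIGH : LOW)`, so that no two outputs switch in the same pass.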

Even though I suspect your question is completely irrelevant to your hangs, here's the answer: on the Arduino Uno, neither a power cycle nor a reset erases the program. The sketch is stored in flash and stays intact, which is how you can move an Arduino from your develop/test bench to the place it will be deployed. After a reset-pin reset, the bootloader runs first, waits briefly for a fresh download, and then starts the stored sketch again with RAM and peripheral registers reinitialised. If things are screwed up enough that the tiny Arduino bootloader program on the board doesn't work right, then the reset will be handled by the chip alone instead, and the behavior you see might result.

Exactly what will happen after your sort of hang is ill-defined. Rather than trying to fix the symptom, fix the cause so you never get into this situation in the first place.

If you're extremely unlucky and can't figure out the source of the hangs, you may need to add some debug trace code or some logging to your program. You might for example log every entry into any function. Then when it hangs, just see what the last trace message was, and you know the problem is somewhere in that subroutine (probably most especially if an interrupt happened at the same time).
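A minimal sketch of such entry tracing, with made-up names (`TraceLog`, `TRACE_ENTER`, and the two example functions are hypothetical): each traced function records its own name in a small ring buffer, so the most recent entries are always available.

```cpp
#include <cassert>
#include <stddef.h>

// Keeps the names of the most recently entered functions in a small ring buffer.
class TraceLog {
public:
    void enter(const char* name) {
        ring_[next_] = name;
        next_ = (next_ + 1) % kDepth;
        if (used_ < kDepth) ++used_;
    }

    // Name recorded `age` entries ago (0 = most recent); nullptr if out of range.
    const char* back(size_t age) const {
        if (age >= used_) return nullptr;
        return ring_[(next_ + kDepth - 1 - age) % kDepth];
    }

    // Most recent entry, or nullptr if nothing has been recorded yet.
    const char* last() const { return back(0); }

private:
    static constexpr size_t kDepth = 4;  // arbitrary history depth for the example
    const char* ring_[kDepth] = {};
    size_t next_ = 0;
    size_t used_ = 0;
};

// Record entry into the current function; __func__ is standard C++11.
#define TRACE_ENTER(log) (log).enter(__func__)

// Hypothetical application functions, shown only to illustrate the macro.
void readSensors(TraceLog& log)  { TRACE_ENTER(log); /* ... real work ... */ }
void updateRelays(TraceLog& log) { TRACE_ENTER(log); /* ... real work ... */ }
```

Because RAM contents are lost on a power cycle, on real hardware `enter()` would probably also send the name out with `Serial.println(name)`, so that the last message survives on the host side after the hang.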

(Oh, and if there's more than one power supply [including one that isn't obvious because it's part of a remote sensor, and the one inside the Arduino Uno board], be sure to connect all the negative/grounds together. Otherwise one power supply may "float", and for example provide analog inputs that are way out of range.)

(Also pay attention if the compiler warns about "unstable" operation because too much data space is used. On an Arduino there is no protection against stack overflow. You may have enough free space for all normal operation, but if an interrupt happens exactly when execution is very deep and has a very large stack, you can overflow. Data will be written into space. That's not a huge problem until you try to read it back and it isn't there. Usually the function you're in will somehow keep running, but when it finishes and "returns", the return address also can't be read and so will be random, and in essence your code will "goto" some random location. This often results in a hang.)

0

Resetting or power cycling a processor should not be a normal activity. An apparently locked processor is a good indication that a software or hardware problem exists.

Arduinos are not the ideal debugging platform, but look through your code and add debug print messages to determine code activity during normal execution and during lockup.

Check your power supply for possible improvements. Run your tests while temporarily running the Arduino on batteries.

Check if you are overdriving inputs. Digital circuits can lock up if the voltage presented on any pin is higher than the supply voltage to the chip.

Finally, there is a common feature called a watch-dog that, as a last-resort safe-guard, resets the processor. It is usually a physical timer external to the processor, although some processors have one built in. Design with the understanding that a watch-dog is a safe-guard and not something used during normal operation.
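One common way to keep a watch-dog honest is to refresh it only when every part of the program has recently reported in, instead of refreshing it blindly from one place. The following is a rough sketch of that gating logic; `KickGate` and its methods are names invented for this example.

```cpp
#include <cassert>
#include <stdint.h>

// Allows a watchdog refresh only after every task has checked in since the last one.
class KickGate {
public:
    // taskCount: number of tasks being supervised (1 to 8 in this sketch).
    explicit KickGate(uint8_t taskCount)
        : required_(static_cast<uint8_t>((1u << taskCount) - 1u)) {}

    // Called by task number `task` (0-based) each time it completes a pass.
    void checkIn(uint8_t task) { seen_ |= static_cast<uint8_t>(1u << task); }

    // Returns true, and starts a new round, only if all tasks have checked in.
    bool shouldKick() {
        if ((seen_ & required_) != required_) return false;
        seen_ = 0;
        return true;
    }

private:
    uint8_t required_;
    uint8_t seen_ = 0;
};
```

On an AVR, the built-in watch-dog would be armed once with `wdt_enable(WDTO_2S)` from `<avr/wdt.h>`, and `wdt_reset()` would be called only when `shouldKick()` returns true. As the comments in this thread point out, even a watch-dog may not recover a chip that has latched up.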

Here is a discussion about an Arduino software implemented watch-dog. I have not studied it. If it involves a physical watch-dog inside the Atmel it will probably be of use. If it is only a software implementation it may be of use.

Added later...

I found this web page about detecting Arduino lockup. A watch dog is discussed.

st2000
  • 1
    This would be applicable to normal situations, however the mentioned piezo igniter tests are unlikely to be survived by an ordinary Arduino or any other circuit not specifically armored against them at board level. One should not expect such to keep operating, or necessarily to ever operate again afterwards. – Chris Stratton May 17 '16 at 12:30
  • WDT didn't work. Only luck I had was the reset pin which seems to be fading away. @ChrisStratton - Thanks for comment. Yes, I agree to your point. I won't be doing that. However I am still puzzled what might be going inside the mcu that prevents it from resetting even when the reset pin is pulled low. – Whiskeyjack May 17 '16 at 14:07
0

If the silicon, all the logic on the board, and the board itself are designed for it, then you should be able to just reset. An AVR is definitely not in that class of parts, as it specifically has logic that only operates when in reset. So reset is not a solution for an AVR that has been confused by ESD. You need to reset the power control and have it cycle power on the AVR for long enough to put it back in a known state.

old_timer
  • 1
    the in circuit programming interface on avr's is active when the chip is in reset. which means there is logic that has a separate power on reset and only uses the reset pin as an input (and/or the reset pin in the asserted state is that logics deasserted reset). – old_timer May 17 '16 at 15:05