We have a bunch of ARM microcontrollers on test at the moment. The test runs for 360 hours and mostly completes without a hitch, but very occasionally, one of the microcontrollers will hang. We have seen this problem occur twice so far.
Looking at the firmware, the only places where the code could legitimately hang are the loops that wait for an internal peripheral to finish (e.g. for an EEPROM write to complete, or for a byte to finish transmitting over SPI). There are no other while loops in the code.
There seem to be only two possibilities:
- One of the peripherals is getting stuck and failing to complete, causing the code to get stuck in a while loop.
- The CPU itself has stopped executing code.
Both of these scenarios seem unlikely, but I was wondering if anyone had seen anything similar?
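One way we could tell the two cases apart is to bound each peripheral wait and record which one timed out. The following is only a minimal sketch of that idea, not our actual firmware: `wait_for_flag`, `wait_site_t`, `last_timeout_site` and the poll limit are hypothetical names, and the status register is passed in as a plain pointer so the same code can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for the places where the firmware waits. */
typedef enum { WAIT_NONE = 0, WAIT_EEPROM, WAIT_SPI } wait_site_t;

/* Records the most recent wait that timed out. On real hardware this would
 * need to be logged or kept somewhere that survives a reset (assumption). */
volatile wait_site_t last_timeout_site = WAIT_NONE;

/* Poll *status until any bit in mask is set, giving up after max_polls reads.
 * Returns true if the flag was seen, false if the wait timed out. */
bool wait_for_flag(volatile const uint32_t *status, uint32_t mask,
                   uint32_t max_polls, wait_site_t site)
{
    assert(status != NULL);
    for (uint32_t i = 0; i < max_polls; ++i) {
        if ((*status & mask) != 0u) {
            return true;            /* peripheral reported completion */
        }
    }
    last_timeout_site = site;       /* remember which wait got stuck */
    return false;
}
```

If a unit still hangs with every wait bounded like this, a stuck peripheral becomes much less likely as the cause; if `last_timeout_site` is set, it points at the peripheral that failed to complete.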
Following on from an earlier question, "How common is it to receive PCBs with failed or weak vias?", we know that some of the PCBs the microcontrollers are mounted on have potentially weak vias, so it's possible that the supply voltage was interrupted very briefly during the test.