DS1307: long-term effects, time gets corrupted

Question

I have designed a PCB with an ATmega328P and a DS1307, along with other sensors on the I2C bus. The design is as follows:

Y2 is a 32.768 kHz crystal, and BT1 is a standard 2032 3v3 coin cell. I can program the time without trouble and the RTC counts well. However, over time, some of my chips start showing 2001/01/01 00:00:00 or 2150/12/22 00:00:00. Other chips show a time that is consistently 2 hours behind the current time, even though they were programmed with the current time.

These kind of effects only happen after a long time, easily ~ 3-4 months of operation.

Most of the times, reprogramming the chip works. Is this a voltage issue or something else?

Thanks.

(a) *Full* schematic required - info might be missing from snippet. (b) "*These kind of effects only happen after [...] ~ 3-4 months*". Do you *know* that no problem happens until at least 3 months after the time is set, because you check every day? Or is it possible that problems start sooner, but you only notice after 3 months? (c) Are the devices running *only* from battery during those 3 months, or is "main power" being switched on/off during that time period too? (d) What troubleshooting/tests have you done so far, with what results? (e) Has this design ever *not* had this problem? — SamGibson, Mar 09 '18 at 15:37
"**Most** of the times, reprogramming the chip works." - so sometimes, reprogramming the chip _doesn't_ work? Why not? — Bruce Abbott, Mar 09 '18 at 15:48
@PlasmaHH - "*200uA standby current*" I understand why people could be worried by that figure :-) but according to the [Maxim DS1307 datasheet](https://datasheets.maximintegrated.com/en/ds/DS1307.pdf), that maximum figure of \$\small I_{CCS}\$ only applies when running from the "main power" 5V \$\small V_{CC}\$. The current when running on battery is *much* lower - typical 300nA \$\small I_{BAT1}\$ is mentioned on the datasheet, although an even lower figure (~190nA) is shown in the related "Typical Operating Characteristics, \$\small I_{BAT}\;vs.\;V_{BAT}\$" graph on page 5 with SQW off. — SamGibson, Mar 09 '18 at 16:55
@SamGibson I'm currently on mobile. I'll make the schematic available once I'm on my computer. b) So 3-4 months is an estimate. I regularly see the data and can confirm that it doesn't happen before 3-4 months. c) It is a mix of both. As an estimate, I can say that the chip is on (5V) for roughly 40-50% of the time. Remaining time, it is running on the 3.3V battery. d) Regarding tests : I've just checked communication lines and power measurements (although the values are instantaneous and not over a period of 3-4 months) e) Yes, 90% of my setups do not face this issue, while 10% do. — c517381, Mar 10 '18 at 05:07
@BruceAbbott - Yeah. I found that very weird. I then have to replace the chip and the crystal. Then it definitely works. Sometimes I try and replace only the crystal - and it works. Sometimes I also have to replace the chip. — c517381, Mar 10 '18 at 05:09
If the chip cannot be reprogrammed then perhaps it has been damaged somehow (over-voltage?). Have you tried testing the 'bad' chips and/or crystals on another board? — Bruce Abbott, Mar 10 '18 at 06:14
Is it a genuine DS1307 from a reputable distributor? Or is it some rip-off part from China (eBay, AliExpress)? — Chupacabras, Mar 10 '18 at 09:22
@chupacabras - Yes, the supplier is sure that the source of the chip is genuine and not a rip off. However, I've asked for more details about the supplier. Apart from checking the name of the supplier on the manufacturer's website, what other steps do you think I should take, to ascertain whether the source of the chip is genuine or not? — c517381, Mar 14 '18 at 12:10

score 1 · Answer 1 · edited Jun 11 '20 at 15:10

Hypothesis

Power supply spikes to the RTC, e.g. during main power on/off, causing data corruption and hardware damage.

Analysis of information so far

From the information so far:

> 90% of my setups do not face this issue, while 10% do.

Therefore there must be one or more differences between the group without the problem ("good") and the group with the problem ("bad"). Look for those difference(s), since analysing them may lead to the underlying cause(s).

Besides the usual suspects (e.g. real hardware differences between the good & bad groups, perhaps components bought from different sources†), the difference(s) might lie in how the devices are used: for example, noise on the incoming power supply rail(s) depending on the specific PSU, the length of power supply cabling (i.e. added inductance) connected to your devices, or how often the devices are switched between main and battery power.

(† As Chupacabras kindly mentioned in a comment, some hardware suppliers are less trustworthy than others.)

> As an estimate, I can say that the chip is on (5V) for roughly 40-50% of the time. Remaining time, it is running on the 3.3V battery.

I interpret this as meaning that, during the 3-4 months before problems might be seen, there is some switching of the main 5V power. That is critical information, as it leads to a hypothesis about a possible cause which RTC devices are particularly prone to suffer from - see below.

Your answer to Bruce Abbott's question is also important:

> "Most of the times, reprogramming the chip works." - so sometimes, reprogramming the chip doesn't work? Why not?

> I found that very weird. I then have to replace the chip and the crystal. Then it definitely works. Sometimes I try and replace only the crystal - and it works. Sometimes I also have to replace the chip.

That strongly implies hardware damage has occurred.

In my experience with RTCs, the main cause of both (a) corrupt time (not just a clock running fast or slow, which can have different causes) and (b) hardware damage is spikes in the power supply to the RTC during "main power" on/off.

Notice how a typical RTC datasheet (including the one for the DS1307) has a line similar to this:

> WARNING: Negative undershoots below -0.3V while the part is in battery-backed mode may cause loss of data.

The consequences can be worse than just a loss of RTC data. Latch-up may occur, potentially resulting in internal damage.

For more background on this topic, see this application note AN1549 from Intersil (now Renesas):

Addressing Power Issues in Real Time Clock Applications

and this application note AN504 from Maxim:

Design Considerations for Maxim Real-Time Clocks

(especially section "Data Loss/Data Corruption" on page 10)

and this previous question, where power-supply spikes when turning the main 5V supply off & on affected the RTC part of an MCU (look at the oscilloscope traces added at the end of the question):

STM32F091 VBat pin sinking a lot of mA's

Fix to be considered and further tests

I could be wrong (until we see the full schematic), but I'm assuming there are no other relevant components near the RTC beyond those shown in the schematic snippet currently in the question.

Therefore, as recommended in Intersil, Maxim & ST documentation, a suitable reverse-biased Schottky diode (e.g. BAT54) across the power pins of the RTC may be required, to prevent any negative excursions below -0.3V.

I suggest doing some experiments with a scope connected directly across the DS1307 V_CC on one of your boards (absolute minimum ground wire length on your scope probe - preferably a ground spring - to minimise the added inductance affecting readings) while power-cycling the "main power" to the board. Use the same power supply source and power supply wiring as some of the "bad" (i.e. "affected/damaged") devices you have seen so far, to improve your chance of reproducing problems. What "spikes" on the DS1307 V_CC (main power) do you see on the scope traces?
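If your scope can export a captured trace as a list of voltage samples, a small script can scan it for the excursions described above. This is only a minimal sketch: the function names are made up for illustration, the -0.3V lower limit comes from the datasheet warning quoted earlier, and the 5.5V upper limit is an assumption (the top of the DS1307's operating V_CC range) that you should adjust to whatever you consider a spike.

```python
def find_excursions(samples, low_limit=-0.3, high_limit=5.5):
    """Return (index, volts) for every sample outside [low_limit, high_limit].

    low_limit follows the datasheet undershoot warning; high_limit is an
    assumed threshold and should be tuned for your own setup.
    """
    return [(i, v) for i, v in enumerate(samples)
            if v < low_limit or v > high_limit]


def summarize(samples, sample_period_s, low_limit=-0.3, high_limit=5.5):
    """Summarize one captured V_CC trace (voltages in volts, evenly spaced)."""
    bad = find_excursions(samples, low_limit, high_limit)
    return {
        "min_v": min(samples),
        "max_v": max(samples),
        "undershoot_count": sum(1 for _, v in bad if v < low_limit),
        "overshoot_count": sum(1 for _, v in bad if v > high_limit),
        # Time of the first out-of-range sample, or None if the trace is clean.
        "first_event_s": bad[0][0] * sample_period_s if bad else None,
    }
```

Running `summarize` on one capture per power cycle, for both "good" and "bad" supplies, would give numbers that can be compared directly rather than judged by eye.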

Another suggestion: it would help to devise a test that triggers the problem more quickly than the current 3-4 months, as that would allow you to make changes and test whether they resolve the problem. As you see above, my hypothesis is that power-cycling the main power might be that trigger (using a power supply that was used on a device which eventually went "bad" etc.). That power cycling could be automated with a suitable test rig, which could also read the RTC values and detect whether a problem has occurred (you have to define the failure criteria, of course, e.g. wrong year, or time not incrementing).
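As a rough illustration of such failure criteria, the sketch below checks one raw read of the DS1307 timekeeping registers (0x00 to 0x06, BCD-encoded) against the time the rig expects. It assumes the RTC is kept in 24-hour mode and that the rig obtains the seven bytes itself, e.g. over I2C from address 0x68. The function names and the 5-second tolerance are hypothetical choices, not something taken from the question.

```python
from datetime import datetime


def bcd_to_int(value):
    """Convert one packed-BCD byte to an int; reject non-decimal nibbles."""
    high, low = value >> 4, value & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"invalid BCD byte 0x{value:02X}")
    return high * 10 + low


def decode_ds1307(regs):
    """Decode registers 0x00..0x06 into (datetime, oscillator_halted)."""
    halted = bool(regs[0] & 0x80)            # CH bit: clock halted
    second = bcd_to_int(regs[0] & 0x7F)
    minute = bcd_to_int(regs[1] & 0x7F)
    if regs[2] & 0x40:                       # 12/24 select bit
        raise ValueError("12-hour mode set; this sketch assumes 24-hour mode")
    hour = bcd_to_int(regs[2] & 0x3F)
    day = bcd_to_int(regs[4] & 0x3F)
    month = bcd_to_int(regs[5] & 0x1F)
    year = 2000 + bcd_to_int(regs[6])        # chip stores two year digits
    # datetime() itself raises ValueError for impossible dates (month 0 etc.)
    return datetime(year, month, day, hour, minute, second), halted


def check_rtc(regs, expected, tolerance_s=5):
    """Return a list of failure descriptions; an empty list means 'pass'."""
    try:
        rtc_time, halted = decode_ds1307(regs)
    except ValueError as exc:
        return [f"undecodable registers: {exc}"]
    failures = []
    if halted:
        failures.append("oscillator halted (CH bit set)")
    drift = abs((rtc_time - expected).total_seconds())
    if drift > tolerance_s:
        failures.append(f"time off by {drift:.0f} s")
    return failures
```

Calling `check_rtc` after every automated power cycle, with `expected` taken from the rig's own clock, would catch a corrupted date, a consistent offset such as the 2 hours mentioned in the question, and a clock that has stopped incrementing.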

Thank you for taking the time out to respond to this question. I'll be doing the voltage tests and will post the results. I need to wait for a couple of ones to go bad. I'll also be integrating the Schottky diode in my design (and update accordingly - this may take a couple of months). Other than the VCC ripples, I do not see any other significant issue. I should mention that the crystal I'm using is a Mercury T-38 32.768KHz crystal. — c517381, Mar 14 '18 at 12:42
@c517381 - Thanks. (a) "*Other than the VCC ripples [...]*" It sounds like you are talking about VCC ripples that you already know about, but I didn't see any mention of that so far. Are you referring to my hypothesis (of VCC *spikes* during power on/off) or do you have evidence of VCC *ripples* from your own testing? If the latter, please add all information you have about VCC ripple & other VCC characteristics that you know so far. (b) Please add the full schematic when you can. (c) "*Mercury T-38 32.768KHz crystal*" The [specs](http://www.wdi.ag/specs/mercury/quarze/T38.pdf) look OK. — SamGibson, Mar 15 '18 at 21:24
