We have an SBC, in the style of the Leopardboard or Beagleboard, that is misbehaving. It's based on the Leopardboard design (TI-DM368 CPU, DDR2 RAM, NAND Flash).

Developing software on the Leopardboard all works well. However, the first prototype board turned up and refused to boot. Investigation led us to the point where slowing the clocks (ARM & DDR) down means the board will boot.

The hardware (either board layout, termination, DDR chip, whatever) is the #1 suspect, as we can run identical software on the Leopardboard and it works fine. Unfortunately, the nature of the fault means we can't boot into Linux to run strenuous RAM tests that may provide better diagnostics.

From the hardware side, the DDR clock is one of 345MHz, 486MHz, 680MHz depending on the clocking settings - beyond the scope of any of our scopes or logic analysers.

So - it's sort of two questions in one really:

From the hardware point of view, other than "rent a faster scope", is there an approach to diagnosing this with what's to hand? We have 200MHz DSOs, a <100MHz logic sniffer, and a 1.5GHz spectrum analyser to play with.

From the software point of view (I know, wrong forum) if anyone has tips or code snippets on exercising DDR RAM, they'd be greatly appreciated.

Edited to add the answer:

Well we borrowed a Tektronix 7104 and it worked so well we didn't even have to touch the board with it ;)

The problem revealed itself to be a sagging 1v3 power supply line being strangled by 0402-size SMT ferrites.

The symptoms were that, near the marginal operating frequency, the board would boot but lock when a memory-hungry / high-bandwidth video streaming process tried to start. This, coupled with the fact that running slower made it work OK, led us to believe it was a frequency-related issue when in fact the lower clock frequency was also putting less load on the power supply components.

The 0402 ferrite beads used for filtering were going surprisingly high impedance as the current went up, dropping a critical supply line below the allowable operating point.

Unfortunately this means I can't give the "winning answer" to Dave Tweed, but it does mean our board now works. Even better, it's the boss's fault not mine!

Oh, and Tek 7104s are freakin' awesome feats of electronic engineering. If you've never looked at how they work, it's pure analogue kung-fu.

John U
  • This might be absurd because I don't know much about DDR memory or the platform in question, but would it be possible to provide an external clock of 20-50MHz or so, and/or bypass the PLL, so you can take comparative measurements of rise times etc. between boards using your current equipment? – PeterJ Sep 20 '13 at 10:16
  • TBH I don't know enough about DDR either! I don't think I could clock the DDR independently of the CPU-provided clock as it would throw the timings out of the window, but I will look into it as a possibility. We can slow the DDR to 345MHz, which means a 172.5MHz nominal rate (because DDR uses marketing specmanship for specs) to bring it juuust inside the scope's happy place, but only marginally. – John U Sep 20 '13 at 10:20
  • Just to be clear I was thinking of reducing the whole CPU clock rate so the DDR clock was reduced as well just as a diagnostic. I'm not sure if that would mess with the DDR refresh rates though, plus the PLL might not lock depending on the design but that might not be a problem with an external clock (if it allows for it). – PeterJ Sep 20 '13 at 10:27

2 Answers

This is more of an "extended comment" than an answer, but let me start by saying no, I don't think you can debug this issue with such a limited set of test equipment. A person who has had a lot of experience doing these designs might be able to get some clues about what's going wrong using them, but I get the impression that you're not such a person.

For example, issues with risetime and ringing don't scale with clock frequency. If you can't see them at the high clock rate, you won't see them at the low clock rate, either.

The degree of success at this sort of thing depends on just how closely you duplicated the reference design — not just the schematic, but also the physical design, such as relative parts placement, PCB stackup, the routing and length-matching of traces, etc. Unless you know exactly what you're doing, you must match every detail of the reference design.

The fact that it does run at a lower clock rate suggests that you have issues with timing skews, possibly due to mismatched trace lengths, but also possibly due to mismatched terminations. You could verify this by renting the high-speed scope, but your time might be better spent getting started on a respin of the board right away.

Also, it's foolish to take a brand new board design and expect to boot a full-blown OS and application on it right away. You should always plan to develop (or find) some basic functional tests on individual functional units such as memory and communications interfaces.
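As a rough illustration of what such a basic memory test might look like, here is a minimal sketch in C in the style of the classic walking-ones, address-bus and full-device tests. The function names and return conventions are made up for this example and do not come from TI's code or any particular library. It assumes the test runs from on-chip RAM (for instance from a first-stage bootloader) with the data cache disabled, so that every access really reaches the DDR, and that `num_words` is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones test at one address: looks for stuck or shorted data lines.
 * Returns 0 on success, otherwise the first pattern that read back wrong. */
uint32_t mem_test_data_bus(volatile uint32_t *addr)
{
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Power-of-two offset test: looks for stuck or shorted address lines.
 * num_words must be a power of two (assumption of this sketch).
 * Returns NULL on success, otherwise an address implicated in the failure. */
volatile uint32_t *mem_test_address_bus(volatile uint32_t *base, size_t num_words)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t antipattern = 0x55555555u;
    const size_t mask = num_words - 1;
    size_t offset, test_offset;

    /* Write the default pattern at every power-of-two offset. */
    for (offset = 1; (offset & mask) != 0; offset <<= 1)
        base[offset] = pattern;

    /* Address bits stuck high: writing offset 0 must not disturb the others. */
    base[0] = antipattern;
    for (offset = 1; (offset & mask) != 0; offset <<= 1)
        if (base[offset] != pattern)
            return &base[offset];
    base[0] = pattern;

    /* Address bits stuck low or shorted together. */
    for (test_offset = 1; (test_offset & mask) != 0; test_offset <<= 1) {
        base[test_offset] = antipattern;
        if (base[0] != pattern)
            return &base[test_offset];
        for (offset = 1; (offset & mask) != 0; offset <<= 1)
            if (offset != test_offset && base[offset] != pattern)
                return &base[test_offset];
        base[test_offset] = pattern;
    }
    return NULL;
}

/* Full-device test: every word is written and read back as both a value and
 * its bitwise inverse. The region is left zeroed on success.
 * Returns NULL on success, otherwise the first failing address. */
volatile uint32_t *mem_test_device(volatile uint32_t *base, size_t num_words)
{
    size_t i;
    uint32_t pattern;

    for (pattern = 1, i = 0; i < num_words; pattern++, i++)
        base[i] = pattern;

    for (pattern = 1, i = 0; i < num_words; pattern++, i++) {
        if (base[i] != pattern)
            return &base[i];
        base[i] = ~pattern;
    }

    for (pattern = 1, i = 0; i < num_words; pattern++, i++) {
        if (base[i] != ~pattern)
            return &base[i];
        base[i] = 0;
    }
    return NULL;
}
```

On a target, `base` would be the start of the DDR region taken from the memory map and the results would typically be reported over a UART. Running the three tests in this order narrows a failure to data lines, address lines, or individual cells. A test like this only exercises basic connectivity and storage, so it can pass on a board that still fails under sustained high-bandwidth traffic.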

Dave Tweed
  • Thanks Dave. You're right that I'm not a board layout expert, but the guy that laid the board out really is, and followed the very tight design rules laid out by TI. So we now find ourselves with a board that the rules say _should_ work, and really wanting to identify **why** it doesn't before we invest a pile of time and money in another spin. – John U Sep 23 '13 at 07:59

Added full detail to post, but it were the volts not the hertz wot dun it!

John U