- Parallel bus (DDR, PCI, etc.)
There are various signals, plus a clock. You'd like the signal that says "this data is valid" to arrive when the data is actually valid. Also, all the bits should arrive at the same time. And everything should be properly aligned relative to the clock.
- Source Synchronous (HDMI)
This is similar, as you have a few data lines synchronized to a clock (see below).
- Serial link, embedded clock (SATA, Ethernet, USB, etc.)
In this case, there is no separate clock line, so the length of the cable does not matter. However, you still have a:
- Differential pair
Both halves of a differential pair radiate and pick up exactly the same amount of noise as a single wire or PCB trace would. Since both halves carry exactly opposite signals, the radiated fields cancel, and as a result the overall radiated noise is very low. Also, the receiver can reject the common-mode noise that is picked up by both halves of the pair.
However, if the lengths of the two halves are not matched, this no longer works. Noise that is picked up appears first on one half of the pair, then on the other. It is no longer common mode, so it cannot be rejected. Likewise, if one line is longer than the other, the signals in the two halves are no longer exactly opposite: one is delayed. So the emitted fields no longer cancel, and the pair radiates noise.
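To see the cancellation argument in numbers, here is a minimal Python sketch with made-up signal and noise samples (not real waveforms); it only shows that identical noise on both halves subtracts out, while skewed noise does not:

```python
# Toy model of a differential receiver: it outputs (plus - minus).
# All numbers are arbitrary example values.

signal = [1, -1, 1, 1, -1]            # ideal data, +/-1 levels
noise  = [0.3, -0.2, 0.4, 0.1, -0.3]  # noise coupled onto the pair

# Matched pair: both halves pick up the same noise sample at the same time.
p_plus  = [ s + n for s, n in zip(signal, noise)]
p_minus = [-s + n for s, n in zip(signal, noise)]
print([round(a - b, 2) for a, b in zip(p_plus, p_minus)])
# -> [2, -2, 2, 2, -2] : the noise cancels completely

# Mismatched pair: one half sees the same noise one sample late, so it is
# no longer common mode and survives the subtraction.
late_noise = [0.0] + noise[:-1]
q_plus  = [ s + n for s, n in zip(signal, noise)]
q_minus = [-s + n for s, n in zip(signal, late_noise)]
print([round(a - b, 2) for a, b in zip(q_plus, q_minus)])
# -> noise residue appears on the differential output
```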
Back to the bus: the clock is the most important signal. Usually, the device will latch in the incoming data on a clock edge.
The data changes from "S0" to the next bit "S1", then "S2", and so on. Suppose data is latched in on the clock rising edge: you want that rising edge to land right in the middle of each bit. This gives maximum robustness against timing variations to one side or the other. (Depending on setup and hold times, the optimum may shift a little.)
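As a rough illustration of that edge placement, here is a small Python sketch; the bit period and the setup/hold times are invented example figures, not values from any datasheet:

```python
# Where should the sampling clock edge sit inside one bit period?
# All timing numbers are made-up examples.

bit_period = 10.0  # ns, one data bit stays valid this long
t_setup    = 2.0   # ns, data must be stable before the clock edge
t_hold     = 0.5   # ns, data must stay stable after the clock edge

# The edge may land anywhere in [t_setup, bit_period - t_hold].
earliest = t_setup
latest   = bit_period - t_hold

# Placing it in the middle of that window gives equal margin both ways.
optimum = (earliest + latest) / 2
margin  = (latest - earliest) / 2
print(f"optimum edge at {optimum} ns into the bit, +/- {margin} ns margin")
# With symmetric setup/hold this is the exact middle of the bit;
# asymmetric setup/hold shifts it a little, as noted above.
```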
Both RAM and CPU can send signals on the same data bus. Back in the old days, everything would use one single clock. This simple scheme has one drawback: roundtrip times. Consider this:
- Clock goes low
- Clock propagates from clock generator to RAM chip
- RAM receives clock edge, then outputs a bit (this also takes some time)
- Bit propagates from RAM chip to CPU
In this scenario, the clock used by the CPU to read the data sent by the RAM is the main clock... so the propagation time from the clock generator to the CPU would have to match the propagation time of the whole sequence above, including the RAM chip's response time. This is a mess, and the reason why buses like PCI were limited to rather low frequencies.
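To put rough numbers on it, here is a tiny Python sketch of the round-trip limit; all the delays are invented example values, not real PCI or SDRAM figures:

```python
# Single shared clock: the CPU samples the RAM's data against the main
# clock, so the whole chain below must complete within one clock period.
# Delays are illustrative guesses only.

t_clk_to_ram   = 2.0  # ns, clock generator -> RAM chip
t_ram_clk_to_q = 5.0  # ns, RAM sees the edge and drives the bit
t_ram_to_cpu   = 2.0  # ns, data flight time, RAM -> CPU
t_cpu_setup    = 1.0  # ns, setup time at the CPU input

min_period = t_clk_to_ram + t_ram_clk_to_q + t_ram_to_cpu + t_cpu_setup
print(f"min period {min_period} ns -> max bus clock ~ {1e3 / min_period:.0f} MHz")
# 10 ns -> about 100 MHz, no matter how short the traces are made
```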
If you want high throughput, the clock has to be generated at the source of the data and sent along with it, on length-matched traces. This eliminates the need to synchronize everything to a distant clock chip.
This is why SDRAM (SDR, DDR, etc.) uses two clocks. The CPU sends its clock and data. When the RAM replies, it also sends its own clock along with the data. Both clocks have the same frequency, but different delays. Usually the slave chip needs a PLL to regenerate the clock while adjusting its phase relative to the main clock, in order to compensate for variations in propagation times due to different chips, boards, temperature, aging, etc.
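For contrast, here is the same kind of back-of-the-envelope budget for a source-synchronous read, again with invented numbers; the point is that the round trip drops out and only the residual data-to-strobe skew plus the receiver's setup and hold remains:

```python
# Source-synchronous read: the RAM sends a clock/strobe along with the
# data on length-matched traces, so the receiver's timing budget no
# longer contains the full round trip. Numbers are illustrative only.

t_skew      = 0.5  # ns, residual data vs. returned-clock mismatch
t_cpu_setup = 1.0  # ns, setup time at the CPU input
t_cpu_hold  = 0.5  # ns, hold time at the CPU input

# Each bit only has to stay valid long enough to cover skew + setup + hold.
min_period = t_skew + t_cpu_setup + t_cpu_hold
print(f"min period {min_period} ns -> max clock ~ {1e3 / min_period:.0f} MHz")
# 2 ns -> about 500 MHz with the same kind of receiver
```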
Every modern high-throughput link uses one of two schemes:
- Embed clock into data (SATA, PCI-Express, USB, Ethernet, etc). In this case, different parallel lanes may not need to be length-matched.
- Both sides transmit in source-synchronous mode (data + clock), which allows a clock period shorter than the roundtrip delay (SDRAM, DDR, etc.).
Matching trace lengths is only part of the story. Driver and receiver delays vary with temperature and from chip to chip. This is called "skew". For example, an 8-bit parallel buffer will specify a skew, which is the difference in propagation time between the individual buffers in the chip.
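To put a number on that, here is a small Python sketch with hypothetical datasheet-style figures, showing how buffer skew and residual trace mismatch both eat into the window in which all eight bits are valid at once:

```python
# How much of the bit period survives the various skews?
# The figures below are hypothetical, in the style of a buffer datasheet.

bit_period     = 5.0   # ns, i.e. a 200 MHz parallel bus
buffer_skew    = 0.8   # ns, max output-to-output skew of the 8-bit buffer
trace_mismatch = 0.15  # ns, leftover length mismatch (roughly an inch on FR4)
rx_setup_hold  = 1.0   # ns, setup + hold needed at the receiver

# The window where *all* bits are simultaneously valid shrinks by the skew.
valid_window = bit_period - buffer_skew - trace_mismatch
margin = valid_window - rx_setup_hold
print(f"valid window {valid_window:.2f} ns, remaining margin {margin:.2f} ns")
```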