
I was told that 64b/66b encoding in 10Gb Ethernet (10GBASE-R) requires a one-cycle barrel-shifter stage, which adds an unavoidable cycle to the theoretical minimum latency.

The Wikipedia page on barrel shifters states that

A barrel shifter is a digital circuit that can shift a data word by a specified number of bits in one clock cycle.

Is "in one clock cycle" a real requirement? Can't barrel shifters be done combinatorially? Why does 64b/66b encoding have a minimum latency of one cycle?

Randomblue
    To squeeze it in without adding a clock cycle, the propagation time through the shifter would have to be subtracted from the setup time margin of an existing path between registers, which is probably not sufficient unless the system is drastically underclocked compared to the possible performance of that stage. – Chris Stratton Jul 31 '14 at 23:53

3 Answers


You don't need an arbitrary lane-alignment barrel shifter in a 10GBASE-R MAC or PCS (TX or RX side). You can add two lane-alignment positions in the TX PCS as an optimisation if you want to use a running IPG that can start the next packet on a 4-lane boundary rather than an 8-lane one, and you have a MAC that can emit with the half alignment. But that's only one layer of 2-input muxing, and it saves you 33U of latency only for back-to-back packets.

On the RX side there's a shifter of sorts in the RX gearboxing, but that is usually hardware in the PMA (below the 64/66) so you generally don't need to worry about it. The lowest (serially clocked) part usually just skips a bit when looking for block alignment, rather than do muxing. The higher conversions 32:66 or 40:66 toward the block-lock side are involved shifters, but again they usually come in the hard PCS.

For 64/66 encoding, each of the 64 data bits appears in only three possible bit positions in the output, so this packs really nicely into a single 6-LUT. As ever, if timing permits you can merge the shift with the surrounding logic, limited only by your imagination and the part.
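To make the 6-LUT remark concrete, here is a minimal Python sketch of the input-count argument: a 3:1 mux needs three data inputs plus two select inputs, five in total, so it fits in one 6-input lookup table with an input to spare. The select encoding, the treatment of the unused select value, and the names `build_lut6`, `mux3` and `lut6` are assumptions for illustration; which three source positions feed each output bit depends on the actual design and is not modelled here.

```python
def build_lut6(func):
    """Precompute the 64-entry truth table of a 6-input boolean function."""
    return [func(*[(addr >> k) & 1 for k in range(6)]) for addr in range(64)]


def mux3(d0, d1, d2, s0, s1, unused):
    """3:1 mux. Select value (s1, s0) = 0, 1, 2 picks d0, d1, d2.

    Select value 3 is never meant to occur; this sketch maps it to d0.
    The sixth LUT input is left unused.
    """
    sel = s0 | (s1 << 1)
    return (d0, d1, d2, d0)[sel]


# One LUT's worth of configuration bits (hypothetical encoding).
LUT = build_lut6(mux3)


def lut6(table, inputs):
    """Evaluate the LUT: the six input bits form the table address."""
    addr = sum(bit << k for k, bit in enumerate(inputs))
    return table[addr]
```

For example, `lut6(LUT, [0, 1, 0, 1, 0, 0])` selects `d1`. One such table per output bit is the whole shift, which is why it can often be folded into neighbouring logic when timing allows.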

shuckc

You are dealing with synchronous logic, which is FF -> logic cloud -> FF -> logic cloud, ad nauseam. Re-latching the new state and presenting it for the next clock cycle is what takes up the clock cycle. The muxing itself is likely to be done combinatorially.
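As an illustration of the "logic cloud" part, here is a rough Python model of a barrel shifter built as log2(N) layers of 2:1 muxes. It is a pure function with no stored state, which is the software analogue of combinational logic. The names `mux2` and `barrel_rotate_left`, the rotate (rather than shift) behaviour, and the power-of-two width restriction are assumptions of this sketch, not anything specific to 10GBASE-R hardware.

```python
def mux2(sel, a, b):
    """2:1 mux: returns b when sel is 1, otherwise a."""
    return b if sel else a


def barrel_rotate_left(bits, amount):
    """Rotate a list left by `amount` positions using log2(n) mux layers.

    Layer k either passes its input through or rotates it by 2**k,
    depending on bit k of `amount`. No state is kept between calls.
    """
    n = len(bits)
    assert n & (n - 1) == 0, "width must be a power of two in this sketch"
    stage = 0
    while (1 << stage) < n:
        sel = (amount >> stage) & 1
        step = 1 << stage
        # One layer: every output position has its own 2:1 mux.
        bits = [mux2(sel, bits[i], bits[(i + step) % n]) for i in range(n)]
        stage += 1
    return bits
```

With an 8-entry input, `barrel_rotate_left(list(range(8)), 3)` gives `[3, 4, 5, 6, 7, 0, 1, 2]` after three mux layers. In hardware the cost is the propagation delay through those layers, not a clock cycle as such; the cycle appears only once the result is registered.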

Spehro Pefhany
placeholder

The fact that it takes one clock cycle means that it's done combinatorially, i.e. there are no memory elements inside. The point is that in a digital system the clock is the time base, so any time interval lasts an integer number of clock cycles. Since nothing can be instantaneous, the smallest time interval is one clock cycle, so a purely combinational module will need (if it's well built) at least one clock cycle. That's a sort of convention anyway.

You can see it from another point of view though: your combinational logic will have some inputs and some outputs, and since yours is a clocked digital system you will have a register on the input and one on the output. If you connect several combinational modules in cascade you can treat them as one big combinational module WLOG. When you load a valid input into the input register you expect a valid output in the output register, but it will be available only after at least one clock cycle.

About your Ethernet question, if you now think about it I'm sure you can understand why one cycle must be added: in a synchronous (clocked) digital system everything happens on one of the clock edges, so when the input is valid you must wait for the next edge before consuming the output, to be sure that the combinational logic has done its work.
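The register-to-register picture can be sketched as a toy Python model: a combinational function followed by an output register, where each call to `clock` stands for one active clock edge. The class name `RegisteredStage`, the `reset_value` parameter and the example function are hypothetical, and real timing (setup, hold, propagation delay) is not modelled at all.

```python
class RegisteredStage:
    """A combinational function followed by an output register."""

    def __init__(self, logic, reset_value=None):
        self.logic = logic      # stateless function: the "logic cloud"
        self.q = reset_value    # contents of the output register

    def clock(self, d):
        """Model one clock cycle with `d` held on the input.

        Returns what the output register shows during this cycle, then
        captures logic(d) at the edge that ends the cycle.
        """
        visible = self.q
        self.q = self.logic(d)
        return visible


# Hypothetical example: the "logic" just adds one to its input.
stage = RegisteredStage(lambda x: x + 1, reset_value=0)
outputs = [stage.clock(x) for x in [10, 20, 30]]
```

Here `outputs` is `[0, 11, 21]`: the result for input 10 becomes visible only one cycle later, even though the function itself is instantaneous in the model.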

Vladimir Cravero