2

I am a bit stuck with the concept of carry-lookahead adder so I'd like to compare it with another concept I'm more familiar with: the ripple-carry adder. I'm trying to make some basic math comparison between the two ones.

Assuming every gate can accept at most 2 (two) inputs and has a delay of '1', and that, at least for the moment, we're adding two 4-bit numbers.

Let's start with the ripple-carry adder. I've built it putting four full-adder in column. Every carry-out is computed as

$$c_i^{(\text{out})} = a_i \cdot b_i + \left[ a_i + b_i \right] \cdot c_i^{(\text{in})} \qquad \text{where} \; \cdot \equiv \text{AND},\, + \equiv \text{OR}$$

Since

$$\left[ a_i + b_i \right]$$

is computed in parallel for every full-adder, the cost for computing the k-th carry-out is

$$1 + 2k$$

right?

Now say we want to compute carry's at first, then the sum -- bit by bit. Using the formula above, (and omitting the superscript) we write that

$$c_{i+1} = g_i + p_i c_i$$

Hence, we can write recursively the fourth term as a function of the first one, i.e.

$$c_1 = g_0 + p_0 c_0 \\ c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0 \\ \vdots \\ c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$$

Of course it is the last term that costs more. How much? I'd say

$$\lceil \log_2{4} \rceil + \lceil \log_2{4} \rceil $$

where the first term is needed for computing every AND operations, while the second one for computing every OR operation.

Hence for an overview ...

$$1 + \lceil \log_2{k} \rceil + 2$$

where the first term is the gate delay needed for computing every propagator/generator term, the second term is the one needed for computing every carry_in, then the last term is the gate delay of a single full-adder.

That is, the comparison of "time" between a ripple-carry adder and a carry-lookahead's one is the comparison of the following two functions

$$1 + 2k$$ $$3 + \lceil \log_2{k} \rceil$$

I know that carry-lookahead can be improved more, and I've done that by myself...though I'd like you give me some feedback about the stuff above for now.

  • I think your assumption that each port can take 2 inputs does not correspond with reality: a 4 input NAND port would have 2 port delays under your assumption, but in reality it has just 1 port delay, just like a 2 input port. – Wouter van Ooijen Dec 15 '13 at 20:40
  • @WoutervanOoijen uhm yes!, I would've implemented a 4 input NAND `!(abcd)` with two NAND's in parallel then a OR gate `!(ab) + !(cd)`, resulting in a 2 port delay. Where am I wrong? – Giuseppe Crinò Dec 15 '13 at 21:32
  • 1
    A 4 input NAND is just that: a 4 input NAND, with delay 1. That's (roughly) how the hardware works. – Wouter van Ooijen Dec 15 '13 at 21:48
  • @WoutervanOoijen Hence, `a + b + c + d + e + f` can be implemented simply with a 6 input OR, with delay 1? I mean: I should not care about the number of inputs of the gates but instead about the number of gates I have to pass through before the output become stable, right? – Giuseppe Crinò Dec 15 '13 at 22:00
  • 1
    The delay through a logic gate as a function of the number of inputs depends greatly on how you are implementing the gate. For an FPGA, logic functions might be implemented using a 4 or 5-input MUX so anything with up to 5 inputs would have the same delay. In an ASIC, the gate delay tends to increase with the number of inputs (all else being equal). – Joe Hass Dec 15 '13 at 23:29
  • @Joe could you quantify "increase with the number of gates", for how many inputs would equal the delay be equal to two (2-input) gates in series? – Wouter van Ooijen Dec 16 '13 at 08:05
  • @WoutervanOoijen: According to Logical Effort (a simplified linear model of gate delay) the delay of an \$n\$ input CMOS NAND gate grows as \$\frac{n+2}{3}\$ while an \$n\$ input CMOS NOR gate has a delay like \$\frac{2n+1}{3}\$. http://en.wikipedia.org/wiki/Logical_effort#Delay_in_NAND_and_NOR_gates – Wandering Logic Dec 16 '13 at 12:52
  • @WoutervanOoijen: Gate delay scales per rising and falling edge differently per NOR and NAND gates. Since in NAND the pullup is in parallel, the pullup speed is much less effected by increased inputs, whereas the series pulldown path certainly feels every new input. Of course there is source/drain cap at the shared internal node which also increases per parallel input. An exact number is not possible because it changes based on technology, process, layout, drivestrength of the gate, how the gate is skewed for rise vs. fall, the output load, and the input slew. And some others. – jbord39 Oct 13 '16 at 13:19

0 Answers0