1

Divider can be made combinational, which uses more logic gates.. Divider can be made sequential, the throughput may stay the same, i.e. use as many stages as the width of the dividend (assuming width of dividend same as divisor), so that it can't compute multiple divisions at the same time.

Can sequential dividers be made so that they can compute multiple division at the same time with different stages of divider? what would be the input/output ports be like to accommodate such capability... I am thinking inputs are clk, reset, "dividend", divisor, and input_ready. Output should be quotient, remainder and output_ready... are these sufficient? what about the synthesizability of the code?

   module divide3 ( clk, rst, dd_input, dv_input, quotient, 
        done, rdy, remainder );
parameter [1:0] READY = 2'b00,
            BUSY = 2'b01,
        FINISH = 2'b10;
input clk, rst;
input signed [`DD_LEN-1 : 0] dd_input;
input signed [`DV_LEN-1 : 0] dv_input;
input  rdy;
output signed [`QLEN-1 : 0] quotient;
output signed [`QLEN-1 : 0] remainder;

reg signed [`QLEN-1 : 0] q;
reg signed [`QLEN-1 : 0] r;
output done;
reg done;
reg quotient, remainder;

reg [`DD_LEN-1 : 0] dividend;
reg [`DV_LEN-1 : 0] divisor;
reg [(`DD_LEN<<1)-1 : 0] A;
reg negDivisor, negDividend;

reg[1:0] cState, nState;
integer i;
reg sample;

always @(*)
begin
    case(cState)
    READY: 
    begin
        i=0;
        done = 0;
        sample = 1;
        if (rdy)
        begin
            if (dividend[`DD_LEN-1])
                negDividend = 1;
            if (divisor[`DV_LEN-1])
                negDivisor = 1;
            nState = BUSY;
            A = dividend;
        end
        else
            nState = READY;
        
    end
    BUSY: 
    begin
        if (i<`DD_LEN)
        begin
            i=i+1;
            nState = BUSY;
            sample = 0;
            if (A[(`DD_LEN<<1)-1:`DD_LEN] >= divisor)
            begin
                A[(`DD_LEN <<1)-1:`DD_LEN] = A[(`DD_LEN<<1)-1:`DD_LEN] - divisor;
                q = q +1;
            end
            q = q<<1;
            A = A<<1;
        end
        else
        begin
            if (A[(`DD_LEN<<1)-1:`DD_LEN] >= divisor)
            begin
                A[(`DD_LEN <<1)-1:`DD_LEN] = A[(`DD_LEN<<1)-1:`DD_LEN] - divisor;
                q = q +1;
            end
            r = A[(`DD_LEN <<1)-1:`DD_LEN];
            if (negDividend!= negDivisor)
                q = -q;
            if (negDividend)
                r = -r;
            nState = FINISH;
        end
    end
    FINISH:
    begin

        done = 1;
        sample = 0;
        nState = READY;
    end
    default:
    begin
        nState = READY;
        done = 1'bx;
        remainder = `QLEN'bx;
        quotient =  `QLEN'bx;   
    end

    endcase
end

always @(posedge clk, posedge rst)
begin
    if (rst)
    begin
         quotient <= 0;
        remainder <= 0;
            A <= 0;
             done <= 0;
        cState <= READY;
    end
    else
    begin

        cState <= nState;
        if (sample & rdy) 
        begin
            dividend <= dd_input;
            divisor <= dv_input;
        end

        if (done)
        begin
            remainder <=r;
            quotient <=q;
        end

    end

end


endmodule
  • If you were to draw up a sequential example, I think we could more easily comment on the specifics of your example. I have a particular sequential model in mind that is quite good for generating one bit of result per clock and is very cheap to implement. But it wouldn't lend itself to pipelining. It would be better in its case to just create multiple copies of it and arrange a registration station to allow allocation, should an instruction in the shadow of another one performing division also require a sequential division unit (with up to one allocation per clock?) Draw up something? – jonk Aug 12 '21 at 03:13
  • could you comment as much as you can.. I am all ears – hardware noob Aug 12 '21 at 03:31
  • When I said "Draw up something?" I was talking to you, not offering to do it myself. And well, I think Dave is in far better position. I haven't ever developed a fully combinatorial divider and so would have to start from scratch. You should ask him to offer some more detailed thoughts than, "Yes. Because I made one." That's not going to help you much, as it is. It's more a comment than an answer. – jonk Aug 12 '21 at 03:42
  • Are you familiar with the [CORDIC](https://en.wikipedia.org/wiki/CORDIC) family of algorithms? – Neil_UK Aug 12 '21 at 05:05

1 Answers1

2

Yes, you can take a combinatorial divider and add pipeline registers to increase the throughput. I've done exactly that for a few projects where I needed a result on every clock cycle @ 150 MHz.

Dave Tweed
  • 168,369
  • 17
  • 228
  • 393
  • you mean a sequential divider with same throughput as a comb divider? – hardware noob Aug 12 '21 at 03:17
  • 1
    No, it isn't a sequential divider. It's the same logic as a combinatorial divider, but with pipeline registers placed strategically to multiply the throughput. In one case, I had 18 pipeline stages, each computing one bit of the quotient. Although in one sense, you can think of it as "unrolling" the sequential divider. The registers for each pipeline stage need to hold the divisor, the dividend/remainder and the quotient bits that have been computed so far. – Dave Tweed Aug 12 '21 at 03:22