
I have the following VHDL function that multiplies a given m×n matrix a by an n×1 vector b:

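-- integer_vector is predefined in VHDL-2008; integer_matrix is a
-- user-defined 2-D array of integer (its declaration is not shown).
function matrix_multiply_by_vector(a : integer_matrix; b : integer_vector;
                                   m : integer; n : integer)
    return integer_vector is
    variable c : integer_vector(m-1 downto 0) := (others => 0);
begin
    for i in 0 to m-1 loop          -- one result element per matrix row
        for j in 0 to n-1 loop      -- dot product of row i with b
            c(i) := c(i) + a(i, j) * b(j);
        end loop;
    end loop;
    return c;
end matrix_multiply_by_vector;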

It works well, but what does this actually implement in hardware? Specifically, I want to know whether the synthesis tool is smart enough to realize that it can parallelize the inner for loop, essentially computing a dot product for each row of the matrix. If not, what is the simplest (i.e., cleanest syntax) way to parallelize matrix-vector multiplication?

fabiomaia
If it weren't, you would need some kind of memory, and you would have to load all of the values serially and "execute" them pipeline-style. – Voltage Spike Jun 01 '18 at 17:41

2 Answers


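In hardware (VHDL or Verilog synthesis), all loops are unrolled: each iteration becomes its own copy of the logic, and all copies operate in parallel.

Thus not only your inner loop but also your outer loop is unrolled. Your function becomes m×n multipliers feeding m adder chains, one per row, all working at the same time.

That is also the reason why the loop bounds must be known at compile time. When the loop length is unknown, the synthesis tool will complain.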
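A minimal sketch of how the bounds become compile-time (elaboration-time) constants, assuming VHDL-2008 (for the predefined integer_vector) and assuming integer_matrix is an unconstrained 2-D array of integer, since the question does not show its declaration. The package name matvec_pkg, the entity name matvec_comb, and the generic defaults are hypothetical. Because M and N are generics, both loops have constant bounds and the call can be unrolled into purely combinational logic.

```vhdl
-- Hypothetical package: the 2-D integer_matrix type is an assumption,
-- since the question does not show how it is declared.
package matvec_pkg is
    type integer_matrix is array (natural range <>, natural range <>) of integer;
    function matrix_multiply_by_vector(a : integer_matrix; b : integer_vector;
                                       m : integer; n : integer) return integer_vector;
end package matvec_pkg;

package body matvec_pkg is
    function matrix_multiply_by_vector(a : integer_matrix; b : integer_vector;
                                       m : integer; n : integer) return integer_vector is
        variable c : integer_vector(m-1 downto 0) := (others => 0);
    begin
        for i in 0 to m-1 loop          -- unrolled: one copy per row
            for j in 0 to n-1 loop      -- unrolled: one multiplier per element
                c(i) := c(i) + a(i, j) * b(j);
            end loop;
        end loop;
        return c;
    end function;
end package body matvec_pkg;

use work.matvec_pkg.all;

-- Hypothetical wrapper: the generics fix M and N at elaboration time,
-- so every loop bound inside the function call is a constant.
entity matvec_comb is
    generic (M : positive := 3; N : positive := 4);
    port (
        a : in  integer_matrix(0 to M-1, 0 to N-1);
        b : in  integer_vector(0 to N-1);
        c : out integer_vector(M-1 downto 0)
    );
end entity matvec_comb;

architecture rtl of matvec_comb is
begin
    -- Concurrent call, no clock: all products are formed at the same time.
    c <= matrix_multiply_by_vector(a, b, M, N);
end architecture rtl;
```

Changing M or N means re-elaborating the design; the hardware size is fixed once synthesis runs.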


It is a well-known trap for beginners coming from a software language. They try to convert:

int a, b, c;      /* a and b are inputs, set elsewhere */
c = 0;
while (a--)       /* repeat a times...                */
    c += b;       /* ...so c ends up as a * b          */

into VHDL/Verilog hardware. The problem is that it all works fine in simulation, but the synthesis tool needs to generate adders: c = b + b + b + ... + b (a copies of b).

For that, the tool needs to know how many adders to make. If a is a constant, fine! (Even if it is 4,000,000: it will run out of gates, but it will try!)

But if a is a variable, the tool is lost, because it cannot know how many adders to build.
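When a is only known at run time, one common hardware answer is to spend clock cycles instead of adders. The sketch below is a minimal, hypothetical illustration (the entity name repeated_add and the start/done handshake are assumptions, not from the answer): a clocked process reuses a single adder, performing one addition per rising clock edge and counting a down to zero.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: c = a * b by repeated addition, one addition per
-- rising clock edge, so a single adder is reused and a may vary at run time.
entity repeated_add is
    port (
        clk   : in  std_logic;
        start : in  std_logic;   -- one-cycle pulse that loads a and b
        a     : in  natural;     -- run-time repeat count (not a constant)
        b     : in  integer;
        c     : out integer;
        done  : out std_logic    -- '1' when idle; after a run, c holds a * b
    );
end entity repeated_add;

architecture rtl of repeated_add is
    signal count : natural := 0;
    signal b_reg : integer := 0;
    signal acc   : integer := 0;
    signal busy  : std_logic := '0';
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if start = '1' then
                count <= a;                -- latch the loop count
                b_reg <= b;                -- latch the addend
                acc   <= 0;
                busy  <= '1';
            elsif busy = '1' then
                if count = 0 then
                    busy <= '0';           -- all a additions are done
                else
                    acc   <= acc + b_reg;  -- the one shared adder
                    count <= count - 1;
                end if;
            end if;
        end if;
    end process;

    c    <= acc;
    done <= not busy;
end architecture rtl;
```

The latency grows with a while the area stays constant. If both operands really are run-time values, simply writing c <= a * b and letting the synthesis tool infer a multiplier is usually the more practical choice.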

Ale..chenski
Oldfart

This code will parallelize both loops, since you haven't defined an event to control any subset of the processing. Loops simply generate as much hardware as the function needs; to split the work into steps, you need a PROCESS.

A process has a sensitivity list that tells the VHDL simulator (or the synthesizer) that the process is not executed unless one of the signals in the list changes. This can be used to synthesize latches and flip-flops, and so go beyond a purely combinational implementation.
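As a minimal sketch of such a process, assuming VHDL-2008 and the same assumed 2-D integer_matrix type: the clocked process below keeps the inner loop unrolled but replaces the outer loop with a row counter, computing one row's dot product per rising clock edge. The names matvec_types and matvec_seq, the start/done handshake, and the generic defaults are hypothetical.

```vhdl
-- Hypothetical type package (the question does not show integer_matrix).
package matvec_types is
    type integer_matrix is array (natural range <>, natural range <>) of integer;
end package matvec_types;

library ieee;
use ieee.std_logic_1164.all;
use work.matvec_types.all;

-- Hypothetical entity: one dot product (one row of c) per rising clock edge.
entity matvec_seq is
    generic (M : positive := 3; N : positive := 4);
    port (
        clk   : in  std_logic;
        start : in  std_logic;   -- one-cycle pulse; keep a and b stable until done
        a     : in  integer_matrix(0 to M-1, 0 to N-1);
        b     : in  integer_vector(0 to N-1);
        c     : out integer_vector(M-1 downto 0);
        done  : out std_logic    -- '1' when idle
    );
end entity matvec_seq;

architecture rtl of matvec_seq is
    signal row  : natural range 0 to M-1 := 0;
    signal busy : std_logic := '0';
begin
    process (clk)   -- only a change on clk wakes this process up
        variable dot : integer;
    begin
        if rising_edge(clk) then
            if start = '1' then
                row  <= 0;
                busy <= '1';
            elsif busy = '1' then
                dot := 0;
                for j in 0 to N-1 loop   -- still unrolled: N multipliers
                    dot := dot + a(row, j) * b(j);
                end loop;
                c(row) <= dot;           -- register one row of the result
                if row = M-1 then
                    busy <= '0';
                else
                    row <= row + 1;      -- outer loop becomes a counter
                end if;
            end if;
        end if;
    end process;

    done <= not busy;
end architecture rtl;
```

This trades latency (M clock cycles) for area: roughly N multipliers plus multiplexers that select the current row, instead of M×N multipliers.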