Lattice — Should I prefer IPX/PMI over Verilog arithmetic builtins?

Question

Like any FPGA vendor, Lattice provides a number of IP modules for users to put in their designs. I tend to use them whenever possible, but sometimes I doubt whether they offer any substantial benefit over plain Verilog.

For example, I can create a simple multiplier using the IPX dialog:

or do the same thing with a PMI intrinsic:

pmi_mult #(
    .pmi_dataa_width (9),
    .pmi_datab_width (9),
    .pmi_sign ("on"),
    .pmi_additional_pipeline (0),
    .pmi_input_reg ("off"),
    .pmi_output_reg ("off"),
    .pmi_family ("XO"),
    .pmi_implementation ("LUT")) ...

But given that I do not need additional pipeline stages or latching, is it any better than Verilog's built-in multiplication operator? For example:

wire signed [8:0] a = (...);
wire signed [8:0] b = (...);
wire signed [17:0] result = a * b;

Shouldn't the LSE (or Synplify) be able to automatically match multiplication to pmi_mult, addition to pmi_add, and so on? Can I rely on it and just use plain Verilog arithmetic for simple computations?

Marcus Müller · Accepted Answer · 2019-09-27T19:49:33.987

There's a fundamental functional difference between your code and what the IPX thing does: yours is not clocked.

Also, you can explicitly tell the IPX thing how many pipeline stages you want, something that's impossible in a simple a * b combinatorial statement.

I don't know the features of IPX or PMI primitives, but you can rest assured that unless you really just need a combinatorial multiplication (that is, unclocked!), it makes sense to use a module that lets you explicitly state the way you want things to be implemented.
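To illustrate what "clocked" means here, this is a minimal sketch of a registered multiplier written in plain Verilog. It is not Lattice's implementation: the module name mult9x9_reg and its ports are made up for this example, and the widths are taken from the 9x9 signed case in the question. Whether LSE or Synplify maps it to the same structure that pmi_mult would produce is not guaranteed, so the synthesis report is worth checking.

```verilog
// Hypothetical module; name and ports are assumptions for illustration.
module mult9x9_reg (
    input  wire               clk,
    input  wire signed [8:0]  a,
    input  wire signed [8:0]  b,
    output reg  signed [17:0] result
);
    // Input registers (roughly the role pmi_input_reg plays in the PMI instance).
    reg signed [8:0] a_q, b_q;

    always @(posedge clk) begin
        a_q    <= a;
        b_q    <= b;
        // Output register (roughly the role pmi_output_reg plays).
        // result reflects a and b two clock edges after they are applied.
        result <= a_q * b_q;
    end
endmodule
```

Written this way, the latency is stated in the source, but the choice between LUT and dedicated multiplier resources, and any retiming, is still left to the tool.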

Generally, don't underestimate the complexity and degrees of freedom that an implementation of a basic arithmetic operation brings (and hides). For example, on a Lattice iCE40, a multiplier of two 8-bit numbers (unsigned!) can look this complex when mapped to the technology available in that FPGA:

(full 100 Megapixels (!) image)

Don't assume that a*b can magically infer whether you'd rather trade speed for space, or vice versa; whether you need the result registered or not; whether maybe a high clock rate is more important than a low number of clock cycles... Not to mention the desired output width and, if it is narrower than the full product, how to deal with that.
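The output-width point can be made concrete with a small sketch. The module below is hypothetical (its name and ports are assumptions), and it shows two common ways of narrowing the 18-bit product of two 9-bit signed operands. Neither is "the" right answer; which one fits depends on what the numbers represent.

```verilog
// Hypothetical module; name and ports are assumptions for illustration.
module mult9x9_narrow (
    input  wire signed [8:0]  a,
    input  wire signed [8:0]  b,
    output wire signed [17:0] full,      // full-width product, never overflows
    output wire signed [8:0]  low_bits,  // lower 9 bits: wraps on overflow
    output wire signed [8:0]  high_bits  // upper 9 bits: product divided by 2^9,
                                         // rounded toward negative infinity
);
    // Both operands are signed, so the multiplication is signed and the
    // 18-bit result holds every possible 9x9 product.
    assign full      = a * b;

    // Keeping the low bits silently discards the upper half of the product.
    assign low_bits  = full[8:0];

    // Keeping the high bits discards precision instead of range.
    assign high_bits = full[17:9];
endmodule
```

Saturation and other rounding modes are further options; each costs extra logic that a bare a * b never expresses.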
