How to minimize slack for mod in numerical package?

Question

Here is the snippet of the code in VHDL. I mean this IEEE.numeric_std.all

shift_reg <= b"0011" & std_logic_vector("/" (temp_result, divisor) (3 downto 0));
temp_result <= temp_result mod divisor;
divisor <= divisor / 4d"10";

This is what I got from Timing Analysis in Quartus. There is a lot of look up tables for mod

The reason why I use divisor is that the result should be displayed one by one in LCD. For instance, 365 have to divided into 3, 6 and 5 in num_array.

-- define the new type for the 6x4 RAM 
type num_array is array (0 to 5) of unsigned (3 downto 0);
 signal my_array                        : num_array:= (others => (others => '0'));

your `shift_reg` seems unused. I don't think `4d"10"` is valid VHDL, but I'm not too used to VHDL anymore, and especially not to VHDL2008. What does `std_logic_vector("/" (temp_result, divisor) (3 downto 0))` do? — Marcus Müller, Apr 15 '23 at 12:11
When you use the mod-operator, the synthesis tools create a combinatorial division module, as most (all?) FPGAs do not have any built-in division-module (as they have for multiplication). Combinatorial division modules always have a very deep path from input to output. So you could reduce your clock frequency, or define multicycle paths for the mod-operation, or build your own clocked division module. — Matthias Schweikart, Apr 15 '23 at 12:12
@greybeard you mean in a binary Galois field? That wouldn't work, would it, 10 not being coprime with 2^n? — Marcus Müller, Apr 15 '23 at 12:54
by the way, @kile: what are you trying to build in the larger picture? The module you're building is called `lcd`, but if that means `Lowest Common Denominator`, then it's not clear what the division with 10 has to do there. — Marcus Müller, Apr 15 '23 at 13:01
@MarcusMüller I am building a calculator for my LCD 16*2 display. I am using DE10 lite which use Intell Max 10 FPPGA. The datasheet for this LCD IC chip could be found here https://www.sparkfun.com/datasheets/LCD/HD44780.pdf — kile, Apr 15 '23 at 13:13
cool, @kile! But then: it's getting all a bit easier, because LCDs (and human eyes) are incredibly slow compared to FPGAs. So, instead of using `/` and `mod`, what about a state machine which simply subtracts until the result would get smaller than 0, and then emits the count of subtracting (so, the division) and the remainder (so, `mod`)? — Marcus Müller, Apr 15 '23 at 13:21
For human interface speed binary-to-decimal, there's always bit-serial double dabble. Going with a state machine may well end you with an embedded processor. — greybeard, Apr 15 '23 at 13:25
@greybeard heh; historically, there's a certain coevolution of desk/pocket calculators and small MCUs, so practically, that wouldn't be absurd at all. Regarding the double-dabble algorithm: Love that, but not convinced it's even more space efficient than just a stupid "binary search for the right multiplier"-division-and-mod algorithm – that "beast" of an FPGA has multiplier blocks. — Marcus Müller, Apr 15 '23 at 13:36
@greybeard What do you mean state machine here? I already have implemented state machine in lcd block. Do you mean that I have to do it once again for bit-serial double dabble algorithm? — kile, Apr 15 '23 at 13:41
@kile His statement about state machines referred to my proposal to replace your usage of mod and division with a state machine. But: Check wikipedia for [double-dabble](https://en.wikipedia.org/wiki/Double_dabble); that algorithm is a deeply pipelined, without explicit state-machine, algorithm to convert binary numbers to the decimal digits in BCD. — Marcus Müller, Apr 15 '23 at 13:48
Another point is that you don't have timing slack because you're running it on a 50MHz clock. You could probably just add in a clock divider down to 10 or 1 MHz and make this approach work. — W5VO, Apr 15 '23 at 13:50
My only point is that you don't seem to have a throughput requirement or part of your system that requires 50MHz, or anything close for that matter. If your just trying to implement the function, lower your clock frequency. If you're trying to learn about algorithms and STA, please continue. — W5VO, Apr 15 '23 at 14:03
This question (or the comments?) seems to wander from its title to *how to do a VHDL calculator* (and *Why?*). To begin with the title, you have *negative* slack. — greybeard, Apr 15 '23 at 14:29
@greybeard Do you mean that I have to change the title for this question? — kile, Apr 15 '23 at 14:42
I think the title should summarise what you want answered. If that was *How to do a 16 digit calculator in VHDL in general and binary-to-decimal conversion in particular?*, the title should reflect that. If you want to get Quartus not to report an error with minimum change to the existing design (of which you show little enough), just check if the problem was indeed *minimize slack*, 0 being as low as it usefully gets. — greybeard, Apr 15 '23 at 14:54
Or at least _edit your question_ to point out that you're doing a calculator that needs to operate at human speed. You may also want to mention the clock speed you're using, your goals for logic footprint, and **most importantly** the speed you actually need for the computation. — TimWescott, Apr 15 '23 at 15:03
**Comments have been [moved to chat](https://chat.stackexchange.com/rooms/145370/discussion-on-question-by-kile-how-to-minimize-slack-for-mod-in-numerical-packag); please do not continue the discussion here.** Before posting a comment below this one, please review the [purposes of comments](/help/privileges/comment). Comments that do not request clarification or suggest improvements usually belong as an [answer](/help/how-to-answer), on [meta], or in [chat]. Comments continuing discussion may be removed. — Voltage Spike, Apr 15 '23 at 22:06

Marcus Müller · Accepted Answer · 2023-04-15T12:59:15.620

Problems with your code

temp_result <= temp_result mod divisor;

That's a bad idea: integer mod is an immensely complex operation.

Unless divisor is a power of 2, the combinatorial approach to modulo will be nearly as complex as doing a division.

divisor <= divisor / 4d"10";

same problem; dividing by 10 needs a relatively complex divider with a large combinatorial depth.

Complexities of integer mod and division

Here's a slightly dated view of the combinatorial implementation of division in yosys targetting a small-cell FPGA (from an older answer of mine):

(Divide on an ICE40) (warning: ~100 Mpixel image)

So, in general, doing division or mod just in a combinatorial path like you do here is a pretty bad idea.

You'll want to implement a pipelined modulo, I'd guess. Depending of the relative lengths of temp_result and divisor, that can be relatively flat, or deep.

Ways out

Well, you'll notice

there's probably quite some overlap between what you need to do to calculate a mod b and to calculate a / b.
there's no easy way out – division is just combinatorial intense. So, we need to trade slack for pipeline depth.

The 2. issue is something you'll be quite familiar with when you look what time it takes for anything but really number-crunching optimized CPUs to divide integers – many processor instruction sets do have some kind of integer divide and integer modulo operations, but they take multiple, often many, clock cycles to complete.

So, I'd try to find an existing implementation of pipelined integer division (mod is often a side product of that) (for example, on opencores.org), or write a very naive one myself, which doesn't need to have constant throughput.

@MarcusMüller I have to display those decimal number one by one at a time. For example, 365 has to be divided into an array of numbers [3, 6, 5]. In VHDL, it will look like this. `-- define the new type for the 6x4 RAM type num_array is array (0 to 5) of unsigned (3 downto 0); signal my_array : num_array:= (others => (others => '0'));` — kile, Apr 15 '23 at 13:31
@kile excellent! Yeah, I'd just have a state machine counting the times I can subtract 1000 until the result would become negative (which gives you the thousands digit), then take the remainder (your `mod`), use the same state machine to count how many times you can subtract 100 (giving you the hundreds digit), and so on. greybeard recommended a much cooler algorithm under your question, but it's much harder to understand. — Marcus Müller, Apr 15 '23 at 13:41

score 1 · Answer 2 · answered Apr 15 '23 at 15:17

Division just takes a lot of steps
A modulo operation is basically division
Lots of steps translates into lots of logic, or a state machine
You're building a calculator, so it doesn't have to respond much faster than a human can see it working.

Typically, in an FPGA design, when you have something complicated to do and you have lots of clock cycles to do it in, then unless you have LUTs to burn you make a state machine that breaks the complicated thing down into simpler steps.

It's the same with discrete logic design. The end point of this process evolved from the mid 1930's to the late 1940's into general purpose computers. It's still a viable thing to do inside an FPGA: use a microprocessor block, run some code, do your business.

If I didn't just slap a microprocessor block in there (there's a lot of them, including the Nios straight from Intel) I'd either implement a little 4- or 8-bit one of my own (it's a bucket list thing) or I'd research the various available division algorithms that are out there and that use state machines for a complexity vs. clock-cycle tradeoff.

Successive subtraction (just subtract 10 from your number until it's less than 10) and CORDIC are probably your best bets. Since you're building a calculator anyway you may wish to make a division block, and use it for all your arithmetic needs.

While you're working on your gizmo, consider that most calculators store numbers internally in binary coded decimal, and often in a form of decimal floating point (i.e., mantissa in BCD and an exponent that's a power of 10). This is done because the numbers are so tightly coupled to what you're displaying that the extra work in doing all the math in base 10 is paid for by the savings in work when you display results. — TimWescott, Apr 15 '23 at 15:19

How to minimize slack for mod in numerical package?

2 Answers2

Problems with your code

Complexities of integer mod and division

Ways out