32 bit to 7 segment display (FPGA)

Question

After some research, I think I have quite a problem on my hands.

I have a project on an FPGA where I store a 32 bit binary value in a 32 bit memory register. The stored value ranges from 0 to 4000 in decimal.

I would like to display this decimal number from 0 to 4000 on a 4 digit 7 segment display. I know it would be significantly easier to display the number in hex format but I would like it in decimal.

Only way I can think of is a huge look up table but this is really tedious, or changing my system to calculate values in BCD instead of binary.

edit

Or a binary to bcd converter but again would be slow

edit

My architecture is 32 bit so I am just storing this value in a 32 bit register.

What are some ways to solve this problem? I am not sure how to go about it and would just like to be pointed in the right direction if anyone knows how to do this.

Truncate to 12 bits. Converting to decimal can be kept as simple as possible. Note that repeated subtraction of 1000 will soon get you a number below 1000: you might also find the number of subtractions useful. Now repeated subtraction of 100 will ... ah, you get the picture. — , Nov 28 '20 at 22:14
Given it's only a 12-bit number, a simple double-dabble implementation for bin-BCD would be pretty straightforward and not too slow (certainly not if you add a few pipeline stages to it). Heck, for 12 bits a 4k lookup table would be a drop in the ocean for modern FPGAs. — Tom Carpenter, Nov 28 '20 at 22:23
@Brian Drummond wow I never thought of that, that is quite clever. And as my number is reasonably small, this might be quite a fast method. — David777, Nov 28 '20 at 22:23
If it's to be shown on a 7-segment display that will be read by a human, how is the speed of the conversion of any relevance? — pericynthion, Nov 30 '20 at 08:55

Marcus Müller · Answer 1 · 2020-11-29T11:56:51.457

Or a binary to bcd converter but again would be slow

Um, no. The human eye can read at most, say, 4 different numbers a second. Your FPGA will be able to do this conversion millions of times per second.

Speed doesn't even remotely matter in this application.

My architecture is 32 bit so I am just storing this value in a 32 bit register.

That's not how FPGAs work – they don't have fixed-size registers. Your task as designer is to use the bits you need.

In this case: numbers up to 4000 only need 12 bits. So, no matter what solution you go for, the first "step" you take is to ignore the topmost 20 of your 32 bits. They're all zeros, or your value is not 4000 or below.

Only way I can think of is a huge look up table

And FPGAs are good at these,

but this is really tedious

why? All that it would take you is writing a program in a for loop in any scripting/programming language of your choice. You could even learn a new language for this – being able to print such a table would be what I'd expect people can do after the first day of learning Python, matlab, BASIC, C, or C++ for example.

If you don't know any programming language yet, do take 4 hours to learn the basics of Python. You'll thank yourself later.

LUT approach

size of the naivest table possible

a huge look up table

How huge is "huge", anyways?

So, we've got 4000 values. That's > 2¹¹ and < 2¹², so we use a table with 2¹² entries, as power-of-two sizes make it easy to address the table directly with the value you want to display.

So, each digit has 10 possible values, and you need 4 bits to store one. When reading them out, you just read four 4-bit values and convert each of these binary representations of a decimal digit to its respective 7-segment output. (That's another lookup, in essence, but it's so small that your FPGA will need 2 or maybe 3 logic cells for that. Um, you were starting out with 32-bit registers for 12 bits of data, so I'm sure this isn't a problem. Your FPGA will have thousands of these.)

A most naive table implementation hence would need 4 digits · 4 bits · 2¹² entries = 2¹⁶ bits of memory. That's 64 kbit of block RAM. Your FPGA will most likely have no problem at all supplying this.

In that case you're done here. Write a script that generates the content of a lookup table, grab second breakfast.
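A minimal sketch of such a generator script, in Python. The packed layout (4 bits per digit, thousands in the top nibble), the zero fill for unused addresses, and the hex-word output format are assumptions made for illustration, not something the answer prescribes.

```python
# Sketch: generate the naive 4096-entry lookup table (16 bits per entry).
ADDRESS_BITS = 12   # 2**12 = 4096 entries covers 0..4000
MAX_VALUE = 4000    # largest value that has to be displayed


def packed_bcd(value: int) -> int:
    """Pack the four decimal digits of value into 16 bits, 4 bits per digit."""
    thousands, rest = divmod(value, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, ones = divmod(rest, 10)
    return (thousands << 12) | (hundreds << 8) | (tens << 4) | ones


def build_table() -> list:
    """One entry per 12-bit address; addresses above MAX_VALUE hold 0."""
    return [packed_bcd(v) if v <= MAX_VALUE else 0
            for v in range(2 ** ADDRESS_BITS)]


def as_hex_lines(table: list) -> list:
    """Format entries as 4-digit hex words, one per line (hypothetical
    memory-init format; adapt it to whatever your toolchain expects)."""
    return [f"{entry:04X}" for entry in table]


table = build_table()
hex_lines = as_hex_lines(table)
```

Because the entries are packed BCD, each hex word reads like the decimal number it encodes, which makes the generated file easy to eyeball.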

easy, less wasteful approach

Cleverer: Same lookup table idea, but instead of one large table with 4 digits long entries, make 4 tables, one for each digit. This helps a lot:

Assuming you have 0-3999 (and treat 4000 as a special case before even doing a lookup), the first digit (the thousands) only needs 2 bits instead of 4. Also, since 1000 is divisible by 8, the lowest 3 of our 12 bits don't even matter, so our table for the thousands only needs 2 bits · 2⁹ = 1 kbit.

You do a lookup in that table by "shifting right" the input word by 3 bits (that is, only using the upper 9 bits) and using that as address.

The hundreds, by the same logic, don't care about the last 2 of 12 bits, so that table only needs 4 bit for each of 2¹⁰ entries, so 4 kbit. Lookup by using the upper 10 bits. You get the idea.

Since 10 is divisible by 2, the tens don't care about the last bit, so they need only 2¹¹ entries, so 8 kbit.

The ones would use a full 16 kbit table.

That's 29 kbit in total. That's nearly nothing. A 7-series Xilinx FPGA has block RAM cells of 36 kbit each, which means you're not even filling a single block RAM with this. So, if you have a device where block RAM units are >= 29 kbit, do the lookup; it won't get smaller, faster, easier, or lower-power than that.
(Not that speed or power consumption could ever matter for an operation that happens at most a couple of times a second, but still, it's the optimal solution in these disciplines.)
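The four-table split can be checked with a short Python model. This is only a sketch: the names `TABLE_SPECS`, `build_digit_table` and `lookup_digits` are made up for illustration, and entries for addresses above 3999 are filled with 0 as a don't-care value.

```python
MAX_VALUE = 3999  # 4000 is treated as a special case before the lookup

# name: (divisor, dropped low address bits, bits stored per entry)
TABLE_SPECS = {
    "thousands": (1000, 3, 2),
    "hundreds": (100, 2, 4),
    "tens": (10, 1, 4),
    "ones": (1, 0, 4),
}


def build_digit_table(divisor, dropped_bits):
    """Table addressed by the upper (12 - dropped_bits) bits of the input."""
    table = []
    for address in range(2 ** (12 - dropped_bits)):
        value = address << dropped_bits  # the dropped bits cannot change this digit
        digit = (value // divisor) % 10 if value <= MAX_VALUE else 0
        table.append(digit)
    return table


TABLES = {name: build_digit_table(divisor, dropped)
          for name, (divisor, dropped, _) in TABLE_SPECS.items()}

# Memory footprint in bits, summed over the four tables.
TOTAL_BITS = sum(bits * len(TABLES[name])
                 for name, (_, _, bits) in TABLE_SPECS.items())


def lookup_digits(value):
    """Return [thousands, hundreds, tens, ones] using shifted addresses."""
    return [TABLES[name][value >> TABLE_SPECS[name][1]] for name in TABLE_SPECS]
```

In this model `TOTAL_BITS` comes out at 29 · 1024, matching the 1 + 4 + 8 + 16 kbit tally above.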

optimized multi-staged lookup, still easy

We can be even smarter, easily, too, and save more RAM.

Let's look at the ones:
if we set the last bit of our input to 0, then we always get an even digit. And if we set it to 1, we get the odd digit that comes after the even one, always. So, we don't actually need to store all 10 possible digits; the 5 even ones suffice. Technically, that's easy: just set the last bit of the input to 0, get the value from the table, and add the (original) last input bit to it to get the same result. That means we don't have to store 2¹², but only 2¹¹ values. And because there are only 5 even digits, we don't need 4 bits to store them; 3 bits suffice.

That reduces the size of our ones table from 16 kbit to 3 bits · 2¹¹ = 6 kbit in memory.
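A small Python model of the even-digit trick, as a sketch. It assumes the table is addressed by the upper 11 bits of the 12-bit input and stores half of the even digit (0 to 4) in 3 bits; the names are hypothetical.

```python
# Ones table: one entry per pair (2k, 2k+1), holding (even ones digit) / 2.
ONES_HALF_TABLE = [((address << 1) % 10) >> 1 for address in range(2 ** 11)]
ONES_TABLE_BITS = 3 * len(ONES_HALF_TABLE)  # 3 bits per stored entry


def ones_digit(value: int) -> int:
    """Look up the even digit, then put the original last bit back."""
    even_digit = ONES_HALF_TABLE[value >> 1] << 1
    return even_digit | (value & 1)  # the even digit is at most 8, so no carry
```

Because an even digit always has a zero in its lowest bit, "adding" the original last bit is just a concatenation in hardware.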

user110971 · Answer 2 · 2020-11-28T22:33:39.117

Your problem can be split into two parts. Firstly, converting the binary number \$b\$ to a decimal number \$d\$. Secondly, converting each digit \$d_n\$ into the seven segment representation.

Remember that a decimal number can be expressed as

$$b = \sum_{i = 0}^n d_i 10^i.$$

To get the least significant digit all you need do is divide by 10 and look at the remainder. So to obtain all the digits you have to do the following:

i = 0
while i < number of digits
    d[i] = b % 10
    b = b / 10
    i = i + 1

In an FPGA you can implement the above by loading your number into an accumulator register and connecting the accumulator to a divider that feeds back to the accumulator. Each clock cycle you can save the output of the remainder in a 4-bit (you need 0 to 9) shift register. An FSM will be needed to start and stop the process.

Once the decimal digits have been stored in the 4-bit registers, displaying them on the 7-segment display is straightforward. You can use a look up table that converts each digit to its 7-segment representation. Put one such look up table on each 4-bit register.
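Both parts can be modelled in a few lines of Python. This is a sketch under assumptions: the segment encoding (active-high, bit 6 = segment a down to bit 0 = segment g) is one common convention and may not match your display, and the function names are hypothetical.

```python
# Which segments are lit for each digit 0..9 (standard a..g naming).
SEGMENTS_ON = ["abcdef", "bc", "abdeg", "abcdg", "bcfg",
               "acdfg", "acdefg", "abc", "abcdefg", "abcdfg"]


def segment_code(digit: int) -> int:
    """7-bit active-high pattern, bit 6 = a ... bit 0 = g (assumed order)."""
    code = 0
    for name in SEGMENTS_ON[digit]:
        code |= 1 << (6 - "abcdefg".index(name))
    return code


def binary_to_digits(b: int, num_digits: int = 4) -> list:
    """Divide by 10 repeatedly; digits come out least significant first."""
    digits = []
    for _ in range(num_digits):
        b, remainder = divmod(b, 10)
        digits.append(remainder)
    return digits


def display_codes(b: int) -> list:
    """One segment pattern per digit, least significant digit first."""
    return [segment_code(d) for d in binary_to_digits(b)]
```

For an active-low (common-anode) display, the same table can be used with every pattern inverted.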

edited Nov 28 '20 at 22:33

answered Nov 28 '20 at 22:28

While such an algorithm works nicely in software, it's not well suited for an FPGA. Division in FPGAs is an expensive operation both in terms of resource usage, and Fmax. – Tom Carpenter Nov 28 '20 at 22:39
@TomCarpenter I expect it to be in the low hundreds for LUTs. See the Radix-2 dividers by [Xilinx](https://www.xilinx.com/support/documentation/ip_documentation/ru/div-gen.html). They can run at hundreds of MHz. If you utilize some RAM, you can implement a 12-bit divider with about 100 LUTs. – user110971 Nov 28 '20 at 23:04
@TomCarpenter : division needn't be expensive in FPGA. As speed isn't THAT important I suggested division by repeated subtraction above. – Nov 28 '20 at 23:14
@TomCarpenter Of course, there are other ways to do this. You don’t need it to be that fast, since it’s a display. You can create a decimal counter with 4-bit binary counters that reset when they reach a count of 10, i.e. ..., 8, 9, 0. When they reset you generate a carry for the next counter in the chain. There are many ways to implement this. – user110971 Nov 28 '20 at 23:16
@BrianDrummond Indeed. This is subtractive division. Or you can simply count up to the number with a decimal counter as outlined in my comment above. There are many solutions to this problem. – user110971 Nov 28 '20 at 23:26
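The subtractive division discussed in these comments can be sketched in Python as follows. The function name and the step counter are illustrative only; in hardware each loop iteration would be one clock cycle of a subtractor plus comparator.

```python
def digits_by_subtraction(value: int):
    """Subtract 1000, then 100, then 10 until each no longer fits.

    Returns (digits, steps): digits are most significant first, and steps
    counts the subtractions performed.
    """
    digits = []
    steps = 0
    for weight in (1000, 100, 10):
        count = 0
        while value >= weight:
            value -= weight
            count += 1
            steps += 1
        digits.append(count)  # the number of subtractions is the digit
    digits.append(value)      # what is left over is the ones digit
    return digits, steps


# Largest number of subtractions needed for any input in 0..4000.
WORST_CASE_STEPS = max(digits_by_subtraction(v)[1] for v in range(4001))
```

In this model the worst case over 0 to 4000 is 21 subtractions (for 3999).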

Spehro Pefhany · Answer 3 · 2020-11-29T00:09:37.147

You want to convert the binary number to BCD and then apply the BCD number to a BCD->7-segment conversion. The latter is pretty easy, a lookup table or some (20 or 30) gates.

You can do this serially, in 12 clock pulses (or 32 if you want to be able to convert any possible unsigned 32-bit number) plus a couple cycles for overhead. One algorithm is to left-shift the number into a BCD result register. For each clock pulse you double the BCD result and add the new bit from the binary register. You must correct the value if the BCD digit was >4, and take care of any generated carries and correct higher value digits. A single BCD digit logic looks like this:

[Figure: logic for a single BCD digit; the original image and its source link are not preserved here.]
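A bit-level Python model of this shift-and-correct scheme (commonly called double dabble) may make the steps easier to follow. It is a sketch, assuming a 12-bit input and four BCD digits; the function name is hypothetical.

```python
def double_dabble(value: int, input_bits: int = 12, num_digits: int = 4) -> list:
    """Serial binary-to-BCD conversion, one input bit per iteration.

    Returns the BCD digits least significant first.
    """
    digits = [0] * num_digits
    for bit_index in range(input_bits - 1, -1, -1):
        # Correction: a digit above 4 would exceed 9 once doubled, so add 3
        # now; after the shift that becomes +6 and produces the decimal carry.
        digits = [d + 3 if d > 4 else d for d in digits]
        # Shift the whole BCD register left by one, shifting in the next
        # binary bit (MSB first) and rippling carries between digits.
        carry = (value >> bit_index) & 1
        for i in range(num_digits):
            shifted = (digits[i] << 1) | carry
            carry = shifted >> 4
            digits[i] = shifted & 0xF
    return digits
```

Each pass of the outer loop corresponds to one clock pulse in the serial implementation described above, so a 12-bit input needs 12 passes.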

Of course, if you want your FPGA to work hard and not smart, you could simply make a binary down counter and a BCD up counter, and decrement one while incrementing the other until the binary count reaches zero. The BCD counter, starting from 0, then holds the BCD equivalent. That takes 4000 cycles maximum, which is still way faster than required for visual display (more than a few updates a second is a waste, and may be harmful) with typical FPGA clocks.
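The two-counter approach is easy to model in Python. This is only a sketch of the idea; the names are made up, and each loop iteration stands for one clock cycle.

```python
def count_convert(value: int, num_digits: int = 4):
    """Decrement a binary counter while incrementing a BCD counter.

    Returns (bcd_digits, cycles) with the digits least significant first.
    """
    bcd = [0] * num_digits
    cycles = 0
    while value != 0:
        value -= 1                 # binary down counter
        for i in range(num_digits):
            if bcd[i] == 9:        # digit rolls over and carries onward
                bcd[i] = 0
            else:
                bcd[i] += 1
                break
        cycles += 1
    return bcd, cycles
```

The cycle count equals the input value, which is where the 4000-cycle maximum comes from.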

score 1 · Answer 4 · answered Dec 03 '20 at 02:11

The maths of it is simple, i.e.:

Repeatedly divide the binary number by 10 until it is zero.
After each division, use the remainder as an index into an array of seven-segment codes.

But VHDL requires a bit more effort...

Dividing by 10 is the same as multiplying by 0.1, so convert 0.1 to Q16 format:

$$ 0.1 \cdot 2^{16} = 6554 $$

...to the nearest whole number. Be careful of rounding error here.
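To see how far the rounded constant can be trusted, here is a quick Python check. It is a sketch; `divide_by_ten` and the search limit of 2²⁰ are arbitrary choices for illustration.

```python
Q = 16
Q_ONE_TENTH = round(0.1 * 2 ** Q)  # 6554, slightly more than 6553.6


def divide_by_ten(value: int) -> int:
    """Approximate value // 10 by a Q16 multiply and a right shift."""
    return (value * Q_ONE_TENTH) >> Q


# First input for which the approximation differs from true integer division.
first_failure = next(v for v in range(1 << 20)
                     if divide_by_ten(v) != v // 10)
```

In this model the quotient is exact for every input from 0 to 4000, and the first mismatch only appears at 16389, so the 0 to 4000 range has plenty of margin.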

After multiplying the binary number by 6554, the quotient is in bits 16 and up, and the fractional part (the remainder divided by 10) is in bits 15 down to 0. We only need the top 5 bits of that fraction for good enough accuracy, so use these 5 bits as an index into our array of seven-segment codes, something like this:

    type TQ5SegmentCodes is array(0 to 31) of TSegment;  -- Declared in a package which maps a Q5 remainder to a 7-segment code.
    ...
    binary   : in natural range 0 to 4000;             -- Binary number. Entity port.
    segments : out TSegments(NUM_DIGITS - 1 downto 0)  -- Seven-segment digits. Entity port.
    ...
    constant Q: natural := 16;
    constant Q_ONE_TENTH: natural := natural(round(0.1 * 2**Q));
    constant Q5_SEGMENT_CODES: TQ5SegmentCodes := GetQ5SegmentCodes;  -- Calculate the map at compile time.

    type TBinary is array(0 to NUM_DIGITS) of natural;
    signal binaries: TBinary;
    signal q_binaries: TBinary;
    signal q5_remainders: TBinary;
    ...
    binaries(0) <= binary;

    S7D:
    for i in 0 to NUM_DIGITS - 1 generate
        q_binaries(i) <= binaries(i) * Q_ONE_TENTH;
        q5_remainders(i) <= Q5RemainderFromQ(q_binaries(i), Q);
        segments(i) <= Q5_SEGMENT_CODES(q5_remainders(i));
        binaries(i + 1) <= q_binaries(i) srl Q;
    end generate;

This generates a multiplier, adder and mux for each digit in combinational logic. Alternatively, you could put it in a clocked process to reuse the logic for each digit.
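The arithmetic of the generate loop can be cross-checked with a Python model before simulating the VHDL. This is a sketch that mirrors the snippet above using digit values instead of segment codes; the Python names are hypothetical, and the map uses round-half-up integer arithmetic to mimic VHDL's `round`.

```python
Q = 16
Q_ONE_TENTH = 6554                 # round(0.1 * 2**16)
Q_ONE_SIXTY_FOURTH = 2 ** Q // 64  # 1024, half of one Q5 step

# Map a 5-bit fraction index to a digit: round(i / 32 * 10), wrapping 10 to 0.
Q5_DIGITS = []
for i in range(32):
    digit = (i * 10 + 16) // 32    # round half up, in integer arithmetic
    Q5_DIGITS.append(digit if digit <= 9 else 0)


def q5_remainder(product: int) -> int:
    """Bits 15..11 of the Q16 product after adding 1/64 for rounding."""
    return ((product + Q_ONE_SIXTY_FOURTH) >> (Q - 5)) & 0x1F


def to_digits(binary: int, num_digits: int = 4) -> list:
    """One multiply per digit, least significant digit first."""
    digits = []
    for _ in range(num_digits):
        product = binary * Q_ONE_TENTH
        digits.append(Q5_DIGITS[q5_remainder(product)])
        binary = product >> Q      # the quotient feeds the next stage
    return digits
```

In this model every input from 0 to 4000 yields the correct four digits.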

Binary to 7-Segment Code

Here's the full code with test bench.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

package PSegments is

    subtype TSegment is std_logic_vector(6 downto 0);
    type TSegments is array(natural range <>) of TSegment;

    type TSegmentCodes is array(0 to 9) of TSegment;

    constant SEGMENT_CODES : TSegmentCodes :=
    (
        "0000001",  -- 0
        "1001111",  -- 1
        "0010010",  -- 2
        "0000110",  -- 3
        "1001100",  -- 4
        "0100100",  -- 5
        "0100000",  -- 6
        "0001111",  -- 7
        "0000000",  -- 8
        "0000100"   -- 9
    );

    type TQ5SegmentCodes is array(0 to 31) of TSegment;

    function GetQ5SegmentCodes return TQ5SegmentCodes;

end package;



library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

package body PSegments is

    function GetQ5SegmentCodes return TQ5SegmentCodes is
        variable segment_index: natural;
        variable result: TQ5SegmentCodes;
    begin
        for i in 0 to 31 loop
            segment_index := natural(round(real(i) / 32.0 * 10.0));
            if segment_index > 9 then
                segment_index := 0;
            end if;
            result(i) := SEGMENT_CODES(segment_index);
        end loop;
        return result;
    end function;

end package body PSegments;



library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
use work.PSegments.all;

entity BinaryTo7Segment is
    generic
    (
        NUM_DIGITS: natural := 4
    );
    port
    (
        reset    : in std_logic;
        clock    : in std_logic;
        binary   : in natural range 0 to 4000;             -- Binary number.
        segments : out TSegments(NUM_DIGITS - 1 downto 0)  -- Seven-segment digits.
    );
end entity BinaryTo7Segment;

architecture V1 of BinaryTo7Segment is

    function "srl"(constant a, q: in natural) return natural is
        variable u: unsigned(30 downto 0);
    begin
        u := to_unsigned(a, u'length);
        return to_integer(u(30 downto q));
    end function;

    function Q5RemainderFromQ(constant a, q: in natural) return natural is
        constant Q_ONE_SIXTY_FOURTH: natural := natural(round(1.0 / 64.0 * 2**Q));
        variable u: unsigned(30 downto 0);
    begin
        u := to_unsigned(a + Q_ONE_SIXTY_FOURTH, u'length);  -- Round off to the nearest one thirty second.
        return to_integer(u(q - 1 downto q - 5));
    end function;

    constant Q: natural := 16;
    constant Q_ONE_TENTH: natural := natural(round(0.1 * 2**Q));
    constant Q5_SEGMENT_CODES: TQ5SegmentCodes := GetQ5SegmentCodes;

    type TBinaries is array(0 to NUM_DIGITS) of natural;
    signal binaries: TBinaries;
    signal q_binaries: TBinaries;
    signal q5_remainders: TBinaries;

begin

    binaries(0) <= binary;

    S7D:
    for i in 0 to NUM_DIGITS - 1 generate
        q_binaries(i) <= binaries(i) * Q_ONE_TENTH;
        q5_remainders(i) <= Q5RemainderFromQ(q_binaries(i), Q);
        segments(i) <= Q5_SEGMENT_CODES(q5_remainders(i));
        binaries(i + 1) <= q_binaries(i) srl Q;
    end generate;

end architecture V1;



library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
use work.PSegments.all;

entity BinaryTo7Segment_TB is
end entity BinaryTo7Segment_TB;

architecture V1 of BinaryTo7Segment_TB is

    constant clock_period : time := 20 ns;

    signal halt_sys_clock: boolean := false;

    signal reset: std_logic := '0';
    signal clock: std_logic := '0';

    constant NUM_DIGITS: natural := 4;
    signal binary: natural range 0 to 4000;
    signal segments: TSegments(NUM_DIGITS - 1 downto 0);

    component BinaryTo7Segment is
        generic
        (
            NUM_DIGITS: natural := 4
        );
        port
        (
            reset    : in std_logic;
            clock    : in std_logic;
            binary   : in natural range 0 to 4000;  -- Binary number.
            segments : out TSegments(NUM_DIGITS - 1 downto 0)  -- Seven-segment digits.
        );
    end component;

begin

    ClockGenerator:
    process
    begin
        while not halt_sys_clock loop
            clock <= not clock;
            wait for CLOCK_PERIOD / 2.0;
        end loop;
        wait;
    end process ClockGenerator;

    Stimulus:
    process
    begin
        -- Do reset.
        wait for CLOCK_PERIOD / 4;
        reset <= '0';
        wait for CLOCK_PERIOD / 2;
        reset <= '1';
        wait for CLOCK_PERIOD / 2;
        reset <= '0';

        for i in 0 to 4000 loop
            binary <= i;
            reset <= '1';
            wait for CLOCK_PERIOD / 2;
            reset <= '0';
            wait for 2 * CLOCK_PERIOD;
        end loop;

        -- Halt simulation.
        halt_sys_clock <= true;
        wait;
    end process;

    DUT: BinaryTo7Segment
        port map
        (
            reset => reset,
            clock => clock,
            binary => binary,
            segments => segments
        );

end architecture V1;

Wow, very detailed answer, thank you. There are a lot of ways to solve this problem. I am working on another part of my project and will come back to this topic to ask more questions or let you know what method I’ve chosen — David777, Dec 03 '20 at 10:45

score 0 · Answer 5 · answered Nov 30 '20 at 08:43

If you want to make it simple you can use Instant SoC.

Instant SoC has a segment display class.

There is a project using this on Digilent Project Vault: RISC-V on Nexys

Holminge

Downvoted this as the answers should be standalone, not links to some other product which may or may not exist in the future. – awjlogan Nov 30 '20 at 08:53