Or a binary to bcd converter but again would be slow
Um, no. The human eye can read at most, say, 4 different numbers a second. Your FPGA will be able to do this conversion millions of times per second.
Speed doesn't even remotely matter in this application.
My architecture is 32 bit so I am just storing this value in a 32 bit register.
That's not how FPGAs work – they don't have fixed-size registers. Your task as designer is to use the bits you need.
In this case: numbers up to 4000 only need 12 bits. So, no matter what solution you go for, the first "step" you take is to ignore the topmost 20 of your 32 bits. They're all zeros, or your value is not 4000 or below.
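As a rough model of that first step (in Python rather than HDL, and with a made-up function name `low12`), dropping the upper 20 bits looks like this:

```python
def low12(value32):
    """Return the low 12 bits of a 32-bit value; the top 20 bits must be zero."""
    if value32 >> 12:  # any of bits 12..31 set means the value is 4096 or more
        raise ValueError("value does not fit in 12 bits")
    return value32 & 0xFFF  # keep bits 0..11 only


print(low12(4000))
```

In hardware this costs nothing: you simply don't connect the upper 20 wires.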
Only way I can think of is a huge look up table
And FPGAs are good at these,
but this is really tedious
Why? All it would take is writing a program with a for loop in any scripting/programming language of your choice. You could even learn a new language for this: being able to print such a table is what I'd expect people to be able to do after the first day of learning Python, MATLAB, BASIC, C, or C++, for example.
If you don't know any programming language yet, do take 4 hours to learn the basics of Python. You'll thank yourself later.
LUT approach
size of the naivest table possible
a huge look up table
How huge is "huge", anyways?
So, we've got 4000 values. That's > 2¹¹ and < 2¹², so we use a table with 2¹² entries, as power-of-two sizes make it easy to address the table directly with the value you want to display.
So, each digit has 10 possible values. You need 4 bits to store one. When reading them out, you just read four 4-bit values, and convert each of these binary representations of a decimal digit to its respective 7-segment output. (That's another lookup, in essence, but it's so small that your FPGA will need 2 or maybe 3 logic cells for it. Um, you were starting out with 32 bit registers for 12 bits of data, so I'm sure this isn't a problem. Your FPGA will have thousands of these.)
The most naive table implementation would hence need 4 digits · 4 bits · 2¹² entries = 2¹⁶ bits of memory. That's 64 kbit of block RAM. Your FPGA might really have no problem at all supplying this.
In that case you're done here. Write a script that generates the content of a lookup table, grab second breakfast.
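Such a generator script could look roughly like this in Python. The packing order (thousands digit in the top 4 bits), the zero fill for unused addresses, and the one-hex-word-per-line output are all assumptions; adapt them to whatever memory initialization format your toolchain expects.

```python
def bcd_entry(n):
    """Pack the four decimal digits of n (0..9999) into 16 bits, thousands first."""
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, ones = divmod(rest, 10)
    return (thousands << 12) | (hundreds << 8) | (tens << 4) | ones


# 2**12 entries, so the 12-bit input addresses the table directly.
# Addresses above 4000 are never used; they are filled with 0 here (arbitrary).
table = [bcd_entry(n) if n <= 4000 else 0 for n in range(2**12)]

# One 4-digit hex word per line.
lines = [f"{entry:04x}" for entry in table]
print("\n".join(lines[:3]))
print(len(table) * 16, "bits in total")
```

Because each digit sits in its own 4-bit nibble, the hex dump of an entry reads the same as the decimal number it encodes, which makes the output easy to eyeball.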
easy, less wasteful approach
Cleverer: Same lookup table idea, but instead of one large table with 4 digits long entries, make 4 tables, one for each digit. This helps a lot:
Assuming you have 0-3999 (and treat 4000 as a special case before even doing a lookup), the first digit (the thousands) only needs 2 bits, instead of 4. Also, since 1000 is divisible by 8, the lowest 3 of our 12 bits don't even matter. So, our table for the thousands only needs 2 bits · 2⁹ = 1 kbit.
You do a lookup in that table by "shifting right" the input word by 3 bits (that is, only using the upper 9 bits) and using that as address.
The hundreds, by the same logic, don't care about the last 2 of 12 bits, so that table only needs 4 bit for each of 2¹⁰ entries, so 4 kbit. Lookup by using the upper 10 bits. You get the idea.
Since 10 is divisible by 2, the tens don't care about the last bit, so they need only 2¹¹ entries, so 8 kbit.
The ones would use a full 16 kbit table.
That's 29 kbit in total. That's nearly nothing. A Xilinx 7 series FPGA has block RAM cells of 36 kbit each. Which really means you're not even filling a single block RAM with this. So, if you have a device where block RAM units are >= 29 kbit, do the lookup; it won't get smaller, faster, easier, or lower power than that.
(Not that for operations that happen at most a couple times a second, speed or power consumption could ever matter, but still, it's the optimal solution in these disciplines.)
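The four smaller tables can be generated and checked with a short script. This is only a sketch: the table and function names are made up, and entries for addresses that correspond to values of 4000 and above are filled with whatever is convenient, since they are never read.

```python
# Thousands: address = upper 9 bits; entries for values >= 4000 are unused (0).
thousands = [(a << 3) // 1000 if (a << 3) < 4000 else 0 for a in range(2**9)]
# Hundreds: address = upper 10 bits (100 is divisible by 4).
hundreds = [((a << 2) // 100) % 10 for a in range(2**10)]
# Tens: address = upper 11 bits (10 is divisible by 2).
tens = [((a << 1) // 10) % 10 for a in range(2**11)]
# Ones: needs the full 12-bit address.
ones = [a % 10 for a in range(2**12)]


def digits(n):
    """Look up all four digits of n (0..3999) using shifted addresses."""
    return thousands[n >> 3], hundreds[n >> 2], tens[n >> 1], ones[n]


# Check every input against plain decimal formatting.
assert all(digits(n) == tuple(int(c) for c in f"{n:04d}") for n in range(4000))

total_bits = 2 * len(thousands) + 4 * len(hundreds) + 4 * len(tens) + 4 * len(ones)
print(total_bits // 1024, "kbit")
```

The exhaustive check is cheap here (4000 inputs), so there is little reason not to verify the whole table before loading it into the FPGA.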
optimized multi-staged lookup, still easy
We can be even smarter, easily, too, and save more RAM.
Let's look at the ones:
If we set the last bit of our input to 0, then we always get an even digit. And if we set it to 1, we get the odd digit that comes after the even one, always. So, we don't actually need to store all 10 possible digits; the 5 even ones suffice. Technically, that's easy: just set the last bit of the input to 0, get the value from the table, and add the (original) last input bit to it to get the same result. With the last bit fixed, only the upper 11 bits are left as the address, which means we don't have to store 2¹², but only 2¹¹ values. And because there are only 5 even digits, we don't need 4 bits to store them: 3 bits suffice (the lowest bit of an even digit is always 0, so it needn't be stored).
That reduces the size of our ones table from 16 kbit to 3 bits · 2¹¹ = 6 kbit in memory.
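A small Python model of the halved ones table, with hypothetical names `ones_half` and `ones_digit`. The table is addressed by the upper 11 bits of the 12-bit input, so it has 2¹¹ entries; each entry holds the even digit without its (always zero) lowest bit, which fits in 3 bits and gives 3 · 2¹¹ bits = 6 kbit.

```python
# Address = input with its last bit dropped; stored value = even digit // 2 (0..4).
ones_half = [((addr << 1) % 10) >> 1 for addr in range(2**11)]


def ones_digit(n):
    """Ones digit of n (0..4095): look up the even digit, then re-attach the last bit."""
    lsb = n & 1
    return (ones_half[n >> 1] << 1) | lsb


# Check against the plain modulo for every 12-bit input.
assert all(ones_digit(n) == n % 10 for n in range(2**12))
print(3 * len(ones_half), "bits")
```

Re-attaching the last bit is a plain concatenation rather than a real addition, because the stored even digit never has its lowest bit set.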