Jan's Razor: In a chip multiprocessor design, strive to leave out all but the minimal kernel set of features from each processing element, so as to maximize processing elements per die.
-- Jan Gray
If your application really does need to do lots of work with 32-bit numbers, then the "minimal kernel set of features" for that application might need to include 32-bit operations.
On the other hand, as Chris Stratton pointed out, if you need to do lots of work where 8 bits are adequate and 32-bit numbers are needed only rarely, then lots of 8-bit cores will likely give you higher net performance than a few 32-bit cores.
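To make the "only rarely are 32-bit numbers needed" case concrete, here is a minimal sketch of how an 8-bit datapath can still handle an occasional 32-bit addition: one byte at a time, propagating the carry. The function name and the step counter are illustrative assumptions, not taken from any particular core.

```python
def add32_with_8bit_alu(a, b):
    """Add two unsigned 32-bit values using only 8-bit add-with-carry steps.

    Returns (sum modulo 2**32, number of 8-bit add steps used).
    Hypothetical helper for illustration only.
    """
    result = 0
    carry = 0
    steps = 0
    for byte_index in range(4):          # least significant byte first
        shift = 8 * byte_index
        a_byte = (a >> shift) & 0xFF
        b_byte = (b >> shift) & 0xFF
        total = a_byte + b_byte + carry  # at most 0xFF + 0xFF + 1 = 0x1FF
        result |= (total & 0xFF) << shift
        carry = total >> 8               # 0 or 1, fed into the next byte
        steps += 1
    return result, steps


if __name__ == "__main__":
    value, steps = add32_with_8bit_alu(0x0000FFFF, 1)
    print(hex(value), steps)
```

The rare 32-bit add costs four 8-bit steps instead of one, which is a good trade only when such operations really are rare in your workload.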
I see you are currently considering:
- one or a few 32-bit cores
- several 16-bit cores
- many 8-bit cores.
There are several other possibilities that in some situations give better performance than any of the above:
- One 32-bit core, and many smaller cores
- Dynamic reconfiguration: reconfigure the soft microprocessors in an FPGA to get one or a few 32-bit processors at times when lots of 32-bit calculations are necessary, and reconfigure it to get lots of 8-bit processors when those give adequate precision and better performance.
- Processing elements with lanes narrower than 8 bits.
While there are many multicore systems that include only 32-bit cores, and many that include only 8-bit cores, I see that the Wikipedia "multi-core processor" article mentions many chips that include both a 32-bit processor and a bunch of 16-bit or 8-bit processors.
As I mentioned earlier in "Cheapest FPGAs?":
Simple (i.e., without an MMU) 32-bit CPUs require about 4 times the FPGA resources of an 8-bit CPU.
Full-fledged Linux requires a CPU with an MMU (such as the NIOS II/f). A 32-bit CPU with an MMU requires about 4 times the FPGA resources of a 32-bit CPU without an MMU.
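As a rough back-of-the-envelope sketch, the two "about 4 times" ratios above can be turned into a core count for a given FPGA. The unit costs below simply encode those ratios (8-bit core = 1 unit); the budget of 64 units and all names are assumptions for illustration, and real designs will vary with the specific cores and device.

```python
# Relative FPGA resource cost, in units of "one 8-bit core".
# Derived only from the approximate 4x ratios quoted above.
RELATIVE_COST = {
    "cpu8": 1,        # 8-bit CPU
    "cpu32": 4,       # 32-bit CPU without an MMU
    "cpu32_mmu": 16,  # 32-bit CPU with an MMU
}


def cores_that_fit(budget, kind):
    """How many cores of one kind fit in a budget of resource units."""
    return budget // RELATIVE_COST[kind]


def small_cores_beside(budget, big_kind, small_kind="cpu8"):
    """How many small cores fit next to exactly one big core."""
    remaining = budget - RELATIVE_COST[big_kind]
    if remaining < 0:
        raise ValueError("budget too small for the big core")
    return remaining // RELATIVE_COST[small_kind]


if __name__ == "__main__":
    budget = 64  # hypothetical device size, in 8-bit-core units
    for kind in RELATIVE_COST:
        print(kind, cores_that_fit(budget, kind))
    print("8-bit cores beside one MMU core:",
          small_cores_beside(budget, "cpu32_mmu"))
```

Under these assumed numbers, the same device holds 64 8-bit cores, 16 plain 32-bit cores, 4 MMU-equipped cores, or one MMU-equipped core plus 48 8-bit cores.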
By the way, 8 bits is not the "minimum".
You may be surprised to learn that all computers built before the 1951 Whirlwind operated on fewer than 8 bits at once.
Most of the early massively parallel processors operated on fewer than 8 bits at a time -- the Goodyear Massively Parallel Processor, the Connection Machine CM-2, the 2003 VIRAM1 chip, etc.
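One way a processing element can operate on fewer than 8 bits at a time is bit-serial arithmetic, where a 1-bit full adder is reused once per bit position. The following is a minimal sketch of that idea only; it is not a model of any of the machines named above, and the function name and default width are assumptions.

```python
def bit_serial_add(a, b, width=16):
    """Add two unsigned integers one bit per step with a 1-bit full adder.

    Returns (sum modulo 2**width, final carry-out).
    """
    carry = 0
    result = 0
    for i in range(width):               # one "clock cycle" per bit
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))  # carry-out of a full adder
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(bit_serial_add(1234, 4321))
```

Each element needs only a single full adder and a carry flip-flop, which is why very many of them can fit on one die, at the cost of `width` steps per addition.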
The most recent report I've seen shows that 4-bit CPUs still outsell (by volume) 32-bit CPUs. Have you seen a more recent report?
(Do 4-bit CPUs still outsell 32-bit CPUs in unit volume?)