
I was browsing Wikipedia, looking at the processors with the highest number of cores, and obviously when cores need to communicate with each other it can be problematic, since it requires much more wiring.

I was also curious about how word length affects core size in general (not considering printing resolution): if a core is square, theoretically, would it mean (I'm making wild assumptions here) that a single 32-bit core takes as much room as 16 8-bit cores or 4 16-bit cores?

If I were to design a processor with hundreds or thousands of cores, would 8-bit cores be more interesting, increasing the number of cores at the expense of word length?

(I'm considering 8-bit cores to see whether constraining the programming to its minimum would maximize parallel processing.)

jokoon
  • This really depends on the calculation size required by your operation. If you are mostly adding 8-bit numbers, two 8-bit cores likely beat a single 32-bit one. If you are multiplying 32-bit ones, no chance. Designing specialized computing machinery starts with figuring out what operations need to be performed; if you want fast but generic, just get a cluster of multicore conventional machines and hope for the best. – Chris Stratton Oct 27 '12 at 17:00
  • Dave Tweed has an excellent answer. Intel is also working on massively multi-core architectures: [Massive Multi-Core Xeon Phi Inherits Proven Ring Topology](http://goparallel.sourceforge.net/massive-multi-core-xeon-phi-inherits-proven-ring-topology/), as well as on operating systems for multi-core. For me it's "re-inventing the wheel". – Optionparty Oct 27 '12 at 14:18

2 Answers


First of all, the area required for a core goes up linearly with the word width, not with the width squared, all other things kept the same. In other words, O(N), not O(N²). There may be some functional blocks such as barrel shifters or multipliers that are O(N²), but these are very regular structures and do not usually dominate the area of a core.

But keep in mind that the control circuits for a core are independent of word size, so this represents a fixed amount of "overhead" for each core. This means that four 8-bit cores will take up slightly more room than a single 32-bit core.
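
To put rough numbers on that, here is a toy area model, a sketch with made-up constants (only the linear-datapath-plus-fixed-overhead shape reflects the argument above), comparing one 32-bit core against four 8-bit cores:

```c
#include <stdio.h>

/* Toy area model: each core pays a fixed control-logic overhead plus a
 * datapath cost that grows linearly with word width.  The constants are
 * illustrative only. */
int main(void)
{
    const double control_overhead = 2.0;  /* fixed cost per core (arbitrary units) */
    const double per_bit_cost     = 1.0;  /* datapath cost per bit of width        */

    double one_32bit_core  = control_overhead + per_bit_cost * 32;
    double four_8bit_cores = 4 * (control_overhead + per_bit_cost * 8);

    printf("one 32-bit core : %.0f units\n", one_32bit_core);   /* 34 units */
    printf("four 8-bit cores: %.0f units\n", four_8bit_cores);  /* 40 units */
    return 0;
}
```

The difference between the two totals is exactly the three extra copies of the per-core control overhead.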

Also, if you need to work on 32-bit data, something that takes N clock cycles on a 32-bit core is going to generally take more than 4×N cycles on an 8-bit core, because of the software overhead.
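
To see where that software overhead comes from, here is a sketch (my illustration, not part of the answer) of a 32-bit addition decomposed into the byte-wide adds with explicit carry propagation that a compiler targeting an 8-bit core has to generate; the per-byte loads, stores and loop control are what push the cost past a simple 4× factor:

```c
#include <stdint.h>
#include <stdio.h>

/* Roughly what a 32-bit add looks like when an 8-bit core has to do it in
 * software: four byte-wide adds plus explicit carry propagation. */
static uint32_t add32_using_8bit_ops(uint32_t a, uint32_t b)
{
    uint8_t result[4];
    unsigned carry = 0;

    for (int i = 0; i < 4; i++) {
        unsigned byte_a = (a >> (8 * i)) & 0xFF;
        unsigned byte_b = (b >> (8 * i)) & 0xFF;
        unsigned sum = byte_a + byte_b + carry;   /* 8-bit add with carry in  */
        result[i] = (uint8_t)(sum & 0xFF);
        carry = sum >> 8;                         /* carry out to next byte   */
    }

    return (uint32_t)result[0]
         | ((uint32_t)result[1] << 8)
         | ((uint32_t)result[2] << 16)
         | ((uint32_t)result[3] << 24);
}

int main(void)
{
    printf("0x%08X\n", add32_using_8bit_ops(0x12345678u, 0x0FF0F00Fu));  /* 0x22254687 */
    return 0;
}
```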

There is no perfectly general-purpose parallel processor. You need to fit the architecture of the cores themselves, the memory hierarchy and the network that ties them all together to the set of problems you intend to address. For example, just look at the differences between a multi-core chip used as a CPU for a PC and the multi-core chip used for its GPU.

Dave Tweed
  • Linearly is probably incorrect - overall likely more like n log n or perhaps higher. Consider for example a shifter. And that's before we get to multipliers. Control circuits to handle sub-word vs multi-word access would be an interesting tradeoff. – Chris Stratton Oct 27 '12 at 16:54
  • "Also, if you need to work on 32-bit data, something that takes N clock cycles on a 32-bit core is going to generally take more than 4×N cycles on an 8-bit core, because of the software overhead." If you use 32 bit integers or float I'd agree with you, but you could use 8 bit integer instead, if that's not too much of a constraint. – jokoon Oct 27 '12 at 17:36
  • An equal-time adder would also tend to be n log(n), right? –  Oct 28 '12 at 01:26
  • If one is designing a multi-core CPU without a huge budget, what would you think of the idea of something like a "propeller", where if an instruction would take e.g. 8 cycles to work through the pipeline, one arranges things so that one would have eight discrete execution threads, each of which took eight cycles per instruction? Increasing the number of pipeline stages would allow one to reduce them to a few gates each, thus allowing very high clock speeds. – supercat Apr 30 '15 at 23:04
  • @supercat: That's also what's known as "hyperthreading" in i86 CPUs. It's certainly a viable approach, but the main thing that it saves you is the need to detect and handle data dependencies among the pipeline stages -- since each instruction in the pipe comes from a different thread, there are no dependencies by definition. How fine-grained you make the pipeline is a separate issue -- it can be helpful to a point, beyond which it just adds excessive overhead. – Dave Tweed May 01 '15 at 10:57
  • @DaveTweed: Hyperthreading is, from my understanding, a bit more complicated since the scheduling is determined dynamically. The approach I was suggesting would be to have a fixed rotation. Not quite as useful, but likely much easier to implement. – supercat May 01 '15 at 15:16

Jan's Razor: In a chip multiprocessor design, strive to leave out all but the minimal kernel set of features from each processing element, so as to maximize processing elements per die. -- Jan Gray

If your application really does need to do lots of work with 32-bit numbers, then the "minimal kernel set of features" for that application might need to include 32-bit operations. On the other hand, as Chris Stratton pointed out, if you need to do lots of work where 8 bits are adequate and 32-bit numbers are only rarely needed, then lots of 8-bit cores will likely give you higher net performance than using a few 32-bit cores.
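
As a rough sketch of that tradeoff (every number here is an assumption for illustration: an area that holds 4 32-bit cores is taken to hold 13 8-bit cores, each core retires one native-width add per cycle, and a 32-bit add costs an 8-bit core about 5 cycles of multi-precision work):

```c
#include <stdio.h>

/* Toy throughput comparison; all constants are assumptions for illustration. */
int main(void)
{
    const int cores_32 = 4;    /* assumed 32-bit cores in a given die area   */
    const int cores_8  = 13;   /* assumed 8-bit cores in the same die area   */
    const int cycles_per_32bit_add_on_8bit_core = 5;  /* multi-precision cost */

    printf("8-bit adds per cycle : %d (32-bit cores) vs %d (8-bit cores)\n",
           cores_32, cores_8);
    printf("32-bit adds per cycle: %d (32-bit cores) vs %.1f (8-bit cores)\n",
           cores_32, (double)cores_8 / cycles_per_32bit_add_on_8bit_core);
    return 0;
}
```

On 8-bit work the sea of small cores wins; on 32-bit work the few wide cores win, which is Chris Stratton's point.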

I see you are currently considering:

  • one or a few 32-bit cores
  • several 16-bit cores
  • many 8-bit cores.

There are several other possibilities that in some situations give better performance than any of the above:

  • One 32-bit core, and many smaller cores
  • Dynamic reconfiguration: reconfigure the soft microprocessors in an FPGA to get one or a few 32-bit processors at times when lots of 32-bit calculations are necessary, and reconfigure it to get lots of 8-bit processors when that gives adequate precision and better performance.
  • Processing elements with lanes narrower than 8 bits.

While there are many multicore systems that include only 32-bit cores, and many that include only 8-bit cores, I see that the Wikipedia: multi-core processor article mentions many chips that include both a 32-bit processor and a bunch of 16-bit or 8-bit processors.

As I mentioned earlier (Cheapest FPGAs?):

Simple (i.e., without a MMU) 32-bit CPUs require about 4 times the FPGA resources of an 8 bit CPU. Full-fledged Linux requires a CPU with a MMU (such as the NIOS II/f). A 32-bit CPU with a MMU requires about 4 times the FPGA resources of a 32-bit CPU without a MMU.
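
For a back-of-the-envelope feel, here is a sketch in which the FPGA LUT budget and the 8-bit core cost are hypothetical; only the 1 : 4 : 16 resource ratios come from the quote above:

```c
#include <stdio.h>

/* Core counts implied by the 4x ratios quoted above.  The LUT budget and the
 * 8-bit core cost are hypothetical; only the ratios come from the answer. */
int main(void)
{
    const int fpga_luts       = 32000;  /* hypothetical FPGA size        */
    const int luts_8bit_core  = 500;    /* hypothetical 8-bit soft core  */
    const int luts_32bit_core = 4 * luts_8bit_core;
    const int luts_32bit_mmu  = 4 * luts_32bit_core;

    printf("8-bit cores        : %d\n", fpga_luts / luts_8bit_core);   /* 64 */
    printf("32-bit cores       : %d\n", fpga_luts / luts_32bit_core);  /* 16 */
    printf("32-bit cores w/ MMU: %d\n", fpga_luts / luts_32bit_mmu);   /*  4 */
    return 0;
}
```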

By the way, 8 bits is not the "minimum". You may be surprised to learn that all computers built before the 1951 Whirlwind operated on less than 8 bits at once. Most of the early massively parallel processors operated on less than 8 bits at a time -- the Goodyear Massively Parallel Processor, the Connection Machine CM-2, the 2003 VIRAM1 chip, etc.

The most recent report I've seen shows that 4-bit CPUs still outsell (by volume) 32-bit CPUs. Have you seen a more recent report? ( Do 4-bit CPUs still outsell 32-bit CPUs in unit volume? )

davidcary