Why is a datapath in microcontrollers always a power of 2 wide?

Question

Microcontroller data paths are always a power of 2 wide: 4 bit, 8, 16, 32 bit, etc. Even PICs which use 12-bit wide instructions are 8-bit controllers. Why? Is there any design advantage to this? What's wrong with a 12-bit databus, or a 7-bit controller?

edit
The 7-bit doesn't make much sense, but it's what made me think of the question. The answers refer to the tradition of 8-bits. But 16-bit isn't 8-bit, and 24-bit can handle 8-bit data as well as 16-bit, right? Why did they skip 24-bit to go to 32-bit?

@m.Alin the horror! Before you know it people will go back to bit slice designs and writing their own assemblers. — kenny, Jul 27 '12 at 12:05
He should be sentenced to 3 years of using only old 5-bit code teletypes (these had separate codes for "shift" to access more letters, among other joys). — Olin Lathrop, Jul 27 '12 at 13:27
@OlinLathrop and if one of those shift characters got corrupted, such as in RTTY, you can easily end up with complete gibberish. — W5VO, Jul 27 '12 at 13:29
The modern-day analogue to shift-in/shift-out would be...banked memory! — gbarry, Jul 27 '12 at 19:51
All the links to "7 bit controller" end up at a 404 page. So they don't exist? — gbarry, Jul 27 '12 at 19:54
@gbarry - The question was deleted. OP asked what kind of controller he would need for a specific task, and Olin made the joke "Actually 8 bit sounds like overkill. Try a 7 bit". — stevenvh, Jul 28 '12 at 05:52
They didn't skip 24 bit processors. The Motorola/freescale 56000 and 563xx families of processors all have a 24 bit data path. This exception proves that your postulated rule is erroneous. — uɐɪ, Aug 01 '12 at 07:18
@Ian - the 56000 doesn't count as microcontroller, it's a DSP, targeted at audio applications. For 16-bit calculations you had 8 bits headroom, just like you had 8 bit headroom for 24 bit x 24 bit MAC operations --> 56 bit result. — stevenvh, Aug 04 '12 at 17:35
@shimofuri - bits are 2^0, 2^1, 2^2, etc. I don't see a link with the 8 = 2^3 byte length. — Federico Russo, Aug 04 '12 at 17:40

score 8 · Accepted Answer · answered Jul 27 '12 at 12:35

Tradition has a strong pull. But so does interoperability. Pretty much every existing file format and communications protocol operates on bytes. How do you handle these in your 7-bit microcontroller?

The PIC gets away with it by having the instruction space entirely separate and programmed in advance from outside. There is some value in bit-shaving the instruction set, as it's the one thing you get to control yourself as a microprocessor designer.

If you want an extreme architecture, you could Huffman code the instruction set, giving you variable length bitness.

score 7 · Answer 2 · edited Jan 26 '13 at 03:56

Because most of the world has converged on storing, communicating, and otherwise handling computer data in chunks of 8 bits. It's not an official standard, but it is a very strong ad-hoc one.

In the past there have been machines that handled multiples other than 8 bits in their data paths. Examples include the CDC Cyber 6000 and 7000 series and the PDP-8. The CDC machines used 6-bit bytes, and the PDP-8 had a 12-bit wide word with no special way of dealing with just 8-bit quantities. There were certainly other machines in this category too. The reason you don't hear about them much today is because people have decided they want machines that can handle their 8-bit bytes nicely, and that's what manufacturers make. How well do you think a 7-bit microcontroller would sell? Whoever made one would get soundly ridiculed and then find few customers. It would be a stupid business proposition.

You can see some additional evidence of non-8-bit "bytes" if you look at the internet standards. They deliberately use the term "octet" because back then there wasn't universal agreement that a byte was always 8 bits. Nowadays the meaning of byte has converged to mean 8 binary bits and you'd get laughed out of town if you tried to use it differently.

score 7 · Answer 3 · edited Jul 27 '12 at 15:25

4 bits sensible minimum:
0-9 Numeric data needs 4 bits
0-9 = 10 words.
Next highest binary word size = 4 bits = 16 possible words.
So BCD data (binary coded decimal) = 4 bits

8 bits logical next jump
0-9, a-z, A-Z = 10+26+26 = 62 words.
62 words fit in 6 bits = 64 words, but that leaves only 2 codes for space and punctuation, so 7 bits = 128 words is the practical size.
8 is about as easy as 7 and allows 2 x 4 bits so numeric data can be packed 2 per 8 bit byte.
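
The counting above can be checked with a small sketch. It is only an illustration: `bits_needed`, `pack_bcd` and `unpack_bcd` are hypothetical helper names chosen for this example, not part of any real toolchain.

```python
def bits_needed(symbols):
    """Smallest number of bits whose 2**bits codes cover `symbols` distinct values."""
    return max(1, (symbols - 1).bit_length())


def pack_bcd(high, low):
    """Pack two decimal digits (0-9) into one 8-bit byte, 4 bits each."""
    if not (0 <= high <= 9 and 0 <= low <= 9):
        raise ValueError("BCD digits must be in 0..9")
    return (high << 4) | low


def unpack_bcd(byte):
    """Split a packed BCD byte back into its two decimal digits."""
    return (byte >> 4) & 0xF, byte & 0xF


print(bits_needed(10))        # digits 0-9
print(bits_needed(62))        # 0-9, a-z, A-Z only, no punctuation
print(hex(pack_bcd(4, 2)))    # two digits in one byte
```

Strictly, 62 symbols need only 6 bits; the 7-bit figure leaves room for punctuation and control codes. Going to 8 bits costs one more line and lets two 4-bit digits share a byte.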

Then 12 bits (not 16) ?:
Next logical size up = 12 bits and the early and very successful PDP-8 used 12 bits. 12 bits used for data and program allow 2^12 = 4096 address locations. As Bill Gates may possibly have once said "4K of memory should be enough for anyone".

The following PDP-11 family used 16 bits.

Doubling for compatibility.
If you wish to interoperate with systems at lower and higher levels and if you want to have more capable devices in the same family, then being able to handle 2 words of the smaller system within the larger system word makes much sense.

BUT

The exceptions that prove the rule:

"Always" is such a strong word :-)
1-bit, 12-bit, 18-bit, 36-bit examples below.
18 & 36 bit machines were never microcontrollers.
1 & 12 bit ones were.

The one-bit system mentioned below is really a "random bits as you see fit" system. The one bit data word is essentially a go/no-go flag produced by computation and is used to enable or disable program activity. The program counter is an up counter that advances through memory cyclically with code being enabled or disabled as required. Very very very nasty indeed. By the time it arrived on the market the 8 bit processors of the day were quite mature and the 1-bit processor never really made much sense. I do not know how much use it ever got.

1-bit !!!:

Motorola MC14500B. I got an honourable mention from Jack Ganssle for best description of this device :-)

12-bit:

[Harris HM-6100 aka Intersil IM6100 - 12 bit minicomputer wannabee](http://www.classiccmp.org/dunfield/other/i6100cfs.pdf)

Based on the vastly successful DEC PDP-8 12 bit minicomputer.

Overview

Program memory and data memory occupy the same memory space. The total size of directly addressable memory is 4 K words. Word size is 12 bits. The 6100 doesn't have stack memory.

Program memory size is 4 K words. Conditional instructions can only skip the next instruction. To jump conditionally to an arbitrary address, the code should first execute a "skip if the condition is not met" instruction and place a direct or indirect unconditional jump right after it. Unconditional jumps can go directly to an address within the current page (128 words), or indirectly to anywhere in the whole memory space (4 K words). The 6100 supports subroutine calls, but, due to the lack of stack memory, the return address is stored in memory. There is no "return from subroutine" instruction; the subroutine should use an indirect jump to return to the caller.

Data memory size is 4 K words. Data can be accessed directly within the zero page (0000h - 007Fh) or within the current 128-word page, and indirectly anywhere in the 4 K words of memory.
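
To make the paging concrete, here is a rough sketch of how a direct address could be formed. It assumes a PDP-8 style split of the 12-bit address into a page number and a 7-bit offset (a 7-bit offset spans 128 words, 0000h - 007Fh); `direct_address` and its parameters are hypothetical names for illustration, not taken from the 6100 documentation.

```python
PAGE_BITS = 7                      # assumed 7-bit offset within a page
PAGE_SIZE = 1 << PAGE_BITS         # 128 words per page
ADDR_MASK = (1 << 12) - 1          # 12-bit address space, 4 K words


def direct_address(instr_addr, use_current_page, offset):
    """Address reachable directly: page zero, or the page holding the instruction."""
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset must fit in 7 bits")
    if use_current_page:
        # Keep the page number of the instruction, replace the low 7 bits.
        page_base = instr_addr & ADDR_MASK & ~(PAGE_SIZE - 1)
    else:
        page_base = 0
    return page_base | offset


print(oct(direct_address(0o1234, True, 0o17)))    # same page as the instruction
print(oct(direct_address(0o1234, False, 0o17)))   # zero page
```

Anything outside those two pages has to go through an indirect access, which is why the text distinguishes direct and indirect jumps and data accesses.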

Wikipedia - Intersil 6100

The PDP-8 & Intersil 6100 had 8 very rich basic instructions. There is NO subtract instruction.
The ADD instruction is named TAD to remind you that it's a 2's-complement add so we don't need no ... subtract instruction.
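
A minimal sketch of why a two's-complement add is enough, assuming the machine can also complement and increment a 12-bit value. The function names are made up for this example and the carry/link bit is ignored.

```python
WORD_BITS = 12
MASK = (1 << WORD_BITS) - 1   # 0o7777


def tad(acc, operand):
    """Two's-complement add, wrapping at 12 bits (carry out is dropped here)."""
    return (acc + operand) & MASK


def negate(value):
    """Two's-complement negate: complement every bit, then add one."""
    return ((value ^ MASK) + 1) & MASK


def subtract(a, b):
    """Compute a - b using only complement, increment and add."""
    return tad(a, negate(b))


print(subtract(100, 58))      # 42
print(oct(subtract(5, 7)))    # 0o7776, which is -2 in 12-bit two's complement
```

So "subtract" is just "negate the operand, then add", at the cost of a couple of extra instructions.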

18 bit, 36 bit other - the PDP family:

Wikipedia Programmed Data Processor

PDP1 - 18 bit

PDP2 - 24 bit died aborning

PDP3, PDP6 - 36 bit

PDP-12 User Handbook (preliminary) - Wow.
Despite the numbering this is pre PDP16 - a PDP-8 on steroids with analog I/O capability - an engineering lab machine. I could have had one for free if I'd wanted, but it would not have fitted anywhere sensible - or insensible.
First computer game I ever played was on one of these.
Space War.
Machine was in two small-room sized cabinets.
You'd open a door and walk inside to do stuff to its internals.

edited Jul 27 '12 at 15:25

answered Jul 27 '12 at 14:21

Russell McMahon

If you want a production 24bit part, ATI's R300/420 GPUs (9500-X600, and X700-850 cards) fit the bill. – Dan Is Fiddling By Firelight Jul 27 '12 at 17:42
The idea of a one-bit micro seems interesting, but how useful was a chip like the MC14500B in practice? I would think that having input data control a program address bit (using a latch and a 2:1 multiplexer) would be more effective than trying to use instructions to manage state. I found a 1990's-era data sheet for the part, which suggests that it stayed around for awhile, but I'm curious how such a chip would actually be used. – supercat Jul 27 '12 at 23:49
@supercat - I found one reference where a designer said they had an old wirewrap discrete design that they replaced with one based on a MC14500B and over some years the client bought a number of them. That said, a more crippled and incapable device would be hard to imagine. – Russell McMahon Jul 28 '12 at 02:18
I can't really imagine the design psychology of the 1970's, but Steve Wozniak's floppy-drive controller (featured in the Apple ][) demonstrates that it doesn't take much discrete logic to build a "computer". After looking at the manual for that MC14500B, I thought about how I might design a system to run code from a PROM using just discrete logic (outside of the PROM itself). I think the guts of my machine would be an 8-bit transparent latch, an 8-bit flip flop, and a quad NAND, and possibly an RC delay (depending upon setup/hold requirements for the I/O). One 16-bit instruction per cycle. – supercat Jul 29 '12 at 16:49
Seven bits from the flip flop would drive the upper address bits of the PROM, and the lower bit would be driven by the clock. During the first half of each cycle, the transparent latch would sample the top six output bits (I/O address), and the device would perform an I/O read. The I/O input would drive the seventh transparent latch bit, and 3 NANDs would act as a multiplexer using the latched input to select Q0 or Q1 from the PROM to feed the remaining transparent latch bit. During the second half of each cycle, perform an I/O write with the latched address and data. – supercat Jul 29 '12 at 16:53
At the end of the second half of the cycle, the transparent latch would grab Q0-Q5 from the PROM, along with the output of the NAND mux. The effect would be that the instruction would be "Read I/O address A, and either set, clear, toggle, or leave it unchanged. Then advance to a specified state, with the LSB of the state being unconditionally high, unconditionally low, equal to the value that was read, or the opposite of the value that was read." No need for a "program counter" as such. The same 3-NAND mux would operate on both data to be written, and next-state selection. – supercat Jul 29 '12 at 16:58
Something like adding or subtracting a constant to/from an N-bit binary number would take 2N instructions (two bytes each) and N cycles. Adding two N-bit binary numbers stored in registers would take 6N instructions and 2N cycles. One could expand ROM and I/O space by using some addressable latches to control upper address bits. ROM-based state machines can be very powerful, even without a "processor". – supercat Jul 29 '12 at 17:13
@supercat - You are more or less describing an "Arithmetic State Machine" which has great capability for specific well defined programs. The MC14500 cripples its throughput by the lack of a program counter which can be "jumped" either unconditionally to a computed address in a few cycles. It needs to cycle through all available memory sequentially until it reaches the point where execution becomes valid. This may be acceptable for eg relay emulation but rapidly becomes vastly too slow (assuming "rapid slowness" is conceptually logical :-)) for many real world tasks. – Russell McMahon Jul 30 '12 at 00:29
@supercat - I loved the Apple II diskette controller - which later transmogrified into the single IC based IWM ("Integrated Woz Machine") , when I first met it and have referred various people to it over the years as an example of minimalist brilliance. [Original circuit here](http://www.laughton.com/Apple/woz-hw.gif) (poor quality copy) and [whole Apple II diagram here ](http://dreher.net/projects/CFforAppleII/images/CFforAppleII_schematic.jpg) – Russell McMahon Jul 30 '12 at 00:37
@RussellMcMahon: The 14500B doesn't confine program execution that way, in that it can be wired up with any kind of program-counter circuit an implementer desires; Motorola suggests using circuits with "jump" and even "call/return". I think a bigger limitation is the separation of load and store. I don't see much way to fix that without effectively ignoring the MC14500B altogether. – supercat Jul 30 '12 at 13:21
You have not mentioned the Motorola 56000 and 563xx series of processors that have been in production since the early 1980's. These have a 24 bit data path. – uɐɪ Aug 01 '12 at 07:14
@Ian - "You have not mentioned ..." -> True. Asymptotically infinitely eclectic answers cost more. – Russell McMahon Aug 01 '12 at 08:55
I still haven't managed to understand the mindset that the "this answer is not useful" downvoter must have to downvote answers like this. Best answer going? - possibly not. Better answer possible? - of course. Useful? - Are you mad !!!? :-) Maybe they are :-) – Russell McMahon Aug 04 '12 at 15:57
+8/0 +8/0 +8/-1 +6/0 +2/0 xx/0 xx/0 xx/0 - Interesting. – Russell McMahon Aug 04 '12 at 15:59

score 2 · Answer 4 · answered Jul 27 '12 at 14:42

There's a little bit of efficiency mixed in with a lot of backwards compatibility as the reason for this common design choice.

If my datapath is 7 bits wide, I need 3 bits to represent any given line of that path. Since I'm going to waste three bits, then I might as well use them up fully, both for efficiency and to eliminate a dead path that might result in a crashing bug.

Most common data types are based on a 4 bit nibble, and most of those are based on an 8 bit byte. By choosing to use an alternate base, you may have to resort to odd and inefficient code to deal with common data types. For instance, my 7 bit computer would require 5 memory spaces to deal with any of the 32 bit numbers, including floating point, that are very common in today's industry.
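
The "5 memory spaces" figure is just a ceiling division. The sketch below (with hypothetical helper names) shows how many words a 32-bit value occupies for a few word sizes and how many bits are left over.

```python
def units_needed(value_bits, word_bits):
    """Memory words needed to hold a value of `value_bits` bits (ceiling division)."""
    return -(-value_bits // word_bits)


def wasted_bits(value_bits, word_bits):
    """Bits left unused in the last word when storing the value."""
    return units_needed(value_bits, word_bits) * word_bits - value_bits


for word_bits in (7, 8, 12, 16, 24, 32):
    print(word_bits, units_needed(32, word_bits), wasted_bits(32, word_bits))
```

Only the word sizes that divide 32 evenly (8, 16, 32) store the value with nothing left over; 7-bit words need 5 units and leave 3 bits unused.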

If my machine was not dependent on outside data, I could probably get away with it, but motor controllers, encoders, temperature sensors, and most real world interface devices and sensors support such standard units.

It's not that it's impossible to interface a 7 bit computer to a USB port, but you're going to be doing a lot of extra testing and run many more instructions processing all those 5 unit transactions for 32 bit data types than you would if you added one bit more to your data path and fell in line with the rest of the industry.

It largely started and coalesced to the current form due to the efficiency of bit addressing, though, so that's the root cause. If, for instance, you were to create a trinary computer (3 states rather than 2 states per bit) then you would see the most efficient bit sizes at 3, 9, 27, 81, etc. You would also see less efficient attempts at 18, 24, 33, and 66 in an effort to provide closer compatibility with binary systems.

"Most common data types are based on a 4 bit nibble". I don't think that's true. Text data (there's *lots* of it) is byte based, and numeric data (including compressed/encoded) is often 16-bit or 32-bit based. The only use of 4-bit data I can think of is BCD, but IMO that's *very* limited. — stevenvh, Jul 27 '12 at 14:53
@stevenvh I think he refers to the large use of HEX to group data...I'd say it's true, in architectures you often see this decomposition — clabacchio, Aug 01 '12 at 07:16

score 1 · Answer 5 · answered Jul 27 '12 at 13:20

It's easier because it allows you to specify a number of bits in a number of bits. This may seem like a parlor trick, but instruction sets do it all the time. Think for example of a "shift left" instruction:

SHL R1,4

If you have a number of bits that is a power of two, you can encode the operand in a fixed number of bits without any waste.
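
As a rough illustration of the "without any waste" point, this sketch counts how many operand bits a shift amount needs for a given register width and how many encodings go unused. `shift_field` is a hypothetical helper, not a real instruction-set tool.

```python
def shift_field(width):
    """Return (field_bits, unused_codes) for encoding shift amounts 0..width-1."""
    field_bits = max(1, (width - 1).bit_length())
    unused_codes = (1 << field_bits) - width
    return field_bits, unused_codes


for width in (7, 8, 12, 16, 24, 32):
    print(width, shift_field(width))
```

For widths of 8, 16 and 32 every encoding is a valid shift amount. For 7, 12 or 24 some encodings are left over and the hardware has to define what they do.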

drxzcl

I thought about that too, but I'm not sure that's enough reason to base a complete architecture on it. – stevenvh Jul 27 '12 at 13:31
A bigger factor, I think, is the ability to store and access data in bit arrays. The use of such arrays doesn't represent a huge part of what computers do, but they're hardly uncommon. On a machine with e.g. 8-bit bytes, element i of an array can be read via `arr[i >> 3] >> (i & 7)` or by testing whether `arr[i >> 3] & mask[i & 7]` is non-zero (some CPUs favor one approach over the other). Doing such a thing with a non-power-of-two word size would be much harder. – supercat Jul 27 '12 at 17:26
@supercat Not much harder, only much slower. One just replaces a shift with a division. Which I'm sure you already know, but just thought it's worth pointing out. – Roman Starkov Aug 04 '12 at 12:39
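
The bit-array point from the two comments above can be sketched like this. The function names are hypothetical, and the bit order within a word (bit 0 = least significant) is an assumption.

```python
def get_bit_pow2(words, i):
    """Bit i of an array of 8-bit words: word index and bit offset via shift and mask."""
    return (words[i >> 3] >> (i & 7)) & 1


def get_bit_general(words, i, word_bits):
    """Bit i for any word size: word index and bit offset need a divide and a remainder."""
    index, offset = divmod(i, word_bits)
    return (words[index] >> offset) & 1


words8 = [0b00000001, 0b10000000]   # bits 0 and 15 set
words7 = [0b1000000, 0b0000001]     # bits 6 and 7 set, 7-bit words

print(get_bit_pow2(words8, 15))
print(get_bit_general(words7, 7, 7))
```

Both versions give the same answer for 8-bit words; the difference is that shift-and-mask is almost free in hardware, while a general divide and remainder is not.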

score 0 · Answer 6 · answered Aug 01 '12 at 06:47

As far as my knowledge of digital electronics goes, the reason seems quite obvious. All digital systems use the binary number system, which means there are just two levels of operation, 0 and 1. So any combination that is possible at the hardware level has to be a combination of 0s and 1s. If we need to perform 4 different tasks we need 2 variables, for 32 tasks we need 5 variables, and so on (n variables give 2^n combinations). Since we are dealing with just two levels, the number of combinations is always a power of 2, which accounts for 2, 4, 8, 16, 32, 64, 128, 256 and so on.

naved

This doesn't explain the powers of 2 for the word length. You mention a length of 5 yourself as sufficient for 32 combinations. OP asks why there aren't 5-bit controllers. – stevenvh Aug 01 '12 at 06:52
@stevenvh:That is absolutely fine sir. I got it. Perhaps I did not focus upon the question properly. Thank you. – naved Aug 01 '12 at 07:23
I've moved your answer to a comment, that you can use from the box below. Answers are targeted to provide content on the question, the rest is for comments. – clabacchio Aug 01 '12 at 07:52

score 0 · Answer 7 · answered Aug 01 '12 at 10:51

I'll admit I've only skimmed the other answers, but one key detail seems only indirectly addressed: logic speed and compactness.

If you were to pack 24-bit values contiguously, and also be able to access them in a byte addressed fashion, then the processing logic would need to divide by 3 for word access. Division is rather expensive to do in logic (just check your favourite processor reference - the division instruction is slow), unless it's specifically by a power of 2; in that case, you just ignore the lowest bits (which software does by bit shifting). That is the same fundamental reason we prefer aligned accesses.
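
A small sketch of the address arithmetic described above, with hypothetical function names: finding which word a byte address falls in takes a real divide for packed 24-bit words, but only a shift and a mask for 32-bit words.

```python
def locate_byte_24(byte_addr):
    """Byte address -> (word index, byte within word) for packed 24-bit words."""
    return divmod(byte_addr, 3)            # needs a genuine divide-by-3


def locate_byte_32(byte_addr):
    """Same for 32-bit words: drop or keep the low two address bits."""
    return byte_addr >> 2, byte_addr & 3   # shift and mask only


print(locate_byte_24(10))   # (3, 1)
print(locate_byte_32(10))   # (2, 2)
```

In hardware the 32-bit case is just wiring: the low two address lines select the byte and the rest select the word. The 24-bit case needs a divider or a lookup in the address path.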

It is of course possible to design a processor around these limits, perhaps even encoding the word accesses as the fourth address value (since two bits are needed to select among three bytes), and I wouldn't be terribly surprised to see such in a DSP (such as a GPU); but it's not the norm for CPUs. It would also end up with a weird 4/3rd stepping for byte array accesses, which would need handling similar to BCD numbers. It thus becomes much more efficient to handle an array of 3x8bit vectors instead of 8bit bytes.
