14

When engineers design an instruction set architecture, what procedure or protocol, if any, do they follow when designating certain binary codes as instructions? For example, if I have an ISA that says 10110 is a load instruction, where did that binary number come from? Was it modeled from a state table for a finite state machine representing a load operation?

Edit: After doing more research, I believe what I'm trying to ask concerns how the opcodes for the various CPU instructions are assigned. ADD might be designated with an opcode of 10011; a load instruction might be designated as 10110. What thought process goes into assigning these binary opcodes for the instruction set?

Marcus Müller
Steven
    Monte Dalrymple's *Microprocessor Design Using Verilog HDL* provides a very detailed design approach for the Z80 CPU, and I think you'd learn a lot about your question from it. But there are many considerations that go into a specific choice, including statistical analysis of other instruction sets, compiler outputs, etc. I'd recommend starting with that book, though. Although it starts with a known design, he goes into intimate detail about it and I think you'd pick up a few things. Good book. – jonk Sep 05 '17 at 07:12
  • Or, perhaps, you are asking about the execution engine design and wondering how the bits in the instruction might play into that? Not sure from your wording. – jonk Sep 05 '17 at 07:14
  • 2
    Someone else asks this question. Must be Tuesday. – Ignacio Vazquez-Abrams Sep 05 '17 at 07:18
  • What is an ISA? – Transistor Sep 05 '17 at 07:24
  • Instruction set architecture. – Steven Sep 05 '17 at 07:28
  • 5
    @Steven Think about it. If *you* had to design an ISA, what would *you* think about? If your instructions weren't all of the same length, how would *you* pick shorter or longer instruction words, for which instructions? If you had to design a *decode stage*, what would you *wish* for your ISA to look like? I think the question is unnecessarily broad (and thus, near impossible to answer completely), but you can improve it a lot by putting some more own thought into it and asking a *precise* question that wouldn't require us to write a book to answer it. – Marcus Müller Sep 05 '17 at 09:05
  • 4
    The [RISC-V specifications](https://riscv.org/specifications/) talk about the design decisions they made at all levels, including a fair bit about the encoding of machine instructions. (This is unusual for a processor manual; RISC-V is an academic exercise first and a CPU architecture second, unlike most.) – zwol Sep 05 '17 at 21:25
  • For a 10k-30k foot view, check out the book *Code* by Charles Petzold – galois Sep 06 '17 at 15:01

7 Answers

22

It depends how old the ISA is.

In the early days of hand design, and even more so when CPUs were assembled from discrete logic, the logic design would have come first, and been extensively minimised, and then the ISA bit patterns would have been whatever values were required to make that minimal logic work.

So there may be a particular pattern of control signals that enables some multiplexers to connect the ALU output to the input of the GP register file, a few more control signals that instruct the ALU to add, subtract, AND, OR etc., and a few address bits into the register file. These three groups of signals will form fields within the instruction. Each group will be kept together, and their detailed meaning arises out of the design for that unit (ALU etc.), but the groups may be in any order, up until you design the instruction decoder. (The x86 is old enough that you can detect some of this if you look in the right place - it wasn't a totally new design, but drew from the older 8080.)
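
As a rough sketch of how those groups turn into fields, here is a toy model in Python; the 16-bit layout and all field positions are invented for illustration and describe no real ISA:

```python
# Hypothetical 16-bit instruction word carrying the three groups of
# control signals described above (field layout entirely made up):
#
#   [15:12] unit select          - drives the multiplexer enables
#   [11:8]  ALU function         - add, subtract, AND, OR, ...
#   [7:4]   destination register - address bits into the register file
#   [3:0]   source register      - address bits into the register file

def split_fields(word):
    """Split an instruction word into its control-signal groups."""
    return {
        "unit":     (word >> 12) & 0xF,
        "alu_fn":   (word >> 8)  & 0xF,
        "dest_reg": (word >> 4)  & 0xF,
        "src_reg":  word         & 0xF,
    }

print(split_fields(0x1A23))
# {'unit': 1, 'alu_fn': 10, 'dest_reg': 2, 'src_reg': 3}
```

Nothing in the sketch forces any particular ordering of the groups - that only gets pinned down once the instruction decoder is designed.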

Later ISAs may be "cleaned up" and made more regular and simpler to use, with hardware to translate between them and the actual hardware-level control signals, sometimes via "microcode". These are called "CISC", for Complex Instruction Set Computer. The x86 "REP" instruction prefix is a simple example of this - it causes the following instruction to be repeated a number of times, to save having to write a FOR loop.

Later still (in the 1980s) came a movement back to a simpler style of direct encoding (RISC - Reduced Instruction Set Computer), which you can see in the ARM processors. This was driven by the small size of ASICs at the time and the desire to put 32-bit CPUs on them: there was no spare capacity for complex instruction decoders if the complete CPU was to fit in about 20,000 gates. (There was also a temporary performance boost, because people hadn't yet developed techniques to make CISC decoders fast - that came around 1995 with the Pentium Pro.)

And nowadays it doesn't matter - CPUs read several instructions at once, and devote millions of transistors to decoding them, re-ordering them, and executing as many as possible at once, to speed up programs that may have been written for the oldest style of ISA.

  • 2
    I'm not sure I'd really call CISC "easier to use". That may have been the original intention, but 30 years later they're kinda the antithesis of "easy to use" (compared to RISC ISAs, at least). – tonysdg Sep 05 '17 at 14:41
  • 2
    There are respects in which they were easier to use: either through regularity (orthogonality was a big topic) back when compilers were relatively trivial programs, or through supporting higher-level operations directly, requiring less translation from the compiler. But that was a LONG time ago, and any surviving CISC has many layers of revisions on top of its original instruction set. Compilers have changed out of all recognition too - the thousand or so optimisation passes performed by gcc would have been unthinkable back then. So what was "easy" then and what is easy now bear very little relation to each other. –  Sep 05 '17 at 14:49
  • 4
    The distinction has been both eroded ("RISC" sets adding more instructions) and superseded by new, even more complex architectures such as VLIW; really the only consensus is that x86 (16- and 32-bit) is hard to use – pjc50 Sep 05 '17 at 14:49
  • Is microcode an attribute exclusively or primarily of CISC processors, as is suggested here? – Wayne Conrad Sep 05 '17 at 21:49
  • 1
    @tonysdg: There are hard to use RISC and hard to use CISC. A good comparison of "programmer friendliness" is to compare 68k vs ARM. The ARM was designed for a compiler so you had to do a lot of manual work to get data from RAM and write back to RAM. The 68k was designed for assembly programmers and allows you to operate directly on data in RAM. If you look at 68k ISA you'll find that it looks a lot like modern RISC ISA with one exception - you can operate directly on RAM whereas RISC only allows you to operate on registers. – slebetman Sep 06 '17 at 03:59
  • 1
    Microcode is primarily a CISC attribute. However you could implement CISC without microcode: the instruction decoder would be more complicated. You'll also see some CISCs from the Pentium Pro onwards described as RISC internally, translating each CISC instruction into one or more internal RISC ops: another name for microcode (though the distinctions get blurred in superscalar execution units) –  Sep 06 '17 at 08:40
  • 1
    @slebetman Also, the x86 has been pretty much the same since about the 486 (and even more so, Pentium), except for keeping the old instructions for compatibility. Operating on registers instead of memory operands allowed the Pentium to fully utilise its super-scalar architecture and the 486 to efficiently handle the instruction pipelining with dependencies involved (in fact, even register access lagged for certain combinations of instructions). x86 has pretty much been a RISC with few registers and some leftover (and usually much slower) CISC instructions; and x86-64 fixed that as well. – Luaan Sep 06 '17 at 10:03
  • @Luaan which much slower CISC instructions do you mean? – Ruslan Sep 06 '17 at 15:37
  • @Ruslan With the first Pentium, even arithmetic directly using a memory operand was still slower; e.g. `cmp reg, mem` takes twice the cycles of `mov reg, mem; cmp reg, reg`. `add mem, reg` could be three times as slow as the equivalent with intermediate registers. Even worse are more complicated instructions like `xchg`, `xlat` or `enter`/`leave`. Of course, how much this impacts your code depends on a lot of other factors as well :) They're all replaceable with equivalent RISC code that's faster and usually more scalable (though that might change if you can't spare the registers). – Luaan Sep 07 '17 at 18:13
  • @Luaan hmm, you say that x86-64 fixed that, but all these instructions are still present in native 64-bit mode, unlike those like `aad`, `bound`, `das`, `into` etc.. – Ruslan Sep 07 '17 at 18:26
9

If you group similar instructions together, patterns will emerge. This is very obvious in ARM, where the ISA manual actually shows you which bits of an instruction word correspond to the function, register choice, etc. But it can also be inferred for x86.

Ultimately the "function" part of an opcode goes into some binary-to-onehot decoder that actually activates a particular function or sequence of pipelined operations. Opcodes are not usually related to the contents of any state machine, unless we're considering variable-length instructions, which require a state machine to decode.
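
As a toy model of that binary-to-onehot step (the three-bit function field and the operation names are invented for illustration):

```python
# A 3-bit "function" field selects exactly one of 8 enable lines;
# this is the software equivalent of a binary-to-onehot decoder.
FUNCTIONS = ["add", "sub", "and", "or", "xor", "shl", "shr", "nop"]

def onehot_decode(fn_field):
    """Turn a 3-bit function field into 8 one-hot enable lines."""
    return [1 if i == fn_field else 0 for i in range(len(FUNCTIONS))]

# fn_field = 0b010 activates only the "and" enable line:
print(dict(zip(FUNCTIONS, onehot_decode(0b010))))
# {'add': 0, 'sub': 0, 'and': 1, 'or': 0, 'xor': 0, 'shl': 0, 'shr': 0, 'nop': 0}
```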

pjc50
  • You're basically saying they are gunning for lowest possible transistor count on the chip. I totally agree in the context of OP's question, where they can't afford hundreds of extra transistors for a neater instruction set. The million-transistor CPUs don't have nearly as much of a reason to care, but of course many retain it for backward compatibility. – Harper - Reinstate Monica Sep 05 '17 at 16:42
  • @Harper There's still reason, because while the transistors got smaller, they still have a size - and clock rates increased a lot in the meantime. So an instruction decoder that's too big can still be a bottleneck for performance (one of the reasons many CPUs opted to *pre*-decode instructions, ahead of time). It's not (just) about the transistor count, but more about clock rate in combination with die area. Information still takes time to propagate, and while modern CPUs aren't running at the speed of light, they're not far enough from the speed limit to expect significant improvements. – Luaan Sep 06 '17 at 10:09
  • @Luaan: Actually, "what do we do with all these transistors" is a real question nowadays. Look at all the L2/L3 caches thrown around nowadays. That is a silent admission we don't have a better use for all those millions of transistors. The latest Xeons dedicate over 2 _billion_ transistors to cache! – MSalters Sep 06 '17 at 12:46
6

Someone at some point sat down and defined them.

A good ISA will make the decoder as simple as possible.

For example, with an ALU instruction, you could let some bits of the opcode be sent directly into the control lines of the ALU.
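
A minimal sketch of that idea, with entirely invented encodings: if the two low bits of the opcode simply *are* the ALU control lines, the decoder needs no translation step for them at all.

```python
# Invented scheme: opcode bits [1:0] are wired straight to the ALU,
# so "decoding" the ALU function costs nothing but wiring.
ALU_OPS = {
    0b00: lambda a, b: a + b,  # control lines 00 -> add
    0b01: lambda a, b: a - b,  # control lines 01 -> subtract
    0b10: lambda a, b: a & b,  # control lines 10 -> AND
    0b11: lambda a, b: a | b,  # control lines 11 -> OR
}

def execute_alu(opcode, a, b):
    # The low two opcode bits select the operation directly.
    return ALU_OPS[opcode & 0b11](a, b)

# Borrowing 10110 from the question: its low bits 10 select AND here.
print(execute_alu(0b10110, 12, 10))  # 12 & 10 = 8
```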

ratchet freak
  • Thanks to all for the excellent answers. You've all helped me understand this a lot better. – Steven Sep 06 '17 at 04:28
  • 4
    There are actually quite a few factors other than decoder simplicity to take into account. Depending upon circumstances and intended use, others (e.g., code density) may be more important than decoder simplicity. In a modern processor, code density probably outweighs decoder simplicity in *most* cases. – Jerry Coffin Sep 06 '17 at 04:37
6

In a lot of cases, the choice is pretty arbitrary or based on "wherever it fits best" as ISAs grow over time. However, the MOS 6502 is a wonderful example of a chip where the ISA design was heavily influenced by trying to squeeze as much as possible out of limited transistors.

Check out this video explaining how the 6502 was reverse engineered, particularly from 34:20 onwards.

The 6502 is an 8-bit microprocessor introduced in 1975. Although it had 60% fewer gates than the Z80, it was twice as fast; and although it was more constrained (in terms of registers etc.), it made up for that with an elegant instruction set.

It contains just 3510 transistors, which were drawn out by hand by a small team of people crawling over some large plastic sheets; these sheets were later optically shrunk down to form the various layers of the 6502.

As you can see below, the 6502 passes the instruction opcode and timing data into the decode ROM, then passes it into a "random control logic" component whose purpose is probably to overrule the ROM's output in certain complex situations.

[figure: 6502 block diagram]

At 37:00 in the video you can see a table of the decode ROM which shows what conditions the inputs must satisfy to get a "1" for a given control output. You can also find it on this page.

You can see that most of the things in this table have Xs in various positions. Let's take for instance

011XXXXX 2 X RORRORA

This means the first 3 bits of the opcode must be 011, and G must be 2; nothing else matters. If so, the output named RORRORA will go true. All the ROR opcodes start with 011; but there are other instructions which start with 011 also. These probably need to be filtered out by the "random control logic" unit.

So basically, opcodes were chosen so that instructions which needed to do the same thing as each other had something in common across their bit patterns. You can see this by looking at an opcode table: all the OR instructions start with 000, all the store instructions start with 100, and all the instructions which use zero-page addressing are of the form xxxx01xx. Of course, some instructions don't seem to "fit", because the aim is not to have a completely regular opcode format but rather to provide a powerful instruction set. And this is why the "random control logic" was necessary.
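
To make the X ("don't care") positions concrete: each ROM line behaves like a (mask, value) pair that fires when `(opcode & mask) == value`. Here is a small Python model of the RORRORA line (the timing condition G = 2 from the table is left out for brevity):

```python
def rom_line(pattern):
    """Compile a pattern like '011XXXXX' into a (mask, value) pair."""
    mask  = int("".join("0" if c == "X" else "1" for c in pattern), 2)
    value = int(pattern.replace("X", "0"), 2)
    return mask, value

mask, value = rom_line("011XXXXX")   # mask 11100000, value 01100000

for opcode in (0x66, 0x6A, 0x76):    # ROR zp, ROR A, ROR zp,X
    assert opcode & mask == value     # RORRORA goes true for all of them

# ADC zero-page (0x65) also starts with 011, so it matches too - as the
# text above notes, such cases get filtered by the random control logic.
assert 0x65 & mask == value
```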

The page I mentioned above says that some of the output lines in the ROM appear twice: "We assume this has been done because they had no way of routing the output of some line where they wanted, so they put the same line at a different location again." I can just imagine the engineers hand-drawing those gates one by one and suddenly realising a flaw in the design and trying to come up with a way to avoid restarting the whole process.

Artelius
5

Typically, you would split your ISA into functional groups. It makes sense (either for logic optimisation or just for tidiness) that complementary pairs are differentiated by a single bit change (load vs store), and that you have some hierarchy of bits which affect the decode decision tree.

At the end of the day, an arbitrary allocation of bits for the function block (as opposed to the placement of the 'data' fields in the instruction) will only have a small impact on your overall design efficiency - but you have plenty of choices about how to 'optimise' your ISA encoding, depending on what you feel is an important parameter.
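
A toy illustration of that single-bit idea (all encodings invented, borrowing the 10110 load opcode from the question):

```python
# Load and store differ only in bit 4, so the decoder identifies the
# pair from the shared bits and reads a single wire for the direction.
LOAD  = 0b10110   # the load opcode from the question
STORE = 0b00110   # identical except for bit 4

def is_memory_op(op):
    return (op & 0b01111) == 0b00110   # shared bits identify the pair

def direction(op):
    return "load" if op & 0b10000 else "store"

for op in (LOAD, STORE):
    print(f"{op:05b}: memory_op={is_memory_op(op)}, {direction(op)}")
# 10110: memory_op=True, load
# 00110: memory_op=True, store
```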

Sean Houlihane
1

Instruction encoding is an ugly compromise between:

Making the decode simple. For this you want a simple set of fields, each of which can be decoded separately and routed to a separate part of the execution engine.

Packing as much functionality as possible into a limited size of instruction word. This leads to things like special constant formats that can encode a variety of common numbers (see the sketch after this list).

Forward and backward compatibility. If you assign functionality to every possible opcode, you give yourself no room to expand the architecture later. If you are adding to an existing architecture, you have to slot your new instructions into the spare opcodes.
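
As a concrete instance of the "special constant formats" point: classic 32-bit ARM encodes data-processing immediates as an 8-bit value rotated right by twice a 4-bit field, so 12 bits cover many common 32-bit constants. A small Python model:

```python
def arm_immediate(rot4, imm8):
    """Expand ARM's 12-bit immediate field (4-bit rotate amount plus
    8-bit value) into the 32-bit constant it encodes."""
    r = (2 * rot4) % 32
    if r == 0:
        return imm8
    # Rotate the 8-bit value right by r within a 32-bit word.
    return ((imm8 >> r) | (imm8 << (32 - r))) & 0xFFFFFFFF

print(hex(arm_immediate(0, 0xFF)))   # 0xff
print(hex(arm_immediate(4, 0xFF)))   # 0xff000000
print(hex(arm_immediate(8, 0xFF)))   # 0xff0000
```

Twelve bits obviously can't reach every 32-bit constant, but they hit small integers, powers of two and simple masks - exactly the "variety of common numbers" trade-off described above.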

Peter Green
1

Randy Hyde's excellent (if somewhat dated) *The Art of Assembly* goes into the x86 instruction set in some detail in chapter 3.3.4, "The Control Unit and Instruction Sets", and following.

Programs in early (pre-Von Neumann) computer systems were often "hard-wired" into the circuitry. That is, the computer's wiring determined what problem the computer would solve. One had to rewire the circuitry in order to change the program. A very difficult task. The next advance in computer design was the programmable computer system, one that allowed a computer programmer to easily "rewire" the computer system using a sequence of sockets and plug wires. A computer program consisted of a set of rows of holes (sockets), each row representing one operation during the execution of the program. The programmer could select one of several instructions by plugging a wire into the particular socket for the desired instruction.

He then demonstrates, quite catchily and at some length, how the first couple of plugs stand for the instruction and the next plugs encode source and destination. Of course, today nobody "plugs" anymore, but for the really old ISAs, the bits in the opcode basically do the same job as the plugs did before.

You end up with something like this:

[figure: opcode bit fields selecting the instruction, source, and destination]

DevSolar
  • Thank you for the link from Hyde! It's very informative and he seems to have an excellent teaching style. – Steven Sep 06 '17 at 22:38