TL;DR
What procedure is followed when selecting bytes to represent opcodes? Are the byte(s) for opcodes just randomly chosen and then mapped to mnemonics?
I recently learned from this answer that bytecode usually consists of instructions, each made up of an opcode (a fixed number of bytes) followed by its operands. Specifically, this snippet from ratchet freak's answer:
The bytecode itself is very often a very simple syntax. Where the first few bytes indicate what operation has to be performed and what operands are needed. The bytecode will be designed so that when reading byte per byte there is an unambiguous interpretation of the instructions.
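As a concrete reading of that advice, a minimal fetch-decode loop over such a bytecode could look like the sketch below (the instruction set here is entirely made up for illustration):

# A made-up one-byte instruction set: opcode -> (mnemonic, operand byte count)
INSTRUCTIONS = {
    0x01: ('PUSH', 1),  # PUSH takes a one-byte operand
    0x02: ('POP', 0),
    0x03: ('ADD', 0),
}

def decode(bytecode):
    i = 0
    while i < len(bytecode):
        mnemonic, n_operands = INSTRUCTIONS[bytecode[i]]
        operands = list(bytecode[i + 1 : i + 1 + n_operands])
        print(mnemonic, operands)
        i += 1 + n_operands  # fixed sizes keep the byte-per-byte reading unambiguous

decode(bytes([0x01, 0x05, 0x01, 0x07, 0x03]))  # PUSH 5; PUSH 7; ADD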
I took that advice and began designing my bytecode instruction set, but I soon ran into a problem. Before asking this question, I had tried to create opcodes using methods such as:
# pseudo code
opcodes = {
    'PUSH': convertToBytes(5),
    'POP': convertToBytes(10),
    'ADD': convertToBytes(15),
    # etc...
}
As you can probably tell, in the above example I used integers that were multiples of five and converted them to byte form. I was trying to find an orderly way to map my opcodes to something like integers. This did not work, because as each integer grew larger, so did the number of bytes needed to represent it, which meant my opcodes would have been variable lengths.
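To make the failure concrete, here is a runnable sketch of what I was doing, assuming convertToBytes encodes each integer in the fewest bytes that will hold it:

def convertToBytes(n):
    # Encode n in the minimum number of bytes needed to hold it.
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), 'big')

print(convertToBytes(15))   # b'\x0f'     -> 1 byte
print(convertToBytes(500))  # b'\x01\xf4' -> 2 bytes: the encoding is variable length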
I then began to wonder if I was going about this the wrong way, so I did some research to see how other languages design their opcodes.
I found this webpage about CPU opcodes that said:
The x86 CPU has a set of 8-16 bit codes that it recognizes and responds to. Each different code causes a different operation to take place inside the registers of the CPU or on the buses of the system board.
Here are three examples showing the bit patterns of three actual x86 opcodes, each followed by their one or more bytes of operands:
It then proceeds to give an example:
Bit Pattern ; operation performed by CPU
----------- -------------------------------------------------------
1. 10111000 ; MOVe the next two bytes into 16-bit register AX
2. 00000101 ; ...the LSB of the number (goes in AL)
3. 00000000 ; ...the MSB of the number (goes in AH)
1. 00000001 ; ADD to the BX register
2. 11000011 ; ...the contents of the AX register
1. 10001001 ; (2-byte opcode!) MOVe the contents of BX to
2. 00011110 ; ...the memory location pointed to
3. 00100000 ; ...by these last
4. 00000001 ; ...two bytes
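Rewritten in hexadecimal, those bit patterns are easier to follow. Here is a small sketch (the table below is my own toy reconstruction, not an official one) of how the first byte of each instruction maps back to a mnemonic:

# The three x86 instructions from the quote, as byte strings (hex form of the bit patterns above)
mov_ax_imm16 = bytes([0xB8, 0x05, 0x00])        # MOV AX, 5
add_bx_ax    = bytes([0x01, 0xC3])              # ADD BX, AX
mov_mem_bx   = bytes([0x89, 0x1E, 0x20, 0x01])  # MOV [0x0120], BX

# A disassembler-style table: first byte -> mnemonic
OPCODE_TABLE = {
    0xB8: 'MOV AX, imm16',
    0x01: 'ADD r/m16, r16',
    0x89: 'MOV r/m16, r16',
}
print(OPCODE_TABLE[add_bx_ax[0]])  # ADD r/m16, r16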
This leads me to my question: what procedure is followed when selecting bytes to represent opcodes? As in the above example, each opcode consists of one byte (or occasionally two). How, though, was that specific byte pattern picked? Are the byte(s) for opcodes just randomly chosen and then mapped to mnemonics? E.g.:
# pseudo code
opcodes = {
    'PUSH': selectRandomByte(),
    'POP': selectRandomByte(),
    'ADD': selectRandomByte(),
    # etc...
}
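For concreteness, a runnable version of that idea might look like the sketch below (selectRandomByte is hypothetical; note that at minimum the chosen bytes would have to be distinct, hence random.sample instead of independent picks):

import random

mnemonics = ['PUSH', 'POP', 'ADD']  # etc...

# Draw a distinct random byte value for each mnemonic;
# random.sample guarantees no two opcodes collide.
values = random.sample(range(256), k=len(mnemonics))
opcodes = {name: bytes([v]) for name, v in zip(mnemonics, values)}
print(opcodes)  # e.g. {'PUSH': b'\x9c', 'POP': b'\x07', 'ADD': b'\xd1'}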
Note: to clarify, when I say opcode I am referring to the opcodes found in Virtual Machine bytecode, not CPU instruction sets. I apologize if this was not clear before; the CPU opcode example above was for illustration purposes only.
Sources