
I cannot make sense of the difference between 'instruction size' and 'instruction encoding', especially for the ARM and Thumb ISAs, as explained here:

Can we say that the instruction size is 32 bits but its encoding is 16 bits for the Thumb-1 ISA and 32 bits for Thumb-2?

Is 'encoding' related to the binary code generated by the assembler, or is it related to the MCU's internal architecture and not visible to the software developer?

ARM Instruction size

alt-rose
  • 1
    "Instruction Encoding" is a reference to the specific definition of what opcode does what; in other contexts it might be called the "Instruction Set" and as you see in the broad ARM family there are several. – Chris Stratton Aug 05 '19 at 15:40

2 Answers


From my point of view the two are interconnected. The original ARM instruction set used one fixed instruction size (32 bits); in Thumb mode the instruction size is variable, and some of the bits of the instruction (its encoding) also determine its size. This is explained in the link you provided: the CPU fetches the first half (16 bits) of the instruction and examines it. If it is a 32-bit instruction, the CPU fetches the second half (an additional 16 bits); if it is only a 16-bit instruction, the CPU executes it without fetching a second half.
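The size decision described above can be sketched in C. This is only an illustration, assuming a core that implements the Thumb-2 rule from the ARM architecture manuals: a first halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction. The function name `thumb_insn_size` is made up for this sketch.

```c
#include <stdint.h>

/* Size in bytes of the Thumb instruction whose first halfword is hw1.
 * Assumption: Thumb-2 capable core. Bits [15:11] equal to 0b11101,
 * 0b11110 or 0b11111 mean "a second halfword follows"; any other
 * value is a complete 16-bit instruction. */
unsigned thumb_insn_size(uint16_t hw1)
{
    unsigned top5 = hw1 >> 11;        /* bits [15:11], range 0..31 */
    return (top5 >= 0x1D) ? 4u : 2u;  /* 0x1D, 0x1E, 0x1F -> 32-bit */
}
```

For example, `thumb_insn_size(0x4770)` (the halfword for `bx lr`) gives 2, while `thumb_insn_size(0xea81)` (the first halfword of an `eor.w`) gives 4.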

Because many instructions take only 16 bits, a program built this way also needs less program memory than one using only fixed-size 32-bit instructions.

Marko Buršič

Thumb instructions are 16 bits.

Thumb-2 instructions are extensions to Thumb: the first 16 bits are decoded and recognized as a Thumb-2 extension, and an additional 16 bits of instruction are then used. The Thumb-2 extensions are all 32 bits in total. They do not have to be 32-bit aligned; like Thumb instructions, they only have to be 16-bit aligned.

Encoding means the machine code; see the examples below.

ARM instructions, both the traditional AArch32 ARM instructions and the AArch64 instructions, are 32 bits and must be 32-bit aligned. The two are completely incompatible instruction sets.
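The alignment rules above fit in a few lines of C. This is a minimal sketch; the `enum isa` names and the function `insn_address_aligned` are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the instruction sets discussed here. */
enum isa { ISA_ARM, ISA_THUMB, ISA_AARCH64 };

/* True if addr is a legal instruction address for the given ISA.
 * Thumb (including the 32-bit Thumb-2 extensions) needs 2-byte
 * alignment; ARM and AArch64 instructions need 4-byte alignment. */
bool insn_address_aligned(uint64_t addr, enum isa isa)
{
    uint64_t mask = (isa == ISA_THUMB) ? 0x1u : 0x3u;
    return (addr & mask) == 0;
}
```

So address 0x1002 is acceptable for a Thumb or Thumb-2 instruction but not for an ARM or AArch64 one.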

The 32-bit ARM instructions and the 32-bit Thumb-2 extensions are not compatible either. Their features do not match completely: both may have, as the examples below show, a three-register XOR, but the encoding is not the same. The unified assembly language (one syntax for both) makes this more complicated, not less.


Thumb instructions are all 16 bits (yes, even bl, which is two independent instructions).

Thumb-2 extensions are formerly undefined Thumb instructions. They are all 32 bits in total, but think of them as variable-length instructions made of two halfwords. They do not need to be aligned on 32-bit boundaries.

ARMv6-M includes only a small number of the 32-bit Thumb-2 encodings (bl, dmb, dsb, isb, mrs, msr).

ARMv7-M (and ARMv7) added over 100 more.

ARMv8-M, not to be confused with ARMv8, adds security features and supports anywhere from a partial subset to the whole of the Thumb/Thumb-2 instructions. Which instructions are supported is core specific.

ARMv7 and older use the traditional ARM instructions.

The ARMv8 64-bit (AArch64) instructions are 32 bits wide but operate on 64-bit registers.

ARMv8 processors with the 64-bit instruction set have a 32-bit ARMv7-compatible mode, AArch32, which supports the ARMv7 instruction sets.

The AArch64 and AArch32 instructions (both 32 bits wide) are incompatible with each other; ARM did a do-over of the instruction set.

ARMv4T dates from the early years of ARM (Advanced RISC Machines) as a company separate from Acorn. Thumb instructions were introduced with ARMv4T, which is basically the ARM7TDMI; ARM instructions are also supported in this core.

ARMv5T also supports the Thumb instruction set as well as ARM. Think ARM9.

ARMv6 supports ARM and Thumb but not Thumb-2 (the ARMv6T2 variant is the exception). Think ARM11.

ARMv6-M is for microcontrollers (the Cortex-M0); a first, small set of Thumb-2 extensions is supported here. It does not support ARM instructions: Thumb with Thumb-2 extensions only.

ARMv7 is where the Cortex-A's start; they support ARM and Thumb with Thumb-2 extensions.

ARMv7-M has the full complement of Thumb-2 extensions and no ARM support: Thumb with Thumb-2 extensions only. Cortex-M3, M4, M7.

ARMv8 is where the 64-bit instruction set starts. In general it supports the new AArch64 instructions and, through AArch32, the traditional ARM instructions (both are 32 bits wide) and Thumb with Thumb-2 extensions. There are supposedly 32-bit-only ARMv8 cores that implement only the AArch32 part and not the AArch64 part; I have not followed up on these to see if they exist.

ARMv8-M covers the new security-oriented microcontrollers. They support all or a subset of Thumb and all or a subset of the Thumb-2 extensions, depending on the core. See the Cortex-M23 and Cortex-M33.

ARM instructions were added at almost every level from ARM1 to ARMv7.

Then there are the floating-point instruction sets, which have varied over time; some are just different syntaxes for the same instructions, some are not.

Encoding is the machine code, and it is very much visible to the software developer who chooses to look. Many folks never look, although sometimes they should; they should at least be aware that they can, and know how to do it.

For example, for this function

unsigned int fun ( unsigned int a, unsigned int b, unsigned int c )
{
    return (b^c);
}

These are different possible implementations in the various instruction sets.

Traditional ARM (ARMv7 and older), 32-bit instructions:

00000000 <fun>:
   0:   e0210002    eor r0, r1, r2
   4:   e12fff1e    bx  lr
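To make "encoding" concrete, the word 0xe0210002 from the listing above can be split into its fields with shifts and masks. This is a sketch of the ARM data-processing (register operand) layout only; the struct and function names are made up for illustration, and the shift fields in bits [11:4] are ignored.

```c
#include <stdint.h>

/* Hypothetical helper type: fields of an ARM data-processing
 * instruction with a register second operand. */
struct arm_dp_fields {
    unsigned cond;    /* bits [31:28], 0xE means "always" */
    unsigned opcode;  /* bits [24:21], 0b0001 is EOR */
    unsigned s;       /* bit  [20], set flags */
    unsigned rn;      /* bits [19:16], first operand register */
    unsigned rd;      /* bits [15:12], destination register */
    unsigned rm;      /* bits [3:0], second operand register */
};

struct arm_dp_fields arm_dp_decode(uint32_t insn)
{
    struct arm_dp_fields f;
    f.cond   = (insn >> 28) & 0xF;
    f.opcode = (insn >> 21) & 0xF;
    f.s      = (insn >> 20) & 0x1;
    f.rn     = (insn >> 16) & 0xF;
    f.rd     = (insn >> 12) & 0xF;
    f.rm     =  insn        & 0xF;
    return f;
}
```

Decoding 0xe0210002 this way gives cond 0xE, opcode 1 (EOR), rn 1, rd 0 and rm 2, which matches `eor r0, r1, r2`.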

Traditional Thumb (it doesn't have a three-register XOR, no room), 16-bit instructions (most compatible, works on most cores):

00000000 <fun>:
   0:   4051        eors    r1, r2
   2:   0008        movs    r0, r1
   4:   4770        bx  lr
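The "no room" remark shows up in the bit layout: a 16-bit Thumb data-processing instruction has space for only two 3-bit register fields, so the destination doubles as the first operand. A rough sketch of decoding 0x4051, with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: fields of a 16-bit Thumb
 * data-processing (register) instruction. */
struct thumb_dp_fields {
    bool     is_dp;   /* true if bits [15:10] are 0b010000 */
    unsigned opcode;  /* bits [9:6], 0b0001 is EOR */
    unsigned rm;      /* bits [5:3], second operand (r0-r7 only) */
    unsigned rdn;     /* bits [2:0], destination AND first operand */
};

struct thumb_dp_fields thumb_dp_decode(uint16_t hw)
{
    struct thumb_dp_fields f;
    f.is_dp  = (hw >> 10) == 0x10;
    f.opcode = (hw >> 6) & 0xF;
    f.rm     = (hw >> 3) & 0x7;
    f.rdn    =  hw       & 0x7;
    return f;
}
```

For 0x4051 this yields opcode 1 (EOR), rm 2 and rdn 1, that is `eors r1, r2`; the extra `movs r0, r1` is then needed to move the result into r0.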

Thumb-2 extension, 32 bits, decoded initially as a 16-bit instruction, from which the core sees that it is variable length (the eor is a Thumb-2 extension, bx lr is Thumb):

00000000 <fun>:
   0:   ea81 0002   eor.w   r0, r1, r2
   4:   4770        bx  lr
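The two halfwords ea81 0002 carry the same operation as the ARM version, but the fields sit in different places. The following sketch assumes the Thumb-2 "data processing (shifted register)" layout and ignores the shift fields; names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper type: fields of a 32-bit Thumb-2
 * data-processing (shifted register) instruction. */
struct t2_dp_fields {
    unsigned op;  /* hw1 bits [8:5], 0b0100 is EOR */
    unsigned s;   /* hw1 bit  [4], set flags */
    unsigned rn;  /* hw1 bits [3:0], first operand register */
    unsigned rd;  /* hw2 bits [11:8], destination register */
    unsigned rm;  /* hw2 bits [3:0], second operand register */
};

struct t2_dp_fields t2_dp_decode(uint16_t hw1, uint16_t hw2)
{
    struct t2_dp_fields f;
    f.op = (hw1 >> 5) & 0xF;
    f.s  = (hw1 >> 4) & 0x1;
    f.rn =  hw1       & 0xF;
    f.rd = (hw2 >> 8) & 0xF;
    f.rm =  hw2       & 0xF;
    return f;
}
```

`t2_dp_decode(0xea81, 0x0002)` gives op 4 (EOR), rn 1, rd 0, rm 2: the same `eor r0, r1, r2` as in ARM mode, with EOR numbered 4 here instead of 1.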

AArch64, the 64-bit instruction set, 32-bit instructions, incompatible with the prior ARM instruction sets:

0000000000000000 <fun>:
   0:   4a020020    eor w0, w1, w2
   4:   d65f03c0    ret
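The AArch64 word 4a020020 is again 32 bits, yet laid out differently from the 32-bit ARM word: register fields are 5 bits wide and sit in other positions. A sketch for the "logical (shifted register)" class, ignoring the shift and N fields, with hypothetical names:

```c
#include <stdint.h>

/* Hypothetical helper type: fields of an AArch64 logical
 * (shifted register) instruction. */
struct a64_logical_fields {
    unsigned sf;   /* bit  [31], 0 = 32-bit W regs, 1 = 64-bit X regs */
    unsigned opc;  /* bits [30:29], 0b10 is EOR */
    unsigned rm;   /* bits [20:16], second operand register */
    unsigned rn;   /* bits [9:5], first operand register */
    unsigned rd;   /* bits [4:0], destination register */
};

struct a64_logical_fields a64_logical_decode(uint32_t insn)
{
    struct a64_logical_fields f;
    f.sf  = (insn >> 31) & 0x1;
    f.opc = (insn >> 29) & 0x3;
    f.rm  = (insn >> 16) & 0x1F;
    f.rn  = (insn >> 5)  & 0x1F;
    f.rd  =  insn        & 0x1F;
    return f;
}
```

For 0x4a020020 the result is sf 0, opc 2 (EOR), rm 2, rn 1, rd 0, which reads as `eor w0, w1, w2`.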
old_timer
  • 3
    I suspect the correct answer is hiding in here *somewhere* but while I don't quite agree with the downvote, this needs some improvement in organization, clear presentation of the actual answer, and overall readability before it could be considered a *good* answer. – Chris Stratton Aug 05 '19 at 15:42
  • 1
    the answers were written before the code at the end and still are. re-written. – old_timer Aug 06 '19 at 02:00
  • 3
    No, they aren't. You've got a broad survey, but pay precious little attention to the **specific questions actually asked**. A good answer highlights the response to the actual questions; it might supplement with additional information, but not hide the answers to the degree which you have. The overall quality here is very low. – Chris Stratton Aug 06 '19 at 02:04
  • 1
    You are welcome to rewrite it. The questions are: is Thumb 16 bit, I answered yes; is Thumb-2 32 bit, answer yes; what does encoding mean, answered as well as showing examples. Please post your answer. – old_timer Aug 06 '19 at 02:11
  • 1
    Encoding not visible to the programmer: not true. It is visible, but the programmer needs to know how to find/see it. – old_timer Aug 06 '19 at 02:12
  • 1
    pretty much covered the question marks from the OP. what did I miss? – old_timer Aug 06 '19 at 02:13