thumb instructions are 16 bits.
thumb2 instructions are extensions to thumb: the first 16 bits are decoded and, if they are seen to be a thumb2 extension, an additional 16 bits of instruction are then used, so they are all 32 bits. They do not have to be 32 bit aligned; like thumb they only have to be 16 bit aligned.
encoding means the machine code; see the examples below.
arm instructions, both the traditional (aarch32) arm instructions and the aarch64 instructions, are 32 bits and must be 32 bit aligned. the two are completely incompatible instruction sets.
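the alignment rules above fit in a few lines of C. this is only a sketch: the enum and function names (isa, insn_alignment, insn_address_ok) are made up for illustration and do not come from any toolchain header.

```c
#include <stdbool.h>
#include <stdint.h>

/* hypothetical names, for illustration only */
enum isa { ISA_ARM, ISA_THUMB, ISA_AARCH64 };

/* required alignment, in bytes, of an instruction address:
   thumb (with or without thumb2 extensions) is halfword aligned,
   arm and aarch64 are word aligned */
unsigned insn_alignment(enum isa set)
{
    return set == ISA_THUMB ? 2u : 4u;
}

/* true when addr is a legal instruction address for that set */
bool insn_address_ok(enum isa set, uint64_t addr)
{
    return (addr % insn_alignment(set)) == 0;
}
```

so an address like 0x1002 is fine for thumb, including a 32 bit thumb2 instruction, but not for arm or aarch64.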
The 32 bit arm instructions and the 32 bit thumb2 extensions are not compatible either. Their features do not completely overlap, and where they do (both have, as we see below, a three register xor) the encoding is not the same. The unified syntax (assembly language) makes this more complicated, not less.
thumb instructions are all 16 bits (yes, even bl, which is two independent 16 bit instructions).
thumb2 extensions use formerly undefined thumb instructions. they are all 32 bits total, but think of them as variable length: two halfword instructions. They do not need to be aligned on 32 bit boundaries.
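the "decode the first 16 bits, then decide" step can be sketched like this. the rule used is the one in the arm architecture manuals: a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32 bit instruction, everything else is a complete 16 bit instruction. the function names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* true when this halfword is the first half of a 32 bit thumb
   instruction: bits [15:11] are 0b11101, 0b11110 or 0b11111 */
bool thumb_is_32bit_prefix(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* length in bytes of the instruction that starts with this halfword */
unsigned thumb_insn_length(uint16_t first_halfword)
{
    return thumb_is_32bit_prefix(first_halfword) ? 4u : 2u;
}
```

with the halfwords from the listings further down, 0xea81 (the start of eor.w) gives 4, while 0x4051 (eors) and 0x4770 (bx lr) give 2. 0xf000, the first half of a bl, also gives 4, which matches the old view of bl as a pair of halfwords.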
armv6-m supports only a handful of the 32 bit thumb2 encodings (bl, dmb, dsb, isb, mrs, msr).
armv7-m (and armv7) added over 100 more.
armv8-m, not to be confused with armv8-a, adds security features and supports anywhere from a partial subset to the whole of the thumb/thumb2 instructions. Which instructions are supported is core specific.
armv7 and older are the traditional arm instructions.
the armv8 64 bit (aarch64) instructions are 32 bits wide but use the 64 bit registers.
The armv8 processors with 64 bit instructions can have a 32 bit armv7 compatibility mode, aarch32, which supports the armv7 instruction sets.
The aarch64 (32 bit) and aarch32 (32 bit) instructions are incompatible with each other; arm did a do-over with the instruction set.
armv4t is where arm (advanced risc machines) took over from acorn. thumb instructions were introduced with armv4t, which is basically the arm7tdmi; arm is also supported in this core.
armv5t also supports the thumb instruction set as well as arm. think arm9.
armv6 supports arm and thumb but not thumb2. think arm11. (the exception is the armv6t2 variant, arm1156t2-s, which is where thumb2 first appeared.)
armv6-m is for microcontrollers, the cortex-m0 and cortex-m0+. it supports thumb with a small set of thumb2 extensions only; it does not support arm.
armv7 is where the cortex-a's start; it supports arm and thumb with thumb2 extensions.
armv7-m is where the full complement of thumb2 extensions was added for microcontrollers. no arm support, thumb with thumb2 extensions only. cortex-m3, m4, m7.
armv8 is where the 64 bit instruction set starts. in general it supports the new aarch64 instructions and, through aarch32, the traditional arm instructions (both are 32 bits) and thumb with thumb2 extensions. There are supposedly 32 bit only armv8s which have only the aarch32 part and not the aarch64 part; have not followed up on these to see if they exist.
armv8-m is the new security microcontrollers: a whole or subset of thumb and a whole or subset of the thumb2 extensions, depending on the core. see the cortex-m23 and cortex-m33.
arm instructions were added at almost all levels from arm1 to armv7.
then there are the floating point instruction sets, which have varied over time; some are just different syntaxes for the same instructions, some are not.
encoding is the machine code, and it is very much visible to the software developer if he/she chooses to look. Many folks never look, although sometimes they should; either way they should be aware that they can and know how to do it.
for example, for this function:
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c )
{
    return (b^c);
}
these are different possible implementations in the various instruction sets.
traditional arm (armv7 and older), 32 bit instructions:
00000000 <fun>:
0: e0210002 eor r0, r1, r2
4: e12fff1e bx lr
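to see where the registers sit in that arm encoding, here is a small field extractor for the register form of an arm data processing instruction. treat it as a sketch: the struct and function names are invented here, and it ignores the shift fields in bits 11:4 (all zero in this example).

```c
#include <stdint.h>

/* hypothetical helper type: fields of an arm data processing
   instruction, register operand form */
struct arm_dp_reg {
    unsigned cond;    /* bits 31:28, 0xE = always        */
    unsigned opcode;  /* bits 24:21, 0x1 = eor           */
    unsigned s;       /* bit 20, 1 = update the flags    */
    unsigned rn;      /* bits 19:16, first operand       */
    unsigned rd;      /* bits 15:12, destination         */
    unsigned rm;      /* bits 3:0, second operand        */
};

struct arm_dp_reg arm_decode_dp_reg(uint32_t insn)
{
    struct arm_dp_reg f;
    f.cond   = (insn >> 28) & 0xFu;
    f.opcode = (insn >> 21) & 0xFu;
    f.s      = (insn >> 20) & 0x1u;
    f.rn     = (insn >> 16) & 0xFu;
    f.rd     = (insn >> 12) & 0xFu;
    f.rm     = insn & 0xFu;
    return f;
}
```

feeding it 0xe0210002 gives cond 0xE, opcode 1 (eor), s 0, rn 1, rd 0, rm 2, which is eor r0, r1, r2. three separate 4 bit register fields fit comfortably in 32 bits.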
traditional thumb (doesn't have a three register xor, no room), 16 bit instructions (most compatible, works on most cores):
00000000 <fun>:
0: 4051 eors r1, r2
2: 0008 movs r0, r1
4: 4770 bx lr
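the "no room" part is easier to see when you build the two halfwords by hand. a sketch, with made-up function names; it only handles the low registers r0-r7, which is all these 3 bit fields can hold.

```c
#include <stdint.h>

/* eors rdn, rm (16 bit thumb): 010000 0001 Rm Rdn.
   two 3 bit register fields; the destination doubles as an operand */
uint16_t thumb_encode_eors(unsigned rdn, unsigned rm)
{
    return (uint16_t)(0x4040u | ((rm & 7u) << 3) | (rdn & 7u));
}

/* movs rd, rm: the same bit pattern as lsls rd, rm, #0,
   00000 00000 Rm Rd */
uint16_t thumb_encode_movs(unsigned rd, unsigned rm)
{
    return (uint16_t)(((rm & 7u) << 3) | (rd & 7u));
}
```

thumb_encode_eors(1, 2) gives 0x4051 and thumb_encode_movs(0, 1) gives 0x0008, matching the listing. with only two register fields the xor has to land in r1, so a second instruction is needed to get the result into r0.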
thumb2 extension, 32 bits, decoded initially as 16 bits, and from that the processor sees it is variable length (the eor is a thumb2 extension, bx lr is thumb):
00000000 <fun>:
0: ea81 0002 eor.w r0, r1, r2
4: 4770 bx lr
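a sketch of how the two halfwords of that eor.w are put together and how they land in memory on a little endian system. the names are invented for this example, the shift fields are left at zero (no shift), and the restrictions on using sp and pc as operands are not checked.

```c
#include <stdint.h>

/* hypothetical helper type: the two halfwords of a thumb2 instruction */
struct thumb2_insn {
    uint16_t hw1;  /* first halfword, lower address */
    uint16_t hw2;  /* second halfword               */
};

/* eor.w rd, rn, rm with no shift and no flag update:
   hw1 = 1110101 0100 S Rn          (S = 0)
   hw2 = 0 imm3 Rd imm2 type Rm     (imm3 = imm2 = type = 0) */
struct thumb2_insn thumb2_encode_eor_w(unsigned rd, unsigned rn, unsigned rm)
{
    struct thumb2_insn i;
    i.hw1 = (uint16_t)(0xEA80u | (rn & 0xFu));
    i.hw2 = (uint16_t)(((rd & 0xFu) << 8) | (rm & 0xFu));
    return i;
}

/* store as two little endian halfwords, first halfword first */
void thumb2_store_le(struct thumb2_insn i, uint8_t out[4])
{
    out[0] = (uint8_t)(i.hw1 & 0xFFu);
    out[1] = (uint8_t)(i.hw1 >> 8);
    out[2] = (uint8_t)(i.hw2 & 0xFFu);
    out[3] = (uint8_t)(i.hw2 >> 8);
}
```

thumb2_encode_eor_w(0, 1, 2) gives 0xea81 and 0x0002, and the bytes in memory come out as 81 ea 02 00. that is why the disassembler prints it as two halfwords rather than one 32 bit word: it is not the same thing as the arm word 0xea810002.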
aarch64, 64 bit instruction set, 32 bit instructions, incompatible with the prior arm instruction sets:
0000000000000000 <fun>:
0: 4a020020 eor w0, w1, w2
4: d65f03c0 ret
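and the same exercise for aarch64, as a sketch of the eor (shifted register) encoding with the shift left at lsl #0. the function name and the is64 parameter are made up here; register number 31 means the zero register in this encoding and is not treated specially.

```c
#include <stdint.h>

/* eor (shifted register), shift = lsl #0:
   sf 10 01010 shift(2) 0 Rm imm6 Rn Rd
   sf = 0 uses the w registers, sf = 1 uses the x registers */
uint32_t a64_encode_eor_reg(int is64, unsigned rd, unsigned rn, unsigned rm)
{
    uint32_t insn = 0x4A000000u;

    if (is64)
        insn |= 0x80000000u;              /* sf bit */
    insn |= (uint32_t)(rm & 0x1Fu) << 16; /* second operand */
    insn |= (uint32_t)(rn & 0x1Fu) << 5;  /* first operand  */
    insn |= (uint32_t)(rd & 0x1Fu);       /* destination    */
    return insn;
}
```

a64_encode_eor_reg(0, 0, 1, 2) gives 0x4a020020, the eor w0, w1, w2 in the listing, and setting is64 gives 0xca020020 for eor x0, x1, x2. the instruction is 32 bits wide either way; only the register width changes. compare it with 0xe0210002 for the arm eor and the two sets clearly have nothing in common at the encoding level.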