
I'm studying the ARM processor families.

[Image: chart of ARM processor families, showing code migration from ARM7 to ARM9 and later cores]

In the above image, it states that code supported by the ARM7 can be migrated to the ARM9 and the other families.

The ARM7 uses the von Neumann architecture (a single bus for data and instructions) and a 3-stage pipeline (fetch, decode, execute). The ARM9 and the others use the Harvard architecture (separate buses for data and instructions) and a 5-stage pipeline (fetch, decode, execute, memory, and write-back, in the ARM9's case). Also, the ARM7 doesn't have a memory management unit, but the others do.

How can the code be compatible if the processors use different architectures and pipelines? Won't the different architectures have any effect on the code?

I'm assuming that since the ARM9, ARM10, and ARM11 share the same architecture, code can be compatible among them, but that the ARM7 is different from the other processors, so one must make some changes to the code before migrating. I'm wondering whether that is correct.

Peter Mortensen
  • I wish everyone would stop getting worked up about Harvard vs. von Neumann. In the field, a true Harvard is somewhat useless; it is an academic exercise. Modified Harvard, which most processors are, just means some control signals on the shared bus indicate that this transfer is a fetch and that one is data. Of course, the memory/I/O bus has absolutely nothing to do with the processor core and instruction set, nor does the nature of the pipeline. Look at x86 if nothing else, but MIPS, ARM, pretty much every core that has seen more than one product. – old_timer Dec 31 '17 at 15:06
  • Just look at MIPS: every other engineer around the world had to do some sort of MIPS simulator or Verilog implementation, sometimes serial and sometimes parallel as well, all working from a binary spec for the machine code; pipeline depth, signal names, etc. are all up to the programmer/engineer. Take 100 engineers, give them the ARMv4T instruction set, and tell them to implement it, and you will get somewhere between 1 and 100 different solutions, all perfectly valid. One thing has nothing to do with the other. – old_timer Dec 31 '17 at 15:08
  • The ARMv4T instructions have been supported up through the ARMv8 implementations running in AArch32 (ARMv7-compatible mode). If/when you read the ARM docs, you will see that over time new instructions are added. They discourage the use of swp in multi-core implementations when the cores share memory, but I don't think they actually removed the instruction; most folks don't actually read about ldrex/strex or how to use them, and misunderstand the swp thing. – old_timer Dec 31 '17 at 15:11
  • Some of the bugs and IP protection ("unpredictable results") in the ARM7TDMI have been fixed, as you will see when you read the manuals over time. The ARM9 might have added some new ones due to the nature of the technology at the time, but since then validation has gotten better. – old_timer Dec 31 '17 at 15:14
  • @old_timer Are PIC and AVR processors useless? – Wayne Conrad Jan 01 '18 at 20:54
  • @WayneConrad Certainly not; they are still super popular. AVR Freaks preceded the Arduino community, which really has no rival. Sadly, both are now under Microchip. Atmel was very good with docs and its community, and Microchip has been pretty bad about that. They are complete products, not some foreign processor IP pulled in as-is with stuff glued around it. Perhaps they still glued stuff on, and who knows where their IP originally came from, but they are still a force in the market. – old_timer Jan 01 '18 at 21:57

4 Answers


A processor is said to be code-compatible with another if their instruction sets are compatible. That is all there is to it.

Now, instruction sets can be made compatible whatever the pipeline architecture is. It is true that pipelining can have consequences for the instruction set if you only target execution speed and core silicon area, but there are always workarounds if you need to ensure compatibility with an existing processor. It may complicate the core, but there is always a way. Look at how the architecture evolved from the 8086 to the newest Pentiums: old code can still be executed.

Regarding the von Neumann / Harvard differences, they can also be made to have minimal impact if the code and data busses actually end up at the same physical memory blocks with the same addresses (which is the case on all ARM implementations I have seen, except maybe for peripheral memory zones). There may be an impact in corner cases, like the need to call specific instructions when the code is self-modifying, but in normal cases you won't notice.
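For example, here is a minimal sketch of that corner case, assuming a GCC-compatible toolchain and a buffer already mapped as executable memory (the function and variable names are illustrative, not from any ARM manual):

    /* Sketch: run machine code that was just written as data. On cores
       with split instruction/data paths, the freshly written words may
       still sit in the data cache, so the two views must be synchronized
       before jumping into the buffer. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef int (*generated_fn)(void);

    int run_generated(uint32_t *exec_buf, const uint32_t *code, size_t nwords)
    {
        memcpy(exec_buf, code, nwords * sizeof *code);  /* instructions written as data */

        /* GCC/Clang builtin: cleans the data cache and invalidates the
           instruction cache over the given range; a no-op on systems where
           the two are already coherent. */
        __builtin___clear_cache((char *)exec_buf, (char *)(exec_buf + nwords));

        return ((generated_fn)exec_buf)();  /* now safe to execute */
    }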

Regarding memory management, that is another story: it has an impact at the OS level. The MMU is like an additional peripheral whose configuration affects the memory layout, but it doesn't change the instruction set. An algorithm is coded the same way whether there is an MMU or not.

dim
  • That corner case you mention also happens when simply generating code at runtime, which is common when using JIT compilers, so depending on the use case it might be a very common scenario. – Voo Dec 31 '17 at 20:39
  • @Voo, that is right. But developing a JIT compiler isn't exactly a very common scenario. Executing something in a JIT is indeed much more common, but the developer of the interpreted code doesn't have to be aware of this; only the developer of the JIT compiler does. – dim Dec 31 '17 at 21:10
  • @Voo: Every "Harvard" architecture processor I looked at turned out to really be able to execute code in RAM, but there was more ROM than RAM, so you can't do too much of it. – Joshua Jan 01 '18 at 03:37
  • @Joshua The distinction between Harvard and von Neumann architectures is largely academic and not really relevant to practical architectures these days. Real CPUs are a mixture of both. For example, any CPU that has separate instruction and data L1 caches (so almost all of them) would be a Harvard architecture. This is also why you need special instructions here: invalidating instruction caches. – Voo Jan 01 '18 at 13:44

The idea that "code can be migrated" means that the instructions will produce the same end result. The architecture or the number of pipeline stages does not affect that. E.g., the code for the instruction:

add r0, r1, r2

will be the same on both machines and will produce the same result: r0 ends up being the sum of r1 and r2. The latency may be longer; e.g., on an ARM7 it takes 3 cycles and on an ARM9 it takes 5 cycles. But that will be the case for all instructions, so the net result will be the same.
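To make this concrete, here is a minimal sketch, assuming a GCC-compatible ARM cross-toolchain building for ARM (A32) state; the values are arbitrary:

    /* Sketch: the compiler emits the same "add" encoding whether the
       target is an ARM7 or an ARM9; the pipeline depth is invisible at
       this level, only the timing differs. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int a = 2, b = 3, sum;

        /* The same "add" as above, with the register allocation left to
           the compiler. */
        __asm__ volatile ("add %0, %1, %2" : "=r"(sum) : "r"(a), "r"(b));

        printf("sum = %u\n", sum);  /* prints: sum = 5 */
        return 0;
    }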

The real time depends on the clock speed. Thus the ARM9 may be faster despite taking 5 clocks, because, e.g., the ARM7 may be running at 100 MHz and the ARM9 at a much higher clock rate.

The MMU on the ARM9 will be 'transparent' after a reset, so you will not notice that it is present, at least as long as you don't program it, which ARM7 code will not do, since such code contains nothing that touches the MMU.
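As a minimal sketch of what 'transparent' means here, assuming privileged execution on an ARM9-class core (the CP15 access is per the ARM Architecture Reference Manual; the helper names are made up):

    /* Sketch: CP15 register c1 (the control register) holds the MMU
       enable bit. Out of reset it reads back 0, so ARM7-style code that
       never touches CP15 never notices the MMU is there. */
    #include <stdint.h>

    static inline uint32_t read_control_register(void)
    {
        uint32_t val;
        __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(val));
        return val;
    }

    int mmu_enabled(void)
    {
        return (int)(read_control_register() & 1u);  /* bit 0 = M (MMU enable) */
    }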

A Harvard architecture does not mean code is executed differently. In fact, you still need to decode the instruction before you know which data to fetch/write. It only allows the next instruction to arrive and be decoded at the same time as the data is read/written.

Having said all that, I remember there was an issue with a branch instruction, but that may have been when I ported assembler to an A53 core.

Oldfart
  • Yes, in most cases code that only targets the features of the older architecture should work fine. The exceptions would be code that makes timing assumptions (hand-optimized code, delay loops, etc.) or things like self-modifying code, which may rely on consistency broken by the addition of distinct code and data caches, such that it requires explicit cache flush instructions to work correctly. Fortunately, these are not novel issues - portable code like an operating system kernel likely already has provisions for this on some architectures, and skilled programmers are familiar with them. – Chris Stratton Dec 31 '17 at 18:34
  • re: branch prediction + speculative execution: this question came up on Stack Overflow recently: [A newer ARM was speculating past a `pop {r4, pc}` when branch prediction was disabled, and speculatively loading from an address which hung the system](https://stackoverflow.com/questions/46118893/arm-prefetch-workaround), because the data following the function decoded as a load instruction. (And the MMU wasn't configured to disallow speculative loads from that address range.) – Peter Cordes Dec 31 '17 at 21:29

ARM9 and others use the Harvard architecture (separate bus for data and instructions)

This is inaccurate. A Harvard architecture CPU has separate memories for code and data; this is not the case in any implementation of the ARM architecture. There are separate busses for instructions and data in some implementations, but they are always connected to the same memory.

Won't the different architectures have any effect on the code?

The pipeline is an implementation detail. It does not affect the programmer's model of the CPU -- in almost all circumstances, the same code will run the same way on both implementations. (The exceptions are all unusual, like self-modifying code.)

  • Since there are two separate busses, it would be equally inaccurate to say they are von Neumann, though. I believe the case where there are two busses (possibly with a separate cache for each) going to the same physical memory is called "modified Harvard". Still Harvard to me... – dim Dec 31 '17 at 08:27
  • The claim that both busses are always connected to the same memory misses the key point. The only way you get a benefit from having two busses is if you are accessing *different* memories with them - either different caches in a large system, or flash for code and RAM for data in an MCU-type one. If you need to make both types of access to the same memory, then you suffer an efficiency hit, since they have to take turns. What is true is that these processors have a unified address space - you can use the same *instructions* to access both types of memory (and also typically I/O). – Chris Stratton Dec 31 '17 at 18:24
  • Hence it's most accurate to say that they have a von Neumann programming model, with optional Harvard-style optimizations that the informed system designer should architect to take advantage of. – Chris Stratton Dec 31 '17 at 18:29
  • @ChrisStratton: So nobody ever builds an ARM where code + data have separate address spaces, i.e. a function and a global could have the same numeric address? I wouldn't be surprised if there are systems where it would fault if you `LDR` an instruction word or jump to some bytes you just stored, but not that you'd get different bytes depending on the type of access. [This question came up for MIPS the other day (mostly as beginner confusion](https://stackoverflow.com/q/47971462/224132)), and I wasn't 100% sure that it was impossible to build a MIPS with separate address spaces. – Peter Cordes Dec 31 '17 at 21:17
  • @PeterCordes Not that I'm aware of. It'd be incompatible with Thumb code -- standard practice is to store constants at the end of a function and use PC-relative memory references to load them. With separate address spaces, that wouldn't work. – Dec 31 '17 at 22:31
  • @duskwuff: ah right, so almost certainly not for ARM. It's not totally inconceivable for MIPS, I think, but I don't think anyone actually does build them that way because the MMU and any caches would normally assume unified address space. Of course some microcontrollers like AVR are true Harvard, and might actually have separate address spaces. – Peter Cordes Dec 31 '17 at 22:47
  • @PeterCordes AVR is _almost_ a true Harvard architecture. The LPM/SPM instructions make it a (minimally) modified Harvard architecture. – Dec 31 '17 at 22:52
  • @PeterCordes - no. On either ARM or MIPS or any other unified-address-space machine, the only way you get different results for legal access to the same *hardware* address is due to cache inconsistency - which can be a *real* problem when software is not written correctly. – Chris Stratton Dec 31 '17 at 22:56
  • Really, every CPU that has separate instruction and data caches and doesn't make this completely invisible to the programmer by definition cannot be a von Neumann CPU. ARM certainly isn't, since it requires specific instructions to invalidate the i-cache. I doubt any modern CPU that uses caches could be argued to be a von Neumann architecture - even x86, which tries its hardest to hide the i-cache from you, requires handling under specific circumstances. – Voo Jan 01 '18 at 20:52

All that "architecture" refers to here is how the CPU is wired to memory - most of the instructions, their encodings, and the registers are the same between the two chips. The only way you might run into trouble with the difference between a Harvard and other architectures, or with the number of pipe stages, is if you were writing code into memory and then executing it ... you just have to be careful about invalidating caches, as sketched below.
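Here is a hedged bare-metal sketch of that invalidation on an ARM9-class core (the CP15 c7 operations follow the ARM926EJ-S Technical Reference Manual; on a hosted toolchain you would normally rely on a compiler builtin such as GCC's __builtin___clear_cache instead):

    /* Sketch: after storing a new instruction word at 'addr', push it out
       of the data cache and discard any stale copy in the instruction
       cache before executing it. Requires privileged mode. */
    static inline void sync_icache_line(void *addr)
    {
        unsigned int zero = 0;
        __asm__ volatile (
            "mcr p15, 0, %0, c7, c10, 1\n"  /* clean D-cache line by address      */
            "mcr p15, 0, %1, c7, c10, 4\n"  /* drain the write buffer (Rd = 0)    */
            "mcr p15, 0, %0, c7, c5, 1\n"   /* invalidate I-cache line by address */
            : : "r"(addr), "r"(zero) : "memory");
    }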

Some things, like exceptions, may also be different.

Taniwha
  • Writing code as data and then executing it is impossible on a Harvard architecture. It's possible on [a *modified*-Harvard (i.e. a normal CPU with split L1D / L1I caches)](https://en.wikipedia.org/wiki/Modified_Harvard_architecture#Modified_Harvard_architecture). On ARM it takes an `isync` or something, because the split caches aren't coherent. (As opposed to x86, where the instruction cache is separate but unfortunately has to be coherent with data caches, costing transistors and power to snoop the icache and pipeline for store addresses to detect when to flush the pipeline.) – Peter Cordes Dec 31 '17 at 21:23