Which is faster: executing a program from ROM or from RAM?

Question

Most of us with an electronics background know that SRAM is faster than DRAM. But when it comes to comparing RAM with ROM, I am unsure.

My question concerns microcontrollers: if code executes directly from RAM or ROM, which performs better? 1) execution from RAM, 2) execution from ROM, or 3) both perform equally.

I am also taking into account that ROM is designed for higher read speeds, whereas RAM trades some read speed for write capability.

Reading the datasheet (thoroughly) is the best way. Sometimes it's faster to run a program from RAM than from flash memory. Some micros can't run programs from RAM at all, and others may run at the same speed. – Spehro Pefhany, Dec 01 '15 at 18:39
Many current ARM Cortex-M parts are prime examples of those which *can* execute from SRAM but are *slower* when doing so, as the dedicated instruction path to flash can't be used. Conversely, data access to flash can be slower than to RAM. – Chris Stratton, Dec 01 '15 at 21:40
I can't add a comment yet, just trying to be helpful. It depends on whether the ROM is in fact faster than the RAM you're using. Are they of equal speeds? – OzzieSpin, Dec 01 '15 at 19:33
But with slightly older ARM7 chips (my experience was with the LPC2106 and LPC2148), execution from RAM is often FASTER than from flash. Which, together with Chris' comment, proves that the only thing we can say is "it depends". – Wouter van Ooijen, Dec 01 '15 at 22:32
If you already have the hardware, the easiest way is to simply run the two scenarios and compare. If not, datasheets are your best bet. – Luaan, Dec 02 '15 at 08:43

score 17 · Answer 1 · answered Dec 01 '15 at 19:02

The datasheet should tell you how long each instruction takes, and what differences there are, if any, between executing from RAM or ROM.

For microcontrollers that offer the option of executing from RAM, RAM is probably faster, since speed is likely the main reason to spend RAM space on code. There may also be some fetch overlap issues. In some cases it might be faster to execute from ROM because it is a separate memory, so RAM accesses can go on concurrently.

Again, the only way to know for any particular micro is to READ THE DATASHEET.

answered Dec 01 '15 at 19:02

Olin Lathrop

310,974
36
428
915

It's even faster to execute from registers. – Joshua Dec 01 '15 at 23:01
@Joshua Do you have any examples of what you might do with a program executed from registers? It seems wickedly clever, but limited to a rather small program size. I've heard of 64kB graphics demos, but a 16 register demo? =) – Cort Ammon Dec 02 '15 at 00:13
@CortAmmon: I have one sitting on my desk with 512 registers, 400 of which contain program code. The RAM is 3 times slower, and the ROM is so slow it gets copied to RAM on startup (which takes hundreds of milliseconds). I have an SD card initializer/reader that fits in 300 registers, with no hardware support beyond GPIO pins. The writer takes another 100 or so registers, so the whole thing doesn't fit in registers (that would not leave enough to do anything interesting), but I don't need the initializer anymore, so I overwrite it. – Joshua Dec 02 '15 at 04:30

score 9 · Answer 2 · answered Dec 01 '15 at 19:24

It depends entirely on the memory and CPU architecture. As a rule of thumb, SRAM is faster than flash, particularly on higher-speed MCUs (>100 MHz). SRAM bit cells produce a (more or less) logic-level output, while flash memory has to go through a slower current sensing process.

How much faster (if at all) again depends on the architecture: the word size of the memories, the number of wait states on each, the presence of caching, the size of the CPU instructions, etc. If you're running at a low enough frequency, you could have zero wait states on both flash and RAM, so they might run at the same speed.
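
As a rough illustration of why frequency matters, here is a minimal sketch that estimates wait states from a clock frequency and a memory access time. The function name and the 40 ns flash / 10 ns SRAM access times are made-up values for illustration, not figures from any datasheet.

```python
def wait_states(cpu_hz: int, access_time_ns: int) -> int:
    """Extra clock cycles needed so that one read lasts at least access_time_ns.

    Assumes an ideal memory that needs exactly access_time_ns per read and a
    CPU that can otherwise complete one read per clock cycle.
    """
    # Cycles needed to cover the access time, rounded up (integer ceil division).
    cycles_needed = -(-(access_time_ns * cpu_hz) // 1_000_000_000)
    # The first cycle is the normal access; anything beyond it is a wait state.
    return max(cycles_needed - 1, 0)


# Hypothetical access times, for illustration only.
FLASH_NS = 40
SRAM_NS = 10

for hz in (25_000_000, 100_000_000):
    print(hz, "Hz: flash", wait_states(hz, FLASH_NS),
          "wait states, SRAM", wait_states(hz, SRAM_NS))
```

With these assumed numbers, both memories need zero wait states at 25 MHz, while at 100 MHz the flash needs three and the SRAM still needs none.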

The code also matters. If your code is strictly linear (no branching), the flash could prefetch instructions fast enough to keep the CPU saturated even at higher frequencies. As Olin said, a Harvard architecture CPU with separate program and data read paths could perform differently when code and data are in different memories.
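
The effect of branching can be sketched with a toy model of a wide flash line plus a sequential prefetcher. Everything here is an assumption made for illustration (one instruction per cycle, four instructions per flash line, three wait states, a prefetcher that only ever fetches the next sequential line); real flash accelerators differ in the details.

```python
def fetch_cycles(trace, words_per_line=4, wait_states=3):
    """Count cycles to execute a trace of instruction addresses.

    Toy model: entering a new flash line by a jump stalls for `wait_states`
    cycles. Entering the next sequential line stalls only for whatever part
    of the line read the prefetcher has not finished yet.
    """
    line_read = 1 + wait_states   # cycles for one full flash line read
    total = 0
    current_line = None
    spent = 0                     # cycles executed inside the current line
    for addr in trace:
        line = addr // words_per_line
        if line != current_line:
            if current_line is not None and line == current_line + 1:
                # Prefetch of this line started when the previous one arrived.
                stall = max(0, line_read - spent)
            else:
                # Jump target (or first fetch): nothing was prefetched.
                stall = wait_states
            total += stall
            current_line = line
            spent = 0
        total += 1
        spent += 1
    return total


linear = list(range(64))      # 64 instructions, no branches
branchy = [0, 100] * 32       # 64 instructions, a far jump every time
print(fetch_cycles(linear))   # 67
print(fetch_cycles(branchy))  # 256
```

In this model the linear code pays the flash latency once and then runs at nearly one instruction per cycle, while the code that jumps on every instruction pays the full latency each time.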

Metal ROMs (and other nonvolatile memories such as FRAM) have their own characteristics, and may or may not be as fast as SRAM. The ability to write doesn't necessarily make a difference; it's more about the characteristics of the bit cell output and sensing circuits.

The datasheet will give you a rough idea of the speed difference, but the only way to know for sure is to profile your code.

score 1 · Answer 3 · answered Dec 01 '15 at 21:24

"Running a program" requires a CPU with a synchronous clock. Slow memory can be accommodated either by running the entire system at a slow enough clock, or by inserting wait states (extra do-nothing clock cycles between the fetch and decode phases) that are active only for certain address ranges (see the ancient 8085 for an example). The CPU instruction fetch doesn't know or care exactly when the data settles to its final value, as long as it does not change during the setup/hold interval.
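
Wait states that apply only to certain address ranges can be pictured as a lookup on the address being fetched. The memory map below is entirely hypothetical (region boundaries, names and wait-state counts are made up) and only shows the idea of address-dependent fetch time.

```python
# Hypothetical memory map: (start, end_exclusive, wait_states, name)
MEMORY_MAP = [
    (0x0000, 0x8000, 2, "slow ROM"),
    (0x8000, 0x10000, 0, "fast RAM"),
]


def bus_cycles(addr: int, base_cycles: int = 1) -> int:
    """Clock cycles for one fetch at `addr`: the base cycle plus the wait
    states of whichever region the address falls into."""
    for start, end, ws, _name in MEMORY_MAP:
        if start <= addr < end:
            return base_cycles + ws
    raise ValueError(f"address {addr:#x} is not mapped")


print(bus_cycles(0x0100))  # 3: fetch from the slow region
print(bus_cycles(0x9000))  # 1: fetch from the zero-wait-state region
```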

A microcontroller usually has all of its memory on-chip, so unless stated otherwise I'd assume the memory system is all zero-wait-state (but read the datasheet to confirm). Typical microcontrollers are meant to be simpler, single-chip solutions compared to a desktop, so wait states are unlikely, and so are mismatched on-chip memory speeds.

Faster memory generally costs a premium (higher voltage, lower capacitance, more demand). An 80x86 processor has fast SRAM in its L2 cache and even faster SRAM in its L1 cache, plus lots of slower off-chip DRAM attached to a memory controller. This kind of system is a lot more complicated than a microcontroller, and is beyond the scope of the question. (But of great interest to a computer engineer!)

Actually, a perfectly matched design is not possible without constraints. A processor either segregates instruction and data memory, underutilizes memory bandwidth during instructions that do not access data memory, suffers wait states, or uses multi-port memory. – Chris Stratton, Dec 01 '15 at 21:44
Wait states are pretty common in higher-performance microcontrollers. Flash is slow. – Adam Haun, Dec 01 '15 at 22:34
@AdamHaun: On the flip side, many internal flash arrays can read many words at once; if code jumps to some arbitrary location in flash, it may take a couple of cycles to fetch the first instruction, but once that is fetched the next few instructions may be available without further delay. In many cases, accessing something near the end of the buffer will prepare the system to load the next set of words. – supercat, Jun 02 '16 at 17:42