12

I was reading about the Arduino and the AVR architecture, and I got stuck on how pipeline stalls (bubbles) are solved by the introduction of the Harvard architecture in the AVR. All the Harvard architecture does is provide separate storage locations for data memory and program memory, which makes it possible to load a program without operator intervention. But how does that help solve the problem above?

Ayush
  • 127
  • 1
  • 8
  • 2
    This is a bit of a guess so I won't post it as an answer, but I suspect the Harvard architecture helps because there is no possibility of a previous instruction in the pipeline self-modifying the code. – PeterJ Feb 02 '13 at 10:56
  • 1
    I'm not sure I follow you. Do you mean that once an instruction has been "fetched" it cannot be modified or thrown back? – Ayush Feb 02 '13 at 10:58
  • 1
    Yes, that's right. In a non-Harvard machine, because the code can change itself, there is the possibility that one instruction modifies the instruction that follows it. But wait a while; someone will probably post a more definitive and clearer answer. – PeterJ Feb 02 '13 at 11:00

4 Answers

9

The Harvard architecture, which incidentally was in use long before the AVR was invented, does indeed have separate address spaces for program memory and data memory. What this brings to the party is the ability to design the circuit so that a separate bus and control circuitry handle the information flow from the program memory and the information flow to and from the data memory. With separate buses, program fetching and execution can continue without disruption from an occasional data transfer. For example, in the simplest version of the architecture, the program-fetch unit can be busy fetching the next instruction in the program sequence in parallel with a data-transfer operation that was part of the previous instruction.

At this simplest level the Harvard architecture has a limitation in that it is generally not possible to put program code into the data memory and have it be executed from there.

There are many variations and complexities that can be added on top of this simplest form of the architecture. One common addition is an instruction cache on the program bus, which gives the instruction-execution unit faster access to the next program step without having to go to slower memory every time one is needed.
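To illustrate the parallelism described above, here is a toy cycle-count model (entirely made up for illustration, not a description of any real AVR): with one shared bus, every data access steals a bus cycle from instruction fetching; with separate buses, the data access overlaps the next fetch.

```python
# Toy model: each instruction needs one code fetch; some instructions
# also need one data access. The two functions count total bus cycles
# for a single shared bus vs. separate code/data buses.

def cycles_shared_bus(instructions):
    """One bus: every code fetch and every data access takes its own cycle."""
    return sum(1 + (1 if needs_data else 0) for needs_data in instructions)

def cycles_split_bus(instructions):
    """Two buses: each data access overlaps the following code fetch."""
    return len(instructions)  # one cycle per instruction; data runs in parallel

# A program where every third instruction touches data memory.
program = [i % 3 == 0 for i in range(30)]

print(cycles_shared_bus(program))  # 40
print(cycles_split_bus(program))   # 30
```

The split-bus machine finishes in 30 cycles instead of 40 because the ten data transfers no longer compete with instruction fetches.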

Michael Karas
  • 56,889
  • 3
  • 70
  • 138
  • Thanks a lot, that really helped me get it. Just one more thing: can't we have different buses to the same memory and have them work at the same time? – Ayush Feb 02 '13 at 12:04
  • @Ayush - If you had two buses going to the same memory space, then two memory transaction requests arriving at the memory at the same time would still have to contend for access: one has to wait for the other to complete. That said, some designers have "solved" that problem by designing the memory to operate at twice the normal speed and then letting one bus access the memory alternately with the other. I.e., things are designed such that the first bus is always synced to the odd access slots of the memory and the (cont next comment) – Michael Karas Feb 02 '13 at 17:01
  • (cont from prev comment) second bus is synced to the even access slots of the memory; then both buses can operate at full speed without memory-access contention. – Michael Karas Feb 02 '13 at 17:03
  • @MichaelKaras: One could do that. On the other hand, in most cases, the primary limiting factor for overall system speed is memory speed. If one had a memory that could run twice as fast as would be needed just for data or just for code, splitting the memory system into one each for data and code would allow things to go twice as fast. – supercat Feb 02 '13 at 17:54
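The time-sliced scheme from the comments above can be sketched as follows (a hypothetical model, not any specific device): the memory runs at twice the bus speed, and odd memory cycles belong to the code bus while even cycles belong to the data bus, so neither bus ever waits.

```python
# Sketch of "double-speed memory, alternating slots": each bus owns
# every other memory access slot, so two buses share one memory with
# no contention -- at the cost of a memory running twice as fast.

def slot_owner(memory_cycle):
    """Which bus gets this memory access slot."""
    return "code_bus" if memory_cycle % 2 == 1 else "data_bus"

for cycle in range(4):
    print(cycle, slot_owner(cycle))
# 0 data_bus
# 1 code_bus
# 2 data_bus
# 3 code_bus
```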
5

Some notes in addition to Michael's answer:

1) the Harvard architecture does not require that there are two separate spaces for data and code, just that they are (mostly) fetched over two different busses.

2) the problem that is solved by the Harvard architecture is bus contention: in a system where the code memory can only just provide instructions quickly enough to keep the CPU running at full speed, the additional burden of data fetches/stores would slow the CPU down. A Harvard architecture solves that problem for a memory that is (a bit) too slow for the speed of the CPU.

Note that caching is another way to solve this problem. Often Harvarding and caching are used in interesting combinations.

Harvard uses two busses. There is no inherent reason to stick to two; in very special cases more than two are used, mainly in DSPs (digital signal processors).

Memory banking (in the sense of distributing memory accesses over different sets of chips) can be seen as a sort of Harvarding inside the memory system itself, based not on the data/code distinction but on certain bits of the address.
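The banking idea in the last note can be sketched like this (a hypothetical two-bank layout, not a specific system): one address bit selects the bank, so consecutive words land in alternating banks and sequential accesses can overlap.

```python
# Sketch of banking on an address bit: the low bit of the word address
# selects the bank, so while bank 0 is still completing one access,
# bank 1 can already start the next.

NUM_BANKS = 2

def bank_of(word_address):
    """Bank selected by the low address bit."""
    return word_address % NUM_BANKS

accesses = [0, 1, 2, 3, 4, 5]
print([bank_of(a) for a in accesses])  # [0, 1, 0, 1, 0, 1]
```

Sequential fetches alternate banks, which is the "Harvarding inside the memory system" mentioned above: parallelism driven by an address bit rather than by the code/data distinction.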

Wouter van Ooijen
  • 48,407
  • 1
  • 63
  • 136
  • 5
    Actually a "pure" Harvard architecture *does* require separate memories (address spaces) for instructions and data. However, since this prevents a computer from booting itself, many machines implement a "modified" Harvard architecture, in which writes to instruction memory are allowed. – Dave Tweed Feb 02 '13 at 12:47
  • Memory banking does not help unless there are two (or more) busses between the CPU and each of the memory banks. – Dave Tweed Feb 02 '13 at 12:49
  • @Dave 2: banking does help in certain circumstances, for instance if the problem is the memory timing AND the bus to the memory is non-blocking (multiple transactions can be outstanding). – Wouter van Ooijen Feb 02 '13 at 13:17
  • @Dave1: can you give a reference? – Wouter van Ooijen Feb 02 '13 at 13:17
  • [Wikipedia](http://en.wikipedia.org/wiki/Harvard_architecture), also [Princeton University](https://www.princeton.edu/~achaney/tmve/wiki100k/docs/Harvard_architecture.html) (which is really just a copy of the Wikipedia page). Also, most single-chip microcontrollers are Harvard architecture, and many datasheets actually discuss how providing a mechanism to self-write the code flash memory creates a modified Harvard architecture. – Dave Tweed Feb 02 '13 at 14:16
  • 1
    Memory banking with multiple outstanding transactions may improve the bandwidth for a high-latency memory, but does not hide the fact that instruction and data accesses must contend for the same bus. Calling it "a sort of Harvard" only muddies the waters, rather than making anything clearer. The Harvard architecture exploits a certain kind of parallelism, but you can't start calling all forms of parallelism "Harvarding". – Dave Tweed Feb 02 '13 at 14:23
  • @DaveTweed: If a machine used a four-phase clock where phase 1 output an instruction address which would be latched by external circuitry, phase 2 latched a data address, phase 3 read an instruction, and phase 4 read or wrote a byte at the indicated address, I would not consider the sharing of the same bus pins as precluding the system from being called a "Harvard Architecture" if the multiplexed bus was expanded to distinct address/data buses, and there was a rigid subdivision that phase 1 is always code address, phase 2 is always data address, etc. – supercat Jun 15 '15 at 16:20
  • @supercat: I'm really not sure what the point is that you're making. Harvard vs. von Neumann architecture is not about the bus timing, it's about having separate memory spaces for I and D. Even the lowly 8051 shares the same pins for both kinds of access, but it has an additional pin that indicates which memory space is being accessed on any given cycle. – Dave Tweed Jun 15 '15 at 21:53
  • @DaveTweed: The term "Harvard Architecture" has hardware and software aspects; for the hardware aspect, I would say that a defining characteristic would be that data fetches can be performed without delaying code fetches. On the 8x31, the MOVX instruction will cause the bus which would normally be used for code fetches to be tied up processing the data fetch, so with regard to MOVX I would not consider the 8x51 to be a Harvard architecture. The primary data space of the 8x51, however, is an internal RAM which can be accessed independently of the code bus. – supercat Jun 15 '15 at 22:38
2

One aspect that has not been discussed is that for small microcontrollers, typically with only a 16-bit address bus, a Harvard architecture effectively doubles (or triples) the address space. You can have 64K of code, 64K of RAM, and 64K of memory-mapped I/O (if the system uses memory-mapped I/O instead of port numbers; the latter already separates I/O addressing from the code and RAM spaces).

Otherwise you would have to cram the code, RAM, and optionally the I/O addresses all into the same 64K address space.
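A toy model of this address-space doubling (illustrative sizes only): the same 16-bit numeric address names different bytes depending on which space the access targets, so a 16-bit address bus reaches more than 64K in total.

```python
# Two independent 64K spaces behind one 16-bit address.

flash = bytearray(64 * 1024)   # program space
sram  = bytearray(64 * 1024)   # data space

def read_program(addr):
    """What an instruction fetch at this address would see."""
    return flash[addr & 0xFFFF]

def read_data(addr):
    """What a data load at the same numeric address would see."""
    return sram[addr & 0xFFFF]

flash[0x0100] = 0xAA
sram[0x0100]  = 0x55
print(hex(read_program(0x0100)), hex(read_data(0x0100)))  # 0xaa 0x55
```

Address 0x0100 refers to two different bytes; which one you get depends on whether the access goes over the program bus or the data bus.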

tcrosley
  • 47,708
  • 5
  • 97
  • 161
2

A pure Harvard architecture will generally allow a computer with a given level of complexity to run faster than a von Neumann architecture would, provided that no resources need to be shared between the code and data memories. If pinout limitations or other factors compel the use of one bus to access both memory spaces, such advantages are apt to be largely nullified.

A "pure" Harvard architecture will be limited to running code which is put in memory by some mechanism other than the processor that will run the code. This limits the utility of such architectures for devices whose purpose isn't set by the factory (or someone with specialized programming equipment). Two approaches may be used to alleviate this issue:

Some systems have separate code and data areas, but provide special hardware which can be asked to briefly take over the code bus, perform some operation, and return control to the CPU once such operation is complete. Some such systems require a fairly elaborate protocol to carry out such operations, some have special instructions to perform such a task, and a few even watch for certain "data memory" addresses and trigger the takeover/release when an attempt is made to access them. A key aspect of such systems is that there are explicitly-defined areas of memory for "code" and "data"; even if it's possible for the CPU to read and write "code" space, it is still recognized as being semantically different from data space.

An alternative approach, used in some higher-end systems, is to have a controller with two memory buses, one for code and one for data, both of which connect to a memory arbitration unit. That unit is in turn connected to various memory subsystems, each over its own memory bus. A code access to one memory subsystem may be processed simultaneously with a data access to another; only if code and data try to access the same subsystem simultaneously will either one have to wait.

On systems which use this approach, non-performance-critical parts of a program may simply ignore the boundaries between memory subsystems. If the code and data happen to reside in the same memory subsystem, things won't run as fast as if they were in separate subsystems, but for many parts of a typical program that won't matter. In a typical system, there will be a small part of code where performance really does matter, and it will only operate on a small portion of the data held by the system. If one had a system with 16K of RAM that was divided into two 8K partitions, one could use linker instructions to ensure that the performance-critical code was located near the start of the overall memory space, and the performance-critical data was near the end. If overall code size grows to e.g. 9K, code within the last 1K would run slower than code placed elsewhere, but that code wouldn't be performance critical. Likewise, if code were e.g. only 6K, but data grew to 9K, access to the lowest 1K of data would be slow, but if the performance-critical data were located elsewhere, that wouldn't pose a problem.

Note that while performance would be optimal if code were under 8K and data were under 8K, the aforementioned memory-system design would not impose any strict partition between code and data space. If a program only needs 1K of data, code could grow up to 15K. If it only needs 2K of code, data could grow to 14K. Much more versatile than having an 8K area just for code and an 8K area just for data.
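The two-subsystem layout described above can be sketched as follows (a made-up model of the 16K example, not any particular controller): which 8K bank an address falls in determines whether a code fetch and a data access contend.

```python
# 16K of RAM as two 8K subsystems; code grows from the bottom,
# data from the top. A code fetch and a data access only contend
# when they hit the same subsystem.

BANK_SIZE = 8 * 1024
TOTAL = 16 * 1024

def bank_of(addr):
    return addr // BANK_SIZE  # 0 for the low 8K, 1 for the high 8K

def contended(code_addr, data_addr):
    """True when a code fetch and a data access hit the same subsystem."""
    return bank_of(code_addr) == bank_of(data_addr)

# 6K of code from the bottom, 9K of data from the top: the data spills
# 1K into bank 0, and only accesses to that spilled 1K contend with code.
data_bottom = TOTAL - 9 * 1024             # 7K, i.e. inside bank 0
print(contended(0x0400, TOTAL - 0x0400))   # code bank 0, data bank 1: False
print(contended(0x0400, data_bottom))      # data spilled into bank 0: True
```

This mirrors the flexibility described above: no strict partition is imposed, but placement decides which accesses pay the contention penalty.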

supercat
  • 45,939
  • 2
  • 84
  • 143