20

This is what I guess would happen:

  1. If two cores tried to access the same address in RAM, one would have to wait for the other to access the RAM. The second time that each core would try to access the same address, they may still have that RAM cached, so they could access their respective caches simultaneously.

  2. If two cores tried to access different addresses in the same RAM, one would have to wait for the other to access the RAM.

In other words, I would imagine that for RAM-intensive programming tasks, multiprocessing won't help much unless it involves reading from the same addresses in RAM multiple times per core.

So, can multiple CPUs / cores access the same RAM simultaneously, or is what I'm saying correct?

Lost Hobbit
  • 303
  • 1
  • 2
  • 8
  • I can't speak to the hardware level you're referring to, but I can say RAM-intensive tasks can be aided by multiprocessing by simply splitting up the usage; that is to say, if you have 500 MB of data in RAM you need processed, give out 250 MB of that data/RAM to one proc and 250 MB to another and you've effectively doubled your *possible* throughput (RAM bandwidth restrictions notwithstanding). Aside from whether or not the hardware can do it, having multiple processors accessing the same RAM address is a genuinely bad idea, and most multi-proc code painstakingly tries to avoid it. – Jimmy Hoffa Jan 15 '13 at 15:35
  • 1
    @JimmyHoffa But RAM bandwidth restrictions are precisely what he's talking about (as the assumption is that the task is memory-bound). –  Jan 15 '13 at 15:36
  • @Jimmy I don't see any problem with two processors trying to read from the same RAM address. I would only see a problem if they tried to write to it at the same time. – Lost Hobbit Jan 15 '13 at 15:43
  • @delnan Ah, I didn't get that at all from the question. I don't see how concurrent ram address access however has any effect on ram bandwidth. Say for instance you give 1-200 to one proc and 201-400 to another proc, if you fill both ranges with identical data then concurrent address access is subverted. Whether or not there is concurrent ram access at all disregarding addresses is however a question. I would presume yes but I know near nothing about hardware behaviour at this level. – Jimmy Hoffa Jan 15 '13 at 15:44
  • @JimmyHoffa Note that these days the memory controller is built into the CPU die itself and is shared between physical cores. Depending on the access pattern and RAM bandwidth, it could be the case that a single thread is bandwidth-bound. In that case adding another thread working on memory-intensive tasks won't help because there's a physical limitation. It would help if there were another physical processor present in the machine with its own memory controller, but even then the bandwidth of the link (AMD's HT and Intel's QPI) between CPU and RAM could become a bottleneck. – zxcdw Jan 15 '13 at 15:57
  • 1
    On a particular multicore processor I used to work with, cores didn't "know" anything beyond their local caches; stuff that needed to sync with the shared cache was done transparently to them in a specified number of processor cycles; a programmer willing to take this into account just manually added the needed number of `nop`s in their assembly code – gnat Jan 15 '13 at 15:58
  • 2
    Short answer: depends on your system bus architecture, cache coherence protocol, number of ports in your DDR controller and number of DDR controllers. Long answer is in your system's datasheet. – SK-logic Jan 15 '13 at 17:49
  • Although I find this an extremely interesting question, it is also a duplicate: http://stackoverflow.com/questions/12630151/can-multiple-cores-simultaneously-read-the-same-ram-location http://stackoverflow.com/questions/516940/can-multiple-cpus-simultaneously-write-to-the-same-ram-location (...and they are the first hits on google) – dagnelies Jan 15 '13 at 16:11

5 Answers

13

Summary: it's generally possible for a single core to saturate the memory bus if memory access is all it does.

If you establish the memory bandwidth of your machine, you should be able to see if a single-threaded process can really achieve this and, if not, how the effective bandwidth use scales with the number of processors.
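
One way to probe this empirically is to time a large in-memory copy in one process and then in two processes at once, and compare the aggregate rates. The sketch below is purely illustrative (the function name, buffer size, and worker count are made up for this example; allocator and process-startup overhead are included in the timings, so treat the numbers as rough):

```python
# Rough probe of memory bandwidth: time a large buffer copy in one
# process, then in two processes concurrently, and compare aggregate
# rates. Illustrative sketch only, not a rigorous benchmark.
import time
from multiprocessing import Pool

N = 50_000_000  # 50 MB per buffer; pick a size well beyond your caches

def copy_bandwidth(n=N):
    """Return an estimated copy bandwidth in bytes per second."""
    buf = bytearray(n)              # source buffer (zero-filled)
    t0 = time.perf_counter()
    _ = bytes(buf)                  # full copy: n bytes read + n written
    dt = time.perf_counter() - t0
    return 2 * n / dt               # bytes moved per second

if __name__ == "__main__":
    single = copy_bandwidth()
    with Pool(2) as pool:           # two workers copying concurrently
        t0 = time.perf_counter()
        pool.map(copy_bandwidth, [N, N])
        wall = time.perf_counter() - t0
    combined = 2 * 2 * N / wall     # total bytes moved / wall-clock time
    print(f"one process:   {single / 1e9:.2f} GB/s")
    print(f"two processes: {combined / 1e9:.2f} GB/s aggregate")
```

If the aggregate rate with two processes is barely higher than with one, a single core is already close to saturating the memory bus.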


The details will depend on the architecture you're using. Assuming something like modern SMP and SDRAM:

  1. If two cores tried to access the same address in RAM ...

    could go several ways:

    • they both want to read, simultaneously:

      • two cores on the same chip will probably share an intermediate cache at some level (2 or 3), so the read will only be done once. On a modern architecture, each core may be able to keep executing µ-ops from one or more pipelines until the cache line is ready
      • two cores on different chips may not share a cache, but still need to co-ordinate access to the bus: ideally, whichever chip didn't issue the read will simply snoop the response
    • if they both want to write:

      • two cores on the same chip will just be writing to the same cache, and that only needs to be flushed to RAM once. In fact, since memory will be read from and written to RAM per cache line, writes at distinct but sufficiently close addresses can be coalesced into a single write to RAM

      • two cores on different chips do have a conflict, and the cache line will need to be written back to RAM by chip1, fetched into chip2's cache, modified and then written back again (no idea whether the write/fetch can be coalesced by snooping)

  2. If two cores tried to access different addresses ...

    For a single access, the CAS latency means two operations can potentially be interleaved to take no longer (or perhaps only a little longer) than if the bus were idle.
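
The write-conflict case above is what "false sharing" benchmarks try to expose: two workers hammering counters that sit on the same cache line versus counters on separate lines. A minimal Python sketch of the setup follows (the slot indices and iteration count are arbitrary; in C the shared-line case is measurably slower, while in Python the interpreter overhead may well drown the effect, so this only illustrates the experiment, not the result):

```python
# Two workers increment counters that either share a 64-byte cache line
# (adjacent 8-byte slots) or sit ~256 bytes apart (separate lines).
# Each worker writes only its own slot, so no lock is needed.
import time
from multiprocessing import Process, Array

ITERS = 200_000

def bump(shared, idx):
    for _ in range(ITERS):
        shared[idx] += 1            # repeated write to one private slot

def run(idx_a, idx_b):
    """Time two concurrent workers bumping slots idx_a and idx_b."""
    shared = Array('q', 64, lock=False)   # 64 signed 64-bit slots
    procs = [Process(target=bump, args=(shared, i)) for i in (idx_a, idx_b)]
    t0 = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - t0

if __name__ == "__main__":
    same_line = run(0, 1)    # adjacent: likely the same cache line
    far_apart = run(0, 32)   # 256 bytes apart: separate lines
    print(f"adjacent slots: {same_line:.3f}s, distant slots: {far_apart:.3f}s")
```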

Useless
  • 12,380
  • 2
  • 34
  • 46
  • Another list item is when one core initiates a DMA transfer while another core pokes at the target area. – ott-- Apr 10 '15 at 20:38
7

So, can multiple CPUs / cores access the same RAM simultaneously, or is what I'm saying correct?

There are many different machine architectures out there, each with its own set of features. One category of multiprocessing machines is called MISD, for Multiple Instruction Single Data, and such machines are designed to provide the same data to several processors all at the same time. A related class of machines known as SIMD architectures (Single Instruction Multiple Data) are much more common and also provide access to the same memory at the same time, but there what is fetched simultaneously is the instruction rather than the data. In both MISD and SIMD, "access" means read access -- you can imagine the trouble you'd have if two units tried to write to the same location at the same time!
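
Simultaneous *reads* really are harmless: every reader sees the same value. A conceptual sketch, with threads standing in for processors (this illustrates the idea only, not any particular hardware):

```python
# Eight readers concurrently read the same never-written location.
# All of them observe the same value; trouble only starts with writes.
import threading

SHARED = 42                      # one "memory location", never written
results = []
lock = threading.Lock()          # protects only the results list

def reader():
    value = SHARED               # the simultaneous read access
    with lock:
        results.append(value)

threads = [threading.Thread(target=reader) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert all(v == 42 for v in results)   # every reader agrees
```

Turn the readers into writers and you need synchronization, which is exactly the trouble alluded to above.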

Caleb
  • 38,959
  • 8
  • 94
  • 152
5

Although most answers approach the question from the side of the software and/or hardware model, the cleanest way is to consider how the physical RAM chips work. (The cache is located between the processor and the memory; it simply uses the same address bus, and its operation is completely transparent to the processor.) RAM chips have a single address decoder, which receives the address of the memory cell arriving on the address bus (and similarly there is a data bus, either in or out).

Present-day memories are built on the "single processor approach": one processor is connected through one bus to one memory chip. This is the "von Neumann bottleneck", since every single instruction must reference the memory at least once. Because only one signal may exist on a wire (or set of wires, i.e. a bus) at a time, the RAM chip can receive only one cell address at a time. Unless you can guarantee that the two cores put the same address on the address bus, simultaneous bus access by two different bus drivers (such as cores) is physically impossible. (And if it is the same address, one of the accesses is redundant.)

The rest is so-called hardware acceleration. The coherence bus, the caches, SIMD access, etc. are just convenient facades in front of the physical RAM your question is about. These accelerators may hide the contention for exclusive use of the address bus, and the programming models have little to do with your question. Note also that simultaneous access would contradict the abstraction of a "private address space".

So, to your questions: simultaneous direct RAM access is not possible, with either the same or different addresses. The cache may hide this fact and allow apparently simultaneous access in some cases. That depends on the cache level and construction, as well as the spatial and temporal locality of your data. And yes, you are right: multi(core) processing without enhanced RAM access will not help much for RAM-intensive applications.
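
The spatial-locality point can be made concrete: walking data sequentially visits cache lines in order and benefits from prefetching, while a large stride touches a new line almost every step. A small sketch (in pure Python the hardware effect is muted, since lists store pointers rather than contiguous values; in C over a contiguous array the speed difference is pronounced -- here we only check that the two access orders compute the same result):

```python
# Sum the same data sequentially and with a large stride. The totals
# are identical; only the access pattern (and, on real hardware, the
# speed) differs.
def sum_sequential(data):
    return sum(data)

def sum_strided(data, stride=4096):
    total = 0
    for start in range(stride):          # one pass per residue class
        total += sum(data[start::stride])
    return total

data = list(range(100_000))
assert sum_sequential(data) == sum_strided(data)
```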

For better understanding, recall how Direct Memory Access works: both the CPU and the DMA device can put an address on the bus, so they have to exclude each other from using the bus simultaneously.

katang
  • 161
  • 1
  • 2
1

In practice you don't care about physical RAM; you care more about virtual memory and the address space of processes or threads (all threads of the same process share a common address space).

Of course, if you are coding a multi-core operating system kernel, you care a great deal about RAM and cache coherence.

Most multi-core processors have some form of cache coherence mechanism; the details are processor-specific. Since processors use CPU caches, they sometimes behave as if several cores of the processor were accessing the same memory location simultaneously.

Recent standards of industrial languages like C11 or C++11 have some (multi-thread aware) memory model.
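
Those memory models pin down what concurrent accesses to the same location are allowed to observe; the portable, model-independent discipline is to synchronize concurrent writes explicitly. A minimal sketch in Python (a stand-in here, since the standards named above are for C11/C++11):

```python
# Four threads increment one shared counter. With the lock, every
# increment is a complete read-modify-write, so the final count is
# exact and deterministic -- the property a memory model lets you
# reason about for unsynchronized code, and a lock guarantees outright.
import threading

ITERS = 100_000
counter = 0
lock = threading.Lock()

def add():
    global counter
    for _ in range(ITERS):
        with lock:               # one writer at a time
            counter += 1

threads = [threading.Thread(target=add) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 4 * ITERS      # deterministic with the lock
```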

Basile Starynkevitch
  • 32,434
  • 6
  • 84
  • 125
0

Modern CPUs are physically tied to their external memory devices in order to obtain maximum data-transfer bandwidth. This is due to the signal-integrity requirements (trace length, termination, clock skew, etc.) necessary to sustain the high transfer rates.

For example, on a multi-CPU motherboard, each CPU has a dedicated set of DIMM slots. Regardless of what software programmers might think, one CPU cannot simply access another CPU's external memory data. A system's memory-management software, whether at the level of the OS kernel, hypervisor, data-plane cores, or otherwise, handles inter-CPU memory data transfer.

  • 1
    this post is rather hard to read (wall of text). Would you mind [edit]ing it into a better shape? – gnat Mar 26 '15 at 12:55