I study computer engineering, and I have been reading Hennessy's book on computer organization, which describes how the microprocessor does pipelining and that the microprocessor has on-chip cache, as much as 8 MB in a modern microprocessor such as AMD's Opteron. Is that on-chip cache made from SRAM, and what are the physical characteristics of a modern register file and its two caches, instruction and data? Is it the same material in the L1, L2 and L3 caches?
2 Answers
An SRAM memory which is many kilobytes or megabytes in size will generally be constructed so as to minimize the surface area per bit. A typical design has memory cells arranged on a grid. Each memory cell has four transistors to hold the bit, and two "access enable" transistors to connect it to the true and complemented bit lines used for writing and reading. Typically, all of the "access enable" transistors on a row are switched together, and all the memory cells in a column are tied to the same two bit lines. The net effect is that at any given time, only memory cells on a single selected row may be read or written.
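To make the access pattern concrete, here is a minimal behavioral sketch in C of that row/column organization. It is purely illustrative (a real array is a transistor-level design, not software), and the names `sram_t`, `sram_select_row`, and so on are made up for the example; the point is simply that only the row whose word line is asserted can touch the bit lines.

```c
/* Behavioral sketch of a row-organized SRAM array (illustrative only).
 * One word line selects a whole row; every cell in a column shares the same
 * bit-line pair, so only cells on the selected row can be read or written. */
#include <stdint.h>
#include <stdio.h>

#define ROWS 256
#define COLS 64                 /* one bit-line pair per column */

typedef struct {
    uint8_t cell[ROWS][COLS];   /* each entry models one 6T cell (0 or 1) */
    int     selected_row;       /* word line currently asserted, -1 = none */
} sram_t;

/* Assert exactly one word line; all other rows stay disconnected from the
 * bit lines, which is why only one row is accessible at a time. */
static void sram_select_row(sram_t *s, int row) { s->selected_row = row; }

static int sram_read_bit(const sram_t *s, int col)
{
    return (s->selected_row >= 0) ? s->cell[s->selected_row][col] : -1;
}

static void sram_write_bit(sram_t *s, int col, int value)
{
    if (s->selected_row >= 0)
        s->cell[s->selected_row][col] = (uint8_t)(value & 1);
}

int main(void)
{
    sram_t s = { .selected_row = -1 };
    sram_select_row(&s, 42);        /* drive word line 42 */
    sram_write_bit(&s, 7, 1);       /* write through the bit-line pair */
    printf("row 42, col 7 = %d\n", sram_read_bit(&s, 7));
    return 0;
}
```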
Register files are generally tiny by comparison. The ARM register file is probably somewhere around 200 bits (128 for the main register file, plus parts of various shadow registers as well). Reducing the physical footprint of each memory bit is far less important than maximizing its speed. At minimum, it should be possible to read two arbitrarily-selected registers while writing a third. It should also be possible to simultaneously read the value of a register and write a new value to that register, with a guarantee that the write operation will not affect the value seen by the simultaneous read. A conventionally-laid-out SRAM cannot do those things. Instead, register files are often constructed using discrete flip-flops or latches with hard-wired enable or multiplexing logic. Chip designers will likely lay out register files in some sort of tiled arrangement, rather than laying out each bit's circuitry independently, but from a functional standpoint the bits of a register file are implemented with considerably more circuitry per bit than would be typical in an SRAM array.
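As a rough sketch of the port behavior described above, here is a C model of a 2-read/1-write register file where a read of a register being written in the same cycle sees the old value. It is an assumption-laden illustration, not a circuit: `regfile_cycle` and the 16-register size are invented for the example.

```c
/* Behavioral sketch of a 2-read/1-write register file.  Within one modeled
 * clock cycle the two reads return the value held *before* the concurrent
 * write, matching the guarantee described in the answer above. */
#include <stdint.h>
#include <stdio.h>

#define NREGS 16

typedef struct { uint32_t r[NREGS]; } regfile_t;

/* One cycle: sample both read ports first, then commit the write, so the
 * reads never observe the value being written in the same cycle. */
static void regfile_cycle(regfile_t *rf,
                          int ra, int rb, uint32_t *out_a, uint32_t *out_b,
                          int rw, uint32_t wdata, int wen)
{
    *out_a = rf->r[ra];          /* read port A */
    *out_b = rf->r[rb];          /* read port B */
    if (wen)
        rf->r[rw] = wdata;       /* write port, visible next cycle */
}

int main(void)
{
    regfile_t rf = { .r = { [3] = 100 } };
    uint32_t a, b;
    /* Read r3 on both ports while writing 999 to r3 in the same cycle:
     * the reads still see the old value, 100. */
    regfile_cycle(&rf, 3, 3, &a, &b, 3, 999, 1);
    printf("read: %u %u, after write: %u\n",
           (unsigned)a, (unsigned)b, (unsigned)rf.r[3]);
    return 0;
}
```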

+1 for mentioning the multiple read ports and separate write port of the [register file](http://en.wikipedia.org/wiki/register_file). – davidcary Jun 11 '13 at 20:36
They are going to implement it such that it uses the minimum number of transistors possible while still meeting performance targets. For microprocessors, this usually means that, yes, it will be some form of SRAM/latches. I'm being deliberately vague because there are so many different ways of implementing latches. The only reason you would not call SRAM and latches the same thing is that they are optimized for different performance goals, which subtly affects the transistor layout. But you could lay out the whole design using only latches for everything.
Also, most logic flows and designs in microprocessors use latch-based, double-clocked schemes for performance and timing reasons, so latches are abundant in the cell library.
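For readers unfamiliar with latch-based design, here is a small behavioral sketch in C, assuming "double-clocked" refers to the common two-phase, non-overlapping clocking style; the `latch_t`/`latch_eval` names are invented for the example and real latch design is process-specific.

```c
/* Behavioral sketch of a level-sensitive (transparent) latch and a two-phase
 * clocking pair.  While its clock phase is high the latch passes its input
 * through; while the phase is low it holds the last value. */
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t q; } latch_t;

static uint32_t latch_eval(latch_t *l, uint32_t d, int phase_high)
{
    if (phase_high)
        l->q = d;        /* transparent: output follows input */
    return l->q;         /* opaque: output holds the stored value */
}

int main(void)
{
    latch_t phi1 = {0}, phi2 = {0};
    /* Non-overlapping phases: data moves through the phi1 latch while phi1
     * is high, then through the phi2 latch while phi2 is high. */
    uint32_t mid = latch_eval(&phi1, 0xABCD, 1);   /* phi1 high, phi2 low */
    uint32_t out = latch_eval(&phi2, mid,    0);   /* phi2 low: holds 0   */
    out = latch_eval(&phi2, mid, 1);               /* phi2 high: passes data */
    printf("out = 0x%X\n", (unsigned)out);
    return 0;
}
```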
There are processes available that integrate DRAM with standard logic, but these tend not to be used in microprocessors due to cost and yield issues arising from the additional process steps.
Is it the same material? Yes, this is all on the same Si substrate; the question should really be whether it is the same cell library. Also yes.

One thing I've wondered about: in processors that allow the contents of two data buses to be written to different (e.g. 32-bit) registers simultaneously, is it better to have one group of 32 memory cells for each register, and have that group of memory cells be capable of grabbing data from either bus, or is it better to have two groups of 32 memory cells for each register, one of which sits on each bus, and then have an extra bit which says which group was written last? I would think the latter approach might allow faster operation... – supercat Jun 11 '13 at 18:53
...since the value to be clocked into each register would appear at that register's inputs as soon as it was on the bus, without having to go through a mux delay first. I'm not aware of such designs being used in practice, but on the software side such approaches can be pretty common. – supercat Jun 11 '13 at 18:56
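A rough C sketch of the scheme speculated about in this comment (hypothetical, not a known production design): each register gets one storage group per write bus plus a flag recording which group was written most recently, so a read selects the newer copy instead of muxing the buses in front of a single group of cells. The `dual_bus_rf_t` structure and function names are invented for the illustration.

```c
/* Sketch of "two cell groups per register plus a last-written flag". */
#include <stdint.h>
#include <stdio.h>

#define NREGS 16

typedef struct {
    uint32_t copy[2][NREGS];   /* one group of cells per write bus */
    uint8_t  newest[NREGS];    /* 0 or 1: which copy was written last */
} dual_bus_rf_t;

static void write_from_bus(dual_bus_rf_t *rf, int bus, int reg, uint32_t v)
{
    rf->copy[bus][reg] = v;
    rf->newest[reg]    = (uint8_t)bus;
}

static uint32_t read_reg(const dual_bus_rf_t *rf, int reg)
{
    return rf->copy[rf->newest[reg]][reg];   /* pick the newer copy */
}

int main(void)
{
    dual_bus_rf_t rf = {0};
    write_from_bus(&rf, 0, 5, 111);   /* bus 0 writes r5 */
    write_from_bus(&rf, 1, 6, 222);   /* bus 1 writes r6 in the same cycle */
    printf("r5=%u r6=%u\n",
           (unsigned)read_reg(&rf, 5), (unsigned)read_reg(&rf, 6));
    return 0;
}
```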
@supercat Soft cores implemented in FPGAs with single-port block RAMs can provide two read ports by replicating the register file (the write is done in a separate phase and applied to both copies). A similar technique is used in some superscalar processors (e.g., the Alpha 21264) to reduce port count (and register file size and routing complexity); two register files with four read ports and four write ports each are smaller than one register file with eight read ports and four write ports. – Paul A. Clayton Jun 11 '13 at 20:12
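A minimal C sketch of the replication trick mentioned in that comment, under the assumption that each physical copy serves half the read ports while every write is broadcast to both copies; the names and the 2-copy/4-port split are illustrative only.

```c
/* Sketch of read-port replication: two identical register-file copies,
 * each wired to half of the read ports, all writes applied to both. */
#include <stdint.h>
#include <stdio.h>

#define NREGS 16

typedef struct { uint32_t bank[2][NREGS]; } replicated_rf_t;

/* A write is broadcast to both copies, keeping their contents identical. */
static void rrf_write(replicated_rf_t *rf, int reg, uint32_t v)
{
    rf->bank[0][reg] = v;
    rf->bank[1][reg] = v;
}

/* Read ports 0..1 are served by copy 0, ports 2..3 by copy 1, so each
 * physical copy only needs half the read ports. */
static uint32_t rrf_read(const replicated_rf_t *rf, int port, int reg)
{
    return rf->bank[port < 2 ? 0 : 1][reg];
}

int main(void)
{
    replicated_rf_t rf = {0};
    rrf_write(&rf, 4, 42);
    printf("port0=%u port3=%u\n",
           (unsigned)rrf_read(&rf, 0, 4), (unsigned)rrf_read(&rf, 3, 4));
    return 0;
}
```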
@PaulA.Clayton: Adding read ports by duplicating content is pretty easy. Further, even when using standard SRAM cells, doubling up read capability only requires having two select wires per row; each cell should have one of its select transistors controlled by each wire. One select wire would only be able to write zeroes and one would only be able to write ones, so for most practical purposes only one write at a time could be performed, and it would have to be separate from any reads. What I was curious about was how memory designers add write ports. – supercat Jun 11 '13 at 21:04
It depends on the size of the buffers, the number of cells attached to the bus, and the signalling complexity of the bus. The major effort lately in SoCs has been on bus signalling, like ARM's AMBA. However, your question is more localized. Local muxing does not incur much of a delay, since at the transistor level that would be a transmission gate. These structures will be very tightly hand-crafted and matched. However, it would be hard to talk in generalities because of process differences. For example, two source/drain connections from a transmission gate present far less loading than one gate load from a logic input (two transistors). – placeholder Jun 11 '13 at 23:35