
I am looking for more info on AI hardware architectures, but I am a bit confused. Here are my questions:

  • Does it all come down to MAC (Multiply-and-Accumulate) units?
  • Do MACs usually integrate into the ALU like this: MAC

or is the ALU in this case itself a MAC, as shown here:

ALU

In case anyone is interested, I got this from an MIT ISCA 2017 presentation (page 124 and onward).

Thanks!

winny

2 Answers


You seem to be mixing different presentations and different domains.

"MAC" in this context means Multiply-Accumulate, also known as Fused Multiply and Add (FMA). It's indeed the core operation for highly-connected neural networks including fully-connected networks and CNN's. The other main operation is the non-linear operation done on the result of all those MAC's. But since that's often a ReLu these days, that operation is practically free. (essentially one gate delay)

ALU (Arithmetic Logic Unit) is an outdated CPU concept. Modern CPU architectures don't have them anymore; the logical equivalent would be the FP Execution Units. They're definitely not connected to memory as in figure 1; they operate on registers. Memory wouldn't be able to keep up, eliminating the benefit of a dedicated MAC operation.

The second picture is a high-level overview of a new non-CPU architecture. That makes sense: you were discussing AI hardware, not general-purpose hardware such as CPUs.

MSalters
  • I was indeed confused because the slide above mentioned ALU and I thought that it is integrated. Thank you for solving my confusion! – Aleksandar Kostovic Aug 29 '18 at 13:20
  • Why don't we have ALUs in modern CPU architectures anymore? – Kindred Dec 09 '18 at 00:47
  • We do. Especially in specialized processors like the stream processors in GPGPUs, there are specific units for integer operations. – Marcus Müller Dec 09 '18 at 00:51
  • @ptr_user7813604: The ALU concept dates back to the 1970s. It closely ties a single register to the transistors that perform the math operations. In fact, that register is often named the "A" register, for instance in Intel designs. We now have a 64-bit RAX register whose lineage dates back to the A register in the 4004. However, the RAX register is no longer special; every x64 chip can perform math equally well on RBX, or R15 for that matter. – MSalters Dec 10 '18 at 09:14
  • @MSalters: Thank you sir, thanks for your kindness. I will do some searching to understand your words. – Kindred Dec 10 '18 at 15:24

Symbolic AI needs lots of pointer-chasing.

Deep-learning neural systems need lots of multiply-accumulate math, plus pointer chasing to access the various synapse strengths and their connections.

analogsystemsrf
  • Actually, deep nets don't need pointer chasing. If you implemented it like that, you did it wrong. A good programmer will organize all connections in the order that they're needed for the calculations. That means the only pointer operation needed is a simple increment by a fixed size, typically 4 bytes. This sequential access is heavily optimized in hardware, to the point that many CPUs will auto-detect the pattern and prefetch the connection weights before they are needed. – MSalters Dec 10 '18 at 16:03
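The layout described in that last comment can be sketched as follows. This is an illustrative example of mine, not from the presentation: the weights sit in one contiguous array, in exactly the order the MACs consume them, so the only "pointer chasing" is a fixed-stride advance of `sizeof(float)` (4 bytes) that a hardware prefetcher detects trivially.

```c
/* Hypothetical sketch: weights stored contiguously, in consumption order.
   The weight pointer only ever advances by a fixed 4-byte stride, so
   sequential prefetch keeps the MAC units fed. */
static double dot_sequential(const float *w, const float *x, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc += (double)w[i] * x[i];  /* w effectively advances by sizeof(float) == 4 */
    return acc;
}
```

Compare this with a linked-list-of-synapses representation, where every weight access would be a dependent load that defeats the prefetcher.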