
Preamble

Due to the super-secret nature of commercial GPUs, I haven't found anything interesting yet.
The best thing I can do is ask someone who has more experience than me (everyone). It's quite a big question; I hope it fits the rules.
Also, GPUs aren't directly the subject of this site, but neither are many other topics, so I'm giving it a try.
Because the question involves parallel processing, and because older GPUs (from retro consoles and elsewhere) use architectures very different from modern ones, this question is about recent GPUs (the ones supporting GPGPU). To simplify the question, think of the NVIDIA and AMD ones.

Question

My idea of a GPU is quite fuzzy: I know it's a particular piece of hardware whose primary components are tons of cores working in parallel, used to process images or signals in general, plus a fairly large amount of RAM used to buffer things and make processing faster.
Too much "Wikipedia" for me.

These are some of my hypotheses and the information I've gathered (the HELP section says to share your research); feel free to refute them:

Point 1: What happens when data sent from the CPU enters the GPU?

  1. You can't write GPU programs in the GPU's own assembly: a program written in, for example, CUDA is compiled to PTX, an intermediate language that is then translated (compiled? interpreted?) internally by the driver/GPU. If there is an internal language, a sort of assembly (maybe a microcode-like architecture?), it is super-secret (see the sketch after this list).
  2. There are no official open-source drivers. Do you want to find something by reversing the drivers? Good luck. But there is good news: some GPUs have quite good documentation.
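As a concrete illustration of point 1 (just a sketch using the NVIDIA toolchain; the kernel and file names are made up): you write CUDA C++, nvcc lowers it to PTX, and the driver (or ptxas) then translates the PTX into the card's actual machine code (SASS), which you can only look at through vendor tools such as cuobjdump.

    // scale.cu -- a made-up, trivial kernel, just to have something to compile.
    // Each thread handles one array element: the basic data-parallel pattern.
    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }

    // Dump the documented intermediate language (PTX):
    //   nvcc --ptx scale.cu -o scale.ptx
    //
    // Dump the actual machine code (SASS) generated for one specific
    // architecture -- the part that is only partially documented:
    //   nvcc -arch=sm_35 -cubin scale.cu -o scale.cubin
    //   cuobjdump --dump-sass scale.cubin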

Point 2: Pipeline and shaders

What is a shader? Difficult to answer. But someone tried. Anyway, a shader should be a piece of software that is executed by the GPU to render all those beautiful 3D scenes. There are several types of shaders, each one executed at a particular step of the GPU pipeline.
But what is that pipeline? It seems to be a sequence of steps that goes from receiving input from the CPU to producing the rendered output. DirectX and OpenGL should give an abstraction over that implementation, but does each model/family have a different implementation? Is that super-secret, or is it freely shared with us, the poor community? (Or maybe only under NDA?)
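To make the "a shader is a small program run once per element" idea concrete, here is a rough sketch written as a CUDA kernel (the Fragment struct and the shading itself are invented; real fragment shaders are written in GLSL or HLSL, but they run on the same execution units in the same one-thread-per-fragment style):

    // Invented example: a "fragment shader" expressed as a CUDA kernel.
    // One small program, launched in parallel once per fragment, writing
    // its result into a framebuffer -- the same execution model as a real
    // fragment shader, minus the fixed-function plumbing around it.
    struct Fragment { float r, g, b, light; };

    __global__ void shade_fragments(const Fragment *frags,
                                    unsigned char *framebuffer, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Trivial "shading": modulate the interpolated colour by a light term.
        Fragment f = frags[i];
        framebuffer[3 * i + 0] = (unsigned char)(255.0f * fminf(1.0f, f.r * f.light));
        framebuffer[3 * i + 1] = (unsigned char)(255.0f * fminf(1.0f, f.g * f.light));
        framebuffer[3 * i + 2] = (unsigned char)(255.0f * fminf(1.0f, f.b * f.light));
    }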

Point 3: Metaphor

I want to avoid something like "This question is too long. We hate you. Question closed."
You're probably right, but I'll still try to save my question.

The fastest way to answer this question may be with an analogy: can the GPU be compared to another piece of hardware? Maybe one that's less conceptually sophisticated.
In my opinion, it can partially be viewed like an FPGA because of the parallel execution. Many cores = many IP blocks = FPGA? Maybe. But obviously an FPGA is completely different hardware, much more flexible and suitable for different scenarios.

Can you give other examples? Any help is really appreciated, from the bottom of my heart.
If you want, you can also post any resources/books/readings you think are related to the topic. I really like to go in depth.

incud
  • I know you said you don't want this answer, however it really is a very large topic. I have done – Jarrod Christman Jul 13 '14 at 22:04
  • I give up... trying to type a reply on an Android tablet is impossible! I'll type a reply when I have my laptop, lol. – Jarrod Christman Jul 13 '14 at 22:12
  • There are open-source/publicly available APIs for interfacing with the GPU; one is called CUDA. You can do RAM/shared-memory buffering to the GPU from the CPU and create your own processing pipeline. – KyranF Jul 14 '14 at 00:50
  • @KyranF I heard about CUDA, but it's an abstraction of what is inside a GPU. It helps programmers develop applications, but it doesn't say anything more, unfortunately – incud Jul 14 '14 at 08:21
  • @JarrodChristman ok, I'll wait for your knowledge :) [edit for wrong tag] – incud Jul 14 '14 at 11:47
  • @JarrodChristman you seem to know lots of things on this topic. Do you have any website or blog? – incud Jul 14 '14 at 18:09
  • I don't know a lot when it comes to the circuitry other than what I can glean from my knowledge of general processor design. However, I do know a lot when it comes to 3D programming and art. I used to do freelance video game artwork and programming. My site isn't that full of information on the programming portion, but it does cover some parts of the artwork, jarrodchristman.com (visit clever I know, haha). You can read about a few of the visual arts technical aspects of it there. – Jarrod Christman Jul 14 '14 at 18:33
  • I will add though, that as the others have mentioned, not much is super secret, other than say the actual GPU architecture (just from a company intellectual property standpoint). OpenGL is a very common rendering API that is completely open source. Generally the CPU will read data into the main system's RAM and load all scene objects (3D models defined in terms of arrays of vertices); these are sent to the graphics card for rendering, where it breaks down the vertices into polygons and wraps them in textures based on coordinates. – Jarrod Christman Jul 14 '14 at 18:37
  • @JarrodChristman ok thanks I'm watching it now. Guides section is very interesting – incud Jul 14 '14 at 18:40
  • You then have various buffers that it generates that represent depth, transparency, lighting (post process effects), and others which are part of a larger process that rasterizes the vector format into something you're probably more familiar with (pixel data). All of this occurs in a secondary buffer (the primary contains the last frame), and then the frames are swapped and the whole process repeats. This is such a complex topic, without even getting into GPU architecture, that you will have to do a lot of reading yourself on it. – Jarrod Christman Jul 14 '14 at 18:40
  • @JarrodChristman Can you suggest any good reading? – incud Jul 14 '14 at 19:11

1 Answer


1) It's not entirely secret: http://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#axzz37R1Qdkjl

There probably isn't complete documentation of the architecture, and some details of the mnemonic->opcode mapping may be obfuscated, but nor is it completely invisible. The invisible part is the supervisory processor on the card, which handles transfers in and out of memory and schedules the other processors. That will be running one of the "binary blobs" loaded by the drivers.

Generally, manufacturers want to discourage low-level applications because they will break if the manufacturer changes the architecture of the card. Compatibility is bad enough without people targeting specific model numbers of hardware.

The shader processors are best thought of as comparable to MMX or other systems where one control flow affects an array of registers.
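A sketch of what that means in practice, again in CUDA syntax (the kernel is invented): all the threads of a warp execute the same instruction stream in lockstep, so a branch doesn't redirect each core independently; lanes that take the other path are simply masked off, much like one MMX/SSE instruction operating on every element of a packed register.

    // One control flow, many data elements: the whole warp executes this
    // code together. When the inner branch diverges, the hardware runs both
    // paths one after the other with the non-participating lanes masked off.
    __global__ void clamp_negative_to_zero(float *x, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            if (x[i] < 0.0f)   // divergent lanes are masked, not skipped for free
                x[i] = 0.0f;
        }
    }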

2) "Pipeline" is a concept. Yes, it's made out of software like everything else. The exact division of work between the host CPU and the graphics CPU(s) varies but it's usually set up for stream processing:

  • Represent the scene as a set of triangles, with various attributes on the triangles and vertices
  • Pass each vertex to a vertex shader program (running in parallel on the various units) to produce a modified vertex
  • Pass vertexes through "rasterise" to produce a stream of "fragments" (screen location + depth + other attributes like texture and texture coordinates)
  • Pass each fragment through a shader program, which may write a colour and depth value to the framebuffer
  • Optionally do the whole thing again with different attributes (e.g. shadow stencil rendering)
  • Finally hand the entire framebuffer over to the output hardware (streams pixels from RAM). Due to double or triple buffering, this will happen while the next frame is being rendered.
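Tying those bullets together, a very condensed host-side sketch of one frame might look like this (none of these types or function names are a real API; they are placeholders for the stages above):

    // Placeholder declarations -- not a real API, just names for the stages.
    struct Scene;               struct VertexBuffer;
    struct TransformedVertices; struct FragmentStream;
    struct Framebuffer;

    VertexBuffer        *upload_vertices(const Scene *scene);       // triangles + attributes into GPU memory
    TransformedVertices *run_vertex_stage(const VertexBuffer *vb);  // one small program per vertex, in parallel
    FragmentStream      *rasterise(const TransformedVertices *tv);  // triangles -> fragments (position, depth, attributes)
    void                 run_fragment_stage(const FragmentStream *f,
                                            Framebuffer *target);   // one small program per fragment, writes colour + depth
    void                 swap_buffers(Framebuffer **front, Framebuffer **back);

    void render_frame(const Scene *scene, Framebuffer **front, Framebuffer **back)
    {
        VertexBuffer        *vb    = upload_vertices(scene);
        TransformedVertices *tv    = run_vertex_stage(vb);
        FragmentStream      *frags = rasterise(tv);
        run_fragment_stage(frags, *back);  // extra passes (e.g. shadow stencils) would repeat the stages above
        swap_buffers(front, back);         // scan-out reads the front buffer while the next frame renders into the back
    }
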
pjc50
  • Good answer. I just didn't understand why manufacturers want to hide their supervisory processor software. Isn't it true that low level + good programmers = fastest software (and the hardware that runs the fastest software is the best)? – incud Jul 14 '14 at 11:30
  • Low level + anyone but the best programmers = broken software, quite often. And low level != portable; you'd get games that *only* work on card 9876GTZ revision B or whatever. Besides, the supervisory software isn't the bottleneck. You're far better off fixing the standardish parts (e.g. http://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/ ) than fiddling around at the low level. – pjc50 Jul 14 '14 at 12:06
  • (also, there's often a certain amount of either trade secrets or ugly hacks or both in the secret software, which they want to keep hidden. There was notoriously a set of drivers which switched to faster but incorrect rendering if the calling process was QUAKE3.EXE) – pjc50 Jul 14 '14 at 12:10
  • You are completely right. However, it might be interesting to go deep into the platform, to get the best out of your 9876GTZ :) I'm sure game programmers can't take this approach, but think of something more like the Raspberry Pi – incud Jul 14 '14 at 12:13
  • About QUAKE3, was it ATI or NVIDIA? It sounds like a wonderful idea :) – incud Jul 14 '14 at 13:28
  • Have a look at https://github.com/hermanhermitage/videocoreiv/wiki/VideoCore-IV-Programmers-Manual and related material on github. – pjc50 Jul 14 '14 at 13:57
  • Ok, thanks :) I think I've got enough material for two months or so :) – incud Jul 14 '14 at 17:05
  • Oh yes, I read about that :) very useful (the SoC isn't completely open, is it? There is another binary blob for the DSI interface that can't be used yet) – incud Jul 14 '14 at 17:09