11

I saw this video the other day and it got me thinking about how to go about and design something like the GPU. Where would you begin? I'm more just interested in reading about how they work and not making one out of TTL(yet anyway).

I know this sounds like a 'how do you make a programming language' question but any starting points would be good as I have no idea where to start looking.

Dean
  • Are you interested in "high speed 3D graphics", or "how to drive a CRT/LCD" – Toby Jaffey Jun 22 '11 at 21:00
  • @Toby atm just displaying something on a display. A square of colour would be nice. – Dean Jun 22 '11 at 21:02
  • Can someone explain to me why I got a downvote? So I can resolve any issues with the question. – Dean Jun 22 '11 at 21:27
  • The difficulty I see with this question is that there is a LOT of ground between generating just a monochrome 80x25 character display, what might once have been called a video display generator, and what is meant by 'GPU'. The hint that you might want to make one 'out of TTL' puts you much closer to the old 80x25 display generator end of things. – JustJeff Jun 23 '11 at 00:19
  • @JustJeff, OK, I didn't know what else they were called. Why are they so different, then, if they do a similar job? – Dean Jun 24 '11 at 08:41
  • Also, why did I get another downvote? Could you explain why you downvoted? – Dean Jun 24 '11 at 08:44

6 Answers

17

That's kinda like going to your college final exam for science class and having this as your question: Describe the universe. Be brief, yet concise. There is no way to answer that one in any practical way-- so I'll answer a different question.

What are the kinds of things I need to know before attempting to design a GPU?

In a rough chronological order, they are:

  1. Either VHDL or Verilog.
  2. FPGAs (a useful area to practice writing digital logic).
  3. Basic data-path elements, like FIFOs.
  4. Bus interfaces, like PCIe and DDR2/3.
  5. Binary implementations of math functions, including floating point.
  6. CPU design.
  7. Video interfacing standards.
  8. High-speed analog design (the analog side of high-speed digital).
  9. PLLs and other semi-advanced clocking techniques.
  10. PCB design for high-speed circuits.
  11. Low-voltage, high-current DC/DC converter design.
  12. Lots and lots of software.
  13. And finally, ASIC or other custom-chip design.

I will also dare say that you won't be making this kind of thing out of TTL logic chips. I doubt that you could get a reasonable DDR2/3 memory interface working with normal TTL chips. Using a big FPGA would be much easier (but not easy).

Going up to step 6 will probably be "good enough to quench your intellectual thirst". That could also be done within a reasonable amount of time-- about a year-- to set as a short-ish term goal.

EDIT: If all you want to do is spit out a video signal then it's relatively easy. It is, in essence, a chunk of memory that is shifted out to a display at 60-ish Hz. The devil's in the details, but here's a rough outline of how to do this:

Start with some dual-port RAM. It doesn't have to be true dual-port RAM, just some RAM that a CPU can read/write and that your video circuit can read. The size and speed of this RAM will depend on what kind of display you're driving. I personally would use DDR2 SDRAM connected up to the memory interface of a Xilinx Spartan-6 FPGA. Their "memory interface generator" core (MIG) makes it easy to turn this into a dual-port RAM.

Next, design a circuit that controls how this RAM is read and spits the data out onto a simple bus. Normally you just read the RAM sequentially. The "simple bus" really is just that: some bits with the pixel value on them-- and that's it. This circuit will need to do two more things: it will have to go back to the beginning of RAM every video frame, and it will have to "pause" the output during the horizontal/vertical retrace periods.

Third: make a circuit that outputs the video control signals (HSync, VSync, etc.) and tells the previous circuit when to pause and restart. These circuits are actually fairly easy to do. Finding the appropriate video standard is harder, imho.
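As a concrete illustration of the last two steps, the comparisons the sync generator has to make can be modeled in ordinary software first. This is a rough Python sketch, using the standard 640x480 @ 60 Hz VGA timing numbers (800x525 total pixel positions, ~25.175 MHz pixel clock); it is an illustration, not a hardware design:

```python
# Standard 640x480 @ 60 Hz VGA timing: visible area, front porch,
# sync pulse, and back porch, in pixels (horizontal) and lines (vertical).
H_VISIBLE, H_FP, H_SYNC, H_BP = 640, 16, 96, 48
V_VISIBLE, V_FP, V_SYNC, V_BP = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FP + H_SYNC + H_BP   # 800 pixel clocks per line
V_TOTAL = V_VISIBLE + V_FP + V_SYNC + V_BP   # 525 lines per frame

def sync_signals(x, y):
    """Return (hsync, vsync, display_enable) for counter position (x, y).
    VGA sync pulses are active-low, so True means 'not in the sync pulse'."""
    hsync = not (H_VISIBLE + H_FP <= x < H_VISIBLE + H_FP + H_SYNC)
    vsync = not (V_VISIBLE + V_FP <= y < V_VISIBLE + V_FP + V_SYNC)
    display_enable = x < H_VISIBLE and y < V_VISIBLE  # shift pixels out now?
    return hsync, vsync, display_enable
```

In hardware this is just two counters (x wraps at 800 and increments y; y wraps at 525) plus a handful of comparators; the display_enable output is what tells the RAM-reading circuit when to pause.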

And finally: connect the control signals and the video pixel data bus to "something". That could be a small color LCD. It could be a video DAC for outputting a VGA-compatible signal. There are NTSC/PAL encoders that will take these signals. Etc.

If the resolution is really small you might get away with using the internal RAM of the FPGA instead of an external DDR2 SDRAM. I should warn you that if DDR2 SDRAM is used then you'll probably require a FIFO and some other stuff-- but that too isn't terribly difficult. But with DDR2 SDRAM you can support fairly high resolution displays. You can also find FPGA development boards with integrated VGA DACs and other forms of video output.
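A quick back-of-the-envelope calculation shows why the resolution decides the internal-RAM-versus-DDR2 question. This Python sketch computes the two numbers that matter, frame buffer size and sustained read bandwidth; the 8-bits-per-pixel figure is just an assumed example:

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of the frame buffer the video circuit scans out."""
    return width * height * bits_per_pixel // 8

def read_bandwidth_mb_s(h_total, v_total, refresh_hz, bytes_per_pixel):
    """Sustained read rate, using the *total* timing (including blanking)."""
    return h_total * v_total * refresh_hz * bytes_per_pixel / 1e6

# 640x480 at 8 bpp needs a 300 KiB buffer -- more block RAM than most small
# FPGAs have, but trivial for an external DDR2 part.
print(framebuffer_bytes(640, 480, 8))        # 307200 bytes
print(read_bandwidth_mb_s(800, 525, 60, 1))  # 25.2 MB/s
```

Even at one byte per pixel the video side reads continuously at tens of MB/s, which is why the DDR2 path needs a FIFO to smooth out the burst-oriented memory controller.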

  • Wow, not a short task then. I understand there was no concise answer, but you have given me a good starting point. I will be having to do this in my very limited spare time, but it should be an interesting experience. – Dean Jun 22 '11 at 20:32
  • @Dean Hmmm... There are THREE different things here: CPUs, GPUs, and something to spit out a video signal. It's easy to make something to spit out a video signal. A GPU is more like a CPU that is designed to do video/graphics related processing: 3-D graphics, 2-D graphics acceleration, etc. If you just want something to spit out a video signal then you're set. If you want 3-D graphics or even semi-advanced 2-D then you'll need to go through my list. –  Jun 22 '11 at 20:57
  • How is it easy to spit out a video signal? I think this would make a better first step. – Dean Jun 22 '11 at 21:03
  • @Dean I edited my answer to include stuff on how to spit out a video signal. –  Jun 22 '11 at 21:21
  • @David: Yes, I think that's all someone can hope to do on a breadboard. I used to design display controllers for Raster Technologies, Apollo Computer, and helped ATI get into 3D. These things have been custom or semi-custom silicon for a long time now. Some modern GPUs have more transistors than the CPU driving them. Lighting, shading, texture mapping, interpolation, perspective correction, and rasterizing take a lot of computes. – Olin Lathrop Jun 22 '11 at 21:25
  • @Olin "and helped ATI get into 3D" -----> =O You need to write an all encompassing EE textbook that just has everything you know in it. I would do absurd things to get my hands on it. – NickHalden Jun 23 '11 at 18:10
  • I did write a book on computer graphics once (ISBN 0-471-13040-0), but it's very introductory. Back in the 1990s, when ATI only had their MACH64 chips and wanted to get into 3D, they hired me as a consultant to teach them some of the concepts, get them going, and help with the architecture. The result was the first RAGE chips. I was a graphics guy back then. Check out US patent 5097427 if you don't believe me. However, I think the quadratic interpolation patent (US 5109481) was more important but less flashy. You might recognize some other names on those ;-) – Olin Lathrop Jun 23 '11 at 18:41
9

Racing the Beam is a detailed look at the design and operation of the Atari VCS. It has a thorough treatment of the Television Interface Adapter.

The TIA is about the simplest practical GPU.

Understanding a small, but complete, working system can be a good way to learn a new subject.

Complete schematics are available, as is a technical manual.

Alexis Tyler
Toby Jaffey
  • Atari 2600 rules! Most game systems use hardware to generate the display, but the 2600 does it all by magic. Compare something like Combat or even Asteroids to something like Toyshop Trouble (Asteroids and Toyshop Trouble are both 8K). Combat shows two single-color objects with 2-line resolution; Toyshop Trouble shows 16 objects with single-line resolution and per-line coloring (and no flicker). No extra hardware for Toyshop Trouble beyond a bank-switcher to allow 8k of code. Just some clever coding and some magic. – supercat Jun 22 '11 at 21:50
  • BTW, 2600 programming may be obscure, but one PSOC-based video-overlay design I did for a customer felt rather 2600-ish. Configure the on-chip hardware to generate some of the timings, and use code to feed data to an SPI slave so it can get clocked out as pixels. – supercat Jun 22 '11 at 21:57
  • Unbelievable that all the game code had to execute during the beam retrace times. – JustJeff Jun 23 '11 at 00:20
5

If you just want to put some stuff on the screen, and think you might really, really enjoy wiring, you could aim for an early 1980-ish character graphics system. If you can hit the timing for RS-170A, you might even be able to push the signal into a spare AV input on a 50" plasma television, and go retro in a big way.

Some early systems used their 8-bit CPUs to directly generate the display, examples being the 6507 in the Atari 2600 and the Z-80 in the Timex Sinclair ZX-81. You can even do the same sort of thing with modern microcontrollers. The advantage of this approach is that the hardware is simple, but the software generally has to be in assembler, is very exacting, and the results will be truly underwhelming. Arguably the 2600 employed extra hardware, but the TIA didn't have much of a FIFO, and the 6502 (well, 6507, really) had to dump bytes to it in real time. In this approach there is no standard video mode; every application that uses video has to be intimately combined with the need to keep the pixels flowing.
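To put a number on "exacting": an NTSC scanline lasts roughly 63.6 µs, and the 2600's 6507 runs at about 1.19 MHz, which leaves only a few dozen CPU cycles to set up each visible line. A one-liner makes the budget concrete (the figures are the usual approximate NTSC/2600 values):

```python
# Cycle budget for CPU-generated ("racing the beam") video on the Atari 2600.
NTSC_LINE_US = 63.556   # duration of one NTSC scanline, in microseconds
CPU_MHZ = 1.193         # approximate 6507 clock frequency, in MHz

cycles_per_line = NTSC_LINE_US * CPU_MHZ
print(round(cycles_per_line))   # ~76 cycles to feed the TIA for every line
```

Roughly 76 cycles per line, every line, with no frame buffer to fall back on: that is why the code and the display are so tightly intertwined.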

If you really want to build something out of TTL, the next level of complexity would be a character-ROM based text display. This lets you put any of, say, 256 characters in any of, for example, 40 column and 25 row positions. There are a couple of ways to do this.

One way - do what the TRS80 Model I did. A group of 74161 counters with an assortment of gates generated the video address; three 74157s multiplexed 12 bits of the CPU address with the video address, to feed an address to a 2K static RAM. RAM data was buffered back to the CPU, but fed un-buffered as address to the character set ROM. There was no bus arbitration; if the CPU wanted video RAM, the video system got stepped on, resulting in the 'snow' effect. The muxed video address was combined with some lines from the counter section to round out the low addresses; character ROM output was dumped into a 74166 shift register. The whole thing ran off divisions from a 14.31818MHz crystal. In this approach, you'd have exactly one video mode completely implemented in hardware, like 40x25 or 64x16, etc., and whatever character set you can put in the ROM.
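The address arithmetic buried in that counter/mux section is simple enough to model directly. This Python sketch assumes a TRS-80-like 64-column layout with 12-scanline character cells; the constants are illustrative assumptions, not values read off the schematic:

```python
COLS = 64          # characters per text row (assumed, TRS-80 Model I-like)
CHAR_HEIGHT = 12   # scanlines per character cell (assumed)

def video_ram_addr(scanline, column):
    """Address into the 2K video RAM: which character is under the beam."""
    text_row = scanline // CHAR_HEIGHT
    return text_row * COLS + column

def char_rom_addr(char_code, scanline):
    """Address into the character-set ROM: the character code selects the
    glyph, and the low counter bits select the pixel row within it."""
    return char_code * CHAR_HEIGHT + (scanline % CHAR_HEIGHT)
```

The video RAM's output byte feeds straight into char_rom_addr, and the ROM's output is what gets parallel-loaded into the 74166 shift register.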

Another way - dig up a so called CRTC chip like a 6845. These combined most of the counter and glue logic, and provided the processor with a control-register interface so you could reprogram some of the timing. Systems like this could be made somewhat more flexible, for example, you might get 40x25 and 80x25 out of the same hardware, under register control. If you get clever about the clock frequencies, you might be able to let your CPU have free access to the video RAM during one half the clock, and the video address generator access during the other half the clock, thereby obviating the need for bus arbitration and eliminating the snow effect.

If you want to go for real graphics modes, though, you'll quickly find that rolling your own is problematic. The original Apple 2 managed it, but that system had something like 110 MSI TTL chips in it, and even so there were some funny things to deal with, like non-linear mapping of the video buffer to the display, and extremely limited color palettes, to name two. And Woz is generally recognized as having had a clue. By the time the '2e' came along, Apple was already putting the video system in a custom chip. The C-64, out about the same time, owed its graphics capabilities to custom chips.

So... I'd say there are about two ways to do it. One way: get your bucket of old TTL out and aspire to an 80x25 one-color text display. The other way: get yourself a good FPGA evaluation board, do the whole thing in VHDL, and start with an 80x25 text display.

JustJeff
2

You would need to start with some computer architecture fundamentals and, in parallel, get started with basic ASIC design using VHDL or another hardware description language.

Once you've learned the basics of computer architecture, I would recommend venturing into computer graphics, perhaps starting with some simple OpenGL projects. The main take-away here would be getting an idea of the graphics pipeline rendering architecture.

The next step would be thinking of ways this rendering pipeline could be accomplished with dedicated hardware rather than in software.

In terms of actually building a GPU and hooking it up to your computer, I don't think this is feasible on an enthusiast's budget, but maybe there is something very basic you can try with an embedded ARM Linux platform (which exposes a system bus) and an FPGA (the FPGA in this case being your GPU, written in VHDL) outputting to a low-resolution VGA display as a tie-it-all-together project. This would require writing drivers as well. If you can do it, it would be killer on a resume.

Jon L
1

Look at the high-level block diagrams of GPUs from AMD and NVidia. You will probably find quite a bit of info from the opengraphics folks, who are designing open-source graphics hardware with open-source drivers.

Then you need to look at what you want.

  • Output: HDMI, DVI, or VGA?
  • Vertex transformations?
  • Texturing?
  • Pixel shading?
  • Triangle clipping and rasterization?
  • Raster operations?

If you haven't done any programming using GPU features, that might also be a good thing to learn.

I think Leon has it nailed as well. I'd use Verilog if I did this.

If you only want composite video, as in the video you posted, there are many examples out there. Heck, look at Woz's implementation of the Apple II. :)

Joe
1

Sounds like you are not looking to make a GPU (in the sense of 3D and shading and all that) so much as a video generator. Many FPGA eval boards have a connector for a VGA or other type of monitor, and sample projects, either from the manufacturer or other users, for displaying things on that monitor. There are also some boards with built-in LCDs, but they tend to be in the $300-and-up class, while basic ones that can drive a standard monitor go for $60-120.

Most FPGAs don't have enough internal memory to do more than a small display, but many of the boards have external memories with more capacity. A lot of them drive analog VGA monitors digitally, i.e., R, G, and B are either full on or full off, though a few give you two levels, and you can probably find one with a video DAC or a connector for a digital monitor interface.

Chris Stratton