64

I just did some quick calculations:

On my MacBook I have a resolution of 2560x1440. Multiplied by 24 bits for color, that gives 11.06MB for a single picture, or 663MB per second at 60 fps.
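Spelled out as a quick check, the arithmetic is just:

```python
# Raw (uncompressed) bandwidth of my display
width, height, bits = 2560, 1440, 24
fps = 60

frame_mb = width * height * bits / 8 / 1e6    # ~11.06 MB per frame
print(frame_mb, frame_mb * fps)               # ~11.06 MB, ~663.55 MB/s
```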

I guess there is some compression, but, for example, when I move three fingers across my touchpad, what happens next on the screen is quite unpredictable and almost every pixel changes. The same goes for almost every other interaction.

Please explain whether my calculations are wrong, and how this data is transported from my graphics card to my screen. How wide are the buses between my graphics card and my screen? Could you also explain in a nutshell how a display stores pixels? Shift registers? A cache?

Felix Crazzolara
  • 698
  • 5
  • 14
  • 1
    There are a few bus protocols, but for laptops I guess the most common is 4-lane LVDS (http://www.sharpsma.com/download/LVDS-Interfacing-AppNotepdf). No compression. The display usually doesn't store or buffer the data; the image goes from the bus directly to the LCD cells. – user3528438 Jan 08 '17 at 22:46
  • 1
    For 2560x1440, you need 8-lane LVDS, I think. – Janka Jan 08 '17 at 22:47
  • 7
    "Perhaps explain in a nutshell how a display does store pixels?" The display doesn't actually store the pixel data, that's all handled by the RAM on the graphics card (or system RAM for integrated graphics). The RAM easily has several GB/s bandwidth. Frames are sent to the display following a protocol, so the data is available as it is required for display. – ks0ze Jan 09 '17 at 03:27
  • 3
    Why do you think there is compression? What do you think happens when the data is not compressible? Do you think the display falls behind? – user541686 Jan 09 '17 at 06:18
  • @Mehrdad If the display falls behind, you (probably) would see [Screen Tearing](https://en.wikipedia.org/wiki/Screen_tearing). – Ismael Miguel Jan 09 '17 at 09:42
  • 1
    @IsmaelMiguel: I mean, I wasn't saying it's impossible to hack around it, I was just saying it seems a little weird to design a display whose correct operation depends on what you're displaying. – user541686 Jan 09 '17 at 10:37
  • You can sometimes get compression in the link; that's how HDMI extenders (wired and wireless) work. e.g. https://blog.danman.eu/reverse-engineering-lenkeng-hdmi-over-ip-extender/ using MJPEG. – pjc50 Jan 09 '17 at 10:40
  • @Mehrdad I was thinking of compression like that used with JPEG. Don't get me wrong, it's not that I thought it could be the exact same thing, but some sort of algorithm or coordinate transformation to reduce the amount of data transmitted. – Felix Crazzolara Jan 09 '17 at 11:52
  • 3
    @Aresloom: Ah, I see. Lossy compression would be one way, yeah. :) I thought you were thinking of lossless! – user541686 Jan 09 '17 at 12:30
  • 2
    @ks0ze some newer panels do store the last screen they were sent. Caching it there uses less power than the GPU pumping a static image out 60 times a second. http://www.anandtech.com/show/7208/understanding-panel-self-refresh – Dan Is Fiddling By Firelight Jan 09 '17 at 19:05
  • 2
    Do modern laptop displays still use LVDS? A half-dozen or so years ago the GPU companies were planning to phase it out at the same time as VGA (which is gone from the current generation of GPUs' native outputs) in favor of embedded DisplayPort. – Dan Is Fiddling By Firelight Jan 09 '17 at 19:06

3 Answers

70

Your calculations are correct in essence. For a 1440p60Hz signal, you have a data rate of 5.8Gbps once you allow for blanking time as well (non-visible pixel border in the image output).

For HDMI/DVI, an 8b/10b encoding is used, which means that although you have, say, 24 bits of colour data per pixel, 30 bits are actually sent, as the data is encoded and protocol control words are added. No compression is done at all; the raw data is sent, so you need 7.25Gbps of data bandwidth.
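Putting numbers on that chain (the 2720x1481 total-pixel count is an assumed CVT reduced-blanking timing; the exact blanking figures vary between timing standards):

```python
# From visible pixels to bits on the wire for 2560x1440 @ 60Hz.
# The 2720x1481 total (visible + blanking) pixel counts are assumed
# CVT reduced-blanking timings, not exact for every monitor.
visible_bps = 2560 * 1440 * 24 * 60      # ~5.31 Gbps of pixel data
blanked_bps = 2720 * 1481 * 24 * 60      # ~5.80 Gbps including blanking
encoded_bps = blanked_bps * 10 / 8       # ~7.25 Gbps after 8b/10b encoding

for label, bps in (("visible pixels", visible_bps),
                   ("with blanking", blanked_bps),
                   ("with encoding", encoded_bps)):
    print(f"{label}: {bps / 1e9:.2f} Gbps")
```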

Again looking at HDMI/DVI: they use the "TMDS" signalling standard for data transfer. The HDMI V1.2 standard mandates a maximum of 4.9Gbps for a Single-Link (3 serial data lines + 1 clock line), while Dual-Link DVI manages a maximum of 9.8Gbps (6 serial data lines, I think). So there is more than sufficient bandwidth to do 1440p60 through Dual-Link DVI, but not through HDMI V1.2.

In the HDMI V1.3 standard (most devices actually skipped to V1.4a, which has the same bandwidth as V1.3), the bandwidth was doubled to around 10Gbps, which supports 1440p60 and is also enough for UHD at 30Hz (2160p30).

DisplayPort, as another example, has 4 serial data streams, each capable (in V1.1) of 2.16Gbps per stream (accounting for encoding), so with a V1.1 link you could do 1440p60 easily using all 4 streams. A newer standard, V1.2, doubles that to 4.32Gbps/stream, allowing for UHD at 60Hz, and a newer version still pushes it even further, to 6.4Gbps/stream.
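As a rough sanity check of which links can carry the signal, using the figures above (note the DisplayPort rates are already post-encoding payload, so they are compared against the ~5.8Gbps of pixel data rather than the encoded 7.25Gbps):

```python
# Can each link carry 2560x1440 @ 60Hz? (Capacities as quoted above.)
# TMDS links carry the encoded stream, so they need ~7.25 Gbps:
tmds_links = {"HDMI V1.2 single-link": 4.9,
              "dual-link DVI": 9.8,
              "HDMI V1.3/V1.4a": 10.2}
# The DisplayPort figures are payload after encoding, so ~5.8 Gbps suffices:
dp_links = {"DisplayPort V1.1 (4 lanes)": 4 * 2.16,
            "DisplayPort V1.2 (4 lanes)": 4 * 4.32}

for name, cap in tmds_links.items():
    print(f"{name}: {cap:.1f} Gbps -> {'OK' if cap >= 7.25 else 'too slow'}")
for name, cap in dp_links.items():
    print(f"{name}: {cap:.2f} Gbps -> {'OK' if cap >= 5.8 else 'too slow'}")
```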


Initially those figures sound huge, but actually not so much when you consider USB 3.0. That was released with a data rate of 5Gbps over just a single lane (actually two differential pairs, one for TX and one for RX, but I digress). PCIe, which is what your graphics card uses internally, nowadays runs at up to 8Gbps through a single differential pair, so it is not all that surprising that external data interfaces are catching up.


But the question remains: how is it done? Consider VGA, which is comprised of single wires for the R, G, and B data, sent in an analogue format. Analogue signals, as we know, are highly susceptible to noise, and the throughput of DACs/ADCs is also limited, which massively restricts what you can push through them (having said that, you can just barely do 1440p60Hz over VGA if you are lucky).

Modern standards, however, are digital, which makes them much more immune to noise (you only need to distinguish high from low, rather than every value in between), and you also remove the need for conversion between analogue and digital.

Furthermore, the advent of differential standards over single-ended ones helps significantly, because you are now comparing the values on two wires (+ve difference = 1, -ve difference = 0) rather than comparing a single wire against some threshold. This means attenuation is less of an issue, because it affects both wires equally and pulls them towards the mid-point voltage: the "eye" (voltage difference) gets smaller, but you can still tell whether it is +ve or -ve even if the difference is only 100mV or less. With a single-ended signal, once it attenuates it might drop below your threshold and become indistinguishable, even while it still has 1V or more of amplitude.
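A deliberately crude numerical sketch of that difference (the voltage levels, threshold and loss factor are all made-up illustrative numbers; real receivers are analogue comparators):

```python
# Toy model: a fixed-threshold single-ended receiver vs a differential
# receiver, after the cable has eaten most of the signal amplitude.
THRESHOLD = 1.5   # single-ended decision threshold, volts (made up)
LOSS = 0.3        # the cable delivers only 30% of the transmitted swing

def single_ended_rx(v):
    return 1 if v > THRESHOLD else 0   # compare one wire to a threshold

def differential_rx(p, n):
    return 1 if p - n > 0 else 0       # only the sign of the difference matters

for bit in (1, 0, 1, 1, 0):
    tx = 3.0 if bit else 0.0                       # one wire, 0V/3V swing
    txp, txn = (3.0, 0.0) if bit else (0.0, 3.0)   # differential pair
    rx, rxp, rxn = tx * LOSS, txp * LOSS, txn * LOSS
    print(f"sent {bit}: single-ended reads {single_ended_rx(rx)}, "
          f"differential reads {differential_rx(rxp, rxn)}")
```

The attenuated single-ended signal never crosses the 1.5V threshold again, so every 1 is misread, while the differential receiver still sees the correct sign of the difference.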

By using a serial link rather than a parallel one, we can also go to faster data rates, because skew ceases to be an issue. In a parallel bus, say 32 bits wide, you need to perfectly match the length and propagation characteristics of 32 cables so that the signals do not move out of phase with one another (skew). In a serial link there is only a single cable, so skew can't happen.
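To put numbers on skew (the propagation velocity is an assumed typical value for copper interconnect):

```python
# How much lane-to-lane mismatch hurts a multi-Gbps parallel bus.
bit_rate = 8e9           # 8 Gbps, e.g. one PCIe 3.0 lane
bit_time = 1 / bit_rate  # 125 ps per bit
velocity = 2e8           # ~2/3 the speed of light in cable, m/s (assumption)

mismatch = 0.01                 # 1 cm difference in conductor length
skew = mismatch / velocity      # 50 ps of skew
print(f"bit time: {bit_time * 1e12:.0f} ps, "
      f"skew from 1 cm mismatch: {skew * 1e12:.0f} ps "
      f"({skew / bit_time:.0%} of a bit period)")
```

At these rates a single centimetre of length mismatch already costs 40% of a bit period, which is why matching 32 separate cables is impractical.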


TL;DR The data is sent at the full bit-rate you calculated (several Gbps), with no compression. Modern signalling techniques of serialised digital links over differential pairs make this possible.

Tom Carpenter
  • 63,168
  • 3
  • 139
  • 196
  • Also, I think that some HDMI displays used two HDMI V1.2 links to get the image, effectively splitting the screen in two. That would give a combined bandwidth of essentially 9.8Gbps. – Ismael Miguel Jan 09 '17 at 09:39
  • Another benefit of using differential signals is rejection of common mode noise. In a single-ended signal, noise could cause a bit to flip. In a differential pair, the noise is picked up equally between each conductor, but the potential difference should still be the same (that `+ve` and `-ve` Tom mentions in his answer). – Steve Jan 09 '17 at 21:22
  • I'd be grateful if you could add a bit more about how data rates of 5Gbit/s are possible, since 5GHz is at the upper range of clock rates nowadays. Also, when I read about TMDS I saw numbers like 340MHz; what's the actual clock frequency used? I hope this is not out of scope of the actual question. – Felix Crazzolara Jan 09 '17 at 21:51
  • 5
    @Aresloom 5GHz is the point at which pretty much all **CPUs** start to melt due to the sheer number of transistors switching simultaneously and generating massive amounts of heat. It doesn't mean 5GHz is the highest clock for *everything*; it comes down to heat (and what material you use - silicon isn't always best). The best example I can think of is a Keysight Infiniium DSAX96204Q, in which each of the four front ends has an Indium Phosphide sampler that ticks away at *80GHz!* But that sampler only has a couple dozen transistors in it and it burns several watts (modern CPUs have *billions*) – Sam Jan 09 '17 at 22:52
  • 4
    @Aresloom 340MHz would be the clock rate; the data is serialised such that (in TMDS, for example) 10 bits are sent on a cable in each clock cycle, so a 340MHz clock yields 3.4Gbps. It's only the cables and the (de)serialisation (SERDES) hardware at the periphery of the video ICs that run at those serial data rates. After the SERDES hardware there is internally a parallel bus again, running back down at the lower clock rate. SERDES blocks can run very fast - PCIe is 8Gbps per lane, so the SERDES blocks run at 4GHz (using both clock edges for bits - DDR). – Tom Carpenter Jan 09 '17 at 23:02
  • CPUs hit the limit around 5GHz or so, but they are not doing calculations on a serial data bus, rather on a parallel one - 5GHz on a 64-bit data bus is 320Gbps, much faster than your video signal. As Sam says, they run into issues much beyond that speed due to the sheer amount of heat that results from having billions of switching transistors doing billions of calculations a second. The SERDES hardware used in these high-bit-rate data buses is not doing anywhere near as much - that means fewer transistors, less heat generated, and as a result less constraint on speed. – Tom Carpenter Jan 09 '17 at 23:06
  • I did some more research and found this: http://www.cedia.co.uk/cda_/images/Resources/4K-Industry-Whitepaper_ENG.pdf It seems to be a good extension of your explanation, helping novices like me understand the relationship between data rate and clock frequency. @Sam OK, so the actual sampling speed in the SERDES block is the same as the bit rate. And 8b/10b is only used to keep the line DC-balanced? – Felix Crazzolara Jan 10 '17 at 07:32
  • 1
    8b/10b improves error rejection because it's easier to detect *transitions* than *states* in fast signals. That encoding guarantees that there won't be too long a run of consecutive ones or zeroes. – pjc50 Jan 10 '17 at 13:20
  • @Aresloom the next wifi frequency would be [60GHz](http://www.eetimes.com/author.asp?section_id=36&doc_id=1329486). And with proper cooling, CPUs can easily run at 6 or 7GHz. The CPU clock rate and the data transfer frequency are not related to each other. Moreover, you can transfer more than one bit per clock with various techniques, like multiple voltage levels, [quad-pumped](https://en.wikipedia.org/wiki/Quad_data_rate)... – phuclv Jan 11 '17 at 11:23
  • +1 for a quality answer, and also helping me understand why almost everything (well, buses) has gone serial recently: skew. – curiousdannii Jan 11 '17 at 13:50
  • 2
    @curiousdannii It's an interesting cycle, isn't it? First we started off with serial (e.g. UART), which was way too slow (let's say 115kbps max). Then we went to parallel buses like IDE, which topped out at about 66MHz at 16 bits, so 1Gbps or so. Then we went back to serial, because it turned out we can go very quick with differential buses. But now serial isn't fast enough again, so we go with a sort of parallel serial - multiple lanes of individual serial buses which are essentially handled completely separately, with any lane-to-lane skew corrected with FIFOs. – Tom Carpenter Jan 11 '17 at 14:02
20

Modern computers are surprisingly fast. People will happily load up full HD 30fps videos without realising that that involves billions of arithmetic operations per second. Gamers tend to be slightly more aware of this; a GTX 1060 will give you 4.4 TFLOPS (trillion floating point operations per second).

> Please explain whether my calculations are wrong, and how this data is transported from my graphics card to my screen.

> How wide are the buses between my graphics card and my screen?

Another answer has addressed the multi-gigabit nature of HDMI, DisplayPort etc.

> Could you also explain in a nutshell how a display stores pixels? Shift registers? A cache?

The display itself stores, in theory, no image data.

(Some displays, especially televisions, store a frame or two to apply image processing. This increases latency and is unpopular with gamers.)

The graphics subsystem of a computer stores pixels in ordinary DRAM. It doesn't usually redraw the whole thing from the processor every frame, but hands some of the functionality off to dedicated subsystems and a compositor. A compositor allows e.g. each window on the desktop to be stored as a distinct set of pixels, which can then be moved, scrolled or zoomed by the dedicated hardware. This becomes quite obvious when scrolling on mobile devices: you can only scroll a short way before you run out of "offscreen" pre-computed pixels and the software has to stop and render more into the compositor's buffers.
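A toy model of what a compositor does (the names and one-byte "pixels" are purely illustrative):

```python
# Toy compositor: each window keeps its own pixel buffer; the frame is
# assembled by copying the buffers in at their current positions.
SCREEN_W, SCREEN_H = 16, 8
screen = [[0] * SCREEN_W for _ in range(SCREEN_H)]

windows = [  # back-to-front order; "pix" stands in for a window's contents
    {"x": 1, "y": 1, "w": 6, "h": 4, "pix": 1},
    {"x": 5, "y": 2, "w": 8, "h": 5, "pix": 2},
]

def composite():
    for row in screen:
        row[:] = [0] * SCREEN_W          # clear to the desktop background
    for win in windows:
        for dy in range(win["h"]):
            for dx in range(win["w"]):
                x, y = win["x"] + dx, win["y"] + dy
                if 0 <= x < SCREEN_W and 0 <= y < SCREEN_H:
                    screen[y][x] = win["pix"]

composite()
windows[0]["x"] += 3    # "moving" a window only changes coordinates;
composite()             # nothing re-renders the window's own contents
```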

Games are redrawn every frame, and there's plenty of literature on how a scene is built up. The scene is rendered into a framebuffer on the graphics card, which is then transmitted out while the next frame is drawn into a different buffer.
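A minimal sketch of that double-buffering arrangement (`render_scene` and `scan_out` are hypothetical stand-ins for the GPU renderer and the display controller):

```python
# Double buffering: draw into one buffer while the other is scanned out.
WIDTH, HEIGHT, BPP = 2560, 1440, 3      # 24-bit colour

def render_scene(buf, frame):
    buf[:] = bytes([frame % 256]) * len(buf)   # pretend to draw a frame

def scan_out(buf):
    pass                                       # pretend to send it to the panel

buffers = [bytearray(WIDTH * HEIGHT * BPP) for _ in range(2)]

for frame in range(4):
    back = buffers[frame % 2]          # the GPU draws the next frame here...
    front = buffers[(frame + 1) % 2]   # ...while this one goes to the display
    render_scene(back, frame)
    scan_out(front)                    # the modulo above swaps roles each frame
```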

Video decoding is usually given to dedicated hardware too, especially H.264.

pjc50
  • 46,540
  • 4
  • 64
  • 126
11

The link between the display card and the LCD panel is carried over several high-speed differential pairs using TMDS signalling, usually called "lanes". Typically four lanes are used, so in a sense the bus is 4 bits wide. There is a Stack Exchange answer with some more details.

Each LCD panel model is usually produced in several interface incarnations, so one needs to be careful and look at the suffixes when trying to replace a broken panel. The most modern digital link (HDMI 1.4) carries 10.2 Gbps, or just 2.5 Gbps per lane. Your calculated 663 MBps works out to about 1.3 Gbps per lane (assuming 4 lanes), which is not that much (SATA3, for example, runs at 6 Gbps).
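In numbers:

```python
# The question's 663 MB/s spread across a 4-lane link
total_gbps = 663e6 * 8 / 1e9     # ~5.3 Gbps of raw pixel data
per_lane = total_gbps / 4        # ~1.33 Gbps per lane
print(f"{total_gbps:.2f} Gbps total, {per_lane:.2f} Gbps per lane")
```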

ADDITION on LCD panels. The active-matrix LCD actually does store the frame image (pixel data), in the capacitors associated with the twisted-nematic cells (the elements that control the polarization of light). The problem is that the size of these analog storage caps is a trade-off between storage time and pixel-switching speed, so they can't be made large; they lose their stored potential quickly and therefore require periodic refresh. Each pixel cell is connected to the data and address lines via a transistor (the "active" element), see this Tom's Hardware article. The LCD driver-controller multiplexes the data and address lines in line-by-line fashion, thus maintaining the displayed image. The image itself is stored in a frame buffer (RAM) inside the graphics controller.
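To get a feel for the refresh timing (blanking intervals ignored, so these are upper bounds):

```python
# Line-by-line refresh budget for a 1440-row panel at 60 Hz
rows, refresh_hz = 1440, 60
frame_time = 1 / refresh_hz      # ~16.7 ms to repaint the whole panel
row_time = frame_time / rows     # ~11.6 us to address and recharge one row
print(f"{frame_time * 1e3:.1f} ms per frame, {row_time * 1e6:.1f} us per row")
```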

Ale..chenski
  • 38,845
  • 3
  • 38
  • 103