5

Is it possible to optimize a .NET application running on a server version of Windows for near-zero latency TCP communication? Or will there always be unpredictable/unavoidable delays?

For example, while searching for low-latency open-source apps I found the OpenPDC application on GitHub, which seems to be the de facto standard application for "high-performance data collection" used by numerous utilities around the world. It is even used in academic articles as the tool for assessing network delays for measurement devices. And yet, the application is written in .NET and runs on Windows.

I am aware of "techniques" used to reduce GC in general (generating less garbage, static allocation, and so on), but I still had the general idea that Windows is "not a real-time OS", and that nothing can prevent the GC from pausing your app - you can only delay the inevitable.

Has something changed in recent .NET/Windows Server versions that would allow this kind of application to run with near-zero latency? Is it possible that this and other similar applications are written in a way that completely prevents "stop-the-world" garbage collection/blocking due to Windows' non-real-time nature, or is it unrealistic to expect this to be guaranteed?

1 Answer

8

For example, while searching for low-latency open-source apps I found the OpenPDC application on GitHub, which seems to be the de facto standard application for "high-performance data collection" used by numerous utilities around the world. It is even used in academic articles as the tool for assessing network delays for measurement devices. And yet, the application is written in .NET and runs on Windows.

The number of applications that have hard real-time requirements may be smaller than you think. Also keep in mind that "high-performance" may mean high throughput, not low latency. These are two different metrics, sometimes at odds.

If you prefer low latency over throughput, disable Nagle's algorithm (Socket.NoDelay): https://msdn.microsoft.com/en-us/library/system.net.sockets.socket.nodelay(v=vs.110).aspx
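A minimal sketch in C# - the device address and port here are placeholders, not anything from your setup:

    using System.Net;
    using System.Net.Sockets;

    // Disable Nagle's algorithm so small writes go out immediately instead of
    // being coalesced into larger segments. 192.168.1.50:4712 is a placeholder
    // for your own device's address and port.
    var client = new TcpClient();
    client.NoDelay = true;   // sets TCP_NODELAY on the underlying Socket
    client.Connect(IPAddress.Parse("192.168.1.50"), 4712);

    NetworkStream stream = client.GetStream();
    byte[] frame = { 0x01, 0x02, 0x03 };   // some small message
    stream.Write(frame, 0, frame.Length);  // sent without waiting to batch more data

Note that NoDelay only removes the sender-side batching delay; it does nothing about GC or scheduler pauses.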

I am aware of "techniques" used to reduce GC in general (generating less garbage, static allocation, and so on), but I still had the general idea that Windows is "not a real-time OS", and that nothing can prevent the GC from pausing your app - you can only delay the inevitable.

Concurrent collectors can collect some garbage in background threads without pausing the world. Many GCs also give you some ability to control when GC pauses can happen, including .NET's:

https://msdn.microsoft.com/en-us/library/system.gc.trystartnogcregion.aspx
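A minimal sketch of how that API can be used - the 16 MB budget and ProcessMeasurementBatch are illustrative placeholders, not anything from OpenPDC or your code:

    using System;
    using System.Runtime;

    // Ask the runtime not to collect while a latency-sensitive block runs.
    // The budget must cover every managed allocation made inside the region,
    // otherwise a collection can still happen.
    if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
    {
        try
        {
            ProcessMeasurementBatch();
        }
        finally
        {
            // The region ends on its own if the budget is exceeded, so only
            // call EndNoGCRegion while it is still active.
            if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                GC.EndNoGCRegion();
        }
    }

    static void ProcessMeasurementBatch()
    {
        // hypothetical stand-in: e.g. parse one frame of measurements and forward it
    }

Outside such regions, GCSettings.LatencyMode (e.g. GCLatencyMode.SustainedLowLatency) gives you a coarser knob over how aggressively the collector blocks.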

Even before this, unless you're generating a lot of garbage, GC pauses are often short enough not to be a problem - especially in the context of networking, where 100ms+ delays just from routing your packets across the internet through several routers are not uncommon. The GC potentially pausing your app for even a few milliseconds in this context is probably not a problem. General-purpose operating systems can suspend your threads for longer than that, just to share your CPU with other processes.

Has something changed in recent .NET/Windows Server versions that would allow this kind of application to run with near-zero latency? Is it possible that this and other similar applications are written in a way that completely prevents "stop-the-world" garbage collection/blocking due to Windows' non-real-time nature, or is it unrealistic to expect this to be guaranteed?

There's a big difference between "running with near zero latency most of the time", and "guaranteed to run with near zero latency all the time, under any and all circumstances."

I'd argue the former has been quite possible on Windows and .NET for quite some time - the most recent change I'm aware of being the adoption of concurrent GCs - whereas the latter might mean TCP itself has never been an option for you, since it doesn't make that kind of guarantee.

  • Thanks, these are all good points. The issue that I was facing (with my app) is that when communicating with the equipment connected directly to a switch (i.e. no complex routing from device to PC), the .NET app would frequently experience delays which are not visible in Wireshark running on the same machine. I.e. 1% of packets are delayed by up to 50ms, and (say) 0.1% go up to 100ms (and once in a while even more). I also understand the throughput vs latency difference, that's why I was surprised to see OpenPDC being used to evaluate TCP latency so I thought there was something more to it. – user7834712 Apr 08 '17 at 09:14
  • When sending 20, 30 or 100 measurements per second, this 1% may mean "once per second", with longer delays being seen every minute, so that's why "most of the time" doesn't seem like "most of the time" in practice. Also, Nagle's algorithm is not the issue since the transmitting devices don't use it. So my concern right now is whether to stick with .NET and do these tweaks, or to ditch it altogether and rewrite in an unmanaged language. – user7834712 Apr 08 '17 at 09:21
  • 50-100ms sounds pretty high - I'd want to profile exactly what's to blame; there may be some low-hanging fruit that's easy to knock out. Some resources on GC-specific performance: https://msdn.microsoft.com/en-us/library/ee851764(v=vs.110).aspx https://msdn.microsoft.com/en-us/library/ee787088(v=vs.110).aspx https://msdn.microsoft.com/en-us/library/ms973837.aspx – MaulingMonkey Apr 08 '17 at 19:22