I've had some experience optimizing network code, both native and .NET. Things I've found to be important include:
1. How many messages/second are being transferred?
2. How many megabytes/second are being transferred? (A quick way to measure both is sketched right after this list.)
3. Is this a garbage-collected language? (In your case, no.)
4. What are your app requirements?
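To put rough numbers on #1 and #2, a counter around the receive loop is usually all it takes. Here's a minimal sketch -- I'll use Java throughout as a stand-in for whatever native or .NET code you actually have, and the host, port, and one-read-per-message framing are placeholder assumptions:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

public class ThroughputProbe {
    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint -- substitute your own host and port.
        try (Socket socket = new Socket("localhost", 9000);
             InputStream in = socket.getInputStream()) {
            byte[] buf = new byte[64 * 1024];
            long bytes = 0, messages = 0;
            long windowStart = System.nanoTime();
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes += n;
                messages++; // assumes one read == one message; adjust to your framing
                if (System.nanoTime() - windowStart >= 1_000_000_000L) { // once per second
                    System.out.printf("%d msg/s, %.1f MB/s%n", messages, bytes / 1e6);
                    bytes = 0;
                    messages = 0;
                    windowStart = System.nanoTime();
                }
            }
        }
    }
}
```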
If #1 is "small" (say, 50 per second or less), then your network code frankly isn't what anyone will notice. You should instead optimize for touch-friendliness, or improve your customers' workflows, or something like that; you'll get more bang for the buck. At server scale, you'll need to worry more about thread affinity than about memory copies. If #1 is large, a number of OSes now include special network APIs to handle the sheer number of calls -- Windows, for example, has the RIO APIs in Winsock for just this.
If #2 is "big" (say, more than 100 megabytes/second), then you need to worry about extra copying on servers, but only because of the memory pressure. In a VM environment especially, that memory pressure is often what keeps your VM from handling more clients, which immediately means running more VMs.
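The usual first defense against that pressure is to stop allocating a fresh buffer per read and reuse one instead. A minimal sketch of the pattern (the endpoint and `process` are hypothetical):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class ReusedBufferReader {
    public static void main(String[] args) throws IOException {
        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            // One buffer, allocated once; a loop that does `new byte[n]`
            // per message generates constant garbage and extra copies instead.
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            while (ch.read(buf) != -1) {
                buf.flip();
                process(buf);   // consume in place -- no copy out to a new array
                buf.clear();    // reuse the same memory for the next read
            }
        }
    }

    private static void process(ByteBuffer buf) {
        // Placeholder: parse and handle the bytes without copying them out.
    }
}
```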
If #3 is true and #1 is "big" and you're on a server, you have to worry about memory pinning and object lifetime. But don't let that drive you away from GC languages; there are plenty of big programs that do networking in GC languages.
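In .NET, pinning means `GCHandle.Alloc(..., GCHandleType.Pinned)` or `fixed`; the rough Java analogue, sketched below, is a small pool of direct `ByteBuffer`s. Direct buffers live outside the collected heap, so there's nothing for the GC to move mid-I/O, and the pool keeps their lifetime under your control instead of the collector's:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Minimal direct-buffer pool. Direct buffers sit outside the GC heap
 *  (the kernel can fill them in place, nothing to pin or relocate), and
 *  pooling keeps their lifetime explicit instead of churning allocations. */
public class BufferPool {
    private final BlockingQueue<ByteBuffer> free;

    public BufferPool(int buffers, int size) {
        free = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) {
            free.add(ByteBuffer.allocateDirect(size));
        }
    }

    public ByteBuffer acquire() throws InterruptedException {
        return free.take(); // blocks if every buffer is in flight
    }

    public void release(ByteBuffer buf) {
        buf.clear();        // reset position/limit for the next user
        free.add(buf);
    }
}
```

The blocking `acquire` is a deliberate choice: it doubles as crude backpressure when every buffer is in flight.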
And #4 is truly important. Do you have extreme latency requirements, like high-speed stock traders do? Have you measured your performance?
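If you haven't measured, even a crude percentile summary of round-trip times answers the latency question before any redesign does. A sketch, where `doRequest` is a placeholder for whatever call you actually care about:

```java
import java.util.Arrays;

public class LatencyProbe {
    public static void main(String[] args) {
        final int samples = 10_000;
        long[] nanos = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            doRequest();                    // stand-in for one request/response
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        System.out.printf("p50=%dus p99=%dus max=%dus%n",
                nanos[samples / 2] / 1_000,
                nanos[(int) (samples * 0.99)] / 1_000,
                nanos[samples - 1] / 1_000);
    }

    private static void doRequest() {
        // Placeholder for the real call being measured.
    }
}
```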
I rarely come across client-side programs where optimizing the number of buffer copies made any difference users could see. On servers, what I generally find first is an unexpected bottleneck that hits long before anything else does. There are a bunch of techniques to help -- receive-side scaling, various async techniques, and special buffer-management calls.
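To make "various async techniques" concrete, here's what that might look like with Java's asynchronous channels -- a skeleton server where each connection reuses one direct buffer and no thread blocks on I/O. The port and handler bodies are illustrative only:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.CountDownLatch;

public class AsyncSkeletonServer {
    public static void main(String[] args) throws IOException, InterruptedException {
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(9000));

        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void att) {
                server.accept(null, this);           // keep accepting new connections
                ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
                client.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer n, ByteBuffer b) {
                        if (n == -1) {               // peer closed
                            try { client.close(); } catch (IOException ignored) {}
                            return;
                        }
                        b.flip();
                        // ... consume b in place here ...
                        b.clear();
                        client.read(b, b, this);     // rearm with the same buffer
                    }
                    @Override
                    public void failed(Throwable exc, ByteBuffer b) {
                        try { client.close(); } catch (IOException ignored) {}
                    }
                });
            }
            @Override
            public void failed(Throwable exc, Void att) { /* log and decide whether to retry */ }
        });

        new CountDownLatch(1).await();               // keep the demo process alive
    }
}
```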
TL;DR: if you're not writing a server, keep your code clean. If you are, measure first, optimize second. And "optimize" includes lots of techniques, of which "copy less memory" is only one.