5

I need to process 2 million messages per second (perhaps in a scale-out configuration) and route each message to N delegates or multicast delegates.

Question

How should I structure my application in C# so that I can achieve the best performance when receiving the message, and routing it to the correct delegate?

Additional Details

Each inbound message has the following properties in the format of a JSON array:

  • Customer
  • Category
  • TopicID
  • DateTime
  • Data[]
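
Assuming that property list, a minimal C# shape for one such message might look like this (the field names and types are assumptions drawn from the list above, not a confirmed schema):

```csharp
using System;
using System.Text.Json;

// Hypothetical POCO mirroring the property list above; names/types are guesses.
public record InboundMessage(
    string Customer,
    string Category,
    int TopicID,
    DateTime DateTime,
    string[] Data);

public static class MessageParser
{
    public static InboundMessage Parse(string json) =>
        JsonSerializer.Deserialize<InboundMessage>(json)!;
}
```

System.Text.Json binds JSON fields directly to the record's constructor parameters, which avoids a separate mapping step on the hot path.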

Each message will be processed by a delegate function that is interested in one or more of those properties.

Simple examples of delegate processing needed:

  • A customer counter function may count the quantity of messages per customer,
  • A category counter may count the quantity of messages in that category,
  • A customer counter will count messages unique to that customer.
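
One way to sketch that routing in C# (names are illustrative; whether a plain dictionary of delegates holds up at this throughput is a separate question) is a map of multicast delegates keyed on the property of interest:

```csharp
using System;
using System.Collections.Generic;

public record Message(string Customer, string Category);

public class Router
{
    // One multicast Action per key; Subscribe combines handlers with '+'.
    private readonly Dictionary<string, Action<Message>> _byCategory = new();

    public void Subscribe(string category, Action<Message> handler) =>
        _byCategory[category] = _byCategory.TryGetValue(category, out var h)
            ? h + handler      // multicast combine: both delegates will fire
            : handler;

    public void Dispatch(Message msg)
    {
        if (_byCategory.TryGetValue(msg.Category, out var handlers))
            handlers(msg);     // invokes every subscribed delegate in order
    }
}
```

A customer counter, for example, would `Subscribe` a handler that increments a per-customer tally.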
makerofthings7
  • 6,038
  • 4
  • 39
  • 77
  • Are the delegates regularly changing, or are they merely there for future use? Are most messages going to most delegates? Regardless, this seems tailor made for something like [StreamInsight](http://msdn.microsoft.com/en-us/library/ee362541.aspx) where you can specify/compile standing queries to run across an event/input stream. – Telastyn May 09 '12 at 15:53
  • @Telastyn What a great find! That seems to be exactly what I need.. – makerofthings7 May 09 '12 at 15:54
  • StreamInsight can handle up to 10,000 events per second, not 1M – Steven A. Lowe May 09 '12 at 15:55
  • I'm assuming I'll need to scale out regardless of my solution, and the aggregation of data will have a delay. I'm thinking of a MapReduce -ish type solution in addition to what comes up here. – makerofthings7 May 09 '12 at 16:04
  • 4
    @Steven: no single server will stand 200Gbps stream of data, it needs massive scale-out. – vartec May 09 '12 at 16:06
  • @StevenA.Lowe: 10,000 events per what? Per server? Per process? Based on what hardware assumptions? If it is per process, and can be fully parallelized, and you are talking about affordable server hardware today, it should be possible by 50 quad core servers, I guess? – Doc Brown May 09 '12 at 18:08
  • "Ability to handle up to 10,000 data events per second." http://msdn.microsoft.com/en-us/library/ee391416.aspx no hardware specified. be wary. – Steven A. Lowe May 09 '12 at 19:24
  • The key thing to state for this problem is what the data durability requirements are. One of the most constraining factors here is, what needs to hit disk, and when. For example, do you only need to persist the aggregated data, or do you need to store the raw messages? What are your requirements for latency and consistency? (Hint: global consistency, low latency, partition resistance: pick two). – James Youngman May 09 '12 at 23:16

1 Answer

5

200Gb of data per second?

To do this you're not going to get much help from C#: the High Performance Computing world is written almost entirely in C/C++, so for a start you should look in that direction simply to leverage its expertise, libraries and frameworks.

That said, a lot of telecoms work is done in functional languages, and those systems handle input streams of a magnitude similar to what you're expecting, so try Haskell too.

In any case, your first task is to split that input stream across several servers, then pass the messages on to more servers for processing. At this scale you don't need delegates so much as a full message-passing architecture that is stateless and memory-efficient. Look at OpenMPI for some hints.
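
An in-process sketch of that split, under stated assumptions: `Channel` stands in for whatever network transport actually links the servers, and an FNV-1a hash is used because `string.GetHashCode` is randomized per process on modern .NET, so it cannot be used by different servers to agree on a partition.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class StreamSplitter
{
    // Stable FNV-1a hash: unlike string.GetHashCode (randomized per process
    // on modern .NET), every node computes the same partition for a key.
    public static int Partition(string key, int partitions)
    {
        uint h = 2166136261;
        foreach (char c in key) { h = (h ^ c) * 16777619; }
        return (int)(h % (uint)partitions);
    }

    // Fan the inbound stream out to 'workers' consumers; in a real scale-out
    // each Channel would be a socket or queue to another server.
    public static ChannelWriter<string>[] Start(int workers, Action<int, string> process)
    {
        var writers = new ChannelWriter<string>[workers];
        for (int i = 0; i < workers; i++)
        {
            var ch = Channel.CreateUnbounded<string>();
            writers[i] = ch.Writer;
            int id = i;
            Task.Run(async () =>
            {
                await foreach (var msg in ch.Reader.ReadAllAsync())
                    process(id, msg);
            });
        }
        return writers;
    }
}
```

Partitioning on a stable key (Customer, say) also keeps each counter's state local to one worker, which is what makes the architecture stateless from the router's point of view.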

gbjbaanb
  • 48,354
  • 6
  • 102
  • 172