design for buffering or queuing data streams to replace database

Question

We have a system (ms stack, .net, sql) that receives data from thousands of remote devices (several independent readings/min). We currently save all the data to a db as it arrives and a second service reads and processes the data - using the database to 'buffer' the input.

We want the second process to be scalable as it can take quite a while to complete. Ideally we should be able to run multiple instances of the second process, but we have to make sure that the data is processed in a specific order (we cannot have two processes working on data from the same remote device at the same time). We've been using the database to manage this by reading all the data from one device at a time and preventing other processes from reading data from that device until it's complete.

This is starting to show performance problems and generates heavy DB traffic, so we are looking for alternative architectures to using the DB as a buffer.

Can object caching systems like memcache allow us to retrieve all data for 1 device into one process and prevent that data being used by another process?

Or are there message queuing systems that would do this?

Or something else?

-EDIT-

Reading the data back to be processed needs to be done by machines on different servers, so I'm looking for something that can cross application / process boundaries and maintain locks on some data items to preserve the order in which they are processed.

Why do you use a database? To handle synchronization issues? For persistence? If persistence is not needed, the synchronization can all be done in one process. You can put the data in memory and use WCF to extract it in locked chunks or pages per processing process. – Frank Hileman Oct 02 '14 at 22:43
@Frank - we do need to persist the data and have tried putting the data in memory, but we wanted to be able to process the data we've read on different machines, as even multiple processes on the same machine aren't always fast enough. So we need something that can cross app domains - a WCF service approach may work though. – Matt Oct 03 '14 at 12:02

Peter Ritchie · Accepted Answer · 2014-10-03T15:48:50.963

2

The problem with "a database" is that it is generally a single point of failure and a scalability bottleneck. You can scale databases, but it's difficult and costly depending on what scale you're talking about.

Typically, what you describe is implemented with messages. Message queues store and forward the messages to multiple readers or clients. You can scale out the number of readers (as well as the number of queues and the communication among them), which is generally considered a very scalable architecture.

I've worked with RabbitMQ to do exactly these types of things; but there are many vendors with various features.
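One common way to keep per-device ordering while still scaling out readers is to route every message from a given device to the same queue and give each queue exactly one consumer. The sketch below illustrates that idea with in-process queues only; it is not the answer author's implementation and does not use RabbitMQ or MSMQ. The names `NUM_PARTITIONS`, `partition_for`, and `run_demo` are hypothetical, and with a real broker each in-memory queue would correspond to a separate broker queue with a single consumer.

```python
import hashlib
import queue
import threading

NUM_PARTITIONS = 4  # hypothetical: one queue and one worker per partition


def partition_for(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a device id to a stable partition index."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def run_demo(readings):
    """Route readings to per-partition queues; one worker drains each queue."""
    queues = [queue.Queue() for _ in range(NUM_PARTITIONS)]
    processed = {}  # device_id -> payloads in the order they were processed
    lock = threading.Lock()

    def worker(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                break
            device_id, payload = item
            with lock:
                processed.setdefault(device_id, []).append(payload)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    # All readings for one device land on the same queue, in arrival order.
    for device_id, payload in readings:
        queues[partition_for(device_id)].put((device_id, payload))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return processed
```

Because a device always maps to the same partition and each partition has a single worker, two workers never hold data from the same device at once. The trade-off is that one slow device delays the other devices that share its partition.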

edited Oct 03 '14 at 15:48

answered Oct 02 '14 at 23:20

Peter Ritchie


What I'm wondering about message queues is if they have the ability to prevent certain messages being read. eg: process 1 reads a message(m1) from device A, process 2 shouldn't be allowed to read any message from device A before process 1 has completed its work on message(m1), but should be allowed to read a message from a different device. Sort of locking some messages in the queue. – Matt Oct 03 '14 at 12:07
1

I don't think that's an inherent ability of any message queue per se. You might be able to model that with pub/sub, but you certainly can directly in the readers. I think what you really want is to model that with messages, though (send a message to process 2 when process 1 is done, for example). – Peter Ritchie Oct 03 '14 at 12:24
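As a rough illustration of the locking behaviour discussed in these two comments, here is a minimal sketch of a buffer that refuses to hand out a message for a device while another message from that device is still being processed. It is an in-memory, single-process model only; the class `DeviceLockingBuffer` and its methods are hypothetical names, and a real deployment would need a shared coordinator service or broker feature to do the same across machines.

```python
import threading
from collections import deque


class DeviceLockingBuffer:
    """Buffer that never hands out two messages for the same device at once."""

    def __init__(self):
        self._pending = deque()   # (device_id, payload) in arrival order
        self._in_flight = set()   # devices currently being processed
        self._lock = threading.Lock()

    def put(self, device_id, payload):
        with self._lock:
            self._pending.append((device_id, payload))

    def take(self):
        """Return the oldest message whose device is not in flight, or None."""
        with self._lock:
            for index, (device_id, payload) in enumerate(self._pending):
                if device_id not in self._in_flight:
                    break
            else:
                return None  # empty, or every pending device is busy
            del self._pending[index]
            self._in_flight.add(device_id)
            return device_id, payload

    def done(self, device_id):
        """Release the device so its next message can be taken."""
        with self._lock:
            self._in_flight.discard(device_id)
```

A reader calls `take()`, processes the message, then calls `done(device_id)`; until then, later messages from that device stay in the buffer while messages from other devices remain available.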
1

Excellent answer. On a project I work on, we use MSMQ (System.Messaging) for message queuing, and expose message queue operations via WCF fast interfaces (tcp and custom binary serialization). A message in MSMQ can simply be a byte array, so you can use your own optimized serializer. With MSMQ you can either lock a queue or let multiple things poll it. It is probably faster than a database, if you optimize both the use of MSMQ and the performance of WCF. Use private queues; don't use MSMQ to access queues remotely, for performance. – Frank Hileman Oct 03 '14 at 15:39
1

RabbitMQ for the win. I've also used it for similar tasks and it is great to work with. Having also used MSMQ, I must say I preferred RabbitMQ. – user1450877 Oct 03 '14 at 15:45
@user1450877: I tried RabbitMQ as well, but it did not seem as well supported on the platform we were using (windows). Also MSMQ seems to be extremely robust, in terms of never losing messages. – Frank Hileman Oct 09 '14 at 18:43
@FrankHileman RabbitMQ is just as robust, but it also provides a lot more features. If you don't need those features, then MSMQ is probably a bit easier to work with using Visual Studio and the .NET Framework. – user1450877 Oct 09 '14 at 19:20
