Replacement for Queues in RTOS

Question

For inter-task communication, or to share data between two RTOS tasks, we use queues. But the problem with queues is that they are slow: they copy the data into a buffer, then handle the mutex, then transfer the data. That is irritatingly slow if you have to transfer large amounts of data. Another problem arises when the same queue is accessed by multiple tasks. Then the picture becomes: first wait to get access to the queue, then the queue's internal mutex handling, then the data transfer.

This increases overhead on the system. What could be an efficient replacement for queues?

(I guess this question is independent of the RTOS we use. Most RTOSes handle queues this way.)

What do you mean by a queue accessed by multiple tasks? Do you mean posting to the queue or reading from the queue? Multiple tasks should be able to post to a queue with minimal overhead. The RTOS should handle the mutexing so that a post is an atomic operation. For 99% of tasks, you should have a loop that pends on a queue and processes the message. A queue should (usually) only be read by one task. You probably need to look at your design and how you are using queues instead of replacing them. — Erik, Apr 15 '11 at 16:06
@Erik : Sorry! I am using the mechanism you mentioned.... I wanted to say something else and I wrote different.... I'll edit that!! Thanks for Pointing out the mistake! I am waiting for the queue access in my code! — Swanand, Apr 18 '11 at 04:00

score 7 · Answer 1 · answered Apr 15 '11 at 06:58

One easy way is to put a pointer to the data on the queue and consume the data using the pointer.

Note that you're trading safety for performance this way as you have to make sure that:

the buffer remains valid until the consumer has consumed the data
someone deallocates the buffer

If you're not using dynamically allocated memory you don't have to deallocate it, but you still have to make sure that the memory area is not reused before the data has been consumed.
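
A minimal sketch of the pointer-passing idea in C, assuming a small pool of statically allocated buffers. All names here (`msg_t`, `pool_alloc`, `pool_free`, `ptr_queue_t`, `produce`, `consume_sum`) are hypothetical, and the little pointer ring only stands in for whatever queue primitive your RTOS provides. It has no locking of its own, so as written it is only safe from a single context.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE     4    /* number of buffers that can be in flight */
#define PAYLOAD_BYTES 256  /* size of each buffer */

typedef struct {
    bool          in_use;                 /* true while producer or consumer owns it */
    size_t        len;                    /* number of valid payload bytes */
    unsigned char payload[PAYLOAD_BYTES];
} msg_t;

static msg_t pool[POOL_SIZE];

/* Claim a free buffer, or return NULL if every buffer is still in flight. */
static msg_t *pool_alloc(void) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            return &pool[i];
        }
    }
    return NULL;
}

/* Hand a buffer back so it can be reused. */
static void pool_free(msg_t *m) { m->in_use = false; }

/* Stand-in for the RTOS queue: it stores pointers only, never payloads. */
typedef struct {
    msg_t *slots[POOL_SIZE];
    size_t head;   /* index of the oldest queued pointer */
    size_t count;  /* number of queued pointers */
} ptr_queue_t;

static bool ptr_queue_send(ptr_queue_t *q, msg_t *m) {
    if (q->count == POOL_SIZE) return false;          /* queue full */
    q->slots[(q->head + q->count) % POOL_SIZE] = m;
    q->count++;
    return true;
}

static msg_t *ptr_queue_receive(ptr_queue_t *q) {
    if (q->count == 0) return NULL;                   /* queue empty */
    msg_t *m = q->slots[q->head];
    q->head = (q->head + 1) % POOL_SIZE;
    q->count--;
    return m;
}

/* Producer: fill a buffer once, then queue only its address. */
static bool produce(ptr_queue_t *q, const void *data, size_t len) {
    if (len > PAYLOAD_BYTES) return false;
    msg_t *m = pool_alloc();
    if (m == NULL) return false;                      /* nothing free: caller must back off */
    memcpy(m->payload, data, len);
    m->len = len;
    if (!ptr_queue_send(q, m)) {
        pool_free(m);                                 /* not queued, so release it here */
        return false;
    }
    return true;                                      /* ownership now belongs to the consumer */
}

/* Consumer: work on the data in place, then release the buffer. */
static bool consume_sum(ptr_queue_t *q, unsigned *sum) {
    msg_t *m = ptr_queue_receive(q);
    if (m == NULL) return false;
    unsigned s = 0;
    for (size_t i = 0; i < m->len; i++) s += m->payload[i];
    *sum = s;
    pool_free(m);                                     /* only now may the buffer be reused */
    return true;
}
```

The point of the sketch is the ownership hand-off: the producer stops touching a buffer once it has been queued, and the buffer only returns to the pool after the consumer is done with it. In a real system `pool_alloc` and `pool_free` would themselves need protection if several tasks call them.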

score 7 · Accepted Answer · answered Apr 15 '11 at 12:14

Queues operate that way because that is a thread-safe transaction model for inter-task communication. You risk data corruption and/or ownership issues in any less-stringent scheme.

Are you copying the data into a buffer in memory then passing a pointer with the queue elements, or trying to pass all the data in the queue elements themselves? If you're not passing pointers then you'll get an increase in performance doing that instead of passing one byte at a time through queue elements.

AngryEE

I was going to say the same thing. If you just pass pointers to the data in queues you can increase speed, but make sure you do not end up with two threads trying to use and change the data. – Kortuk Apr 15 '11 at 19:59
As @Kortuk said, I need to "make sure you do not end up with two threads trying to use and change the data"... which means an increase in overhead... I don't want much processing! :( – Swanand Apr 18 '11 at 03:58
So there is no replacement for queues as such... Instead of a data queue, I need to use a pointer queue! – Swanand Apr 20 '11 at 04:05
@Swanand if you plan your application such that queues are only unidirectional (i.e., you never read the same queue in two tasks) and you process the data stored at the pointer immediately and then free it, you shouldn't have a problem with sharing the data. There will be increased overhead, as you may have to create multiple queues to pass data reliably back and forth, but this is the cost of doing business in a multi-tasking environment. – AngryEE Apr 20 '11 at 12:56

score 6 · Answer 3 · answered Apr 15 '11 at 13:44

Lock-free queues can be implemented for the single-producer/single-consumer case, and often you can architect your software to minimize the number of multiple-producer or multiple-consumer queues.

A lock-free queue can be constructed like so: allocate an array of the elements to be communicated, and also two integers; call them Head and Tail. Head is an index into the array where the next item will be added. Tail is an index into the array where the next item is available to be removed. The producer task reads Head and Tail to determine if there is room to add an item, writes the item in at the Head index, then updates Head. The consumer task reads Head and Tail to determine if there is data available, reads the data from the Tail index, then updates Tail. Basically it's a ring buffer accessed by two tasks, and the order of operations (insert, then update Head; remove, then update Tail) ensures that data corruption doesn't occur.
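
A minimal sketch of that ring buffer in C, for exactly one producer and one consumer. The names (`spsc_ring_t`, `spsc_push`, `spsc_pop`, `RING_SIZE`) are made up for illustration. The description above assumes no special machine instructions; this sketch uses C11 `<stdatomic.h>` loads and stores, which is my assumption, to make the "write the item, then publish the index" ordering explicit to the compiler and CPU. It also leaves one slot unused so that a full ring can be told apart from an empty one.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8  /* holds at most RING_SIZE - 1 items */

typedef struct {
    int           items[RING_SIZE];
    atomic_size_t head;  /* next index to write; stored only by the producer */
    atomic_size_t tail;  /* next index to read; stored only by the consumer */
} spsc_ring_t;

/* Producer side: returns false if the ring is full. */
static bool spsc_push(spsc_ring_t *r, int item) {
    size_t h    = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t next = (h + 1) % RING_SIZE;
    if (next == atomic_load_explicit(&r->tail, memory_order_acquire)) {
        return false;                       /* no room */
    }
    r->items[h] = item;                     /* insert first ... */
    atomic_store_explicit(&r->head, next,   /* ... then update Head */
                          memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool spsc_pop(spsc_ring_t *r, int *out) {
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (t == atomic_load_explicit(&r->head, memory_order_acquire)) {
        return false;                       /* nothing available */
    }
    *out = r->items[t];                     /* remove first ... */
    atomic_store_explicit(&r->tail, (t + 1) % RING_SIZE,  /* ... then update Tail */
                          memory_order_release);
    return true;
}
```

Neither function blocks or takes a lock. A caller that gets `false` back decides for itself whether to retry, drop the item, or sleep until the other side signals.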

If you have a situation with multiple producers and a single consumer, or a single producer and multiple consumers, you effectively have a resource limitation of some kind, and there's nothing else for it but to use synchronization, since the performance limiter is more likely to be the lone producer/consumer than an OS overhead with the locking mechanism.

But if you have multiple producers AND consumers, it's worth spending the time (in design-space) to see whether you can't get a more coordinated communication mechanism; in a case like this, serializing everything through a single queue definitely makes the efficiency of the queue the central determinant of performance.

I was going to +1 this, but you're incorrect: lock-free queues are possible to implement for multiple readers and writers, they're just more complicated. (look at Michael + Scott's paper on Lock-Free Queues http://www.google.com/search?q=michael%20scott%20queue ) — Jason S, Jun 10 '11 at 12:37
@Jason S - does the Scott paper specifically claim re-entrancy for the lock-free insert and remove operations? If so, if you can extract that and post it, please do, it would be an invaluable asset for many. The reader should note that the cited paper makes use of special machine instructions, whereas my position in the above post assumed no such instructions. — JustJeff, Jun 10 '11 at 22:09
Yeah, the cost of the lock-free algorithms is usually a reliance on CAS or equivalent instructions. But how does re-entrancy come into play here? It makes sense for mutexes + locking structures, but not for data structure operations. — Jason S, Jun 10 '11 at 22:26

score 2 · Answer 4 · answered Apr 18 '11 at 16:12

One can get efficient operation in a lock-free multi-producer single-consumer queue if the queue itself holds items that are small enough to work with a load-store-exclusive, compare-exchange, or similar primitive, and one can use a reserved value or reserved values for empty queue slots. When writing to the queue, the writer does a compare-exchange to try to store its data into the next empty slot; if that fails, the writer tries the following slot. Although the queue maintains a pointer to the next empty slot, the pointer value is "advisory". Note that if a system uses compare-exchange rather than load-store-exclusive, it may be necessary to have a 'family' of different 'empty slot' values. Otherwise, if between the time the writer finds an empty queue slot and the time it attempts to write to it, another writer writes the slot and the reader reads it, the first writer would unknowingly put its data in a spot where the reader wouldn't see it. This problem does not occur in systems that use load-store-exclusive, since the store-exclusive would detect that the slot had been written even though it was written back to the old value.
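
A rough sketch of the compare-exchange variant in C11, under several assumptions of mine that are not in the answer: items are 32-bit values with the top bit clear, the "family" of empty values is the top bit plus a lap counter, positions are free-running counters, and the writer checks a published read position to detect a full queue. All names (`mpsc_t`, `mpsc_init`, `mpsc_push`, `mpsc_pop`, `empty_marker`) are hypothetical, and counter wrap-around is not handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MPSC_SLOTS 4u            /* number of slots in the queue */
#define EMPTY_FLAG 0x80000000u   /* top bit set means "slot is empty" */

typedef struct {
    _Atomic uint32_t slots[MPSC_SLOTS];
    _Atomic uint32_t hint;  /* advisory: position of the next empty slot */
    _Atomic uint32_t tail;  /* next position to read; stored only by the reader */
} mpsc_t;

/* One distinct "empty" value per lap around the ring. */
static uint32_t empty_marker(uint32_t lap) {
    return EMPTY_FLAG | (lap & ~EMPTY_FLAG);
}

static void mpsc_init(mpsc_t *q) {
    for (uint32_t i = 0; i < MPSC_SLOTS; i++) {
        atomic_store(&q->slots[i], empty_marker(0));
    }
    atomic_store(&q->hint, 0);
    atomic_store(&q->tail, 0);
}

/* Any number of writers may call this. Returns false if the queue is full
 * or the item collides with the reserved empty values. */
static bool mpsc_push(mpsc_t *q, uint32_t item) {
    if (item & EMPTY_FLAG) return false;
    uint32_t pos = atomic_load(&q->hint);
    for (;;) {
        uint32_t tail = atomic_load(&q->tail);
        if (pos < tail) pos = tail;                    /* hint was stale */
        if (pos - tail >= MPSC_SLOTS) return false;    /* no free slot */
        /* Only the empty value for this lap is accepted, so a slot that was
         * filled and drained in the meantime no longer matches. */
        uint32_t expected = empty_marker(pos / MPSC_SLOTS);
        if (atomic_compare_exchange_strong(&q->slots[pos % MPSC_SLOTS],
                                           &expected, item)) {
            atomic_store(&q->hint, pos + 1);           /* advisory update only */
            return true;
        }
        pos++;                                         /* slot taken: try the following one */
    }
}

/* Only the single reader may call this. Returns false if the queue is empty. */
static bool mpsc_pop(mpsc_t *q, uint32_t *out) {
    uint32_t t = atomic_load(&q->tail);
    uint32_t v = atomic_load(&q->slots[t % MPSC_SLOTS]);
    if (v & EMPTY_FLAG) return false;
    *out = v;
    /* Mark the slot empty for the next lap before publishing the new tail. */
    atomic_store(&q->slots[t % MPSC_SLOTS], empty_marker(t / MPSC_SLOTS + 1));
    atomic_store(&q->tail, t + 1);
    return true;
}
```

Because each lap has its own empty value, a writer holding an out-of-date view of a slot fails its compare-exchange instead of storing into a slot the reader has already passed, which is the hazard described above. Treat this as an illustration of the idea rather than a vetted implementation.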

score 1 · Answer 5 · answered Jun 09 '11 at 20:48

You can access queues more efficiently by writing to the front of the queue. Most RTOSes support adding to the front of the queue, which doesn't require acquiring a mutex. But use adding to the front of the queue as sparingly as possible, only where you want the data to be processed sooner. Queue structures normally have a maximum size limit, so you may not be able to put all the data in the queue; hence passing a pointer is always easier.

cheers!!

score 1 · Answer 6 · answered Jun 10 '11 at 12:43

Queues are not inherently slow. The implementation of them may be.

If you're blindly copying data and using a synchronous queue, you're going to see a performance hit.

As other posters have indicated, there are lock-free alternatives. The single-producer/single-consumer case is straightforward; for multiple producers and consumers, the lock-free queue algorithm by Michael and Scott (those are their last names) is the standard, and is used as the basis for Java's ConcurrentLinkedQueue.

It's possible to optimize out the need for queues in certain cases, but they provide concurrency guarantees that usually provide huge simplification benefits to systems by allowing you to decouple tasks.

Jason S

From the Michael & Scott paper: "it is the clear algorithm of choice for machines that provide a universal atomic primitive (e.g. compare and swap or load linked/store conditional)". While this may not actually exactly *lock* a thread, there is a form of synchronization going on here. – JustJeff Jun 10 '11 at 22:27
you have a point; it may decrease the concurrency requirement from exclusive access to a memory barrier. – Jason S Jun 10 '11 at 22:38
