Lock-free queues can be implemented for the single-producer/single-consumer case, and often you can architect your software to minimize the number of multiple-producer or multiple-consumer queues.
A lock-free queue can be constructed like so: Allocate an array of the elements to be communicated, and also two integers, call them Head and Tail. Head is an index into the array, where the next item will be added. Tail is an index into the array, where the next item is available to be removed. The producer task reads H and T to determine if there is room to add an item; writes the item in at the H index, then updates H. The consumer tasks reads H and T to determine if there is data available, reads data from index T, then updates T. Basically it's a ring buffer accessed by two tasks, and the order of operations (insert, then update H; remove, then update T) ensures that data corruption doesn't occur.
If you have a situation with multiple producers and a single consumer, or a single producer and multiple consumers, you effectively have a resource limitation of some kind, and there's nothing else for it but to use synchronization, since the performance limiter is more likely to be the lone producer/consumer than an OS overhead with the locking mechanism.
But if you have multiple producers AND consumers, it's worth spending the time (in design-space) to see whether you can't get a more coordinated communication mechanism; in a case like this, serializing everything through a single queue definitely makes the efficiency of the queue the central determinant of performance.