Let's say I want to program a parallelized web crawler with a shared FIFO queue (multi-producer/multi-consumer). The queue only contains URLs. How do I detect the end of the queue, i.e. that there is no more work and the workers can shut down?
A worker process is always consumer and producer at the same time, because it takes a URL from the queue, crawls it, and adds any found URLs back to the queue. I think there is no way to have separate processes for the consumer and producer roles in this scenario (see the sketch below).
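To make the setup concrete, here is a minimal sketch of the worker loop I have in mind, assuming Python's `multiprocessing`; `crawl()` is just a hypothetical placeholder for the actual fetch/parse step:

```python
from multiprocessing import Process, Queue

def crawl(url):
    """Hypothetical placeholder: fetch `url` and return the URLs found on that page."""
    return []

def worker(queue):
    while True:                   # <- when is it ever safe to leave this loop?
        url = queue.get()         # consumer side: blocks forever once no more URLs arrive
        for found in crawl(url):
            queue.put(found)      # producer side: feed newly discovered URLs back in

if __name__ == "__main__":
    queue = Queue()
    queue.put("http://example.com")   # seed URL
    workers = [Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in workers:
        p.start()
    # ...and here I have no condition on which to join or terminate the workers.
```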
Since the amount of input data is unknown in advance (though finite), no worker ever knows that it has produced the last URL, so it's impossible to use a 'poison pill' as a sentinel in the queue, right?
Also, checking the queue size or emptiness is not a reliable way to tell that the crawl is finished: with multiple consumers/producers, another worker may have just taken the last URL and still be crawling it, about to add new ones.
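Continuing the sketch above, this is the kind of check I mean, which I believe is broken for exactly that reason:

```python
def naive_worker(queue):
    while True:
        if queue.empty():          # race: "empty right now" != "no more work ever"
            break                  # this worker can exit while others still produce
        url = queue.get()          # can even block here if another worker grabbed
                                   # the URL between empty() and get()
        for found in crawl(url):   # crawl() as in the sketch above
            queue.put(found)
```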
Please enlighten me :-)