10

Consider the following situation:

  • You have a program which creates numerous 'jobs' that need to be processed and places them into a queue.
  • You have other worker programs which grab the next 'job' in line so they can process that job.
  • Each job has a category.
  • There could be any number of categories.
  • Two jobs that have the same category cannot be processed at the same time by separate workers.
  • A worker can process one job at a time.

A traditional queue would not work in this situation because there is a chance that multiple jobs of the same category would be processed simultaneously, which is not allowed.

You could have the worker check the job it grabs to see whether another worker is currently processing a job of that job's category, and if so re-submit the job into the queue to be processed at a later time. This seems like an inefficient way of solving the problem. Are there any data structures or design patterns that can solve this issue?
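Roughly, the naive approach I'm describing looks like the sketch below (Job, process and the shared in_progress set are just placeholders; in reality the workers are separate programs, so that set would have to live somewhere shared and be updated safely):

import queue
from dataclasses import dataclass

@dataclass
class Job:
    category: str
    payload: object

jobs = queue.Queue()
in_progress = set()   # categories currently being processed somewhere

def worker(process):
    while True:
        job = jobs.get()
        if job.category in in_progress:
            jobs.put(job)                      # category is busy: re-submit and try again later
            continue
        in_progress.add(job.category)
        try:
            process(job)
        finally:
            in_progress.discard(job.category)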

If you need further clarification, please let me know.

merp
  • What's the reason behind this restriction? Maybe you can change that. – Andy Apr 22 '16 at 17:59
  • @Andy I will respond to you here since I'm not sure how to integrate it into the question. It's really just a hardware restriction. Each category has a specific hardware resource it must interact with (connection on the same port), therefore two jobs cannot be interacting with the same device at the same time. – merp Apr 22 '16 at 18:09
  • @merp Did you find something? I'm looking for something very similar, I need the jobs to declare shared/exclusive locks and/or semaphores. Your case is similar except you only need exclusive locks. – Guillaume86 May 02 '19 at 18:54

2 Answers

3

There are two parts to this problem.

One: the unknown list of possible categories.

Two: interprocess communication between the workers to prevent two jobs of the same category from being processed simultaneously.

If you had a known list of categories, you could have one queue and one worker per category.

With unknown categories, you can still have a queue per category, but having a queue worker per category requires you to monitor all queues and fire up new workers when a new category appears.

This can be achieved with a 'master' worker which hands out jobs:

All jobs go in the 'master' queue.

A category worker creates a private queue and registers with the master as available for work.

The master worker picks up jobs, checks the category, checks for available workers, and assigns the job to one of them by putting it in their private queue.

The master can keep track of the category assigned to the worker.

The master picks up the next job; it's the same category and the worker is still busy, so it puts the job in a category-specific holding queue.

The master gets the next job; it's a new category, so it assigns it to another category worker.

A category worker completes its job and re-registers for work.

The master checks the holding queues and the master queue for the next job and assigns it to the available category workers.

If a category worker crashes during a job it won't re-register, so the master can have some timeout logic where it gives up on that worker and assigns its category to another one.

You also have to be careful to only have a single master worker at any given time. This necessitates an exclusive lock of some kind on the master queue.
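A minimal single-process sketch of that dispatch logic, using Python's standard queue module; worker registration, the actual completion callbacks and the crash timeouts described above are omitted, and master_queue, holding, private, busy_with and idle are just illustrative names:

import queue
from collections import defaultdict

master_queue = queue.Queue()            # every job enters here first
holding = defaultdict(queue.Queue)      # category -> jobs waiting for a free worker
private = {}                            # worker id -> that worker's private queue
busy_with = {}                          # worker id -> category currently assigned to it
idle = []                               # workers that have registered as available

def dispatch(job):
    if job.category in busy_with.values():
        holding[job.category].put(job)  # some worker already owns this category: park the job
    elif idle:
        wid = idle.pop()                # hand the new category to a free worker
        busy_with[wid] = job.category
        private[wid].put(job)
    else:
        holding[job.category].put(job)  # no free worker yet; hold until one re-registers

def master_loop():
    while True:
        dispatch(master_queue.get())

def reregister(wid):
    # a category worker finished its job and asks for more work
    cat = busy_with.pop(wid, None)
    if cat is not None and not holding[cat].empty():
        busy_with[wid] = cat            # keep the same category while it still has work
        private[wid].put(holding[cat].get())
    else:
        idle.append(wid)                # crash/timeout handling would also re-enter here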

Ewan
2

The downside of your inefficient proposal shows up when there are 2 jobs for a category. Now one is being worked on... and everyone else is doing a busy wait.

You can make this good enough by having workers scan through the queue for the next doable task, then return everything but that task to the queue if they find one. Alternatively, return everything and then sleep. If the sleep has some randomness and exponential backoff, the "busy wait" won't be too busy.
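One sketch of that scan-and-back-off loop, assuming a standard queue.Queue of jobs and a can_start(job) check that tells a worker whether the job's category is currently free (both names are placeholders):

import queue
import random
import time

def next_doable(jobs, can_start):
    # Pull the first job whose category is currently free; requeue the rest and back off.
    delay = 0.1
    while True:
        found = None
        skipped = []
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                break
            if found is None and can_start(job):
                found = job
            else:
                skipped.append(job)
        for job in skipped:                 # return everything we did not take
            jobs.put(job)
        if found is not None:
            return found
        # nothing doable right now: sleep with jitter and exponential backoff
        time.sleep(delay * (0.5 + random.random()))
        delay = min(delay * 2, 5.0)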

For a more efficient approach, a worker that grabs a job is responsible for doing that category until nothing else from that category is left to do. Then you go back to being a regular worker. But there are some subtleties.

To see those, let's assume we can try and release locks (both non-blocking), and that our queue operations are add, get and is_empty, with get being a blocking wait operation.

We'll assume a general queue, and then for each category a queue and a lock.

Here is the basic worker flow.

while obj = get from main queue:
    if try category lock:
        do obj job
        do_whole_category_queue()
    else:
        add obj to category queue
        if try category lock:
            do_whole_category_queue()

where

procedure do_whole_category_queue
    while not category queue is_empty:
        obj = get from category queue
        do obj job
    release category lock
    if not is_empty category queue:
        if try category lock:
            do_whole_category_queue()

Note the careful locking handshake here. The worker tests the lock, and if it is locked adds the job to the queue. But then you have to test the lock again to verify that it is still someone else's responsibility to actually do the job. Just in case the category worker finished while you were doing the queue manipulation.

(This is the sort of locking detail that people often screw up. The error will be impossible to reproduce reliably, but screws up randomly and undebuggably in production...)
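For concreteness, here is roughly what that flow could look like in Python, with queue.Queue and non-blocking Lock.acquire standing in for the abstract operations above; main_queue, category_queues, do_job and obj.category are illustrative names, and the per-category structures are created under a small registry lock so two workers can't end up with different lock objects for the same category:

import queue
import threading
from collections import defaultdict

main_queue = queue.Queue()
category_queues = defaultdict(queue.Queue)      # per-category overflow queues
category_locks = defaultdict(threading.Lock)    # per-category exclusive locks
registry_lock = threading.Lock()

def for_category(category):
    # create/look up the per-category queue and lock atomically
    with registry_lock:
        return category_queues[category], category_locks[category]

def worker(do_job):
    while True:
        obj = main_queue.get()
        cq, lock = for_category(obj.category)
        if lock.acquire(blocking=False):        # "try category lock"
            do_job(obj)
            drain_category(obj.category, do_job)
        else:
            cq.put(obj)                         # someone else owns this category right now
            if lock.acquire(blocking=False):    # re-check: the owner may have just finished
                drain_category(obj.category, do_job)

def drain_category(category, do_job):
    cq, lock = for_category(category)
    while not cq.empty():
        do_job(cq.get())
    lock.release()
    if not cq.empty():                          # a job slipped in after the release
        if lock.acquire(blocking=False):
            drain_category(category, do_job)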

btilly
  • If there can be any number of categories it is going to be hard to scale, especially in a distributed environment. Plus, in order to avoid two workers from different runtimes consuming the same job, you won't be able to check category queues from all the other runtimes. I would go for a master queue/worker as a service dispatching tasks, or distributed master queues with master workers plus a horizontal cache. Even with the first approach (as a service) I would use a cache. Then scalability comes down to how you keep that cache balanced and synchronized. A blocking queue is not a problem if you set a timeout to requeue. – Laiv Apr 25 '16 at 19:53
  • 1
    Queues are fairly cheap. Ones that are likely to remain short are even cheaper. If you've got under, say, 100k categories, then this is not going to be an issue. In a comment, the categories map to hardware resources that are open connections on specific ports, so we're unlikely to exceed that kind of limit. – btilly Apr 25 '16 at 23:17