
Say you were to develop a REST API that provides access to a set of complex, long-running operations.

The typical paradigm for an API like this (as I understand it) goes as follows: the client makes a request asking the server to perform a given long-running operation. The server responds with 202 Accepted, indicating that the request has been received, and includes the location where the result will eventually become available. From then on, the client polls this location until the result of the long-running task is ready.
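A minimal sketch of that poll-until-done loop, with a `fetch_status` callable standing in for the HTTP GET against the returned location (the function names and response shapes here are illustrative assumptions, not a fixed API):

```python
import time

def poll_job(fetch_status, interval=0.0):
    """Poll a job's status until it reports completion.

    `fetch_status` is a hypothetical stand-in for an HTTP GET against
    the URL from the 202 response's Location header; it returns the
    decoded JSON body as a dict.
    """
    while True:
        body = fetch_status()
        if body["status"] == "finished":
            return body["result"]
        time.sleep(interval)  # wait before polling again

# Simulated server responses: still running twice, then finished.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "finished", "result": "https://example.com/jobs/123/result"},
])
result = poll_job(lambda: next(responses))
```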

This much makes sense. However, imagine now that these long-running tasks are more complex. Imagine that, during the execution of a task, a specific resource, file, network, etc. becomes unavailable and, in order to proceed, the API must "ask" the client whether the job should continue anyway or whether the task should end here.

How would this requirement change the original paradigm? Instead of having some result located at the given location, would you optionally return some notion of a "question" that needs to be posted back to the server in order to continue?

Assume for the purposes of this question that you can't encode some kind of blanket "continue if error" parameter in the original request and that these questions must be addressed on a case-by-case basis, as they arise, if they arise.

Maybe I'm thinking about this problem the wrong way? I'd be curious to hear how a paradigm like this is usually accomplished, or if it's as simple as, "yeah, just respond with the prompt, post back the result to the server, and continue to query the original location."

I would really appreciate any help I could get.

meci
  • For something much more complex than start-job/monitor-job-status I wouldn't go with REST. I'd use websockets. Without websockets, I'd implement a long-polling (comet) endpoint so I can get real-time (within milliseconds) updates from the server. – slebetman Aug 02 '20 at 00:30
  • That is an excellent point and it's a direction I'm definitely considering. I think a websocket approach makes a lot of sense given the complexity. However, I wanted to compare it with a REST approach first to better understand how something like this might be implemented. I really like @hans-martin-mosner's approach of modeling jobs as resources. So, I may have the client API initialize the request and return a "handle" to the job via some kind of proxy job object. The job object could then directly communicate with the server over websockets and raise "server-prompt" events when they occur. – meci Aug 02 '20 at 03:02
  • Unfortunately I don't have the rep to answer, but... You could also register a URL for a callback when the processing is done. So, make the initial request, including a callback URL in the request. When the processing is done, the service running the process can hit the callback URL it was given, with the result of the operation. This keeps everything as a REST API, without any setup required for websockets. Not sure how 'pure' REST this is, but it's good enough for Microsoft https://docs.microsoft.com/en-us/partner-center/develop/partner-center-webhook-events – J Lewis Aug 03 '20 at 09:19
  • Yet another possibility would be the use of push notifications. This would even allow the user to close the window and still receive notifications of completion/interaction required. – jcaron Aug 04 '20 at 10:52

1 Answer


For long-running operations, it often helps to model the active job as a REST resource with its own structure and/or sub-resources.

For example, starting a job may return a result such as

202 Accepted
Location: https://example.com/jobs/123

At that URL, the client will get a structure such as

{
  "status":"running"
}

as long as the job is running,

{
  "status":"finished",
  "result":"https://example.com/jobs/123/result"
}

when it is completed and a result is available, or

{
  "status":"interaction-required",
  "prompt":"xyz service not available, please restart it or cancel job.",
  "continue":"https://example.com/jobs/123/continue/<token>",
  "cancel":"https://example.com/jobs/123/cancel"
}

to interact with the user. The job would continue (retrying xyz access) after the client posts something to the continue URL, which includes an idempotency token (as suggested by @NPSF3000) to prevent accidentally continuing the next interaction; it would be cancelled by posting something to the cancel URL. Another option for cancellation would be a DELETE verb on the job URL. The cancel link could also be made part of the initial job structure to communicate that the job can be cancelled at any time, if the application supports that.
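As a rough sketch of the client side of this flow (the `fetch_status` and `post` callables are hypothetical stand-ins for HTTP GET/POST; nothing here is a prescribed API):

```python
def drive_job(fetch_status, post):
    """Drive a job to completion, answering prompts as they arise.

    `fetch_status` and `post` are hypothetical stand-ins for HTTP
    GET/POST calls against the job's URLs.
    """
    while True:
        body = fetch_status()
        if body["status"] == "finished":
            return body["result"]
        if body["status"] == "interaction-required":
            # The continue URL already carries the one-time token, so a
            # stale retry cannot accidentally answer a later prompt.
            post(body["continue"], {"answer": "continue"})
        # otherwise: still running, poll again

# Simulated sequence of status bodies the server might return.
responses = iter([
    {"status": "running"},
    {"status": "interaction-required",
     "prompt": "xyz service not available, please restart it or cancel job.",
     "continue": "https://example.com/jobs/123/continue/token-1",
     "cancel": "https://example.com/jobs/123/cancel"},
    {"status": "finished", "result": "https://example.com/jobs/123/result"},
])
posted = []
result = drive_job(lambda: next(responses), lambda url, body: posted.append(url))
```

A real client would show the prompt to the user and post "continue" or hit the cancel URL based on their choice; this sketch always continues.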

The details about which kinds of interaction are possible and how they are presented in the client would need to be designed based on the specific needs of these jobs, but the main thing is that the operation start does not just return the location of the result but of a reified job object that can be queried and manipulated.
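To illustrate the server side, here is a minimal in-memory sketch of such a reified job object; all names and the token scheme are illustrative assumptions, not a prescribed design:

```python
import uuid

class Job:
    """In-memory sketch of a reified job resource (names are illustrative)."""

    def __init__(self, job_id):
        self.id = job_id
        self.status = "running"
        self.prompt = None
        self.continue_token = None

    def to_json(self):
        """Render the representation the client sees when polling the job URL."""
        base = f"https://example.com/jobs/{self.id}"
        if self.status == "interaction-required":
            return {
                "status": "interaction-required",
                "prompt": self.prompt,
                "continue": f"{base}/continue/{self.continue_token}",
                "cancel": f"{base}/cancel",
            }
        if self.status == "finished":
            return {"status": "finished", "result": f"{base}/result"}
        return {"status": "running"}

    def require_interaction(self, prompt):
        """Pause the job and mint a fresh one-time token for this prompt."""
        self.status = "interaction-required"
        self.prompt = prompt
        self.continue_token = uuid.uuid4().hex

    def handle_continue(self, token):
        """Resume only if the token matches the current prompt's token."""
        if self.status == "interaction-required" and token == self.continue_token:
            self.status = "running"
            self.continue_token = None
            return True
        return False
```

Because each prompt mints a fresh token, a duplicate or stale POST to an old continue URL is rejected rather than silently answering the next question.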

Hans-Martin Mosner
  • Excellent answer, which should get more focus on the opening paragraph's core: modeling your API might require you to have a different model than your business process - you might consider the entities "operation" and "result" as your business operations, but your rest API needs to model "job" as an entity. – Avner Shahar-Kashtan Aug 01 '20 at 15:31
  • More information in [How to manage state in REST](https://stackoverflow.com/questions/2641901/how-to-manage-state-in-rest?noredirect=1&lq=1) – HenryM Aug 01 '20 at 17:56
  • @HenryM thanks for the reference, indeed a good read. – Hans-Martin Mosner Aug 01 '20 at 18:37
  • Why would `result` be another URL, and not just the result? – BlueRaja - Danny Pflughoeft Aug 02 '20 at 00:04
  • @BlueRaja-DannyPflughoeft that might make sense when you want to clean up jobs but keep results available, separately. You could also reuse a previously computed result, in some applications. You might also want to serve job statistics, or use different security policies for the result and the job itself. – WorldSEnder Aug 02 '20 at 00:23
  • Adding some sort of `idempotency token` to the continue could be a good idea for a production system. – NPSF3000 Aug 02 '20 at 00:33
  • This is such a clean, easy and elegant solution to implement. I was expecting something extremely more complicated. Just one small nitpick: where's the cancel link? One solution I see could be to implement an `"actions": { ... }` object or similar, where it could be `"actions": {"continue": "...", "cancel": " ... "}`. In the future, if more actions are required, you can just add them there. And the status could be changed to `"action-required"`. An example of an extra action is asking a file upload for a task that failed to fetch a remote file. Or an option to cancel or retry. – Ismael Miguel Aug 03 '20 at 10:57
  • @IsmaelMiguel Yes, these are details that you will need to decide, there's a lot of room for improvement. Regarding the cancel action, I'm not fully sure which option is best. For cancellation of the complete job, I see different options: one would be a "cancel" link, another would be more RESTful by patching the "status" field of the job. This would also enable interrupting a long-running job when there is no current interaction. At the end of the day, it needs to work for your application, so you've got to decide how you implement it. – Hans-Martin Mosner Aug 03 '20 at 12:19
  • @Hans-MartinMosner You are right. I still think that a cancel *example* could improve the last example. Something like ` "cancel":"https://example.com/jobs/123/cancel"` could be enough *for an example*. What do you think? – Ismael Miguel Aug 03 '20 at 12:35