I have not previously designed a nontrivial HTTP API, so I apologize if anything here is textbook stuff. I wasn't able to find anything going into this in much detail when I looked.
Let's say I have an API where you can upload some data and then request that a potentially long-running operation be performed on it. To take an example, let's say you can upload a tarball containing some source code, request that it be compiled and run, and then examine the output.
You could design the API like so. (You can imagine that all the paths below start with something like /session/<id>/ if you like, and obviously I'm leaving out everything that's not directly relevant to the discussion.)
/source
- PUT: Upload tarball of source
/compile
- POST: Request that uploaded tarball be compiled
/compile/status
- GET: Get status of compilation (not started / running / succeeded / failed)
/compile/log
- GET: Show current contents of compiler logfile
/input
- PUT: Upload file for compiled program to take as input
/run
- POST: Request that compiled program be run
/run/status
- GET: Get status of run (not run / running / succeeded / failed)
/run/output
- GET: Get (possibly partial) output of run
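To make the intended workflow concrete, here is a minimal in-memory sketch of one session in Python. Everything in it is hypothetical: the `Session` class, its method names, and the `finish_compile` / `finish_run` hooks (which only simulate the long-running job completing) are made up for illustration and are not part of any real framework. It deliberately says nothing about what happens to existing results when new source or input is uploaded.

```python
class Session:
    """Hypothetical in-memory model of one /session/<id>/ resource tree."""

    def __init__(self):
        self.source = None
        self.input = None
        self.compile_status = "not started"
        self.compile_log = ""
        self.run_status = "not run"
        self.run_output = ""

    def put_source(self, tarball):        # PUT /source
        self.source = tarball

    def post_compile(self):               # POST /compile
        if self.source is None:
            raise ValueError("no source uploaded")
        self.compile_status = "running"
        self.compile_log = ""

    def finish_compile(self, ok, log):    # not an endpoint: simulates the job ending
        self.compile_status = "succeeded" if ok else "failed"
        self.compile_log = log

    def put_input(self, data):            # PUT /input
        self.input = data

    def post_run(self):                   # POST /run
        if self.compile_status != "succeeded":
            raise ValueError("nothing compiled to run")
        self.run_status = "running"
        self.run_output = ""

    def finish_run(self, ok, output):     # not an endpoint: simulates the job ending
        self.run_status = "succeeded" if ok else "failed"
        self.run_output = output


# Happy path: upload, compile, poll, upload input, run, read output.
session = Session()
session.put_source(b"tarball bytes")
session.post_compile()
status_while_compiling = session.compile_status   # what GET /compile/status would return
session.finish_compile(ok=True, log="compiled without warnings")
session.put_input(b"input bytes")
session.post_run()
session.finish_run(ok=True, output="hello")
```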
There are some potential issues with this design that come down to actions on one endpoint invalidating the results of other endpoints. For example:
- If you have already done a run and upload a new input file using the /input endpoint, this invalidates any previously-obtained values of the /run/status and /run/output endpoints.
- If you have already done a compilation and upload new source, this invalidates any previously-obtained values of /compile/status, /compile/log, and everything under /run.
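Here is a small Python sketch of the first case, to show what I mean by "invalidates". The `Server` and `NaiveCache` classes are hypothetical stand-ins, and I'm assuming the cache stores GET responses by path and, on a PUT, drops at most the entry for the path that was written to.

```python
class Server:
    """Hypothetical stand-in for the API; models only the /input -> /run dependency."""

    def __init__(self):
        self.state = {"/run/status": "not run", "/run/output": ""}

    def get(self, path):
        return self.state[path]

    def put_input(self, data):
        # New input means the previous run no longer describes the current state.
        self.state["/run/status"] = "not run"
        self.state["/run/output"] = ""

    def post_run(self):
        # Simulated as finishing instantly to keep the sketch short.
        self.state["/run/status"] = "succeeded"
        self.state["/run/output"] = "output for current input"


class NaiveCache:
    """Caches GET responses by path; knows nothing about cross-endpoint dependencies."""

    def __init__(self, server):
        self.server = server
        self.cached = {}

    def get(self, path):
        if path not in self.cached:
            self.cached[path] = self.server.get(path)
        return self.cached[path]

    def put_input(self, data):
        # Assumption: the cache only drops the entry for the path being written.
        self.cached.pop("/input", None)
        self.server.put_input(data)


server = Server()
cache = NaiveCache(server)

server.post_run()
first = cache.get("/run/status")     # fetched from the server and stored
cache.put_input(b"new input")        # server resets the run; cache entry survives
stale = cache.get("/run/status")     # served from the cache
fresh = server.get("/run/status")    # what the server would actually say now
```

After the upload, the cached `/run/status` still says the run succeeded, while the server itself reports that nothing has been run against the new input.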
A few potential solutions:
- Send headers with every response (e.g. Cache-Control: no-store) saying it must not be cached. (Downsides: you lose the ability to cache the things you would like to cache; apparently not every client or intermediary actually obeys these headers, so responses can get stuck in a cache at unexpected points between client and server.)
- I saw one person suggest that you just shouldn't use HTTP GET at all in your API and should make everything POST instead. (Downsides: makes API less intuitive to use; makes you add extra endpoints if you previously had something that would have had both GET and POST methods)
- Redesign API so that everything that can be invalidated by an endpoint is part of that endpoint. If anything can be invalidated by multiple endpoints, combine those into a single endpoint. (Downside: seems like in the long run you likely end up with a single endpoint that you just send extremely complex commands to)
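To illustrate the downside of the last option, here is a rough Python sketch that applies the merging rule mechanically to the two invalidation examples listed earlier. The `INVALIDATES` table and the `merge_endpoints` function are hypothetical names of my own; the grouping is done with a plain union-find, which is just one way to compute it.

```python
# Which endpoints each upload invalidates, taken from the two examples above.
INVALIDATES = {
    "/source": ["/compile/status", "/compile/log", "/run/status", "/run/output"],
    "/input": ["/run/status", "/run/output"],
}


def merge_endpoints(invalidates):
    """Group endpoints so that no write invalidates anything outside its own group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # An endpoint and everything it invalidates must end up in the same group.
    # Two writers that invalidate the same endpoint get merged as a side effect.
    for writer, readers in invalidates.items():
        for reader in readers:
            union(writer, reader)

    groups = {}
    for endpoint in list(parent):
        groups.setdefault(find(endpoint), set()).add(endpoint)
    return sorted(sorted(group) for group in groups.values())


groups = merge_endpoints(INVALIDATES)
for group in groups:
    print(group)
```

With just those two rules, /source and /input are forced into the same group because both invalidate the /run results, so every endpoint in the table collapses into one combined endpoint.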
Questions:
- Are any of the above solutions used frequently? Do people have regrets about using them?
- Are there downsides to any of these approaches that I didn't mention above?
- Is anything written about best practices in this situation?
- Are there other options I haven't considered?