Invalidation of cached GET results by other endpoints

Question

I have not previously designed a nontrivial HTTP API, so I apologize if anything here is textbook stuff. I wasn't able to find anything going into this in much detail when I looked.

Let's say I have an API where you can upload some data and then request a potentially long-running operation be performed on this data. To take an example, let's say you can upload a tarball containing some source code, request that it be compiled and run, and then examine the output.

You could design the API like so. (You can imagine that all the paths below start with something like /session/<id>/ if you like, and obviously I'm leaving out everything that's not directly relevant to the discussion.)

/source

PUT: Upload tarball of source

/compile

POST: Request that uploaded tarball be compiled

/compile/status

GET: Get status of compilation (not started / running / succeeded / failed)

/compile/log

GET: Show current contents of compiler logfile

/input

PUT: Upload file for compiled program to take as input

/run

POST: Request that compiled program be run

/run/status

GET: Get status of run (not run / running / succeeded / failed)

/run/output

GET: Get (possibly partial) output of run

There are some potential issues with this design that come down to actions on one endpoint invalidating the results of other endpoints. For example:

If you have already done a run and upload a new input file using the /input endpoint, this invalidates any previously-obtained values of the /run/status and /run/output endpoints.
If you have already done a compilation and upload new source, this invalidates any previously-obtained values of /compile/status, /compile/log, and everything under /run.
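To make the dependency explicit, the two cases above can be written down as a lookup table. This is only an illustrative sketch: the name INVALIDATES and the helper stale_after are hypothetical, and the table records just the two cases named above, not a complete analysis of the API.

```python
# Hypothetical dependency map for the endpoints listed above: each write
# endpoint maps to the GET endpoints whose earlier responses become stale.
# Only the two cases described in the text are recorded here.
INVALIDATES = {
    "/input": ["/run/status", "/run/output"],
    "/source": ["/compile/status", "/compile/log", "/run/status", "/run/output"],
}


def stale_after(write_path: str) -> list[str]:
    """Return the GET paths whose cached responses a write to write_path invalidates."""
    return INVALIDATES.get(write_path, [])
```

Any cache sitting between client and server would need to know this table to stay correct, which is the core of the problem.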

A few potential solutions:

Send headers with absolutely everything saying it's not allowed to be cached. (Downsides: you lose the ability to cache stuff you'd like to cache; apparently not everything actually obeys these headers, and stuff can get stuck in cache at weird places along the pipeline.)
I saw one person suggest that you just shouldn't use HTTP GET at all in your API and should make everything POST instead. (Downsides: makes the API less intuitive to use; makes you add extra endpoints if you previously had something that would have had both GET and POST methods.)
Redesign the API so that everything that can be invalidated by an endpoint is part of that endpoint. If anything can be invalidated by multiple endpoints, combine those into a single endpoint. (Downside: seems like in the long run you likely end up with a single endpoint that you just send extremely complex commands to.)
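For concreteness, the first option (forbid caching everywhere) amounts to something like the following on every response. This is a minimal sketch, assuming responses carry their headers as a plain dict; the helper name with_no_store is hypothetical, while Cache-Control: no-store is the standard HTTP directive telling caches not to store the response.

```python
def with_no_store(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers that tells caches not to store the response."""
    result = dict(headers)  # copy so the caller's dict is left untouched
    result["Cache-Control"] = "no-store"
    return result
```

Applying this to every endpoint is what gives up the caching you might otherwise want for responses that never change.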

Questions:

Are any of the above solutions used frequently? Do people have regrets about using them?
Are there downsides to any of these approaches that I didn't mention above?
Is anything written about best practices in this situation?
Are there other options I haven't considered?

This doesn't address your question, but allowing people to upload arbitrary files to be compiled and run seems like a major security vulnerability. — David Pement, Jul 25 '22 at 17:50
@DavidPement: Well obviously either you'd restrict it to trusted users and/or sandbox it, but it's not unusual functionality. There are at least a hundred sites online that provide this sort of functionality, e.g. repl.it, wandbox.org, godbolt.org, leetcode.com, ... — Daniel McLaury, Jul 25 '22 at 20:07

score 2 · Accepted Answer · answered Jul 26 '22 at 10:29

I would re-design the API, but in such a way that you don't need to invalidate old results:

/source POST: Upload tarball of source code; Returns location that can be used to work with this upload (/source/<source-id>)

/source/<source-id>/compile POST: Start compiling

/source/<source-id>/compile GET: Get result of compilation

/source/<source-id>/compile/log GET: Get logs of compilation

/source/<source-id>/run POST: Upload input file and run program with it. Returns location where results can be obtained (/source/<source-id>/run/<run-id>)

/source/<source-id>/run/<run-id> GET: Get status of the execution

/source/<source-id>/run/<run-id>/output GET: Get output of the execution

By handing out new source and run IDs, you sidestep the problem of having to invalidate caches for the previously used ID. The old versions can be deleted from the server based on whatever retention policy you decide upon.
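As a rough illustration of the idea, here is an in-memory sketch of the ID handling. The class Store and its method names are hypothetical, random UUIDs are just one possible way to mint IDs, and compilation, execution, and retention are left out entirely.

```python
import uuid


class Store:
    """In-memory stand-in for the server side of the redesigned API (hypothetical)."""

    def __init__(self) -> None:
        self.sources: dict[str, bytes] = {}
        self.runs: dict[tuple[str, str], bytes] = {}

    def post_source(self, tarball: bytes) -> str:
        """Handle POST /source: store the upload under a fresh ID, return its location."""
        source_id = uuid.uuid4().hex
        self.sources[source_id] = tarball
        return f"/source/{source_id}"

    def post_run(self, source_id: str, input_data: bytes) -> str:
        """Handle POST /source/<source-id>/run: record a new run, return its location."""
        if source_id not in self.sources:
            raise KeyError(source_id)
        run_id = uuid.uuid4().hex
        self.runs[(source_id, run_id)] = input_data
        return f"/source/{source_id}/run/{run_id}"
```

Because a new upload or a new run always gets a new URL, nothing ever overwrites what an existing URL refers to, so a cached GET response for an old URL stays valid for as long as that resource exists.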

I'm accepting this as it's a reasonable answer to the question I actually asked. — Daniel McLaury, Jul 27 '22 at 14:36
(Unfortunately in my actual use case it's not actually just a source code tarball and an input file, but actually about a dozen pieces that can be mixed and matched, which kind of makes this approach less viable. Maybe I will ask a separate question.) — Daniel McLaury, Jul 27 '22 at 14:37
