Long-Running Calls

Question: When building RESTful APIs, do you know of any standard patterns for handling long-running API calls? Our decision process may take as much as a couple of minutes.

My Answer: I know of four possible approaches:

1. Long-running API call: In this approach, you just build an API call that runs for a long time (probably blocking on external calls to other systems). This has a bunch of problems; the biggest two are that you have to deal with timeouts from various API libraries, and that it ties up a thread on the server for a long period of time (threads are a scarce resource). The advantage of this approach is that it is REALLY easy to build.
2. Polling: In this approach, one API call submits the input and returns some sort of key; another API call passes in this key and gets a response (or error message) if it is ready, or a not-ready-yet indicator if the process is not yet complete. An enhancement has the second API call return an estimate of how much longer the process will take, which the client can use to tune its polling rate. The biggest problem here is the wasted effort involved in polling; a second weakness is that the response will be delayed by (on average) half the polling interval. The advantage is that it works, and in a straightforward fashion.
3. Long Polling: This approach is like polling, except that the polling call (and perhaps the original call) doesn't return immediately; instead it blocks for up to some fixed maximum time (like 30 seconds). It avoids the delay from polling, at the cost of blocking threads (like the long-running API call) and more complexity on both client and server. I don't recommend this approach and have never used it.
4. Callback aka "Webhook": In this approach, the caller of the long-running API provides a way for the server to "call back" when the process is completed, usually the URL of a REST API the server can invoke, with a unique key embedded to identify the call. When the server-side process is complete, it invokes the webhook and passes the result. A disadvantage is that this requires the client to provide an endpoint where it can receive calls (which may be difficult for some clients). Another issue is that more services are involved, each with its own authentication mechanism and so forth, so it adds complexity. The architecture sometimes makes security folks nervous, or else gets extra layers of security added, which make things more complex. There's a minor issue of how to handle errors on the callback, but this is easily handled. Finally, it becomes difficult to handle changes in the format of the response data. Whew... that sounded like a lot of issues. The positives are pretty good: there are no wasted resources (threads, etc.) and the response can be immediate. It can even scale to multiple responses (like partial answers or updates). There may be a variant of this where the response is delivered by subscribing to a pub-sub system rather than registering a webhook.
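To make the callback approach a bit more concrete, here is a minimal Python sketch of the server side. Everything in it is an assumption made for illustration: the names `start_job` and `post_json`, the JSON body layout, and the choice to put the key in the body rather than in the URL. A real service would also need authentication and some retry policy for failed deliveries, which are left out here.

```python
import json
import threading
import urllib.request
import uuid


def post_json(url, body):
    """Deliver a result by POSTing JSON to the caller's webhook URL."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def start_job(work, payload, callback_url, deliver=post_json):
    """Accept a request, return a key at once, and call back when done.

    `work` is the slow function; `deliver` is pluggable so the HTTP call
    can be swapped out (hypothetical design, not a standard API).
    """
    key = uuid.uuid4().hex  # unique key identifying this call

    def run():
        try:
            body = {"key": key, "status": "done", "result": work(payload)}
        except Exception as exc:
            # Errors are reported through the same callback channel.
            body = {"key": key, "status": "error", "error": str(exc)}
        deliver(callback_url, body)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return key, worker
```

The caller gets the key back immediately and no request thread sits blocked while the work runs. Making `deliver` a parameter is just one way to keep the sketch testable without a live endpoint.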

As for my opinion about what to use: (1) is just a cop-out: pretending the service isn't overly long-running. (2) is by far the most common solution and I think it is often the best choice. I don't recommend (3), but there may be niches where it fits well. And (4) is really interesting: there's a cost (the added complexity, plus the requirement that clients be able to receive calls), but IF you do a lot of this, so that the complexity is shared across a lot of different services, then it might be a good choice.
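Since polling is the approach I'd reach for most often, here is a rough in-process Python sketch of its two calls. The names (`JobStore`, `submit`, `poll`, `wait_for_result`) and the status strings are invented for illustration, and an in-memory dictionary stands in for whatever storage a real service would use; in a real API, `submit` and `poll` would each sit behind an HTTP endpoint.

```python
import threading
import time
import uuid


class JobStore:
    """In-memory stand-in for the two polling endpoints (hypothetical)."""

    def __init__(self, work):
        self._work = work            # the slow function to run
        self._jobs = {}              # key -> status record
        self._lock = threading.Lock()

    def submit(self, payload):
        """First call: start the work and return a key immediately."""
        key = uuid.uuid4().hex
        with self._lock:
            self._jobs[key] = {"status": "pending"}
        threading.Thread(
            target=self._run, args=(key, payload), daemon=True
        ).start()
        return key

    def _run(self, key, payload):
        try:
            outcome = {"status": "done", "result": self._work(payload)}
        except Exception as exc:
            outcome = {"status": "error", "error": str(exc)}
        with self._lock:
            self._jobs[key] = outcome

    def poll(self, key):
        """Second call: the result, an error, or a not-ready indicator."""
        with self._lock:
            job = self._jobs.get(key)
        return dict(job) if job is not None else {"status": "unknown"}


def wait_for_result(store, key, interval=0.05, timeout=5.0):
    """Client side: poll at a fixed interval until the job finishes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reply = store.poll(key)
        if reply["status"] != "pending":
            return reply
        time.sleep(interval)
    raise TimeoutError(f"job {key} not finished after {timeout} seconds")
```

The `interval` argument is where the trade-off shows up: a shorter interval means less delay before the client sees the result but more wasted calls. If `poll` also returned an estimate of the remaining time, the client could sleep for that long instead of a fixed interval.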

Posted Thu 20 April 2017 by mcherm in Programming