
I've lately been very interested in reactive microservice design with streaming and event-driven architectures.

When one writes (i.e., manages) the services, this paradigm works extremely well: each service simply responds to events published on topics.

However, we don't always control the implementation of all the services with which we interact. For example, consider a commercial service that exposes a black-box endpoint which does some asynchronous work and returns an ID for that job. It also exposes an endpoint for retrieving the status of that job by ID. For the sake of simplicity, we'll assume these two endpoints and a "get results" endpoint are the only exposed endpoints.

In this case, we are left in a situation where we must necessarily poll the service to check the status of the work. Are there established patterns for doing this? I'd guess a supervisor which polls on a timer and publishes the job status might be a "fine" approach, but I'm wondering if there are other battle-tested approaches.
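A minimal sketch of the timer-based supervisor described above, in Python with asyncio. Everything here is hypothetical: the client's `submit`/`get_status`/`get_results` methods stand in for the three vendor endpoints, the status strings `"DONE"` and `"FAILED"` are assumed, and `publish` is whatever producer writes to your event topic.

```python
import asyncio
from typing import Awaitable, Callable, Protocol

# Assumed terminal status strings; the real service's values are unknown.
TERMINAL = {"DONE", "FAILED"}

Publish = Callable[[str, dict], Awaitable[None]]


class BlackBoxClient(Protocol):
    """Hypothetical async client wrapping the vendor's three endpoints."""

    async def submit(self, payload: dict) -> str: ...
    async def get_status(self, job_id: str) -> str: ...
    async def get_results(self, job_id: str) -> dict: ...


async def supervise(client: BlackBoxClient, job_id: str, publish: Publish,
                    interval: float = 5.0, timeout: float = 600.0) -> str:
    """Poll one job on a fixed timer, publishing every status change."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last = None
    while True:
        status = await client.get_status(job_id)
        if status != last:  # only publish transitions, not every poll
            await publish("job.status", {"job_id": job_id, "status": status})
            last = status
        if status in TERMINAL:
            if status == "DONE":
                results = await client.get_results(job_id)
                await publish("job.results", {"job_id": job_id, "results": results})
            return status
        if loop.time() >= deadline:
            await publish("job.status", {"job_id": job_id, "status": "TIMED_OUT"})
            return "TIMED_OUT"
        await asyncio.sleep(interval)


class FakeClient:
    """Stand-in for the vendor API: a job reports DONE after a few status checks."""

    def __init__(self, checks_until_done: int = 3):
        self.checks_until_done = checks_until_done
        self.checks = 0

    async def submit(self, payload: dict) -> str:
        return "job-1"

    async def get_status(self, job_id: str) -> str:
        self.checks += 1
        return "DONE" if self.checks >= self.checks_until_done else "RUNNING"

    async def get_results(self, job_id: str) -> dict:
        return {"job_id": job_id, "answer": 42}


async def main() -> list:
    events = []

    async def publish(topic: str, event: dict) -> None:
        events.append((topic, event))  # a real system would write to a broker topic

    client = FakeClient()
    job_id = await client.submit({"input": "example"})
    await supervise(client, job_id, publish, interval=0.01)
    return events


if __name__ == "__main__":
    for topic, event in asyncio.run(main()):
        print(topic, event)
```

Running one coroutine per job keeps the logic simple; with many concurrent jobs, a single loop that checks all in-flight jobs per tick keeps the request volume against the vendor flat.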

erip
  • My first instinct here is to redesign so you don't care about the status of the work. Let the work get back to you when it gets back to you. Don't wait for it to have a unique identifier. Give it one from the start and pass it along. – candied_orange Mar 17 '18 at 15:53
  • I think I understand, but I think the trick comes from a client-perspective... someone will have to poll eventually, right? Polling on a client introduces a potential for a denial-of-service. Polling on the server introduces some potential for blocking. – erip Mar 17 '18 at 15:55
  • Callbacks and promises. Or polling periodically. Any service can be DOS'd; polling is a *client* mechanism. The server already knows when the service has been completed. – Robert Harvey Mar 17 '18 at 15:55
  • If this is about polling something you don't have control of I've already provided an answer [here](https://softwareengineering.stackexchange.com/a/349391/131624). – candied_orange Mar 17 '18 at 15:57
  • Polling is often the very first idea that comes to mind. After a while we realise that polling is inefficient, since most of the time it gets the very same result: *nothing new*. If possible, it is preferable to poll only when there is evidence of changes (other events) or, more simply, when someone actually wants to know the status. – Laiv Mar 17 '18 at 15:59
  • @CandiedOrange Indeed, this falls into your *U block and I think my proposed solution matches yours... My question is _is there a better way_. – erip Mar 17 '18 at 16:10
  • If I couldn't redesign the black box then "a supervisor which polls" is exactly what I'd do. I'd put it as close to the black box as I could to avoid the unneeded chatter over the network. – candied_orange Mar 17 '18 at 16:10
  • How "blackboxed" is the server microservice? Can you somehow add some code to its source? – Constantin Galbenu Mar 17 '18 at 16:19
  • In that case, since we would be redirecting the "hit" to an intermediary, make sure that intermediary is very good at doing IO. NodeJS is excellent for this kind of job, since reactive/asynchronous programming is natural to it, and it's fast at doing the job. – Laiv Mar 17 '18 at 16:28
  • @ConstantinGalbenu Completely closed source. It's commercial. – erip Mar 17 '18 at 17:25
  • @erip if it weren't closed, you could have used datagram packets sent on the network when the async job is done. Those packets would immediately trigger polling for new data, in addition to polling on some fixed interval. – Constantin Galbenu Mar 17 '18 at 17:29
  • @erip if it is on your servers you could tail its oplog to see when new data could be available. – Constantin Galbenu Mar 17 '18 at 17:32
  • @ConstantinGalbenu That's a cute idea... forcing it to become a streaming app. :P – erip Mar 17 '18 at 17:35
  • @erip stream-ish :) cute? I expected you to say brilliant :) – Constantin Galbenu Mar 17 '18 at 17:42
  • It's not helpful to downvote without explaining why. – erip Mar 17 '18 at 18:55
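The comments above suggest polling only when something hints at a change (a datagram from the server, an oplog entry, a user asking for the status) while keeping a periodic poll as a fallback. A minimal asyncio sketch of that idea; `HintedPoller`, `check`, `hint()` and `max_interval` are hypothetical names, and the hint source itself is left out.

```python
import asyncio
from typing import Any, Awaitable, Callable


class HintedPoller:
    """Poll when a change hint arrives, or after `max_interval` seconds at the latest.

    `check` is a hypothetical coroutine function that calls the status endpoint.
    `hint()` is called by whatever suggests the status may have changed
    (a datagram listener, an oplog tailer, a user request, ...).
    Create the poller inside a running event loop.
    """

    def __init__(self, check: Callable[[], Awaitable[Any]], max_interval: float):
        self.check = check
        self.max_interval = max_interval
        self._hint = asyncio.Event()

    def hint(self) -> None:
        self._hint.set()

    async def run_until(self, done: Callable[[Any], bool]) -> int:
        """Poll until `done(result)` is true; return how many polls were made."""
        polls = 0
        while True:
            result = await self.check()
            polls += 1
            if done(result):
                return polls
            try:
                # Sleep until a hint arrives, but never longer than max_interval.
                await asyncio.wait_for(self._hint.wait(), timeout=self.max_interval)
            except asyncio.TimeoutError:
                pass  # no hint: fall back to the periodic poll
            # Clearing before the next check is safe: that check reads fresh state.
            self._hint.clear()


async def demo() -> tuple:
    state = {"status": "RUNNING"}

    async def check() -> str:
        return state["status"]

    poller = HintedPoller(check, max_interval=60.0)

    async def finish_soon() -> None:
        await asyncio.sleep(0.05)
        state["status"] = "DONE"
        poller.hint()  # e.g. a "something changed" datagram was received

    loop = asyncio.get_running_loop()
    start = loop.time()
    signaller = asyncio.create_task(finish_soon())
    polls = await poller.run_until(lambda s: s == "DONE")
    await signaller
    return polls, loop.time() - start


if __name__ == "__main__":
    polls, elapsed = asyncio.run(demo())
    print(f"{polls} polls in {elapsed:.2f}s (the 60 s fallback was never needed)")
```

The fallback interval bounds how stale the status can get if a hint is lost, so it can be set much longer than a pure timer-based poll would need.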

1 Answer


I guess it's a bit late, but here is what I would do: I would put a wrapper between your part and the black box.

This wrapper handles creating the jobs and polling them. You have to make it so that only the wrapper can poll that service. The wrapper publishes an event each time a job finishes.
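A minimal sketch of such a wrapper in Python (asyncio), under stated assumptions: the vendor client and its three methods are hypothetical, `"DONE"`/`"FAILED"` are assumed terminal statuses, and `publish` stands for a producer on some "job finished" topic. One background loop polls every in-flight job, so the traffic to the black box depends on the poll interval rather than on how many callers are waiting. Callers pass a `correlation_id` at submission time, in the spirit of the comment suggesting you assign your own identifier up front, and it is echoed in the published event.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Publish = Callable[[dict], Awaitable[None]]
TERMINAL = ("DONE", "FAILED")  # assumed terminal statuses


class BlackBoxWrapper:
    """Sole gateway to the black box (hypothetical design).

    Callers submit work through `submit`; one background loop polls every
    in-flight job and publishes one event per job when it finishes.
    """

    def __init__(self, client, publish: Publish, interval: float = 5.0):
        self.client = client      # hypothetical vendor client: submit / get_status / get_results
        self.publish = publish    # e.g. a producer for a "jobs.finished" topic
        self.interval = interval
        self.in_flight: Dict[str, dict] = {}  # job_id -> caller metadata

    async def submit(self, payload: dict, correlation_id: str) -> str:
        job_id = await self.client.submit(payload)
        self.in_flight[job_id] = {"correlation_id": correlation_id}
        return job_id

    async def poll_once(self) -> None:
        """Check every in-flight job once; publish and forget the finished ones."""
        for job_id in list(self.in_flight):
            status = await self.client.get_status(job_id)
            if status not in TERMINAL:
                continue
            event = {"job_id": job_id, "status": status, **self.in_flight.pop(job_id)}
            if status == "DONE":
                event["results"] = await self.client.get_results(job_id)
            await self.publish(event)

    async def run(self) -> None:
        """Poll forever on a fixed timer; cancel the task to stop."""
        while True:
            await self.poll_once()
            await asyncio.sleep(self.interval)


class FakeClient:
    """Stand-in for the vendor API: each job is DONE on its second status check."""

    def __init__(self):
        self.checks: Dict[str, int] = {}

    async def submit(self, payload: dict) -> str:
        job_id = f"job-{len(self.checks) + 1}"
        self.checks[job_id] = 0
        return job_id

    async def get_status(self, job_id: str) -> str:
        self.checks[job_id] += 1
        return "DONE" if self.checks[job_id] >= 2 else "RUNNING"

    async def get_results(self, job_id: str) -> dict:
        return {"output": job_id.upper()}


async def main() -> None:
    async def publish(event: dict) -> None:
        print("published:", event)

    wrapper = BlackBoxWrapper(FakeClient(), publish, interval=0.01)
    poller = asyncio.create_task(wrapper.run())
    await wrapper.submit({"input": "a"}, correlation_id="req-1")
    await asyncio.sleep(0.05)
    poller.cancel()
    try:
        await poller
    except asyncio.CancelledError:
        pass


if __name__ == "__main__":
    asyncio.run(main())
```

Consumers then subscribe to the "job finished" topic and match on `correlation_id`; nothing outside the wrapper ever calls the black box directly.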

To avoid a DoS, you should have a limit on how many jobs can be submitted for processing at the same time. If the service doesn't give you one, come up with a reasonable number and have your wrapper either queue or refuse jobs when the number of jobs running in the black box is already at that limit. By "queuing", if you choose that option, I mean that your wrapper keeps its own queue of jobs to submit while the black box is full. That queue gets processed every time a slot frees up in the black box.
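The limit-then-queue-or-refuse policy can be sketched as a small admission controller. `Admission` is a hypothetical helper, not part of any library: `limit` is an assumed number (use the vendor's documented limit if there is one), `max_queue=0` means refuse instead of queueing, and `max_queue=None` means an unbounded queue. It only does the bookkeeping; the wrapper would call `offer` before submitting and `on_finished` whenever a job leaves the black box.

```python
from collections import deque
from typing import Deque, Optional


class Admission:
    """Caps how many jobs are inside the black box at once (hypothetical helper).

    max_queue=None -> queue without bound; max_queue=0 -> refuse when full.
    """

    def __init__(self, limit: int, max_queue: Optional[int] = None):
        self.limit = limit          # assumed cap; use the vendor's limit if documented
        self.max_queue = max_queue
        self.running = 0            # jobs currently submitted to the black box
        self.waiting: Deque[dict] = deque()  # the wrapper's own queue

    def offer(self, payload: dict) -> str:
        """Decide what to do with a new request: 'submit', 'queued' or 'refused'."""
        if self.running < self.limit:
            self.running += 1
            return "submit"
        if self.max_queue is None or len(self.waiting) < self.max_queue:
            self.waiting.append(payload)
            return "queued"
        return "refused"

    def on_finished(self) -> Optional[dict]:
        """Call when a job leaves the black box; returns the next payload to submit."""
        if self.waiting:
            return self.waiting.popleft()  # the freed slot goes straight to the oldest queued job
        self.running -= 1
        return None


if __name__ == "__main__":
    gate = Admission(limit=2, max_queue=1)
    for i in range(4):
        print(i, gate.offer({"job": i}))  # 0 submit, 1 submit, 2 queued, 3 refused
    print(gate.on_finished())             # {'job': 2} takes the freed slot
```

If you only ever queue and never refuse, an `asyncio.Semaphore(limit)` around the submit-and-wait section gives a similar effect with less code: extra tasks simply wait in `acquire()` until a slot frees up.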

The gist of it is that if you can't make some subsystem compatible with your system (legacy code, a foreign service, ...), you wrap it in something that makes it compatible, and nothing other than the wrapper may use the underlying subsystem.

Walfrat
  • I had completely forgotten about this question and this is exactly what I did - I created a proxy that basically acts as a publishing mechanism for the black box with some backpressure. :-) Thanks for the answer! – erip May 27 '20 at 11:27