
I am designing a workflow and am trying to avoid parallel deployments of the same service, so I am looking to have one service that handles both interactive and batch traffic. My main concern is how to ensure that my service can horizontally scale fast enough during large batch runs without interfering with the interactive traffic. Are there any design patterns for this? We are primarily using AWS technologies, Kubernetes, and JVM-deployed languages. It is also important to know that we will have two endpoints, with some traffic going through /service/interactive and some through /service/batch. We could use a few different mechanisms to throttle, but I think it's a bad experience for our batch users to have to retry if we throw a 429. We could also use something like a reply-to queue or a two-way queue for the batch traffic, but how would we scale up our service to handle more traffic if we have defined a fixed dequeue rate? Can we set the dequeue rate at the queue level instead of at each instance of the service, and can that number change dynamically?
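
To make the queue option concrete, the consumer I have in mind would look something like the sketch below (assuming SQS via the AWS SDK for Java v2; the queue URL, permit count, and process() method are placeholders). Note the dequeue rate is fixed per instance, which is exactly the part I don't know how to scale:

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Minimal sketch: each instance caps its own in-flight batch work with a
// semaphore, so the aggregate dequeue rate across the fleet is roughly
// (per-instance concurrency) x (instance count).
public class BatchQueueConsumer {
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/batch-requests"; // placeholder
    private static final int PER_INSTANCE_CONCURRENCY = 50; // fixed per instance

    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        ExecutorService workers = Executors.newFixedThreadPool(PER_INSTANCE_CONCURRENCY);
        Semaphore inFlight = new Semaphore(PER_INSTANCE_CONCURRENCY);

        while (true) {
            List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                    .queueUrl(QUEUE_URL)
                    .maxNumberOfMessages(10) // SQS maximum per poll
                    .waitTimeSeconds(20)     // long polling
                    .build()).messages();

            for (Message message : messages) {
                inFlight.acquireUninterruptibly(); // blocks when this instance is saturated
                workers.submit(() -> {
                    try {
                        process(message.body()); // the same ~3s calculation the interactive path runs
                        sqs.deleteMessage(DeleteMessageRequest.builder()
                                .queueUrl(QUEUE_URL)
                                .receiptHandle(message.receiptHandle())
                                .build());
                    } finally {
                        inFlight.release();
                    }
                });
            }
        }
    }

    private static void process(String body) { /* calculation logic */ }
}
```

As far as I can tell there is no queue-level dequeue rate in SQS; consumers pull, so the aggregate rate is just per-instance concurrency times instance count, and changing it dynamically means changing the number of consumers.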

Really, I am just looking for any patterns to handle both batch and interactive traffic in one service. Even if we have to have a parallel implementation for interactive/batch traffic, how do we scale batch since it all comes at once? The batches could come at different times throughout the day, so time-based scaling is not an option; besides, I have never been a fan of time-based scaling, as it is brittle.
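
For what it's worth, the closest single-service pattern I have found so far is a bulkhead: give each endpoint its own bounded thread pool so a batch burst queues behind its own threads instead of starving the interactive ones. A minimal sketch, assuming plain java.util.concurrent (pool sizes are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Bulkhead sketch: each endpoint submits work to its own bounded pool,
// so a batch burst cannot consume the interactive endpoint's threads.
public class Bulkheads {
    // sized for ~500 TPS x ~3s per call (see the numbers further down)
    public static final ExecutorService INTERACTIVE = Executors.newFixedThreadPool(1500);
    // whatever capacity we are willing to give batch on each instance
    public static final ExecutorService BATCH = Executors.newFixedThreadPool(200);
    // the /service/interactive handler submits to INTERACTIVE,
    // the /service/batch handler submits to BATCH
}
```

But this only isolates the two traffic types inside one instance; it caps batch throughput rather than scaling it, which is the part I am stuck on.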

To answer the first comment: our interactive process will see about 500 transactions per second constantly throughout the day, each transaction taking about 3 seconds in the service. Additionally, we will see batches of traffic to that interactive service that kick off about another 500 threads all at once to process through about 100,000 transactions. Thus, if one of those batches kicks off while I am already processing 500 transactions, it could really affect those "interactive" transactions that we do not want affected by the batch processing. Also note it takes about 90 seconds to spin up each additional instance.
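
Putting those numbers together as a back-of-envelope (Little's law: in-flight = arrival rate x time in service); the figures are the approximations above:

```java
// Back-of-envelope from the numbers above; figures are approximate.
public class CapacityMath {
    public static void main(String[] args) {
        double interactiveTps = 500;  // steady interactive arrival rate
        double latencySeconds = 3;    // time each transaction spends in the service
        double steadyInFlight = interactiveTps * latencySeconds; // Little's law: ~1,500 concurrent

        int batchThreads = 500;       // extra concurrency when a batch kicks off
        System.out.printf("steady in-flight ~%.0f; a batch adds %d (~+%d%% load) all at once%n",
                steadyInFlight, batchThreads, Math.round(100.0 * batchThreads / steadyInFlight));
    }
}
```

So a batch kick-off is roughly a one-third step increase in concurrent work, arriving faster than the ~90-second instance spin-up can absorb.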

Thanks in advance!

Brian
  • Can you give more real-world-like examples or use cases for what you call a "large batch process" and an "interactive process"? Ideally with some rough numbers, and some description of where you fear those things could interfere with each other? Currently, it is hard to tell what the actual problem is that you want to solve. And questions for "generic just-in-case patterns" are based on the wrong assumption that there are such patterns. – Doc Brown Sep 15 '21 at 05:20
  • Added some additional information; please let me know if you need more. – Brian Sep 15 '21 at 18:59
  • Your answer mentions some numbers now, but still no real-world use case or example. And still you are asking generically for patterns, which is not a clearly focused, answerable problem statement. I am actually unsure whether you are trying to optimize something in your system "just in case", or whether you are really experiencing failures in your batch processes caused by those interactive processes (or maybe vice versa). – Doc Brown Sep 15 '21 at 20:27
  • ... FWIW, assuming you are talking about write transactions (not just read-only batches, for example, as part of an ETL process), it might be a good idea to design the transactions in those batches in a repeatable, idempotent manner, so if parts of the transactions fail because they interfered with another process, you might be able to simply rerun the batch to let it process the missing transactions. But whether this would work really depends on what those processes actually *do*, which you forgot to tell us. – Doc Brown Sep 15 '21 at 20:33
  • @DocBrown, I am not sure how this is not a real-world use case. I am asking for a solution to handle two types of traffic in one application. There will be a constant flow of requests for calculations that take about 3 seconds. There will also be batches of traffic randomly throughout the day that request those same calculations. I am trying to determine the best way to ensure the batch requests do not affect the performance of the interactive requests, without making a duplicate deployment. As for your second comment, the batch workload is also synchronous, so pub/sub is not an option. – Brian Sep 16 '21 at 11:56
  • Maybe this is implied here, but can you explain what kinds of transactions we are talking about? That is, what is the percentage of reads versus writes? – JimmyJames Sep 16 '21 at 15:12
  • The transactions are more RPC. Basically, some of the data we need is input to our application, and some keys are provided... Given the keys, we collect the rest of the data we need from other services, all in parallel in async threads (everything that can be done in parallel is done in parallel). Once we have all of the data we need, we perform calculations. We store all of the results and return a result to the client. Not sure if this answers your question, but it isn't exactly a read or a write, it's RPC :) – Brian Sep 16 '21 at 15:27
  • I'd say the concept of RPC is orthogonal to read/write, but it doesn't help me understand the situation. Have you considered, or are you considering, creating a different kind of interface for the batch process that would allow you to request a batch or part of a batch in one request? – JimmyJames Sep 16 '21 at 16:05
  • So, I will have different endpoints for each, process/interactive and process/batch, but the logic that needs to execute is exactly the same between the two. Again, I am really just looking for patterns for ensuring traffic from one resource does not impact traffic from another resource (even though they do the exact same thing). Basically, my services can't scale fast enough if there is a huge batch. I know I can use queues or throttling, but I'm just looking to see if there are other alternatives. Thanks again in advance for all the conversation. Really trying to provide all the details needed. – Brian Sep 16 '21 at 17:04

1 Answer


Based on the information you've provided, there's one main approach I would recommend, plus an incremental design change to it that could be useful.

The main thing would be to use dedicated instance(s) for your batch processes. Batch performance is measured in throughput, so I would not expect an initial delay of 90 seconds before the start of a batch of 100,000 requests to be an issue, and dedicated instances keep the batch processing from competing with the interactive service for resources. This also allows you to shut down the instance(s) once the batch is complete.
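
As a rough sanity check on that claim, using the numbers from your question (a back-of-envelope, not a benchmark):

```java
// Rough check that a 90s spin-up is small relative to the batch itself.
public class BatchStartupOverhead {
    public static void main(String[] args) {
        int batchSize = 100_000;    // transactions per batch
        int batchThreads = 500;     // concurrent batch workers
        double latencySeconds = 3;  // per transaction

        double runtimeSeconds = (double) batchSize / batchThreads * latencySeconds; // ~600s (~10 min)
        double startupSeconds = 90; // instance spin-up
        System.out.printf("batch ~%.0fs; startup adds ~%.0f%%%n",
                runtimeSeconds, 100 * startupSeconds / runtimeSeconds); // ~15%
    }
}
```

Roughly ten minutes of processing against 90 seconds of spin-up, i.e. about 15% overhead on a throughput-oriented workload.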

There's a huge caveat here: you have shared dependencies between these two types of use. In my experience, this is the more difficult problem to manage. If you need to manage the volume of calls against those shared resources, that's a different kind of problem that I will not address here; I'm assuming the dependencies in this case can handle whatever you might throw at them.

The tweak to this solution is to move the batch process to the service container/pod itself. That is, instead of running the batch on some other machine that is calling services on the dedicated container, the batch is now co-located with the service code in a container/pod. Whether you want to continue to call the service within the local network of the pod or actually execute the underlying service code directly from the batch process is up to you.

The big benefit here is that there is no longer any coordination needed between the batch process and the container management. When you run the batch, it spins up the container/pod and does its work. When the batch completes, the container/pod is removed. If you don't co-locate, you can set up Kubernetes so there is a minimum of 0 instances required for the batch endpoint; this will give you a similar result, except that the instance will continue to run for some time after the batch has exited.
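
A minimal sketch of that co-located entry point (CalculationService and the input path are hypothetical stand-ins for your service code): it calls the calculation logic directly rather than over HTTP, and it exits when the batch is drained, so when run as, say, a Kubernetes Job, the pod completes and is removed:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of a batch entry point co-located with the service code.
// CalculationService stands in for the class behind /service/interactive;
// it is invoked directly instead of through the HTTP endpoint.
public class BatchMain {
    public static void main(String[] args) throws Exception {
        CalculationService service = new CalculationService();
        ExecutorService workers = Executors.newFixedThreadPool(500); // batch concurrency

        List<String> requests = Files.readAllLines(Path.of(args[0])); // batch input, e.g. mounted into the pod
        for (String request : requests) {
            workers.submit(() -> service.calculate(request));
        }

        workers.shutdown();                           // no new work
        workers.awaitTermination(1, TimeUnit.HOURS);  // wait for the batch to drain
        // JVM exits here; as a Kubernetes Job, the pod completes and is removed.
    }
}

class CalculationService {
    void calculate(String request) { /* the same ~3s calculation */ }
}
```

The design point is simply that batch lifetime and pod lifetime become the same thing, which is what removes the coordination.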

JimmyJames