I am designing a workflow and am trying to avoid parallel deployments of the same service, so I am looking to have one service that handles both interactive and batch traffic. My main concern is how to ensure that the service can horizontally scale fast enough during large batch runs without interfering with the interactive traffic. Are there any design patterns for this? We are primarily using AWS technologies, Kubernetes, and JVM-based languages. It is also important to know that we will have two endpoints, with some traffic going through /service/interactive and the rest through /service/batch. We could use a few different mechanisms to throttle, but I think it's a bad experience for our batch users to have to retry if we throw a 429. We could also use something like a reply-to queue (or a two-way queue) for the batch traffic, but how would we scale the service up to handle more traffic if we have defined a fixed dequeue rate? Can the dequeue rate be set at the queue level instead of on each instance of the service, and can that number change dynamically?
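To make the queue option concrete, here is roughly the shape of batch consumer I am picturing, sketched against SQS with the AWS SDK for Java v2. The queue URL, the in-flight limit of 50, and the process() helper are all placeholders, not real values from our system:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

// Illustrative only: each instance long-polls the batch queue and caps its own
// in-flight work with a semaphore, so the per-instance dequeue rate is fixed.
public class BatchQueueWorker {

    private static final String BATCH_QUEUE_URL = System.getenv("BATCH_QUEUE_URL"); // placeholder
    private static final int MAX_IN_FLIGHT = 50;                                    // placeholder

    public static void main(String[] args) throws InterruptedException {
        SqsClient sqs = SqsClient.create();
        ExecutorService workers = Executors.newCachedThreadPool();
        Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

        while (true) {
            // Block before polling so we never pull more messages than we can process.
            inFlight.acquire();
            var response = sqs.receiveMessage(ReceiveMessageRequest.builder()
                    .queueUrl(BATCH_QUEUE_URL)
                    .maxNumberOfMessages(1)
                    .waitTimeSeconds(20)   // long polling
                    .build());

            if (response.messages().isEmpty()) {
                inFlight.release();
                continue;
            }

            Message msg = response.messages().get(0);
            workers.submit(() -> {
                try {
                    process(msg.body());   // the ~3 second unit of work
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(BATCH_QUEUE_URL)
                            .receiptHandle(msg.receiptHandle())
                            .build());
                } finally {
                    inFlight.release();
                }
            });
        }
    }

    private static void process(String body) {
        // placeholder for the real transaction
    }
}
```

With something like this, the effective dequeue rate per instance is MAX_IN_FLIGHT divided by the ~3 second processing time, so the only lever I can see for scaling is adding instances, which is why I am asking whether that rate can live at the queue level and change dynamically instead.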
Really, I am just looking for any patterns to handle both batch and interactive traffic in one service. Even if we have to have a parallel implementation for interactive and batch traffic, how do we scale the batch side when it all arrives at once? The batches could come at different times throughout the day, so time-based scaling is not an option; I have also never been a fan of time-based scaling, as it is brittle.
To answer the first comment: our interactive traffic will be a fairly constant 500 transactions per second throughout the day, with each transaction taking about 3 seconds in the service (so roughly 1,500 transactions in flight at steady state). On top of that, batches hitting the same service will kick off about another 500 concurrent requests all at once to work through about 100,000 transactions. So if one of those batches kicks off while I am already processing the interactive load, it could really affect those "interactive" transactions that we do not want affected by the batch processing. Also note that it takes about 90 seconds to spin up each additional instance.
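To illustrate the kind of isolation I am hoping for inside a single deployment, I am imagining something like separate bounded pools per endpoint, so a batch burst can only exhaust its own threads. The pool sizes below are made-up numbers for the sketch, not a recommendation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative bulkhead: /service/interactive and /service/batch each get their
// own pool, so batch work queues up behind its own threads instead of starving
// the interactive traffic.
public class BulkheadedService {

    // Sized for the steady interactive load (~500 TPS * 3 s ≈ 1,500 in flight).
    private final ExecutorService interactivePool = Executors.newFixedThreadPool(1500);

    // Batch gets a smaller bounded pool plus a bounded queue; when the queue is
    // full, CallerRunsPolicy makes the submitting thread do the work, which
    // naturally slows batch intake instead of returning a 429.
    private final ThreadPoolExecutor batchPool = new ThreadPoolExecutor(
            100, 100, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(10_000),
            new ThreadPoolExecutor.CallerRunsPolicy());

    public void handleInteractive(Runnable txn) {
        interactivePool.submit(txn);
    }

    public void handleBatch(Runnable txn) {
        batchPool.execute(txn);
    }
}
```

That covers isolation within one instance, but it still leaves the question of how to get more instances online quickly enough given the ~90 second spin-up time.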
Thanks in advance!