I'm having trouble coming up with a solution to control the number of requests per minute to an external system in a microservices environment on Kubernetes.
The scenario
This external system is an e-mail marketing application (called Responsys) that permits only a limited number of requests per minute for each login. Some types of e-mails use two requests and some use just one*.
Currently, each system that needs to send an e-mail publishes a message to a RabbitMQ queue, and one of our microservices is responsible for consuming those messages, reading their contents, and communicating with Responsys while obeying a 40-requests-per-minute limit.
The current solution
The current working version of this integration fetches 20 messages per minute from the queue using a simple scheduled process. Why 20? In the worst case, those 20 e-mails will consume two requests each. Each e-mail is processed asynchronously, so all 20 communicate with Responsys at roughly the same time. E-mails that could not be processed (Responsys can return errors) are saved to a database table to be analyzed later.
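To make the mechanism concrete, here is a minimal sketch of that scheduled batch consumer. This is not our real code: `email_queue` is an in-memory stand-in for the RabbitMQ consumer, and `send_to_responsys` is a hypothetical name for the actual API call.

```python
import queue
import threading

# Hypothetical in-memory stand-in for the RabbitMQ queue; in the real
# service this would be a consumer pulling messages from the broker.
email_queue = queue.Queue()

# Worst case: 20 e-mails * 2 requests each = 40 requests per minute.
BATCH_SIZE = 20

def drain_batch(q, batch_size=BATCH_SIZE):
    """Pull up to `batch_size` messages from the queue without blocking."""
    batch = []
    for _ in range(batch_size):
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # fewer than batch_size messages were waiting
    return batch

def process_batch(batch):
    # Each e-mail is handled asynchronously in the real service; failed
    # ones are persisted to a database table for later analysis.
    for message in batch:
        pass  # send_to_responsys(message)  -- hypothetical API call

def run_every_minute():
    process_batch(drain_batch(email_queue))
    threading.Timer(60, run_every_minute).start()
```

The batch size, not the consumer, is what enforces the limit here, which is exactly why adding a second consumer instance breaks it.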
This works pretty well today, even though it's not optimal (some types of e-mails use only one request). But there is a problem with this solution that can break our request limit.
The problem
Kubernetes can decide at some point, based on its autoscaling algorithms, that one more instance of the microservice (the one that integrates with Responsys) is necessary. If that happens, our request limit will be broken, because two (or more) instances will be reading messages from the queue and sending e-mails through Responsys, surpassing the 40 requests per minute.
I had the idea of configuring the microservice on Kubernetes to never create replicas, guaranteeing a single instance, since this microservice is quite simple and specialized. I don't know exactly how to do that yet, but it seems straightforward from the Kubernetes documentation. My colleagues don't like the idea, though, because there may be some odd failure scenario in which two instances could exist.
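For reference, pinning the Deployment to a single replica would look roughly like this (all names are placeholders). Note that my colleagues' worry is real with the default `RollingUpdate` strategy, where the old and new pod briefly run at the same time during a deploy; `strategy: Recreate` tells Kubernetes to kill the old pod before starting the new one:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: responsys-worker        # placeholder name
spec:
  replicas: 1                   # never scale this worker out
  strategy:
    type: Recreate              # terminate the old pod before the new
                                # one starts, so two instances never
                                # run at the same time during a deploy
  selector:
    matchLabels:
      app: responsys-worker
  template:
    metadata:
      labels:
        app: responsys-worker
    spec:
      containers:
        - name: worker
          image: example/responsys-worker:latest   # placeholder image
```

This also requires making sure no HorizontalPodAutoscaler targets this Deployment, since an HPA would override `replicas`.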
So we are trying to think of a solution that doesn't depend on the instance count, using some kind of "ticket system" read from a cache (Redis) shared by any number of microservice instances. This seems like a heavy solution for a simple problem, so I would like some help finding an alternative.
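For completeness, the "ticket system" we have in mind is essentially a fixed-window counter keyed by the current minute: each instance atomically increments a shared counter by the cost of the e-mail (1 or 2 requests) and only proceeds if the total stays within the budget. A minimal sketch, with `FakeRedis` as an in-memory stand-in so the example is self-contained; with a real Redis client the same logic maps to atomic `INCRBY` plus `EXPIRE` on the window key:

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis operations the sketch needs."""
    def __init__(self):
        self.store = {}  # key -> (value, expiry timestamp or None)

    def incrby(self, key, amount, now=None):
        now = time.time() if now is None else now
        value, expires = self.store.get(key, (0, None))
        if expires is not None and now >= expires:
            value = 0  # stale window key: counter effectively expired
        self.store[key] = (value + amount, expires)
        return value + amount

    def expire(self, key, seconds, now=None):
        now = time.time() if now is None else now
        value, expires = self.store.get(key, (0, None))
        if expires is None:
            self.store[key] = (value, now + seconds)

LIMIT = 40  # shared Responsys budget per minute (most restrictive endpoint)

def try_acquire(r, cost, now=None):
    """Reserve `cost` requests from the shared per-minute budget.

    Returns True if the caller may proceed; False means the window's
    budget is exhausted and the message should wait for the next minute.
    """
    now = time.time() if now is None else now
    window = int(now // 60)                 # one counter per minute
    key = f"responsys:{window}"
    used = r.incrby(key, cost, now=now)     # atomic in real Redis
    r.expire(key, 120, now=now)             # housekeeping: old keys decay
    return used <= LIMIT
```

Because the increment happens before the check and each instance sees the post-increment total, instances never over-admit even when they race; the only cost of a rejected attempt is a counter slot that expires with the window.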
* I simplified the problem: the per-minute request limits actually differ between two endpoints. One permits 200 requests per minute and the other 40. I will throttle using the limit of the most restrictive endpoint.