2

Load balancing incoming traffic is a common topic, but there isn't as much said about load balancing outgoing traffic. I may have an application that can prepare 1mm requests/s but is unable to send them all out at that speed because of network constraints.

Is there a concept of an out-balancer? Are there any ready-made solutions out there that can take, say, a csv of 1mm requests and distribute them across nodes and send them?

[Diagram: an application feeds requests to an "Out Balancer Coordinator", which distributes them across several out-balancer nodes that each send traffic out to the internet]

The "Out Balancer Coordinator" could just be a queue. But that's getting into DIY territory where I'm wondering if something like this already exists.

Edit:

I see a lot of answers that describe queuing and processing pools. We do use queues and processing pools. To increase output we add more nodes to the processing pool. In that way we can scale up and down dynamically (albeit too slow for what we need) to meet throughput needs.

My question must not make much sense, and that's partly what I was wondering: whether it made sense at all.

We use load balancers for traffic coming into the system; with them we don't need to worry about TLS termination, and many millions of connections can be consolidated into tens of thousands, allowing downstream servers to handle requests much quicker with less overhead.

I was looking for a similar service for traffic going out. I'd like to write applications that only need to generate requests, and let an "out-balancer" handle sending those requests as efficiently as possible (TLS and all). Instead of having 100 servers at the ready to ensure there is enough compute to make 1mm HTTPS requests per second, I could reduce that to 5-10.

Ideally, this "out-balancer" is a managed service somewhere else where I could pay for requests/s and wouldn't need to pay for an always-on server.

It is both a conceptual question and a real-world question.

micah
  • 195
  • 2
  • 1
    If you treat your requests as external to the publisher, then you can use the same load balancer idiom, no? The load balancer simply accepts a bunch of requests and parcels them out to a pool of resources. – Kristian H Dec 28 '20 at 15:07
  • 1
What are the lines that point to the blue "internet" icon? Are they separate WAN links? – Kind Contributor Dec 29 '20 at 03:32
  • Also, what are these "requests"? Are they Web Service HTTP Requests? Are you reading from a CSV file, then calling an API for each record? – Kind Contributor Dec 29 '20 at 03:32
  • Do you have a real-world problem with a network bottleneck? Or do you have a conceptual idea about out-balancer and you're trying to think of a scenario where it would be needed? – Kind Contributor Dec 29 '20 at 03:34

4 Answers

5

Surge Queue

One component close to the "Out Balancer Coordinator" is the surge queue in the AWS Elastic Load Balancer. It is reported through the SurgeQueueLength metric, which represents the total number of requests (HTTP listener) or connections (TCP listener) pending routing to a healthy instance. The maximum size of the queue is 1,024; additional requests or connections are rejected when the queue is full.
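To make that behavior concrete, here is a minimal Java sketch of a bounded surge queue. This is not AWS's actual implementation; the Request type is a placeholder, and the 1,024 limit is simply the figure quoted above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of surge-queue behavior: buffer up to a fixed capacity,
// reject anything that arrives while the queue is full.
class SurgeQueue {
    private final BlockingQueue<Request> pending = new ArrayBlockingQueue<>(1024);

    /** Returns true if the request was buffered, false if it was rejected. */
    boolean accept(Request request) {
        return pending.offer(request); // offer() fails fast when the queue is full
    }

    /** Called by the routing loop whenever a healthy instance is available. */
    Request next() throws InterruptedException {
        return pending.take(); // blocks until a request is pending
    }

    record Request(String payload) {} // placeholder request type
}
```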

The metric that determines "balancing out" is HealthyHostCount: the number of healthy instances registered with the load balancer. A newly registered instance is considered healthy after it passes the first health check. Besides the healthy-host count, the network/bandwidth limit is also a factor, because both eventually show up as higher latency and slower responses.

Backend service based load balancing

There are similar buffer mechanisms in the Google Cloud load balancer too, for example the Global Software Load Balancer (GSLB) and the Google Front End (GFE). The load balancer finds the nearest GFE location with available capacity; the GFE performs backend health checks, accepts and buffers HTTP requests, and routes them to the nearest VM instance group with available capacity. See this doc and the Google SRE chapter for more details.
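As a rough illustration of "route to the nearest backend with available capacity", here is a small sketch; the Backend fields and the capacity model are assumptions for the example, not Google's actual data model.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Rough sketch: pick the closest backend that still has spare capacity.
class CapacityAwareRouter {
    record Backend(String name, int distanceMs, int inFlight, int capacity) {
        boolean hasCapacity() { return inFlight < capacity; }
    }

    Optional<Backend> pick(List<Backend> backends) {
        return backends.stream()
                .filter(Backend::hasCapacity)                       // only backends with spare capacity
                .min(Comparator.comparingInt(Backend::distanceMs)); // closest one wins
    }
}
```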

Autoscaling

Whether or not the balancer uses a queue, a dependency analysis may still be needed to guide you toward preemptively implementing autoscaling, so that the backends are not overloaded.
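As an example of the kind of rule such an analysis might produce, here is a hypothetical scale-out policy driven by queue depth; the per-node throughput and the drain-time target are assumed to be known, and all names are illustrative.

```java
// Hypothetical scale-out rule: if the backlog would take longer than a target
// drain time at the current per-node throughput, add nodes.
class AutoscalePolicy {
    private final int requestsPerSecondPerNode; // measured capacity of one node (assumed known)
    private final int maxDrainSeconds;          // how long a backlog is allowed to linger

    AutoscalePolicy(int requestsPerSecondPerNode, int maxDrainSeconds) {
        this.requestsPerSecondPerNode = requestsPerSecondPerNode;
        this.maxDrainSeconds = maxDrainSeconds;
    }

    int desiredNodeCount(long queueDepth) {
        // throughput needed to drain the backlog in time, rounded up
        long requiredThroughput = (queueDepth + maxDrainSeconds - 1) / maxDrainSeconds;
        // nodes needed to provide that throughput, rounded up, at least one
        return (int) Math.max(1, (requiredThroughput + requestsPerSecondPerNode - 1) / requestsPerSecondPerNode);
    }
}
```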

lennon310
  • 3,132
  • 6
  • 16
  • 33
2

Request throttling is how we handle a tsunami of outbound traffic. Instead of blasting requests out as fast as you can, pause for some number of milliseconds or seconds between requests.

This technique is definitely DIY territory, but can be as simple as Thread.Sleep(millisecondsToWait) — a one-liner.

This is also used when the destination servers only allow so many requests over a period of time.

If this naive approach still doesn't work, most tech stacks have libraries that make it easy to implement a queue system, and even then you might be pausing the thread between requests.
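For illustration, a minimal Java sketch of this kind of throttling; the target rate is just an example, and the Runnable stands in for whatever actually makes the call.

```java
// Naive outbound throttle: cap the send rate by sleeping between requests.
class Throttler {
    private final long pauseMillis;

    Throttler(int maxRequestsPerSecond) {
        this.pauseMillis = 1000L / maxRequestsPerSecond; // pause between sends
    }

    void sendAll(Iterable<Runnable> requests) throws InterruptedException {
        for (Runnable sendRequest : requests) {
            sendRequest.run();          // fire the request
            Thread.sleep(pauseMillis);  // then wait before the next one
        }
    }
}
```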

Greg Burghardt
  • 34,276
  • 8
  • 63
  • 114
  • 5
    You seem to be saying that the solution to sending 1 million requests per second is to *not* send 1 million requests per second. – user253751 Dec 28 '20 at 12:49
  • @user253751 - yup. That's what I'm saying. If at all possible, and you can avoid it, don't send 1,000,000 requests as quick as you can. – Greg Burghardt Dec 28 '20 at 14:08
  • if your application involves sending 1 million requests per second, then it does. If your application involves sending 1 million requests in total, it should send them as quick as it can. – user253751 Dec 28 '20 at 14:13
  • 2
    @user253751 - well, send 1,000,000 requests if your application needs it. But if the network or the other system is having trouble handling the load, you either need to increase the capacity of those other systems, or dial back the requests you are making. That's really the only two choices you have. – Greg Burghardt Dec 28 '20 at 15:23
0

Is there an output balancer concept? Sort of.

I would look at it this way: every output is the input to something else. What you are facing here is a capacity question: each processing node and communication link has a transaction processing capacity (transactions per second) which cannot be exceeded. Your mission is to understand the evolution of the various loads the system will encounter over time, determine the maximum loads that will be encountered by each component, and make sure that transactions that cannot be processed immediately have someplace to wait. For example, you don't say how often the 1,000,000 / second transaction rate occurs. If it happens for up to five seconds in one hour, but the rest of the time the rate is zero, that is different than if it averages 10,000 per second with bursts of 1,000,000 per second happening as often as once per minute.
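As a back-of-the-envelope example of that kind of sizing (the numbers below are illustrative, not taken from the question):

```java
// Rough backlog sizing for a burst scenario: how much must wait somewhere,
// and how long it lingers once the burst ends.
public class BurstSizing {
    public static void main(String[] args) {
        long burstRate = 1_000_000;   // requests/s arriving during the burst (assumed)
        long burstSeconds = 5;        // how long the burst lasts (assumed)
        long drainRate = 100_000;     // sustained requests/s the system can actually send (assumed)

        long backlog = (burstRate - drainRate) * burstSeconds;  // transactions that must wait somewhere
        long drainSeconds = backlog / drainRate;                // time to clear the backlog after the burst

        System.out.printf("Peak backlog: %,d requests; drains in about %,d s%n", backlog, drainSeconds);
    }
}
```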

This was the kind of question we faced often in my bank data-processing job in the 1980s when we were transitioning banks from a batch environment to an "online" environment with ATMs and "live" teller terminals. In batch, we collected all the transactions from a branch at the end of the day onto magnetic tape, and then physically transported it to a data center, where the transactions would be processed "after hours" with thousands of similar tapes, while the bank was closed for the night. When designing our ATM/teller/back-end replacement system, we sometimes increased capacity in one node only to find that something downstream in the data flow had to be beefed up, as well, and on and on from the origin of the transaction to its eventual conclusion. Remember the inescapable law of flow: average transaction output rate of any part of the system has to equal the average input rate! (Unless you are allowing input to be ignored).

Earlier, I mentioned the maximum flow rates through processing and communication nodes, and "places to wait" while transactions wait for their turns. This concept can also include the notion that the user might get a "busy... please wait a few seconds and resubmit" message. Hope this helps.

lennon310
  • 3,132
  • 6
  • 16
  • 33
Mr. Lynch
  • 98
  • 4
  • Definitely interesting. It's just as you describe where we go from near 0 most of the time to millions in a few seconds, then back down to 0. We do use queuing for this. – micah Jan 04 '21 at 04:33
0

You can use a message queue (e.g. Azure Service Bus) as the Out LoadBalancer Coordinator. Just blindly put the requests you want to distribute into the queue. This also decreases the risk of dropping requests, and retrying is straightforward in this scenario.

Now you just need a dynamic number of queue consumers that process the outbound requests, with the work distributed evenly. Within each box (Out-Balancer Node) you can run multiple queue-consumer instances, sized to that machine's configuration.
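A generic sketch of one such node, using an in-memory queue and a fixed consumer pool; a real deployment would pull from the managed broker (e.g. Azure Service Bus) instead of this in-memory queue, and sendOverHttps() is only a placeholder.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of an out-balancer node: a pool of consumers pulls requests from a
// shared queue and sends them out. Names and structure are illustrative.
class OutBalancerNode {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService consumers;

    OutBalancerNode(int consumersPerNode) {
        this.consumers = Executors.newFixedThreadPool(consumersPerNode);
        for (int i = 0; i < consumersPerNode; i++) {
            consumers.submit(this::consumeLoop);
        }
    }

    void enqueue(String request) {
        queue.add(request);
    }

    private void consumeLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String request = queue.take();  // wait for the next outbound request
                sendOverHttps(request);         // placeholder for the actual TLS/HTTP send
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void sendOverHttps(String request) {
        // Hypothetical: a real implementation would reuse pooled TLS connections here.
    }
}
```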

Glorfindel
  • 3,137
  • 6
  • 25
  • 33