
A few years ago, I wrote an application that allows users to upload a file (of a very specific type) to a server. The application then sends instructions to the server on how to visualise the data, and the server presents the data as either a dot plot or a histogram. The PNG is constructed on the server side and streamed back to the user. Here's an example of two graphs produced by my application:

[image: example dot plot and histogram produced by the application]

The technology I used was NodeJS and MongoDB. I'm now revisiting the application as I've received many requests from users to add new features, and some complaints about how slow it is to use.

There are a few issues with the stack. When a user wants to get a graph, an HTTP GET request is made to the server, and a lot of heavy computational work happens within that request. The data is looped through, and the graph is constructed by figuring out the position of each data point on the graph; that positioning is the most computationally heavy part. I won't go into it here, but each data point is run through a long formula to find its correct position (and some files contain over 1,000,000 data points). While all of this is happening inside the GET request, because I'm using Node, all other requests are blocked, so my application can only handle one request at a time.
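The blocking behaviour described here can be sketched in a few lines (`computePositions` is a hypothetical stand-in for the real per-point formula):

```javascript
// Stand-in for the real per-point formula; the exact math doesn't
// matter, only that it runs synchronously on the event-loop thread.
function computePositions(n) {
  let acc = 0;
  for (let i = 0; i < n; i++) acc += Math.sqrt(i) * Math.sin(i);
  return acc;
}

const start = Date.now();
computePositions(5_000_000); // roughly the data-point counts described above
const blockedMs = Date.now() - start;
// For the entire blockedMs interval, this Node process could not
// accept, parse, or answer any other HTTP request.
```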

I'm looking for suggestions on how to re-architect it to handle multiple requests concurrently and to draw the graphs more quickly (presumably this is mostly about CPU power). One high-level approach I'm considering:

  1. User makes a request for a graph
  2. Node takes the request and notifies an AWS Lambda function (the HTTP request ends here), which does all the heavy computation and produces a PNG
  3. The PNG is streamed back to the user (I'm not sure how to do this).

All of this has to happen within a few seconds - the user is waiting for a graph to appear. I'm not sure my suggested approach would be very user friendly, as AWS Lambda needs time to start up, and the user may be left waiting a long time for the graph.

Any suggestions would be greatly appreciated.

Mark
  • Off the top of my head, I would look into using database streaming and sockets – W.K.S Dec 26 '20 at 17:09
  • What about doing the rendering fully locally at the client side? With HTML5 and Javascript how it looks today in all relevant browsers, wouldn't it be possible to produce the images without any upload and processing on the server? – Doc Brown Dec 26 '20 at 23:26
  • @DocBrown That was actually how I initially did it, and it worked for small files, but many users have much larger ones; in those cases there is simply too much data for the browser to handle, and it tended to crash – Mark Dec 27 '20 at 07:55
  • @Mark: who are the users of this system / what kind of assumptions can you make for their clients (hardware & browser)? Maybe you simply have to optimize that approach? – Doc Brown Dec 27 '20 at 08:48
  • @DocBrown They are medical researchers and doctors. Problem is that these files are growing bigger quickly as the technology improves - quicker than the improvement in browsers. Also the software works on every device at the moment - something I couldn't guarantee with a frontend approach. – Mark Dec 27 '20 at 09:43
  • @Mark: if you need to have this work with very different clients, including "small" devices like smart phones or thin clients, using a server-based approach is probably the only option which avoids the necessity of creating individual applications for each device. But then your users will have to live with some latency, that's unavoidable. – Doc Brown Dec 27 '20 at 09:51
  • I think the issue is that you do not use NodeJS for CPU intensive processes. It is single threaded! – oshaiken Jan 08 '21 at 16:12
  • @oshaiken Too late for that! – Mark Jan 08 '21 at 19:43
  • In this case you can only scale horizontally, because adding more CPU will not yield the expected results. – oshaiken Jan 08 '21 at 23:31
  • @oshaiken Why do you say that? – Mark Jan 09 '21 at 20:20
  • You should be able to handle this with a [worker thread](https://nodejs.org/api/worker_threads.html). Using [Async Local Storage](https://www.freecodecamp.org/news/async-local-storage-nodejs/) might provide some benefit as well (depending on how things are setup). Alternatively, consider the possibility of doing it all client side in a [web worker](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers). – RaspberryK Jan 08 '21 at 21:22

5 Answers

4

Node.js is good for I/O-intensive tasks but not for CPU-intensive ones, because it runs everything on a single-threaded event loop.

There are several possible optimizations to address this issue:

One server

As others mentioned, you should definitely run the CPU-heavy task separately from the request-handling part.

  • Cache the rendered image if certain graphs are requested very often.
  • Run the image computation in another process with the child_process module. You can create multiple processes, but you need to benchmark the "best" number, and with multiple processes you may want a pool that assigns each task to an idle child.
  • To make better use of multi-core systems, the webworker-threads module (or the built-in worker_threads module in recent Node versions) provides the asynchronous API for CPU-bound tasks that plain Node.js lacks.
  • To accelerate the image computations, you could use a GPU library to parallelize the work.
  • You may further separate the tasks by splitting your application in two: one service (service_1) handles the request and polls for the response image, the other (service_2) just handles the computations. Using AWS Lambda is fine, but if you do the computation inside the Lambda handler it may be limited by resources (RAM, CPU). An alternative is AWS SQS: service_2 picks request messages from the queue and computes the image. When the computation is done, the image data can be stored in a database (Redis, for example, with an expiry policy), and a notification is sent via SQS again. Service_1 subscribes to that queue (or polls it periodically if the RPS is not too high), loads the image data from the database, and renders the image.

Multiple servers

It is worth profiling a single server with a load test to understand how many concurrent requests it can handle. If you need to handle more requests than your benchmark indicates, add more instances, deploy your application to all the servers, and put a load balancer in front to forward each request to one of them.

lennon310
2

You definitely want to separate the computationally intensive part into its own service. I would even separate it from the part that generates the PNG image. The important thing is to make the process asynchronous: while the numbers are being crunched, serve a PNG that basically says "please wait," or return a 202 Accepted response.

NodeJS might not be the best technology choice for the CPU-intensive parts. Since the process is visual in nature, consider a technology that scales well with additional CPUs, or even GPUs if available, and make spinning up new threads for additional users as cheap as possible.

Once the raw data for the graph is ready, push it to another service that creates the PNG file.

When in doubt, separate the I/O and CPU intensive tasks from the code that returns data to the user. Plan for eventual consistency. Choose the technology that best fits the task, and can be scaled well.

Greg Burghardt
  • Thanks. Actually I ran some tests on the computationally intensive part between Java, NodeJS and Python. Python performed best but NodeJS actually performed better than Java. Could you point to a project on Github that is architected like you suggest? – Mark Dec 27 '20 at 08:00
2

The graph should be rendered in a separate process. Your response to the GET request should now look like this:

  • If the requested graph is in the cache, serve that cached graph image.
  • Otherwise, check if the job queue contains a job to generate that image. If not, add such a job to the job queue.
  • Serve some kind of "please wait" content.

All the task in the job queue needs to do is generate the image and place it in the cache.
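The three steps above might be sketched like this, with an in-memory cache and a naive in-process job queue (the names are illustrative; in production the queue would be drained by worker threads or processes rather than inline):

```javascript
const cache = new Map();   // graphKey -> rendered PNG buffer
const queued = new Set();  // graphKeys that already have a pending job
const jobQueue = [];

function handleGraphRequest(graphKey, renderFn) {
  if (cache.has(graphKey)) {
    return { status: 'ready', image: cache.get(graphKey) };
  }
  if (!queued.has(graphKey)) {           // avoid duplicate jobs
    queued.add(graphKey);
    jobQueue.push(() => {
      cache.set(graphKey, renderFn(graphKey));
      queued.delete(graphKey);
    });
  }
  return { status: 'pending' };          // serve "please wait" content
}

// A worker would drain jobQueue in the background; simulated inline here.
function drainQueue() { while (jobQueue.length) jobQueue.shift()(); }
```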

> All this has to happen within a few seconds - the user is waiting for a graph to appear. I'm not sure that my suggested approach would be very user friendly as AWS Lambda needs time to start-up and the user may be waiting around a long time for the graph.

It's always been the case that the user has had to wait for their graph to be generated. Generating it in a separate process will inevitably add some overhead. But there is no reason that overhead needs to be large, and the alternative would be to make other users wait even longer to see their graphs.

Assuming the graph image is displayed in some web page that you also have control over, the page can contain some JavaScript that periodically requests this GET resource via XHR, displaying only "please wait, generating" until the final finished image is available.
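That client-side polling could be as simple as a helper like this hypothetical `pollUntilReady`, where the `check` callback wraps the XHR/fetch call and returns the image URL once the server reports the graph is done:

```javascript
// Repeatedly invoke `check` until it returns a truthy value (the
// finished graph), or give up after `maxTries` attempts.
function pollUntilReady(check, intervalMs, maxTries) {
  return new Promise((resolve, reject) => {
    let tries = 0;
    const timer = setInterval(() => {
      if (++tries > maxTries) {
        clearInterval(timer);
        reject(new Error('timed out waiting for graph'));
        return;
      }
      Promise.resolve(check()).then((result) => {
        if (result) { clearInterval(timer); resolve(result); }
      });
    }, intervalMs);
  });
}
```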

Once the graph is being generated in a separate process, you can work on making an optimized version of that process that generates the graph faster, perhaps in a different programming language more suited to numerical computing.

B. Ithica
  • Thanks. How should I handle the queue? Build it myself or is there something I can use? And what service will actually do the heavy calculations and create the graph? Should it be a NodeJS worker thread? – Mark Dec 27 '20 at 08:02
  • @Mark I would split your current nodejs app into two. Place the png generator in one and wrap it to feed from a queue (or a collection in mongo) and place the results into mongo (or redis). The other will take the web api, and instead of doing the work will try to retrieve the result or place a request into the queue/mongo collection. Later you can replace either app with another implementation. – Kain0_0 Jan 04 '21 at 22:46
  • @Kain0_0 Can you add an answer with more details? – Mark Jan 08 '21 at 19:43
  • @Mark If your NodeJS app is running on one server, you could just manage it within the NodeJS app - no need to add Redis or Mongo, that's unnecessary complication. And yes, NodeJS worker threads can do what you want. So can worker processes. – user253751 Jan 08 '21 at 22:45
  • @user253751 Can you add more details? – Mark Jan 09 '21 at 20:17
2

I'm looking for suggestions on how to re-architecture it to handle multiple requests concurrently and to draw the graphs quicker (presumably this is all about increasing CPU power).

Other than making your application faster, the straightforward way to handle more requests is to add more workers (instances of your Node.js application) on more machines: put a load balancer in front to receive all requests and forward each one to a worker backend.

If the problem really is too many requests, that should let the system scale nicely.

imel96
-1

Does it really need to be a lambda function? Assuming you're currently running your web app on one server, and the server has more than one core, you can start a separate thread or a child process to do the calculations. If you need more than one server, it gets more complicated, but if you don't, then why overcomplicate it?


I'd probably make it a child process, because that creates the maximum amount of decoupling. You could even rewrite the calculation program in a different language, then. Essentially, your NodeJS web server would run a command in the background, and send the result back to the user when it's done. While it's running, your web server can still process other web requests - you get a callback when it's done.

Using separate threads or processes will automatically allow you to use all your server's CPU cores - one per calculation.

If your calculation uses a lot of memory and you might get lots of calculations at the same time, you might want to make a queue inside your web server, instead of trying to run them all at once, so you don't run out of memory.
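Such an in-server queue can be sketched in a few lines (illustrative names; `limit` caps how many heavy jobs run concurrently, and further jobs wait their turn):

```javascript
// A minimal concurrency-limited job queue. Each job is a function
// returning a Promise; at most `limit` run at once.
function createJobQueue(limit) {
  let running = 0;
  const waiting = [];
  function next() {
    if (running >= limit || waiting.length === 0) return;
    running++;
    const { job, resolve, reject } = waiting.shift();
    job().then(resolve, reject).finally(() => { running--; next(); });
  }
  return {
    push(job) {
      return new Promise((resolve, reject) => {
        waiting.push({ job, resolve, reject });
        next();
      });
    },
  };
}
```

A web handler would then `queue.push(() => renderGraph(data))` and respond when the returned promise resolves, never running more renders at once than memory allows.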

user253751
  • I'm sorry, did I write something terribly stupid and not notice the mistake? – user253751 Jan 09 '21 at 10:14
  • Thanks, so I need one server with many cores, and each of these cores does the heavy calculations? So for 11 concurrent users, I need 100 cores? I don't know why you got downvoted, this is a good answer – Mark Jan 09 '21 at 20:19
  • 1
    @Mark I was assuming that your server is only slightly too slow. If your server is *way* too slow, then you will want to have many servers, not a single server with lots of cores. But since your application is single-threaded now, and let's say your server has 4 cores, you could make it 4 times faster if you used all of them, and you wouldn't need to buy anything! – user253751 Jan 10 '21 at 02:24