
I'm pondering a project in which one component needs to make a large number of http requests at accurate times. It should, let's say, release a set of 'dozens to hundreds' of requests at 1 second intervals. It is important that the requests are received by the other parties as close to the target time as possible.

Notwithstanding issues outside of our control such as network partitions / performance, I'm wondering what other issues I might face and any recommendations for overcoming them.

My proficiency is in .Net and JavaScript. I'm wondering if the former would be unsuitable due to its managed nature (garbage collections might cause timing issues). I wonder if JavaScript (Node?) would be any better (even though single threaded, it might 'fire and forget' them fast enough). Would another language / platform such as Erlang be particularly better suited?
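To make the question concrete, here is the kind of 'fire and forget' loop I'm imagining in Node — the URL, batch size and tick count are just placeholders, and each tick is scheduled against an absolute target time so drift doesn't accumulate:

```javascript
// Sketch (Node 18+): fire a batch of HTTP requests at each 1-second tick
// without awaiting the responses. Each tick is scheduled against an
// absolute target timestamp so lateness on one tick doesn't push back
// the next. TARGET_URL, BATCH_SIZE and TICKS are placeholders.
const TARGET_URL = 'http://127.0.0.1:8080/';
const BATCH_SIZE = 100;
const TICKS = 5;

// Absolute target timestamps: start + 1s, start + 2s, ...
function targetTimes(startMs, ticks, intervalMs = 1000) {
  return Array.from({ length: ticks }, (_, i) => startMs + (i + 1) * intervalMs);
}

function fireBatch() {
  for (let i = 0; i < BATCH_SIZE; i++) {
    // Fire-and-forget: ignore the response and any network error.
    fetch(TARGET_URL).catch(() => {});
  }
}

const start = Date.now();
for (const target of targetTimes(start, TICKS)) {
  setTimeout(() => {
    const driftMs = Date.now() - target; // how late this tick actually ran
    console.log(`tick drift: ${driftMs}ms`);
    fireBatch();
  }, target - Date.now());
}
```

Even a sketch like this raises the core question: how late do those ticks actually fire once hundreds of sockets are in flight?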

This would ideally run on a cloud provider like GCP, Azure or AWS. Could I hit issues such as limits on how many open HTTP requests I could have?

Thanks for your help :-)

  • What has caused you to reject the already available tools for doing this sort of thing? – Philip Kendall Apr 23 '18 at 05:16
  • Nothing but my ignorance to their existence! Can you please elaborate with examples of what you're thinking? Thank you :-) – Damien Sawyer Apr 23 '18 at 05:21
  • The language you use will have a trivial to no impact on the success of this endeavor. – Telastyn Apr 23 '18 at 05:37
  • `It is important that the requests are received by the other parties as close to the target time as possible` nobody can guarantee this. Maybe not even your platform will. I think you are starting by the wrong place. Start sizing the hardware you will need first. How many CPUs, ram and bandwidth. Then start thinking in parallelization. If you need immediacy, you will have to deploy the application as close to the target as possible. Preferably in the same CPD/cluster. – Laiv Apr 23 '18 at 06:23
  • Hmm. Thanks Laiv. Yes, I know that there are, 'no guarantees' by the very nature of the problem domain including many (most) parts being beyond my control - however I suspect that 'given hardware held constant', some software tools will fare better. e.g. I'd be surprised if the "world's greatest" Erlang coder couldn't produce better results than the "world's greatest Ruby coder". Are not some platforms better for writing 'real time' code than others? The famous Java license text comes to mind (https://www.reddit.com/r/ProgrammerHumor/comments/80puvg/dont_use_java_in_nuclear_reactors/) – Damien Sawyer Apr 23 '18 at 06:31
  • Reading down that Reddit thread I just posted, there's a comment which is pertinent - "Garbage collection introduces unknown timing (garbage collection is part of frameworks doing memory management). For some applications, this unpredictability is unsuitable'. That's pretty much my point I guess. That being what it is, perhaps I should do a proof of concept in (non garbage collected?) JavaScript and try and do some performance profiling on it. – Damien Sawyer Apr 23 '18 at 06:36
  • When there is some "cloud space" between your communicating components, the unpredictability of the garbage collector is probably your smallest issue. The unpredictability of the network is probably much more important. – Doc Brown Apr 23 '18 at 09:38
  • Thanks Doc. I understand that, hence my comment "Notwithstanding issues outside of our control". – Damien Sawyer Apr 23 '18 at 19:55
  • You said nothing about using a real-time operating system, but I assume you are aware that you have no good control over timing without one. – Frank Hileman Apr 24 '18 at 18:17
  • Thanks Frank. That's helpful. I think you're on the right track. I might look into that. Or, perhaps keep it up my sleeve if I hit issues. – Damien Sawyer Apr 24 '18 at 21:28
  • After all the downvotes - I've referred this post to a number of my 20 year plus veteran dev friends for clarification. I mean, perhaps it's me that asked a stupid question. The unanimous agreement among them is that my question was valid and most respondents have missed the point COMPLETELY. I just read this blog post which I think is kind of relevant. https://stackoverflow.blog/2018/04/26/stack-overflow-isnt-very-welcoming-its-time-for-that-to-change/?cb=1 – Damien Sawyer Apr 29 '18 at 07:37
  • The presence or utilization of garbage collection isn't a predicate to poor or unpredictable performance. Those are implementation details of the particular garbage collector. Note answers besides the accepted answer that go into detail: (https://stackoverflow.com/questions/3559878/is-a-garbage-collector-net-java-an-issue-for-real-time-systems) – JustinC May 04 '18 at 01:32

1 Answer


You're definitely going about this the wrong way.

Let's say you have two HTTP endpoints that do the following:

  • Give the current time of the server
  • Perform a write in the database

If you test the first and successfully handle 100k connections, that doesn't mean your site will reliably handle every kind of request at 100k connections.

I think you're on the development side, not system/network. If you really want to test something, you should set up a server with enough test data in it to match what production could look like; then, for each request you can make, define a reasonable response-time budget (usually < 500ms or 1s).

If you're really aiming for more than 10k connections handled in parallel, I would advise getting advice from specialists on the hardware, network and database side. Unless you're just serving a fully static HTML site.

Also note that while some languages are better suited to near-real-time work, pretty much all of them have some source of unpredictability (even malloc in C), and the OS may also switch to other tasks from time to time. If you really want to run the stress test and can't generate enough requests from one machine, just set up another one; after all, the only thing that matters is the number of requests your server receives.
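You can measure that unpredictability directly: ask for a short timer many times and record how late it actually fires (Node again, since the question mentions it; the numbers vary with machine and load):

```javascript
// Ask for a 1ms timer 200 times and record how late each one actually
// fires. The spread you see is runtime/OS scheduler jitter, before any
// network or GC pause is involved.
const samples = [];

const jitter = new Promise((resolve) => {
  function measure(remaining) {
    const scheduled = process.hrtime.bigint();
    setTimeout(() => {
      // actual delay minus the 1ms we asked for, in milliseconds
      const lateMs = Number(process.hrtime.bigint() - scheduled - 1_000_000n) / 1e6;
      samples.push(lateMs);
      if (remaining > 1) measure(remaining - 1);
      else resolve(samples);
    }, 1);
  }
  measure(200);
});

jitter.then((s) => {
  const sorted = [...s].sort((a, b) => a - b);
  console.log(`median lateness: ${sorted[Math.floor(sorted.length / 2)].toFixed(2)}ms`);
  console.log(`worst lateness:  ${sorted[sorted.length - 1].toFixed(2)}ms`);
});
```

On an ordinary cloud VM the worst case will usually dwarf anything a garbage collector adds.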

Walfrat
  • Thanks for your detailed response. It would seem, though, that like others who've posted, you've kind of missed the point. I completely understand that, once the requests leave my boxes, I have no control over what happens. They may NEVER reach the target. My question is more about how to manage the concurrency on my box(es). For example, am I going to struggle with a large number of I/O completion ports for the responses. – Damien Sawyer Apr 23 '18 at 19:59
  • Your last sentence underlines where you have misunderstood me. What is important is not how many requests are received, it's how many I can send. I have zero guarantees that ANY will be received! I'm only concerned with what I can control. – Damien Sawyer Apr 23 '18 at 20:04