6

This is a general question, but I work in the .NET world, so I'd like to know if there are any specific quirks about the .NET Framework / Core platforms that I should be concerned about here.

I think it's safe to say that, as a general rule, breaking long-running jobs into threads should be slightly faster than breaking them into long-running processes. However, the performance difference should be negligible. I haven't done any benchmarks on this, but this answer seems to indicate that there is a consensus on the matter, at least as far as .NET is concerned.

However, I still hear the argument "it was broken up into separate processes for performance reasons" often. I'm puzzled by this, and I wonder if the argument ever holds weight. Please remember, this discussion has nothing to do with maintainability. The question I am asking is: are multiple jobs, each running in its own process, necessarily faster than multiple jobs, each running on its own thread inside one process?

Intuitively, I would have to guess that the answer is no. For example, 10 long-running jobs on 10 threads should perform roughly the same as 10 long-running jobs in 10 different processes. But I see people making the design choice of breaking services up into smaller parts purely for performance.

What I suspect is going on is that shoddy code has created scenarios where starvation occurs. I think parallelism is being overused without regard to the number of available CPUs, so the benefits of multiple cores are eroded because the jobs fight each other for CPU time. Still, this doesn't really explain why breaking a job out into separate processes would improve performance.

So, how can I prove that either a) performance is not improved by breaking jobs out into separate processes, or b) any improvement from breaking the code out into separate processes is only a symptom of poor code design in the first place?
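For concreteness, here is the sort of minimal harness I imagine would be needed (a sketch only: `CpuBoundJob` is a stand-in for a real long-running job). If N jobs on N threads take about as long as one job on one thread, threads scale cleanly and processes should have nothing left to win; the deliberately oversubscribed run shows what the starvation scenario looks like.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class ScalingProbe
{
    // Stand-in for a real long-running job; swap in the actual workload.
    static void CpuBoundJob()
    {
        double x = 0;
        for (long i = 1; i <= 100_000_000; i++)
            x += Math.Sqrt(i);
        GC.KeepAlive(x); // keep the result alive so the loop isn't optimized away
    }

    // Runs `count` copies of the job on `count` dedicated threads and times them.
    static TimeSpan RunOnThreads(int count)
    {
        var sw = Stopwatch.StartNew();
        var threads = new Thread[count];
        for (int i = 0; i < count; i++)
            (threads[i] = new Thread(CpuBoundJob)).Start();
        foreach (var t in threads)
            t.Join();
        return sw.Elapsed;
    }

    static void Main()
    {
        int cores = Environment.ProcessorCount;

        // If threads scale cleanly, the first two times are roughly equal,
        // and the oversubscribed run takes roughly 4x as long (4x the work).
        // A result much worse than that points at contention/starvation,
        // not at threads-vs-processes as such.
        Console.WriteLine($"1 job             : {RunOnThreads(1)}");
        Console.WriteLine($"{cores} jobs (= cores)  : {RunOnThreads(cores)}");
        Console.WriteLine($"{cores * 4} jobs (4x cores): {RunOnThreads(cores * 4)}");
    }
}
```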

  • Aren't they breaking the services into small processes because of some vague promise of an independent release life cycle? Oh, and [containers](https://m.xkcd.com/1988/). Edit: perhaps I should link [10 Tips for failing badly at Microservices by David Schmitz](https://www.youtube.com/watch?v=X0tjziAQfNQ). – Theraot Feb 20 '20 at 03:35
  • No. They claim it's for performance reasons. – Christian Findlay Feb 20 '20 at 03:59
  • (1) Fault isolation is probably still the most significant motivation for using multiple processes. (2) Different processes mean different VMs (and memory management); if for some reason a long-running sub-process appears to be in an unusual memory state (e.g. a memory leak), it may be possible to "restart" (save, quit, relaunch, load) that sub-process. This is just another kind of fault isolation, and/or a way to make the best of a not-so-good situation. – rwong Feb 20 '20 at 04:45
  • Processes will need to use a variety of tools to communicate, e.g. shared memory, inter-process synchronization primitives, etc. The key is to use the correct thing for the correct purpose. Certain applications naturally favor share-nothing-except-the-screen (e.g. Chrome browser) for multiple reasons. – rwong Feb 20 '20 at 04:48
  • But meanwhile, the GPU work is concentrated in a single process. All other processes must submit work through this GPU worker. – rwong Feb 20 '20 at 04:48
  • But if you need to take into account the overhead of CPU speculation side-channel attack mitigations (which is incurred when switching between user code and kernel code, and between user code in different applications), it should have favored threads performance-wise. Meanwhile, security-wise it would have favored process isolation. – rwong Feb 20 '20 at 04:53
  • I think you're off topic @rwong. I'm not asking if breaking a job into multiple processes is a good idea. I'm asking if not breaking it up will necessarily make it slower. – Christian Findlay Feb 20 '20 at 05:09
  • Does this answer your question? [Multiprocessing vs multithreading](https://softwareengineering.stackexchange.com/questions/401752/multiprocessing-vs-multithreading) – Christophe Feb 20 '20 at 07:54
  • Nope. That question doesn't have an answer, but again, I'm not asking about what is best. I'm just asking if it's possible that breaking jobs out into separate processes could actually be faster than breaking them into separate threads. – Christian Findlay Feb 20 '20 at 08:02
  • I don't see a claim that they are faster. Just that they are *not much* slower – Caleth Feb 20 '20 at 08:30
  • Note that processes are more expensive on Windows than Linux. AFAIK, Linux processes are approximately as expensive as Windows threads. – user253751 Feb 20 '20 at 10:38
  • Nobody has brought it up, so I will: you can pool threads. If you use a thread pool, you can avoid the cost of spawning a thread (most times). And the cost of spawning a thread is already smaller than the cost of spawning a process. Of course, for the case of long running tasks we won’t focus on that… however, for the case of many short tasks, it is very relevant. – Theraot Feb 20 '20 at 16:48
  • If you are only focussing on performance with this question, and "this discussion has nothing to do with maintainability", then maybe the title should reflect that. It's always weird if the answer to the title is "yes" but to the body "no". – Mark Feb 21 '20 at 09:56

5 Answers

8

as a general rule, breaking long-running jobs into threads should be slightly faster than breaking them into long-running processes

There was a time when I believed this as well. But I have first-hand, real-world experience that this can be very wrong, specifically with .NET. I have observed several scenarios where the extra isolation of processes gained a speed-up by a factor close to the number of CPUs, whilst thread parallelization gained a speed-up factor of at most 2. Though I did not always fully understand the reasons behind this, I can tell you two (possible) causes for such behaviour:

  • Shared memory pool between all threads: I am not sure how well the garbage collector handles objects created by different threads, but I am pretty sure this was one of the reasons why the speed-up was far below the factor we expected (a small probe for this is sketched after this list).

  • Usage of third-party libs which support multithreading, but do not support it very efficiently. An example of this is ADODB for accessing MS Access databases. If one uses thread parallelization to open one connection per thread, each connecting to a different database file, the threads appear to disturb each other: everything still works correctly, but way slower than one would expect.
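The first bullet, at least, is cheap to probe. Here is a minimal sketch (AllocHeavyJob is just a placeholder for an allocation-heavy workload): if the parallel run scales far worse than the number of cores would predict, the shared heap is a likely suspect, and server GC (which uses per-core heaps) is one documented knob worth testing.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;
using System.Threading.Tasks;

class GcContentionProbe
{
    // Placeholder: allocation-heavy work that stresses the shared GC heap.
    static void AllocHeavyJob()
    {
        for (int i = 0; i < 5_000_000; i++)
        {
            var s = new string('x', 64); // short-lived garbage
            GC.KeepAlive(s);
        }
    }

    static void Main()
    {
        // Server GC can be enabled via <gcServer enabled="true"/> (app.config)
        // or <ServerGarbageCollection>true</ServerGarbageCollection> (csproj).
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");

        var sw = Stopwatch.StartNew();
        AllocHeavyJob();
        Console.WriteLine($"1 thread : {sw.Elapsed}");

        // Same work on every core at once; if threads scaled perfectly,
        // this would take about as long as the single-thread run.
        sw.Restart();
        Parallel.For(0, Environment.ProcessorCount, _ => AllocHeavyJob());
        Console.WriteLine($"{Environment.ProcessorCount} threads: {sw.Elapsed}");
    }
}
```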

Hence my recommendation is to try it out: if there is a task where you would expect thread parallelization to bring you a speed-up factor of N, where N is the number of CPUs, but you only observe a factor much smaller than N, then try "process parallelization". Of course, this only makes sense if the extra effort of implementing it is acceptable.
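For illustration, a bare-bones "by-process" harness might look like the sketch below. RunOneJob is a placeholder for the real workload, the self-relaunch trick assumes the build produces a directly runnable executable (e.g. a self-contained or apphost build), and a real solution would also need some form of IPC to collect results.

```csharp
using System;
using System.Diagnostics;

class ProcessParallelProbe
{
    static void Main(string[] args)
    {
        // Child mode: each worker process runs exactly one job and exits.
        if (args.Length > 0 && args[0] == "worker")
        {
            RunOneJob();
            return;
        }

        // Parent mode: launch one copy of ourselves per CPU.
        int n = Environment.ProcessorCount;
        string self = Process.GetCurrentProcess().MainModule.FileName;

        var sw = Stopwatch.StartNew();
        var workers = new Process[n];
        for (int i = 0; i < n; i++)
            workers[i] = Process.Start(self, "worker");
        foreach (var p in workers)
            p.WaitForExit();

        Console.WriteLine($"{n} jobs in {n} processes: {sw.Elapsed}");
    }

    // Placeholder workload; keep it identical to the threaded version
    // so the two timings are comparable.
    static void RunOneJob()
    {
        double x = 0;
        for (long i = 1; i <= 100_000_000; i++)
            x += Math.Sqrt(i);
        GC.KeepAlive(x);
    }
}
```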

I would not be astonished if you make the same observation as me.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • This is a fair call and the reason why I ask this question. I'm more or less trying to avoid literally benchmarking the difference between threads and processes. I could do that, but it would be a huge time sucker. This gives me food for thought, because no matter what the principle says, empirical evidence is the only way to prove something true or false. It doesn't leave me in a good spot though. – Christian Findlay Feb 21 '20 at 02:54
  • @MelbourneDeveloper: You can always start by measuring the speed improvement of multiple threads over a single thread and find out whether it scales linearly or not. If it scales as expected, then using processes cannot bring any benefit. If not, then you have to estimate the effort of creating a "by-process" solution and analyse whether it is actually worth it. I think there is no easy way around this. – Doc Brown Feb 21 '20 at 05:31
  • The only way I'd be able to do that would be to start merging existing projects together. Even then, I'd be benchmarking in an unrealistic scenario. – Christian Findlay Feb 21 '20 at 05:33
  • The second anecdote reminds me of COM STA (COM apartment threading model), MTA, COM marshaling... good old Microsoft technology (sarcasm) – rwong Feb 21 '20 at 15:10
  • While your answer could imply that there's a fundamental flaw in how .NET handles threading, I think one possibility is that developers over-communicate between threads and create a lot of contention. Inter-process communication tends to be more structured and the costs more obvious. – JimmyJames Feb 21 '20 at 17:07
  • @JimmyJames: If I remember correctly, in the cases where I observed the described behaviour there was definitely no over-communication, since there was almost no need for communication at all between the threads. I know this because it was pretty straightforward to make a process-parallelized version of that code. – Doc Brown Feb 21 '20 at 18:54
  • I should probably clarify that by 'communication' I mean any kind of coordination between threads including access to shared state. Even if you aren't doing this directly "libs which support multithreading" as you note, may have these kinds of flaws. For example, I work tangentially with a vendor system that has a global cache for all data access. The idea is to make it faster but what it really does is create contention on every data access. The (blameless) developers building on top of this aren't doing anything with inter-thread communication but get it 'for free'. – JimmyJames Feb 21 '20 at 19:05
5

I think you are misconstruing people's motivation for using processes. In general, people don't choose processes for speed; they choose processes for the extra isolation, then measure the speed to make sure they aren't sacrificing too much of it.

There are some languages where you usually need processes to gain true parallelism. JavaScript is famously single-threaded, and Python has the global interpreter lock (GIL), which serializes execution within a single process. Separate processes are also obviously required to scale across multiple physical nodes. In general, though, threads are preferred for speed, and even beyond that, lightweight threads/fibers/greenlets/goroutines/whatever are faster than threads for many use cases.
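To put that last point in rough .NET terms, here is a hedged sketch of "lightweight beats dedicated" for many small jobs; thread-pool tasks play the lightweight role, and TinyJob is a placeholder for a short unit of work.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class LightweightVsDedicated
{
    // Placeholder for a short unit of work.
    static void TinyJob() => Thread.SpinWait(100_000);

    static void Main()
    {
        const int jobs = 2_000;

        // One dedicated OS thread per job: pays creation and stack cost each time.
        var sw = Stopwatch.StartNew();
        var threads = new Thread[jobs];
        for (int i = 0; i < jobs; i++)
            (threads[i] = new Thread(TinyJob)).Start();
        foreach (var t in threads)
            t.Join();
        Console.WriteLine($"dedicated threads: {sw.Elapsed}");

        // Thread-pool tasks: many small jobs multiplexed onto a few threads.
        sw.Restart();
        var tasks = new Task[jobs];
        for (int i = 0; i < jobs; i++)
            tasks[i] = Task.Run(TinyJob);
        Task.WaitAll(tasks);
        Console.WriteLine($"thread-pool tasks: {sw.Elapsed}");
    }
}
```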

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
  • I had cases in the past where the extra isolation of processes helped scale performance almost linearly with the number of CPUs, which was not possible using just threads. – Doc Brown Feb 20 '20 at 22:15
  • I don't know how many times I have to repeat what I've been told. I haven't misconstrued anything. I have been told that the applications were broken up for performance reasons - not isolation. But, I do understand your point. – Christian Findlay Feb 20 '20 at 23:32
3

There is no real performance benefit to using processes over threads.

The real reason you hear this is a tiny "white lie". The thing is: writing multithreaded software is hard, and it can get really complex really fast. You have to deal with race conditions, locking, asynchronous calls and a lot more, which makes it very error-prone.

So you do the poor man's multithreading: multiple processes of sequentially coded programs. You still have to deal with some race conditions, but all in all it becomes orders of magnitude easier to code.
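The pattern is almost embarrassingly simple, which is the point. A sketch (SequentialJob.exe and the input names are placeholders for an existing sequential program and its work items):

```csharp
using System;
using System.Diagnostics;

// "Poor man's multithreading": run an existing sequential program once per
// input, as separate OS processes, and let the operating system schedule them.
class PoorMansParallelism
{
    static void Main()
    {
        string[] inputs = { "a.dat", "b.dat", "c.dat", "d.dat" };
        var workers = new Process[inputs.Length];

        for (int i = 0; i < inputs.Length; i++)
            workers[i] = Process.Start("SequentialJob.exe", inputs[i]);

        foreach (var p in workers)
            p.WaitForExit(); // no locks, no shared state in our code
    }
}
```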

But then, when someone asks you why you chose processes over threads, are you going to say "I'm too bad of a programmer to use threading", or mutter something about performance?

Pieter B
  • 12,867
  • 1
  • 40
  • 65
  • Using processes because they are harder to get wrong isn't a bad thing. [Falling into the pit of success](https://blog.codinghorror.com/falling-into-the-pit-of-success/) – Caleth Feb 20 '20 at 13:58
  • @Caleth true that. – Pieter B Feb 20 '20 at 15:43
  • @Caleth Actually, it's also easy to get wrong. There are a few famous examples (well-known big corps, both Seattle and Bay Area) where, when they implemented process isolation, they got so many deadlocks (or what looked like deadlocks) in shipped products that the instability haunted their customer base for several years, tanking their reputations. After that they learned how to do this correctly; it was a workforce-wide educational issue, inadequately covered by CS education. Now these products look very stable. But they also became very memory hungry… – rwong Feb 21 '20 at 02:19
  • I think you may have hit the nail on the head here @Pieter-B – Christian Findlay Feb 21 '20 at 02:48
0

As far as I remember, it depends on the OS. If the OS supports only user-level threads (ULTs), and your process does a lot of blocking I/O on different ULTs, then it will probably be faster to split it into multiple processes and let the OS schedule them: every time one process blocks, the OS can schedule another, and overall performance will be better, especially in a multiprocessor environment.

Glorfindel
  • 3,137
  • 6
  • 25
  • 33
-1

There is positive proof that there is no difference: Windows only runs threads; it does not "run" processes. A process is a security boundary within which memory is managed, but the unit of scheduling and execution is always the thread.

Martin K
  • 2,867
  • 6
  • 17