60

I have close to 3 years' experience writing web applications in Java using MVC frameworks (like Struts). I have never written multithreaded code until now, though I have written code for major retail chains.

I get a few questions on multithreading during interviews and I can usually answer them (mostly simple questions). This left me wondering: how important is multithreading in the current industry scenario?

user2434
  • 1,277
  • 1
  • 12
  • 14
  • 8
    You may not have done so explicitly but you have definitely taken advantage of it behind the scenes. – Martin York Sep 19 '11 at 17:45
  • 1
    I too rarely work with multi-threaded code for work, but I do try to read up on it / be able to discuss it during an interview. I would not want to work with coders who do not get threads, and I would not want to work with coders who do not care whether other coders get threads. – Job Sep 19 '11 at 17:47
  • 1
    I rarely use it in web development, but I think it's more common elsewhere. For instance, I was recently writing an Android app and realized you're **required** to use multithreading if you have any network activity. – jwegner Sep 19 '11 at 19:15
  • 4
    It's not multithreading that's important, it's parallel computing. If you think that every single request that goes to your web app is on the same thread... you must be smoking something. – user606723 Sep 19 '11 at 19:55
  • 1
    The ability to "Think outside the thread" is very nice even for single threaded programming. You take a lot less for granted, and your code is generally more robust and reusable. – corsiKa Sep 19 '11 at 21:10
  • How important is it in web development though? JavaScript? 1 thread. Web server? Most web application frameworks only let you know about one thread. – Neil McGuigan Sep 20 '11 at 03:01
  • @NRM: Increasingly you need to think of concurrency in JavaScript apps too. E.g. read about [Web Workers](https://developer.mozilla.org/en/Using_web_workers). – Jonas Sep 20 '11 at 17:31

15 Answers

93

It is extremely important.

What is more important though is to understand that multithreading is just one way to solve the asynchrony problem. The technical environment in which many people are now writing software differs from the historical software development environment (of monolithic applications performing batch computations) in two key ways:

  • Many-core machines are now common. We can no longer expect clock speeds or transistor densities to increase by orders of magnitude. The price of computation will continue to fall, but it will fall because of lots of parallelism. We're going to have to find a way to take advantage of that power.

  • Computers are now heavily networked and modern applications rely upon being able to fetch rich information from a variety of sources.

From a computational standpoint, these two factors essentially boil down to the same core idea: information increasingly will be available in an asynchronous fashion. Whether the information you need is being computed on another chip in your machine or on a chip halfway around the world doesn't really matter. Either way, your processor is sitting there burning billions of cycles a second waiting for information when it could be doing useful work.

So what matters now, and what will matter even more in the future, is not multithreading per se, but rather, dealing with asynchrony. Multithreading is just one way to do that -- a complicated, error-prone way that is only going to get more complicated and more error-prone as weak-memory-model chips become more widely used.

The challenge for tools vendors is to come up with some way better than multithreading for our customers to deal with the asynchronous infrastructure they'll be using in the future.
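To make that concrete: in later versions of Java (after this answer was written), `CompletableFuture` lets you compose work onto an asynchronously arriving result instead of parking a thread on it. A minimal sketch, with illustrative names:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncSketch {
    // Stands in for fetching rich information from another chip or
    // another machine; the caller composes work instead of waiting.
    static CompletableFuture<String> fetchRemote() {
        return CompletableFuture.supplyAsync(() -> "payload");
    }

    public static void main(String[] args) {
        // No thread of ours blocks while the "remote" work happens;
        // the continuation runs when the value is ready.
        CompletableFuture<Integer> length = fetchRemote().thenApply(String::length);
        System.out.println(length.join()); // 7
    }
}
```

The point is the shape, not the API: the processor is free to do useful work while the information is in flight.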

Eric Lippert
  • 45,799
  • 22
  • 87
  • 126
  • 5
    +1 for an excellent answer, it deserves more credit than my own humble attempt. – Péter Török Sep 19 '11 at 20:22
  • 2
    **Information increasingly will be available in an asynchronous fashion.** If that ain't the truth. . . – surfasb Sep 19 '11 at 20:55
  • 2
    `concurrency` is more important than `asynchronous` behavior. You can have asynchronous without concurrency (i.e. multiple threads on a single-core CPU); `asynchronous` is not a semantic substitute for `concurrency`. –  Sep 20 '11 at 17:18
  • 5
    @Jarrod: Taming asynchrony is *more* important than merely taming concurrency for precisely the reason you mention: **concurrency is just a particularly difficult kind of asynchrony.** The difficult part of concurrency is not the "things happening at the same time" aspect of it and indeed, concurrency is often only *simulated concurrency*, eg, non-cooperative multitasking via time slicing. The difficult part is in *efficiently* using resources without blocking, hanging, deadlocking, and without writing *inside out* programs that are hard to reason about locally. – Eric Lippert Sep 20 '11 at 18:27
  • "concurrency is often only simulated concurrency, eg, non-cooperative multitasking via time slicing": in my understanding this is still (true) concurrency, maybe you mean it is not parallelism? – Giorgio Jan 16 '17 at 19:14
  • @Giorgio: By "concurrent" I mean *existing, happening or done at the same time*. – Eric Lippert Jan 16 '17 at 21:35
  • @EricLippert: Ok, maybe I just misinterpreted your comment.: I thought you meant "simulated concurrency" is not real concurrency. – Giorgio Jan 17 '17 at 18:30
46

It is getting ever more important as modern processors have more and more cores. A decade ago most computers had only a single processor, so multithreading mattered only in higher-end server applications. Nowadays even basic laptops have multicore processors, and in a few years even mobile devices will too. So more and more code will have to use concurrency, both to exploit the potential performance advantages and to run correctly in a multithreaded environment.
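A minimal Java sketch of exploiting those cores (names are illustrative): size the worker pool to the machine's core count, so the same code uses whatever parallelism the hardware offers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoreAwarePool {
    // Sum 1..n by splitting the range across a pool sized to the
    // hardware, so the code scales from one core to many.
    static long parallelSum(int n) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        int chunk = Math.max(1, n / cores);
        List<Future<Long>> parts = new ArrayList<>();
        for (int lo = 1; lo <= n; lo += chunk) {
            final int start = lo, end = Math.min(n, lo + chunk - 1);
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int i = start; i <= end; i++) s += i;
                return s;
            }));
        }
        long total = 0;
        try {
            for (Future<Long> f : parts) total += f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000)); // 500500
    }
}
```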

Péter Török
  • 46,427
  • 16
  • 160
  • 185
  • 3
    +1: More important than ever. Remember too that in a system design, you can also get the benefits of multithreading just by partitioning the work so that more processes are doing it. – Scott C Wilson Sep 19 '11 at 13:15
  • 11
    Quite a few mobile devices already have multi-core processors! – Che Jami Sep 19 '11 at 16:58
  • 3
    I'd argue that multi-threading has been important since the first time-sharing system was built. Having multiple processors/cores just adds a new dimension of efficiency to having multiple threads. – jwernerny Sep 19 '11 at 17:43
  • Maybe (especially on mobile devices) threads are a bad idea. The OS should probably handle optimizing the usage of the cores without buggy user code attempting to do threading. There are very few applications a normal user has access to that need or would benefit from multithreading. The only exceptions are high-end graphics applications, developer tools, weather modelling, and web servers (and associated services): all very high-end, specialized applications. – Martin York Sep 19 '11 at 17:49
  • 1
    @Tux-D, you may very well have a game on a mobile device which utilizes more than one core. It's not something exceptional. – Catherine Sep 19 '11 at 19:17
  • @Tux-D: Almost all applications a normal user uses **should** use multithreading. Almost all GUI frameworks use an event loop, and **all** long-running tasks (e.g. accessing the hard drive, the Internet, or parsing XML) should be done in a background thread, otherwise the user experience will be bad with a blocked UI. It's not so much about using all cores as about doing tasks **asynchronously**. See Eric Lippert's answer. – Jonas Sep 19 '11 at 20:09
  • @Jonas: I agree on both points. But I don't think it is in anybody's interest to allow developers to do it. Providing the functionality in a safe way, via toolkit or OS resources that do it automatically, is better. – Martin York Sep 19 '11 at 21:30
  • @Tux-D: I agree, but then it must be at higher abstractions, e.g. as in Erlang with a simple `spawn` and no shared data, just message passing. It's nice to use Akka actors in Java and Scala with this model. – Jonas Sep 19 '11 at 21:39
  • You've not programmed until you've tried C in 3+n, n>0 dimensions. Multi-threading is just the tip of the iceberg. Grokking hypercubes... that makes you a better programmer. –  Sep 20 '11 at 03:10
28

In general, multi-threading is already quite important, and is only going to get more important in the next few years (as Péter Török pointed out) - it is how processors will scale for the foreseeable future (more cores instead of higher MHz).

In your case, however, you seem to be working mainly with web applications. Web applications, by their nature, are multi-threaded due to the way your web server processes requests for each user (i.e. in parallel). While it's probably important for you to understand concurrency and thread-safety (especially when dealing with caches and other shared data), I doubt you will run into too many cases where it's beneficial to multi-thread the web application code internally (i.e. multiple worker threads per request). In that sense, I think being an expert at multi-threading is not really necessary for a web developer. It's often asked in interviews, because it is quite a tricky subject, and also because many interviewers just google up a few questions 10 minutes before you get there.
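For example, the kind of thread-safety issue a web developer does need to understand is a cache shared across request threads. A hedged sketch (class and method names are made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedCache {
    // A plain HashMap shared across request threads can be corrupted
    // by concurrent writes; ConcurrentHashMap is safe to share.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    static String lookup(String key) {
        // computeIfAbsent is atomic: the expensive load runs at most
        // once per key, even under concurrent requests.
        return CACHE.computeIfAbsent(key, k -> expensiveLoad(k));
    }

    static String expensiveLoad(String key) {
        return "value-for-" + key; // stands in for a DB or service call
    }

    public static void main(String[] args) {
        System.out.println(lookup("user42")); // value-for-user42
    }
}
```

That level of awareness (shared state, safe collections, atomic operations) goes a long way even if you never spawn a thread yourself.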

Daniel B
  • 6,184
  • 1
  • 22
  • 30
  • +1 for the note that the poster is a web developer and most web server containers do a decent amount of multi-threading work for you. Not that it eliminates the need in some cases, but 99% of the time multi-threading controller code isn't the biggest performance improvement for an MVC call. – Mufasa Sep 23 '11 at 17:27
19

Multi-threading is a red herring. Multi-threading is an implementation detail; the real problem is concurrency. Not all threaded programs are concurrent, because of locks and whatnot.

Threads are only one model and implementation pattern for implementing concurrent programs.

For instance, you can write highly scalable and fault-tolerant software without ever doing any multi-threading, in languages such as Erlang.
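A rough sketch of that style in Java terms (illustrative names; a `BlockingQueue` standing in for an Erlang mailbox): concurrency via message passing, with no shared mutable state and no explicit locks:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePassing {
    // The producer and consumer share nothing but a mailbox --
    // no locks, no shared mutable state.
    static int sumOfSquares(int n) {
        BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= n; i++) {
                try { mailbox.put(i * i); } catch (InterruptedException e) { return; }
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += mailbox.take();
            producer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3)); // 1 + 4 + 9 = 14
    }
}
```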

  • +1 although I still think Erlang is multi-threaded; the community just redefined the word "thread" to depend on mutable shared state, and thus distinguish themselves from it. – Dan Sep 20 '11 at 04:38
  • 1
    the Erlang VM uses 1 thread per CPU by default, but as an Erlang developer you don't have access to the underlying OS threads, only the lightweight processes that the Erlang VM supplies. –  Sep 20 '11 at 17:16
10

I get a few questions on multithreading during interviews...

Well for passing the interviews, multithreading might be quite important. Quoting self, "when interviewing candidates for our team, I ask concurrency questions not because these skills are important in our project (these are not) but because these somehow make it easier for me to evaluate general knowledge of language we use..."

gnat
  • 21,442
  • 29
  • 112
  • 288
  • 2
    Having some idea about multithreading and concurrent programming also usually translates to a defensive approach, which can be a very good thing. If you have to take into account that something entirely unrelated within your process may or may not preempt a single logical statement and execute in the middle of everything else, then you have to plan for that possibility. Multithreaded implementations (as opposed to other forms of concurrency) simply means that you have the additional burden of that it may do something to any state that isn't thread-local. – user Sep 20 '11 at 07:49
6

Understanding how to leverage threading to improve performance is a critical skill in today's software environment, for most industries and applications.

At a minimum, understanding the issues involved with concurrency should be a given.

The obvious note that not all applications or environments will be able to take advantage of it applies, for example in many embedded systems. However, the Atom processor (et al.) seems to be working to change that (lightweight multicore is starting to become more common).

Stephen
  • 2,121
  • 1
  • 14
  • 24
4

Sounds like you're already writing multithreaded code.

Most Java web applications can handle multiple requests at the same time, and they do this by using multiple threads.

Therefore I'd say it's important to know the basics at least.

Tom Jefferys
  • 303
  • 1
  • 4
  • 18
    apparently (s)he isn't writing multithreaded code, only (single threaded) code which is run in a multithreaded environment. – Péter Török Sep 19 '11 at 12:59
2

It's still important in situations where you need it, but like a lot of things in development, it's about the right tool for the right job. I went 3 years without touching threading; now practically everything I do has some grounds in it. With multi-core processors there's still a great need for threading, but all the traditional reasons are still valid: you still want responsive interfaces, and you still want to be able to deal with synchronization and get on with other things at once.

Nicholas Smith
  • 1,621
  • 10
  • 11
2

Short answer: Very.

Longer answer: Electronic (transistor-based) computers are fast approaching the physical limits of the technology. It is becoming harder and harder to squeeze more clocks out of each core while managing heat generation and the quantum effects of microscopic circuits (circuit paths are already placed so close together on modern chips that an effect called "quantum tunneling" can make an electron "jump the tracks" from one circuit to another, without the conditions needed for a traditional electrical arc). So virtually all chip manufacturers are instead focusing on making each clock able to do more, by putting more "execution units" into each CPU. Then, instead of doing just one thing per clock, a computer can do 2, or 4, or even 8. Intel has "HyperThreading", which basically splits one CPU core into two logical processors (with some limitations). Virtually all manufacturers are putting at least two separate CPU cores into one CPU chip, and the current gold standard for desktop CPUs is four cores per chip. Eight is possible when two CPU chips are used; there are server mainboards designed for "quad quad-core" processors (16 EUs plus optional HT); and the next generation of CPUs is likely to have six or eight cores per chip.

The upshot of all of this is that, to take full advantage of the way computers are gaining computing power, you must be able to let the computer "divide and conquer" your program. Managed languages have at least a GC thread which handles memory management separately from your program. Some also have "transition" threads which handle COM/OLE interop (as much for protecting the managed "sandbox" as for performance). Beyond that, though, you really have to start thinking about how your program can do multiple things simultaneously, and architect your program with features designed to allow pieces of it to be handled asynchronously. Windows, and Windows users, will practically expect your program to perform long, complicated tasks in background threads, which keep the UI of your program (which runs in the program's main thread) "responsive" to the Windows message loop. Obviously, problems that have parallelizable solutions (like sorting) are natural candidates, but only a finite number of problem types benefit from parallelization.
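A minimal sketch of that background-thread pattern (names are illustrative; a real GUI app would use something like `SwingWorker`, but the shape is the same):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ResponsiveSketch {
    // Run a slow job off the main thread while the main thread stays
    // free to keep "pumping" its message loop.
    static int eventsHandledWhileWorking() {
        ExecutorService background = Executors.newSingleThreadExecutor();
        Future<String> report = background.submit(() -> {
            Thread.sleep(100);      // stands in for a long computation
            return "done";
        });
        int eventsHandled = 0;
        try {
            while (!report.isDone()) {
                eventsHandled++;    // pretend to service a UI event
                Thread.sleep(5);
            }
            report.get();           // surface any failure from the job
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            background.shutdown();
        }
        return eventsHandled;
    }

    public static void main(String[] args) {
        System.out.println("events handled: " + eventsHandledWhileWorking());
    }
}
```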

KeithS
  • 21,994
  • 6
  • 52
  • 79
1

Just a warning about multithreading: more threads don't mean better efficiency. If not managed properly, they may slow the system down. Scala's actors improve upon Java's threads and maximize system usage (mentioned since you're a Java developer).

EDIT: Here are some things to keep in mind about the downsides of multithreading:

  • Interference of threads with each other when sharing hardware resources.
  • Execution times of a single thread are not improved but can be degraded, even when only one thread is executing. This is due to slower frequencies and/or additional pipeline stages that are necessary to accommodate thread-switching hardware.
  • Hardware support for multithreading is more visible to software, thus requiring more changes to both application programs and operating systems than multiprocessing.
  • Difficulty of managing concurrency.
  • Difficulty of testing.

Also, this link might be of some help on the topic.

c0da
  • 1,526
  • 3
  • 12
  • 20
  • 2
    This doesn't seem to answer the OP's question :-/ – Péter Török Sep 19 '11 at 13:04
  • It gives a top(most) level view of threading, though. A thing to consider before delving into multi-threading. – c0da Sep 19 '11 at 13:08
  • @c0da Stack Exchange isn't a discussion board: answers should directly answer the question. Can you expand your answer to bring it back to what the asker is looking for? –  Sep 20 '11 at 00:32
1

This left me wondering how important is Multithreading in the current industry scenario?

In performance-critical fields where the performance isn't coming from third party code doing the heavy-lifting, but our own, then I'd tend to consider things in this order of importance from the CPU perspective (GPU is a wildcard I won't go into):

  1. Memory Efficiency (ex: locality of reference).
  2. Algorithmic
  3. Multithreading
  4. SIMD
  5. Other Optimizations (static branch prediction hints, e.g.)

Note that this list is not based solely on importance but on a lot of other dynamics, like the impact they have on maintenance, how straightforward they are (if not, worth considering more in advance), their interactions with others on the list, etc.

Memory Efficiency

Most might be surprised at my choice of memory efficiency over algorithmic. It's because memory efficiency interacts with all 4 other items on this list, and because consideration of it is often very much in the "design" category rather than the "implementation" category. There is admittedly a bit of a chicken-or-egg problem here, since understanding memory efficiency often requires considering the 4 other items on the list, while those 4 items also require considering memory efficiency. Yet it's at the heart of everything.

For example, if we need a data structure that offers linear-time sequential access and constant-time insertions to the back, and nothing else, for small elements, the naive choice to reach for would be a linked list. That's disregarding memory efficiency. When we consider memory efficiency in the mix, we end up choosing more contiguous structures in this scenario, like growable array-based structures or more contiguous nodes (ex: one storing 128 elements per node) linked together, or at the very least a linked list backed by a pool allocator. These have a dramatic edge in spite of having the same algorithmic complexity. Likewise, we often choose quicksort on an array over merge sort in spite of an inferior algorithmic complexity, simply because of memory efficiency.
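A quick sketch of the comparison (illustrative, and note that Java's boxed `Integer`s blunt the effect somewhat; primitives in a plain array would show an even bigger gap):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class LocalitySketch {
    // Same O(n) traversal for both structures.
    static long sum(List<Integer> xs) {
        long s = 0;
        for (int x : xs) s += x;
        return s;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) { array.add(i); linked.add(i); }

        // The contiguous ArrayList is typically several times faster
        // to walk: each cache line it pulls in holds many elements,
        // while each LinkedList node is a separate, scattered
        // allocation chased through pointers.
        long t0 = System.nanoTime(); sum(array);  long tArray  = System.nanoTime() - t0;
        long t1 = System.nanoTime(); sum(linked); long tLinked = System.nanoTime() - t1;
        System.out.printf("array: %d us, linked: %d us%n",
                tArray / 1_000, tLinked / 1_000);
    }
}
```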

Likewise, we can't have efficient multithreading if our memory access patterns are so granular and scattered in nature that we end up maximizing the amount of false sharing while locking at the most granular levels in code. So memory efficiency multiplies the efficiency of multithreading. It's a prerequisite to getting the most out of threads.

Every single item above on the list has a complex interaction with data, and focusing on how data is represented is ultimately in the vein of memory efficiency. Every single one of these above can be bottlenecked with an inappropriate way of representing or accessing data.

Another reason memory efficiency is so important is that it can apply throughout an entire codebase. Generally when people imagine that inefficiencies accumulate from little bitty sections of work here and there, it's a sign that they need to grab a profiler. Yet low-latency fields, or ones dealing with very limited hardware, will actually find, even after profiling sessions that indicate no clear hotspots (just times dispersed all over the place), a codebase that's blatantly inefficient in the way it's allocating, copying, and accessing memory. Typically this is about the only time an entire codebase can be susceptible to a performance concern that might lead to a whole new set of standards applied throughout the codebase, and memory efficiency is often at the heart of it.

Algorithmic

This one's pretty much a given, as the choice in a sorting algorithm can make the difference between a massive input taking months to sort versus seconds to sort. It makes the biggest impact of all if the choice is between, say, really sub-par quadratic or cubic algorithms and a linearithmic one, or between linear and logarithmic or constant, at least until we have like 1,000,000 core machines (in which case memory efficiency would become even more important).

It's not at the top of my personal list, however, since anyone competent in their field would know to use an acceleration structure for frustum culling, e.g. We're saturated by algorithmic knowledge, and knowing things like using a variant of a trie such as a radix tree for prefix-based searches is baby stuff. If we lacked this kind of basic knowledge of the field we're working in, then algorithmic efficiency would certainly rise to the top, but often algorithmic efficiency is trivial.

Also inventing new algorithms can be a necessity in some fields (ex: in mesh processing I've had to invent hundreds since they either did not exist before, or the implementations of similar features in other products were proprietary secrets, not published in a paper). However, once we're past the problem-solving part and find a way to get the correct results, and once efficiency becomes the goal, the only way to really gain it is to consider how we're interacting with data (memory). Without understanding memory efficiency, the new algorithm can become needlessly complex with futile efforts to make it faster, when the only thing it needed was a little more consideration of memory efficiency to yield a simpler, more elegant algorithm.

Lastly, algorithms tend to be more in the "implementation" category than memory efficiency. They're often easier to improve in hindsight even with a sub-optimal algorithm used initially. For example, an inferior image processing algorithm is often just implemented in one local place in the codebase. It can be swapped out with a better one later. However, if all image processing algorithms are tied to a Pixel interface which has a sub-optimal memory representation, but the only way to correct it is to change the way multiple pixels are represented (and not a single one), then we're often SOL and will have to completely rewrite the codebase towards an Image interface. Same kind of thing goes for replacing a sorting algorithm -- it's usually an implementation detail, while a complete change to the underlying representation of data being sorted or the way it's passed through messages might require interfaces to be redesigned.

Multithreading

Multithreading is a tough one in the context of performance since it's a micro-level optimization playing to hardware characteristics, but our hardware is really scaling in that direction. Already I have peers who have 32 cores (I only have 4).

Yet multithreading is among the most dangerous micro-optimizations known to a professional when it is used to speed up software. The race condition is pretty much the most deadly bug possible, since it's so nondeterministic in nature (maybe only showing up once every few months on a developer's machine at a most inconvenient time, outside of a debugging context, if at all). So it arguably has the most negative impact on maintainability and potential correctness of code among all of these, especially since bugs related to multithreading can easily fly under the radar of even the most careful testing.
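A minimal demonstration of such a race (illustrative code; the unsynchronized increment is the whole bug):

```java
public class RaceDemo {
    // Two threads each bump a shared counter 100,000 times with no
    // synchronization. The increment is a read-modify-write, so one
    // thread's updates can silently overwrite the other's.
    static int raceCount() {
        final int[] counter = {0};
        Runnable bump = () -> {
            for (int i = 0; i < 100_000; i++) counter[0]++; // not atomic!
        };
        Thread a = new Thread(bump), b = new Thread(bump);
        a.start(); b.start();
        try {
            a.join(); b.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        // Expected 200000; usually prints less, and a different number
        // on each run -- exactly why such bugs are so hard to
        // reproduce in a debugger.
        System.out.println(raceCount());
    }
}
```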

Nevertheless, it's becoming so important. While it may still not always trump something like memory efficiency (which can sometimes make things a hundred times faster) given the number of cores we have now, we're seeing more and more cores. Of course, even with 100-core machines, I'd still put memory efficiency at the top of the list, since thread efficiency is generally impossible without it. A program can use a hundred threads on such a machine and still be slow lacking efficient memory representation and access patterns (which will tie in to locking patterns).

SIMD

SIMD is also a bit awkward since the registers keep getting wider, with plans to get wider still. Originally we saw 64-bit MMX registers, followed by 128-bit XMM registers capable of 4 SPFP operations in parallel. Now we're seeing 256-bit YMM registers capable of 8 in parallel. And there are already plans in place for 512-bit registers which would allow 16 in parallel.

These would interact and multiply with the efficiency of multithreading. Yet SIMD can degrade maintainability just as much as multithreading. Even though bugs related to them aren't necessarily as difficult to reproduce and fix as a deadlock or race condition, portability is awkward, and ensuring that the code can run on everyone's machine (and using the appropriate instructions based on their hardware capabilities) is awkward.

Another thing is that while compilers today usually don't beat expertly-written SIMD code, they do beat naive attempts easily. They might improve to the point where we no longer have to do it manually, or at least without getting so manual as to writing intrinsics or straight-up assembly code (perhaps just a little human guidance).

Again though, without a memory layout that's efficient for vectorized processing, SIMD is useless. We'll end up just loading one scalar field into a wide register only to do one operation on it. At the heart of all of these items is a dependency on memory layouts to be truly efficient.

Other Optimizations

These are often what I would suggest we start calling "micro" nowadays, since the word suggests not only going beyond algorithmic focus but towards changes that have a minuscule impact on performance.

Often trying to optimize for branch prediction requires a change in algorithm or memory efficiency anyway. If it is attempted merely through hints and rearranging code for static prediction, that only tends to improve the first-time execution of such code, making the effects questionable if not outright negligible.

Back to Multithreading for Performance

So anyway, how important is multithreading from a performance context? On my 4-core machine, it can ideally make things about 5 times faster (what I can get with hyperthreading). It would be considerably more important to my colleague who has 32 cores. And it will become increasingly important in the years to come.

So it's pretty important. But it's useless to just throw a bunch of threads at the problem if the memory efficiency isn't there to allow locks to be used sparingly, to reduce false sharing, etc.
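As one hedged illustration of reducing that kind of contention (using Java 8's `LongAdder`; names here are illustrative): it keeps per-thread cells and only combines them on read, so threads rarely fight over the same cache line the way they would with one shared counter:

```java
import java.util.concurrent.atomic.LongAdder;

public class ContentionSketch {
    // LongAdder spreads increments across per-thread cells and only
    // sums them when asked -- the data-layout counterpart of using
    // locks sparingly and avoiding false sharing.
    static long countHits(int threads, int perThread) {
        LongAdder hits = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) hits.increment();
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return hits.sum();
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 250_000)); // 1000000
    }
}
```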

Multithreading Outside of Performance

Multithreading isn't always about sheer performance in a straightforward throughput kind of sense. Sometimes it's used to balance a load even at the possible cost of throughput to improve responsiveness to the user, or to allow the user to do more multitasking without waiting for things to finish (ex: continue browsing while downloading a file).

In those cases, I'd suggest that multithreading rises even higher towards the top (possibly even above memory efficiency), since it's then about user-end design rather than about getting the most out of the hardware. It's going to often dominate interface designs and the way we structure our entire codebase in such scenarios.

When we're not simply parallelizing a tight loop accessing a massive data structure, multithreading goes to the really hardcore "design" category, and design always trumps implementation.

So in those cases, I'd say considering multithreading upfront is absolutely critical, even more than memory representation and access.

0

Concurrent and parallel programming is what is becoming important. Threads are just one programming model of doing multiple things at the same time (and not in pseudo-parallel like it used to be before the rise of multi-core processors). Multi-threading is (IMHO fairly) criticized for being complex and dangerous since threads share many resources and the programmer is responsible for making them cooperate. Otherwise you end up with deadlocks which are hard to debug.

sakisk
  • 3,377
  • 2
  • 24
  • 24
0

Since we may need to contact many external applications, some processing should happen in the background when the external system interaction takes more time and the end user can't wait until the process is done. So multithreading is important.

We use this in our app: we first try to contact the external system; if it is down, we save the request in the database and spawn a thread to finish the process in the background. It may be required in batch operations too.
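A rough sketch of that pattern (all names here are made up, not from the real app):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FallbackSketch {
    static final ExecutorService BACKGROUND = Executors.newSingleThreadExecutor();

    // Returns true if the external system accepted the request,
    // false if it was queued for a background retry instead.
    static boolean handle(String request) {
        if (callExternalSystem(request)) {
            return true;
        }
        saveRequest(request);
        // Finish the work off the request thread so the user
        // doesn't wait for the slow retry.
        BACKGROUND.submit(() -> retryLater(request));
        return false;
    }

    static boolean callExternalSystem(String request) {
        return false; // pretend the external system is down
    }

    static void saveRequest(String request) {
        System.out.println("saved: " + request); // stands in for a DB write
    }

    static void retryLater(String request) {
        System.out.println("retrying: " + request);
    }

    public static void main(String[] args) {
        handle("order-123");
        BACKGROUND.shutdown();
    }
}
```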

TPReddy
  • 51
  • 3
0

Historically people had to struggle by doing multithreaded programming by hand. They had to work with all of the core components (threads, semaphores, mutexes, locks, etc.) directly.

All these efforts resulted in applications that were able to scale by adding additional CPUs to a single system. This vertical scalability is limited by "what's the biggest server I can buy".

Nowadays I see a shift towards using more frameworks and different design models for software design. MapReduce is one such model which is focused towards batch processing.

The goal is scaling horizontally. Adding more standard servers instead of buying bigger servers.

That said, the fact remains that really understanding multithreaded programming is very important. I've been in the situation where someone created a race condition and didn't even know what a race condition was until we noticed strange errors during testing.

Niels Basjes
  • 119
  • 4
-1

My machine has 8 cores. In Task Manager, I have 60 processes running. Some, like VS, use up to 98 threads. Outlook uses 26. I expect most of my memory usage is the stacks allocated to each of those idle threads.

I'm personally waiting for the 300-core computer to come out so that I don't have to wait for Outlook to respond. Of course by then Outlook will use 301 threads.

Multi-threading only matters if you are building systems that will be the only important process on the computer at a particular time (e.g., calculation engines). Desktop apps would probably do the user a favour by not using up every available core. Web apps using the request/response model are inherently multi-threaded.

It matters to framework and language designers, and back-end systems programmers - not so much to application builders. Understanding some basic concepts such as locking and writing async code is probably worthwhile though.

Paul Stovell
  • 1,689
  • 1
  • 9
  • 14
  • I will often whack something on a background thread such as a long DB load, but its very rare I have to deal with race conditions or locks etc. (in fact probably never) – Aran Mulholland Sep 19 '11 at 23:00