well, it really depends on what you are developing. the answer can range from "it's insignificant" to "it's absolutely critical, and we expect everyone on the team to have a good understanding and use of parallel implementations".
for most cases, a solid understanding and use of locks, threads, and tasks/task pools is a good start when parallelism is required. (the specifics vary by language and library.)
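as a minimal sketch of that starting point, using python's stdlib `threading` and `concurrent.futures` (the function and numbers here are made up for illustration - the point is the shape: a task pool, plus a lock held only long enough to publish results):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# shared state guarded by a lock: the classic starting point
total = 0
lock = threading.Lock()

def add_squares(n):
    global total
    partial = sum(i * i for i in range(n))  # do the real work outside the lock
    with lock:                              # hold the lock only to publish
        total += partial

with ThreadPoolExecutor(max_workers=4) as pool:
    for n in (100, 200, 300):
        pool.submit(add_squares, n)
# leaving the pool's context waits for all submitted tasks to finish

print(total)  # → 11930100
```

the same structure shows up under different names in most languages/libraries (executors, thread pools, futures), so the concepts transfer even when the api doesn't.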
add to that the differences in design you must make - for nontrivial multiprocessing, you must often learn several new programming models or parallelization strategies. in that case, the time to learn, to fail enough times to build a solid understanding, and to update existing programs can take a team a year (or more). once you have reached that point, you will (hopefully!) not perceive or approach problems and implementations as you do today (provided you have not already made that transition).
another obstacle is that you're effectively optimizing a program for a certain execution. if you're not given much time to optimize, you won't benefit as much as you should. high-level (or obvious) parallelization can improve your program's perceived speed with fairly little effort, and that's as far as many teams go today: "we've parallelized the really obvious parts of the app" - and that's fine in some cases. will the benefit of taking the low-hanging fruit and using simple parallelization be proportionate to the number of cores? often, yes, with two to four logical cores, but not so often beyond that. in many cases that's an acceptable return, given the time investment. this parallel model is many people's introduction to implementing good uses of parallelism. it is commonly implemented using parallelized iteration, explicit tasks, simple threads, or multitasking.
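the "low-hanging fruit" version usually amounts to taking an existing loop over independent items and handing it to a pool. a sketch of that, assuming python's `concurrent.futures` (the `checksum` function and the data are invented stand-ins for per-item work):

```python
from concurrent.futures import ThreadPoolExecutor

def checksum(blob):
    # stand-in for independent per-item work (parsing, hashing, i/o, ...)
    return sum(blob) % 251

blobs = [bytes(range(i, i + 10)) for i in range(0, 50, 10)]

# serial version: the "obvious" loop
serial = [checksum(b) for b in blobs]

# the low-hanging fruit: same loop, parallelized with a pool.
# with cpython threads this mostly helps i/o-bound work; swap in
# ProcessPoolExecutor for cpu-bound work.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(checksum, blobs))

print(parallel == serial)  # → True
```

note that nothing about the program's design changed - which is exactly why this approach is cheap, and also why it stops paying off past a few cores.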
what you learn using these trivial parallel models will not be ideal in all complex parallel scenarios; effectively applying complex parallel designs requires a much different understanding and approach. these simple models are often detached from, or have only trivial interaction with, the other components of the system. as well, many implementations of these trivial models do not scale to genuinely complex parallel systems - a bad complex parallel design can take as long to execute as the simple model (for illustration: it executes only twice as fast as the single-threaded version while utilizing 8 logical cores). the most common causes are using/creating too many threads and high levels of synchronization interference. in general, this is termed parallel slowdown, and it's quite easy to encounter if you approach all parallel problems as simple problems.
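to make the synchronization-interference case concrete, here is a hedged sketch (python stdlib only; the workload is invented). both versions compute the same answer, but the first takes the lock once per item, so eight workers mostly queue up behind each other - the shape of parallel slowdown - while the second accumulates privately and synchronizes once:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_WORKERS, N_ITEMS = 8, 10_000
lock = threading.Lock()

def contended(counter, items):
    # anti-pattern: lock once per item. the workers serialize on the
    # lock, so the "parallel" version can run no faster (or slower)
    # than the single-threaded loop.
    for x in items:
        with lock:
            counter[0] += x

def partitioned(counter, items):
    # fix: accumulate in a thread-private value, synchronize once
    local = sum(items)
    with lock:
        counter[0] += local

items = list(range(N_ITEMS))
chunks = [items[i::N_WORKERS] for i in range(N_WORKERS)]

for fn in (contended, partitioned):
    counter = [0]
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        for chunk in chunks:
            pool.submit(fn, counter, chunk)
    assert counter[0] == sum(items)  # both are correct; only one scales
```

correctness is identical either way, which is part of why the contended version survives code review - the damage only shows up in profiles and scaling curves.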
so, let's say you really should utilize efficient multithreading in your programs (the minority, in today's climate): you'll need to employ the simple model effectively, learn the complex model, and then relearn how you approach program flow and interaction. the complex model is where your program should ultimately end up, because that's where hardware is today, and where the most dominant improvements will be made.
the execution of simple models can be envisioned like a fork, while complex models operate more like an ecosystem. i think an understanding of simple models, including general locking and threading, is or soon will be expected of intermediate developers in domains that use them. understanding complex models is still a bit unusual today (in most domains), but i think the demand will increase quite quickly. as developers, many more of our programs should support these models, and most of us are quite far behind in understanding and implementing these concepts. since logical processor count is one of the most important areas of hardware improvement, the demand for people who understand and can implement complex parallel systems will surely increase.
finally, there are a lot of people who think the solution is just "add parallelization". oftentimes, it's better to make the existing implementation faster - much easier and more straightforward in many cases. many programs in the wild have never been optimized; some people just assumed the unoptimized version would be eclipsed by hardware someday soon. improving the design or algorithms of an existing program is also an important skill when performance matters - throwing more cores at a problem is not necessarily the best or simplest solution.
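a toy example of that trade-off (the problem and names here are invented for illustration): parallelizing the quadratic version of a pair-sum search buys at best a cores-times speedup, while the linear rewrite wins outright as the input grows - no threads required:

```python
data = [i % 100 for i in range(500)]

def has_pair_sum_naive(xs, target):
    # O(n^2): tempting to "just parallelize" the outer loop,
    # for at best an 8x speedup on 8 cores
    return any(xs[i] + xs[j] == target
               for i in range(len(xs)) for j in range(i + 1, len(xs)))

def has_pair_sum_fast(xs, target):
    # O(n): the algorithmic fix beats any core count as n grows
    seen = set()
    for x in xs:
        if target - x in seen:
            return True
        seen.add(x)
    return False

assert has_pair_sum_naive(data, 150) == has_pair_sum_fast(data, 150)
```

only once the algorithm is as good as it reasonably gets is parallelization the right next lever to pull.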
when targeting modern PCs, most of us who need to implement good parallel systems will not need to go beyond multithreading, locking, parallel libraries, a book's worth of reading, and a lot of experience writing and testing programs (which basically means significantly restructuring how you approach writing programs).