6

After asking this question, I came to realize that parallelism may not always be a good thing.

So far I can see that parallelism (in C#) is:

  • A little complicated code-wise
  • Likely to introduce some concurrency complications
  • Not always faster

I am looking into parallelism because I want to make existing and new applications work faster, but now I see that it is no silver bullet.

When should I use it? Are there any best practices? Is it just a waste of time in most cases? Is it suitable for specific kinds of applications?

I am looking forward to your insights on this matter.

Mithir
    recommended reading: [The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software](http://www.gotw.ca/publications/concurrency-ddj.htm) by Herb Sutter. _"In this article, I’ll describe the changing face of hardware, why it suddenly does matter to software, and how specifically the concurrency revolution matters to you and is going to change the way you will likely be writing software in the future..."_ – gnat May 03 '12 at 08:52

8 Answers

7

While it's definitely not a silver bullet, .NET 4's Task Parallel Library can simplify some aspects of parallelising your code, once you've already made the decision to do so.

To answer your question, there certainly are differences between types of problems, in terms of how much benefit they gain from being parallelised. Some problems are referred to as embarrassingly parallel, and the definition has a lot to do with the amount of communication necessary between the tasks.

Although I am sure you could come up with a set of guidelines for the types of tasks that benefit greatly from parallelisation in general (e.g. see linkerro's answer for some), you can also simply analyse the problem and make a judgement call yourself (assuming you understand concepts at a fairly low level).

Having said all of this, there is a lot to be said for simplicity and code readability. In general, unless you are dealing with problems that greatly benefit from parallelism and / or performance is a high priority, attempting to make everything parallel is just going to complicate things and bite you. Most of the time, there are better places to spend your spare brainpower :)
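
For illustration, here is a minimal sketch of an embarrassingly parallel loop using the TPL's Parallel.For; the brightness adjustment is a made-up stand-in for any independent per-element calculation:

```csharp
using System;
using System.Threading.Tasks;

class BrightnessSketch
{
    static void Main()
    {
        byte[] pixels = new byte[10000000]; // stand-in for raw image data
        new Random(42).NextBytes(pixels);

        // Every pixel is adjusted independently - no iteration depends on
        // another - so the loop parallelises across all available cores.
        Parallel.For(0, pixels.Length, i =>
        {
            pixels[i] = (byte)Math.Min(255, pixels[i] + 40);
        });
    }
}
```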

Daniel B
  • +1 this is a good answer, thanks. I'm curious whether there is a good way to determine which of my specific processes are good candidates for parallelism and which are not - maybe by watching for CPU spikes or something like that – Mithir May 03 '12 at 13:47
  • Thanks. Well, if you have a single task which causes one core of the CPU to spike to 100% (preferably for an extended period of time), then it's very likely that you can get performance benefits by parallelisation. Long-running calculation-type tasks are great candidates, as long as the calculation can be broken down into sub-tasks that can be done separately. A typical example is applying an effect to an image - the calculation is often done pixel-by-pixel in a way that doesn't depend on other calculations. This is obviously hugely parallelisable. – Daniel B May 03 '12 at 18:04
  • The above example should give you an idea of what is meant by "not much communication between tasks" - the result of one calculation doesn't rely on another. Tasks of this nature benefit greatly, and with a bit of clever thinking, you can often turn an algorithm into a parallelisable one by switching things around. This is the "analysis" approach I was referring to. Looking at processes from the outside (as you suggest with CPU spikes, etc.) and trying to figure out whether they will benefit from parallelisation is a difficult task, however. – Daniel B May 03 '12 at 18:15
  • The difficulty starts right at the beginning: for you to detect a CPU spike, the task has to run for a significant amount of time (if it's much shorter than the measurement interval, which is often 1 second, the spike will be closer to 0). Also, a task might be waiting for other resources and idling (say, waiting for a file write, or packets from the network). This may incorrectly lead you to believe parallelisation is not worthwhile. In isolation though, a long, single-core CPU spike is a good indication. – Daniel B May 03 '12 at 18:21
4

There are some cases when you should use parallelism. The simplest case is to perform a big task in the background to keep the GUI responsive; this is task parallelism. Another case is when you have to process a lot of data - video decoding, solving optimization problems and so on; this type of parallelism is called data parallelism. With data parallelism, a task is processed faster because all chunks of data are processed independently. With task parallelism, however, performance is not always the main purpose; often you should use it to improve usability.
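
A minimal C# sketch of both flavours, assuming the TPL is available (the Process method is a hypothetical stand-in for expensive work):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class TwoFlavours
{
    // Hypothetical stand-in for an expensive per-chunk computation.
    static void Process(int chunk) { /* ... */ }

    static void Main()
    {
        // Task parallelism: push one big job into the background so the
        // calling (e.g. GUI) thread remains responsive.
        Task background = Task.Factory.StartNew(
            () => Console.WriteLine("long-running job finished"));

        // Data parallelism: independent chunks processed concurrently.
        int[] chunks = Enumerable.Range(0, 1000).ToArray();
        Parallel.ForEach(chunks, chunk => Process(chunk));

        background.Wait();
    }
}
```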

Aligus
  • So would you say that in most cases I shouldn't bother using parallel programming? – Mithir May 03 '12 at 13:37
  • Yes, of course. Parallel programming is quite a difficult thing and can cause a lot of mistakes. If you do not need parallelism, you should not use it. – Aligus May 03 '12 at 14:47
3

Assuming we're talking about platform applications, I think you will almost always end up needing some form of parallelism as your application grows more complex (as per the reading material recommended by gnat). There is always a large file to load, or a complaint from the customer that some web service dependency is too slow.

However, be wary about using vanilla threads and basic synchronization primitives, because multithreading is hard; instead, try to use higher-level primitives, patterns and components (you will find lots of great stuff to start with in the System.Threading.Tasks and the System.Collections.Concurrent namespaces). More importantly, as the complexity of your solution increases, avoid clever tricks.

Specifically, your friendly neighborhood platform application in C# is likely to have a UI message pump, and stuff executing in the background, in order to a) keep the UI from locking up or b) leverage multiple logical CPUs. A good decision list on "how to parallelize the heavy stuff in the background" would be:

  • if the task is simply not CPU-intensive, but may spend a lot of time waiting (ex. for IO or for other tasks), send it to the Thread Pool. A good pattern here is to send to the TP just the call that may make you wait - the framework already has something for that; you will find Begin*/End* methods wherever you may wait a long time - and to marshal back to some safe context (ex. the UI thread) when the chicken's done. This way, your exposure to parallelism problems is low - and is bound to be ridiculously low in the next versions of .NET, with async and await.

  • inhomogeneous tasks with low granularity - that's usually task parallelism; the Command/Task design pattern is your friend. Also, try to borrow from functional programming - avoid mutable and shared data, use continuations - and use the framework (specifically the TPL, or at the very least a BlockingCollection implementing a producer-consumer; see the sketch after this list);

  • homogeneous tasks with high granularity - that's usually data parallelism, where the data structures and algorithms you deploy make all the difference. Try to isolate the units of work with this profile from the rest of your logic, and consider going purely functional with it - use map-reduce as a good hint.
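
To make the producer-consumer suggestion in the second bullet concrete, here is a minimal sketch built on BlockingCollection (the integer work items are dummies):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    static void Main()
    {
        // Bounded queue: the producer blocks when 100 items are pending.
        var queue = new BlockingCollection<int>(100);

        var producer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 1000; i++)
                queue.Add(i);
            queue.CompleteAdding(); // signal "no more items"
        });

        var consumer = Task.Factory.StartNew(() =>
        {
            // Blocks while the queue is empty and finishes cleanly
            // once the producer has called CompleteAdding.
            foreach (int item in queue.GetConsumingEnumerable())
                Console.WriteLine(item);
        });

        Task.WaitAll(producer, consumer);
    }
}
```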

vski
2

All tasks are not created equal. Some tasks lend themselves to parallelism naturally. For example, consider that you are creating a travel website like Kayak. One of the most important things to do is use the APIs provided by search providers like British Airways, Singapore Airlines, etc. You set off a different task for each API so that fetching can be done in parallel. If a task does not return a result in time, abort it (maybe). As you can imagine, it is very difficult to do this without using concurrency.
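
A hedged sketch of that fan-out pattern with the TPL; the provider names and QueryProvider are made up, and Task.WaitAll with a timeout stands in for the "abort it, maybe" part:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class FareSearchSketch
{
    // Hypothetical stand-in for one provider's search API call.
    static string QueryProvider(string provider)
    {
        return provider + ": cheapest fare found";
    }

    static void Main()
    {
        string[] providers = { "BritishAirways", "SingaporeAirlines" };

        // One task per provider, so all queries run concurrently.
        Task<string>[] queries = providers
            .Select(p => Task.Factory.StartNew(() => QueryProvider(p)))
            .ToArray();

        // Wait at most 2 seconds; slower providers are simply ignored.
        Task.WaitAll(queries, TimeSpan.FromSeconds(2));

        foreach (var query in queries.Where(q => q.IsCompleted))
            Console.WriteLine(query.Result);
    }
}
```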

With processors arriving with multiple cores these days, concurrent programming is becoming more and more relevant.

Vinoth Kumar C M
1

I use it on server-side applications where I have 8 cores at my fingertips. That's a lot of speed improvement if your application works with big data sets.

Still, I only use it when I know there are operations that can be parallelised: hits to the database (using different connections from the connection pool), requests to web services, and disk IO (this one is arguable, but I go on the presumption that each thread will begin work as soon as its IO is sorted, rather than having to wait for the others to do their work).

All this leads to marginal improvements, but over large datasets parallel wins over serial in these conditions.

Still, I only do this when user interaction is impacted by slow operations, as writing the code to perform parallel operations may not be trivial, even with PLINQ.
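
For reference, a minimal PLINQ sketch over a large dataset (the Score function is a made-up stand-in); as noted above, this only wins when the dataset is big enough:

```csharp
using System;
using System.Linq;

class PlinqSketch
{
    // Hypothetical CPU-bound scoring function for one record.
    static double Score(int record)
    {
        return Math.Sqrt(record) * 1.5;
    }

    static void Main()
    {
        int[] records = Enumerable.Range(0, 5000000).ToArray();

        // AsParallel() partitions the work across the available cores;
        // on small datasets the overhead outweighs the gain.
        double total = records.AsParallel().Select(Score).Sum();

        Console.WriteLine(total);
    }
}
```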

linkerro
1

Parallelism is a very important tool to have in your belt, but to learn it well you must use it a lot until you get the hang of it. You're lucky to be using C#, because the TPL is the greatest library I've ever seen, and combined with LINQ and extension methods it's just killer.

Answering your questions specifically:

When should I use it?

There are basically two reasons you might want to use TPL:

  1. Background processing: you want to make sure that your application is still responsive while doing an operation that might take a while. You must study and understand how to make calls asynchronously.
  2. Parallel processing: you want to make sure several threads are being used to execute your tasks in parallel making your code run faster (which works better on multi-core systems).

TPL will help you greatly in both scenarios (C# 5.0 has some new special tricks that turn asynchronous programming into kid's stuff).
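
As a taste of those C# 5.0 tricks, here is a minimal async/await sketch (the file name is hypothetical); the await keeps the calling thread free while the IO completes:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class AsyncSketch
{
    // C# 5.0 async/await: the method returns to its caller at the first
    // await, so a UI thread calling it would stay responsive while the
    // IO completes in the background.
    static async Task<int> CountBytesAsync(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            var buffer = new byte[4096];
            int total = 0, read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                total += read;
            return total;
        }
    }

    static void Main()
    {
        // "data.bin" is a hypothetical input file.
        Console.WriteLine(CountBytesAsync("data.bin").Result);
    }
}
```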

Are there any best practices?

Yes, there are. For Windows Forms, for example, you could use BackgroundWorker, because it helps you deal with the fact that no calls can be made to UI objects from outside their own thread.

There are many others, but they will vary depending on what exactly you're doing (ASP.NET, WPF, Silverlight, WinForms and so on). Most of the time I use MSDN and Stack Overflow, but the fact is, the more you use the TPL, the more knowledge you'll gain and the easier it gets to use it and to know when parallelism is an adequate solution and when it's not.

Is it just a waste of time in most cases? Is it suitable for a specific kind of applications?

There's no universal answer to this question, every project has to be put in perspective and analyzed. There are certainly specific types of applications that are more suitable to parallelism. In parallel processing for example:

  • games
  • number crunching applications
  • image and video processing
  • data processing

For background processing: any kind of application that has a UI and runs tasks that take more than a couple of seconds. You can find excellent articles about C# 5.0 async here:

Link

Mobile applications are bringing attention back to this subject, because mobile platforms simply don't have the same level of processing power as desktops, and users usually have very little patience when running an app. See this article, for example, about how background processing can create the impression that your application is lightning fast:

http://speakerdeck.com/u/mikeyk/p/secrets-to-lightning-fast-mobile-design?slide=82

If you want to learn, nothing beats trying it over and over until you see the results for yourself (good or bad).

Glorfindel
Alex
  • I wonder why this was downvoted... If someone cares to explain – Mithir May 04 '12 at 07:42
  • @Mithir probably for no reason at all. Usually the downvoter is supposed to post a comment; you can tell it's uncalled for because there are no comments. If someone cares to comment on the downvote, I'll be glad to improve my answer. – Alex May 04 '12 at 13:33
  • Awesome, because my question is downvoted, nobody cares to read it, so nobody gives it any upvote. And the genius who downvoted it just vanished without leaving a word... awesome! – Alex May 04 '12 at 18:58
  • I gave you an upvote. ;-) because your answer is good. – sivabudh Jun 06 '12 at 16:22
0

Concurrent programming and parallelism within one process through multi-threading is very powerful but also complicated to get right because of all the problems with shared data: deadlocks, lost updates, etc.

However, there are situations in which it can be very useful (or even necessary):

  1. For efficiency: if you have multiple cores you can run different threads on different cores.
  2. For modularity: you split your implementation into different tasks and run them separately.
  3. Because it is a requirement of your application, e.g. you want to fetch data from a server while your application continues to interact with the user.

Regarding 2, I find this approach very effective for implementing workflows in which there are different (virtual) processors that transform streams of data. Processors communicate through some kind of pipes, e.g. the output data stream of processor A is the input stream of processor B.

In this case, all the effort of managing concurrency can be localized in the implementation of the data stream class (some kind of thread-safe queue with methods like, say, push_front() and pop_back()): you encapsulate all the concurrency management inside this queue class and then program each workflow node as if it were the only thread running on the system.
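
A minimal C# sketch of such a pipe (method names adapted from the push_front()/pop_back() idea; unbounded and without shutdown handling, for brevity):

```csharp
using System.Collections.Generic;
using System.Threading;

// Minimal sketch of the pipe described above: all locking is hidden
// inside the class, so each workflow processor can be written as if it
// were the only thread in the system.
class DataPipe<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object gate = new object();

    public void Push(T item)
    {
        lock (gate)
        {
            items.Enqueue(item);
            Monitor.Pulse(gate); // wake one waiting consumer
        }
    }

    public T Pop()
    {
        lock (gate)
        {
            while (items.Count == 0)
                Monitor.Wait(gate); // block until a producer pushes
            return items.Dequeue();
        }
    }
}
```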

Following a similar but more general idea, it might be interesting to have a look at the actor model. In this model you see each thread as an object that sends and receives messages, so you do not have to deal with threads directly. Languages that include standard actor support are Erlang and Scala. Otherwise you need an appropriate library, e.g. akka (Scala, Java) or Theron (C++). For implementations of the actor model in C#, look at the answers to this question.

Bottom line: threads can be very useful, but if you intend to use them in a non-trivial way, I would advise you to take a more abstract approach using some library.

Giorgio
-1

Code paths that don't do substantial work are not very useful to execute in parallel. Look at Amdahl's Law for details.
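
To make that concrete: Amdahl's Law says that if only a fraction p of the work can be parallelised, the speedup on n cores is capped at 1 / ((1 - p) + p/n). A tiny sketch:

```csharp
using System;

class AmdahlSketch
{
    // Amdahl's Law: speedup = 1 / ((1 - p) + p / n), where p is the
    // parallelisable fraction of the work and n is the number of cores.
    static double Speedup(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    static void Main()
    {
        // A task that is only 50% parallelisable gains less than 1.8x
        // even on 8 cores - hardly worth the added complexity.
        Console.WriteLine(Speedup(0.50, 8)); // ~1.78
        Console.WriteLine(Speedup(0.95, 8)); // ~5.93
    }
}
```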

Satyajit