40

Part 1

Clearly Immutability minimizes the need for locks in multi-processor programming, but does it eliminate that need, or are there instances where immutability alone is not enough? It seems to me that you can only defer processing and encapsulate state so long before most programs have to actually DO something (update a data store, produce a report, throw an exception, etc.). Can such actions always be done without locks? Does the mere action of throwing out each object and creating a new one instead of changing the original (a crude view of immutability) provide absolute protection from inter-process contention, or are there corner cases which still require locking?

I know a lot of functional programmers and mathematicians like to talk about "no side effects" but in the "real world" everything has a side effect, even if it's the time it takes to execute a machine instruction. I'm interested in both the theoretical/academic answer and the practical/real-world answer.

If immutability is safe, given certain bounds or assumptions, I want to know what the borders of the "safety zone" are exactly. Some examples of possible boundaries:

  • I/O
  • Exceptions/errors
  • Interactions with programs written in other languages
  • Interactions with other machines (physical, virtual, or theoretical)

Special thanks to @JimmaHoffa for his comment which started this question!

Part 2

Multi-processor programming is often used as an optimization technique - to make some code run faster. When is it faster to use locks vs. immutable objects?

Given the limits set out in Amdahl's Law, when can you achieve better overall performance (with or without the garbage collector taken into account) with immutable objects vs. locking mutable ones?

Summary

I'm combining these two questions into one to try to get at where the bounding box is for Immutability as a solution to threading problems.

GlenPeterson
  • 14,890
  • 6
  • 47
  • 75
  • 24
    `but everything has a side effect` -- Uh, no it doesn't. A function that accepts some value and returns some other value, and doesn't disturb anything outside of the function, has no side effects, and is therefore thread-safe. Doesn't matter that the computer uses electricity. We can talk about cosmic rays hitting memory cells too, if you like, but let's keep the argument practical. If you want to consider things like how the way the function executes affects power consumption, that's a different problem than threadsafe programming. – Robert Harvey Oct 24 '12 at 18:30
  • 5
    @RobertHarvey - Maybe I'm just using a different definition of side-effect and I should have said, "real-world side-effect" instead. Yes, mathematicians have functions without side-effects. Code which executes on a real-world machine takes machine resources to execute, whether it mutates data or not. The function in your example puts its return value on the stack in most machine architectures. – GlenPeterson Oct 24 '12 at 18:46
  • 1
    If you can actually get through it, I think your question goes to the heart of this infamous paper http://research.microsoft.com/en-us/um/people/simonpj/papers/marktoberdorf/mark.pdf – Jimmy Hoffa Oct 24 '12 at 18:46
  • 6
    For purposes of our discussion, I am assuming that you're referring to a Turing-complete machine that is executing some sort of well-defined programming language, *where the implementation details are irrelevant.* In other words, it shouldn't matter what the stack is doing, if the function I'm writing in my programming language of choice can guarantee immutability *within the confines of the language.* I don't think about the stack when I'm programming in a high-level language, nor should I have to. – Robert Harvey Oct 24 '12 at 18:49
  • @JimmyHoffa: I haven't read the paper, but **tl;dr:** *Side effects in Haskell are sequestered within constructs called Nomads.* The Nomad itself (or at least calling it) is considered *purely functional.* – Robert Harvey Oct 24 '12 at 18:53
  • 1
    @RobertHarvey spoonerism; Monads heh And you can gather that from the first couple pages. I mention it because over the course of the whole it details a technique for handling side effects in a practically pure way, I'm pretty sure it would answer Glen's question, so posted it as a good foot note to anyone who finds this question in the future for further reading. – Jimmy Hoffa Oct 24 '12 at 19:01
  • @RobertHarvey Yes, I'm very interested in a purely theoretical answer to this question. I'm *also* very interested in a practical answer where "machine" refers to a real-world physical device consisting of silicon, lead, aluminum, and plastic, which someone relies on to be fast and correct in order to feed their family. I am assuming, of course, that immutability vs. locks has been tested in such a situation. – GlenPeterson Oct 24 '12 at 20:16
  • So are you really asking whether programming languages that claim to make guarantees about immutability are truly "safe?" Are you asking from the point of view of, say, a medical device manufacturer, the [safety of whose machine may depend](http://en.wikipedia.org/wiki/Therac-25) on the correct operation of its software? – Robert Harvey Oct 24 '12 at 20:18
  • 1
    Yes, I suppose that's my question. I'm assuming that the answer is that yes, there is safety given certain assumptions. I want to know what the borders of the "safety zone" are exactly. Some examples of possible boundaries: I/O, Exceptions/errors, Interfaces with programs written in other languages, Interfaces with other machines (physical, virtual, or theoretical). I'll add that to the question. – GlenPeterson Oct 24 '12 at 20:24
  • Your "part two" is unanswerable. It's going to depend entirely on the tradeoff between lock contention and the creation of new immutable objects that can only be used once, and is subject to a raft of indeterminates, not the least of which is the skill of the programmer. – Robert Harvey Oct 24 '12 at 20:49
  • @RobertHarvey - your comments have been very helpful. If you write it up as an answer I'll give you an up-vote. :-) In any case, I appreciate your help. – GlenPeterson Oct 24 '12 at 21:12
  • @JimmyHoffa - That paper is going to be a blast. I had to laugh out loud on page 3, "In short, Haskell is the world’s finest imperative programming language." Ha-ha! He's quite irreverent! – GlenPeterson Oct 24 '12 at 21:14
  • I'll give it some thought and write up a better answer tonight. – Robert Harvey Oct 24 '12 at 21:15
  • @GlenPeteron Yup. That paper is highly recommended. Keep in mind Simon Peyton Jones is one of the main contributors of the Glasgow Haskell Compiler, and one of the designers of Haskell itself. – Andres F. Oct 25 '12 at 13:00
  • If the output of a pure function isn't being fed to a device with side effects, or to other functions whose outputs are, then the compiler is free to optimize that function away. In a sense, a function that is totally free of side effects is just heating up the machine. – Lie Ryan Oct 31 '14 at 14:32

5 Answers

38

This is an oddly phrased question that is really, really broad if answered fully. I'm going to focus on clearing up some of the specifics that you're asking about.

Immutability is a design trade off. It makes some operations harder (modifying state in large objects quickly, building objects piecemeal, keeping a running state, etc.) in favor of others (easier debugging, easier reasoning about program behavior, not having to worry about things changing underneath you when working concurrently, etc.). It's this last one we care about with this question, but I want to emphasize that it is a tool. A good tool that often solves more problems than it causes (in most modern programs), but not a silver bullet... Not something that changes the intrinsic behavior of programs.

Now, what does it get you? Immutability gets you one thing: you can read the immutable object freely, without worrying about its state changing underneath you (assuming it is truly deeply immutable... Having an immutable object with mutable members is usually a deal breaker). That's it. It frees you from having to manage concurrency (via locks, snapshots, data partitioning or other mechanisms; the original question's focus on locks is... Incorrect given the scope of the question).
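That core guarantee can be made concrete with a small C# sketch (the `Point` type and its members are invented here purely for illustration): any number of threads may read a deeply immutable object without locks, and "modifying" it simply produces a new object, leaving the original untouched.

```csharp
using System;
using System.Linq;

public sealed class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    // "Mutation" returns a new object; the original never changes.
    public Point Offset(int dx, int dy) => new Point(X + dx, Y + dy);
}

public static class Demo
{
    public static void Main()
    {
        var shared = new Point(3, 4);

        // Many workers read `shared` concurrently with no locks;
        // this is safe because no thread can change it.
        var sums = Enumerable.Range(0, 100)
            .AsParallel()
            .Select(_ => shared.X + shared.Y)
            .ToArray();

        Console.WriteLine(sums.All(s => s == 7));   // True
        Console.WriteLine(shared.Offset(1, 1).X);   // 4 (new object)
        Console.WriteLine(shared.X);                // 3 (original untouched)
    }
}
```

The caveat from the paragraph above applies: this only works because `Point` is immutable all the way down; an immutable shell around a mutable member would break the guarantee.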

It turns out, though, that lots of things read objects. IO does, but IO itself tends not to handle concurrent use well. Almost all processing does, but other objects may be mutable, or the processing itself might use state that is not friendly to concurrency. Copying an object is a big hidden trouble spot in some languages, since a full copy is (almost) never an atomic operation. This is where immutable objects help you.

As for performance, it depends on your app. Locks are (usually) heavy. Other concurrency management mechanisms are faster but have a high impact on your design. In general, a highly concurrent design that makes use of immutable objects (and avoids their weaknesses) will perform better than a highly concurrent design that locks mutable objects. If your program is lightly concurrent then it depends and/or doesn't matter.

But performance should not be your highest concern. Writing concurrent programs is hard. Debugging concurrent programs is hard. Immutable objects help improve your program's quality by eliminating opportunities for error when implementing concurrency management by hand. They make debugging easier because you're not trying to track state in a concurrent program. They make your design simpler and thus remove bugs there.

So to sum up: immutability helps, but it will not eliminate the work needed to handle concurrency properly. That help tends to be pervasive, but the biggest gains are in quality rather than performance. And no, immutability does not magically excuse you from managing concurrency in your app, sorry.

GlenPeterson
  • 14,890
  • 6
  • 47
  • 75
Telastyn
  • 108,850
  • 29
  • 239
  • 365
  • +1 This makes sense, but could you give an example of where in a deeply immutable language you still have to worry about handling concurrency properly? You state that you do, but such a scenario isn't clear to me – Jimmy Hoffa Oct 25 '12 at 04:54
  • @JimmyHoffa In an immutable language you still need some way to update state between threads. The two most immutable languages I know (Clojure and Haskell) provide reference types (atoms and MVars) that provide a way to ship modified state between threads. The semantics of their ref types prevent certain types of concurrency errors, but others are still possible. – stonemetal Oct 25 '12 at 14:04
  • @stonemetal interesting, in my 4 months with Haskell I had not even heard of Mvars yet, I always heard to use STM for concurrency state communication which behaves more like Erlang's message passing I thought. Though perfect example of immutability not solving concurrent issues I can think of is updating a UI, if you have 2 threads trying to update a UI with different versions of data, one may be newer and therefore needs to get the second update so you have a race condition where you must guarantee the sequencing somehow.. Interesting thought.. Thanks for the details – Jimmy Hoffa Oct 25 '12 at 14:35
  • 1
    @jimmyhoffa - The most common example is IO. Even if the language is immutable, your database/website/file is not. Another is your typical map/reduce. Immutability means that the aggregation of the map is more foolproof, but you still need to handle the 'once all the map is done in parallel, reduce' coordination. – Telastyn Oct 25 '12 at 17:15
  • 1
    @JimmyHoffa: `MVar`s are a low-level mutable concurrency primitive (technically, an immutable reference to a mutable storage location), not too different from what you'd see in other languages; deadlocks and race conditions are very possible. STM is a high-level concurrency abstraction for lock-free mutable shared memory (very different from message-passing) that allows composable transactions with no possibility of deadlocks or race conditions. Immutable data is just thread-safe, nothing else to say about it. – C. A. McCann Oct 25 '12 at 17:48
15

A function that accepts some value and returns some other value, and doesn't disturb anything outside of the function, has no side effects, and is therefore thread-safe. If you want to consider things like how the way the function executes affects power consumption, that's a different problem.

I am assuming that you're referring to a Turing-complete machine that is executing some sort of well-defined programming language, where the implementation details are irrelevant. In other words, it shouldn't matter what the stack is doing, if the function I'm writing in my programming language of choice can guarantee immutability within the confines of the language. I don't think about the stack when I'm programming in a high-level language, nor should I have to.

To illustrate how this works, I'm going to offer a few simple examples in C#. In order for these examples to be true, we have to make a couple of assumptions. First, that the compiler follows the C# specification without error, and second, that it produces correct programs.

Let's say I want a simple function that accepts a string collection, and returns a string that is a concatenation of all of the strings in the collection separated by commas. A simple, naïve implementation in C# might look like this:

public string ConcatenateWithCommas(ImmutableList<string> list)
{
    string result = string.Empty;
    bool isFirst = true;

    foreach (string s in list)
    {
        if (isFirst)
        {
            result += s;
            isFirst = false;
        }
        else
        {
            result += ", " + s;
        }
    }
    return result;
}

This example is immutable, prima facie. How do I know that? Because the string object is immutable. However, the implementation is not ideal. Because strings are immutable, a new string object has to be created each time through the loop, replacing the object that result points to. This can negatively affect speed and put pressure on the garbage collector, since it has to clean up all of those extra strings.

Now, let's say I do this:

public string ConcatenateWithCommas(ImmutableList<string> list)
{
    var result = new StringBuilder();
    bool isFirst = true;

    foreach (string s in list)
    {
        if (isFirst)
        {
            result.Append(s);
            isFirst = false;
        }
        else
        {
            result.Append(", ").Append(s);
        }
    }
    return result.ToString();
}

Notice that I've replaced string result with a mutable object, StringBuilder. This is much faster than the first example, because a new string is not created each time through the loop. Instead, the StringBuilder object merely adds the characters from each string to a collection of characters, and outputs the whole thing at the end.

Is this function immutable, even though StringBuilder is mutable?

Yes, it is. Why? Because each time this function is called, a new StringBuilder is created, just for that call. So now we have a pure function that is thread-safe, but contains mutable components.

But what if I did this?

public class Concatenate
{
    private StringBuilder result = new StringBuilder();
    private bool isFirst = true;

    public string ConcatenateWithCommas(ImmutableList<string> list)
    {
        foreach (string s in list)
        {
            if (isFirst)
            {
                result.Append(s);
                isFirst = false;
            }
            else
            {
                result.Append(", ").Append(s);
            }
        }
        return result.ToString();
    }
}

Is this method thread-safe? No, it isn't. Why? Because the class is now holding state on which my method depends. A race condition is now present in the method: one thread may modify isFirst, while another thread performs the first Append(), in which case I now have a comma at the beginning of my string that is not supposed to be there.

Why might I want to do it like this? Well, I might want the threads to accumulate the strings into my result without regard to order, or in the order that the threads come in. Maybe it's a logger, who knows?

Anyway, to fix it, I put a lock statement around the method's innards.

public class Concatenate
{
    private StringBuilder result = new StringBuilder();
    private bool isFirst = true;
    private static object locker = new object();

    public string AppendWithCommas(ImmutableList<string> list)
    {
        lock (locker)
        {
            foreach (string s in list)
            {
                if (isFirst)
                {
                    result.Append(s);
                    isFirst = false;
                }
                else
                {
                    result.Append(", ").Append(s);
                }
            }
            return result.ToString();
        }
    }
}

Now it's thread-safe again.

The only way that my immutable methods could possibly fail to be thread-safe is if the method somehow leaks part of its implementation. Could this happen? Not if the compiler is correct and the program is correct. Will I ever need locks on such methods? No.

For an example of how implementation could possibly be leaked in a concurrency scenario, see here.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • 3
    Unless I'm mistaken, because a `List` is mutable, in the first function you claimed 'pure', another thread could remove all of the elements from the list or add a bunch more while it's in the foreach loop. Not certain how that would play with the `IEnumerator` being `while(iter.MoveNext())`ed, but unless the `IEnumerator` is immutable (doubtful) then that would threaten to thrash the foreach loop. – Jimmy Hoffa Oct 25 '12 at 04:45
  • True, you have to assume that the collection is never written to while threads are reading from it. That would be a valid assumption, if each thread calling the method builds its own list. – Robert Harvey Oct 25 '12 at 14:43
  • I don't think you can call it 'pure' when it has that mutable object in it that it is using by reference. If it received an IEnumerable you *might* be able to make that claim because you can't add or remove elements from an IEnumerable, but then it could be an Array or a List handed in as IEnumerable so the IEnumerable contract doesn't guarantee any form of purity. The real technique to make that function pure would be immutability with pass-by-copy, C# doesn't do this so you would have to copy the List right when the function receives it; but the only way to do that is with a foreach on it... – Jimmy Hoffa Oct 25 '12 at 14:54
  • @JimmyHoffa: I solved the problem by adding a wrapper around List, making it immutable. – Robert Harvey Oct 25 '12 at 15:23
  • Just call that class Enumerable, it doesn't inherit IList, I don't think you could implement IList immutably because add/delete's would effect the internal state of the IList which is passed-by-reference, unless you implemented it on a struct to get rid of the pass-by-reference semantics. (If you could implement IList on a struct?) – Jimmy Hoffa Oct 25 '12 at 15:41
  • OK, It's obviously possible to write an `ImmutableList` (maybe not trivially with a `List`), but I punted on the code for that, since writing an `ImmutableList` class is really beyond the scope of the answer anyway. – Robert Harvey Oct 25 '12 at 16:12
  • Not to be argumentative, but thinking about it I really can't come up with a way to implement the IList interface immutably. Using locks I could think of how to make it thread-safe sure, but immutable I can't figure out. The reason String pulls it off is all the replace/remove/concat/etc return a new copy rather than effecting the referenced string, IList contract's add/delete don't return new copies; they have to alter the objects internal state and that object may be referenced from multiple places because it's not a value type. I would genuinely like to see an ImmutableList if it's possible – Jimmy Hoffa Oct 25 '12 at 16:33
  • The way you do it is you throw a `NotImplementedException` in the implementations for those `IList` methods that perform a write, such as `Add()`; I admit there may be some other intricacies involved. In this hypothetical example, `ImmutableList` only implements an Enumerator, and cannot be written to once it is constructed. – Robert Harvey Oct 25 '12 at 16:37
  • 1
    @JimmyHoffa: Dammit, you got me obsessed over this chicken and egg problem! If you see a solution anywhere, please let me know. – Robert Harvey Oct 25 '12 at 22:04
  • @RobertHarvey Thanks for writing this up. I really appreciate your examples. They really clarified things for me - so much that I wanted to rewrite my question. Telastyn had the perspective I was looking for: "immutability does not magically excuse you from managing concurrency in your app" so I chose that one, but I would have accepted two equally good answers (meaning yours too) if I could have. Thanks again for all your effort. – GlenPeterson Oct 26 '12 at 12:12
  • 1
    Just came across this answer now and it's one of the best explanations on the topic I've come across, the examples are super concise and really make it easy to grok. Thanks! – Stephen Byrne Dec 09 '15 at 21:04
  • Two things: Shouldn't `isFirst` set to `true` at the beginning and then `if (isFirst) { result += s; isFirst = false; }`? Other one is, shouldn't we use `result.Append(", ").Append(s);` instead of `result.Append(", " + s)`? – Utku Nov 02 '20 at 17:01
4

I'm unsure if I understood your questions.

IMHO the answer is yes. If all your objects are immutable, then you don't need any locks. But if you need to preserve a state (e.g. you implement a database or you need to aggregate the results from multiple threads) then you need to use mutability and therefore also locks. Immutability eliminates the need for locks, but usually you cannot afford to have completely immutable applications.
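That aggregation case can be sketched as follows (all names here are made up for illustration): the per-item work stays pure and lock-free, but collecting the results into shared mutable state still needs synchronization.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Aggregator
{
    public static int CollectSquares()
    {
        var results = new List<int>();   // shared, mutable state
        var gate = new object();

        Parallel.For(0, 100, i =>
        {
            int computed = i * i;        // pure, lock-free work per item
            lock (gate)                  // the mutation is what needs the lock
            {
                results.Add(computed);
            }
        });

        return results.Count;
    }

    public static void Main() => Console.WriteLine(CollectSquares());   // 100
}
```

Without the lock, concurrent `List<int>.Add` calls could corrupt the list's internal state; making each work item immutable does nothing to prevent that, because the hazard lives entirely in the shared accumulator.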

Answer to part 2 - locks should always be slower than no locks.

Maros
  • 61
  • 2
  • 3
    Part two is asking "What is the performance tradeoff between locks and immutable structures?" It probably deserves its own question, if it's even answerable. – Robert Harvey Oct 24 '12 at 21:14
4

Encapsulating a bunch of related state in a single mutable reference to an immutable object can make it possible for many kinds of state modification to be performed lock-free using the pattern:

do
{
   oldState = someObject.State;
   newState = oldState.WithSomeChanges();
   // Note: State must be a field (not a property) for `ref` to compile.
} while (Interlocked.CompareExchange(ref someObject.State, newState, oldState) != oldState);

If two threads both try to update someObject.State simultaneously, both threads will read the old state and determine what the new state would be without each other's changes. The first thread to execute the CompareExchange will store what it thinks the next state should be. The second thread will find that the state no longer matches what it had previously read, and will thus re-compute the proper next state of the system with the first thread's changes taken into account.

This pattern has the advantage that a thread which gets waylaid cannot block the progress of other threads. It has the further advantage that even when there is heavy contention, some thread will always be making progress. It has the disadvantage, though, that in the presence of contention a lot of threads may spend a lot of time doing work which they will end up discarding. For example, if 30 threads on separate CPUs all try to change an object simultaneously, one will succeed on its first attempt, another on its second, another on its third, and so on, so that each thread ends up making about 15 attempts on average to update its data.

Using an "advisory" lock can improve things significantly: before a thread attempts an update, it should check whether a "contention" indicator is set. If so, it should acquire a lock before doing the update. If a thread makes a few unsuccessful attempts at an update, it should set the contention flag. If a thread which tries to acquire the lock finds there was nobody else waiting, it should clear the contention flag. Note that the lock here is not required for "correctness"; the code would work correctly even without it. The purpose of the lock is to minimize the amount of time code spends on operations which are not likely to succeed.
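One possible sketch of that advisory-lock scheme, assuming the state fits in a single int field (all names here are illustrative, and the retry count and flag-clearing policy would need tuning in a real implementation):

```csharp
using System;
using System.Threading;

public sealed class Counter
{
    private int _state;                 // the shared state, updated via CAS
    private int _contended;             // advisory flag: 0 = calm, 1 = contention seen
    private readonly object _gate = new object();

    public int State => Volatile.Read(ref _state);

    public void Update(Func<int, int> transform)
    {
        // If other threads have flagged contention, go straight to the lock.
        if (Volatile.Read(ref _contended) == 1)
        {
            UpdateUnderLock(transform);
            return;
        }

        // Fast path: a few lock-free CAS attempts.
        for (int attempt = 0; attempt < 3; attempt++)
        {
            int oldState = Volatile.Read(ref _state);
            if (Interlocked.CompareExchange(ref _state, transform(oldState), oldState) == oldState)
                return;
        }

        // Repeated failures: raise the flag and fall back to the lock.
        Volatile.Write(ref _contended, 1);
        UpdateUnderLock(transform);
    }

    private void UpdateUnderLock(Func<int, int> transform)
    {
        lock (_gate)
        {
            // Still CAS: fast-path threads may race with us, so the lock
            // is purely an optimization, not needed for correctness.
            int oldState, newState;
            do
            {
                oldState = Volatile.Read(ref _state);
                newState = transform(oldState);
            } while (Interlocked.CompareExchange(ref _state, newState, oldState) != oldState);

            // Treat the system as calm again once our update lands.
            Volatile.Write(ref _contended, 0);
        }
    }
}
```

Because every write (with or without the lock) goes through CompareExchange against the same field, no update is ever lost; the advisory lock only reduces how much speculative work gets discarded under heavy contention.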

supercat
  • 8,335
  • 22
  • 28
4

You start with

Clearly Immutability minimizes the need for locks in multi-processor programming

Wrong. You need to carefully read the documentation for every class that you use. For example, const std::string in C++ is not thread safe. Immutable objects can have internal state that changes when accessing them.

But you are looking at this from a totally wrong point of view. It doesn't matter whether an object is immutable or not, what matters is whether you change it. What you are saying is like saying "if you never take a driving test, you can never lose your driving license for drunk driving". True, but rather missing the point.

Now in the example code someone wrote with a function named "ConcatenateWithCommas": If the input was mutable and you used a lock, what would you gain? If someone else tries to modify the list while you try to concatenate the strings, a lock can keep you from crashing. But you still don't know whether you concatenate the strings before or after the other thread changed them. So your result is rather useless. You have a problem that isn't related to locking and cannot be fixed with locking. But then if you use immutable objects, and the other thread replaces the whole object with a new one, you are using the old object and not the new object, so your result is useless. You have to think about these problems at an actual functional level.

gnasher729
  • 51
  • 1
  • 2
    `const std::string` is a poor example and a bit of a red herring. C++ strings are mutable, and `const` cannot guarantee immutability anyway. All it does is say only `const` functions may be called. However, those functions can still alter internal state, and the `const` can be cast away. Finally, there is the same issue as any other language: just because _my_ reference is `const` does not mean _your_ reference is too. No, a truly immutable data structure should be used. –  Oct 01 '15 at 12:05