40

Even in languages with explicit pointer manipulation, like C, arguments are always passed by value (you can pass them by reference, but that's not the default behavior).

What is the benefit of this? Why do so many languages pass by value, and why do others pass by reference? (I understand Haskell passes by reference, though I'm not sure.)

  • 5
    `void acceptEntireProgrammingLanguageByValue(C++);` – Thomas Eding Jun 19 '12 at 23:24
  • 4
    It could be worse. Some old languages also allowed for [call by name](https://en.wikipedia.org/wiki/Call_by_name#Call_by_name) – hugomg Jun 19 '12 at 23:50
  • 27
    Actually, in C you can't pass by reference. You can *pass a pointer by value*, which is very similar to pass-by-reference, but not the same thing. In C++, though, you can pass by reference. – Mason Wheeler Jun 20 '12 at 00:27
  • @Mason Wheeler can you elaborate more or add a link? Your statement is not clear to me, as I'm not a C/C++ guru, just a regular programmer. Thanks – Betlista Nov 09 '12 at 11:38
  • 1
    @Betlista: With pass by reference, you can write a swap routine that looks like this: `temp := a; a := b; b := temp;` And when it returns, the values of `a` and `b` will be swapped. There's no way to do that in C; you have to pass pointers to `a` and `b` and the `swap` routine has to act on the values they point to. – Mason Wheeler Nov 09 '12 at 11:54
  • @Mason Wheeler: thanks for the reply, but I know that. It is the same in C and C++, isn't it? – Betlista Nov 10 '12 at 13:45
  • @Betlista `void swap(int &a, int &b) {int temp = a; a = b; b = temp;}` is a legal function in C++ and does what it looks like it does. That same function is illegal in C because `int &a` in the parameter declaration is not legal. – 8bittree Oct 05 '16 at 17:14
  • 1
    Haskell is not pass by reference. Haskell is a pure functional language - thus, it has proper variables and uses the substitution model of evaluation. Maybe you're thinking of "call-by-need", which is orthogonal. – gardenhead Oct 06 '16 at 03:11

7 Answers

64

Pass by value is often safer than pass by reference, because you cannot accidentally modify the parameters to your method/function. This makes the language simpler to use, since you don't have to worry about the variables you give to a function. You know they won't be changed, and this is often what you expect.

However, if you want to modify the parameters, you need to have some explicit operation to make this clear (pass in a pointer). This will force all your callers to make the call slightly differently (&variable, in C) and this makes it explicit that the variable parameter may be changed.

So now you can assume that a function will not change your variable parameter, unless it is explicitly marked to do so (by requiring you to pass in a pointer). This is a safer and cleaner solution than the alternative: Assume everything can change your parameters, unless they specifically say they can't.
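
As a rough sketch of the distinction described above (written in C++, but sticking to constructs that C also has), the two styles look like this. The function names `increment_copy`, `increment_in_place` and `demo` are made up for illustration.

```cpp
#include <cassert>

// By value: `n` is a copy, so the caller's variable cannot change.
int increment_copy(int n) {
    n += 1;
    return n;
}

// By pointer: the pointer is itself passed by value, but the caller has to
// write `&x`, which flags at the call site that `x` may be modified.
void increment_in_place(int* n) {
    *n += 1;
}

// Hypothetical caller showing both call styles.
int demo() {
    int x = 41;
    increment_copy(x);        // result ignored on purpose; x is unchanged
    assert(x == 41);
    increment_in_place(&x);   // explicit `&`: x may change, and does
    return x;                 // 42
}
```

Reading only the body of `demo`, you can tell which call is able to touch `x` without looking up either function's declaration.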

Martijn
Oleksi
63

Call-by-value and call-by-reference are implementation techniques that were mistaken for parameter-passing modes a long time ago.

In the beginning, there was FORTRAN. FORTRAN only had call-by-reference, since subroutines had to be able to modify their parameters, and computing cycles were too expensive to allow multiple parameter-passing modes, plus not enough was known about programming when FORTRAN was first defined.

ALGOL came up with call-by-name and call-by-value. Call-by-value was for things that were not supposed to be changed (input parameters). Call-by-name was for output parameters. Call-by-name turned out to be a major crock, and ALGOL 68 dropped it.

PASCAL provided call-by-value and call-by-reference. It did not provide any way for the programmer to tell the compiler that he was passing a large object (usually an array) by reference, to avoid blowing the parameter stack, but that the object should not be changed.

PASCAL added pointers to the language design lexicon.

C provided call-by-value, and simulated call-by-reference by defining a kludge operator to return a pointer to an arbitrary object in memory.

Later languages copied C, mostly because the designers had never seen anything else. This is probably why call-by-value is so popular.

C++ added a kludge on top of the C kludge to provide call-by-reference.

Now, as a direct result of call-by-value vs. call-by-reference vs. call-by-pointer-kludge, C and C++ (programmers) have horrible headaches with const pointers and pointers to const (read-only) objects.

Ada managed to avoid this whole nightmare.

Ada does not have explicit call-by-value vs. call-by-reference. Rather, Ada has `in` parameters (which may be read but not written), `out` parameters (which MUST be written before they can be read), and `in out` parameters, which may be read and written in any order. The compiler decides whether a particular parameter is passed by value or by reference: it is transparent to the programmer.

John R. Strohm
  • 1
    +1. This is the only answer I've seen on here so far that actually answers the question and makes sense and does not use it as a transparent excuse for a pro-FP rant. – Mason Wheeler Jun 20 '12 at 00:37
  • 12
    +1 for "Later languages copied C, mostly because the designers had never seen anything else." Every time I see a new language with 0-prefixed octal constants I die a little inside. – librik Jun 20 '12 at 11:58
  • Also, not sure about early Pascal, but in modern Pascal (Delphi, etc) you can pass a large object such as an array by reference but keep it immutable by using the `const` keyword on the parameter. – Mason Wheeler Jun 20 '12 at 13:12
  • 4
    This could be the first sentence of the Bible of programming: `In the beginning, there was FORTRAN`. –  Jan 18 '13 at 15:06
  • @MattFenwick Or `In the beginning, there was assembler.` – sashoalm Nov 28 '13 at 14:36
  • @ satuon kind of missing the point, aren't you? –  Dec 02 '13 at 18:42
  • 1
    With regard to ADA, if a global variable is passed to an in/out parameter, will the language prevent a nested call from examining or modifying it? If not, I would think there would be a clear difference between in/out and ref parameters. Personally, I'd like to see languages explicitly support all combinations of "in/out/in+out" and "by value/by ref/don't-care", so programmers could offer maximum clarity as to their intention but optimizers would have maximum flexibility in implementation [so long as it jibed with programmer intent]. – supercat Jun 12 '14 at 15:32
  • @supercat: No. In/out parameters are by definition readable and writable by the called routine. – John R. Strohm Jun 12 '14 at 20:13
  • @JohnR.Strohm: If global variable `foo`, which equals 2, is passed as parameter `moo` to method `bar` using genuine value-in-out semantics, and if method `bar` multiplies `foo` by 3 and `moo` by 5, then just before `bar` exits, `foo` will be 6 and `moo` will be 10; when it exits, `foo` will be set to 10. Had genuine reference semantics been used, `foo` and `moo` would be aliases to the same variable, which would be equal to 30. Would ADA in-out parameters be specified as using the value-in-out behavior, the reference behavior, or neither? – supercat Jun 12 '14 at 20:23
  • @supercat: Deliberate aliasing. Undefined behavior. You' – John R. Strohm Jun 12 '14 at 23:04
  • @supercat: You're on your own. At best. Some places, you'll get a free spa treatment, complete with tar and feathers. – John R. Strohm Jun 12 '14 at 23:12
  • @JohnR.Strohm: Do you mean C-ish Undefined Behavior (nasal demons), or would a compiler be limited to a fixed set of choices (e.g. compiler may at its leisure refuse compilation, behave with reference semantics, behave with value in/out semantics, or trigger an exception upon use of the aliased value)? Nasal demons would seem contrary to the ADA design objectives, but I'm not really familiar with the language. – supercat Jun 13 '14 at 16:00
  • In the early days of FORTRAN, addresses were smaller than floating-point numbers, so even if a called function didn't need to modify its parameters it would generally be cheaper to pass the addresses of floating-point variables (creating hidden variables for constants) than to pass the actual values. – supercat Oct 05 '16 at 18:15
  • @supercat Parameter-passing methods in the early days of FORTRAN were heavily machine-dependent. There was a lot more variation in machine architecture than is seen today. – John R. Strohm Oct 05 '16 at 20:34
  • @librik is that because it is too easy to do the wrong thing or why is that bad? – Neikos Oct 06 '16 at 11:09
  • @Neikos if you refer to the "0-prefixed octal constants" comment, that is indeed a common (and completely unnecessary) pitfall for programmers. *Any* other symbol would have been less error prone (ok, perhaps some kind of whitespace or punctuation might have been even worse, but...) – Hulk Oct 06 '16 at 13:36
  • @Neikos: `int i = 001; int j = 010; int k = 100;` - find out where that value `8` comes from. – Hulk Oct 06 '16 at 13:43
  • @Neikos It can lead to unexpected behavior. `070` and `080` look like they should have a difference of either 8 or 10, depending on whether you read them as octal or decimal, but since `8` is not an octal digit, they often either end up with a difference of 24 (`070` is read as octal 56, `080` as decimal 80), or `080` is a syntax error. See [this question](http://programmers.stackexchange.com/q/98692/121035) for an example of someone who was tripped up when they tried to zero-pad some numbers. – 8bittree Oct 06 '16 at 13:45
  • Well, I am familiar with octal notation in C-like languages and it hasn't happened to me so far (that I know of), but I can totally understand that it can be surprising (and thus a pitfall). I do prefer the `0o` notation that has become somewhat popular lately. – Neikos Oct 06 '16 at 13:46
  • 2
    @Neikos There's also the fact that the `0` prefix is inconsistent with every other non-base ten prefix. That may be somewhat excusable in C (and mostly source-compatible languages, e.g. C++, Objective C), if octal was the *only* alternative base when introduced, but when more modern languages use both `0x` and `0` from the start, it just makes them look poorly thought out. – 8bittree Oct 06 '16 at 14:00
  • @supercat: For Ada, undefined is undefined. In such a circumstance, the Ada compiler may or may not detect the offending code, and it may or may not do something that the compiler writers consider appropriate, and whether what it does or does not do is something that the programmer, the author of the bogus code, would consider appropriate is completely irrelevant. The application programmer uttered the bogus aliasing code, so the application programmer suffers the consequences. – John R. Strohm May 04 '17 at 14:58
  • @supercat: It is widely accepted in the high-reliability programming world that code that commits deliberate aliasing is a Very Bad Thing. Your question suggests that you want to know what happens WHEN, not if but WHEN, a programmer commits such a faux pas. Why do you defend such a practice? – John R. Strohm May 04 '17 at 15:01
  • @JohnR.Strohm: In "modern C", code like `if (should_launch_missiles()) { arm_missiles(); if (should_launch_missiles()) { launch_missiles(); } disarm_missiles(); }` could get turned into `should_launch_missiles(); arm_missiles(); should_launch_missiles(); launch_missiles();` if code has received inputs that would make UB inevitable in all situations that would reach `disarm_missiles()`. Even if there would be no plausible way that the actions performed by `arm_missiles()` and `launch_missiles()` could occur except when those two functions are executed in sequence without... – supercat May 04 '17 at 15:29
  • ...an intervening call to `disarm_missiles()`, and even though code as written would provide no path by which they could occur in such order without an intervening check of `should_launch_missiles()`, such safety code could get undermined by compiler "optimization". I don't think Ada compilers do such things, do they? – supercat May 04 '17 at 15:30
  • @JohnR.Strohm: As for aliasing, I agree that in most cases where a function has an in-out parameter it shouldn't care whether it uses copy-in/copy-out or reference semantics, but there are some patterns where it may be helpful. Consider, for example, cases where code is supposed to be able to write data to something that behaves like a stream that can accept records from multiple sources. It would be rather irksome if passing a stream to a function meant that nothing else could use that stream at all until the function returned, except through the stream reference passed to that function. – supercat May 04 '17 at 15:41
  • @supercat: Arming and launching weapons are two very different operations, with different preconditions. I have never seen a system that combines those operations, to make an armed launch a one-step operation, the way your proposed example code envisions. I am not really at liberty to go into much more detail than that. – John R. Strohm May 04 '17 at 17:13
  • @JohnR.Strohm: It was not intended as a literal example of how code would arm and launch actual missiles, but there are many kinds of systems in which a processor has to perform two operations to trigger some potentially dangerous event, and where there would be no way for the code invoking the preparatory step to be executed without being immediately followed by code that tests all of the preconditions for the event and cancels the preparation if they don't hold. If code can only execute from read-only storage, it may be impossible for even a disruption that arbitrarily changes RAM... – supercat May 04 '17 at 18:27
  • ...to trigger the event except when the proper conditions apply, unless it is followed by another such disruption at just the right time. Normally, it would be hard to imagine any behavioral spec looser than "arbitrarily change all RAM to hold the most vexing combination of bits", but code which would be fail-safe even in the presence of arbitrary RAM corruption events could be undermined by an optimizing compiler that regards it as "redundant". – supercat May 04 '17 at 18:32
  • Note that early FORTRAN did not specifically say "pass by reference" and some systems apparently used value-result. This produces different semantics, observable by passing the same variable twice to a function with two different dummy arguments, but this was declared invalid FORTRAN so you were not officially able to tell whether your system used by-ref or value-result. – torek Apr 16 '22 at 06:38
  • @torek, you are correct and that is a very good observation. It has been a very long time since I've seen value-result parameter handling. The problem with it is that the compiler has to go through a lot of gyrations to ensure that it will work correctly. It must assume that ANY parameter can be changed (separate compilation and library linking), and decide in the calling routine whether to store the value returned, or not. This decision must be conservative: if it isn't sure, it must store, to err on the safe side. – John R. Strohm Apr 16 '22 at 08:50
  • Call by name is much more powerful. Call by name arguments are what today would be called lambdas. – gnasher729 Mar 25 '23 at 08:18
13

Pass by reference allows for very subtle unintended side effects that are difficult, sometimes next to impossible, to track down once they start causing problems.

Pass by value, especially with final, static or const input parameters, makes this entire class of bugs disappear.

Immutable languages are even more deterministic: it is easier to reason about what goes into a function and what is expected to come out of it.
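
One concrete instance of such a side effect is aliasing. The sketch below is only an illustration in C++, with hypothetical function names: both functions are meant to add `amount` to `total` twice, but the by-reference version misbehaves when the caller passes the same variable for both parameters.

```cpp
#include <cassert>

// Both parameters are references, so they may refer to the same variable.
void add_twice_by_ref(int& total, const int& amount) {
    total += amount;
    total += amount;  // if `amount` aliases `total`, it has already changed
}

// Same intent, but `amount` is a copy taken when the call is made.
void add_twice_by_value(int& total, int amount) {
    total += amount;
    total += amount;
}

// Hypothetical self-check: start from 1 and add the variable to itself twice.
void demo() {
    int a = 1;
    add_twice_by_ref(a, a);    // 1 + 1 = 2, then 2 + 2 = 4
    assert(a == 4);

    int b = 1;
    add_twice_by_value(b, b);  // 1 + 1 = 2, then 2 + 1 = 3
    assert(b == 3);
}
```

Nothing in the body of `add_twice_by_ref` looks wrong in isolation, which is what makes this kind of bug hard to spot.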

  • 7
    It's one of those things that you figure out after trying to debug 'ancient' VB code where everything is by reference as default. – jfrankcarr Jun 19 '12 at 19:59
  • Maybe it's just that my background is different, but I have no idea what sort of "very subtle unintended side effects that are very difficult to next to impossible to trace down" you're talking about. I'm used to Pascal, where by-value is the default but by-reference can be used by explicitly marking the parameter, (basically the same model as C#,) and I've never had that cause problems for me. I can see how it would be problematic in "ancient VB" where by-ref is the default, but when by-ref is opt-in, it makes you think about it while you're writing it. – Mason Wheeler Jun 20 '12 at 00:32
  • 4
    @MasonWheeler what about newbies that come along behind you and just copy what you did without understanding it and start manipulating that state all over the place indirectly, real world happens all the time. Less indirection is a good thing. Immutable is best. –  Jun 20 '12 at 00:47
  • 1
    @MasonWheeler The simple alias examples, like the `Swap` hacks that don't use a temporary variable, though not likely to be a problem themselves, show how hard a more complex example might be to debug. – Mark Hurd Jun 28 '12 at 02:16
  • 2
    @MasonWheeler: The classic example in FORTRAN, which led to the saying "Variables won't; constants aren't", would be passing a floating-point constant like 3.0 to a function which modifies the passed-in parameter. For any floating-point constant that was passed to a function the system would create a "hidden" variable that was initialized with the proper value and could be passed to functions. If the function added 1.0 to its parameter, an arbitrary subset of 3.0's in the program could suddenly become 4.0's. – supercat Oct 05 '16 at 18:10
7

Why are so many languages passed by value?

The point of breaking up large programs into small subroutines is that you can reason about the subroutines independently. Pass-by-reference breaks this property. (As does shared mutable state.)

Even in languages with explicit pointer manipulation, like C, arguments are always passed by value (you can pass them by reference, but that's not the default behavior).

Actually, C is always pass-by-value, never pass-by-reference. You can take an address of something and pass that address, but the address will still be passed by value.
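
A small sketch of what "the address is still passed by value" means in practice, written in C++ but using only constructs that C shares. The names `reseat`, `overwrite`, `fallback` and `demo` are invented for this example.

```cpp
#include <cassert>

static int fallback = -1;

// The pointer parameter is a copy of the caller's pointer:
// pointing it somewhere else is invisible to the caller.
void reseat(int* p) {
    p = &fallback;
}

// Writing through the pointer changes the object it points to.
void overwrite(int* p) {
    *p = 7;
}

// Hypothetical self-check.
void demo() {
    int value = 1;
    int* q = &value;

    reseat(q);
    assert(q == &value && value == 1);  // caller's pointer and value unchanged

    overwrite(q);
    assert(value == 7);                 // the pointed-to object did change
}
```

With true pass-by-reference of the pointer (for example `int*& p` in C++), the assignment inside `reseat` would have changed `q` itself.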

What is the benefit of this? Why do so many languages pass by value, and why do others pass by reference?

There are two main reasons for using pass-by-reference:

  1. simulating multiple return values
  2. efficiency

Personally, I think #1 is bogus: it's almost always an excuse for bad API and/or language design:

  1. If you need multiple return values, don't simulate them, just use a language which supports them.
  2. You can just as well simulate multiple return values by packaging them into some lightweight data structure such as a tuple. This works particularly well if the language supports pattern matching or destructuring bind. E.g. Ruby:

    def foo
      # This is actually just a single return value, an array: [1, 2, 3]
      return 1, 2, 3
    end
    
    # Ruby supports destructuring bind for arrays: a, b, c = [1, 2, 3]
    one, two, three = foo
    
  3. Oftentimes, you don't even need multiple return values. For example, one popular pattern is that the subroutine returns an error code and the actual result is written back by reference. Instead, you should just throw an exception if the error is unexpected, or return an `Either<Exception, T>` if the error is expected. Another pattern is to return a boolean which tells whether the operation was successful and return the actual result by reference. Again, if the failure is unexpected, you should throw an exception instead; if the failure is expected, e.g. when looking up a value in a dictionary, you should return a `Maybe<T>` instead.
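
To make the dictionary-lookup case concrete, here is a hedged sketch in C++17, where `std::optional<T>` plays roughly the role of `Maybe<T>`. The functions `find_age_out` and `find_age` and the sample data are hypothetical; the first shows the boolean-plus-out-parameter pattern, the second the alternative.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Out-parameter style: success flag returned, result written by reference.
bool find_age_out(const std::map<std::string, int>& ages,
                  const std::string& name, int& result) {
    auto it = ages.find(name);
    if (it == ages.end()) return false;
    result = it->second;
    return true;
}

// Maybe-style: a missing value is part of the return type.
std::optional<int> find_age(const std::map<std::string, int>& ages,
                            const std::string& name) {
    auto it = ages.find(name);
    if (it == ages.end()) return std::nullopt;
    return it->second;
}

// Hypothetical self-check exercising both styles.
void demo() {
    const std::map<std::string, int> ages{{"ada", 36}};
    int out = 0;
    assert(find_age_out(ages, "ada", out) && out == 36);
    assert(find_age(ages, "ada").value() == 36);
    assert(!find_age(ages, "bob").has_value());
}
```

With the second version there is no variable that is left in a half-meaningful state when the lookup fails, and nothing is passed by mutable reference.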

Pass-by-reference may be more efficient than pass-by-value, because you don't have to copy the value.

(I understand Haskell passes by reference, though I'm not sure.)

No, Haskell is not pass-by-reference. Nor is it pass-by-value. Pass-by-reference and pass-by-value are both strict evaluation strategies, but Haskell is non-strict.

In fact, the Haskell Specification doesn't specify any particular evaluation strategy. Most Haskell implementations use a mixture of call-by-name and call-by-need (a variant of call-by-name with memoization), but the standard doesn't mandate this.

Note that even for strict languages, it doesn't make sense to distinguish between pass-by-reference and pass-by-value for functional languages, since the difference between them can only be observed if you mutate the reference. Therefore, the implementor is free to choose between the two without breaking the semantics of the language.

Jörg W Mittag
  • 9
    "If you need multiple return values, don't simulate them, just use a language which supports them." This is a bit of a strange thing to say. It's basically saying "if your language can do everything you need, but one feature that you'll probably need in less than 1% of your code--but you still **will** need it for that 1%--can't be done in a particularly clean way, then your language isn't good enough for your project and you should rewrite the whole thing in another language." Sorry, but that's just plain ridiculous. – Mason Wheeler Jun 20 '12 at 00:35
  • +1 I totally agree with this point _The point of breaking up large programs into small subroutines is that you can reason about the subroutines independently. Pass-by-reference breaks this property. (As does shared mutable state.)_ – Rémi May 01 '14 at 20:42
3

The behavior differs depending on the calling model of the language, the types of the arguments, and the memory model of the language.

With simple native types, passing by value allows the value to be passed in registers. That can be very fast, since the value does not need to be loaded from memory or saved back. A similar optimization is possible by reusing the memory occupied by the argument once the callee is done with it, without risking corrupting the caller's copy of the object. If the argument was a temporary object, you have probably saved a full copy by doing so (C++11 makes this kind of optimization even more explicit with rvalue references and move semantics).
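
As an illustration of the last point, here is a small C++ sketch with a hypothetical `to_upper` function that takes its argument by value. Whether any buffer is physically copied depends on the implementation (small strings may be stored inline), so treat the comments as describing the language rules rather than measured performance.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Takes its argument by value: the function owns `text` and may modify it
// in place without affecting the caller.
std::string to_upper(std::string text) {
    for (char& c : text) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return text;
}

// Hypothetical self-check showing the three ways the parameter can be filled.
void demo() {
    std::string name = "ada";
    std::string a = to_upper(name);              // lvalue: copied, `name` is untouched
    assert(name == "ada");
    std::string b = to_upper(std::string("x"));  // temporary: no copy of a named object
    std::string c = to_upper(std::move(name));   // caller explicitly gives up `name`
    assert(a == "ADA" && b == "X" && c == "ADA");
}
```

The callee is written once; the caller decides at each call site whether a copy is made or an existing object's storage is handed over.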

In a lot of OO languages (C++ is more of an exception in this context), you cannot pass an object by value. You are forced to pass it by reference. This makes the code polymorphic by default and is more in line with the notion of instances that is central to OO. Also, if you want to pass by value, you must make a copy yourself, acknowledging the performance cost of that action. In this case, the language chooses for you the approach that is more likely to give you the best performance.

In the case of functional languages, I guess passing by value or by reference is simply a question of optimization. Since the functions in such languages are pure and therefore free of side effects, there is really no reason to copy the value except for speed. I'm even pretty sure that such languages often share a single copy of objects that have the same value, a possibility available only if you use pass-by-(const)-reference semantics. Python also uses this trick for integers and common strings (like method and class names), which is safe because integers and strings are immutable objects in Python. This also helps to optimize the code further, by allowing, for example, pointer comparison instead of content comparison, and lazy evaluation of some internal data.

2

If you pass by reference, then you are effectively always working with global values, with all of the problems of globals (namely scope and unintended side effects).

References, just like globals, are sometimes beneficial, but they shouldn't be your first choice.

jmoreno
2

You can pass an expression by value; it's natural. Passing an expression by (temporary) reference is... weird.

herby
  • 1
    Also, passing an expression by temporary reference can lead to bad bugs (in a stateful language): you happily change it (since it's just a temporary), but then it backfires when you actually pass a variable, and you must devise an ugly workaround like passing `foo+0` instead of `foo`. – herby Jun 19 '12 at 21:23