What are the benefits of referential transparency to a programmer?

Question

In programming, what are the benefits of referential transparency?

RT is one of the major differences between the functional and imperative paradigms, and advocates of the functional paradigm often cite it as a clear advantage over the imperative one; but for all their efforts, these advocates never explain why it is a benefit to me as a programmer.

Sure, they'll have their academic explanations of how "pure" and "elegant" it is, but how does that make it better than less "pure" code? How does it benefit me in my day-to-day programming?

Note: This is not a duplicate of What is referential transparency? The latter addresses what RT is, while this question addresses its benefits (which may not be so intuitive).

Related: http://stackoverflow.com/questions/210835/what-is-referential-transparency/9859966#9859966 — meriton, Nov 13 '16 at 19:18
See also: http://softwareengineering.stackexchange.com/questions/254304/what-is-referential-transparency/254306 — Giorgio, Nov 13 '16 at 19:25
Referential transparency allows you to use *equational reasoning* to: 1) prove properties of the code and 2) *write* programs. There are a few books about Haskell where the authors show how you can start from some equations that you want a function to fulfil and, using just equational reasoning, end up obtaining an implementation of said function, which is, hence, certainly correct. Now how much this can be applied in "day-to-day" programming probably depends on the context... — Bakuriu, Nov 13 '16 at 20:07
@err Do you like code that's easier to refactor because you know whether calling a function twice is the same as storing its value in a variable, then using said variable twice? Would you say that's a benefit for your day to day programming? — Andres F., Nov 13 '16 at 21:22
The benefit is you don't need to waste time thinking about referential non-transparency. Kinda like how the benefit of variables is that you don't need to waste time thinking about register allocation. — user253751, Nov 13 '16 at 23:26
@Bakuriu I'm not familiar with the concept of _equational reasoning_, so I looked it up, and I find it hard to think of a way to incorporate it in my day-to-day programming. Could you perhaps give examples? Where did it serve you outside the scope of purely abstract discussions? — Eyal Roth, Nov 14 '16 at 15:49
@AndresF I like code that's easier to refactor because it has tests :) — Eyal Roth, Nov 14 '16 at 15:52
@errr That's an orthogonal concept, not an either-or proposition ("I don't care about meaningful identifiers, I care about code that has tests"). Besides, tests are rarely done at the expression level. A benefit of better programming languages is that they let you focus on the tests *that matter*, as opposed to busy-work. — Andres F., Nov 14 '16 at 16:08
@errr To put it another way: how does programming in a high level language, as opposed to assembly language (with tests!), benefit you in your day-to-day programming? :) — Andres F., Nov 14 '16 at 16:11
@AndresF Oh, I wasn't saying I didn't get your point nor do I disagree with it. All I'm saying is that in most cases I'm not afraid of refactoring my code as long as I have tests (and _I am_ comfortable with the amount of tests that I write and maintain). Perhaps I'm writing a mostly referential-transparent code without even realizing/noticing it. — Eyal Roth, Nov 14 '16 at 16:28
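As a small illustration of the equational reasoning Bakuriu mentions, here is a Python sketch (all function names are hypothetical). For pure functions, "map `g`, then map `f`" can be rewritten as a single pass of `f(g(x))`, and the equation guarantees the result is unchanged. Once the functions have side effects, the same rewrite still returns the same values here but reorders the effects, so it is no longer a safe refactoring in general.

```python
def two_pass(f, g, xs):
    """Apply g to every element, then f to every intermediate result."""
    ys = [g(x) for x in xs]
    return [f(y) for y in ys]


def one_pass(f, g, xs):
    """Equational rewrite of two_pass: map f (map g xs) == map (f . g) xs."""
    return [f(g(x)) for x in xs]


def inc(x):
    return x + 1


def double(x):
    return x * 2


# Pure functions: the two versions are interchangeable.
print(two_pass(double, inc, [1, 2, 3]))  # [4, 6, 8]
print(one_pass(double, inc, [1, 2, 3]))  # [4, 6, 8]

# Impure functions: same return values, but a different order of effects.
log = []


def inc_logged(x):
    log.append(("g", x))
    return x + 1


def double_logged(y):
    log.append(("f", y))
    return y * 2


two_pass(double_logged, inc_logged, [1, 2])
effects_two_pass = list(log)
log.clear()
one_pass(double_logged, inc_logged, [1, 2])
effects_one_pass = list(log)
print(effects_two_pass)  # [('g', 1), ('g', 2), ('f', 2), ('f', 3)]
print(effects_one_pass)  # [('g', 1), ('f', 2), ('g', 2), ('f', 3)]
```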

Arseni Mourzenko · Accepted Answer · 2016-11-14T02:33:48.313

37

The benefit is that pure functions make your code easier to reason about. Or, in other words, side effects increase the complexity of your code.

Take the example of a computeProductPrice method.

A pure method would ask you for a product quantity, a currency, etc. You know that whenever the method is called with the same arguments, it will always produce the same result.

You can even cache it and use the cached version.
You can make it lazy and postpone its call to when you actually need it, knowing that the value won't change meanwhile.
You can call the method multiple times, knowing that it won't have side effects.
You can reason about the method itself in isolation from the world, knowing that all it needs are the arguments.
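A minimal Python sketch of the pure version, assuming the price depends only on a unit price, a quantity and a currency conversion rate (the name `compute_product_price` and its parameters are illustrative, not taken from the answer). Because the function is pure, memoizing it with `functools.lru_cache` is safe, and deferring the call with `functools.partial` does not change its result.

```python
from decimal import Decimal
from functools import lru_cache, partial


@lru_cache(maxsize=None)
def compute_product_price(unit_price: Decimal, quantity: int, rate: Decimal) -> Decimal:
    """Pure: the result depends only on the arguments, and nothing else is touched."""
    return (unit_price * quantity * rate).quantize(Decimal("0.01"))


# Same arguments, same result, every time, so the cached value is always valid.
a = compute_product_price(Decimal("9.99"), 3, Decimal("1.10"))
b = compute_product_price(Decimal("9.99"), 3, Decimal("1.10"))  # served from the cache
print(a, a == b)  # 32.97 True

# Laziness: bind the arguments now, run the computation later; the answer is the same.
deferred = partial(compute_product_price, Decimal("9.99"), 3, Decimal("1.10"))
```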

A non-pure method will be more complex to use and debug. Since it depends on the state of variables other than its arguments, and possibly alters them, it could produce different results when called multiple times, or behave differently when not called at all, or when called too soon or too late.
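For contrast, a sketch of an impure variant in Python, assuming the conversion rate lives in a hypothetical global `CURRENT_RATE` instead of being passed in. The same arguments now give different results over time, and the caching trick from the pure version silently turns into a bug, because the cache keeps serving a price computed with an old rate.

```python
from decimal import Decimal
from functools import lru_cache

# Hypothetical global state that the impure version silently depends on.
CURRENT_RATE = {"EUR": Decimal("1.10")}


def compute_product_price_impure(unit_price: Decimal, quantity: int, currency: str) -> Decimal:
    # The result depends on CURRENT_RATE, which is not one of the arguments.
    return (unit_price * quantity * CURRENT_RATE[currency]).quantize(Decimal("0.01"))


before = compute_product_price_impure(Decimal("9.99"), 3, "EUR")
CURRENT_RATE["EUR"] = Decimal("1.20")  # something, somewhere, changes the rate
after = compute_product_price_impure(Decimal("9.99"), 3, "EUR")
print(before, after)  # 32.97 35.96 (same arguments, different results)

# Caching the impure function: the cache cannot see that the global changed.
cached = lru_cache(maxsize=None)(compute_product_price_impure)
stale = cached(Decimal("9.99"), 3, "EUR")  # computed with rate 1.20
CURRENT_RATE["EUR"] = Decimal("1.30")
print(cached(Decimal("9.99"), 3, "EUR") == stale)  # True, although the rate changed
```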

Example

Imagine there is a method in the framework which parses a number:

decimal math.parse(string t)

It doesn't have referential transparency, because it depends on:

The environment variable which specifies the numbering system, that is, base 10 or something else.
The variable within the math library which specifies the precision of numbers to parse. So with the value of 1, parsing the string "12.3456" will give 12.3.
The culture, which defines the expected formatting. For instance, with fr-FR, parsing "12.345" will give 12345, because the decimal separator there should be `,`, not `.`.

Imagine how easy or difficult it would be to work with such a method. With the same input, you can get radically different results depending on the moment when you call the method, because something, somewhere changed the environment variable, switched the culture or set a different precision. The non-deterministic character of the method would lead to more bugs and more debugging nightmares. Calling math.parse("12345") and obtaining 5349 as an answer because some parallel code was parsing octal numbers isn't nice.
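The octal scenario can be reproduced with a few lines of Python. This is a simplified sketch: it handles integers only, and the module-level `settings` dict is a hypothetical stand-in for the environment variable that picks the numbering system.

```python
# Hypothetical stand-in for the environment variable that picks the numbering system.
settings = {"base": 10}


def parse(t: str) -> int:
    # Not referentially transparent: the result depends on global state, not only on t.
    return int(t, settings["base"])


print(parse("12345"))  # 12345
settings["base"] = 8   # some other piece of code switches to octal
print(parse("12345"))  # 5349
```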

How to fix this obviously broken method? By introducing referential transparency. In other words, by getting rid of global state, and moving everything to the parameters of the method:

decimal math.parse(string t, base=10, precision=20, culture=cultures.en_us)

Now that the method is pure, you know that no matter when you call the method, it will always produce the same result for the same arguments.
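A rough Python rendering of the fixed signature, as a sketch under stated assumptions: the hypothetical `decimal_sep`/`group_sep` pair stands in for the answer's `culture` argument, non-decimal bases are limited to integers, and `precision` is left out to keep it short. Everything the result depends on is now visible at the call site.

```python
from decimal import Decimal


def parse(t: str, base: int = 10, decimal_sep: str = ".", group_sep: str = ",") -> Decimal:
    """Pure parse: every input that affects the result is an explicit argument.

    (decimal_sep, group_sep) stands in for a culture; precision is omitted.
    """
    digits = t.replace(group_sep, "")  # drop grouping characters first
    if base != 10:
        # Non-decimal bases: integers only in this sketch.
        return Decimal(int(digits, base))
    return Decimal(digits.replace(decimal_sep, "."))


print(parse("12.345"))                                  # 12.345
print(parse("12.345", decimal_sep=",", group_sep="."))  # 12345
print(parse("12345", base=8))                           # 5349
```

Octal input still gives 5349, but only when the caller asks for it explicitly.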

edited Nov 14 '16 at 02:33

answered Nov 13 '16 at 13:55

Arseni Mourzenko


Just an addendum: referential transparency applies to all expressions in a language, not just functions. – gardenhead Nov 13 '16 at 18:05
Note that there are limitations on how transparent you can be. Making `packet = socket.recv()` referentially transparent rather defeats the point of the function. – Mark Nov 13 '16 at 19:51
Should be culture=cultures.invariant. Unless you *want* to accidentally create software that only works properly in the US. – user253751 Nov 13 '16 at 23:28
@immibis: hm, good question. What would be the parsing rules for `invariant`? Either they are the same as for `en_us`, in which case, why bother, or they correspond to some other country, in which case, which one and why this one instead of `en_us`, or they have their specific rules which don't match any country anyway, which would be useless. There is really no “true answer” between `12,345.67` and `12 345,67`: any “default rules” will work for a few countries, and won't work for others. – Arseni Mourzenko Nov 13 '16 at 23:50
@ArseniMourzenko It's generally a "lowest common denominator" and similar to the syntax many programming languages use (which is also culture-invariant). `12345` parses as 12345, `12 345` or `12,345` or `12.345` is an error. `12.345` parsed as an invariant floating-point number always yields 12.345, in accordance with the programming language convention of using . as the decimal separator. Strings are sorted by their Unicode code points and case-sensitively. And so on. – user253751 Nov 14 '16 at 00:08
@immibis: to be honest, I don't remember how, for instance, in .NET, `CultureInvariant` is different from `en-US`. This being said, lowest common denominator doesn't seem right to me when it comes to cultures. For instance, how would one invariantly format a date? `mm/dd/yyyy`? Why not `dd/mm/yyyy`? Or maybe `yyyy-mm-dd`, a format no human being aside from programmers actually uses? Such choices will necessarily be arbitrary, so why not set `en-US` as an arbitrary default culture, instead of inventing one which is not used in any country? – Arseni Mourzenko Nov 14 '16 at 02:11
@ArseniMourzenko Perhaps even yyyymmddhhmmssZ, but that's the point, it forces the programmer to actually specify a culture when they want to interface with humans. The advantage is you don't get code that accidentally only works properly in the US, as I already stated. – user253751 Nov 14 '16 at 03:07
I like the first half of this answer, but think there must be a stronger example that could be used in the second half. Invoking the possibility that someone has set the environment to use something other than base-10 or fooled around with the library's default precision is really quite obscure/esoteric, and dismissable as overly pedantic. Locale-specific differences make a better argument (kind of; though many environments will expose the locale so that you _don't_ have to explicitly pass it around everywhere), but by the time I get there you've already lost me. – aroth Nov 14 '16 at 03:09
@immibis: after checking, it's `mm/dd/yyyy` with `CultureInvariant` in .NET. I still think it makes perfect sense to have an arbitrary culture set as the default one. People who care about internationalization and globalization will set a custom culture anyway. People who don't, for instance because they are writing a small personal script to do a specific task, will benefit from a default culture. But this is an interesting subject; I'll try to ask a separate question on that tomorrow. – Arseni Mourzenko Nov 14 '16 at 03:27
@aroth: well, my example had no intention of being more practical than the examples of OOP with dogs and cats. I can probably imagine something more concrete, such as some data transfer code where the packet size is defined in a global variable, the protocol in the application configuration (which could end up as a singleton), and the encoding in environment variables, but I don't think it would necessarily be more illustrative. For now, I can't find an example which would be appealing to any programmer while looking like an actual case from real life. – Arseni Mourzenko Nov 14 '16 at 03:32
Pricing isn't a terribly good example from a real point of view. Pricing in real instances needs to change relatively frequently, and so it is often database based or even Excel based. Making it RT would require writing new code every year or so because of inflation! – Ross Presser Nov 22 '16 at 19:38

David Arno · Answer 2 · 2016-11-14T09:52:40.770

11

Do you often add a breakpoint to your code and run the app in the debugger in order to work out what's happening? If you do, that's largely because you aren't using referential transparency (RT) in your designs, and so you have to run the code to work out what it does.

The whole point of RT is that the code is highly deterministic, i.e., you can read the code and work out what it does, every time, for the same set of inputs. Once you start adding in mutating variables, some of which have scope beyond a single function, you can't just read the code. Such code has to be executed, either in your head or in the debugger, to work out how it truly works.

The simpler the code is to read and reason, the simpler it is to maintain and to spot bugs, so it saves time and money for you and your employer.

edited Nov 14 '16 at 09:52

answered Nov 13 '16 at 13:52

David Arno


"Once you start adding in mutating variables, some of which have [scope] beyond a single function, you can't just read the code, you have to execute it, either in your head or in the debugger, to work out how it truly works.": Good point. In other words, referential transparency does not only mean that a piece of code will always produce the same result for the same inputs, but also that the produced result is the **only** effect of that piece of code: there are no other, hidden side effects, like changing some variable that has been defined far away in another module. – Giorgio Nov 13 '16 at 19:23
That is a good point. I do have a bit of a problem with the _the simpler the code is to read/reason_ argument, since _simpler to read or reason_ is a somewhat vague and subjective attribute of code. – Eyal Roth Nov 14 '16 at 15:41
_Once you start adding in mutating variables, some of which have scope beyond a single function_: but why is assignment discouraged even when the variable's scope is local to the function? – rahulaga-msft Nov 28 '18 at 22:13

Karl Bielefeldt · Answer 3 · 2016-11-13T21:58:00.690

9

People throw around the term "easier to reason about," but never explain what that means. Consider the following example:

result1 = foo("bar", 12)
// 100 lines of code
result2 = foo("bar", 12)

Are result1 and result2 the same or different? Without referential transparency, you have no idea. You have to actually read the body of foo to make sure, and possibly the body of any functions foo calls, and so forth.
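To make the question concrete, here is a hypothetical pair of `foo` implementations in Python (both bodies are invented for illustration). They have the same signature, and only reading the bodies reveals that one of them consults a hidden counter. With the pure one, the second call could be replaced by `result1` without changing the program.

```python
from itertools import count

# Hypothetical hidden state that changes on every call to foo_impure.
_calls = count()


def foo_impure(s: str, n: int) -> str:
    return f"{s}-{n}-{next(_calls)}"


def foo_pure(s: str, n: int) -> str:
    return f"{s}-{n}"


result1 = foo_impure("bar", 12)
# ... 100 lines of code ...
result2 = foo_impure("bar", 12)
print(result1 == result2)  # False, and only foo_impure's body tells you that

result1 = foo_pure("bar", 12)
# ... 100 lines of code ...
result2 = foo_pure("bar", 12)
print(result1 == result2)  # True, which follows from purity alone
```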

People don't notice this burden because they are accustomed to it, but if you go work in a purely functional environment for a month or two then come back you will feel it, and it is a huge deal.

There are so many defense mechanisms people use to work around the lack of referential transparency. For my little example, I might want to keep result1 around in memory, because I wouldn't know if it would change. Then I have code with two states: before result1 was stored and after. With referential transparency, I can just recalculate it easily, as long as the recalculation is not time consuming.

edited Nov 13 '16 at 21:58

answered Nov 13 '16 at 18:37

Karl Bielefeldt


You mentioned that referential transparency allows you to reason about the result of the calls to foo() and know whether `result1` and `result2` are the same. Another important aspect is that if `foo("bar", 12)` is referentially transparent, then you do not have to ask yourself whether this call has produced some effects somewhere else (set some variables? deleted a file? whatever). – Giorgio Nov 13 '16 at 19:42
The only "referential integrity" I'm familiar with involves relational databases. – Mark Nov 13 '16 at 19:56
@Mark It's a typo. Karl meant referential transparency, as is obvious from the rest of his answer. – Andres F. Nov 13 '16 at 21:18

score 6 · Answer 4 · answered Nov 13 '16 at 13:58

I'd say referential transparency isn't only good for functional programming but for everyone who works with functions, because it follows the principle of least astonishment.

You have a function and can reason better about what it does because there are no external factors you need to take into account: for a given input, the output will always be the same. Even in my imperative language I try to follow this paradigm as much as possible. What follows almost automatically from this is small, easy-to-understand functions instead of the gruesome 1000+ line functions I sometimes run into.

Those large functions do magic and I'm afraid to touch them because they can break in spectacular ways.

So pure functions aren't something only for functional programming, but for every program.
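One way to apply this in an imperative codebase, sketched in Python with hypothetical names: move the computation into a pure function and keep reading input and printing in a thin outer function. This is the pattern often called "functional core, imperative shell"; it does not remove the effects, it just confines them to the edge, so the core stays easy to read and test.

```python
import io
import sys
from typing import Iterable, TextIO


def summarize(lines: Iterable[str]) -> str:
    """Pure core: no I/O, no globals; the same lines always give the same summary."""
    words = [word for line in lines for word in line.split()]
    longest = max(words, key=len, default="")
    return f"{len(words)} words, longest: {longest!r}"


def main(stream: TextIO = sys.stdin) -> None:
    """Imperative shell: reading input and printing stay out here, at the edge."""
    print(summarize(stream.readlines()))


# The shell can be driven with any text stream; the core can be tested with plain lists.
main(io.StringIO("the quick brown fox\njumps over\n"))  # prints: 6 words, longest: 'quick'
```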
