46

Say we have a normal pure function such as

function add(a, b) {
  return a + b
}

And then we alter it such that it has a side effect

function add(a, b) {
  writeToDatabase(Math.random())
  return a + b;
}

It's not considered a pure function as far as I know because I often hear people call pure functions "functions without side effects." However, it does behave like a pure function as far as the fact that it will return the same output for the same inputs.

Is there a different name for this type of function, is it unnamed, or is it still actually pure and I'm mistaken about the definition of purity?

m0meni
  • 773
  • 1
  • 6
  • 12
  • 93
    "Not a pure function". – Ross Patterson May 01 '16 at 03:16
  • 2
    @RossPatterson that's what I thought as well, but by asking I learned about referential transparency so I'm glad I didn't keep it to myself. – m0meni May 01 '16 at 03:54
  • 9
    If `writeToDatabase` fails it could trigger an exception thus making your second `add` function produce an exception sometimes even if called with the same arguments that before didn't have problems... most of the time having side effects introduces this kind of error-related conditions that break "input-output purity". – Bakuriu May 01 '16 at 09:40
  • What would be the point of having a name for such a function? The reason we care about purity is because it allows us specific optimisations, but why would you want to distinguish between functions with side-effects whose inputs may depend on global state and functions with side-effects? There is just no practical use here. – Voo May 01 '16 at 17:36
  • @Voo Caching is one of those cases where you get the same result but can have a side effect. – Jimmy T. May 01 '16 at 18:47
  • @Jimmy A caching layer that randomly ignores side effects seems like a downright horrible idea. – Voo May 01 '16 at 18:52
  • @Voo If the value is not in the cache there is a side effect otherwise not. That's not random. – Jimmy T. May 01 '16 at 18:54
  • 27
    Something that always give the same output for a given input is called *deterministic*. – njzk2 May 01 '16 at 19:24
  • 1
    To be more precise, a pure function has no *visible* side effects. I'd argue that "visible" should be interpreted in this context as visible within the API that the function belongs to. If writing to a database has no visible effect, then this function could be considered pure. On the other hand, if it has no visible effect, then why would you do it? Perhaps for debugging... – Kevin Krumwiede May 02 '16 at 02:25
  • 1
    I'd argue that such a function would be called "doing too much", since it has both a relevant return value AND a side effect and is almost certainly doing two things. – Erik May 02 '16 at 08:19
  • 2
    @njzk2: True, but it's also _stateless_. A _stateful deterministic_ function may not give the same output for every input. Example: `F(x)` is defined to return `true` if it's called with the same argument as the previous call. Clearly with the sequence `{1,2,2} => {undefined, false, true}` this is deterministic, yet it gives different outputs for `F(2)`. – MSalters May 02 '16 at 10:44
  • @MSalters good precision. In my current world functions tend to be stateless but belong to stateful object, in which case one could argue that the state of the object containing the function is part of the argument – njzk2 May 02 '16 at 13:22
  • a **pure function** is one which do not affect anything outside of it's scope. a **const function** is one which do not read any data other than variables that are passed to it. a **deterministic function** is one which always give the same output for a given input. an optimized compiler could choose not to call a pure function in a short-circuit evaluated if statement because it has no side-effect. an optimized compiler could cache return value of a const function for a given input and use it in place of function calls because it is constant. a deterministic function gives you peace of mind. – Kerem Baydoğan May 24 '16 at 08:43
  • @KeremBaydoğan Your definitions are non-standard. Your definitions of pure and deterministic are puzzling. Also, what is a "const function"? The closest general concept I can think of is *constant* function, [which is not what you described](https://en.wikipedia.org/wiki/Constant_function). – Andres F. Mar 23 '17 at 03:59
  • If a function has side effects that would be surprising given its name I would call it "broken". – bikeman868 Mar 24 '17 at 07:16
  • Fine gentlemen, allow me to give proposals. Let's call it either **dirty-**, **kinky-** or **naughty function** – Aphton Nov 01 '18 at 10:04

7 Answers7

86

I'm not sure about universal definitions of purity, but from the point of view of Haskell (a language where programmers tend to care about things such as purity and referential transparency), only the first of your functions is "pure". The second version of add isn't pure. So in answer to your question, I'd call it "impure" ;)

According to this definition, a pure function is a function that:

  1. Only depends on its input. That is, given the same input, it will always return the same output.
  2. Is referentially transparent: the function can be freely replaced by its value and the "behavior" of the program will not change.

With this definition, it's clear your second function cannot be considered pure, since it breaks rule 2. That is, the following two programs are NOT equivalent:

function f(a, b) { 
    return add(a, b) + add(a, b);
}

and

function g(a, b) {
    c = add(a, b);
    return c + c;
}

This is because even though both functions will return the same value, function f will write to the database twice but g will write once! It's very likely that writes to the database are part of the observable behavior of your program, in which case I've shown your second version of add isn't "pure".

If writes to the database aren't an observable part of your program's behavior, then both versions of add can be considered equivalent and pure. But I can't think of a scenario where writing to the database doesn't matter. Even logging matters!

Andres F.
  • 5,119
  • 2
  • 29
  • 41
  • 1
    Isn't "only depends on its input" redundant given referential transparency? Which would imply RT is synonymous with purity? (I'm getting more confused about this the more sources I look up) – Ixrec Apr 30 '16 at 22:32
  • I agree it's confusing. I can only think of contrived examples. Say `f(x)` depends not only on `x`, but also on some external global variable `y`. Then, if `f` has the property of RT you can freely swap all its occurrences with its return value *as long as you don't touch* `y`. Yes, my example is dubious. But the important thing is: if `f` writes to the database (or writes to a log) it loses the property of RT: now it doesn't matter whether you leave global `y` untouched, you *know* the meaning of your program changes depending on whether you actually call `f` or simply use its return value. – Andres F. Apr 30 '16 at 22:37
  • Humph. Let us say that we have such a function that is pure except for side effects and is also guaranteed to have such behavior where your two samples are equivalent. (I had this case come up in fact so it's not hypothetical.) I think we're not quite done. – Joshua May 01 '16 at 01:39
  • 2
    I'd argue the second function could break rule #1 too. Depending on the language and the error handling practices of the database API in use, the function may well not return anything at all if the database is unavailable or the db write fails for some reason. – Zach Lipton May 02 '16 at 02:38
  • @ZachLipton agree, writeToDatabase is almost guaranteed to be impure in more ways. it probably reads a connection string from disk or some cache and it sends a web request to the db and might throw exceptions and so on. and as we all know, impurity spreads and taints everything that touches it. no function calling an impure function can be considered pure. – sara May 02 '16 at 06:34
  • 1
    Since Haskell was mentioned: In Haskell adding a side-effect like that requires *changing the signature of the function*. (think of it like giving the original database as an additional input and getting the modified database as an additional return value of the function). It is actually possible to model side effects quite elegantly in the type system that way, it's just that today's mainstream languages don't care enough about side-effects and pureness to do this. – ComicSansMS May 02 '16 at 08:56
  • @ComicSansMS Indeed! I didn't want to complicate my answer with Haskell's additional type system features. – Andres F. May 02 '16 at 20:33
  • @ZachLipton Of course, but I didn't want to complicate my answer. Assume that, in the context of this answer, this is a magical database that doesn't need to be configured and never fails to write. Or replace "database" with "logging to stdout and ignoring all output errors". – Andres F. May 02 '16 at 20:35
22

What do you call a function [for which] the same input will always return the same output, but also has side effects?

Such a function is called

deterministic

An algorithm whose behavior can be completely predicted from the input.

termwiki.com

Regarding state:

Depending on whose definition of a function you use, a function has no state. If you come from the object oriented world, remember that x.f(y) is a method. As a function it would look like f(x,y). And if you're into closures with enclosed lexical scope remember that immutable state might as well be part of the functions expression. It's only mutable state that would impact the functions deterministic nature. So f(x) = x + 1 is deterministic so long as the 1 doesn't change. Doesn't matter where the 1 is stored.

Your functions are both deterministic. Your first is also a pure function. Your second is not pure.

Pure function

  1. The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices.

  2. Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.

wikipedia.org

Point 1 means deterministic. Point 2 means referential transparency. Together they mean a pure function only allows its arguments and its returned value to change. Nothing else causes change. Nothing else is changed.

candied_orange
  • 102,279
  • 24
  • 197
  • 315
  • -1. Writing to database depends on external state that generally cannot be determined looking on the inputs. The database may be unavailable for a number of reasons and whether the operation will succeed is not predictable. This is not deterministic behavior. – Frax Nov 01 '18 at 08:06
  • @Frax System memory might be unavailable. The CPU might be unavailable. Being deterministic doesn't guarantee success. It guarantees that successful behavior is predictable. – candied_orange Nov 01 '18 at 09:48
  • OOMing is not specific to any function, it is different category of problem. Now, let's just look at point 1. of your "pure function" definition (which indeed means "deterministic"): "The function result value cannot depend on any hidden information or **state that may change while program execution proceeds or between different executions of the program**, nor can it depend on any external input from I/O devices". Database is that kind state, so OPs function clearly doesn't fulfill this condition - it isn't deterministic. – Frax Nov 01 '18 at 11:11
  • @candied_orange I would agree if the write to the DB was dependent on the inputs only. But it's `Math.random()`. So no, unless we assume a PRNG (instead of a physical RNG) AND consider that PRNGs state part of the input (which it isn't, the reference is hardcoded), it's not deterministic. – marstato Nov 01 '18 at 11:14
  • @Frax output is not input. Equipment failure is not input. You really can't classify an algorithm based on the equipment it's run on. That isn't how this kind of reasoning works. This is about making LOGICAL dependencies explicit. – candied_orange Nov 01 '18 at 11:25
  • @marstato the write to the database has no impact on the result returned by the second function. That means its return is deterministic. The second algorithm is just not side effect free. – candied_orange Nov 01 '18 at 11:36
  • 2
    @candied_orange your citation of deterministic states "an algorithm whose **behaviour** can be completely predicted from the input." Writing to IO, to me, is definitely behaviour, not result. – marstato Nov 01 '18 at 12:51
  • @marstato when you think of the principles that make up functional purity that way you end up hopelessly entangling them. Emitting radio waves out into the universe when you use electricity is a side effect. It's depending on that side effect that robs you of deterministic behavior. If nothing in your program cares it's still deterministically predictable. It's just not pure because if fails point 2. It has side effects. – candied_orange Nov 01 '18 at 15:18
  • @candied_orange i agree, calculating 1+1 emits waves and advances the instruction pointer (even produce sound); side effects not accounted for. Though, you are exaggeration too much. These side effects don't matter in the semantics of the programming language. The write to the DB definitely matters on that level. – marstato Nov 01 '18 at 16:33
  • @marstato a write to the database ONLY matters if something reads that from the database as far as DETERMINISTIC behavior is concerned. What prohibits writing random data to a database in a pure function is that it would be a side effect. To be pure you must respect both. Plain and simple. Please don't conflate the two separate issues. – candied_orange Nov 01 '18 at 17:21
  • Man, the function does _literally_ random stuff, and you keep saying it is deterministic. Sure, having (or not having) side effects is orthogonal to being deterministic. But having random side-effects is not. Or maybe should I say _indeterministic_ side-effects? – Frax Nov 01 '18 at 17:40
  • @Frax The in-deterministic side-effects are called side-effects precisely because they are not the functions effect which remains deterministic. More to the point. It is simply not useful to construct a way of reasoning about these functions if you argue it's otherwise. You're effectively arguing that being pure and deterministic are the same thing. You can think that way if you like but it's not useful. – candied_orange Nov 01 '18 at 17:45
  • How would you call functions with deterministic side-effects then? Like, say, changing a global variable to a fixed value? – Frax Nov 01 '18 at 18:30
  • @Frax well I wouldn't call them pure. Change, deterministic or not, leaking out as a side-effect, where any poor unsuspecting function can see it, violates the idea of pure functions. You just created shared mutable state. Deterministic or not that's the crap that forces me to put locks or semaphores on things. Pure functions can run in multiple threads without that crap because there is no shared mutable state. – candied_orange Nov 01 '18 at 18:58
  • Your last comment is a pure straw man. Are you trolling purposefully or by accident? – Frax Nov 01 '18 at 19:16
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/85210/discussion-between-candied-orange-and-frax). – candied_orange Nov 01 '18 at 20:07
9

If you don't care about the side effect, then it's referentially transparent. Of course it's possible that you don't care but someone else does, so the applicability of the term is context-dependent.

I don't know of a general term for precisely the properties you describe, but an important subset are those that are idempotent. In computer science, slightly differently to in mathematics*, an idempotent function is one that can be repeated with the same effect; that is to say the nett side-effect result of doing it many times is the same as of doing it once.

So, if your side-effect was to update a database with a certain value in a certain row, or to create a file with exactly consistent contents, then it would be idempotent, but if it added to the database, or appended to a file, then it would not.

Combinations of idempotent functions may or may not be idempotent as a whole.

*The use of idempotent differently in computer science than mathematics appears to have come from an incorrect use of the mathematical term that was then adopted because the concept is useful.

Jon Hanna
  • 2,115
  • 12
  • 15
  • 3
    the term "referentially transparent" is more strictly defined than whether or not "anyone cares". even if we disregard IO issues such as connection problems, missing connection strings, timeouts, etc., then it's still easy to show that a program that replaces `(f x, f x)` with `let y = f x in (y, y)` will run into out of disk space-exceptions twice as fast you could argue that these are edge cases you do not care about, but with such a fuzzy definition we might as well call `new Random().Next()` referentially transparent because heck, I don't care what number I get anyway. – sara May 02 '16 at 06:44
  • @kai: Depending on the context, you may ignore side-effects. On the other hand, the return value of a function like random is not a side effect: it is its main effect. – Giorgio May 09 '16 at 20:48
  • @Giorgio `Random.Next` in .NET does indeed have side effects. Very much so. If you can `Next`, assign it to a variable and then call `Next` again and assign it to another variable, chances are they will not be equal. Why? Because invoking `Next` changes some hidden internal state in the `Random` object. This is the polar opposite of referential transparency. I don't understand your claim that "main effects" cannot be side effects. In imperative code it's more common than not that the main effect is a side effect, because imperative programs are stateful by nature. – sara May 10 '16 at 08:54
3

I don't know how such functions is called (or whether there even is some systematic name), but I would call function that is not pure (as other answers cowered) but always returns same result if supplied with same parameters "function of its parameters" (compared to function of its parameters and some other state). I'd call it just function, but unfortunately when we say "function" in context of programming, we mean something that does not have to be actual function at all.

user470365
  • 1,229
  • 6
  • 8
  • 1
    Agreed! It's (informally) the mathematical definition of "function", but like you say, unfortunately "function" means something different in programming languages, where it's closer to "a step-by-step procedure required to obtain a value". – Andres F. Apr 30 '16 at 23:21
2

It basically depends on whether or not you care about the impurity. If the semantics of this table are that you don't care how many entries there are, then it's pure. Else, it's not pure.

Or to put it another way, it's fine as long as optimizations based on purity don't break program semantics.

A more realistic example would be if you were trying to debug this function and added logging statements. Technically, the logging is a side effect. Do the logs make it impure? No.

DeadMG
  • 36,794
  • 8
  • 70
  • 139
  • Well, it depends. Maybe the logs make it impure, for example if you care about how many times, and at what times, "INFO f() called" appears in your log. Which you often do :) – Andres F. Apr 30 '16 at 22:57
  • 8
    -1 Logs do matter. On most platforms output of any kind implicitly synchronizes execution thread. Behaviour of your program becomes dependent on other threads writes, on external log writers, sometimes on log reads, on state of file descriptors. It is as pure as bucket of dirt. – Basilevs May 01 '16 at 07:41
  • @AndresF. Well, you probably don't care about the literal number of times. You probably only care that it's logged as many times as the function was called. – DeadMG May 01 '16 at 08:52
  • @Basilevs The behaviour of the function isn't dependent on them at all. If the log write fails, you just carry right on. – DeadMG May 01 '16 at 08:52
  • Should I carry on if function deadlocks? Is not dependent? – Basilevs May 01 '16 at 08:55
  • Your logging library should guarantee no deadlocks. Otherwise, you really need a new one. – DeadMG May 01 '16 at 13:00
  • This means that "the number of times the function was called" is an observable property of your program's behavior. Therefore, the two programs do not behave the same! – Andres F. May 01 '16 at 14:34
  • (PS: take a look at Haskell's [trace](http://hackage.haskell.org/package/base-4.8.2.0/docs/Debug-Trace.html#v:trace), which makes it clear adding logging to a function makes it **not** referentially transparent, and therefore (by Haskell's definition) impure. `trace` is a dangerous function because it "hides" this impurity from the function's signature, which is in general a no-no). – Andres F. May 01 '16 at 15:27
  • 2
    It's a matter of whether you choose to define the logger to be part of the execution environment or not. For another example, is my pure function still pure if I attach a debugger to the process and set a breakpoint on it? From the POV of the person using the debugger, clearly the function has side-effects, but normally we analyse a program with the convention that this "doesn't count". The same can (but needn't) go for logging used for debugging, which I presume is why trace is hiding its impurity. Mission-critical logging, e.g. for audit, of course *is* a significant side-effect. – Steve Jessop May 01 '16 at 16:06
  • @DeadMG in many systems, the logging system can be swapped for another without the program even knowing. So no, you can't rely too much on it. – njzk2 May 01 '16 at 19:27
1

I'd say that the best thing to ask is not how we would call it, but how we would analyze such a piece of code. And my first key question in such an analysis would be:

  • Does the side effect depend on the argument to the function, or the result on the side effect?
    • No: The "effectful function" can be refactored into a pure function, an effectful action, and a mechanism for combining them.
    • Yes: The "effectful function" is a function that produces a monadic result.

This is straightforward to illustrate in Haskell (and this sentence is only half a joke). An example of the "no" case would be something like this:

double :: Num a => a -> IO a
double x = do
  putStrLn "I'm doubling some number"
  return (x*2)

In this example the action that we take (print the line "I'm doubling some number") has no impact on the relationship between x and the result. This means we can refactor it this way (using the Applicative class and its *> operator), which shows that the function and the effect are in fact orthogonal:

double :: Num a => a -> IO a
double x = action *> pure (function x)
  where
    -- The pure function 
    function x = x*2  
    -- The side effect
    action = putStrLn "I'm doubling some number"

So in this case I, personally, would say that it's a case where you can factor out a pure function. A lot of Haskell programming is about this—learning how to factor out the pure parts from the effectful code.

An example of the "yes" sort, where the pure and the effectful parts are not orthogonal:

double :: Num a => a -> IO a
double x = do
  putStrLn ("I'm doubling the number " ++ show x)
  return (x*2)

Now, the string that you print depends on the value of x. The function part (multiply x by two), however, doesn't depend on the effect at all, so we can still factor it out:

logged :: (a -> b) -> (a -> IO x) -> IO b
logged function logger a = do
  logger a
  return (function a)

double x = logged function logger
  where function = (*2) 
        logger x putStrLn ("I'm doubling the number " ++ show x)

I could go on spelling out other examples, but I hope this is enough to illustrate the point I started with: you don't "call" it something, you analyze how the pure and the effectful parts relate and factor them out when it is to your advantage.

This is one of the reasons Haskell uses its Monad class so extensively. Monads are (among other things) a tool for performing this sort of analysis and refactoring.

sacundim
  • 4,748
  • 1
  • 19
  • 16
-2

Functions that are intended to cause side-effects are often called effectful. Example https://slpopejoy.github.io/posts/Effectful01.html

  • Only answer to mention the widely recognized term effectful and it gets down voted.... Ignorance is bliss I suppose. .. – Ben Hutchison May 02 '16 at 22:24
  • "effectful" is a word that the author of that post co-opted to mean "having side-effects." He says so himself. – Robert Harvey May 04 '16 at 01:11
  • Googling [effectful function](https://www.google.com.au/search?q=effectful+function) reveals plenty of evidence its a widely used term. The blog post was given as one of many examples, not as the definition. In the functional programming circles where pure functions are the default, there's a need for a positive term to describe deliberately side-effecting functions. Ie more than just the _absence of purity_. Effectful is that term. Now, consider yourself educated. – Ben Hutchison May 04 '16 at 03:35
  • [Meh.](https://www.google.com/trends/explore#q=effectful) – Robert Harvey May 04 '16 at 04:05