16

By definition, a pure function is deterministic + no side effect.

Is there any example for a function which has no side effects, but is non-deterministic? I.e., a function without side effects, but not pure.

To me non-deterministic function comes from randomness. But random generator has side effect. AFAIK random generator's implementation mutates some global state.

EDIT: duplicate with https://stackoverflow.com/q/54992302

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
Helin Wang
  • 271
  • 1
  • 5
  • What's your definition of "side effect"? What is your definition of "pure"? The most common definition is that "pure" and "no side effect" are synonymous, so a function without side effects is pure by definition. A function without side effects that is not pure cannot exist, because "without side effects" and "pure" are the same. – Jörg W Mittag Oct 26 '22 at 15:07
  • 2
    @JörgWMittag I got the definition of pure = no side effect + deterministic from a book. Is it wrong? – Helin Wang Oct 26 '22 at 15:08
  • 1
    What do you mean by "wrong"? If the book says that is the definition, then that is the definition *for that book*. A different book may or may not have a different definition. – Jörg W Mittag Oct 26 '22 at 15:09
  • Could we say a function which accepts an object as input, changes its properties randomly and always output "hello world", is both determinic because always return "hello world" but having side effects because the input is randomly changed ? Reversing it may help you to find a answer : Is a function that only return a random value has side effect ? Its depend what you or your book call "side effect". – Alexandre DUPONCHEL Oct 26 '22 at 15:29
  • 3
    @JörgWMittag: Usually a [pure](https://en.wikipedia.org/wiki/Pure_function) function would be one that, in addition to having on side effects always returns the same outputs when given the same inputs. So, for example, a function that returns the current value of a global variable has no side effects (no state is altered etc), but it can return different values at different times, even for the same inputs (and hence is not "pure"). In particular a function with side effects is _not_ necessarily pure. – psmears Oct 27 '22 at 10:56
  • 1
    @psmears: At least in the languages and communities that I am familiar with, accessing mutable state is a side-effect. But, as I pointed out in my comment above, it is entirely possible that the book the OP is reading defines those terms differently, and for understanding statements in the book, the only relevant definition is the definition in that book, just like for understanding statements made by me, the only relevant definition of those terms is mine. – Jörg W Mittag Oct 27 '22 at 11:08
  • 4
    @JörgWMittag: That is an _unusual_ definition of [side effect](https://en.wikipedia.org/wiki/Side_effect_(computer_science)). Generally a function has a side effect if calling it affects global state - in other words, calling the function vs not calling it makes a difference; you can tell whether it was called or not. Reading global state wouldn't be a side effect under this definition. This matches the standard (non-CS) meaning of "side effect". You may use the term differently, but I believe the definitions I'm giving are more commonly accepted (eg they match the ones in Wikipedia :) ). – psmears Oct 27 '22 at 11:25
  • 1
    GetUtcDate()? https://learn.microsoft.com/en-us/sql/t-sql/functions/getutcdate-transact-sql?view=sql-server-ver16 – xxbbcc Oct 27 '22 at 15:01
  • I think you are confusing something that is non-deterministic with something where you just don't know what the behaviour is. – Wyck Oct 27 '22 at 15:13
  • @psmears: Reading external state and ignoring the result would not be a side effect, but reading external state and using it in a manner that would make it possible to determine *when* a function was called would generally need to be treated as one. – supercat Oct 27 '22 at 22:25
  • 1
    a `*pseudo*-random generator's implementation mutates some global state.` FTFY – Michael Oct 27 '22 at 23:08
  • 1
    @supercat: I get what you're saying, but that's not generally considered a "side-effect" under the usual definition. Yes, it means the function isn't "pure", and that means (for example) you can't do certain optimisations (eg assuming that calling the function twice with the same arguments will give the same result) but certain optimisations _can_ be done in this case (in particular, if the result of the function isn't used then the call to the function can be optimised out entirely). That's why the standard definition (the one I'm describing, the one in Wikipedia) is useful. – psmears Oct 28 '22 at 09:26
  • @psmears: I would regard the cases where the result of a function isn't used as qualifying under "reading external state and ignoring the result". – supercat Oct 28 '22 at 16:16
  • @psmears: It is true that this definition of "side effect" relies upon factors extrinsic to a function, but I think that's often unavoidable. If one knows that global object `unsigned foo;` will only be accessed by a single thread, and some function `bar()` modifies `foo`, but all calls to `bar` are immediately followed by code that unconditionalyl overwrites the value of `foo` without reading the value left by `bar()`, is the modification of `foo` by `bar` a side effect? For that matter, if one has a program whose defined purpose is to output a couple of words for animals... – supercat Oct 28 '22 at 16:56
  • ...in arbitrary ordrer, and the only effect of `bar` is to cause whether the program outputs `cat dog` or `dog cat`, should the modification of `bar` be considered a side effect? What optimizing compilers need to know, in order to produce the most efficient code meeting requirements, is whether various transforms would affect program behavior *in ways that are relevant to those requirements*. If a program uses a random number generator in such a way that random outputs would likely work better than fixed ones (e.g. a quicksort routine selecting a pivot), an optimizing transform... – supercat Oct 28 '22 at 17:04
  • ...that replaced a "real" random number generator with a Dilbert RNG one may affect performance, but not observably affect output. If, however, a program used an RNG for a task that specified that the expected probability of any particular function call matching the result for any particular previous function call must be betwee 1/255.999 and 1.256.001, then each call to the "get random byte" function should be viewed as having an observable side effect by expanding the number of ways in which the output might be recognized as statistically anomalous. – supercat Oct 28 '22 at 17:08
  • @supercat: "If a function modified a global variable, but the global variable always happens to get overwritten just afterwards, does the function have a side effect?" Yes, because modifying a global variable falls under the definition of "side effect". Granted you can argue that, in a particular circumstance, one specific side effect may or may not be relevant to meeting the requirements that need to be met - but that doesn't stop it being a side effect. That's just what, by convention, the commonly accepted definition happens to be. – psmears Oct 28 '22 at 21:35
  • [Nondeterminism](https://en.wikipedia.org/wiki/Nondeterministic_algorithm) is a concept distinct from randomness or side effects. It's generally used in computer theory, as nondeterminism can't actually be implemented on real computers (though it can be simulated via parallelism). In a nutshell, in answer to a question like "Go left or right?", a deterministic algorithm has to pick one or the other, while a nondeterministic algorithm says "yes". – chepner Oct 29 '22 at 13:50
  • @psmears: If one treats as axiomatic the fact that the function will never be invoked in contexts where any changes it makes to global object will ever be externally observed, then the function will be transitively equivalent to one that copies the global variable to a temporary and then replaces all actions upon the global with actions upon the temporary. If a function is transtiively equivalent to a function without side effects, that should imply that the original also has no side effects. Thus, whether or not a function has side effects would depend upon the axioms one adopts. – supercat Oct 29 '22 at 16:26
  • @chepner: Recognition of "true" non-determinism woud be a useful tool in language specification to replace UB in many situations where all possible superimposed states would be equally acceptable, especially if there were a way to force such a superposition to collapse to a single arbitrarily-chosen state when needed. For example, most optimizations that could be facilitated by treating integer overflow as UB would be equally permissible if it yielded a non-deterministic superposition of all mathematical integers congruent to the arithmetically correct result, mod the range of integers. – supercat Oct 29 '22 at 17:00
  • @chepner: If automatic-duration objects were allowed to hold non-deterministic values, except when passed across function boundaries, then a compiler given `int y=x*5; foo(y/5, y/7);` could treat the left operand of `y/5` as holding the mathematically correct value of `x*5` (so the quotient would yield `x`) while treating the left operand of `y/7` as holding a product that was truncated to the range of `int`. If the only situations where `x*5` could overflow would involve invalid data that couldn't be processed *usefully*, and where any combination of values passed to `foo` would be... – supercat Oct 29 '22 at 17:09
  • ...equally tolerably useless, such a non-deterministic model could facilitate the generation of more efficient machine code than would be possible if a programmer had to avoid overflow at all costs. – supercat Oct 29 '22 at 17:10
  • @supercat: I'm not really sure where you're going with this. If you pick axioms such that a function has access to state that can never be accessed outside that call to that function, that's another way of saying that that state is local to that function call; results on what is/isn't a side effect follow accordingly. You may as well say "If you pick axioms such that the operator in a group is always commutative, then you can prove XYZ". It's true you can get interesting theorems that way, but that's just not how groups are commonly defined... – psmears Oct 29 '22 at 21:09
  • @psmears: What terminology would be better to specify a sound mathematical basis for allowing some combinations of optimizations but not others, in a manner that will uphold program requirements? Many "astoninishing" optimizations that the Standard was almost certainly not intended to invite stem from compilers combining optimizing transforms like "Replace function X with a version that doesn't do Q, because Y does Q", and "Replace function Y with a version that doesn't do Q, because X does Q". If X were regarded axiomatically as doing Q, then the substitution for X would violate... – supercat Oct 31 '22 at 15:01
  • ...that axiom, but the transform on Y would be yield a function equivalent to the original. Likewise, if Y were regarded as axiomatically as doing Q, then the substitution for X would yield a function equivalent to the original, but for Y would violate the axiom. – supercat Oct 31 '22 at 15:07
  • @supercat: I'm even less sure what you're going for here. The term "side effect" has a particular, broadly accepted meaning; there are reasons for that, which I've described. You're free to argue that a different definition would be better in some (or indeed all) circumstances. You may even be right. But that doesn't change the fact that the most commonly used definition of the term is the one I've described, which is also given in the wikipedia page. – psmears Oct 31 '22 at 20:53
  • @psmears: Returning to the original question, the question of whether a piece of code has any *observable* side effects will depend upon the existence of means by which various potential side effects might be observed, and many definitions of "side effect" allow those which could not be observed via any recognized means to be ignored if convenient. On many systems, one could physically cause *any* piece of code to have arbitrary side effects by attaching a hardware bus monitor, having it trigger an interrupt when a certain address is detected, and installing a suitable interrupt handler. – supercat Oct 31 '22 at 21:30
  • @psmears: The generally-theoretical possibility of a bus monitor being used to reveal whether a piece of code is executed would not generally be considered a "side effect" of that code, but if a bus monitor was connected in a manner that would change program state if it triggered, such change would become a "side effect". – supercat Oct 31 '22 at 21:36
  • 1
    **Comments are not for extended discussion. Feel free to continue this discussion in our [chat](https://chat.stackexchange.com/rooms/21/the-whiteboard).** – maple_shaft Nov 01 '22 at 14:50

5 Answers5

38

Of course this depends on the definitions. Let's drop "pure" which has a definition in your question that clearly makes non-deterministic pure functions impossible as being deterministic is a requirement for being pure. Let's instead assume these simple definitions:

  • A function is deterministic if its output strictly depends on its inputs, i.e. it will never return different results if called with the same parameters.
  • A function is side-effect-free if it does not effect any outside state, i.e. there is no external state that could tell you whether the function was called or not.

So a side-effect-free nondeterministic function would need to access external state that is modified not by itself but by something else in its environment. For example, a function that returns the current temperature in my room would be side-effect-free (it does not change it) but nondeterministic (you don't know which result it will return when you call it again at another time).

By your definition, such a function isn't pure as it depends on other data besides its parameters.

Hans-Martin Mosner
  • 14,638
  • 1
  • 27
  • 35
  • 6
    As a further clarification, note that pseudo-random number generators return a *sequence of numbers.* They maintain some state because they must remember which number in the sequence to return next, and are therefore not "side-effect-free" functions. But true random number sources take as their input some form of white noise or pink noise, in the form of, say, a thermal junction or an alpha particle radioactive source. That is a form of external state, *not modified by the function.* Of course, you could say the same about any function that reads the state of some mechanical device. – Robert Harvey Oct 26 '22 at 19:31
  • 7
    @RobertHarvey, I wouldn't veer into philosophy of physics. There is no *known* mechanism of determination for those processes, but science has certainly not found that they are "entirely non-deterministic". – Steve Oct 26 '22 at 21:48
26

A properly working classical computer is, by definition, deterministic. That is, the output of the same series of steps with the same inputs will produce the same result. When we talk about non-determinism in that context, what that typically means (in my experience) is that there are implicit inputs whose state can vary.

If we accept the above, then a non-deterministic function with no side effects might be something like a pseudo-random number generator that uses the system clock time to generate numbers, the distinction being that it doesn't change any state as part of its execution. It's not pure because it depends on state.

I think it's important to reiterate that the actual function is still deterministic. If you changed such a function to accept a time value instead of using the system clock's state, it will produce the same output for the same time value.

psmears
  • 188
  • 4
JimmyJames
  • 24,682
  • 2
  • 50
  • 92
  • 30
    You don't need a pseudo-random number generator, just a function that returns the system clock. I think that's the minimal example for a non-deterministic function without side effects. – Rainer P. Oct 27 '22 at 11:16
  • You could avoid that last bit of determinism by using a *true* random number generator, e.g. something that samples the decay of a chunk of Cesium-137. – Ray Oct 27 '22 at 16:02
  • @Ray The values that would come from decay samples are additional inputs. – JimmyJames Oct 27 '22 at 16:23
  • @RainerP. I agree but it's not clear to me why that's important here. – JimmyJames Oct 27 '22 at 16:24
  • Yes, but no more so than the system clock or PRNG state would. Either approach could be written either as taking the value as an explicit input, or by making a function call internally (which yes, could be thought of as being an implicit input, since it's calling a non-pure function). – Ray Oct 27 '22 at 16:50
  • @RainerP. Keeping track of time inherently involves side effects. Is a function that calls a non-pure function a pure function? – David Hammen Oct 27 '22 at 17:27
  • @Ray That's not "pure". The decay has side effects. The decay of a cesium-137 nucleus results in a barium-137 nucleus, and that barium-137 nucleus won't be available to decay again. – David Hammen Oct 27 '22 at 17:41
  • @Ray "Yes, but no more so than the system clock or PRNG state would." Agreed. That's exactly my point. – JimmyJames Oct 27 '22 at 18:16
  • 1
    Ah, I see. I explained my point poorly, then. I wasn't arguing that a true-random sample would reduce the number of implicit inputs, but rather that it is genuinely nondeterministic / *unpredictable* (even in theory), whereas the system clock's value is fully determined by its value at the previous nanosecond (or whatever precision). (Of course, in *practice*, in order to actually make that prediction, we'd need a second clock of at least as high a precision, but we could still make certain statements like "It'll be higher than it was last time", which can't be done at all for Cesium, etc.). – Ray Oct 27 '22 at 18:31
  • 1
    @Ray I get what you are saying. The point of the answer is that all computation in a classical computer is deterministic. When we say a function is 'non-deterministic' (meaning the same function inputs do not always return the same results) what we *really* mean is that there are other 'hidden' inputs not given as function parameters. The system clock example is just one of many possible sources of input. I chose it because it's ubiquitous. – JimmyJames Oct 27 '22 at 19:35
  • 1
    @Ray If the system clock has a different oscillator than the CPU, then the exact value read can depend on micrscopic zone temperature variations that cause slew between the two oscillators. These depend on quantum effects that are believed to be truly random. You can "mine" this on real computers by sampling the TSC (high speed clock) when a packet arrives over the network because the network card has its own oscillator. – David Schwartz Oct 27 '22 at 19:56
  • @DavidSchwartz Interesting but, wow, really going down the rabbit hole here! – JimmyJames Oct 27 '22 at 20:26
  • 3
    *"there are **implicit inputs** whose state can vary"* -- oh boy are there a lot of those cases. Sneaky, sneaky cases, like `DateTime.Parse("1/1/2022")` in C#, which uses the current culture to interpret which number is the month, and which number is the day. Any function that defaults to some system preference would seem to be deterministic on a single system, but non-deterministic across different systems, when in fact it is always non-deterministic. I can change my date/time preferences, and now my assumptions in code are wrong. – Greg Burghardt Oct 28 '22 at 12:54
  • 1
    @GregBurghardt But I think that is exactly the point. Such functions are not pure. They take hidden inputs. And lo and behold, a lot of programs (and certainly scripts) failed when locales began to be a thing because suddenly some command outputs didn't contain certain keywords any longer. That's *exactly* what cannot happen with a pure function. – Peter - Reinstate Monica Oct 28 '22 at 14:17
  • @GregBurghardt Personally, I consider that example to be bad design, but it is exactly the kind of thing we are talking about. I've seen worse, though. The whole mess of how Java would magically find an XML parser implementation for you burned me a number of times. The cumulative amount of time and effort lost fighting that is ridiculous. – JimmyJames Oct 28 '22 at 16:44
  • 1
    "A properly working classical computer is, by definition, deterministic." Unless it can do I/O. Reading the last pressed keyboard key might not be deterministic, even if the whole program has no side effects. – Polygnome Oct 28 '22 at 21:15
  • @Polygnome It seems like you haven't bothered to read the full answer much less the comments. The 'I' in "I/O" stands for 'input'. – JimmyJames Oct 28 '22 at 21:18
12

There is no need to use some special hardware like a clock or sensors to give an example.

The point is, any function which return value can be influenced by a side effect of other functions is non-deterministic. Still, such functions don't need to have any side-effects by themselves.

Just as simple as

 int myVariable;   // outer scope (!), 
                   // will be changed somewhere else in the program

 int MyFunction(){ return myVariable; }

Based on this, I am sure you can now construct arbitrary non-deterministic, side-effect free functions by yourself, they just have to rely on some stateful variable outside the local scope of the function.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • If a function calls one that has side effects, the function itself has side effects. I do not think this answers the question. Otherwise calling File.WriteAllText() does not have side effects, because it simply calls some OS function (which has the side effects). – Tvde1 Oct 27 '22 at 10:41
  • 3
    @Tvde1: that is not what I wrote, maybe you read my answer again. – Doc Brown Oct 27 '22 at 11:06
  • 1
    @Tvde1 This answer isn't saying that this function should _call_ one that has side effects, only that a function without side effects might be affected by others that do have side effects that were called earlier at some point in the system. A function that returns the number of cached objects is such an example: while the function itself has no side effects (it simply returns the number of cached objects), other functions can add or remove things from the cache, thus making our example-function non-deterministic. – T. Sar Oct 27 '22 at 13:11
  • @Tvde1 in other words: a function that internally relies on the state of the system (as in, directly access state without getting those things in via parameters or the like) without necessarily changing it is at the same time non-deterministic and without side effects. – T. Sar Oct 27 '22 at 13:14
  • What makes this function non-deterministic? With the same input (myVariable included), it will result in the same output. – Tvde1 Oct 27 '22 at 13:58
  • 1
    @Tvde1 It depends on your definitions. If you consider the entire memory state of the program to be "input", then you're right. But if you're just talking about the function's explicit parameters, then `myVariable` isn't an input. – Barmar Oct 27 '22 at 14:01
  • 4
    @Tvde1: scope matters, otherwise the whole term "side-effect" makes no sense. – Doc Brown Oct 27 '22 at 14:17
  • @Barmar The question uses the word "pure" which refers to explicit parameters. – Izkata Oct 27 '22 at 14:30
  • @Izkata But the question explicitly asks for a function that is **not** pure because it does **not** fulfill one of the two conditions for pureness. – Philipp Oct 28 '22 at 14:28
  • @Philipp Exactly, Barmar seems to be confused about what inputs the comments were talking about. The question uses the "pure" terminology to avoid that ambiguity, and is what others are using as well. – Izkata Oct 28 '22 at 15:28
  • In GNU C, this function could be marked `__attribute__((pure)` but not `__attribute__((const))`. (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html). – Peter Cordes Oct 29 '22 at 22:56
1

Pure functions are modelled after functions in mathematics (hence pure), which are, fundamentally, just sets of pairs (with some restrictions). Anything that cannot be thought of as a lookup in some (potentially infinite) collection of pairs is, in this sense, impure.

The definition that pure functions are deterministic and without side effects depends on the meanings of "deterministic" and "side effects", which are usually distinct from what programming languages define. This distinction is useful mainly for one thing, crucial to functional languages: you can seamlessly cache the output of the function for the given input, i.e. memoize the function.

A deterministic function does not really have to be predictable in a sense that you can find out the result without ever calling the function, but once you have it, it will never change unless the arguments change. A function which returns the startup time of your program is deterministic, because it never changes during the run-time of the program (context matters here). A function which returns the current weather at the precise point in time it is called is not deterministic (if your program doesn't happen in an instant), but if the time (and location) is an argument, it is deterministic. Even a true random number generator can be deterministic here if you give it an increasing counter (but it is obliged to return the old value for an unchanged counter). Even reading a file or downloading it from web can be deterministic if you cache the result based on the location. See fn:json-doc for example.

The second restriction is that the result (including changes in by-ref arguments) should be the only observable effect of the function. This doesn't mean that the function cannot change any state, but such change must be effectively hidden to the outside world. Caching a value must definitely change some state (to store the value), but if you don't have any way of knowing that, the cache behaves as it was pre-filled with the correct values from the beginning of the program. This is also complementary to the first restriction ‒ if a function modifies externally observable state, another function that reads the state can no longer be deterministic. If there is no such function, then it doesn't matter whether the first function modifies something, as it is not observable. By this token, all deterministic functions in the previous paragraph are also side effect-free, with the exception of sending a web request (which could have side effects). This also depends on what you choose as the environment and what counts as observing: a whole program can have side effects, but its individual functions can all be pure; and using reflection usually does not count as breaking the purity of caching functions.

Based on these definitions, there are a lot of functions that are non-deterministic but without side effects: reading from any sort of sensor, reading from a dynamic web page, reading from a port or a changeable file, getting a (pseudo-)random number (if a non-deterministic function has a side effect only on itself, then it's not really a side effect) etc. All of these functions can also be made deterministic if you "bind" them to something, i.e. a counter, time and so on.

IS4
  • 215
  • 1
  • 7
  • "Even reading a file or downloading it from web can be deterministic if you cache the result based on the location." The act of caching itself makes the function non-deterministic. A deterministic function runs the _same exact steps_ for the _same input_. If you run it twice and it does different steps each time (read: goes into a different decision branch because of the cache), then it isn't deterministic. – T. Sar Oct 27 '22 at 13:19
  • @T.Sar That depends on the definition of "deterministic". For practical purposes, it doesn't matter whether it performs the exact same steps each time, only whether the result is the same or not. For outside code, the function is a black box anyway and it doesn't matter what it does (especially not if it's pure). See the linked XPath function listing for the meaning of "deterministic" I use here. – IS4 Oct 27 '22 at 20:23
  • 'For practical purposes, it doesn't matter whether it performs the exact same steps each time" It can absolutely matter in some cases, specially in time-sensitive or high-security contexts. A pure function that stores values inside itself has a different set of implications than one that doesn't, and while it is true that for _most_ applications this won't make a difference, it doesn't make the function deterministic - at least, not in the more usual sense of the world. That said, I agree it boils down to the definition of deterministic. – T. Sar Oct 27 '22 at 21:10
  • @T.Sar Yeah, I agree that the run-time of a function or other observable/measurable effects (space requirements and allocations) can matter too, but then I think it's either a side effect or a "result" of the function, once it is something you care about. – IS4 Oct 27 '22 at 21:49
-1

Any function whose result is dependent on state outside the computer, i.e. functions that read from external devices -- the network, monitoring equipment, user input. results.

Barmar
  • 324
  • 2
  • 5