15

I recently attended an online course on programming languages in which, among other concepts, closures were presented. I write down two examples inspired by this course to give some context before asking my question.

The first example is an SML function that produces a list of the numbers from 1 to x, where x is the parameter of the function:

fun countup_from1 (x: int) =
    let
        fun count (from: int) =
            if from = x
            then from :: []
            else from :: count (from + 1)
    in
        count 1
    end

In the SML REPL:

val countup_from1 = fn : int -> int list
- countup_from1 5;
val it = [1,2,3,4,5] : int list

The countup_from1 function uses the helper closure count that captures and uses the variable x from its context.

In the second example, when I invoke a function create_multiplier t, I get back a function (actually, a closure) that multiplies its argument by t:

fun create_multiplier t = fn x => x * t

In the SML REPL:

- fun create_multiplier t = fn x => x * t;
val create_multiplier = fn : int -> int -> int
- val m = create_multiplier 10;
val m = fn : int -> int
- m 4;
val it = 40 : int
- m 2;
val it = 20 : int

So variable m is bound to the closure returned by the function call and now I can use it at will.

Now, for the closure to work properly throughout its lifetime, we need to extend the lifetime of the captured variable t (in the example it is an integer but it could be a value of any type). As far as I know, in SML this is made possible by garbage collection: the closure keeps a reference to the captured value which is later disposed of by the garbage collector when the closure is destroyed.

My question: in general, is garbage collection the only possible mechanism to ensure that closures are safe (callable during their whole lifetime)?

Or what are other mechanisms that could ensure the validity of closures without garbage collection: Copy the captured values and store it inside the closure? Restrict the lifetime of the closure itself so that it cannot be invoked after its captured variables have expired?

What are the most popular approaches?

EDIT

I do not think the example above can be explained / implemented by copying the captured variable(s) into the closure. In general, the captured variables can be of any type, e.g. they can be bound to a very large (immutable) list. So, in the implementation it would be very inefficient to copy these values.

For the sake of completeness, here is another example using references (and side effects):

(* Returns a closure containing a counter that is initialized
   to 0 and is incremented by 1 each time the closure is invoked. *)
fun create_counter () =
    let
        (* Create a reference to an integer: allocate the integer
           and let the variable c point to it. *)
        val c = ref 0
    in
        fn () => (c := !c + 1; !c)
    end

(* Create a closure that contains c and increments the value
   referenced by it it each time it is called. *)
val m = create_counter ();

In the SML REPL:

val create_counter = fn : unit -> unit -> int
val m = fn : unit -> int
- m ();
val it = 1 : int
- m ();
val it = 2 : int
- m ();
val it = 3 : int

So, variables can also be captured by reference and are still alive after the function call that created them (create_counter ()) has completed.

Giorgio
  • 19,486
  • 16
  • 84
  • 135
  • 2
    Any variables that are closed-over should be protected from garbage collection, and any variables that are not closed over should be eligible for garbage collection. It follows that any mechanism that can reliably track whether or not a variable is closed over can also reliably reclaim the memory that variable occupies. – Robert Harvey Mar 09 '13 at 00:04
  • So having safe closures are only possible with garbage collection (and in this case, by using garbage collection properly)? – Giorgio Mar 09 '13 at 00:06
  • @delnan: In other words, C++ does not guarantee that I can safely call all the closures I create. Can it be that this is connected with the lack of garbage collection? – Giorgio Mar 09 '13 at 00:10
  • My statement didn't imply that garbage collection was the only possible mechanism for tracking closed-over variables. But it is reasonable to conclude that garbage collection is a convenient, and probably reliable, method for tracking such variables. – Robert Harvey Mar 09 '13 at 00:11
  • @Robert Harvey: I did not interpret it in this way (that you implied garbage collection was the only possibility). I am curious to understand what the possible alternatives are. – Giorgio Mar 09 '13 at 00:16
  • Do you count reference counting as garbage collection? A number of languages, including Perl, implement reasonably complete closures and use reference counting under the hood. – btilly Mar 09 '13 at 01:05
  • 3
    @btilly: Refcounting is just one of many different implementation strategies for a garbage collector. It doesn't really matter how the GC is implemented for the purpose of this question. – Jörg W Mittag Mar 09 '13 at 01:13
  • @JörgWMittag I've seen people draw a distinction between reference counting and true garbage collection. In particular do cycles get collected? – btilly Mar 09 '13 at 02:16
  • 3
    @btilly: What does "true" garbage collection mean? Refcounting is just another way of implementing GC. Tracing is more popular, probably because of the difficulties of collecting cycles with refcounting. (Usually, you end up with a separate tracing GC anyway, so why bother implementing two GCs if you can get by with one.) But there are other ways of dealing with cycles. 1) Just forbid them. 2) Just ignore them. (If you're doing an implementation for quick one-off scripts, why not?) 3) Try to explicitly detect them. (It turns out that having the refcount available can speed that up.) – Jörg W Mittag Mar 09 '13 at 02:52
  • @btilly: There are also algorithms which, instead of having two completely separate GCs like CPython has for example, combine (aspects of) refcounting and tracing into a *single* GC algorithm. – Jörg W Mittag Mar 09 '13 at 03:07
  • @Jorg: There's a very important difference between GC and reference counting: reference counting is non-deterministic, with no guarantee that any given object will even be freed at all, whereas reference counting is deterministic. – Mason Wheeler Mar 09 '13 at 13:39
  • @MasonWheeler you'll want to proof-read that last comment. GC is non-deterministic :) It can have some issues with not removing objects who are still "in use", the nature of a GC will encourage circumstances where this will occur. – gbjbaanb Mar 09 '13 at 15:05
  • @gbjbaanb: Argh, and now it's too late to edit that. But yeah, that's what I meant. – Mason Wheeler Mar 09 '13 at 15:09
  • 1
    It depends on why do you want closures in the first place. If you want to implement, say, a full lambda calculus semantics, you definitely *need* GC, period. There is no other way around. If you want something which distantly resembles closures, but does not follow the exact semantics of such (like in C++, Delphi, whatever) - do whatever you want, use region analysis, use fully manual memory management. – SK-logic Mar 10 '13 at 10:58
  • @MasonWheeler, mind explaining, why, for example, stop-and-copy is "non-dererministic"? – SK-logic Mar 10 '13 at 11:00
  • @SK-logic: For the same reason as any other automatic GC system is non-deterministic: there is *no deterministic relationship* between an object no longer being needed and it getting cleaned up. Like other GC systems, stop-and-copy does not even guarantee that any given object will ever be collected. (It also wastes half of your memory and address space, which is kind of silly.) – Mason Wheeler Mar 10 '13 at 14:10
  • Also, what specifically can "real closure semantics" do that Delphi's closures cannot? [You've been asked about this before](http://programmers.stackexchange.com/questions/129530/), and your response was a bunch of mystical mumbo-jumbo and non-explanatory non-answers. Is there a real, quantifiable, explainable difference? – Mason Wheeler Mar 10 '13 at 14:34
  • 1
    @Mason Wheeler: I looked at the question you cited and I agree it would have been helpful to have a more concrete example. However (beware, I do not have much experience working with closures) I think there are some closure idioms that you would like to use in practice but for which you need a more general notion of closures (like in SML and, e,g., without the restrictions of Delphi). But this is outside the scope of this question: whatever notion of closures one language introduces, I wanted to understand what is needed to make those closures safe (no undefined behaviour on invocation). – Giorgio Mar 10 '13 at 14:57
  • @MasonWheeler, stop-and-copy do indeed guarantee that an unlinked object will be collected. It also guarantee compactification, it guarantee a deterministic GC timing and therefore used in hard realtime. – SK-logic Mar 10 '13 at 17:22
  • @MasonWheeler, as for semantics, it makes no sense to discuss it with you, sorry. Your understanding of even basic lambda calculus is patchy at most. If you fail to understand that lifetime of a closure is in general unpredictable, and there is no way to determine it statically, there is no point to discuss it any further. – SK-logic Mar 10 '13 at 17:24
  • 1
    @SK-logic: Again with the non-answers. You can't claim that it's pointless to discuss something because I "fail to understand" the very point that I'm asking you to prove. A closure's lifetime being unpredictable is *not* a fundamental axiom. Now either you can back up your claims, or you should stop making them to people with experience that contradicts them. – Mason Wheeler Mar 10 '13 at 17:29
  • Also, you seem a bit confused as to the meaning of "deterministic" in this context. It does not mean "hard realtime;" it means "strict cause-and-effect." Or, more specifically, it means "an object is freed **immediately after it is no longer needed,** and not at some point in the future that is determined by some external factor not related to whatever caused the object to no longer be needed." – Mason Wheeler Mar 10 '13 at 17:33
  • 2
    @Mason Wheeler: Closures are just values, in general it is not possible to predict how they will be moved around at runtime. In this sense, they are nothing special, the same would be valid for a string, a list, and so on. – Giorgio Mar 10 '13 at 17:43
  • @MasonWheeler, your definitions are vague. The fact that all the unused objects are *guaranteed* to be eliminated every *n* seconds renders memory behaviour definitely deterministic. – SK-logic Mar 10 '13 at 19:07
  • @MasonWheeler, start with reading on limitations of region inference. Say, here: http://en.wikipedia.org/wiki/Region-based_memory_management – SK-logic Mar 10 '13 at 19:07
  • @SK-logic: That is still not deterministic memory behavior. Imagine I allocate something large, requiring 100 MB of memory. Then, sometime soon after I finish using it, I need another 100 MB of memory. If my allocation was deterministic, I am guaranteed that the previous 100 is no longer in use, and I can reuse it. Under a GC scheme, the question of whether I can reuse the previous 100 MB or end up allocating another 100 (even though there's 100 MB of unused memory sitting around) depends on whether a GC pass has been run in the meantime. This shows that memory usage is non-deterministic. – Mason Wheeler Mar 10 '13 at 19:20
  • Also, if an object needing finalization falls out of use, and then the program shuts down, will it be finalized? If the collector does not run in the interim, no finalization takes place. This shows that the correctness of a garbage-collection system is non-deterministic. – Mason Wheeler Mar 10 '13 at 19:21
  • 1
    @Mason Wheeler, SK-logic: This is a very interesting discussion but I think it would be better to move ti somewhere else. What about opening a chat room? – Giorgio Mar 10 '13 at 19:29
  • 1
    @MasonWheeler, not all the GC-based systems provide finalisation semantics (and functional ones *should never* do it). Also, allocator response timing *is deterministic*. Memory profile is deterministic on a given time scale. Your example is irrelevant, since you cannot allocate memory if there is no free space (which triggers premature GC generations swap). I can give you an equally stupid counter-example with your link-counting pseudo-GC: if you accidentally free a root of a large tree, cleaning it up will stop the world for an unpredictable amount of time. Mathematica does it all the time. – SK-logic Mar 11 '13 at 09:19

4 Answers4

14

The Rust programming language is interesting on this aspect.

Rust is a system language, with an optional GC, and was designed with closures from the beginning.

As the other variables, rust closures come in various flavors. Stack closures, the most common ones, are for one-shot usage. They live on the stack and can reference anything. Owned closures take ownership of the captured variables. I think they live on the so called "exchange heap", which is a global heap. Their lifespan depends on who owns them. Managed closures live on the task-local heap, and are tracked by the task's GC. I'm not sure about their capturing limitations, though.

barjak
  • 1,720
  • 12
  • 18
  • 1
    Very interesting link and reference to the Rust language. Thanks. +1. – Giorgio Mar 09 '13 at 23:10
  • 1
    I thought a lot before accepting an answer because I find Mason's answer to be also very informative. I have chosen this one because it is both informative and it cites a lesser known language with an original approach to closures. – Giorgio Mar 21 '13 at 17:28
  • Thanks for that. I am very enthusiastic about this young language, and I am happy to share my interest. I didn't know wether safe closures were possible without GC, before I heard about Rust. – barjak Mar 25 '13 at 20:11
9

Unfortunately beginning with a GC make you a victim of the XY syndrom:

  • closures require than the variables they closed over live as long as the closure does (for safety reasons)
  • using the GC we can extend the lifetime of those variables long enough
  • XY syndrom: are there other mechanisms to extend the lifetime ?

Note, however, than the idea of extending the lifetime of a variable is not necessary for a closure; it's just brought over by the GC; the original safety statement is just the closed over variables should live as long as the closure (and even that is shaky, we could say they should live until after the last invocation of the closure).

There are, essentially, two approaches that I can see (and they could potentially be combined):

  1. Extend the lifetime of closed over variables (like a GC does, for example)
  2. Restrict the lifetime of the closure

The latter is just a symmetrical approach. It's not often used, but if, like Rust, you have a region-aware type system, then it's certainly possible.

Matthieu M.
  • 14,567
  • 4
  • 44
  • 65
7

Garbage collection is not needed for safe closures, when capturing variables by value. One prominent example is C++. C++ has no standard garbage collection. Lambdas in C++11 are closures (they capture local variables from the surrounding scope). Each variable captured by a lambda can be specified to be captured by value or by reference. If it is captured by reference, then you can say that it is not safe. However, if a variable is captured by value, then it is safe, because the captured copy and the original variable are separate and have independent lifetimes.

In the SML example you gave, it is simple to explain: variables are captured by value. There is no need to "extend the lifetime" of any variable because you can just copy its value into the closure. This is possible because, in ML, variables cannot be assigned to. So there is no difference between one copy and many independent copies. Although SML has garbage collection, it is not related to the capturing of variables by closures.

Garbage collection is also not needed for safe closures when capturing variables by reference (kind of). One example is the Apple Blocks extension to the C, C++, Objective-C, and Objective-C++ languages. There is no standard garbage collection in C and C++. Blocks capture variables by value by default. However, if a local variable is declared with __block, then blocks capture them seemingly "by reference", and they are safe -- they can be used even after the scope that the block was defined in. What happens here is that __block variables are actually a special structure underneath, and when blocks are copied (blocks must be copied in order to use them outside the scope in the first place), they "move" the structure for the __block variable into the heap, and the block manages its memory, I believe through reference counting.

user102008
  • 510
  • 2
  • 8
  • 4
    "Garbage collection is not needed for closures.": The question is whether it is needed so that the language can enforce safe closures. I know that I can write safe closures in C++ but the language does not enforce them. For closures that extend the lifetime of captured variables, see the edit to my question. – Giorgio Mar 09 '13 at 10:40
  • 1
    I suppose the question could be reworded into: *for safe closures*. – Matthieu M. Mar 09 '13 at 14:39
  • 1
    The title contains the term "safe closures", do you think I could formulate it in a better way? – Giorgio Mar 09 '13 at 15:46
  • 1
    Can you please correct the second paragraph? In SML, closures do extend the lifetime of data that is referenced by captured variables. Also, it is true that you cannot assign variables (change their binding) but you do have mutable data (through `ref`'s). So, OK, one can debate whether the implementation of closures is related to garbage collection or not, but the above statements should be corrected. – Giorgio Mar 09 '13 at 23:15
  • 1
    @Giorgio: How about now? Also, in what sense do you find my statement that closures do not need to extend the lifetime of a captured variable incorrect? When you talk about mutable data, you are talking about reference types (`ref`s, arrays, etc) that point to a structure. But the value is the reference itself, not the thing it points to. If you have `var a = ref 1` and you make a copy `var b = a`, and you use `b`, does that mean you are still using `a`?No. You have access to the same structure pointed to by `a`?Yes. That is just how these types work in SML and have nothing to do with closures – user102008 Mar 10 '13 at 10:33
  • As I wrote above, I can accept that garbage collection is not the only mechanism for implementing closures, but to my knowledge it is the mechanism used in SML. It is true that when you construct an SML closure you only copy a variable binding, which is immutable, but the value that is bound to the variable must be stored somewhere and must be kept around as long as the binding exists. – Giorgio Mar 10 '13 at 14:07
  • In your example: (`val a = ref 1`, `val b = a`), as long as either `a` or `b` exist, the data object (an integer, in this case) referenced by the value contained in these variables must be kept alive. The same happens e.g. with `val a = [1, 2, 3]`, `val b = a`: you allocate only one list of integers and you keep it alive as long as one of the two bindings (`a -> [1, 2, 3]`, `b -> [1, 2, 3]`) are still in the current environment. – Giorgio Mar 10 '13 at 14:12
  • @Giorgio: Right, so I am saying that the "variable"'s lifetime is not extended. The thing that the variable points to is another matter. For things like `ref`s, arrays, lists, and other structures like that, the structure's lifetime is always dynamic anyway, and not tied to any scope. For example, with `val a = ref 1`, you can then return `a` from the function; `a`'s lifetime is not extended, but the data structure is still accessible in other scopes. So data structures in SML fundamentally use garbage collection, even without closures. Adding closures does not change anything. – user102008 Mar 11 '13 at 02:07
  • @Giorgio: Let's consider C++, a language without garbage collection but does have safe closures, if you do not capture by reference. A lambda in C++ is just an anonymous functor class, where captured variables are stored as instance variables. So your question is equivalent to "Is garbage collection needed for implementing safe classes?" e.g. if you are asking, how would I capture xyz safely in a closure that may outlive this scope? you can equivalently ask, how would I store it safely in an object which may outlive this scope? The answer to that will answer your original question. – user102008 Mar 11 '13 at 02:09
  • @user102008: Thanks for your clarification. Maybe I should change the formulation of my question to "Is garbage collection needed for **enforcing** safe closures?", or something like that. My point is that "every closure I can create in a language can always be invoked safely", not only the ones that are implemented properly. Regarding SML and garbage collection, I did not mean that garbage collection was introduced in SML only to ensure safe closure, but that without having garbage collection one would have to look for another mechanism. – Giorgio Mar 11 '13 at 08:31
  • @user102008: "if a variable is captured by value, then it is safe". Not it isn't. The value can contain a pointer. – J D Jan 27 '16 at 00:26
  • @user102008: "in ML, variables cannot be assigned to". SML is an impure language with full support for mutation. – J D Jan 27 '16 at 00:27
  • @user102008: "the block manages its memory, I believe through reference counting". RC is a form of GC. – J D Jan 27 '16 at 00:33
  • @JonHarrop: "Not it isn't. The value can contain a pointer." So? There are also pointer variables outside closures. A pointer variable captured by a closure is no less safe than a pointer variable outside a closure. – user102008 Jan 27 '16 at 01:25
  • @JonHarrop: "SML is an impure language with full support for mutation." Did I every say that SML is a pure language or doesn't mutation? No. Read what I said. – user102008 Jan 27 '16 at 01:25
  • @JonHarrop: "RC is a form of GC." Under some definitions, yes. I should have said "tracing garbage collection", which is what people are talking about in this question. – user102008 Jan 27 '16 at 01:28
  • @user102008: Yes, see: https://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf – J D Jan 27 '16 at 12:54
6

Garbage collection is not necessary in order to implement closures. In 2008, the Delphi language, which is not garbage collected, added an implementation of closures. It works like this:

The compiler creates a functor object under the hood that implements an Interface representing a closure. All closed-over local variables get changed from locals for the enclosing procedure to fields on the functor object. This ensures that the state is preserved for as long as the functor is.

The limitation to this system is that any parameter passed by reference to the enclosing function, as well as the function's result value, cannot be captured by the functor because they are not locals whose scope is limited to that of the enclosing function.

The functor is referred to by the closure reference, using syntactic sugar to make it look to the developer like a function pointer instead of an Interface. It uses Delphi's reference-counting system for interfaces to ensure that the functor object (and all the state it holds) remains "alive" as long as it needs to, and then it gets freed when the refcount drops to 0.

Mason Wheeler
  • 82,151
  • 24
  • 234
  • 309
  • 1
    Ah, so it is only possible to capture local variable, not the arguments! This seems a reasonable and clever trade-off! +1 – Giorgio Mar 09 '13 at 00:28
  • 1
    @Giorgio: It can capture arguments, just not ones that are **var** parameters. – Mason Wheeler Mar 09 '13 at 00:31
  • Yes, sure, I overlooked that. And I guess the enclosing function / procedure accesses its capture local variable transparently, even though in the implementation they are stored inside the functor object. – Giorgio Mar 09 '13 at 00:33
  • @Giorgio: Exactly. The compiler takes care of that for you. – Mason Wheeler Mar 09 '13 at 00:34
  • 1
    I haven't used Delphi since the time it was called Turbo Pascal, but this approach to closures seems coherent with Pascal's design principles. – Giorgio Mar 09 '13 at 00:38
  • 2
    You also lose the ability to have 2 closures who communicate through shared private state. You won't encounter that in the basic use cases, but it limits your ability to do complex stuff. Still great example of what is possible! – btilly Mar 09 '13 at 01:01
  • 3
    @btilly: Actually, if you put 2 closures inside the same enclosing function, that's perfectly legal. They end up sharing the same functor object, and if they modify the same state as each other, changes in one will be reflected in the other. – Mason Wheeler Mar 09 '13 at 01:03
  • @Mason Wheeler: I was going to ask the same question, i.e. how is the case of two closures capturing common variables handled. Nice to find the answer to that already. Regarding the way in which the common functor object is handled (ref counting): can this be considered some kind of simple garbage collection? – Giorgio Mar 09 '13 at 09:51
  • @Giorgio: No. Garbage collection is non-deterministic in nature; [there is no guarantee that any given object will ever be collected, let alone when it will happen.](http://programmers.stackexchange.com/questions/129530/what-are-the-complexities-of-memory-unmanaged-programming/129555#129555) But reference counting is deterministic: you are guaranteed by the compiler that the object will be freed immediately after the count falls to 0. – Mason Wheeler Mar 09 '13 at 13:37
  • @Mason Wheeler: I also find it interesting that reference counting is sufficient here: since there is one shared functor object collecting all the captured variables defined inside one function, it is impossible to create cyclic references between such objects. At list this is my intuition. – Giorgio Mar 09 '13 at 13:49
  • @Giorgio What happens if you close over a function pointer, and assign one of the functions that closed over it to it? – CodesInChaos Mar 09 '13 at 14:51
  • @CodesInChaos: Do you mean a closure that (1) is assigned to a function pointer and (2) references (invokes) the pointer in its body? This would seem like a recursive definition to me. – Giorgio Mar 10 '13 at 17:57
  • 2
    @MasonWheeler: "No. Garbage collection is non-deterministic in nature; there is no guarantee that any given object will ever be collected, let alone when it will happen. But reference counting is deterministic: you are guaranteed by the compiler that the object will be freed immediately after the count falls to 0.". If I had a dime for every time I heard that myth perpetuated. OCaml has a deterministic GC. C++ thread safe `shared_ptr` is non-deterministic because destructors race to decrement to zero. – J D Jan 27 '16 at 00:30
  • 1
    @MasonWheeler: "The limitation to this system is that any parameter passed by reference to the enclosing function, as well as the function's result value, cannot be captured by the functor because they are not locals whose scope is limited to that of the enclosing function". If it cannot close over its environment then it is not a closure. You might as well say that C functions are closures than cannot close over anything. – J D Jan 27 '16 at 00:31