3

I think I generally know what the Garbage Collector in Java does, but It's praised a lot, so I thought maybe I'm missing something about it's functionality.

What I know is, that the GC takes care of erasing from the memory objects that have no reference to them, and thus are unreachable by the programmer.

For example, if inside a loop I'm constantly creating a new Object(), the previous ones will eventually get deleted by the GC (correct me if I'm wrong).

This is very useful for me as a Java programmer, and spares me a lot of headache that I assume C++ programmers have to deal with - manually deleting objects from the memory (again, correct me if I'm wrong).

This feature is awesome, but is there anything more to the Garbage Collector? Am I missing something?

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
Aviv Cohn
  • 21,190
  • 31
  • 118
  • 178
  • 1
    [This](http://lifeofasoftwareengineer.tumblr.com/post/62681511158/java-garbage-collection) illustrates it pretty well ;) – back2dos Mar 01 '14 at 12:19
  • Your assumption about C++ programmers is wrong. A C++ programmer would create the Object on the stack (allocated without 'new'), so it would get destructed when the loop ends. All generated by the C++ compiler, so the C++ programmer doesn't need to worry about it. No leak, no problem, no headache. – Sjoerd Mar 04 '14 at 00:24
  • @Sjoerd If so, than why is the GC praised so much? – Aviv Cohn Mar 04 '14 at 08:27
  • @Prog, because this way you can only serve relatively primitive data structures. Explicit memory management is incompatible with higher level semantics. Think of, say, a system equivalent to a simple untyped lambda calculus, but without an implicit GC. – SK-logic Mar 04 '14 at 09:17

5 Answers5

12

For the visually minded, here's a parable by Kieron Briggs:

Picture a big empty room with a big furnace/incinerator type thing at one end. Hanging form the roof are a number of ropes, called Threads. Attached to the various threads are little sparkly Objects, and those Objects can have other Object attached to them in turn by little rods called References. This creates a (hopefully) beautiful structure of Objects all attaches (either directly or indirectly) to a Thread. You can even have Objects which link to more than one Thread.

When an Object isn't needed any more, the Reference to it disappears. If that leaves the Object (or a whole collection of Objects) unattached to any Thread, it will fall down onto the floor and shatter. This gradually builds up a layer of broken objects lying around the floor. In a language like C, eventually the floor would become so full that there was no room for more Objects, and Bad Things would happen.

But in Java, we have a Garbage Collector. This is a little dude in overalls that climbs down a special Staff Only Thread, sweeps up all the broken Objects on the floor, and shovels them into the incinerator. You never know exactly when he's going to come along, but he's always there keeping an eye on the mess on the floor to make sure that it doesn't fill up too much...

  • more formal description: mark-and-sweep algorithm

    ...each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared, except during the collection cycle. The first stage of collection does a tree traversal of the entire 'root set', marking each object that is pointed to as being 'in-use'. All objects that those objects point to, and so on, are marked as well, so that every object that is ultimately pointed to from the root set is marked. Finally, all memory is scanned from start to finish, examining all free or used blocks; those with the in-use flag still cleared are not reachable by any program or data, and their memory is freed. (For objects which are marked in-use, the in-use flag is cleared again, preparing for the next cycle.)

https://i.stack.imgur.com/7Meuu.jpg

gnat
  • 21,442
  • 29
  • 112
  • 288
  • 1
    I'd suggest that in a typical system, objects don't fall to the floor when no references exist; rather, objects spend most of their time sitting on the floor. Periodically, however, the system will pull up on the threads so every object which is attached will get lifted off the floor. At that point, anything which isn't lifted off the floor may be swept up wholesale. – supercat Dec 29 '14 at 18:41
  • @supercat your way seems to fit better to formal description of mark-and-sweep referred here, "flag... always cleared, except during the collection cycle". First stage of collection (tree traversal) indeed sounds like lifting objects from the floor. I think though that Kieron Briggs (parable author) just didn't want to dive into these details – gnat Dec 29 '14 at 20:34
  • It may have been an effort at simplification, but I think a key aspect of GC centers around the fact that no mechanism needs to distinguish the detachment of the last reference to an object from that of any other. An object will semantically cease to exist when the last reference is detached, but it won't generally cause anything to happen that could be considered analogous to "falling to the floor". – supercat Dec 29 '14 at 20:37
7

No, not missing anything about Garbage Collection.

Garbage collection frees developers from the minutia of allocating and de-allocating memory themselves.

Being part of the platform means there is a whole class of problems that Java does not have (and .NET and other garbage collected runtimes as well).

What you may be missing is just how much trouble having to deal with memory allocations and de-allocations and the bugs inherent with not getting them right really is ;)

Oded
  • 53,326
  • 19
  • 166
  • 181
  • 1
    My sentiment exactly. You cannot possibly grasp just how much effort and trouble garbage collection saves you until you've had to do without it on a largish project. – Kilian Foth Mar 01 '14 at 09:24
  • @KilianFoth Thanks both of you. As a person who only ever developed 'seriously' in Java, I want to ask: Does the need to allocate and de-allocate memory manually, affect the structure of the entire application, or at least it's functions, significantly? Does it force you to design things differently than you would in a language with a GC like Java? – Aviv Cohn Mar 01 '14 at 09:26
  • @Prog - simple example. You add a field to a class. You now need to ensure that every place you use it does allocate the correct memory for it. If you do not, you will overwrite some other memory with data from the changed class. And you don't know **what** you will overwrite and what the effects on the system will be - they can be different on every run. – Oded Mar 01 '14 at 09:31
  • You make it sound as though garbage collection were the only way to deal with these issues. And as if it had no down sides. – back2dos Mar 01 '14 at 12:23
  • @back2dos - where did I say that? You are reading more into my answer than I put there. – Oded Mar 01 '14 at 12:25
  • 1
    GC is nice for small applications, with large applications it won't save you from thinking about object life times and data structures to be memory efficient. There it an even be a hindrance as you have less control. i.e. in C++ I can decide whether to put my objects on the stack or free store (heap) and can design my data structures to fit in a continuous memory block whereas java forces me to use references to other objects adding indirection levels. – johannes Mar 01 '14 at 14:02
  • @Oded: Maybe you should reread your very first sentence. – back2dos Mar 01 '14 at 16:18
  • @back2dos - perhaps you should read the question. OP is asking if their high level understanding of GC is complete. – Oded Mar 01 '14 at 16:19
  • @Oded Or maybe you are reading too much into the question - which asks a whole lot of things (and hasn't been flagged "too broad" without reason). Your answer is misleading and superficial. And really seems just like an endorsement of your preferred style and platform. I felt it important to point out what's wrong with it. You're free to ignore my comments ;) – back2dos Mar 01 '14 at 16:39
  • 2
    Note that I hope no large and serious project in a language like C++ still manages memory totally automatically. With things like unique_ptr, shared_ptr, and so on you essentially get "garbage collection". The problem is that it's essentially a reference counting GC, which most languages have moved away from for good reason--it's slow and precludes a lot of really sweet optimisations. Still, 95% of the time that performance is irrelevant, so I hope that people are only actually doing manual memory management that last 5%. – Phoshi Mar 03 '14 at 11:20
  • 2
    Your answer maybe understates the power and performance of algorithms. Java's gc (probably other languages too) is extremely good at managing short lived objects making object allocation an extremely cheap operation. Not only does it prevent bugs but it can also improve performance vs a naive manual approach. – assylias Mar 04 '14 at 09:46
  • @assylias: You have absolutely no numbers to back that up. I would also point out that object pools are common place in Java for performance critical sections. And that the non-determinism and blocking of conventional GC can become an issue just as easily. GCs are *one* of *many* choices of dealing with memory management. There's really no need to glorify any of them. – back2dos Mar 07 '14 at 10:55
  • @back2dos See for example http://programmers.stackexchange.com/a/149569/47845 - I did not say GC is the only option, only that it has become very efficient (BTW object pooling can actually perform worse in some cases). As for the blocking of the GC - it may or may not be an issue depending on your use case (and even that issue has been reduced significantly by modern concurrent GC algos) - if you need pure real time then yes, GC will be an issue. So I do believe that a modern GC will perform way better than what 99% developers could do by managing memory manually. There is the 1% of course. – assylias Mar 07 '14 at 11:11
3

You're not missing anything. However, I would like to point out something that I don't think any of the other answers have addressed clearly. Garbage collection is about program correctness and security first and foremost; convenience is secondary (although it is very convenient.)

If the programmer manages memory manually, making a mistake means you corrupt the program's state. The program keeps going but execution's gone off the rails and no longer has any meaning. It could fail catastrophically right away, but will probably keep going in a way that's seemingly correct but very subtly wrong, possibly for days or weeks before anyone notices the problem. Even worse is that memory bugs could be exploited by attackers. If the buggy program runs with administrator privileges, and the bug allows an attacker to run arbitrary code, the potential for damage is almost unlimited. This sort of bug is very hard to detect and impossible to avoid; sooner or later you will introduce a memory bug because we're all human. A language that outright forbids manual memory management is guaranteed to be free of this type of bug (though memory bugs are definitely not the only way to introduce a security vulnerability.)

Lack of garbage collection generally precludes complicated object structures. For the most part you're limited to creating objects that either always exist, or have a single owner that's responsible for deleting them. That rules out, for example, persistent data structures, because any given node could be shared by many objects and there's no reliable way of managing that. Using smart pointers with reference counting is not a solution - their performance is bad and cylic references (two objects pointing to each other) won't be freed. You can introduce weak references to avoid the latter problem, but now you've made manual memory management even harder to get right (and the performance is still bad). Garbage collection enables these complicated data structures to be cleaned up correctly and in an efficient manner.

Ironically, being deprived of persistent data structures makes the program harder to reason about, because changes to the data structure are destructive and now you have to be very careful about who has references to it and who's making changes to it. So on top of memory bugs, you have to worry about additional state/aliasing bugs as well.

Doval
  • 15,347
  • 3
  • 43
  • 58
0

A similar analogy to gnat's answer is to imagine someone sat at a desk in a paper based office. The person performs tasks using paper documents (objects) and needs to have these bits of paper on their desk. When they have finished and no longer need to reference a document they throw it in the bin.

The desk and the bin have a finite capacity (memory). The GC is the janitor who comes along occasionally and empties the bin so that capacity isn't exceeded*.

*Of course you can always run out of memory by putting too many documents on your desk. There's nothing the janitor can do about this.

Qwerky
  • 1,582
  • 8
  • 17
  • 3
    That's not really a good analogy, as the person is deliberately putting the paper in the bin (effectively, it's explicitly marked as "discarded" when its placed in the bin). A better analogy is someone not putting rubbish in the bin at all, and then expecting the janitor to search through everything in the office looking for papers that could be discarded. – Brendan Mar 03 '14 at 14:05
0

There is one more thing you can do with GC besides simply offloading your memory management to it: weak references, which could be used to offload your caches management onto GC as well.

Your weak objects are reachable and can still be used at some point, but you're instructing GC that it's ok to get rid of them at any moment, and you don't really mind losing them.

SK-logic
  • 8,497
  • 4
  • 25
  • 37