
Many languages, such as Java and C#, have garbage collectors that free memory once that memory no longer has any references. Yet they don't immediately free it after the reference counter hits zero; instead, every once in a while they examine all of the memory, check whether each piece still has any references, and delete it if it doesn't. What is the benefit of doing it that way?

The downside to doing it that way is that you lose the destructor, as you can't guarantee when it will be called.

I would imagine that it is done that way for performance reasons, but has there been any study showing that a garbage collector that works like that performs better than the `std::shared_ptr` found in C++?

Caesar
  • Because then you'd need to keep a reference counter, and the freeing can cascade down heavily – ratchet freak Dec 09 '13 at 23:47
  • `std::shared_ptr` cannot properly detect when some data needs to be freed. Example: cyclic references. Functional languages often need the full power of a non-reference-counting collector to deal with closures and such. The alternative (counted pointers) is too messy for users to be a real option. – Thomas Eding Dec 09 '13 at 23:55
  • Strongly related: http://programmers.stackexchange.com/questions/113209/is-garbage-collection-necessary?rq=1 – Robert Harvey Dec 10 '13 at 04:11
  • What is the benefit of spending time freeing memory that isn't needed immediately? Freeing when it is demanded is a much more intelligent heuristic. –  Dec 10 '13 at 05:28
  • The answer could be very simple. "Yet they don't immediately free it after the reference counter hits zero": there is no reference counter to hit 0. – Pieter B Dec 10 '13 at 09:13
  • Say A refers to B and B refers to A, but nothing refers to either of them. They both have a reference count of 1. – Peter Lawrey Dec 10 '13 at 10:26
  • I believe Python uses a reference counting GC with occasional heap traversals to break cyclic references, which is another option. You don't have to be *totally* purist about it. That said, there are reasons why other, faster languages don't do this, and I'd wager the main reason Python is fine with that big heavy atomic inc/dec of a reference counter is that it's all kept synchronous by the GIL anyway. – Phoshi Dec 10 '13 at 10:46

4 Answers


Because in order to free memory as soon as the reference counter hits zero, you have to keep a reference counter. And that doesn't come for free. Typically, it limits your throughput.
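
For illustration, here is a minimal C++ sketch of that cost (the function names are mine, not from any real codebase): every copy and destruction of a `std::shared_ptr` touches the shared counter, which in typical implementations is an atomic operation.

```cpp
#include <memory>

// Copying a shared_ptr bumps the control block's reference count;
// in typical implementations this is an atomic read-modify-write.
void takes_copy(std::shared_ptr<int> p) {  // copy ctor: atomic increment
    // ... use *p ...
}                                          // dtor: atomic decrement

int main() {
    auto p = std::make_shared<int>(42);    // count = 1
    takes_copy(p);                         // count 1 -> 2 -> 1: two atomic ops
}
```

Passing by `const&` avoids the traffic, but any code that actually shares ownership pays for the synchronized counter on every handoff, and that is exactly the throughput cost a tracing collector never pays.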

There are generally two major strategies for implementing garbage collectors: tracing collectors and reference counting collectors. (There are others, but those are the ones in use by most mainstream automatic memory management systems.)

Typically, reference counting GCs tend to have worse throughput but better (and more predictable) latency than tracing collectors, whereas tracing collectors have better throughput but higher and less predictable latency. Another big problem with (at least simple implementations of) reference counting garbage collectors is that they can't collect cycles. You typically need to run a tracing collector in conjunction with a reference counting collector anyway (that's what CPython does, for example).
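
Here is a minimal C++ sketch of the cycle problem (the `Node` type is hypothetical):

```cpp
#include <memory>

// Two objects that own each other through reference-counted pointers.
struct Node {
    std::shared_ptr<Node> next;  // owning link: counts as a reference
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;  // b's count: 2
    b->next = a;  // a's count: 2
}   // a and b leave scope: each count drops to 1, never to 0,
    // so both Nodes leak. A pure reference counter cannot reclaim this;
    // C++'s manual escape hatch is std::weak_ptr, while CPython runs a
    // tracing cycle detector on top of its reference counting.
```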

Practically speaking, all modern industrial-strength high-performance automatic memory management systems (all of the collectors in Oracle JDK, Oracle JRockit and most other JVMs, Microsoft CLR, Mono, most ECMAScript implementations, all Ruby implementations, almost all Python implementations, all Smalltalk implementations, all Lisp implementations etc.) are tracing collectors, so there is a bit of a self-reinforcing feedback loop here: more money gets put into research on tracing GCs because they are popular, and they become more popular because they get better because of the money spent on their research … and so on.

Jörg W Mittag
  • Perl (see [perlref](http://perldoc.perl.org/perlref.html)) and Objective-C ([SO](http://stackoverflow.com/questions/6578/understanding-reference-counting-with-cocoa-and-objective-c) ... even with [ARC](http://clang.llvm.org/docs/AutomaticReferenceCounting.html) (static code analysis to generate retain/release)) use a reference count system. The latter (clang & Apple) is certainly not without its own research being done. In particular, iOS had a trade-off in how garbage collection worked, with CPU/battery being at a premium. –  Dec 10 '13 at 01:51
  • I'm sure Jörg already knows, but [Concurrent Cycle Collection in Reference Counted Systems](http://academic.research.microsoft.com/Paper/183901.aspx) is certainly possible, though fiddly. It's impossible to choose a safe order for destructor/finalizer calls in a garbage cycle, though - irrespective of how garbage is found. –  Dec 10 '13 at 04:44
  • @Steve314: the link is a 404. Are you referring to the [Recycler](http://researcher.watson.ibm.com/researcher/view_project.php?id=3385)? – Jörg W Mittag Dec 10 '13 at 10:49
  • @MichaelT: I think the iPhone is a pretty special case here, though. The majority of modern systems are not heavily RAM-constrained computers with limited scope for concurrency. Reference counting really shines in that sort of environment, but it isn't representative of, say, your average desktop computer. (Nor will it necessarily be representative of mobile devices for long; I'd be willing to bet iOS will move to a tracing GC within the next decade or so.) – Phoshi Dec 10 '13 at 10:49
  • @Phoshi: The majority of computers sold ARE memory constrained: smartphone sales already overtook traditional computer sales in 2011, and tablets are projected to overtake traditional computer sales by the end of this year. – Pieter B Dec 10 '13 at 10:54
  • I've seen Azul's tracing GC handling 700 parallel threads generating 20 GB/s of garbage on an 864 core system collecting a 500 GB heap without breaking a sweat or introducing noticeable pause times. I don't know enough about Perl or Objective-C, so maybe they can do that, too, but the other reference counting collectors I know of certainly can't. – Jörg W Mittag Dec 10 '13 at 11:00
  • @PieterB: The majority of *sales* are, the majority of existing systems are not, and the majority of *useful* existing systems are not. Furthermore, most non-apple flagships these days are far from what I'd call memory-constrained, and flagships are only a generation or two away from being low/midrange these days. I don't think the days of low-memory phones will last long enough for the higher sales to make a dent. – Phoshi Dec 10 '13 at 11:06
  • @Jörg W Mittag - the link is working for me, but [here's an alternative link](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.9603) for the same paper. The authors seem to have done a lot of work in reference-counting garbage collection. This paper does also seem to be linked in that publications section of that "Recycler" page, along with more recent papers. –  Dec 10 '13 at 13:13
  • @Steve314: It works for me, now, too. Must have been a temporary glitch. Yes, that's the Recycler paper, I'm well aware of that. PHP has adopted the non-concurrent version of that algorithm as of PHP 5.3. Interestingly, the same authors have also developed a tracing GC which has *even lower* pause times than their reference counting GC: Metronome, whose pause times are lower than thread context switching times of typical modern OSs (~250 µs as opposed to 6 ms for the Recycler). It is sold as part of IBM WebSphere RealTime. – Jörg W Mittag Dec 10 '13 at 13:19
  • @PieterB is correct, smartphones and tablets dominate computer sales and all of them have heavily constrained memory. See http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/ for a great overview of how garbage collection issues are very real in the mobile/tablet world. – Michael Shopsin Dec 11 '13 at 16:07
  • Relocating tracing garbage collectors can operate under very tight memory constraints; while it's possible to design a relocating collector that keeps, for each object, a list of references to it, the overhead generally ends up being higher than for a tracing collector. On a machine with a 64K heap, an efficient tracing collector could store arrays of string references at a cost of two bytes per string reference, and store string objects with an overhead of only one byte for short strings (e.g. under 64 bytes), and slightly more for longer ones (it may be helpful... – supercat Mar 18 '15 at 17:07
  • ...to limit the maximum size of any single GC object, and accept a little extra overhead on longer strings to facilitate such subdivision). There's no way a reference-counting collector would get its overhead that low. – supercat Mar 18 '15 at 17:14
  • You are hinting that there are other strategies than tracing and reference counting. Of course, you also have mixed strategies. But what do you have in mind precisely? Can you give some reference? – babou Apr 22 '15 at 15:11

Because it would be too expensive to do it continuously.

"Mark and sweep" garbage collectors periodically sweep the memory space for objects that have been dereferenced. Generational garbage collectors sweep those objects first that are most likely to have been disposed of recently (which turns out statistically to be the objects most recently created). Objects that have been around for awhile get put into a second or third generation for sweeping later.

Garbage collection works very much like your trash bin: your trash man does not check your can every minute for trash, or even every day, although you might check the cans inside your house every day. Checking every minute or every hour would be too inefficient, for both you and the trash man.

Robert Harvey
  • You don't need to do it continuously, just when the object has been dereferenced. The same way `std::shared_ptr` does in C++. – Caesar Dec 09 '13 at 23:46
  • @Caesar but you'd need to increment the counter when you add a reference, and that needs to happen in a thread-safe manner, plus the freeing can cascade down – ratchet freak Dec 09 '13 at 23:48
  • @ratchetfreak nothing prevents you from doing it asynchronously, or in steps, so it's not an issue. – Ismael Luceno Dec 10 '13 at 01:22
  • @Ismael you still need a thread-safe counter with atomic increment/decrement. These things don't grow on trees. At the very least, you prevent two threads from referencing/unreferencing the same object at the very same clock cycle, and you cause a lot of cache stalling because the cache page ownership jumps around the cores like crazy, for pretty much every memory page. Cache stalling hurts. – John Dvorak Dec 10 '13 at 06:08
  • @JanDvorak if it's atomic, it's thread-safe, and you have at least atomic exchange in most architectures. – Ismael Luceno Dec 26 '13 at 20:00
  • @Ismael "atomic exchange" is a concept that _still_ needs the CPU core to own that memory page. The L1 cache is per-core, meaning that at most one core can own a memory page. If one CPU core wants to read/write a page, its cache has to ensure the core sees a fresh page. If you don't want to announce every write to every cache (that would be too much data to handle for the L1-L2 bus), only one cache can have fresh data at a given time, only uploading it to the L2 or another L1 once it's done using it or another cache wants to write. Transferring a 32-byte page takes 32 clock cycles. That's LOT. – John Dvorak Dec 26 '13 at 20:08
  • The only thing "atomic exchange" ensures is that some other core does not steal that page between your read and your write. All it does is make the core ask for a "read and lock until write" instead of just a plain "read". "Read" is pretty expensive if some other core has just done a "write". You need to actually fetch that memory page, and the other core will forget that page as soon as _you_ write. It _will_ want to read from it in the very next cycle. Congratulations, CPUs now spend 97% of their time passing data around over a congested highway that normally sees one road train per hour. – John Dvorak Dec 26 '13 at 20:15
  • @JanDvorak I don't see why would you be messing with the counter all the time, it just needs to be done twice for each "task". – Ismael Luceno Dec 28 '13 at 04:45
  • If you go like mad, adding and removing refs, you've got something wrong in the design. shared_ptr is probably too naive. At worst, the penalty should still be below 10%, even if the GC works linearly and doesn't care about sorting objects by size/type (which could tell a lot about their lifetime too), but if it's clever, you could get much better performance. – Ismael Luceno Dec 28 '13 at 05:06
  • BTW, I meant atomic test-and-exchange. – Ismael Luceno Dec 28 '13 at 05:21

Garbage collection based on reference counting is much slower than tracing garbage collection (compare Boost's `shared_ptr` with other solutions, or find links to more scientific comparisons on the Boehm GC's website). In addition, reference counting alone is unsatisfactory, because it cannot deal with reference cycles.

C++ programmers tend to think about memory as just another resource, since C++ manages memory and other resources in the same way. Garbage collection is a special approach that can only manage memory, which happens to be the most frequently used resource.

This often confuses C++ programmers: they tend to think of finalizers as a poor substitute for C++ destructors, which they are not. To release scarce resources (other than memory) in a deterministic manner, use finally blocks.
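
For contrast, this is what the C++ destructor gives you and a finalizer cannot: release of a non-memory resource at a precisely known point. (A minimal sketch; the `File` wrapper and file name are made up.)

```cpp
#include <cstdio>

// RAII wrapper: the destructor closes the handle exactly at end of scope.
class File {
    std::FILE* f;
public:
    explicit File(const char* path) : f(std::fopen(path, "r")) {}
    ~File() { if (f) std::fclose(f); }
    File(const File&) = delete;            // no accidental double-close
    File& operator=(const File&) = delete;
    std::FILE* get() const { return f; }
};

void read_config() {
    File cfg("config.txt");  // hypothetical file
    // ... read from cfg.get() ...
}   // fclose runs here, deterministically; in a GC'd language you
    // reproduce this by hand with try/finally (or using/try-with-resources).
```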

  • Can you please add a comment briefly stating the reason for the downvote? –  Dec 10 '13 at 15:05
  • I assume they didn't like the idea that C++ developers get confused... I find it's the GC devs who think finalisers are an equivalent of C++ destructors, not realising that GCs require manually managing object lifetimes, like C. Have a +1 for pointing that out, though. – gbjbaanb Dec 10 '13 at 16:35

A reference in a 64-bit JVM typically uses 4 bytes of memory (using compressed oops). This means using a reference count would increase the memory size of every object, but also increase the complexity of passing references around. Every time a thread anywhere obtains a reference to an object, it needs to increment and later decrement the counter, something a GC-based system doesn't need to do.

BTW, in C++ a shared pointer has two pointers (each 64-bit): one to the object and one to the reference counter. On many systems the reference counter itself uses 32 bytes, as that is the minimum allocation size. In total, shared pointers replace a 4-byte reference with a 16-byte shared pointer plus a 32-byte reference count. That doesn't sound very CPU-cache friendly.
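
You can check the size part of this claim on your own machine; the exact numbers are implementation-defined, but on common 64-bit ABIs a `shared_ptr` is two pointers wide (a small sketch, not from the answer):

```cpp
#include <cstdio>
#include <memory>

int main() {
    // Typically 8 vs 16 bytes on 64-bit platforms.
    std::printf("raw pointer:     %zu bytes\n", sizeof(int*));
    std::printf("std::shared_ptr: %zu bytes\n", sizeof(std::shared_ptr<int>));
    // The control block (the reference counts) is a further heap allocation;
    // its size and the allocator's minimum block size vary by implementation.
}
```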

The problem with JVM GCs is stop-the-world collections, which are fine for throughput processing but a real pain for low-latency systems. One way around this is to use a concurrent collector, the best being Azul's Zing, which pauses less (though it is not pauseless ;)

Peter Lawrey