What's the canonical way to do compaction on a refcounted heap?

Question

The only way I know of is to copy the whole heap by allocating copies of all objects on a new heap and dropping the old one, like couchbase db does for example. Presumably you could also do the same on a subset of the heap to copy less data around each time. Google isn't helping me much, since it only talks about heap compaction in gc. Are there other ways to do this?

Since refcounting is a kind of GC – just with explicitly represented liveness – could you explain why compacting GC techniques don't translate to refcounting? Also lol @ all the people downvoting this because they don't understand the question? — amon, Mar 08 '20 at 10:44

score 1 · Answer 1 · answered Mar 08 '20 at 13:49

Compacting allocators are unusual, because most of the time you'd rather use GC instead. However, there are legitimate scenarios where compaction becomes necessary, such as long-lived native programs where no GC is available.

The first important point is that compaction makes no sense for very large or very small objects.

Very large objects should not be part of typical allocator arenas, but should be mmap()ed to separate pages that can be released back to the OS without memory fragmentation. The glibc malloc() implements this.
Very small or fixed-sized objects can be managed by a simple arena allocator. When there are holes in the arena after deallocation, they can be reused for the next allocation of same size without any fragmentation. The glibc malloc() implements this as a heuristic.

Furthermore, objects with a fixed lifetime do not need to be heap-allocated, instead stack allocation is preferred. If the default stack size is too small, that can be tuned.

This leaves medium-sized and variable-sized objects of indeterminate lifetimes as the problematic candidates for compacting memory management.

Compaction requires that all pointers to a relocated object can be updated. This is most easily done when there's only a single pointer, which we can ensure by a level of indirection: the refcounted pointer doesn't go directly to the object, but to an arena-allocated fixed-size pad that contains the actual pointer:

    ptr                user's smart pointer object
     |
     v
----+-----+----        arena of indirection pointers
... | rc  | ...
    | ptr |
----+-----+----
      |
      v
      +--------+       heap with relocatable objects
      | object |
      +--------+

The fixed-sized indirection pointers are also a good location for tracking additional memory management metadata such as refcount and object size.

The difficult part is deciding when compaction can be performed. Often, parts of the relocatable object will be referenced temporarily by non-refcounted pointers, e.g. when reading and writing a field of the object. Relocation cannot be performed while such pointers are active. As a consequence, the application must be designed in a manner so that either

there are clear points when there are no non-managed pointers active (could be the case if there's some kind of request lifecycle); or
access through the refcounted pointer to the object is only allowed within critical sections, which are tracked via a semaphore-like counter, letting compaction skip objects that are currently in use.

However, all of these techniques – indirection, usage counters, refcounting in general – will degrade performance and will somewhat thrash your caches. For throughput-oriented applications, true GC is better.

Note that you can use the same pointer mechanism for all managed objects, whether they use indirection for relocation or not. For example, you could tag the pointers to indicate whether they should be dereferenced through a relocation pad.

Personally, I would avoid implementing relocating refcounting. If I have full control over the object model I might as well use stronger GC techniques. And long-lived processes can almost always be transformed into shorter-lived processes, e.g. by extracting long-lived state into a process-external database and regularly restarting worker processes.

score 0 · Answer 2 · answered Mar 08 '20 at 11:43

0

Most people don’t try to do any heap compaction. You’d have to justify that you get anything useful out of compaction, in comparison to the time spent, the coding effort, and possible restrictions.

(For example, heap compaction in Swift would require the ability to move arbitrary C++ objects. Even C++ doesn’t do that, so that would be very tricky. )

answered Mar 08 '20 at 11:43

gnasher729

42,090
4
59
119

1

Sure, I'll have to update pointers too. I have a 100gb+ heap that's mostly untouched for weeks on end so compaction makes sense. Think zipfian distribution of updates on the heap. I'm currently considering rebooting the app every week or so by using a live/failover pair but there might be a better way, hence the question :) – Filip Haglund Mar 08 '20 at 12:14

What's the canonical way to do compaction on a refcounted heap?

2 Answers2