
I've been implementing Andrei Alexandrescu's allocators as an exercise, but I've gotten stuck on his Freelist. Andrei's Freelist looks something like this:

template<class BaseAllocator,
    std::size_t kMinSize,
    std::size_t kMaxSize /* Actual size of each block on the freelist */,
    std::size_t kBatchSize,
    std::size_t kMaxRemembered>
class Freelist
{
    // ...
};

Everything except kBatchSize has a clear implementation to me. Let's take a specific example:

Freelist<A, 8, 8, 8, someLargeNumber>

Basically, this is a freelist which stores blocks of size 8 on its list, but which allocates 8*8 = 64 bytes from the parent allocator when asked for 8 bytes, threading the remaining blocks onto the freelist.
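For concreteness, here is roughly the allocation path I have in mind, written as a sketch (BatchingFreelistSketch, Node, head_, and a malloc-style BaseAllocator::allocate(size) are my own placeholder names and assumptions, not anything from the talk); the deallocation side is what the rest of this question is about:

#include <cstddef>

template<class BaseAllocator, std::size_t kMaxSize, std::size_t kBatchSize>
class BatchingFreelistSketch
{
    struct Node { Node* next; };
    Node* head_ = nullptr;
    BaseAllocator parent_;

public:
    void* allocate()
    {
        if (head_)                                // fast path: pop one block
        {
            Node* block = head_;
            head_ = head_->next;
            return block;
        }
        // Slow path: grab kMaxSize * kBatchSize bytes (8 * 8 = 64 in the
        // example) from the parent, return the first block, and thread the
        // remaining kBatchSize - 1 blocks onto the list.
        char* batch = static_cast<char*>(parent_.allocate(kMaxSize * kBatchSize));
        for (std::size_t i = 1; i < kBatchSize; ++i)
        {
            Node* node = reinterpret_cast<Node*>(batch + i * kMaxSize);
            node->next = head_;
            head_ = node;
        }
        return batch;
    }

    void deallocate(void* p)
    {
        // Pushing p back onto the list is trivial; knowing when a whole
        // batch can be handed back to parent_ is exactly what I'm stuck on.
        Node* node = static_cast<Node*>(p);
        node->next = head_;
        head_ = node;
    }
};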

Some of my thoughts for how to implement this are as follows (assume n is the number of blocks on the freelist):

  • Join adjacent memory blocks when adding freed blocks to the freelist. This loses the O(1) complexity that we previously had, and it's also possible to join blocks that actually came from different allocations.
  • I could modify the previous technique to always allocate memory aligned to the batch size, and then use the alignment to determine whether two blocks come from different allocations (a sketch of that check follows this list). This still has the complexity problem, and it restricts kMaxSize and kBatchSize to powers of two (as alignment must be a power of two).
  • I could have a std::vector<Batch, BaseAllocator> to store extra information about a given batch. I strongly dislike this idea because it changes the memory requirement from O(1) to O(n). Freelists without batch allocation can be implemented with almost no extra memory (a pointer to the start of the list and maybe its size).
  • I could require BaseAllocator to be what I would call a "segmentable allocator": segments of a block can be deallocated separately, or joined together and deallocated together. Although this would make the Freelist implementation trivial, it seems like too big a requirement to place on BaseAllocator.
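To make the second idea concrete, the ownership check I have in mind is just pointer arithmetic (a sketch; it assumes the parent can hand out kMaxSize*kBatchSize-byte batches aligned to that same size, which must therefore be a power of two):

#include <cstddef>
#include <cstdint>

// Every batch of kMaxSize * kBatchSize bytes is allocated aligned to its
// own size (64-byte aligned in the 8/8 example), so the owning batch of any
// block can be recovered with a mask. Two adjacent blocks may be joined
// only if they share a batch base.
constexpr std::size_t kMaxSize   = 8;
constexpr std::size_t kBatchSize = 8;
constexpr std::uintptr_t kBatchBytes = kMaxSize * kBatchSize;

std::uintptr_t batchBase(const void* block)
{
    return reinterpret_cast<std::uintptr_t>(block) & ~(kBatchBytes - 1);
}

bool sameBatch(const void* a, const void* b)
{
    return batchBase(a) == batchBase(b);
}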

None of these approaches feels right to me. If allocating in batches is a worthwhile optimization for a freelist, it seems that there ought to be a way to implement it in O(1) space and time.

Justin
    There is a GitHub repo with an implementation based on the talk here: https://github.com/FelixPetriconi/AllocatorBuilder/blob/master/alb/freelist.hpp. It seems like they went for extending the requirements on BaseAllocator via a traits system. – Erik Man Mar 26 '21 at 23:01

1 Answer


To me, if you are going to add "batch" support to a free list which can also allocate smaller chunks, the point is not to allocate the batch more quickly than allocating kBatchSize elements individually through the free list. It's to get contiguity guarantees on the result, as you would need in order to implement, say, a small contiguous sequence container on top of it.

The work required for allocating and freeing from a free list is already so trivial that there aren't any reasonable speed improvements to be had by batching, but individually allocating single elements from a free list doesn't give you those contiguity guarantees.

As for how you'd optimally implement it, I can't say, but this might be an important hint. Personally I'd rather just use the parent allocator in these cases, or use a separate free list that's designed to allocate kMaxSize*kBatchSize-byte chunks.
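In terms of the question's own template, what I mean is roughly this (a sketch; I'm assuming a kBatchSize of 1 means "no batching", which may not match the intended interface):

// Two independent lists instead of one list that splits batches. Neither
// list ever has to split or rejoin anything, so both keep the trivial O(1)
// push/pop behaviour.
using SmallBlocks  = Freelist<A, 8, 8, 1, someLargeNumber>;   //  8-byte chunks
using WholeBatches = Freelist<A, 64, 64, 1, someLargeNumber>; // 64-byte contiguous chunks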

In my humble opinion Alexandrescu is a hit-and-miss type, as with many genius pioneers pushing the boundaries of a programming language. His ideas range from brilliant and widely applicable to unwieldy and impractical. If I were to consider batch support for a free list, the first thing I'd consider is another free list or a different allocator outright. Besides that, an allocator with 5 template parameters is already pretty cumbersome to use. Being able to specify parent allocators is a marvelous idea I want to steal; I'm not so sure about specifying min/max sizes, batch sizes, and upper bounds. Dunno, maybe he'll astound me by showing off a super efficient and simple implementation for this that eludes me. At the moment I can't see one, and I've been thinking about it for an hour since I first answered this.

So much of the uber simplicity and blazing speed of free lists comes from the narrowing assumption that they are fixed-size allocators, not variable-size ones: each chunk serves as a union of used memory for the particular element type or a singly-linked-list pointer to the next free chunk, and it's crucial that the chunks are all the same size, because that trivializes the deallocation logic while still letting the freed memory be reclaimed in its entirety by subsequent allocations. The ability to allocate in batches throws a monkey wrench into that concept, and I don't think there's any solution that can achieve the same speed for all input cases.
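That narrowing assumption, written out as code (a sketch of the usual trick, not anyone's actual implementation):

#include <cstddef>

// Because every chunk has the same size, a free chunk can double as its own
// list node: freeing is a push, allocating is a pop, both O(1) and with no
// per-chunk metadata at all.
template<std::size_t kMaxSize>
union Chunk
{
    Chunk* next;                      // while the chunk sits on the free list
    unsigned char storage[kMaxSize];  // while the chunk is handed out
};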

Attempt

Ugh, this question had me thinking about it for so long, visiting the whiteboard and tossing away idea after idea.

The only ideas I've had so far that come close to achieving all the desired operations in constant time with batch allocation support require metadata to be associated with the allocation, plus assumptions about exactly how the memory pools the chunks come from are aligned (huge alignments for the pools), so that we can get from a chunk pointer back to its pool, and therefore to the metadata, with some arithmetic.

The memory overhead can be very small, less than 1 byte per chunk (just the length of the batch) unless kBatchSize is huge (which it shouldn't be anyway, or else we should probably use the parent allocator), but it also requires an array of kBatchSize free head pointers to be stored in the allocator itself.
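Very roughly, and with every name below being my own invention rather than anything from the talk or from Alexandrescu's code, the shape I keep sketching looks like this:

#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxSize   = 8;
constexpr std::size_t kBatchSize = 8;
constexpr std::size_t kPoolBytes = std::size_t(1) << 16;  // pools aligned to their own size

// Per-pool metadata reachable from any chunk pointer via a mask. One byte
// per chunk here, holding the length of the free run the chunk belongs to;
// it could be packed tighter since runs never exceed kBatchSize.
struct Pool
{
    unsigned char runLength[kPoolBytes / kMaxSize];
    // ... chunk storage follows ...
};

Pool* poolOf(void* chunk)
{
    auto base = reinterpret_cast<std::uintptr_t>(chunk)
              & ~(static_cast<std::uintptr_t>(kPoolBytes) - 1);
    return reinterpret_cast<Pool*>(base);
}

// One free-list head per possible run length, so a contiguous run of k
// chunks can be found in O(1) when a batch of k is requested.
struct Node { Node* next; };
Node* freeHeads[kBatchSize];  // freeHeads[k - 1]: runs of exactly k free chunks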

I can almost picture the solution, but I don't like where it's headed: providing batch support to a free list isn't a practical enough use case to justify all this work (both mental and computational) unless there's some magically simple and beautiful solution eluding us that incurs zero cost for the common cases.