Data Structures/Allocators With Holes to Keep Stable Indices/Pointers
Typically a high-performance ECS where element removal is very common [*] will use a random-access sequence with holes in it to make removal of an element a constant-time operation. Vacant spaces get reclaimed upon subsequent insertions, similar to a free list. So it's similar to the dummy solution you mentioned, although in a language like C++ you can actually invoke destructors on the elements being removed (placement new and manual destructor invocation: this is how std::vector avoids the need for default constructors in push_back in spite of often allocating a memory capacity that exceeds the number of elements inserted so far), and you can associate data in parallel to indicate which elements are removed.
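To make that concrete, here's a minimal sketch (my own illustration, not anyone's production code) of a fixed-capacity pool that placement-news elements into raw storage, invokes destructors manually on removal, and keeps a parallel flag array plus a free list of reclaimable holes:

```
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Sketch only: a pool that constructs elements with placement new into raw
// storage and destroys them explicitly on removal, so no default constructor
// is needed and removed slots become reclaimable "holes".
// (Capacity checks omitted for brevity.)
template <class T>
class SlotPool {
public:
    explicit SlotPool(std::size_t capacity)
        : storage_(::operator new(sizeof(T) * capacity)),
          occupied_(capacity, false), capacity_(capacity) {}

    ~SlotPool() {
        for (std::size_t i = 0; i < capacity_; ++i)
            if (occupied_[i]) data()[i].~T();            // destroy live elements only
        ::operator delete(storage_);
    }

    // Reuses a free slot if one exists; returns the stable index of the element.
    template <class... Args>
    std::size_t insert(Args&&... args) {
        std::size_t i = free_.empty() ? next_++ : pop_free();
        ::new (static_cast<void*>(data() + i)) T(std::forward<Args>(args)...);
        occupied_[i] = true;
        return i;
    }

    void remove(std::size_t i) {
        data()[i].~T();                                  // manual destructor invocation
        occupied_[i] = false;
        free_.push_back(i);                              // hole to reclaim later
    }

    T& operator[](std::size_t i) { return data()[i]; }

private:
    T* data() { return static_cast<T*>(storage_); }
    std::size_t pop_free() { std::size_t i = free_.back(); free_.pop_back(); return i; }

    void* storage_;
    std::vector<bool> occupied_;        // parallel "is this slot live?" flags
    std::vector<std::size_t> free_;     // free list of vacant indices
    std::size_t capacity_, next_ = 0;
};
```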
- I should note that there are even ones that will completely deep copy and rearrange components on the fly to perfectly suit the contiguous access patterns (no holes) of a particular system based on the component types it accesses. Those can still end up using a stable index or pointer into some main component list that's accessible to all systems. I don't think the production ones will generally use associative data structures for this purpose; I suspect in those cases that entity removal is relatively infrequent and that components are relatively cheap to deep copy around.
There are a lot of ways to go about what I'm talking about with "holes" in your sequence. There's the straightforward solution of associating a bitset or array of booleans in parallel to (or possibly even interleaved with) your components, plus a list of free indices to reclaim for subsequent insertions. The former tends to outperform the latter if you use very efficient bitwise operations like FFS/FFZ, but performs worse if not. Surprisingly, bitsets can perform worse than a boolean array in spite of requiring 1/8th the memory, even for sequential access, if you have to test bits individually, at least with std::bitset and operator[], since operator[] conceptually has to do an extra bitwise AND (in place of a modulo) and right shift (in place of a division) to figure out which byte and relative bit to access, given its random-access nature.
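For illustration, here's a rough sketch of what I mean by a raw occupancy bitset over 64-bit words (as opposed to std::bitset with operator[]), where the shift and mask are the division and modulo mentioned above:

```
#include <cstdint>
#include <vector>

// Rough sketch: an occupancy bitset stored as plain 64-bit words, so whole
// words can later be scanned with FFS/FFZ instead of testing one bit (and
// taking one branch) per element.
struct OccupancyBits {
    std::vector<std::uint64_t> words;

    explicit OccupancyBits(std::size_t count) : words((count + 63) / 64, 0) {}

    // i >> 6 and i & 63 are the "division" and "modulo" that per-bit access
    // has to perform to find the word and the bit within it.
    void set(std::size_t i)   { words[i >> 6] |=  (std::uint64_t{1} << (i & 63)); }
    void clear(std::size_t i) { words[i >> 6] &= ~(std::uint64_t{1} << (i & 63)); }
    bool test(std::size_t i) const { return (words[i >> 6] >> (i & 63)) & 1u; }
};
```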
Another way I've seen used in game engines is to separate some data structure of valid indices from the invalid ones, or even to use erase-swap-back methods (swap the element to remove with the element at the back and pop the back in constant time, then update indices or pointers to elements through a level of indirection) to keep element removal constant-time (although the update of the indices can degrade to linear time with such solutions, but at least it's only updating integers and not hefty data)*.
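Here's a hedged sketch of one common variant of that idea (the names and layout are mine, not any particular engine's): components stay densely packed via swap-and-pop, while stable handles go through a small indirection table whose integers get patched when an element moves:

```
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of erase-swap-back with one level of indirection. Handle recycling is
// omitted for brevity, so handle_to_dense only ever grows here.
template <class T>
struct DenseSet {
    std::vector<T> dense;                      // tightly packed components (no holes)
    std::vector<std::size_t> dense_to_handle;  // dense slot -> owning handle
    std::vector<std::size_t> handle_to_dense;  // stable handle -> current dense slot

    std::size_t insert(T value) {
        std::size_t handle = handle_to_dense.size();
        handle_to_dense.push_back(dense.size());
        dense_to_handle.push_back(handle);
        dense.push_back(std::move(value));
        return handle;                         // stays valid across removals
    }

    void remove(std::size_t handle) {
        std::size_t slot = handle_to_dense[handle];
        std::size_t last = dense.size() - 1;
        if (slot != last) {
            dense[slot] = std::move(dense[last]);    // swap with the back...
            std::size_t moved_handle = dense_to_handle[last];
            dense_to_handle[slot] = moved_handle;
            handle_to_dense[moved_handle] = slot;    // only integers get patched
        }
        dense.pop_back();                            // ...and pop in constant time
        dense_to_handle.pop_back();
    }

    T& get(std::size_t handle) { return dense[handle_to_dense[handle]]; }
};
```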
One trick I rely on a lot, if you're iterating over indices to elements, is to store ranges of them as pairs of indices of the form [first, last) rather than every single index. That allows us to represent an integer range like [0, 1,000,000) with just two integers instead of a million integers we have to load into memory. That works best when your containers have very few holes in them most of the time, which end up being reclaimed on subsequent insertions. Efficient bitsets can be a nice alternative otherwise, including atomic bitsets where 64-bit words can be manipulated atomically across threads (required for thread safety with parallel loops over bitsets given the overlap in data without formal locks). Ideally you still distribute the work among your threads such that they aren't writing to the same 64-byte-aligned cache line in parallel, to avoid false sharing in your most common cases -- assuming a 64-bit architecture with a cache line size and alignment of 64 bytes (512 bits).
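As a minimal sketch of the atomic bitset idea (assuming C++11 atomics; the word layout is my own), each 64-bit word is updated with fetch_or/fetch_and so parallel loops can flip bits without formal locks:

```
#include <atomic>
#include <cstdint>
#include <vector>

// Sketch of an atomic bitset: each 64-bit word is manipulated atomically.
// Work should still be partitioned so threads mostly touch different cache
// lines (e.g. different 64-byte / 8-word chunks) to limit false sharing.
struct AtomicBits {
    std::vector<std::atomic<std::uint64_t>> words;

    explicit AtomicBits(std::size_t count) : words((count + 63) / 64) {
        for (auto& w : words) w.store(0, std::memory_order_relaxed);
    }

    void set(std::size_t i) {
        words[i >> 6].fetch_or(std::uint64_t{1} << (i & 63),
                               std::memory_order_relaxed);
    }
    void clear(std::size_t i) {
        words[i >> 6].fetch_and(~(std::uint64_t{1} << (i & 63)),
                                std::memory_order_relaxed);
    }
};
```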
There are also intrusive solutions that aren't very generalized, relying on the specific component type's representation. As a simple example, a component type of int32_t (admittedly a contrived example few will find use for) might only store non-negative integers in the normal case; then you might use -1 to indicate that it's removed. That's usually more effort than it's worth IMO, having to x-ray your component type's representation in your ECS, unless you have a lot of extra time to specialize your class templates for specific component types and don't mind the extra coupling -- especially since real-world use cases will generally have component types more complex than a scalar number, and will also often require destructors to be called to properly free their memory.
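For what it's worth, a contrived sketch of that intrusive sentinel approach (the component type here is made up purely for illustration):

```
#include <cstdint>
#include <vector>

// Contrived sketch: the component's own representation encodes "removed"
// (-1 as a sentinel), so no parallel bitset/bool array is needed, at the cost
// of x-raying the component type.
struct ScoreComponent {
    std::int32_t value = -1;                 // -1 means "this slot is vacant"
    bool occupied() const { return value >= 0; }
};

void sum_scores(const std::vector<ScoreComponent>& scores, std::int64_t& total) {
    for (const ScoreComponent& s : scores)
        if (s.occupied())                    // per-element branch to skip holes
            total += s.value;
}
```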
Iteration Overhead
All of these solutions incur either a branching overhead when iterating over elements to check for vacant spaces, or a memory access overhead (more cache misses) when iterating through parallel arrays storing valid indices to elements, but it tends to be quite trivial in exchange for avoiding the need for an associative structure with key searches to access elements (hash table, red-black tree, trie, etc.). You end up with a trivial constant-time way to reference and access elements in your scene with stable indices.
Bitwise FFS For Component Iteration
Bitwise FFS is my favored solution, and the best-performing in our particular use case for skipping over vacant spots in arrays after exhaustively trying every solution out there I could find: it avoids the need to test for vacant spaces at a per-element level (it can check many at once, like 64+ on 64-bit architectures) and only requires ~1 bit of extra memory per entity.
Peculiarly, it's a solution I don't see used much even among really serious gamedevs, even though I've had -- by far -- the best (measured) performance with it. Maybe that's because there was no standard C or C++ FFS/FFZ intrinsic or function the last time I checked (we have to get a bit compiler- and/or platform-specific to use such instructions). That's a really glaring omission from the C and C++ standard library IMO for those of us who have to obsess over real-time applications and consistently smooth frame rates. But it's not too difficult to write a cross-platform version of FFS/FFZ functions yourself: GCC even offers it as a GCC-specific intrinsic (ffs*), you can find versions in <intrin.h> on MSVC and ICC, and likely similar on other compilers besides those three (ffz is just ffs(~bitset)). Still, I really wish we had a std::ffs, as I hate putting platform- or compiler-specific code into our codebase; it requires an approval process from my team where I have to write a proposal arguing for the merits of allowing such code in our codebase.
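Here's a rough cross-platform sketch of the sort of thing I mean (the intrinsic choices are assumptions about your toolchain; on C++20 you could reach for std::countr_zero in <bit> instead), along with a loop that skips vacant slots 64 at a time:

```
#include <cstdint>
#include <vector>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hedged sketch of a portable "find first set" over a 64-bit word, returning
// the zero-based index of the lowest set bit (caller guarantees word != 0).
// FFZ is just find_first_set(~word). On C++20, std::countr_zero from <bit>
// does the same job without compiler-specific code.
inline int find_first_set(std::uint64_t word) {
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanForward64(&index, word);
    return static_cast<int>(index);
#else
    return __builtin_ctzll(word);        // GCC/Clang builtin
#endif
}

// Visit every occupied index by scanning 64 slots per word: vacant spots are
// skipped without a per-element branch. 'visit' is any callable taking an index.
template <class Visit>
void for_each_set(const std::vector<std::uint64_t>& words, Visit visit) {
    for (std::size_t w = 0; w < words.size(); ++w) {
        std::uint64_t bits = words[w];
        while (bits) {
            int bit = find_first_set(bits);
            visit(w * 64 + static_cast<std::size_t>(bit));
            bits &= bits - 1;            // clear the lowest set bit
        }
    }
}
```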
Stable Keys
In any case, referencing other entities/components from a scene wants some sort of stable key. Typically the fastest is going to be an index into a random-access structure, not a key that requires searching (not even the most optimized hash table will beat that). So in summary I suggest, if you actually need it, to stop thinking in terms like "IDs" or "keys" and more in terms of "indices" or "pointers", which offer not only constant-time access to another element in your scene, but the cheapest form of constant-time access possible. The second question then becomes how to avoid invalidating your indices/pointers. I've seen a lot of AAA devs solve this with a layer of indirection (I think there are more efficient ways, but it's a valid one, and it's used widely in serious productions, at least from what I can tell). In any case, it's always going to require a structure of some sort that has holes in it for vacant spaces at some fundamental level in memory (this includes the lowest-level memory allocators, which are required to have holes if they allocate multiple elements together in a single block to avoid invalidating existing pointers).
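To illustrate the indirection idea in code (a sketch with made-up names, not any particular engine's scheme), a handle can pack a slot index plus a generation counter, with the slot table redirecting to wherever the element currently lives:

```
#include <cstdint>
#include <vector>

// Sketch of "index plus a layer of indirection": a stable handle stores a slot
// index and a generation, and the slot table maps it to the element's current
// location. Stale handles are detected when their generation no longer matches.
// Insertion/removal and slot recycling are omitted for brevity.
struct Handle {
    std::uint32_t slot;
    std::uint32_t generation;
};

template <class T>
struct HandleTable {
    struct Slot { std::uint32_t dense_index; std::uint32_t generation; };
    std::vector<Slot> slots;       // stable: entries are reused, never shuffled
    std::vector<T> elements;       // may be reorganized freely behind the slots

    bool valid(Handle h) const {
        return h.slot < slots.size() && slots[h.slot].generation == h.generation;
    }
    T& get(Handle h) { return elements[slots[h.slot].dense_index]; }
};
```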
Getting to Gross Details
One thing to note is that when your scene data starts to resemble more of a graph than an array, with lots of links from one element to another (be they indices or pointers, as with a lot of motion parenting for FK/IK), you're typically going to incur a lot of cache misses from the sporadic access patterns that follow the graph links. It can be really handy in those cases to have a super optimized, parallelized sorting algorithm to sort things based on their links (if you can afford it somewhere in between your frame outputs) so that the memory access patterns become more orthodox and cache-friendly, with improved spatial and temporal locality. Having a super fast sorting algorithm available at your fingertips can help a lot to keep you from rolling a fancier solution which ends up performing worse. I spent a few days coming up with a multithreaded SIMD radix sort drawing from the best papers and I haven't looked back since -- it allowed me to delete boatloads of code I had in my ECS from trying to get way fancier about reducing cache misses, by just sorting everything periodically with an algorithm so fast that it never causes frame hiccups. It can sort over 100 million elements on a dinky i3 in less than 60ms, and at least in my use cases I'm dealing with way fewer than 100 million elements, so the actual time to sort is often less than a millisecond and affordable to do periodically without any noticeable frame rate drops, keeping us below that desirable ~16.67ms for 60+ FPS.
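As a purely illustrative sketch of that periodic reordering (std::sort standing in here for the kind of parallel radix sort described above, and the data layout is hypothetical), you can sort transforms by their parent link and patch the stored indices afterward:

```
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative sketch only: periodically reorder transforms so that elements
// linked to the same parent sit near each other in memory, turning
// pointer-chasing FK/IK traversals into mostly forward, cache-friendly sweeps.
struct Transform {
    std::uint32_t parent;                 // index of parent transform, or ~0u for roots
    float local[16];                      // local matrix payload (placeholder)
};

void reorder_by_parent(std::vector<Transform>& transforms,
                       std::vector<std::uint32_t>& remap /* old index -> new index */) {
    std::vector<std::uint32_t> order(transforms.size());
    for (std::uint32_t i = 0; i < order.size(); ++i) order[i] = i;

    // Sort indices by parent so linked elements end up near each other.
    std::sort(order.begin(), order.end(),
              [&](std::uint32_t a, std::uint32_t b) {
                  return transforms[a].parent < transforms[b].parent;
              });

    // Rebuild the array in the new order and record where each element went,
    // so any stored indices elsewhere can be patched through 'remap'.
    std::vector<Transform> sorted(transforms.size());
    remap.assign(transforms.size(), 0);
    for (std::uint32_t new_i = 0; new_i < order.size(); ++new_i) {
        sorted[new_i] = transforms[order[new_i]];
        remap[order[new_i]] = new_i;
    }
    // Patch parent links to the new layout.
    for (Transform& t : sorted)
        if (t.parent != ~0u) t.parent = remap[t.parent];
    transforms.swap(sorted);
}
```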