What's an efficient data structure to make a lot of queries between parents/childs?

Question

I have the following scenario for which I've been experiencing performance problems:

The context is a level editor of a new engine for an old video game (whose source is unavailable) in Unity. Basically I'm writing a level editor for Unity to shape old data structures into contemporary, more usable ones, i.e. parts/entities instead of a bunch of unrelated models/polygons.

Here's a tree of how things are organized in the old game:

Container (dozens)
- Models (hundreds)
  - Polygons (thousands)

Here's what I'd like to achieve:

Group(s): for grouping related objects
- Object(s): a tree, a ship etc ...
  - Part(s): an object's parts with specific material etc ...

Depending what the final object looks like on the screen, the initial data (models/polygons) is a mess, where there are merged or unrelated parts in a model and its polygons.

Other problems emerge due to this, for instance, what used to be a simple pointer update to update a texture or color is simply not possible from within Unity.

Here things I need and what I've done:

I need to be able to track implicit and explicit references to models and polygons. This is to ensure that valid objects are built and therefore prevent incorrect content.

Basically the user can create a part from either the model or or one of its polygons, the opposite being disallowed depending what's chosen first.

Here's a picture with explanations:

Scene9 is of type Container
M123_456_ABC... are of type Model
P123_456_ABC... are of type Polygon
A green icon is an explicit reference
A yellow icon is an implicit reference
A red icon is a non-referenced item

Also, the user interface is augmented by elements getting disabled whenever appropriate to help user picking the right actions when building the level, etc ...

Here, the issues I am experiencing:

Querying for explicit/implicit references between around 300 models and 7000 polygons becomes slow over time as relationships gets created.

I've been using a couple of List<T> and do LINQ queries between them:

everything
all models
all polygons
implicit models
implicit polygons
explicit models
explicit polygons

Minus the spaghetti logic, it does works but it's slow. Then I've upgraded to HashSet<T> with custom Comparer<T> which greatly improved speed but it's still not fast enough. All this results in freezes in the user interface.

Main consideration is that within Unity, loops are rather tight, calls at high rate, immediate mode UI and no asynchronous calls features (Mono is roughly .NET 3.5).

I am starting to consider that my approach using lists is flawed and was thinking that a better data structure could be used for performant queries.

Can you suggest a more appropriate data structure to use for such scenario ?

EDIT

Here is a diagram showing the new approach I'm trying after your suggestions:

Notes:

IScenePrimitive consists of 3 int and an Enum
ISceneReference.IsRef is simply IsRefExplicit || IsRefImplicit
Missing things here are a dictionary in Scene for quick access to a model/polygon, when user selects something I retrieve these using hashes without having to enumerate, etc ...

Interesting parts in code:

SceneModel:

    public override bool IsRefExplicit
    {
        get { return _isRefExplicit; }
        set
        {
            _isRefExplicit = value;

            if (value)
            {
                Debug.Assert(_polygons.None(s => s.IsRefExplicit));
                foreach (var polygon in _polygons)
                    polygon.IsRefImplicit = true;
            }
            else
            {
                Debug.Assert(_polygons.All(s => s.IsRefImplicit));
                foreach (var polygon in _polygons)
                    polygon.IsRefImplicit = false;
            }
        }
    }

    public override bool IsRefImplicit
    {
        get { return _isRefImplicit; }
        set
        {
            _isRefImplicit = value;
            if (value)
                Debug.Assert(_polygons.Any(s => s.IsRefExplicit));
        }
    }

ScenePolygon:

    public override bool IsRefExplicit
    {
        get { return _isRefExplicit; }
        set
        {
            _isRefExplicit = value;
            if (value)
            {
                Debug.Assert(!Model.IsRefExplicit);
                Model.IsRefImplicit = true;
            }
            else
            {
                Debug.Assert(Model.IsRefImplicit);
                Model.IsRefImplicit = false;
            }
        }
    }

    public override bool IsRefImplicit
    {
        get { return _isRefImplicit; }
        set
        {
            _isRefImplicit = value;
            Debug.Assert(Model.IsRefExplicit);
        }
    }

That's all that really happens, LINQ on a dozen items instead of many more (300 models querying each 7000 polygons X). I still have to test it to see how well it performs but it should perform much faster, to be continued.

"Mono is roughly .NET 3.5" -> [Theraot.Core](https://www.nuget.org/packages/Theraot.Core/) will give you all async goodness even in .NET 2.0 #ShamelessPromotion - I'm having trouble wrapping my head around the queries you are doing, yet I think you need some [Memoization](https://en.wikipedia.org/wiki/Memoization) scheme. — Theraot, Oct 12 '16 at 21:23
Thanks for the links, taking a look at them ! Well, I too but considering there are about 40 levels to build using mouse, each with ~10K polys, whatever 'helper' that can alleviate this error-prone process is welcome :) — aybe, Oct 12 '16 at 23:55
Try a B-tree. You can use a binary search to get nodes, which means you'll have O(log n) efficiency. — imnota4, Oct 13 '16 at 12:21
Thank you all, I am definitely going to use a custom tree structure with some form of monoization. Still working on it, any ideas are still welcome. Will come and back post an answer whenever I get something decent. — aybe, Oct 13 '16 at 18:51
Having worked on many graphical models with higher complexity, I am having difficulty understanding why you need queries at all, and especially why you would use LINQ. The most compact, fastest form of a "reference" is a pointer (reference in C#), but that can be replaced with an integer indexing an object array if you need to be able to swap an object without affecting references. — Frank Hileman, Oct 14 '16 at 19:53
Additionally, all child objects should have a reference (integer or pointer) to its parent. — Frank Hileman, Oct 14 '16 at 19:55
Right, coincidentally, I've just started using your approach ! Childs have parents and LINQ has mostly vanished. I've used tips from @Theraot, memoization, a tree-like structure. I couldn't find out how to sketch a B-tree for 3 keys though (https://www.cs.usfca.edu/~galles/visualization/Algorithms.html) ... Still have to test that new approach because I did it offline. And basically I need this to assist the user to build a level (there are ~40) as easily as possible by disabling some actions in UI and providing visual cues ! — aybe, Oct 14 '16 at 20:30
By the way, to put things in context, this is a new engine using Unity for an old PSX game, two platforms, two paradigms. In my answer I'll go in details on why I made and had to make some of these choices in the implementation (basically each of these platforms have their quirks and I have to deal with that). — aybe, Oct 14 '16 at 20:38
@Aybe You said three keys, why not three `Dictionary`? Or since you want async operations, why not three `ConcurrentDictionary`? — Theraot, Oct 14 '16 at 21:09
I don't understand what you mean by using dictionaries, where should they be in the tree (if at all, if that's not what you meant) ? — aybe, Oct 14 '16 at 21:13
@Aybe Let's see... you are using (or were trying to use) a B-tree. Now tradicionally a tree is implemented with an interface similar to a set, where you can check if an item is present or not... but they can also be implemented like dictionaries where you query by a key to retrieve a value. I'm guessing you are doing that. So, if you are using a B-tree as a Dictionary, why not the standard issue dictionaries instead? Do they perform badly or is there another reason to not use them? - I have another question: how are implicit / explicit references represented in the source data? — Theraot, Oct 14 '16 at 21:35
Ok I get it ! Now I'm using a *sort* of tree only by the name (only 3 typed levels: containers/models/polygons); it's a tree in the sense it's enumerable :) For references it's a bit messy, in picture above, each container/model/polygon has a component defining its location and type (tiny struct with 3 `int` + 1 `enum`). That said, when user selects something I can *match* it on my internal structure (by key/hash) representing the tree you see and annotate it (isRefImplicit, isRefExplicit etc ...). — aybe, Oct 14 '16 at 21:56
Basically, since this is (going to be an open source) remake of a copyrighted game, I have to *indirectly refer* to game data this way because I can't put any content in the repository... If this was my very own game, life would be much easier as things would have been imported in Unity as *assets*. — aybe, Oct 14 '16 at 22:01

score 2 · Accepted Answer · answered Oct 14 '16 at 23:50

Let's take a look at SceneModel's IsRefExplicit. I've added comments:

SceneModel

public override bool IsRefExplicit
{
    get { return _isRefExplicit; }
    set
    {
        _isRefExplicit = value;

        if (value)
        {
            // If the model has an explicit reference
            // then all its polygon have an implicit reference
            Debug.Assert(_polygons.None(s => s.IsRefExplicit));
            foreach (var polygon in _polygons)
                polygon.IsRefImplicit = true;
        }
        else
        {
            // If the model doesn't have an explicit reference
            // then all its polygon don't have an implicit reference
            Debug.Assert(_polygons.All(s => s.IsRefImplicit));
            foreach (var polygon in _polygons)
                polygon.IsRefImplicit = false;
        }
    }
}

This can be done by simply changing the implementation of ScenePolygon to query the Model:

SceneModel

public override bool IsRefExplicit
{
    get { return _isRefExplicit; }
    set { _isRefExplicit = value; }
}

ScenePolygon

public override bool IsRefImplicit
{
    get { return Model.IsRefExplicit; }
    set { throw new NotSupportedException(); }
}

No more loop.

Of course the above is under tha assumption that IsRefImplicit on ScenePolygon will only be set by SceneModel. If this assumption is not true, then perhaps you shouldn't do polygon.IsRefImplicit = false; on SceneModel because the IsRefImplicit may have been set to true by another code.

Now, let's take a look at ScenePolygon's IsRefExplicit:

ScenePolygon

public override bool IsRefExplicit
{
    get { return _isRefExplicit; }
    set
    {
        _isRefExplicit = value;
        if (value)
        {
            // If the polygon has an explicit reference
            // then the model has an implicit reference
            Debug.Assert(!Model.IsRefExplicit);
            Model.IsRefImplicit = true;
        }
        else
        {
            // If the polygon doens't have an explicit reference
            // then the model doens't have an implicit reference?
            // ...
            // Could there be another polygon of the model with an explicit reference?
            // If there is, then the model should remain with an implicit reference
            Debug.Assert(Model.IsRefImplicit);
            Model.IsRefImplicit = false;
        }
    }
}

In this case what you need is reference counting. Keep in the SceneModel how many ScenePolygon does it have with IsRefExplicit and let ScenePolygon increment it or decrement it as needed.

public override bool IsRefExplicit
{
    get { return _isRefExplicit; }
    set
    {
        if (_isRefExplicit == value)
        {
            return;
        }
        _isRefExplicit = value;
        if (value)
        {
            Model.IncrementImplicitRefCount();
        }
        else
        {
            Model.DecrementImplicitRefCount();
        }
    }
}

Then the model can implement IsRefImplicit by checking if the current reference count is greater than 0.

For abstract, we could write the rules like this:

Model.IsRefImplicit = contains at least one polygon with IsRefExplicit.
    Have a counter, and check if it is greater than 0.
    There is no setter.

Model.IsRefExplicit = all the polygons have IsRefImplicit.
    Store a bool backing field.

Polygon.IsRefImplicit = belongs to an model with IsRefExplicit.
    Read Model.IsRefExplicit
    There is no setter.

Polygon.IsRefExplicit = should mark the model IsRefImplicit.
    Store a bool backing field.
    The setter will increment the counter of the Model when set to true,
                and decrement it when set to false,
                do nothing if the value didn't change.

Then the concern is to annotate IsRefExplicit and let IsRefImplicit be populated by the code you have. So, you would be reading the source data and looking in some dictionary for the object it references, and annotating it. If the object is not in the dictionary you create it and add it.

If you change your code to use Interlocked (use Increment and Decrement for the reference count, and use CompareExchange and Exchange on an int set to 0 or 1 instead of the bool backing fields) then the resulting code will be thread-safe, and you will be able to have multiple threads writing IsRefExplicit.

If you change your lists/dictionaries to thread-safe solutions such as ConcurrentDictionary, then the structure can also be populated by multiple threads.

Looks great. To make it clear, all primitives initial state is *not referenced*, I will only mark a primitive as *explicit* from the outside, I will never set *implicit* but just read it. First comes first served for *explicit*, between a model and one of its polygon (i.e. if model is used, polygons cannot; if polygon(s) is/are, model can't be; it's really a matter of preventing duplicates in the output). The counter thing is great and I'll use a dictionary so I can implement a *select first non-referenced feature*, etc... Going to improve from your answer and come back, thanks ! — aybe, Oct 15 '16 at 01:29
@Aybe consider having a set (set, bag, list or dictionary) with all the elements that are created and non-referenced - then if you give the elements that set, they can remove themselves when they are marked as referenced, and finally you can iterate over all the non-referenced items using that set. Edit: it means that you won't need to iterate over the set of all elements and check if they are not referenced. Similar solution can be done for referenced elements. — Theraot, Oct 15 '16 at 01:57
Yes, I'm on it, implementing counters, removing LINQ, adding global dictionaries ... *I'll be back*, thanks ! — aybe, Oct 15 '16 at 17:27

What's an efficient data structure to make a lot of queries between parents/childs?

1 Answers1