Best way to implement a heterogeneous graph that multiple algorithms have to work with, each requiring extra data and behavior for each node

Question

I'm writing in C++, but this problem could apply to any other language. This post is a continuation of this question (I thought the question is different enough to fit a new post better. If I'm wrong, I'm happy to edit the old one).

I have a graph of heterogeneous nodes. All the nodes derive from a base Node class that looks something like this:

struct Node {
  virtual void visitNeighbours( std::function<void(Node&)> );
  virtual void process( Data );
  // ...
};

Some of the algorithms working on this graph need extra data and behavior for the nodes they work with. This algorithm-specific extra data and behavior can be generated the same way for almost all node types, with just few exceptions. For some algorithms it's just an integer, while for others it can be functions and non trivial data structures.

Some algorithms are computationally intensive (they process some nodes hundreds of thousand of times), so I have efficiency in mind. Usually the number of graph nodes is less than 1000.

I'm trying to understand how I can achieve what I need and the pros and cons of my options, in order to choose the best solution for my use case.

The following are some ideas I had so far, with my considerations:

Change every node to take the base node type a template parameter. This way I could effectively have a graph of nodes of a base node type different for every algorithm.
A problem is that everything becomes a template: the code would be messier and heavier to compile, error messages harder to read, the binary fatter etc. It smells like a superpowerful solution that will end up exploding in my face.
Would it be wise to go down this path? Can you name any big or famous project that chose this option?
Create the nodes using a factory provided by the algorithm. When the factory needs to create a node, it also creates some extra data for it (this data can be allocated in the same memory block as the node, at a constant offset, to access it easily and efficiently).
This solution works as long as the user uses the factory to create all the types. But if the user connects to the graph a node created in a different way, everything breaks down. To make things worse, the compiler wouldn't be able to detect any such issues.
Attach extra data to the nodes dynamically. I could use a hashmap to attach the algorithm-specific data to a node pointer, and create the data for each node the first time I see it.
This approach is somewhat similar to the one above. It's more relaxed, but a bit slower. Besides it would require to use RTTI to identify the type of a node at runtime in order to specialize the algorithm-specific behavior (while I can do this statically with previous options).
I already implemented this option, a nice perk is that nodes can be created independently from an algorithm, but I don't need that, and it's giving me performance issues.
Add the algorithm-specific data and behavior to the base class (and/or a generic void pointer).
The idea of adding special-case data to a generic Node interface seems dirty, but it might not have real downsides.

Out of these options I'm tempted to choose 2 and/or 4, they have tiny runtime overhead, and the code remains simple.

Do I have other options?

In your experience, are my considerations correct and is my choice wise enough?

By "extra data" I mean data (variables, function pointers etc) that the algorithm needs to store for every node in order to work. Different algorithms require different data. Some don't require any, others require a couple of ints, other require a stack of pointers or other complex objects. The graph is fixed and it's a directed acyclic graph. The algorithms never modify the nodes (except for their algorithm-specific "extra data"). Usually the nodes are between 50 and 1000. An algorithm might process some nodes hundreds of thousand of times. — Helloer, Dec 22 '20 at 07:16
About "place extra data right before the node" I mean that, when the factory allocates a node, it can allocate a `struct { ExtraData; Node; }`. Then it returns a pointer to the Node. By using that pointer it should be able to access the ExtraData. — Helloer, Dec 22 '20 at 07:18
Do you really need to frame this as "extra data" ? Why do you need this layer of polymorphisme which looks very cool in theory? Can't you just have one uniq type of node, and have your algorythm simply not use all available field? — s.lg, Dec 22 '20 at 08:03
@s.lg: yes, that's an option too. I'll edit my question to add it as a fourth option. My intuition suggested that adding specific-case to a generic interface was a very bad design. Given the alternatives it might be a good compromise though. — Helloer, Dec 22 '20 at 08:12
@DocBrown: one hour ago I tried to add more context to the example, but I couldn't find a way without making the question huge. It's almost bedtime for me. Tomorrow I'll try again to describe what I'm working on and highlight some examples while keeping the post concise. — Helloer, Dec 22 '20 at 08:35
... ok, I may not have understood everything, but I guess using #4. with a generic void pointer looks most promising. It adds only a little overhead to your nodes, allows you to add arbitrary, algorithm-specific data to each node, and if it is not fast enough for your case, other alternatives will probably be too slow, too. — Doc Brown, Dec 22 '20 at 08:39
Hi you can use double dispatch to get extra information. So Node has a acceptVisitor(visitor) function and the Visitor class has a process(NodeTypeA, extra data) process(NodeTypeB, extra data) which is called by the node. — Richard Bamford, Dec 22 '20 at 16:30
@RichardBamford: yes, but I was more wondering where to store the extra data associated to a node. Every node instance needs to have its own data (and the type of the data depends on the algorithm). — Helloer, Dec 22 '20 at 16:50

Best way to implement a heterogeneous graph that multiple algorithms have to work with, each requiring extra data and behavior for each node

0 Answers0