I'm writing in C++, but this problem could apply to any other language. This post is a continuation of this question (I thought this one was different enough to deserve a new post; if I'm wrong, I'm happy to edit the old one).
I have a graph of heterogeneous nodes. All the nodes derive from a base Node class that looks something like this:
    #include <functional>

    // Data is the payload type handed to process(); it is defined elsewhere.
    struct Node {
        virtual void visitNeighbours( std::function<void(Node&)> );
        virtual void process( Data );
        // ...
    };
Some of the algorithms working on this graph need extra data and behavior for the nodes they work with. This algorithm-specific extra data and behavior can be generated the same way for almost all node types, with just a few exceptions. For some algorithms it's just an integer, while for others it can be functions and non-trivial data structures.
Some algorithms are computationally intensive (they process some nodes hundreds of thousands of times), so I have efficiency in mind. The graph usually has fewer than 1000 nodes.
I'm trying to understand how I can achieve what I need and the pros and cons of my options, in order to choose the best solution for my use case.
The following are some ideas I had so far, with my considerations:
1. Change every node to take the base node type as a template parameter. This way I could effectively have a graph of nodes whose base node type is different for every algorithm.
   A problem is that everything becomes a template: the code would be messier and heavier to compile, error messages harder to read, the binary fatter, etc. It smells like a superpowerful solution that will end up exploding in my face.
   Would it be wise to go down this path? Can you name any big or famous project that chose this option?
2. Create the nodes using a factory provided by the algorithm. When the factory needs to create a node, it also creates some extra data for it (this data can be allocated in the same memory block as the node, at a constant offset, to access it easily and efficiently).
   This solution works as long as the user uses the factory to create all the nodes. But if the user connects to the graph a node created in a different way, everything breaks down. To make things worse, the compiler wouldn't be able to detect any such issue.
3. Attach extra data to the nodes dynamically. I could use a hashmap keyed on the node pointer to attach the algorithm-specific data, and create the data for each node the first time I see it.
   This approach is somewhat similar to the one above. It's more relaxed, but a bit slower. Besides, it would require RTTI to identify the type of a node at runtime in order to specialize the algorithm-specific behavior (while I can do this statically with the previous options).
   I have already implemented this option. A nice perk is that nodes can be created independently of any algorithm, but I don't need that, and it's giving me performance issues.
4. Add the algorithm-specific data and behavior to the base class (and/or a generic void pointer).
   The idea of adding special-case data to a generic Node interface seems dirty, but it might not have real downsides.
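To make the hashmap idea (attaching data dynamically) concrete, here is a minimal sketch, assuming C++17. It is not the actual implementation: `SideTable`, `Extra`, `SpecialNode` and `demoSideTable` are hypothetical names made up for illustration, and `Data` is an empty placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <typeinfo>
#include <unordered_map>

struct Data {};  // placeholder for the real payload type

struct Node {
    virtual ~Node() = default;
    virtual void visitNeighbours(std::function<void(Node&)>) {}
    virtual void process(Data) {}
};

struct SpecialNode : Node {};  // hypothetical node type that needs special handling

// Hypothetical per-node state for one algorithm.
struct Extra {
    int visits = 0;
    bool special = false;
};

class SideTable {
public:
    // Returns the extra data for `n`, creating it the first time `n` is seen.
    Extra& of(Node& n) {
        auto [it, inserted] = table_.try_emplace(&n);
        if (inserted) {
            // The node type is only known at run time here, hence the RTTI check.
            it->second.special = (typeid(n) == typeid(SpecialNode));
        }
        return it->second;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<const Node*, Extra> table_;
};

// Usage: two nodes, one of them touched twice.
int demoSideTable() {
    Node a;
    SpecialNode b;
    SideTable extras;
    extras.of(a).visits++;
    extras.of(a).visits++;
    extras.of(b).visits++;
    assert(extras.size() == 2);  // one entry per distinct node
    return extras.of(a).visits * 10 + (extras.of(b).special ? 1 : 0);
}
```

Every access to the extra data pays for a hash lookup, which is the kind of cost that adds up when a node is processed hundreds of thousands of times.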
Out of these options, I'm tempted to choose 2 and/or 4: they have tiny runtime overhead, and the code remains simple.
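As a rough illustration of the generic-pointer variant of option 4, the sketch below gives the base class one opaque slot that the running algorithm fills in. All names (`algoData`, `Extra`, `VisitCounter`, `demoBaseSlot`) are hypothetical, and the sketch assumes only one algorithm uses the slot at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    virtual ~Node() = default;
    // One opaque slot, owned by whichever algorithm is currently running.
    void* algoData = nullptr;
};

// Hypothetical per-node state for one algorithm.
struct Extra {
    int visits = 0;
};

class VisitCounter {
public:
    explicit VisitCounter(const std::vector<Node*>& nodes)
        : extras_(nodes.size()), nodes_(nodes) {
        // extras_ is sized once and never resized, so these pointers stay valid.
        for (std::size_t i = 0; i < nodes_.size(); ++i) {
            nodes_[i]->algoData = &extras_[i];
        }
    }

    // Release the slots so that another algorithm can reuse them.
    ~VisitCounter() {
        for (Node* n : nodes_) n->algoData = nullptr;
    }

    VisitCounter(const VisitCounter&) = delete;
    VisitCounter& operator=(const VisitCounter&) = delete;

    // Access is a pointer read plus a cast: no hashing, no RTTI.
    static Extra& of(Node& n) {
        assert(n.algoData != nullptr);
        return *static_cast<Extra*>(n.algoData);
    }

private:
    std::vector<Extra> extras_;
    std::vector<Node*> nodes_;
};

// Usage: count visits on two nodes, then check that the slots are released.
int demoBaseSlot() {
    Node a, b;
    std::vector<Node*> graph{&a, &b};
    int result = 0;
    {
        VisitCounter algo(graph);
        for (int i = 0; i < 3; ++i) VisitCounter::of(a).visits++;
        VisitCounter::of(b).visits++;
        result = VisitCounter::of(a).visits * 10 + VisitCounter::of(b).visits;
    }
    assert(a.algoData == nullptr && b.algoData == nullptr);
    return result;
}
```

The cast is unchecked, so the compiler cannot catch a node whose slot was filled by a different algorithm; that is the price of keeping the base class generic.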
Do I have other options?
In your experience, are my considerations correct and is my choice wise enough?