22

I have a very large in memory node tree and need to traverse the tree. Passing the returned values of each child node to their parent node. This has to be done until all the nodes have their data bubble up to the root node.

Traversal works like this.

private Data Execute(Node pNode)
{
    Data[] values = new Data[pNode.Children.Count];
    for(int i=0; i < pNode.Children.Count; i++)
    {
        values[i] = Execute(pNode.Children[i]);  // recursive
    }
    return pNode.Process(values);
}

public void Start(Node pRoot)
{
    Data result = Execute(pRoot);
}

This works fine, but I'm worried that the call stack limits the size of the node tree.

How can the code be rewritten so that no recursive calls to Execute are made?

Móż
  • 1,252
  • 1
  • 8
  • 16
Reactgular
  • 13,040
  • 4
  • 48
  • 81
  • 8
    You would either have to maintain your own stack to keep track of the nodes, or change the shape of the tree. See http://stackoverflow.com/q/5496464 and http://stackoverflow.com/q/4581576 – Robert Harvey Jan 30 '14 at 21:13
  • 1
    I also found a lot of help at [this Google Search](https://www.google.com/search?q=write+tree-traversal+calls+non-recursively), specifically [Morris Traversal](http://en.wikipedia.org/wiki/Tree_traversal#Morris_in-order_traversal_using_threading). – Robert Harvey Jan 30 '14 at 21:17
  • @RobertHarvey thanks Rob, I wasn't sure what terms this would go under. – Reactgular Jan 30 '14 at 21:19
  • 2
    You might be surprised at the memory requirements if you did the math. For example, a perfectly balanced teranode binary tree only needs a stack 40 entries deep. – Karl Bielefeldt Jan 30 '14 at 21:37
  • @KarlBielefeldt That assumes the tree is perfectly balanced though. Sometimes you need to be modeling trees that *aren't* balanced, and in that case it's *very* easy to blow the stack. – Servy Jan 30 '14 at 21:47
  • @Servy, I agree. That's why I said do the math. – Karl Bielefeldt Jan 30 '14 at 21:54
  • I'm trying to write a simple more elegant solution for you here but I'm getting totally caught on one thing I don't understand at all - *why are you converting from a tree with `Node` to an N-Dimensional list in `Data[]`* ?? Ok the snippet you showed is just really confusing and weird... – Jimmy Hoffa Jan 30 '14 at 22:19
  • @JimmyHoffa Think of each node in the tree as a function that takes arguments, and child nodes as functions that provide data for those arguments. If written like a language it would be `Data root = node1(node2(),node3(node4()))` where `node` has `node2,node3` as children, and `node3` has `node4` as child. But my tree is very large so the length of the single line of code would be huge. – Reactgular Jan 30 '14 at 22:32
  • It is possible to traverse a recursive data structure without using recursion: [The Schorr-Waite-Algorithm](http://link.springer.com/chapter/10.1007/978-3-540-69061-0_15) [Schorr-Waite graph marking algorithm](http://xlinux.nist.gov/dads/HTML/SchorrWaiteGraphMarking.html) – rzzzwilson Jan 31 '14 at 04:17
  • It's possible to "factor out" tree traversal into a method that exposes the traversal as a lazy IEnumerable. Check out the implementation in [this library](https://github.com/madelson/MedallionUtilities/tree/master/MedallionCollections#traverse) – Chase medallion May 12 '16 at 23:45

3 Answers3

32

Here is a general purpose tree traversal implementation that doesn't use recursion:

public static IEnumerable<T> Traverse<T>(T item, Func<T, IEnumerable<T>> childSelector)
{
    var stack = new Stack<T>();
    stack.Push(item);
    while (stack.Any())
    {
        var next = stack.Pop();
        yield return next;
        foreach (var child in childSelector(next))
            stack.Push(child);
    }
}

In your case you can then call it like so:

IEnumerable<Node> allNodes = Traverse(pRoot, node => node.Children);

Use a Queue instead of a Stack for a breath first, rather than depth first, search. Use a PriorityQueue for a best first search.

Servy
  • 1,966
  • 14
  • 12
  • Am I correct in thinking this will just flatten the tree into a collection? – Reactgular Jan 30 '14 at 21:56
  • 1
    @MathewFoscarini Yes, that's its purpose. Of course, it doesn't need to necessarily be materialized into an actual collection. It's just a sequence. You can iterate over it to stream the data, without needing to pull the entire data set into memory. – Servy Jan 30 '14 at 21:56
  • I don't think that solves the problem. – Reactgular Jan 30 '14 at 21:58
  • @MathewFoscarini How so? You can just `foreach` over the sequence and perform some operation on each item in the sequence, as your code appears to be doing. – Servy Jan 30 '14 at 21:59
  • 4
    He's not just traversing the graph performing independent operations like a search, he's aggregating the data from the child nodes. Flattening the tree destroys the structure information he needs in order to perform the aggregation. – Karl Bielefeldt Jan 30 '14 at 22:22
  • Thanks @KarlBielefeldt said it better than I could. Each node has to receive the output from it's children. – Reactgular Jan 30 '14 at 22:27
  • 1
    FYI I think this is the correct answer most people googling this question are looking for. +1 – Anders Arpi Jan 19 '15 at 09:24
5

If you have an estimate for the depth of your tree beforehand, maybe it is sufficient for your case to adapt the stack size? In C# since version 2.0 this is possible whenever you start a new thread, see here:

http://www.atalasoft.com/cs/blogs/rickm/archive/2008/04/22/increasing-the-size-of-your-stack-net-memory-management-part-3.aspx

That way you can keep your recursive code, without having to implement something more complex. Of course, creating a non-recursive solution with your own stack may be more time and memory efficient, but I am pretty sure the code will not be as simple as it is for now.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • I just made a quick test. On my machine I could make make 14000 recursive calls before reaching stackoverflow. If the tree is balanced only 32 calls is needed to store 4 billion nodes. If each node is 1 byte(which it wont be) It would take 4 GB of ram to store a balanced tree of height 32. – Esben Skov Pedersen Aug 22 '15 at 19:20
  • I we were to use all 14000 calls in the stack. The tree would take up 2.6x10^4214 bytes if each node is one byte(which it wont be) – Esben Skov Pedersen Aug 22 '15 at 19:23
-2

You can't traverse a data structure in the shape of a tree without using recursion - if you don't use the stack frames and function calls provided by your language, you basically have to program your own stack and function calls, and it is unlikely that you manage to do it within the language in more efficient manner way than the compiler writers did on the machine that your program would run on.

Therefore, avoiding recursion for fear of running into resource limits is usually misguided. To be sure, premature resource optimization is always misguided, but in this case it is likely that even if you measure and confirm that memory usage is the bottleneck, you will probably not be able to improve on it without dropping down to the level of the compiler writer.

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
  • 2
    This is simply false. It is most certainly possible to traverse a tree without using recursion. It's not even *hard*. You can also do so more efficiently, quite trivially, as you can only include as much information in the explicit stack as you're sure you need for your specific traversal, whereas using recursion you end up storing more information than you actually need in many cases. – Servy Jan 30 '14 at 21:46
  • 2
    This controversy comes up every once in a while here. Some posters consider rolling your own stack not to be recursion, while others point out that they are just doing the same thing explicitly that the runtime would otherwise do implicitly. There is no point in arguing about definitions like this. – Kilian Foth Jan 30 '14 at 21:51
  • How do you define recursion then? I would define it as a function that invokes itself within its own definition. You can most certainly traverse a tree without ever doing that, as I have demonstrated in my answer. – Servy Jan 30 '14 at 21:53
  • 2
    Am I evil for enjoying the act of clicking the downvote on someone with such a high rep score? It's such a rare pleasure on this website. – Reactgular Jan 30 '14 at 21:54
  • 2
    Come on @Mat, that's kid stuff. You may disagree, like if you are afraid of bombing out on a tree that's too deep, that's a reasonable concern. You can just say so. – Mike Dunlavey Jan 30 '14 at 22:01
  • `premature resource optimization is always misguided` -- Really? You would consider choosing the appropriate data structure for a particular algorithm premature? – Robert Harvey Jan 30 '14 at 22:03
  • @RobertHarvey I'm going to play devil's advocate on this one and say that choosing the proper data structure for an algorithm isn't entirely about resource optimization. Some data structures (in fact, most frequently the preferable from a performance standpoint) have operations that simply lend themselves to more easily solving a given problem. Having said that, I do generally think about the performance implications whenever deciding on a data structure. – Servy Jan 30 '14 at 22:04
  • If you remove the "You can't ..." phrase, your answer would sidestep the ['arguing "by definition"'](http://lesswrong.com/lw/nz/arguing_by_definition/) problem you yourself mention in your comment. – Kenny Evitt Aug 25 '14 at 19:55