When should an expression tree hold pointers and when should it hold values of subexpressions?

Question

I was thinking it should hold pointers:

struct Expr
{
    string sym;
    Expr*[] sub;

    this(string sym) {
        this.sym = sym;
    }

    // Explicit return type: a recursive call cannot rely on an inferred one.
    @property Expr* dup() const {
        auto e = new Expr(sym);

        foreach (s; sub) {
            e.sub ~= s.dup;
        }

        return e;
    }
}

But then that .dup function will duplicate shared nodes: a node referenced from N places ends up as N separate copies when originally there was only one, so it needs to be much more complicated than that.

On the other hand, with Expr values, there may be a larger proliferation of Expr objects, say if I were pooling them.

So which way works better in symbolic computation projects?

When you ask "which way works better," what desired characteristics are you looking for that would satisfy your personal definition of "better?" — Robert Harvey, Jan 17 '17 at 20:13
Betterness = 1 / [(debug time) x (run time)]. I think I'm going with by-value. — MathCrackExchange, Jan 17 '17 at 20:16

score 2 · Answer 1 · answered Jan 18 '17 at 10:51

Most expression trees use pointers because recursive data structures are impossible by-value: a struct that directly contained itself would need infinite size. E.g. something like this won't work:

struct Expr {
  Expr left;
  Expr right;
  ...
}
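
For comparison, here is a minimal C++ sketch of the usual fix, with the children held through owning pointers. The names `Node` and `countNodes` are made up for this illustration and do not come from the question's code.

```cpp
#include <memory>
#include <string>
#include <utility>

// struct Bad { Bad left; Bad right; };  // rejected: field has incomplete type

// Hypothetical binary node: children are owned through pointers, so
// sizeof(Node) is fixed no matter how deep the tree grows.
struct Node {
    std::string sym;
    std::unique_ptr<Node> left;   // null for a leaf
    std::unique_ptr<Node> right;  // null for a leaf

    explicit Node(std::string s,
                  std::unique_ptr<Node> l = nullptr,
                  std::unique_ptr<Node> r = nullptr)
        : sym(std::move(s)), left(std::move(l)), right(std::move(r)) {}
};

// Counts the nodes in a tree; an empty child contributes zero.
inline int countNodes(const Node* n) {
    return n ? 1 + countNodes(n->left.get()) + countNodes(n->right.get()) : 0;
}
```

A tree for `x + 1` could then be built as `std::make_unique<Node>("+", std::make_unique<Node>("x"), std::make_unique<Node>("1"))`.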

Additionally, expression trees usually contain multiple different expression types. There might be nodes for

terminal expressions like literals or variables,
unary expressions like field access or negation,
binary expressions like addition or multiplication, or
ternary expressions like if-then-else conditionals.

In OOP languages, that is usually modelled with a class hierarchy with an Expr interface on top and various subclasses. However, subtype polymorphism requires pointer indirection. Since the size of a subtype isn't known, you can't assign a subtype instance by value to storage allocated for a base class. Some languages may hide this to some degree.
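
As a rough C++ sketch of such a hierarchy (the node types `Literal`, `Negate` and `Add` and the `eval` method are assumptions chosen for illustration): each parent stores its children as pointers to the base interface, because a by-value `Expr` member could not hold a larger subclass object.

```cpp
#include <memory>
#include <utility>

// Hypothetical interface at the top of the hierarchy.
struct Expr {
    virtual ~Expr() = default;
    virtual double eval() const = 0;
};

// Terminal expression.
struct Literal : Expr {
    double value;
    explicit Literal(double v) : value(v) {}
    double eval() const override { return value; }
};

// Unary expression: the operand is held by pointer to the base type.
struct Negate : Expr {
    std::unique_ptr<Expr> operand;
    explicit Negate(std::unique_ptr<Expr> e) : operand(std::move(e)) {}
    double eval() const override { return -operand->eval(); }
};

// Binary expression: either side may be any Expr subclass.
struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    double eval() const override { return lhs->eval() + rhs->eval(); }
};
```

With these definitions, `2 + (-0.5)` would be `std::make_unique<Add>(std::make_unique<Literal>(2.0), std::make_unique<Negate>(std::make_unique<Literal>(0.5)))`, and the tree is evaluated through the base pointer.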

In your example, you are already using a pointer to get around this problem: the Expr*[] array. Since the array storage is separate from the array variable, this circumvents the recursive-size problem. As you (quite unusually) only have a single Expr type, no subtyping problems arise. (Are you by chance modelling S-expressions? Even they generally have atoms for literals and symbols.)
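
The point about array storage can be sketched in C++ as well. This assumes a single node type, here called `ValueExpr` (a made-up name), whose children live by value inside a `std::vector`; the vector supplies the indirection, and copying a node copies the whole subtree.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical by-value node. The vector object holds only a pointer to heap
// storage, so ValueExpr has a fixed size even though it "contains" ValueExprs.
// (std::vector of an incomplete element type is permitted since C++17.)
struct ValueExpr {
    std::string sym;
    std::vector<ValueExpr> sub;

    explicit ValueExpr(std::string s, std::vector<ValueExpr> children = {})
        : sym(std::move(s)), sub(std::move(children)) {}
};
```

Copying such a node, as in `ValueExpr b = a;`, gives `b` its own children, so later changes to `b.sub` leave `a` untouched. That is simple to reason about, but it also means no subtree can be shared between two trees.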

Using pointers is attractive if your expression nodes are also immutable. Any AST transformations (like constant folding) will replace some nodes but reuse other nodes unchanged. If they can just be hooked into the new AST by pointer, that is much cheaper than duplicating the whole AST-subtree.
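
A small C++ sketch of that idea, assuming immutable nodes behind `std::shared_ptr<const ...>`. The `Term` type and the helpers `num`, `var`, `add` and `fold` are invented for this example; `fold` only handles addition of two number literals.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable node: a number literal, a variable, or "+".
struct Term;
using TermPtr = std::shared_ptr<const Term>;

struct Term {
    enum class Kind { Num, Var, Add } kind;
    double num;        // meaningful for Kind::Num
    std::string name;  // meaningful for Kind::Var
    TermPtr lhs, rhs;  // meaningful for Kind::Add
};

inline TermPtr num(double v) {
    return std::make_shared<Term>(Term{Term::Kind::Num, v, "", nullptr, nullptr});
}
inline TermPtr var(std::string n) {
    return std::make_shared<Term>(Term{Term::Kind::Var, 0.0, std::move(n), nullptr, nullptr});
}
inline TermPtr add(TermPtr l, TermPtr r) {
    return std::make_shared<Term>(Term{Term::Kind::Add, 0.0, "", std::move(l), std::move(r)});
}

// Constant folding: returns the original pointer when nothing changed, so
// untouched subtrees are shared between the old and the new tree.
inline TermPtr fold(const TermPtr& t) {
    if (t->kind != Term::Kind::Add) return t;
    TermPtr l = fold(t->lhs);
    TermPtr r = fold(t->rhs);
    if (l->kind == Term::Kind::Num && r->kind == Term::Kind::Num)
        return num(l->num + r->num);
    if (l == t->lhs && r == t->rhs) return t;  // nothing changed: reuse node
    return add(std::move(l), std::move(r));
}
```

Folding `(x + y) + (1 + 2)` this way allocates a new root and a new literal, while the `x + y` subtree in the result is the very same object as in the input.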

The first sentence says it all. In C/C++ you'll get a compile error for this. — Walfrat, Jan 18 '17 at 10:58
