Questions about compiler/interpreter design

Question

I am implementing a programming language, for fun, in C. I have most of the parsing code done and the AST ready. I wrote a runtime for this language some time ago, but I had some trouble with the garbage collector. I then realized I needed to study a little more and that my previous AST walker implementation was really poor.

Now I'm restarting it, and I chose to compile the AST to bytecode and then run it on a pretty basic stack-based VM. I have looked at other languages' compilers and VMs and learned a lot, but my research still wasn't enough to answer some of my questions.

When compiling, should I maintain a separate array for each type of "basic" constant (strings, numbers, booleans, etc.) and then compile that to bytecode as well, or keep a single array of constants already wrapped in a custom struct? And if I do this, how should I pass these objects to the VM? Compile each object's address in memory as a void *, or give the VM its own reference to this array of objects and pass the indices as instruction arguments? Also, should the garbage collector know about these objects immediately, or only when they get pushed to the stack? The latter approach seems wrong to me, because in a big program it would allocate a very large amount of memory before even executing anything. However, that's the way I saw some language implementations doing it.

To sum up, should I compile "raw" values, or keep a reference to language Objects through the Compiling-Executing process?

You've got a bit of a thought put into this, but you appear to be asking too many questions at once. Could you [edit] the question to focus on one particular aspect of the design? — , Oct 23 '15 at 16:36
Too many questions. Read the references I gave in [this answer](http://programmers.stackexchange.com/a/125776/40065) to a related question — Basile Starynkevitch, Oct 23 '15 at 16:43
Have a look at the Java Virtual Machine, byte code class file format. Sounds to me like a pretty good strawman for what you're trying to do. The JVM byte codes can be persisted to a file (hence the class file format), which means that it cannot contain references to memory, but instead uses indexes to virtual machine data structures (for things like constants). Of course, there are many viable variations/alterations on the JVM class format. It would be reasonable to segregate references from built in data types, for example (for easier GC root tracing) (JVM doesn't). — Erik Eidt, Oct 23 '15 at 17:04
In part the answers will also depend on the bytecode that your VM supports. In many cases, simple integer and boolean constants can be directly encoded into the bytecode without needing a separate table of constants for them. — Bart van Ingen Schenau, Oct 24 '15 at 09:06

score 0 · Answer 1 · answered Oct 24 '15 at 09:46

There are a lot of design tradeoffs to weigh. Factors that might influence the best design include whether you want to persist bytecode to a file, and whether your VM has primitive types that it operates on directly or takes a pure object-oriented approach in which the bytecode doesn't operate at the integer/float/etc. level. Much also depends on the type of GC you are using. With a copying collector, for example, having live objects in the constant pool makes less sense than it does with mark & sweep.

The best approach is probably to look at a few systems that have similar designs to your own, and look at what they have done, and why.
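To make one point in that design space concrete, here is a minimal sketch in C of the "single array of wrapped constants, referenced by index" option, combined with the idea from the comments of encoding small integers directly in the bytecode. Everything in it is an assumption made for illustration: the `Value` and `Chunk` structs, the opcode names, the one-byte operand size and the `run_example` helper are hypothetical, not taken from any particular VM. It has no garbage collector; in a design like this the constant array would typically be treated as a GC root for as long as the chunk is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value: one struct wraps every "basic" constant type. */
typedef enum { VAL_NUMBER, VAL_BOOL, VAL_STRING } ValueType;

typedef struct {
    ValueType type;
    union {
        double number;
        int boolean;
        const char *string;
    } as;
} Value;

/* Hypothetical opcodes. OP_CONST takes a one-byte index into the constant
   pool; OP_PUSH_INT takes a one-byte signed immediate and needs no pool. */
enum { OP_CONST, OP_PUSH_INT, OP_ADD, OP_HALT };

/* A compiled unit: bytecode plus the constant pool it indexes into.
   No raw pointers appear in the bytecode, so it could be written to a file. */
typedef struct {
    const uint8_t *code;
    const Value *constants;
    int constant_count;
} Chunk;

#define STACK_MAX 64

/* Executes the chunk and returns the value on top of the stack at OP_HALT. */
Value run(const Chunk *chunk) {
    Value stack[STACK_MAX];
    int sp = 0;
    const uint8_t *ip = chunk->code;

    for (;;) {
        uint8_t op = *ip++;
        switch (op) {
        case OP_CONST: {
            uint8_t index = *ip++;
            assert(index < chunk->constant_count);
            assert(sp < STACK_MAX);
            stack[sp++] = chunk->constants[index];
            break;
        }
        case OP_PUSH_INT: {
            int8_t immediate = (int8_t)*ip++;
            assert(sp < STACK_MAX);
            stack[sp].type = VAL_NUMBER;
            stack[sp].as.number = (double)immediate;
            sp++;
            break;
        }
        case OP_ADD: {
            assert(sp >= 2);
            Value b = stack[--sp];
            Value a = stack[--sp];
            assert(a.type == VAL_NUMBER && b.type == VAL_NUMBER);
            stack[sp].type = VAL_NUMBER;
            stack[sp].as.number = a.as.number + b.as.number;
            sp++;
            break;
        }
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return stack[0];
        }
    }
}

/* Hand-assembled program for "1.5 + 3": the 1.5 comes from the pool,
   the 3 is an immediate. The string constant is unused and only shows
   that one pool can hold mixed types. */
double run_example(void) {
    static const Value constants[] = {
        { VAL_NUMBER, { .number = 1.5 } },
        { VAL_STRING, { .string = "hello" } },
    };
    static const uint8_t code[] = {
        OP_CONST, 0, OP_PUSH_INT, 3, OP_ADD, OP_HALT
    };
    Chunk chunk = { code, constants, 2 };
    return run(&chunk).as.number;
}
```

Calling `run_example()` runs the six-byte program and yields the sum of the pooled constant and the immediate. Whether pooled values should be full heap objects created at load time, or raw data that is only boxed when pushed, is exactly the tradeoff described above and depends on the collector in use.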
