0

I'm about to implement my own very simple programming language, and an interpreter to execute code in that language.

The language will be very basic. Example code:

var x = 3
if x > 2 print x
if x < 2 print "hello"

The language wouldn't feature anything more complex than single-line if statements. This is because it's the first time I'm attempting something like this, so I'm starting small.

The interpreter will be written in Java, and thus execute the code with Java operations.

My question is this:

I know that the topic of creating compilers and interpreters is very complex. But since I'm new to this, I believe I should start with basic techniques and approaches.

What should I learn before starting to work on this project? Especially what kind of knowledge regarding parsing and interpreting should I learn before starting?

Is it enough to just 'break down the text into substrings, and then more substrings', or should I learn more advanced techniques and apply them?

The knowledge and experience I acquire are meant to allow me to later build more knowledge on top of it when I continue learning and implementing interpreters. But shouldn't be 'too much' for a first attempt.

Aviv Cohn
  • 21,190
  • 31
  • 118
  • 178
  • 1
    I'd really suggest starting with something even simpler. Write a calculator that can handle `1 + 2 * 3`. Trying to reason about precedence in a language is difficult if you can't reason about it with basic math operations. –  May 01 '14 at 16:28
  • 2
    I *do* realize that dup is for a compiler, however an interpreter and a compiler are very similar until the last stage. –  May 01 '14 at 16:30

2 Answers2

2

Its going to depend a bit on the path you take for your interpreter. But the first thing you might want to try creating is a basic mathematical expression evaluator, because ultimately that can form the core of an interpreter. Calculating statements like (8 * 4) + 10 and ((10 + 2) * (11 / 5)). This will get you exposed to parsing code, breaking everything into tokens, stacks, and other concepts without completely overwhelming you.

Once you have it working with math expressions, extend it with functions.

(8 * 4) + max( 5, 10)

Then data types

"this" + "that"

That's a fairly large project, and should get you a good feeling for all that's involved.

For starters, look at the Shunting Yard Algorithm

GrandmasterB
  • 37,990
  • 7
  • 78
  • 131
  • Thanks for answering. I understand from your question that I should 'dive right in' to implementing basic stuff, and then less basic stuff, and this is supposed to grant me the basic parsing and interpreting understanding I need. *After that* I can read about theoretical stuff. Correct? – Aviv Cohn May 01 '14 at 16:41
  • A full blown interpreter can be a very complex thing to create (assuming you intend to do it by hand), and there's lots of theoretical concepts involved. I'm suggesting you start off with a small chunk of those concepts and solve a more discrete problem (an expression evaluator) rather than trying to dive right into the whole thing. – GrandmasterB May 01 '14 at 16:50
  • Should I implement the first basic program with terms like 'lexing' and 'tokens' in mind, or should I just implement it 'the way I feel like', even if it doesn't have any 'structure', 'tokening', etc? – Aviv Cohn May 01 '14 at 16:51
  • Yes, use the concepts of 'lexing' and 'tokens' and the like. I'm not suggesting you don't learn the concepts involved with an interpreter. I'm saying an interpreter is a much larger project than many people realize, so its not just something you sit down one day and learn everything about in one try. – GrandmasterB May 01 '14 at 17:23
0

I think you have to learn at least two main concepts in order to successfully implement your basic interpreter : language processing and expression evaluation.

You might be able to get somewhere with manual string operations, but it will be tedious and probably have serious limitations or bugs. To right way to do it is to use a language processing library. The purpose of this kind of software is to transform the source code into an easy to use memory representation called Abstract Syntax Tree. For example 3 * (5 + 1) will be transformed into a tree like this :

     *
    / \
   3   +
      / \
     5   1

There is a little bit of theory to understand in order to successfully implement this transformation. But the skill will be reusable for full-featured complex languages or domain specific languages. For Java, ANTLR is a pretty common library for language recognition, you should check it out : http://www.antlr.org/

The second skill will be about evaluating the AST. You need to learn about tree-like data structures and how to work with them. This will enable you to evaluate the AST and obtain the result and perform the actions requested by the program. Here is a very simple example of kind of code that you will need to write in order to evaluate the AST above :

def evaluate(tree):
    if tree.label == "*":
        return evaluate(tree.left) * evaluate(tree.right)
    elif tree.label == "+":
        return evaluate(tree.left) + evaluate(tree.right)
    else:
        return int(tree.label)

To go further you may also try to write a compiler instead of an interpreter. The first step is the same, but instead of evaluating the nodes of the tree, you'll have to output some kind of machine code that will do the job afterwards. Real hardware assembly can be difficult, but JVM bytecode is pretty simple and has a decent performance. There are libraries that can handle the binary aspect of it for you. For example ASM : http://asm.ow2.org/ This is how JVM languages work in general.

Good luck !

Grapsus
  • 109
  • 1