5

Possible Duplicate:
How do I create my own programming language and a compiler for it

I want to create a new general purpose language that will compile to JavaScript and I'd like to be able to write it in that same JavaScript.

I wouldn't go much into detail about why, but how can I do it? Currently I have professionally coded in JS, Java, Groovy, PL/SQL, PHP among the rest, so I'm covered on the language user side of the story, but I have no theoretical or practical knowledge from the create language side.

I have browsed the web for the past week and noticed a soup of acronyms and/or names like RR, LALR, BNF, AST etc. so I figure I need two things:

  • Good Book(s) about theory
  • Working projects or experiments that I can use to jump start my learning (something free like that narcissus)

EDIT:

No, this question does not cover the EXACTLY SAME content and it is not on the same topic even worse, even though there are two good answers, they don't answer how I can practically USE JS TO BUILD THE TOOLS (JS TRANSPILER) I NEED FOR THE NEW LANGUAGE.

Any ideas?

Azder
  • 193
  • 1
  • 8
  • 1
    So you want to learn how to write a compiler. Ever done something like that before? –  Aug 15 '12 at 07:47
  • No. Just created some DSL's in Groovy. – Azder Aug 15 '12 at 07:53
  • 4
    Great. Compilers are fun :) –  Aug 15 '12 at 11:14
  • I have read that one before posting. Too general, not helpful. – Azder Aug 16 '12 at 05:26
  • The approaches if you want to LEARN how to do it versus if you just want to a large toolbox to USE it, are vastly different. I thought it was the former, it appears to be the latter. My bad. –  Aug 17 '12 at 08:55
  • 1
    @Coder, you're so terribly wrong! Everyone have to design languages. Just literally everyone. Because using the DSLs is the most powerful technique out there. – SK-logic Aug 27 '12 at 08:41
  • @Coder What a terribly unconstructive thing to say - if everyone took your advice to heart then languages would very quickly stop improving. – sdasdadas Jan 31 '13 at 23:09
  • Haven't even seen what's been going on here... Not that it matters though: the StackExchange inquisition has been overzealous as usual in marking "duplicates" and "non-constructives" so as they would keep them selves feeling useful and these forums non-useful for anyone else. – Azder Feb 02 '13 at 01:13

2 Answers2

15

First, forget about LALR and the other intimidating words associated with the dragons. There is a nice and friendly one, behold: PEG. A ready to use implementation for JavaScript is available.

After you've done with the parsing you will have to transform your Abstract Syntax Tree in one or several steps into an AST of JavaScript and serialise it into text. There are many schools of thought in this area, but my personal preference is to use the very simple and obvious term rewriting model. You can draw your inspiration from things like this.

An unmodified pure JavaScript is not quite suitable for implementing compilers straight away, since it does not provide any decent pattern matching facilities, so I'd recommend to implement a eDSL layer as above first before starting to implement your language transforms.

Depending on what kind of type system your source language will have, you may want to implement various kinds of typing algorithms. For some of the practical strict type systems, an embedded Prolog might help a lot. E.g., implementing a Hindley-Milner algorithm in Prolog is trivial. Otherwise, a simple type propagation implemented with trivial term rewriting will be sufficient.

If you want to perform some optimisations, consider employing intermediate representations such as SSA or CPS. You may possibly need a decent graph handling library for it.

SK-logic
  • 8,497
  • 4
  • 25
  • 37
  • 2
    there is some good stuff here, but overall I fear it may be a bit intimidating, of course you could argue that's because the problem itself deserves a bit of fear and respect. – jk. Aug 15 '12 at 07:19
  • @jk., I'm sorry if it sounds intimidating. The message I've tried to convey was in fact that writing compilers is really, really *easy*. You do not even need a Turing-complete language for doing so, and if you think in terms of a sequence of trivial rewrites of the trees, there won't be any complexities at all. – SK-logic Aug 15 '12 at 07:22
  • 1
    Well, I'm not intimidated, I wouldn't have asked otherwise :) – Azder Aug 15 '12 at 07:53
  • Glad to hear it – jk. Aug 15 '12 at 09:36
  • 1
    @SK-logic "really, really, _easy_" might be over-simplifying a bit. –  Aug 15 '12 at 11:24
  • @ThorbjørnRavnAndersen, I'm not over-simplifying, I really mean it. It is easier than most of the "real" programming tasks, it does not even require "programming" as such and can be done in a 100% declarative way. For this very reason, using DSLs is such a powerful technique: it is easier to build a new language than to solve a problem using an inappropriate one. – SK-logic Aug 15 '12 at 11:48
  • 2
    I think the hardest thing about a compiler is providing useful error messages. Or maybe the developers just don't care? (Except Martin Odersky and the Clang crew.) – Jörg W Mittag Aug 15 '12 at 11:55
  • @JörgWMittag, it can also be done declaratively, and yes, using PEG instead of the old-fashion automata-based approaches helps a *lot* here. With PEG you can even define useful syntax highlighting, code indentation and pretty-printing rules declaratively, without a single bit of programming. – SK-logic Aug 15 '12 at 11:57
  • Just to make it clear, I'm not aiming for it being easy, I'm aiming for it to be able to execute the code I want :) – Azder Aug 16 '12 at 17:47
  • @Azder, even implementing something of a scale of C99 is relatively easy (unless you go into full standard compliance nuances). – SK-logic Aug 17 '12 at 07:25
  • "First, forget about LALR and the other intimidating words associated with the dragons.": What's wrong with them? – Giorgio Aug 23 '12 at 23:18
  • 1
    @Giorgio, Limited grammars, presence of a dedicated lexing step, lack of functionality for error reporting and error recovery, not possible to implement symmetric pretty-printing, autocomplete and other forms of online hinting, etc. It is easier to say what is not wrong with the old automata-based parsing approaches: performance. Everything else is just pathetic. – SK-logic Aug 24 '12 at 13:28
  • @SK-logic: What about performance with the new non automata-based parsing approaches? Does it match that of automata-based approaches? – Giorgio Aug 24 '12 at 14:18
  • @Giorgio, Packrat used to be considered a bit too heavy, but now it is not that much of a concern. It is still O(n) parsing, but it also requires O(n) memory to operate. Performance is matching now, so there is absolutely no benefits in choosing LALR and alike. – SK-logic Aug 24 '12 at 16:08
  • @SK-logic: I do not have much information on the topic. I imagine the industry is moving to these new methods pretty fast then. – Giorgio Aug 24 '12 at 16:26
  • 1
    @Giorgio, yes, and the industry never actually embraced yacc-like tools. All the production-quality compilers always featured the handwritten recursive descent parsing, and only now there is a slight shift towards the new declarative techniques. – SK-logic Aug 24 '12 at 17:05
6

It is told that Jeremy Ashkenas was inspired to create CoffeeScript after reading this very book. Since CoffeeScript is one of the most popular languages for javascript as target language, I guess you also can find some useful information there.

shabunc
  • 2,424
  • 2
  • 20
  • 27