
I've read about meta-circular interpreters on the web (including SICP) and I've looked into the code of some implementations (such as PyPy and Narcissus).

I've read quite a bit about two languages that made great use of metacircular evaluation, Lisp and Smalltalk. As far as I understand, Lisp had the first self-hosting compiler and Smalltalk had the first "true" JIT implementation.

One thing I've not fully understood is how those interpreters/compilers can achieve such good performance, or, in other words, why PyPy is faster than CPython. Is it because of reflection?

My Smalltalk research also led me to believe that there's a relationship between JIT, virtual machines and reflection. Virtual machines such as the JVM and CLR allow a great deal of type introspection, and I believe they make great use of it in just-in-time (and AOT, I suppose?) compilation. But as far as I know, virtual machines are kind of like CPUs, in that they have a basic instruction set. Are virtual machines efficient because they include type and reference information, which would allow language-agnostic reflection?
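To make the question concrete, here is my own toy sketch (not any real VM's code) of the kind of type-driven specialization I mean: after observing only ints at a call site, a JIT could emit a fast path protected by a type guard, falling back to fully dynamic dispatch when the guard fails.

```python
# Toy illustration of type-feedback specialization (my own sketch,
# not how the JVM, CLR or PyPy actually implement it).

def generic_add(a, b):
    # Fully dynamic path: dispatches on the operand types every call.
    return a + b

def specialize_add_for_int():
    # "Compiled" fast path: a cheap type guard replaces repeated
    # dynamic dispatch for the common case observed at runtime.
    def fast_add(a, b):
        if type(a) is int and type(b) is int:   # guard
            return a + b                        # specialized path
        return generic_add(a, b)                # deoptimize: fall back
    return fast_add

add = specialize_add_for_int()
print(add(2, 3))        # guard holds: fast path, prints 5
print(add("a", "b"))    # guard fails: generic path, prints ab
```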

I ask this because many languages, both interpreted and compiled, now use bytecode as a target (LLVM, Parrot, YARV, CPython), and traditional VMs like the JVM and CLR have gained incredible boosts in performance. I've been told that it's about JIT, but as far as I know JIT is nothing new, since Smalltalk and Sun's own Self were doing it before Java. I don't remember VMs performing particularly well in the past; there weren't many non-academic ones outside of the JVM and .NET, and their performance was definitely not as good as it is now (I wish I could source this claim, but I speak from personal experience).

Then all of a sudden, in the late 2000s something changed and a lot of VMs started to pop up even for established languages, and with very good performance. Was something discovered about the JIT implementation that allowed pretty much every modern VM to skyrocket in performance? A paper or a book maybe?

yannis
Gomi
    Money. The money that used to be poured into C++ and Fortran now is poured into the HotSpot, CLR, Mono, V8, Nitro, SpiderMonkey, etc. – Jörg W Mittag Sep 30 '13 at 23:54
  • I can only guess, but I think it is just improvement over time, like described here http://www.joelonsoftware.com/articles/fog0000000017.html – Doc Brown Oct 01 '13 at 06:01
  • You can do some tricky optimisations when runtime information is available, e.g., partial specialisation based on narrower types and runtime constants. LLVM falls outside of this trend: it does not do any runtime-guided optimisations; it's rather a classical compiler backend, not any different from the GCC infrastructure. – SK-logic Oct 01 '13 at 07:49
  • RE how PyPy can be faster than CPython: [It isn't written in Python, it's written in a quite different language that can be AOT-optimized effectively](http://stackoverflow.com/a/8797731/395760). –  Oct 01 '13 at 10:08
  • @delnan Thanks for the link. But I wonder, couldn't all of that be achieved in CPython or maybe Cython? I guess writing a Python interpreter in Python is a bit easier since there's a 1:1 (well, not really since RPython is a subset, but still) relationship between the parsed code and the executable code. I've found [this](https://www.youtube.com/watch?v=NIcijUt-HlE) video to be very useful to the purpose of understanding, although I'd still like to read more in-depth stuff about JIT. – Gomi Oct 01 '13 at 10:44
  • 1
    @Gomi It's not about how similar the implementation language is to the implemented language. There are JavaScript, Lisp, Prolog, SmallTalk and Ruby interpreters written in RPython and they get exactly the same goodies PyPy offers. The only reason RPython is based on Python is that it was created by a bunch of Python enthusiasts. The features of RPython that make PyPy fast have nothing to do with Python: Automatic JIT compiler generation, the garbage collectors, etc. - and yes, most of that could in principle be done using other languages. You'd have to create a whole new compiler though. –  Oct 01 '13 at 15:29
  • @delnan So are you saying that it's for similar reasons languages in the ML family are popular for writing compilers and interpreters? [This](http://flint.cs.yale.edu/cs421/case-for-ml.html) paper refers to OCaml but what I'm asking is if it's an analogue concept. – Gomi Oct 01 '13 at 16:21
  • @Gomi If by "it" you mean the fact that RPython is a subset of Python, not a subset of any other language, then no. The reasons for ML in that document are technical, objective, and good. The reasons that RPython is what it is are social, subjective, and arguably bad. There are some parts of Python that RPython benefits from (automatic memory management being the main one), but dozens of other languages have those too. In fact, a recent superficially similar project (Graal) uses a subset of Java in a role similar to RPython, and Java is very different from Python (as mainstream languages go). –  Oct 01 '13 at 16:48
  • @delnan I wasn't talking about RPython or even Python particularly, just about the concept of a language offering a particular set of functionalities (such as GC). What I mean is: wouldn't writing a Python interpreter (or one for any language that shares some of its features) in, say, a (maybe safer) dialect of Lisp allow me to achieve the same results as writing it in RPython? – Gomi Oct 01 '13 at 17:20
  • @Gomi As I already said at the end of my second comment, yes. You'd just need to re-implement the automatic generation of JIT compilers, at least one decent garbage collector, and a bunch of other stuff RPython does. That would be a significant engineering challenge, but not impossible or even novel research. –  Oct 01 '13 at 18:42
  • @delnan Now the pieces fit together, although I realized there is still much I need to learn to fully understand JIT. Thanks. – Gomi Oct 01 '13 at 18:48
  • -1 because you seem to have at least 3 *different* questions here: (a) Why are meta-circular implementations so good? (b) Are VMs efficient because of type information, and is introspection beneficial for performance? (c) How come VM popularity surged in the late 2000s, and how come they all of a sudden have good performance? I think it's better to ask those questions separately. – Oak Oct 02 '13 at 06:39

1 Answer


2 out of 3: There is no relationship between "meta-circular" and "high-performance" language runtimes. Meta-circular runtimes that achieve high performance do so by JIT-compiling to native code and then running that native code. There is no reason why your high-performance Python runtime has to be written in Python, or your Lisp runtime in Lisp, and so on. But if you think your language is more powerful and expressive than the others, why not use it to write its own runtime? And if you don't think your language is somehow "better" than the others, why go to the trouble of implementing it at all?
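The core JIT idea can be sketched in a few lines. This is my own toy illustration (not PyPy's actual design): an interpreter counts how often a piece of code runs, and once it is "hot", translates it into host code (here, a Python closure standing in for native machine code) and runs that instead of interpreting.

```python
# Toy hot-path JIT sketch: interpret cold code, compile hot code.
# All names here (run, compile_to_host, HOT_THRESHOLD) are made up
# for illustration; no real VM works exactly like this.

HOT_THRESHOLD = 2

def interpret(prog, x):
    # Plain interpretation: dispatch on every instruction, every time.
    for op, arg in prog:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
    return x

def compile_to_host(prog):
    # "JIT compile": fold the per-instruction dispatch away into
    # one callable (a stand-in for emitted native code).
    ops = {"add": lambda x, a: x + a, "mul": lambda x, a: x * a}
    steps = [(ops[op], arg) for op, arg in prog]
    def compiled(x):
        for fn, arg in steps:
            x = fn(x, arg)
        return x
    return compiled

counts, compiled_cache = {}, {}

def run(prog, x):
    key = id(prog)
    counts[key] = counts.get(key, 0) + 1
    if key in compiled_cache:
        return compiled_cache[key](x)        # run the "native" code
    if counts[key] >= HOT_THRESHOLD:
        compiled_cache[key] = compile_to_host(prog)
        return compiled_cache[key](x)
    return interpret(prog, x)                # still cold: interpret

prog = [("add", 1), ("mul", 3)]
print(run(prog, 2))   # interpreted: (2 + 1) * 3 = 9
print(run(prog, 2))   # now hot: compiled and run as host code, still 9
```

The point is that the speed comes from the compilation step, not from what language the interpreter itself is written in.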

Alex D