48

I am studying compilers and interpreters intensively. I want to check whether my basic understanding is right, so let's assume the following:

I have a language called "Foobish" and its keywords are

<OUTPUT> 'TEXT', <Number_of_Repeats>;

So if I want to print to the console 10 times, I would write

OUTPUT 'Hello World', 10;

This is the content of the Hello World.foobish-file.

Now I write an interpreter in the language of my choice - C# in this case:

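using System;
using System.IO;
using System.Text.RegularExpressions;

namespace FoobishInterpreter
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // Read the script and split its single statement into two tokens.
            // This stands in for the analyseAndTokenize pseudocode step.
            string source = File.ReadAllText("Hello World.foobish");
            Match token = Regex.Match(source, @"^\s*OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;\s*$");
            if (!token.Success)
            {
                Console.Error.WriteLine("Syntax error in Foobish script.");
                return;
            }

            string outputString = token.Groups[1].Value;
            int repeats = int.Parse(token.Groups[2].Value);
            for (var i = 0; i < repeats; i++)
            {
                Console.WriteLine(outputString);
            }
        }
    }
}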

At a very basic level, the interpreter would analyze the script file and then execute the Foobish program in whatever way the interpreter's implementation dictates.

Would a compiler create machine language which runs on the physical hardware directly?

So an interpreter doesn't produce machine language, but does a compiler do it for its input?

Do I have any misunderstandings about the basic way compilers and interpreters work?

Peter Mortensen
GrayFox
  • 22
    What do you think the C# "compiler" does? As a hint, it *doesn't* produce machine code. – Philip Kendall Oct 22 '15 at 17:59
  • 3
    A Java compiler produces code for the JVM. So the target machine of a compiler can be a virtual machine that is not executed directly by the hardware. The main difference between interpreter and compiler is that a compiler first checks and translates the whole source code into a target machine language. This compiled code is then executed by the machine it was meant for. On the other hand, an interpreter will translate and execute chunks of your program on the fly. – Giorgio Oct 22 '15 at 18:01
  • @Giorgio: You mean, like a JIT? – Robert Harvey Oct 22 '15 at 18:02
  • 2
    @RobertHarvey: I meant the Java Compiler (javac): as far as I know it produces bytecode for the JVM. And, again AFAIK, the JIT later (at runtime) compiles some bytecode that is used very often into native machine language. – Giorgio Oct 22 '15 at 18:03
  • 4
    a compiler means translating. It can emit all kinds of language: c, assembly, javascript, machine code. – Esben Skov Pedersen Oct 22 '15 at 18:07
  • @Giorgio: Actually, not quite. For example, Perl 5 will first parse and check the whole program before executing it, but is still usually considered an interpreted language (see http://stackoverflow.com/questions/5376559/is-perl-a-compiled-or-an-interpreted-programming-language ). So the distinction compiled <-> interpreted is a bit fuzzy. – sleske Oct 23 '15 at 08:48
  • An aspect that seems to be unaddressed in the answers is that eventually, when a program is executed, it must always resolve to electrical signals the circuitry can process. In this sense, whether interpreted or compiled, the code is always resolved to something the machine can use. – jpmc26 Oct 24 '15 at 01:41
  • @PhilipKendall It depends on your definition of machine code. Technically CIL bytecode runs on a virtual machine, so it could be argued that CIL bytecode is machine code for a virtual machine. After all, there's nothing (beyond money and practicality) to stop someone building a processor that uses CIL bytecode as its instruction set. – Pharap Oct 24 '15 at 07:29

6 Answers

81

The terms "interpreter" and "compiler" are much more fuzzy than they used to be. Many years ago it was more common for compilers to produce machine code to be executed later, while interpreters more or less "executed" the source code directly. So those two terms were well understood back then.

But today there are many variations on the use of "compiler" and "interpreter." For example, VB6 "compiles" to byte code (a form of Intermediate Language), which is then "interpreted" by the VB Runtime. A similar process takes place in C#, which produces CIL that is then executed by a Just-In-Time Compiler (JIT) which, in the old days, would have been thought of as an interpreter. You can "freeze-dry" the output of the JIT into an actual binary executable by using NGen.exe, the product of which would have been the result of a compiler in the old days.

So the answer to your question is not nearly as straightforward as it once was.

Further Reading
Compilers vs. Interpreters on Wikipedia

Robert Harvey
  • I would say that compilation and interpretation are two different approaches to running a program. Even though many modern systems use both of them, IMO one can still distinguish the two mechanisms clearly. Compilation means to output code that is executed later (possibly half a second later inside the same VM). Interpretation means to look at an instruction, evaluate it, look at the next, and so on. – Giorgio Oct 22 '15 at 18:29
  • 6
    @Giorgio: Most interpreters nowadays don't actually execute the source code, but rather the output of an AST or something similar. Compilers have a similar process. The distinction is not nearly as clear-cut as you think it is. – Robert Harvey Oct 22 '15 at 18:33
  • I do not dispute that the distinction is not clear cut in real tools. Even an old Apple ][ Basic interpreter probably did some preprocessing when loading a program into memory. So in real tools you find a mix of the two approaches, but they are still two different concepts. Otherwise, why are we using two different words? – Giorgio Oct 22 '15 at 18:51
  • @Giorgio: Because historically those words meant something specific, but they don't anymore. – Robert Harvey Oct 22 '15 at 19:00
  • 2
    So when I run `gcc -S hello_world.c`, the content of the resulting file `hello_world.s` is the result of interpreting `hello_world.c`? – Giorgio Oct 22 '15 at 19:10
  • 6
    "You can "freeze-dry" the output of the JIT into an actual binary executable by using NGen.exe, the product of which would have been the result of a compiler in the old days.": But it is still today the result of a compiler (namely, the just-in-time compiler). It does not matter when the compiler is run, but what it does. A compiler takes as input a representation of a piece of code and outputs a new representation. An interpreter will output the result of executing that piece of code. These are two different processes, no matter how you mix them and when you execute what. – Giorgio Oct 22 '15 at 19:18
  • 4
    "Compiler" is simply the term they've chosen to attach to GCC. They chose not to call NGen a compiler, even though it produces machine code, preferring instead to attach that term to the previous step, which could arguably be called an interpreter, even though it produces machine code (some interpreters do that as well). My point is that nowadays there is no binding principle that you can invoke to definitively call something a compiler or interpreter, other than "that's what they've always called it." – Robert Harvey Oct 22 '15 at 19:20
  • 4
    As my very limited understanding goes, these days x86 CPUs are halfway to being hardware-based JIT engines anyway, with the assembly bearing an ever-fading relation to what exactly gets executed. – Alex Celeste Oct 23 '15 at 01:36
  • @Giorgio What about Python? It generates `pyc` files containing bytecode, but it does it at runtime. On the other hand, you can feed it instructions without even having a file (via the REPL or perhaps other mechanisms), and no bytecode file is generated. Regardless of the REPL, Python is regarded as interpreted, not compiled. – jpmc26 Oct 23 '15 at 03:26
  • 4
    @RobertHarvey while I agree that there's no clear dividing line between the techniques used in an interpreter and a compiler, there is a pretty clear division in function: if the result of executing a given tool with a program's code as input is the execution of that program, the tool is an interpreter. If the result is the output of a translation of the program into a less abstract form, it is a compiler. If the result is translation to a more abstract form, it is a decompiler. Cases where more than one of these results occurs are ambiguous, however. – Jules Oct 23 '15 at 10:59
  • 3
    Just to be pedantic VB6 compiles to native code, unless you specifically request byte code ("p-code") – Alex K. Oct 23 '15 at 12:47
  • @Leushenko That's stretching the term "compiler" pretty far. Yes, a CPU does a coding process where it determines what to do, but it's still one instruction per thread per core at a time. It's not like it gobbles up an entire memory address range and converts it all to microcode at once for execution, which would be closer to how a compiler functions. It still works one instruction at a time, just with advanced pipeline architecture. – phyrfox Oct 24 '15 at 01:53
  • 2
    +1 to this answer, but interpreters and compilers are clearly defined. What's not so clearly defined is some languages' use of either interpreter or compiler technology. In many cases, most languages use either one technique or the other, and only rarely both. However, since most languages are described only vaguely, it's hard to tell which term correctly applies to the language. Anything with JIT is ultimately compiled, not interpreted, anything that parses one line at a time is still interpreted, not compiled. More likely, the developers of such a language abuse either word. – phyrfox Oct 24 '15 at 02:06
  • It's interesting to note that some interpreters of very dynamic languages, such as JS, are also producing machine code in some places. Both V8 and SpiderMonkey have replaced 'interpretation' with 'compilation' for hot code, although this is, of course, still done in a JIT way. There are some attempts to make binaries from this JIT code such as https://github.com/toshok/echojs. HHVM does some similar magic for PHP code. – mcfedr Oct 24 '15 at 10:23
  • @phyrfox: I assume leushenko is talking about microcode. – Robert Harvey Oct 24 '15 at 15:13
  • @jpmc26: The `python` command has a compiler phase that generates `pyc` files if they do not exist yet and an interpreter phase that runs them. This is similar to what happens with Java: `javac` compiles to a `.class` file and the JVM runs them. Yes, the JVM can further do just-in-time compilation, so one tool can implement both functions. However, the two functions are distinct. – Giorgio Oct 28 '15 at 20:56
38

The summary I give below is based on "Compilers, Principles, Techniques, & Tools", Aho, Lam, Sethi, Ullman, (Pearson International Edition, 2007), pages 1, 2, with the addition of some ideas of my own.

The two basic mechanisms for processing a program are compilation and interpretation.

Compilation takes as input a source program in a given language and outputs a target program in a target language.

source program --> | compiler | --> target program

If the target language is machine code, it can be executed directly on some processor:

input --> | target program | --> output

Compilation involves scanning and translating the entire input program (or module) and does not involve executing it.

Interpretation takes as input the source program and its input, and produces the source program's output

source program, input --> | interpreter | --> output

Interpretation usually involves processing (analyzing and executing) the program one statement at a time.
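
As a rough illustration of the two diagrams, here is a minimal C# sketch for the Foobish statement from the question. The names `FoobishDemo`, `Parse`, `Interpret` and `Compile`, the regular expression, and the choice of C-like source text as the target language are all assumptions made for this example; the point is only that one function returns the program's output while the other returns a new program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FoobishDemo
{
    // Assumed grammar: OUTPUT 'TEXT', <Number_of_Repeats>;
    private static readonly Regex Statement =
        new Regex(@"^\s*OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;\s*$");

    private static (string Text, int Repeats) Parse(string source)
    {
        Match m = Statement.Match(source);
        if (!m.Success)
            throw new FormatException("Not a valid Foobish statement: " + source);
        return (m.Groups[1].Value, int.Parse(m.Groups[2].Value));
    }

    // Interpretation: source program in, the program's output out.
    public static List<string> Interpret(string source)
    {
        var (text, repeats) = Parse(source);
        var output = new List<string>();
        for (int i = 0; i < repeats; i++)
            output.Add(text);
        return output;
    }

    // Compilation: source program in, target program out. Nothing is executed.
    // The target here is C-like text; no string escaping is done in this sketch.
    public static string Compile(string source)
    {
        var (text, repeats) = Parse(source);
        return "for (int i = 0; i < " + repeats + "; i++) puts(\"" + text + "\");";
    }

    public static void Main()
    {
        const string program = "OUTPUT 'Hello World', 3;";
        foreach (string line in Interpret(program))
            Console.WriteLine(line);          // the program's output
        Console.WriteLine(Compile(program));  // a target program, as text
    }
}
```

To get any output from the result of `Compile`, the emitted text would still have to be compiled or interpreted by some other tool, which is exactly the second diagram.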

In practice, many language processors use a mix of the two approaches. E.g., Java programs are first translated (compiled) into an intermediate program (byte code):

source program --> | translator | --> intermediate program

the output of this step is then executed (interpreted) by a virtual machine:

intermediate program + input --> | virtual machine | --> output
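
The two-step scheme can be sketched in C# as well. The instruction set below (`SetCounter`, `JumpIfZero`, `Print`, `Decrement`, `Jump`, `Halt`) is invented for this example and has nothing to do with real JVM byte code; parsing is left out, so `Translate` starts from an already analyzed Foobish statement.

```csharp
using System;
using System.Collections.Generic;

public enum OpCode { SetCounter, JumpIfZero, Print, Decrement, Jump, Halt }

public sealed class Instruction
{
    public OpCode Op { get; }
    public int Number { get; }
    public string Text { get; }

    public Instruction(OpCode op, int number = 0, string text = null)
    {
        Op = op;
        Number = number;
        Text = text;
    }
}

public static class FoobishVm
{
    // Translator: statement -> intermediate program (hypothetical byte code).
    public static Instruction[] Translate(string text, int repeats)
    {
        return new[]
        {
            new Instruction(OpCode.SetCounter, repeats), // 0
            new Instruction(OpCode.JumpIfZero, 5),       // 1: leave the loop
            new Instruction(OpCode.Print, text: text),   // 2
            new Instruction(OpCode.Decrement),           // 3
            new Instruction(OpCode.Jump, 1),             // 4: back to the test
            new Instruction(OpCode.Halt)                 // 5
        };
    }

    // Virtual machine: interprets the intermediate program one instruction at a time.
    public static List<string> Run(Instruction[] program)
    {
        var output = new List<string>();
        int counter = 0;
        int pc = 0; // program counter
        while (true)
        {
            Instruction ins = program[pc];
            switch (ins.Op)
            {
                case OpCode.SetCounter: counter = ins.Number; pc++; break;
                case OpCode.JumpIfZero: pc = counter == 0 ? ins.Number : pc + 1; break;
                case OpCode.Print: output.Add(ins.Text); pc++; break;
                case OpCode.Decrement: counter--; pc++; break;
                case OpCode.Jump: pc = ins.Number; break;
                case OpCode.Halt: return output;
            }
        }
    }

    public static void Main()
    {
        Instruction[] intermediate = Translate("Hello World", 3);
        foreach (string line in Run(intermediate))
            Console.WriteLine(line);
    }
}
```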

To complicate things even further, the JVM can perform just-in-time compilation at runtime to convert byte code into another format, which is then executed.

Also, even when you compile to machine language, there is an interpreter running your binary file which is implemented by the underlying processor. Therefore, even in this case you are using a hybrid of compilation + interpretation.

So real systems use a mix of the two, and it is difficult to say whether a given language processor is a compiler or an interpreter, because it will probably use both mechanisms at different stages of its processing. In this case it would probably be more appropriate to use another, more neutral term.

Nevertheless, compilation and interpretation are two distinct kinds of processing, as described in the diagrams above.

To answer the initial questions:

A compiler would create machine language which runs on the physical hardware directly?

Not necessarily. A compiler translates a program written for a machine M1 into an equivalent program written for a machine M2. The target machine can be implemented in hardware or be a virtual machine. Conceptually there is no difference. The important point is that a compiler looks at a piece of code and translates it to another language without executing it.

So an interpreter doesn't produce machine language but a compiler does it for its input?

If by producing you are referring to the output, then a compiler produces a target program, which may be in machine language; an interpreter does not.

Giorgio
  • 9
    In other words: an interpreter takes a program P and produces its output O, a compiler takes P and produces a program P′ that outputs O; interpreters often include components that are compilers (e.g., to a bytecode, an intermediate representation, or JIT machine instructions) and likewise a compiler may include an interpreter (e.g., for evaluating compile-time computations). – Jon Purdy Oct 22 '15 at 21:20
  • "a compiler may include an interpreter (e.g., for evaluating compile-time computations)": Good point. I guess Lisp macros and C++ templates might be pre-processed in this way. – Giorgio Oct 22 '15 at 21:26
  • Even simpler, the C preprocessor compiles C source code with CPP directives into plain C, and includes an interpreter for boolean expressions such as `defined A && !defined B`. – Jon Purdy Oct 22 '15 at 21:36
  • @JonPurdy I would agree with that, but I would also add a class, "traditional interpreters", that don't make use of intermediate representations beyond perhaps a tokenized version of the source. Examples would be shells, many BASICs, classic Lisp, Tcl prior to 8.0, and bc. – hobbs Oct 23 '15 at 01:33
  • (a) `The two basic mechanisms for processing a program are compilation and interpretation.` while I dislike this particular sentence, (b) the other part of the answer with logical differentiation of producing target language code vs desired output sounds a good way to make the distinction. (c) Apart from those, how does `assembler` fit into this picture? – n611x007 Oct 23 '15 at 09:43
  • 1
    @naxa - see Lawrence's answer and Paul Draper's comments on types of compiler. An assembler is a special kind of compiler where (1) the output language is intended for direct execution by a machine or virtual machine and (2) there is a very simple one-to-one correspondence between input statements and output instructions. – Jules Oct 23 '15 at 11:16
  • @naxa: Can you suggest a better way to express (a)? I'd be glad to incorporate your suggestions in my answer. – Giorgio Oct 25 '15 at 14:18
  • I don't understand when people say "Interpreter outputs directly". How is that possible, as a statement like "a=7+3", after going through several intermediate forms, can only be processed by the ALU of the processor, right? How can an interpreter (a program) perform all the operations of a processor? Or is it like this: an interpreter understands what "a=7+3" means, converts that statement into machine code, lets the processor run that machine code, and fetches the output generated by the processor? – AV94 Sep 08 '16 at 19:24
  • 2
    @anil: Of course the execution of an interpreter will be based on machine code that runs on a concrete computer. However, the output of an interpreter running on `7 + 3` will be `10`, the output of a compiler will be code in another language (e.g. machine code) that, when executed (later), will output `10`. – Giorgio Sep 08 '16 at 20:55
  • @Giorgio so at the end of the day, the concrete computer itself is an interpreter. So when some interpreter executes an instruction, that particular instruction won't be converted into machine code. It will not particularly converted, but it will become another piece of code of the interpreter application. And then that application will be interpreted by the concrete computer, hence even though the first instruction has not been directly converted into the machine code, it has been converted into the machine code some other way indirectly. Am I correct? – Buddhika Chathuranga Dec 20 '20 at 08:53
24

A compiler would create machine language

No. A compiler is simply a program which takes as its input a program written in language A and produces as its output a semantically equivalent program in language B. Language B can be anything; it doesn't have to be machine language.

A compiler can compile

  • from a high-level language to another high-level language (e.g. GWT, which compiles Java to ECMAScript),
  • from a high-level language to a low-level language (e.g. Gambit, which compiles Scheme to C),
  • from a high-level language to machine code (e.g. GCJ, which compiles Java to native code),
  • from a low-level language to a high-level language (e.g. Clue, which compiles C to Java, Lua, Perl, ECMAScript and Common Lisp),
  • from a low-level language to another low-level language (e.g. the Android SDK, which compiles JVML bytecode to Dalvik bytecode),
  • from a low-level language to machine code (e.g. the C1X compiler which is part of HotSpot, which compiles JVML bytecode to machine code),
  • from machine code to a high-level language (any so-called "decompiler", also Emscripten, which compiles LLVM machine code to ECMAScript),
  • from machine code to a low-level language (e.g. the JIT compiler in JPC, which compiles x86 native code to JVML bytecode), and
  • from native code to native code (e.g. the JIT compiler in PearPC, which compiles PowerPC native code to x86 native code).

Note also that "machine code" is a really fuzzy term for several reasons. For example, there are CPUs which natively execute JVM byte code, and there are software interpreters for x86 machine code. So, what makes one "native machine code" but not the other? Also, every language is code for an abstract machine for that language.

There are many specialized names for compilers that perform special functions. Despite the fact that these are specialized names, all of these are still compilers, just special kinds of compilers:

  • if language A is perceived to be at roughly the same level of abstraction as language B, the compiler might be called a transpiler (e.g. a Ruby-to-ECMAScript-transpiler or an ECMAScript2015-to-ECMAScript5-transpiler)
  • if language A is perceived to be at a lower level of abstraction than language B, the compiler might be called a decompiler (e.g. a x86-machine-code-to-C-decompiler)
  • if language A == language B, the compiler might be called an optimizer, obfuscator, or minifier (depending on the particular function of the compiler)

which runs on the physical hardware directly?

Not necessarily. It could be run in an interpreter or in a VM. It could be further compiled to a different language.

So an interpreter doesn't produce machine language but a compiler does it for its input?

An interpreter doesn't produce anything. It just runs the program.

A compiler produces something, but it doesn't necessarily have to be machine language, it can be any language. It can even be the same language as the input language! For example, Supercompilers, LLC has a compiler that takes Java as its input and produces optimized Java as its output. There are many ECMAScript compilers which take ECMAScript as their inputs and produce optimized, minified, and obfuscated ECMAScript as their output.
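
To make the "same language in, same language out" case concrete, here is a small C# sketch of an optimizer for the Foobish language from the question. It assumes, beyond what the question states, that a Foobish program may contain several `OUTPUT` statements; the class name `FoobishOptimizer` and the merging rule are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FoobishOptimizer
{
    // Assumed statement syntax: OUTPUT 'TEXT', <Number_of_Repeats>;
    // Anything that does not match is silently ignored in this sketch.
    private static readonly Regex Statement =
        new Regex(@"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;");

    // Foobish in, Foobish out: adjacent statements that print the same
    // text are merged into one statement with the summed repeat count.
    public static string Optimize(string source)
    {
        var texts = new List<string>();
        var counts = new List<int>();
        foreach (Match m in Statement.Matches(source))
        {
            string text = m.Groups[1].Value;
            int repeats = int.Parse(m.Groups[2].Value);
            int last = texts.Count - 1;
            if (last >= 0 && texts[last] == text)
            {
                counts[last] += repeats;
            }
            else
            {
                texts.Add(text);
                counts.Add(repeats);
            }
        }

        var lines = new List<string>();
        for (int i = 0; i < texts.Count; i++)
            lines.Add("OUTPUT '" + texts[i] + "', " + counts[i] + ";");
        return string.Join("\n", lines);
    }

    public static void Main()
    {
        string source = "OUTPUT 'Hi', 2;\nOUTPUT 'Hi', 3;\nOUTPUT 'Bye', 1;";
        Console.WriteLine(Optimize(source));
        // OUTPUT 'Hi', 5;
        // OUTPUT 'Bye', 1;
    }
}
```

Nothing is printed by the Foobish program itself here; the tool only produces another Foobish program that would print the same lines when run.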


Jörg W Mittag
  • `An interpreter doesn't produce anything. It just runs the program.` How does the system understands it? It must produce 0s and 1s to let computer understand what to do? Please explain a bit. – Vicrobot Jan 25 '20 at 14:50
  • 1
    The author of the interpreter implements the semantic rules of the language such that interpreting the program performs the side-effects and gives the results as specified in the language specification. I.e. if the interpreter encounters the term `3 + 5`, it will *execute* some piece of code that adds 3 and 5. If a *compiler* encounters the same term, it will *produce* code that *when you execute it in the future* will add 3 and 5. That's the fundamental difference between an interpreter and a compiler. A compiler *translates* from language A to language B. An Interpreter *executes*. Without – Jörg W Mittag Jan 25 '20 at 15:03
  • 1
    … interpreters, *nothing* would ever run. (Note that at the very least, there is always a CPU, which is a hardware-based interpreter for some instruction set). – Jörg W Mittag Jan 25 '20 at 15:04
  • Really helpful. Cleared so many doubts. – Vicrobot Jan 25 '20 at 15:38
16

I think you should drop the notion of "compiler versus interpreter" entirely, because it's a false dichotomy.

  • A compiler is a transformer: It takes a computer program written in a source language and outputs an equivalent program in a target language. Usually, the source language is higher-level than the target language - and if it's the other way around, we often call that kind of transformer a decompiler.
  • An interpreter is an execution engine. It executes a computer program written in one language, according to the specification of that language. We mostly use the term for software (but in a way, a classical CPU can be viewed as a hardware-based "interpreter" for its machine code).

The collective word for making an abstract programming language useful in the real world is implementation.

In the past, a programming language implementation often consisted of just a compiler (and the CPU it generated code for) or just an interpreter - so it may have looked like these two kinds of tools are mutually exclusive. Today, you can clearly see that this isn't the case (and it never was to begin with). Taking a sophisticated programming language implementation and attempting to shove the name "compiler" or "interpreter" onto it will often lead you to inconclusive or inconsistent results.

A single programming language implementation can involve any number of compilers and interpreters, often in multiple forms (standalone, on-the-fly), any number of other tools, like static analyzers and optimizers, and any number of steps. It can even include entire implementations of any number of intermediate languages (that may be unrelated to the one being implemented).

Examples of implementation schemes include:

  • A C compiler that transforms C to x86 machine code, and an x86 CPU that executes that code.
  • A C compiler that transforms C to LLVM IR, an LLVM backend compiler that transforms LLVM IR to x86 machine code, and an x86 CPU that executes that code.
  • A C compiler that transforms C to LLVM IR, and an LLVM interpreter that executes LLVM IR.
  • A Java compiler that transforms Java to JVM bytecode, and a JRE with an interpreter that executes that code.
  • A Java compiler that transforms Java to JVM bytecode, and a JRE with both an interpreter that executes some parts of that code and a compiler that transforms other parts of that code to x86 machine code, and an x86 CPU that executes that code.
  • A Java compiler that transforms Java to JVM bytecode, and an ARM CPU that executes that code.
  • A C# compiler that transforms C# to CIL, a CLR with a compiler that transforms CIL to x86 machine code, and an x86 CPU that executes that code.
  • A Ruby interpreter that executes Ruby.
  • A Ruby environment with both an interpreter that executes Ruby and a compiler that transforms Ruby to x86 machine code, and an x86 CPU that executes that code.

...and so on.
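
The idea that an implementation is "any number of compilers plus an execution engine" can be sketched in C# using the Foobish language from the question. Everything named here is hypothetical: the `Schemes` class, the `Run` helper, and the intermediate language made of `PRINT <text>` lines exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Schemes
{
    private static (string Text, int Repeats) Parse(string source)
    {
        Match m = Regex.Match(source, @"^\s*OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;\s*$");
        if (!m.Success)
            throw new FormatException("Invalid Foobish: " + source);
        return (m.Groups[1].Value, int.Parse(m.Groups[2].Value));
    }

    // Interpreter (execution engine) for Foobish source.
    public static List<string> InterpretFoobish(string source)
    {
        var (text, repeats) = Parse(source);
        return Enumerable.Repeat(text, repeats).ToList();
    }

    // Compiler: Foobish -> made-up intermediate language,
    // one "PRINT <text>" line per repetition.
    public static string CompileToIl(string source)
    {
        var (text, repeats) = Parse(source);
        return string.Join("\n", Enumerable.Repeat("PRINT " + text, repeats));
    }

    // Interpreter (execution engine) for the intermediate language.
    public static List<string> InterpretIl(string il)
    {
        return il.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                 .Select(line => line.Substring("PRINT ".Length))
                 .ToList();
    }

    // An implementation: any number of compilers, then one execution engine.
    public static List<string> Run(string source,
                                   IEnumerable<Func<string, string>> compilers,
                                   Func<string, List<string>> engine)
    {
        string program = source;
        foreach (Func<string, string> compile in compilers)
            program = compile(program);
        return engine(program);
    }

    public static void Main()
    {
        const string source = "OUTPUT 'Hello World', 2;";
        // Scheme 1: an interpreter only.
        List<string> a = Run(source, new Func<string, string>[0], InterpretFoobish);
        // Scheme 2: a compiler followed by an interpreter for its output.
        List<string> b = Run(source, new Func<string, string>[] { CompileToIl }, InterpretIl);
        Console.WriteLine(a.SequenceEqual(b)); // True: both schemes behave the same
    }
}
```

From the outside both schemes are "a Foobish implementation"; whether one calls either of them a compiler or an interpreter depends on which stage one looks at.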

  • +1 for pointing out that even encodings that were designed for intermediate representation (eg java bytecode) can have hardware implementations. – Jules Oct 24 '15 at 09:07
7

While the lines between compilers and interpreters have gotten fuzzy over time, one can still draw a line between them by looking at the semantics of what the program should do and what the compiler/interpreter does.

A compiler will generate another program (typically in a lower-level language like machine code) which, if that program is run, will do what your program should do.

An interpreter will do what your program should do.

With these definitions, the places where it gets fuzzy are the cases where your compiler/interpreter can be thought of as doing different things depending on how you look at it. For example, Python takes your Python code and compiles it into Python bytecode. If this Python bytecode is run through a Python bytecode interpreter, it does what your program was supposed to do. In most situations, however, Python developers think of both of those steps being done in one big step, so they choose to think of the CPython interpreter as interpreting their source code, and the fact that it got compiled along the way is considered an implementation detail. In this way, it's all a matter of perspective.

Peter Mortensen
Cort Ammon
5

Here's a simple conceptual disambiguation between compilers and interpreters.

Consider 3 languages: programming language, P (what the program is written in); domain language, D (for what goes on with the running program); and target language, T (some third language).

Conceptually,

  • a compiler translates P to T so that you can evaluate T(D); whereas

  • an interpreter evaluates P(D) directly.
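
A small C# sketch of the two bullets, under assumptions of my choosing: P is a tiny reverse Polish expression language over one variable `x` (for example `x 2 * 1 +`), D is the integer bound to `x`, and T is a delegate built with .NET expression trees. The class name `PdtDemo` and the mini-language are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class PdtDemo
{
    // Interpreter: evaluates P(D) directly.
    public static int Interpret(string p, int d)
    {
        var stack = new Stack<int>();
        foreach (string token in p.Split(' '))
        {
            if (token == "x")
            {
                stack.Push(d);
            }
            else if (token == "+" || token == "*")
            {
                int right = stack.Pop();
                int left = stack.Pop();
                stack.Push(token == "+" ? left + right : left * right);
            }
            else
            {
                stack.Push(int.Parse(token));
            }
        }
        return stack.Pop();
    }

    // Compiler: translates P to T without looking at any D.
    public static Func<int, int> Compile(string p)
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        var stack = new Stack<Expression>();
        foreach (string token in p.Split(' '))
        {
            if (token == "x")
            {
                stack.Push(x);
            }
            else if (token == "+" || token == "*")
            {
                Expression right = stack.Pop();
                Expression left = stack.Pop();
                stack.Push(token == "+" ? Expression.Add(left, right)
                                        : Expression.Multiply(left, right));
            }
            else
            {
                stack.Push(Expression.Constant(int.Parse(token)));
            }
        }
        return Expression.Lambda<Func<int, int>>(stack.Pop(), x).Compile();
    }

    public static void Main()
    {
        const string p = "x 2 * 1 +";
        Console.WriteLine(Interpret(p, 5)); // P(D): prints 11
        Func<int, int> t = Compile(p);      // P translated to T, nothing evaluated yet
        Console.WriteLine(t(5));            // T(D): prints 11
    }
}
```

The interpreter needs D at the moment it processes P, whereas the compiled T can be kept and applied to many different values of D later.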

Lawrence
  • 1
    Most modern interpreters don't actually evaluate the source language directly, but rather some intermediate representation of the source language. – Robert Harvey Oct 22 '15 at 20:14
  • 4
    @RobertHarvey That doesn't change the conceptual distinction between the terms. – Lawrence Oct 22 '15 at 20:19
  • 1
    So what you're really referring to as the interpreter is the part that evaluates the intermediate representation. The part that *creates* the intermediate representation is a *compiler*, by your definition. – Robert Harvey Oct 22 '15 at 20:22
  • 6
    @RobertHarvey Not really. The terms are dependent on the level of abstraction you're working at. If you look underneath, the tool could be doing anything. By analogy, say you go to a foreign country and bring a bilingual friend Bob along. If you communicate with the locals by talking to Bob who in turn talks to the locals, Bob acts as an interpreter to you (even if he scribbles in their language before talking). If you ask Bob for phrases and Bob writes them in the foreign language, and you communicate with the locals by referring to those writings (not Bob) Bob acts as a compiler for you. – Lawrence Oct 22 '15 at 20:55
  • @RobertHarvey: Maybe the part that creates the internal representation is just a parser. So, yes, an interpreter and a compiler both need a parser as their front-end. – Giorgio Oct 22 '15 at 21:02
  • @RobertHarvey, in Java regexes are not compiled by javac. You could say they are interpreted at runtime. But there is [`Pattern.compile`](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html). It translates the regex into an intermediate form. – Paul Draper Oct 22 '15 at 23:15
  • 1
    Excellent answer. Worth noting: Nowadays you may hear "transpiler". That's a compiler where P and T are at similar levels of abstraction, for some definition of similar. (E.g. an ES5 to ES6 transpiler.) – Paul Draper Oct 22 '15 at 23:16
  • 1
    @PaulDraper: other specialized names for compilers are *decompiler*, where **P** is at a *lower* level of abstraction than **T**, and *optimizer*, *minifier*, or *obfuscator* (depending on the particular function of the compiler), when **P** == **T**. – Jörg W Mittag Oct 23 '15 at 07:37
  • @Lawrence, A human language interpreter does for human languages what a computer language _compiler_ does for computer languages; _not_ what a computer language interpreter does. – Solomon Slow Oct 23 '15 at 17:51
  • @jameslarge Interesting perspective :) . I understand your point of view and that analogies can be reinterpreted. Nevertheless, your comment highlights a conceptual distinction between (computer language) interpreters and compilers. My answer tries to concisely express the essence of this distinction to help clarify the OP's thinking in relation to this question. – Lawrence Oct 24 '15 at 01:13