32

A lot of questions get asked here about interpreted vs compiled language implementations. I'm wondering whether the distinction actually makes any sense. (Actually, the questions are usually about languages, but the askers are really thinking about the most popular implementations of those languages.)

Today almost no implementation is strictly interpreted, i.e. pretty much nobody parses and runs the code one line at a time. Additionally, implementations that compile to machine code are also becoming less common. Increasingly, compilers target some sort of virtual machine.

In fact, most implementations are converging on the same basic strategy: the compiler produces bytecode, which is interpreted or compiled to native code via a JIT. It is really a mix of the traditional ideas of compilation and interpretation.
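As a concrete illustration (a hypothetical sketch using CPython, which is commonly labelled "interpreted"), you can ask the implementation for the bytecode it has already compiled a function into:

```python
import dis

def add(a, b):
    return a + b

# CPython -- usually called "interpreted" -- has already compiled this
# function to bytecode; dis shows the instructions its VM will execute.
# (The exact opcodes vary by Python version.)
dis.dis(add)
```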

Thus I ask: Is there a useful distinction between interpreted implementations and compiled implementations these days?

Winston Ewert
  • 2
    Your basic assumptions are invalid. The VM model is new and there are some popular languages in that realm, but it's never going to replace the existing models, and there's no convergence on it at all. – DeadMG Feb 26 '12 at 19:01
  • 7
    @DeadMG Not as new as you may think: [A brief history of just-in-time](http://dl.acm.org/citation.cfm?doid=857076.857077)... – yannis Feb 26 '12 at 19:06
  • 4
    @DeadMG Given that the majority of new languages introduced in the last 10 years or so primarily run on some kind of VM, I'd say he has a point. Of course there still are (and will be for decades to come) languages compiled to native code, and a JIT will remain a luxury (or not, if the PyPy guys have their way). So yes, possible overstatement, but I agree that the mainstream (for now and the foreseeable future) seems to be bytecode compiler + possibly JIT. –  Feb 26 '12 at 19:08
  • 4
    @DeadMG, you must have a long white beard if the VM model is "new" for you. `P-code` was first introduced in 1966, and IBM AIX has been around since 1986. – SK-logic Feb 27 '12 at 08:33
  • 7
    Things like Unix shells, Tcl and the like will always be purely interpreted, so the distinction makes sense, at least in academic CS. But it is true that when coders mumble about interpreters vs. compilers, they're not making any sense in most cases. – SK-logic Feb 27 '12 at 08:35
  • 3
    @SK-logic, I think your comment is a better answer than any of the answers actually posted – Winston Ewert Feb 27 '12 at 17:53

7 Answers

26

It's important to remember that interpreting and compiling are not just alternatives to each other. In the end, any program that you write (including one compiled to machine code) gets interpreted. Interpreting code simply means taking a set of instructions and returning an answer.

Compiling, on the other hand, means converting a program in one language to another language. Usually it is assumed that when compilation takes place, the code is compiled to a "lower-level" language (e.g. machine code, some kind of VM bytecode, etc.). This compiled code is still interpreted later on.
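To make that concrete, consider a hypothetical toy language: a "compiler" translates arithmetic expressions into a lower-level stack-machine bytecode, and an interpreter later runs that bytecode. A minimal Python sketch:

```python
# Toy "source language": arithmetic ASTs as nested tuples.
# Toy "target language": a tiny stack-machine bytecode.

def compile_expr(ast):
    """Compile an AST like ('+', 2, ('*', 3, 4)) into stack-machine code."""
    if isinstance(ast, (int, float)):
        return [("PUSH", ast)]
    op, left, right = ast
    return compile_expr(left) + compile_expr(right) + [(op, None)]

def interpret(code):
    """The 'VM': interpret the compiled instructions one by one."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "+" else a * b)
    return stack.pop()

code = compile_expr(("+", 2, ("*", 3, 4)))  # compilation happens once...
print(interpret(code))                      # ...interpretation later -> 14
```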

With regard to your question of whether there is a useful distinction between interpreted and compiled languages, my personal opinion is that everyone should have a basic understanding of what is happening to the code they write. If their code is JIT-compiled, or bytecode-cached, etc., the programmer should at least know what that means.

Zach Smith
  • 4
    Yes the programmer should have a basic understanding. But I wonder if the compiled/interpreted terminology doesn't get in the way of that. – Winston Ewert Feb 26 '12 at 23:33
  • 3
    Thank you!! Interpreted is just a synonym for "executed", and that's how *all* programs are run. – gardenhead Oct 26 '16 at 14:53
12

The distinction is deeply meaningful because compiled languages restrict the semantics in ways that interpreted languages do not necessarily. Some interpretive techniques are very hard (practically impossible) to compile.

Interpreted code can do things like generate code at run time, and give that code visibility into lexical bindings of an existing scope. That's one example. Another is that interpreters can be extended with interpreted code which can control how code is evaluated. This is the basis for ancient Lisp "fexprs": functions that are called with unevaluated arguments and decide what to do with them (having full access to the necessary environment to walk the code and evaluate variables, etc). In compiled languages, you can't really use that technique; you use macros instead: functions that are called at compile time with unevaluated arguments, and translate the code rather than interpreting.
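A minimal sketch of the fexpr idea, as a hypothetical Python toy (the names `evaluate` and `FEXPRS` are made up for illustration): operators registered as fexprs receive their arguments unevaluated, together with the environment, and decide for themselves what to evaluate.

```python
def evaluate(expr, env):
    if isinstance(expr, str):                # variable reference
        return env[expr]
    if not isinstance(expr, list):           # literal
        return expr
    op, *args = expr
    if op in FEXPRS:                         # fexpr: gets raw code + env
        return FEXPRS[op](args, env)
    vals = [evaluate(a, env) for a in args]  # normal call: evaluate args first
    return env[op](*vals)

FEXPRS = {
    # "if" must see its branches unevaluated -- the classic example
    "if": lambda args, env: evaluate(args[1] if evaluate(args[0], env)
                                     else args[2], env),
    # "quote" returns its argument as data, never evaluating it
    "quote": lambda args, env: args[0],
}

env = {"x": 5, "+": lambda a, b: a + b, "<": lambda a, b: a < b}
print(evaluate(["if", ["<", "x", 10], ["+", "x", 1], 0], env))  # -> 6
print(evaluate(["quote", ["+", "x", 1]], env))                  # -> ['+', 'x', 1]
```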

Some language implementations are built around these techniques; their authors reject compiling as being an important goal, and rather embrace this kind of flexibility.

Interpreting will always be useful as a technique for bootstrapping a compiler. For a concrete example, look at CLISP (a popular implementation of Common Lisp). CLISP has a compiler that is written in itself. When you build CLISP, that compiler is interpreted during the early building steps. It is used to compile itself, and once it is compiled, all further compiling is done with the compiled compiler.

Without an interpreter kernel, you would need to bootstrap with some existing Lisp, like SBCL does.

With interpretation, you can develop a language from absolute scratch, starting with assembly language. Develop the basic I/O and core routines, then write an eval, still in machine language. Once you have eval, you can write in the high-level language; the machine-code kernel does the evaluating. Use this facility to extend the library with many more routines and to write a compiler as well. Use the compiler to compile those routines and the compiler itself.

Interpretation: an important stepping stone in the path leading to compilation!

Kaz
  • 2
    IMO, this is the best answer. I'm working on my own toy language and the last paragraph describes the way I'm developing it. It really makes working on new ideas much easier. Also +1 for mentioning CLISP bootstrap process. – sinan Jul 11 '12 at 07:07
  • In theory, any “interpreted” language can be made into a “compiled” one by generating an EXE file consisting of the interpreter plus the source code or bytecode for the interpreted program. Might not be very efficient, though. – dan04 Oct 26 '16 at 16:55
  • Read up on how Wirth et al invented P-code to simplify porting PASCAL to new machine architectures. That was in the early 1970s. – John R. Strohm Oct 26 '16 at 19:59
  • 2
    I suspect your opening paragraph is confusing compilation and interpretation with static and dynamic behavior, but I'll give you the benefit of the doubt and just ask for an example of a language with semantics that are "practically impossible" to compile. Regarding bootstrapping a compiler, it is true that the first implementation needs to be written in something other than the language you're implementing, but it does not have to be an interpreter, it could be a compiler written in another language. – 8bittree Aug 03 '17 at 22:38
1

Actually, lots of implementations of languages are still strictly interpreted; you just may not be aware of them. To name a few: the UNIX shell languages, the Windows cmd and PowerScript shells, Perl, awk, sed, MATLAB, Mathematica and so on.

Charles E. Grant
  • 3
    I believe Perl is internally compiled to bytecode, and at least Mathematica can be compiled. And nothing dictates the implementation of awk and sed (I believe some of the GNU coreutils compile regular expressions to finite automata before execution, which would arguably put them into the "compile to intermediate representation, interpret that" category). –  Feb 26 '12 at 19:12
  • 1
    Actually I'm pretty sure that Perl, MATLAB, and Mathematica all compile to bytecode. I'm not familiar with PowerScript, do you mean PowerShell? If so, that uses the CLR and so does use bytecode. – Winston Ewert Feb 26 '12 at 19:23
  • @WinstonEwert, sorry I did mean PowerShell. My understanding is that the translation into an intermediate form does not mean that something is not interpreted. Heck, the original Dartmouth BASIC interpreter translated the source into tokens before interpreting. Each of the tools I mentioned has a mode where it 1) reads a line of source, 2) translates that line into an executable form (possibly some intermediate code rather than native code), 3) executes the code for that line, 4) loop back to 1). That corresponds to my understanding of an interpreter. – Charles E. Grant Feb 26 '12 at 19:54
  • Given that definition of an interpreter, do those languages cease to be interpreted when executed as complete programs? – Winston Ewert Feb 26 '12 at 20:21
  • That's called a "Read-Eval-Print-Loop" (REPL), and in many (nearly all I'm aware of) languages it's implemented by compiling and executing the code snippets as they are input, keeping the execution environment around. The classic, and more useful, definition of an interpreter is a piece of software that carries out the semantics of a program described in some data structure/program representation. Translation to an intermediate representation is by definition compilation! Your confusion may stem from the fact that in some cases, the result is immediately fed to an interpreter for the IR. –  Feb 26 '12 at 20:54
  • @WinstonEwert, it depends on the details of the implementation. The garden variety versions of bash and awk still process the input a line at a time. On the other hand, PowerShell cmdlets are compiled. The interesting thing about compilers is that they have the opportunity to look at the global and regional structure of a program and make appropriate optimizations. Interpreters are processing the code a statement at a time, and so have limited opportunities for optimization. Whether or not an intermediate representation is used in the process is less significant. – Charles E. Grant Feb 26 '12 at 21:28
  • My point is that half of the languages you list aren't processed one statement at a time during normal program execution. They only do that when running a read-eval-print loop. – Winston Ewert Feb 26 '12 at 21:34
  • @delan can you point me to a cite for your definition of a compiler vs an interpreter? I'm not a language guy, and it's been decades since my class on language design, so it's entirely possible I'm completely off base here. However, your definition would seem to conflict with common usage. I mean, even the classic BASIC interpreters had to generate a parse tree for each line. Does that mean a BASIC interpreter should really be called a compiler? – Charles E. Grant Feb 26 '12 at 21:36
  • @WinstonEwert, I'll give you PowerShell and Perl, but I'm still pretty confident that cmd, the UNIX shells, Mathematica, and MATLAB are processed one statement at a time. Heck, even the MATLAB "compiler" isn't actually a compiler, it's just a script packager (see http://www.mathworks.com/support/solutions/en/data/1-1ARNS/). – Charles E. Grant Feb 26 '12 at 21:41
  • You'll see that Matlab is listed on the Wikipedia page for bytecode: http://en.wikipedia.org/wiki/Bytecode. What MATLAB calls a compiler is a packager, but they still have a bytecode compiler they are using behind the scenes, and Mathematica's page mentions bytecode with a JIT: http://www.wolfram.com/solutions/hpc/. I don't know the implementation of shell scripting, but shells probably do go line by line. Very few modern languages do, though. – Winston Ewert Feb 26 '12 at 21:54
  • @WinstonEwert, but byte code doesn't imply compiled! There are byte code interpreters and byte code compilers. I haven't been able to find a definitive statement for the MATLAB implementation, but it is listed as an interpreted language in this Wikipedia article: http://en.wikipedia.org/wiki/Interpreter_(computing). If you have a good source claiming it's compiled I'd love to see it. It turns out that Mathematica is a mixed mode language. Some statements will automatically invoke a JIT, and some will not: http://www.wolfram.com/technology/guide/TransparentAutoCompilation/. – Charles E. Grant Feb 27 '12 at 00:44
  • 2
    Bytecode does imply compiled. A bytecode compiler is simply a program that takes the source and converts it to bytecode. Hence all uses of bytecode must involve a bytecode compiler. But bytecode also has to be interpreted (or JITted). So anything using bytecode is an interpreter/compiler hybrid. – Winston Ewert Feb 27 '12 at 00:59
  • @WinstonEwert, leaving aside my quibbles and returning to your original question, "Is there a useful distinction between interpreted implementations and compiled implementations these days?". I think Mathematica offers a clear case where the answer is "yes". If you are trying to write efficient Mathematica code you'll need to be aware which functions are automatically compiled, which default to being interpreted, and when you should override the default. – Charles E. Grant Feb 27 '12 at 01:01
  • True. But that's contrasting between different modes within one language implementation, not contrasting between the implementation of say Python and Java. So that's not really what I was getting at with this question. – Winston Ewert Feb 27 '12 at 01:08
  • 4
    Really, my thing is that people toss out statements like "Python is interpreted" and "Java is compiled" with no understanding of the implementations. I'm questioning whether it's even useful to describe an implementation in those terms. The truth is usually more complicated, and trying to boil it down to interpreted/compiled isn't useful. – Winston Ewert Feb 27 '12 at 01:16
  • @WinstonEwert as of version 8, Mathematica can be compiled down to machine code which is dynamically linked into the kernel by setting the option [`CompilationTarget -> "C"`](http://reference.wolfram.com/mathematica/ref/CompilationTarget.html) on `Compile`. – rcollyer Mar 16 '12 at 02:04
1

I think: Absolutely Yes.

In fact, most implementations are converging on the same basic strategy

Really, C++ aims to bring into the compiler's domain some high-level concepts that are usually handed to interpreters, but it stays on the minority side...

CapelliC
  • 2
    Wait until Clang+LLVM become the most popular compiler toolchain. – SK-logic Feb 27 '12 at 08:38
  • @SK-logic, despite the name, I believe that Clang+LLVM produces native code. – Winston Ewert Feb 27 '12 at 15:15
  • 1
    @Winston Ewert, only if you want to. You can stop on LLVM IR level and do whatever you want with it then - interpret it, JIT-compile it, instrument it any way you like. You can even translate it to Javascript and then pass through an interpreter: https://github.com/kripken/emscripten/wiki – SK-logic Feb 27 '12 at 15:20
  • @SK-logic, neat stuff! Didn't know LLVM could do that. – Winston Ewert Feb 27 '12 at 15:25
  • The IR that llvm creates is on par with the IR normally used in compilers, not the same as the bytecodes or pseudo-instruction sets imposed by the so-called interpreted languages. All are compilers; they just compile to a different target. llvm for the most part took the common compiler approach and exposed and documented the internals, choosing a bytecode closer to a real processor than something easy to make a (slow) backend for. We will see if they can maintain the cleanliness or if it starts to turn into a huge ball of duct tape and baling wire like gcc. Only time will tell. – old_timer Feb 28 '12 at 07:18
  • 1
    The beauty of llvm is this deliberate separation between the front end and back end, and the tools for manipulating the middle before targeting the instruction set. You can merge your entire project into bytecode and then optimize across the whole thing; with other compilers you would need to have a single source file, or at least one that includes its way down through the source tree, so the compiler acts on one combined source. Also, one set of tools under llvm is generic to all targets; you don't have to build for each target, one compiler fits all (at least to the asm of the target). – old_timer Feb 28 '12 at 07:22
1

It's helpful to remember that "compiled" is not just the name of a category of programs but is also a verb (specifically, the past participle of "to compile"). In other words, any program that falls into the "compiled" category had to have been passed through a compilation event or process. This is a qualitative difference from programs that are not compiled; it is not gray at all.

A compilation process includes more than translating source code to object code. It also includes several features that are unique to compilation:

  1. Compile-time ("preprocessing") directives, conditionals, and variables
  2. Compile-time validation, e.g. syntax and type checking
  3. Compile-time errors
  4. Early binding
  5. Static linking
  6. Certain optimizations, such as string interning and tail-call elimination

This leads to certain key differences. For example, a program that is compiled must contain valid (although not necessarily correct) source code. In contrast, an interpreted program may contain source that is not valid: it is possible for a JavaScript program to run just fine until it encounters an invalid line of code, at which point it simply halts or errs out.
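A sketch of that difference (a hypothetical illustration, using CPython's built-in compile() as the up-front compilation step and a deliberately naive line-at-a-time loop standing in for a strict interpreter):

```python
program = 'print("first line runs")\n)))   # invalid syntax on line 2'

# Compiled route: the whole program must be valid before anything runs.
try:
    compile(program, "<example>", "exec")
except SyntaxError as err:
    print("rejected up front, nothing ran:", err.msg)

# Strictly line-at-a-time route: the valid first line executes, and the
# invalid line only fails once execution reaches it.
for line in program.splitlines():
    try:
        exec(line)
    except SyntaxError as err:
        print("failed only when reached:", err.msg)
```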

I suppose there is something of a gray area when it comes to JIT-compiled code, which in some respects is handled like interpreted code: invalid source code, for example, may not cause a problem until the JIT compilation occurs. However, once the JIT compilation is complete, you can be assured that the code is now valid.

Is this distinction "useful?" I guess that is a matter of opinion. However, I would probably not hire a software engineer who did not understand the difference.

John Wu
0

Useful distinction: interpreted programs can modify themselves by adding or changing functions at runtime.
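For illustration, a hypothetical sketch of this in CPython (which, as the comments below point out, blurs the line by compiling the new code on the fly):

```python
def greet():
    return "hello"

print(greet())               # -> hello

# Build brand-new source at runtime and rebind the function name.
new_src = 'def greet():\n    return "hi there"'
exec(new_src, globals())     # the new code is compiled and run on the fly
print(greet())               # -> hi there
```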

Diego Pacheco
  • 8
    Nonsense. Self-modifying (machine) code is the oldest trick in the book. Then again, some argue even native code is ultimately interpreted by an interpreter cast into silicon (the CPU). But if we assume that, all code is interpreted and there's no distinction to make. –  Feb 26 '12 at 19:09
  • 2
    @delnan is right. I'll just add that modern languages can modify themselves by creating new classes dynamically and loading/unloading libraries (or "assemblies" in .NET for example) – Jalayn Feb 27 '12 at 07:35
  • 5
    Common Lisp is compiled, but you still can replace function definitions in runtime easily. – SK-logic Feb 27 '12 at 08:37
  • That's a really interesting feature of interpretation, and a necessary one (for instance in Prolog). – CapelliC Nov 18 '12 at 11:18
0

I think the distinction has become very gray. I was researching how to have Java call Matlab, and tripped over the concept that Java is compiled while Matlab is, well, also compiled to a lower level these days.

Coming from a digital circuit design background, I know that even register-level operations can be represented by a sequence of code corresponding to control flow logic for state machine transitions. Indeed, the code describing exactly how and when these "events" occur at a hardware level can look remarkably like 3GL source code (when described in Verilog, VHDL, SystemC). Do we call such high-level views of low-level operations "source code" to be compiled? What if the operations are specified in such detail that there is no freedom for the compiler to make a decision on the final Boolean logic? Then it's just a matter of labelling the hardware, states, and I/O with designer-friendly names -- is that still source code to be compiled?

Even viewing zeros and ones in an editor is an abstraction, as the actual binary states are the result of transistor gate layouts with certain charge levels and voltages at certain nodes. Along these lines, should we consider logic cell selection, layout, and interconnect routing as degrees of freedom that qualify the mapping of a coded description to hardware and its operation as "compilation"?

Despite the blurring, however, I still think of Matlab as interpreted-like, since it has an interactive command line: I can navigate and manipulate data objects between command-line statements. VBA is compiled, but it has the "Immediate Window" wherein I can do the same. C++ debuggers give me a similar kind of environment, though I haven't used a 3GL in decades, and the functionality was quite awkward back in the day. Then again, Java is "compiled", and a web search shows that it too has an interactive command line in which to learn Java (which I do not know).

All this to say that even if we draw back from the technical details and look at the question only from the black-box perspective of how it feels and what one can do at the console, it's still pretty blurry.

user2153235