How do programming languages work?

Question

This is probably a dumb question, but how do programming languages work at a low level? The Go language GitHub page says almost 90% of the source files are Go files. How is it possible that a programming language is made up of itself, especially 90% of itself? I can kind of understand how a language such as Lua is written in C, but Go is made up of mostly Go files. It doesn't make much sense to me: how do the developers use Go to create Go?

Possible duplicate of [Why are self-hosting compilers considered a rite of passage for new languages?](http://programmers.stackexchange.com/questions/263651/why-are-self-hosting-compilers-considered-a-rite-of-passage-for-new-languages) — gnat, Oct 13 '16 at 23:18
The answers are right, but I don't see any mention of the concept of "universality", which is what allows any programming language to do what any other can do, including simulating (i.e. pretending to be) any other. (Just to ward off flak, there are a lot of caveats to this, like talking about unlimited memory, etc., but if a language cannot be thought of as universal, it doesn't join the club.) — Mike Dunlavey, Oct 14 '16 at 00:21
@MikeDunlavey I didn't call it "universality" or "Turing-equivalence", but I touched on the basic principle when I wrote, "The quick version is that if a programming language is capable of basic file IO and data processing, it's capable of implementing a compiler for a language, including its own language." — Mason Wheeler, Oct 14 '16 at 00:50
@MikeDunlavey: A language for writing a compiler in doesn't need to be universal. A compiler essentially is (or can be modeled as) a pure, total function. — Jörg W Mittag, Oct 14 '16 at 06:34
@connorb08: The answer is simply that the *first* Go compiler was written in a different pre-existing language. Then when you have the first Go compiler working, you can write a Go-compiler in Go. — JacquesB, Oct 14 '16 at 06:51

Jörg W Mittag · Accepted Answer · 2016-10-13T23:57:44.143

A compiler or interpreter is a program just like any other program. You can write programs in any programming language you want, including Go. Ergo, you can write a compiler in Go. Including a Go compiler.

Why not?

Of course, in order to actually use that compiler or interpreter, you need to have a compiler or interpreter for that language as well.
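As a rough illustration of the first point (this is not anything from the Go toolchain): a compiler is an ordinary program that reads text in one language and writes text in another. The sketch below, in Python, compiles arithmetic expressions into instructions for a made-up stack machine. The names `compile_expr` and `run` and the instruction set (`PUSH`, `ADD`, `SUB`, `MUL`, `DIV`) are invented for this example.

```python
import operator
import re


def compile_expr(source):
    """Translate an arithmetic expression into stack-machine instructions."""
    tokens = re.findall(r"\d+|\S", source)  # integers, or any single symbol
    pos = 0
    code = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():  # factor := NUMBER | "(" expr ")"
        tok = advance()
        if tok.isdecimal():
            code.append(f"PUSH {tok}")
        elif tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # term := factor (("*" | "/") factor)*
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            code.append("MUL" if op == "*" else "DIV")

    def expr():  # expr := term (("+" | "-") term)*
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            code.append("ADD" if op == "+" else "SUB")

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code


# The hypothetical target machine, so the compiled output can be executed.
OPS = {"ADD": operator.add, "SUB": operator.sub,
       "MUL": operator.mul, "DIV": operator.floordiv}


def run(program):
    stack = []
    for instr in program:
        if instr.startswith("PUSH "):
            stack.append(int(instr[5:]))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[instr](left, right))
    return stack.pop()


program = compile_expr("1 + 2 * 3")
print(program)       # ['PUSH 1', 'PUSH 2', 'PUSH 3', 'MUL', 'ADD']
print(run(program))  # 7
```

Nothing here is special to Python: the same translation could be written in any language that can read and write text, including the language being compiled.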

In the case of Go, for example, the first several versions of the compiler were written in C. So, they already had a working Go compiler. Then, it's no problem to write a Go compiler in Go, since you already have a working Go compiler written in C, which you can use to compile the Go compiler written in Go. And now, you have a compiled version of your Go compiler written in Go, and you can use that to compile future versions of your Go compiler written in Go.

This is called "bootstrapping" (after the old tale of Baron Münchhausen, who pulled himself out of the mud by his own bootstraps).
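The sequence of steps can be simulated in a few lines. This is only a toy model under loud assumptions: "Toy" is a made-up language that is Python with `fn` in place of `def`, Python stands in for C as the host language, and "compiling" means translating Toy text to Python text. The names `stage0_compile`, `load` and `compile_toy` are hypothetical.

```python
import re


def stage0_compile(source):
    """Stage 0: the first Toy compiler, written in the host language."""
    return re.sub(r"^([ \t]*)fn ", r"\1def ", source, flags=re.MULTILINE)


# The Toy compiler, written in Toy itself (note the `fn` keyword).
COMPILER_SOURCE = r'''
fn compile_toy(source):
    lines = []
    for line in source.split("\n"):
        stripped = line.lstrip()
        if stripped.startswith("fn "):
            indent = line[:len(line) - len(stripped)]
            line = indent + "def " + stripped[3:]
        lines.append(line)
    return "\n".join(lines)
'''


def load(python_text):
    """Turn compiled output into something runnable (our 'binary')."""
    namespace = {}
    exec(python_text, namespace)
    return namespace["compile_toy"]


# Step 1: the host-language compiler compiles the compiler written in Toy.
stage1_text = stage0_compile(COMPILER_SOURCE)
stage1 = load(stage1_text)

# Step 2: the result compiles its own source. Stage 0 is no longer needed.
stage2_text = stage1(COMPILER_SOURCE)
stage2 = load(stage2_text)

# Step 3: compile once more and compare. Getting the same output again is
# the usual sanity check that the self-hosted compiler is stable.
stage3_text = stage2(COMPILER_SOURCE)
assert stage2_text == stage3_text
```

In this toy all three outputs happen to be identical. With real compilers the first stage often differs from the later ones, because it was produced by a different code generator, so the comparison is normally made between the last two stages.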

Note that for Go specifically, there are multiple different compilers for Go, and at least gccgo continues to be written in C++; there is no Go code in gccgo. So, you can always use gccgo to re-start the bootstrapping process, should you ever lose your compiled Go binary.

A compiler that is written in the language it compiles, and that is capable of compiling itself, is called "self-hosting". There are a couple of advantages to a self-hosting compiler:

- When working on a compiler, you need to know three languages: the language you are compiling (the source language), the language you are compiling to (the target language), and the language you are writing the compiler in (the implementation language). Self-hosting allows you to get rid of one of them. This increases the number of people able to work on the compiler by reducing how much a potential contributor needs to know.
- Production-grade industrial-strength high-performance compilers are large, complex, resource-intensive programs. They are a good test for your language (can your language's abstraction features handle such a large and complex project?) and your compiler (if the compiler can compile itself, then it probably also can compile other large, complex programs).
- If your compiler is very simple, the code of the compiler can serve as a specification of the language's behavior. (In general, production-grade compilers aren't simple and simple compilers aren't production-grade, though. Also, this shouldn't be your only specification, otherwise you'll never be able to tell whether or not your compiler is correct.)
- Self-hosting is considered to be an important milestone for a language.

There are also some disadvantages:

- The complex bootstrap process.
- If the compiler writer is also the language designer, there is the danger that he will add features that he can use while writing the compiler, and leave out features that are hard to write a compiler for, thus ending up with a language that is only good for writing compilers and nothing else. (That's not necessarily a bad thing if you are designing a language for writing compilers.)

In one of his articles, Prof. Niklaus Wirth gave a nice example of the latter: when designing the Oberon language, he wrote the compiler at the same time as he was designing the language. The system he was working on only had an obscure proprietary dialect of Fortran. After some time, he realized that he had subconsciously left out or changed features that would make it easier to write programs in Oberon because he couldn't think of a nice way to implement them in the obscure Fortran dialect. So, he threw away the compiler, re-examined his design and started a new compiler in Oberon itself. Oberon was intended to be a systems programming language, so writing a compiler and a standard library in it was a natural choice.

Now, the question is, how did he solve the bootstrap problem? Well, he was a professor, after all: he handed out portions of the compiler to his students to translate by hand into Fortran.

score 3 · Answer 2 · answered Oct 13 '16 at 23:04

This is a very simple question with a very deep answer. A full explanation is beyond the scope of a site like this, but I'll give you enough to get started learning about it.

First, what is a programming language? Or, better put, what defines a programming language? Most professional developers would agree that a language is defined by two primary things: the compiler/interpreter and the standard library. The compiler sets out the syntactical and semantic rules of how the language works, and the standard library helps to establish paradigms and idioms for what the language is most useful for.

Understanding that, we can answer your question. The quick version is that if a programming language is capable of basic file IO and data processing, it's capable of implementing a compiler for a language, including its own language. How do they write Go in Go? They wrote a Go compiler in Go, of course!

But that's not the thing you really want to know, is it? You're asking about the chicken-and-egg problem inherent in that statement: how can you write the first compiler in a language that doesn't have a compiler for itself yet?

Obviously you can't, so you write the first compiler in another language. This is known as "bootstrapping." It doesn't even need to implement the entire language, though; just enough that you'd be able to compile a Go compiler that can do the same things as your bootstrap compiler can do (i.e., compile itself).

At that point, you have a working compiler in Go, and you can then build new features into the compiler, compile them, and have a new and improved compiler, and so on, until you've built up the full language as designed.
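Here is a toy sketch of that growth step, under heavy assumptions: "Toy" is a made-up language that is Python with `fn` instead of `def`, Python plays the role of the "other language", and the feature being added is a hypothetical `unless` statement. None of these names come from Go.

```python
def bootstrap_compile(source):
    """Bootstrap compiler in the host language. Understands only `fn`."""
    out = []
    for line in source.split("\n"):
        body = line.lstrip()
        indent = line[:len(line) - len(body)]
        if body.startswith("fn "):
            body = "def " + body[3:]
        out.append(indent + body)
    return "\n".join(out)


def load(python_text):
    """Make compiled output runnable and return its compile_toy function."""
    namespace = {}
    exec(python_text, namespace)
    return namespace["compile_toy"]


# Version 2 of the compiler, written in Toy using only `fn`.
# It adds a new feature to the language: `unless cond:`.
V2_SOURCE = r'''
fn compile_toy(source):
    out = []
    for line in source.split("\n"):
        body = line.lstrip()
        indent = line[:len(line) - len(body)]
        if body.startswith("fn "):
            body = "def " + body[3:]
        if body.startswith("unless ") and body.endswith(":"):
            body = "if not (" + body[7:-1] + "):"
        out.append(indent + body)
    return "\n".join(out)
'''

# Version 3 is free to use `unless` in its own source, because the
# version 2 compiler can already translate it.
V3_SOURCE = r'''
fn compile_toy(source):
    out = []
    for line in source.split("\n"):
        body = line.lstrip()
        unless body:
            out.append(line)
            continue
        indent = line[:len(line) - len(body)]
        if body.startswith("fn "):
            body = "def " + body[3:]
        if body.startswith("unless ") and body.endswith(":"):
            body = "if not (" + body[7:-1] + "):"
        out.append(indent + body)
    return "\n".join(out)
'''

compiler_v2 = load(bootstrap_compile(V2_SOURCE))  # built by the bootstrap compiler
compiler_v3 = load(compiler_v2(V3_SOURCE))        # built by version 2

# The bootstrap compiler cannot build version 3 directly: it leaves
# `unless` untranslated, so the result is not valid host-language code.
try:
    load(bootstrap_compile(V3_SOURCE))
    bootstrap_handles_v3 = True
except SyntaxError:
    bootstrap_handles_v3 = False
```

Each version only has to be compilable by the one before it, which is why the bootstrap compiler can stay small while the language keeps growing.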

Mason Wheeler

Not too relevant to the question at hand, but I take slight issue with your definition of a programming language. Really, a language is independent of its interpreter or compiler. A language could (and I believe should) be defined through formal syntax and semantic systems. It's not even that difficult. Using the compiler as the *de facto* definition is a huge flaw, in my opinion. – gardenhead Oct 14 '16 at 00:00
I feel like the usage of "define" in the second paragraph can be misleading. I think the intent was not to use the compiler as a formal specification (as @gardenhead pointed out), but to assert that even different compilers for the same language can change your perception of that language, as they have different characteristics. These differences can range from differently generated assembly instructions (e.g. better optimization) to whole new language constructs (e.g. C++ compiler vendor extensions) to having a "real" compiler instead of an interpreter. – hoffmale Oct 14 '16 at 00:28
@gardenhead The compiler **is** the *de facto* definition of a language. The formal specification (when there is one) is the ideal of the language, but when you get down to it, the code does what the compiler says it does. – Mason Wheeler Oct 14 '16 at 00:47
@MasonWheeler You are wrong, and that is an extremely warped view you have. If the compiler doesn't do what the specification says, then the compiler is buggy. We should fix buggy code, not rebrand it as a feature. By your logic, if the processor is buggy, then oh well, we just got a new version of the language - everybody better get used to it. – gardenhead Oct 14 '16 at 01:02
@MasonWheeler: The question is … which compiler defines the language? There are two compilers for Go, `gc` (the original implementation, created from scratch, originally written in C, now in Go, modeled after the Plan9 C compiler) and `gccgo` (a Go frontend for GCC, originally written in C, now in C++); there used to be a third, proprietary commercial one, designed specifically for running on and compiling for Windows, but the company seems to have folded (interestingly this one was written 100% in Go, long before `gc` was). There are about 50-100 compilers for C (and 2 interpreters). – Jörg W Mittag Oct 14 '16 at 06:44
@JörgWMittag Then you have multiple different versions of the language, obviously. – Mason Wheeler Oct 14 '16 at 12:32

score 1 · Answer 3 · edited May 23 '17 at 12:40

In the same way that I can use English to describe and define English.

Programming languages aren't written as such. Instead they are more of a description of how they should work.

The compiler is the actual implementation. Originally I think it was written in C. But once you can compile the language, you can then create a compiler with that language.

Similar question: When someone writes a new programming language, what do they write it IN?

answered Oct 13 '16 at 22:57

Rowan Freeman

You are right, `gc` was originally written in C. Plus, there is `gccgo`, which is still written in C++, and probably won't be re-written in Go anytime soon (or ever). – Jörg W Mittag Oct 13 '16 at 23:58
