5

This Turing award lecture by Ken Thompson on topic "Reflections on Trusting Trust" gives good insight about how C compiler was made in C itself.

Though I understand the crux, it still hasn't sunk in. So ultimately, once the compiler is written to do lexical analysis, parse trees, syntax analysis, byte code generation etc, a separate machine code is again written to do all that on compiler?

Can anyone please explain with a small example of the procedure? Bootstrapping on wiki gives good insights, but only a rough view on it.

PS: I am aware of the duplicates on the site, but found them to be an overview which I am already aware

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
Jatin
  • 819
  • 8
  • 16
  • See the Wikipedia articles on [Self-Hosting](http://en.wikipedia.org/wiki/Self-hosting) and [Compiler Bootstrapping](http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29). – Robert Harvey Nov 28 '12 at 06:36
  • See also http://programmers.stackexchange.com/questions/167277 – Robert Harvey Nov 28 '12 at 06:39
  • They only provide an overview, I aware of bootstrapping and self-hosting program but want to have a look at some simple examples – Jatin Nov 28 '12 at 06:46
  • http://www.rano.org/bcompiler.html – Robert Harvey Nov 28 '12 at 06:51
  • see also: [Why are self-hosting compilers considered a rite of passage for new languages?](http://programmers.stackexchange.com/questions/263651/why-are-self-hosting-compilers-considered-a-rite-of-passage-for-new-languages) – gnat Jan 04 '16 at 10:08

4 Answers4

9

Step 1. Write your compiler in a different language.
Step 2. Compile the code from Step 1.
Step 3. Write your compiler in the same language.
Step 4. Compile the code from step 3.

Repeat Steps 3-4 for any further updates to your compiler.

Note that steps 1 and 3 may happen simultaneously or in a different order.

I admit that this is very much an oversimplification.

Brian
  • 4,480
  • 1
  • 22
  • 37
  • 1
    @RobertHarvey: In step 4, I did not specify which compiler to use on purpose so that "Repeat Steps 3-4" will make more sense. – Brian Nov 28 '12 at 06:55
  • Won't step 4 only work if you use a compiler that recognizes the new language? – Robert Harvey Nov 28 '12 at 06:56
  • @RobertHarvey: Yes, but I deliberately didn't specify it so that "Repeat Steps 3-4" did not mean "use the compiler from Step 1", as this would defeat the purpose of the exercise. I assumed it went without saying that you should use the compiler from Step 1 on the first iteration (it's the only compiler you have) and the compiler generated from the previous iteration of Step 3 on each additional iteration. – Brian Nov 28 '12 at 07:01
8

Lets say you are writing a new programming language named "XYZ". The first step is to write a compiler for this language. Since this new language doesn't exist yet, you write the compiler in... lets say java. We'll call it jxyz. Doing this process is a typical class in college.

Now you have a java program (jxyz) that takes an XYZ source file and produces an executable. It is then possible to undertake writing a compiler for XYZ in XYZ that is complied by jxyz.

At this point, you have a complier for XYZ that was complied with jxyz. Well call this program 'xyzFromJ'.

'xyzFromJ' should be able to take itself as an input and and compile itself completely removing anything created by jxyz from the dependencies and definition of the language. From this point on, any changes to the XYZ language can be done on the compiler written in XYZ and compiled using itself.

5

Unlike some other answers that suggest writing the compiler in some other language (presumable one that runs on the same machine you are targeting), it is also possible (actually preferable) to write the compiler in the target language from the beginning.

Say you want to write a compiler for C for an ARM processor. You already have a C compiler that runs on Windows (Intel architecture). You write your new compiler in C, both the front end (lexical analysis and parsing), and back end (code generation). The back end of course is written to generate code for the ARM, not Intel.

You then compile the source for the new compiler with the existing compiler. Once you are satisfied the compiler is generating valid code for the ARM, you take the compiled ARM code and run it on the target ARM architecture. You now have the source for the new compiler, that can be modified and fed into it's executable and generate a new version.

This is process is called cross-compiling. The advantage is that you only have to write the compiler once.

tcrosley
  • 9,541
  • 1
  • 25
  • 41
3

from another answer:

Step 1. Write your compiler in a different language.
Step 2. Compile the code from Step 1.
Step 3. Write your compiler in the same language.
Step 4. Compile the code from step 3.

Steps 1 and 2 are only needed for the "first ever" compiler for a new language. There are lots of variations on steps 3 and 4, so for example you can write the code generator for a new architecture, cross compile, then run the same compiler on a new machine. Or you can extend your compiler to handle new syntax and constructs without using any of the extensions, then rewrite the compiler to use the extensions and compile itself.

ddyer
  • 4,060
  • 15
  • 18