47

Possible Duplicate:
How could the first C++ compiler be written in C++?

I know my question goes to the underground galaxy cave where languages are born and involves some lambda math and light-years of google-studying. But what kind of knowledge is necessary to create a language?

H_7
  • 559
  • 1
  • 4
  • 12
  • 24
    Possible duplicate: "Chickens come from eggs but eggs come from chickens, how is this possible?" :) – George Duckett Mar 26 '12 at 15:48
  • 1
    1st appear: *"the CREATOR"*, 2nd to get in: *"the egg."* 3rd and last *"the chicken"* . Learned from [this guys](http://www.wibit.net/), they are very funny! – H_7 Mar 26 '12 at 16:09
  • @Caleb I guess not since C++ is build upon C, but the answer for both questions looks the same. hmm... my question is more about C, before C++. – H_7 Mar 26 '12 at 16:16
  • 2
    Unless you are from the Star Wars universe (where parsecs are apparently units for measuring time and not distance), you could change that to "years of google studying" :-) – Francesco Mar 26 '12 at 16:35
  • 1
    @Francesco I am from Star Wars universe, nice to meet you. – H_7 Mar 26 '12 at 22:00
  • 1
    Vote to reopen, the best answer in the other question seems fairly C++ specific. C != C++. Also, the question here seems like it might be a tad more general. – Doug T. Mar 27 '12 at 21:45
  • @Doug T. Exact my point, since C came before C++. – H_7 Mar 28 '12 at 02:07

4 Answers4

85

Look up "bootstrapping".

Basically you start with a very minimal process/set of functions that can be used to compile the code that defines a slightly more functional compiler. This creates your next compiler which then can then be used to build code that can do even more. You repeat this process until you have a full blown compiler that can compile all the language features.

The other alternative is to write the first version of the compiler in a different language and then write the next version in your target language.

ChrisF
  • 38,878
  • 11
  • 125
  • 168
  • 4
    I've always been baffled by the idea of a compiler being able to compile itself, but that cleared it up quite a bit. Thanks! – Andy Hunt Mar 26 '12 at 15:07
  • 1
    The very first compiler may be written in another language too, and then rewritten as soon the new compiler has enough features. – marcus Mar 26 '12 at 15:10
  • @marcus - good point, I'll update the answer. – ChrisF Mar 26 '12 at 15:16
  • 4
    It would be really interesting to see a list of self-compiling languages, as well as the languages they were *originally* written in. – Plutor Mar 26 '12 at 15:23
  • I'd wager that the first C compiler was probably written in assembly. – Andy Mar 26 '12 at 15:40
  • There is a third alternative: for an extensible language, implement a minimal subset of its core in another language (it can even be an ad hoc interpreter), and then grow the language to its full power and use it for implementing its own compiler. – SK-logic Mar 26 '12 at 15:40
  • Particularly mind-bending are (mythical?) compilers that realize they are compiling themselves and inject instructions into the executable that are not actually in the source code (including logic for future self-propagation). The source code for this logic exists only in the seminal compiler, thereafter propagated only in binary form. A _compiler worm_, as it were. – mcmcc Mar 26 '12 at 15:42
  • 3
    @mcmcc That would be the scenario described in [Reflections on Trusting Trust](http://cm.bell-labs.com/who/ken/trust.html), written by [Ken Thompson](http://en.wikipedia.org/wiki/Ken_Thompson). Whether or not anyone's actually snuck code like that into a mainstream compiler is questionable. – Tacroy Mar 26 '12 at 15:53
  • @Tacroy: Thanks, that's exactly the reference I couldn't quite put my finger on. If anybody has attempted it (and I'm sure someone somewhere has), I doubt it ever made it out into the wild. – mcmcc Mar 26 '12 at 15:59
  • The question of whether Ken Thompson actually did it is addressed here: http://skeptics.stackexchange.com/questions/6386/was-the-c-compiler-trojan-horse-written-by-ken-thompson-ever-distributed – Oddthinking Mar 26 '12 at 16:31
  • _You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.)_ Words from Ken speech. Scenario or reality, @Tacroy? – H_7 Mar 26 '12 at 22:09
  • 7
    When two compilers love each other very much... – Joe Mar 26 '12 at 22:41
  • As a physical analogue, say you want to make a hammer but you have no tools. One of the tools you need for making a hammer is a hammer. So maybe you start out using a rock as a crude hammer and you make a nicer hammer. With this nicer hammer, you then make the hammer you wanted in the first place. – Roger Dahl Dec 25 '14 at 17:09
  • When related to compilers, bootstrapping is also called [self hosting](https://en.wikipedia.org/wiki/History_of_compiler_construction#Self-hosting_compilers). – Roger Dahl Dec 25 '14 at 17:11
25

ChrisF's answer is excellent, but I wanted to add this example that always stuck by me after my computer science course on bootstrapping.

Suppose you have a basic C compiler that does not support escape codes for strings yet, and you wanted to add that. You could add a snippet of code similar to this:

if( str[i] == 0x5c ) {       // ASCII code for backslash
   switch( str[i+1] ) {
      case 'n': return 0x0a; // ASCII code for new line
      case 't': return 0x09; // ASCII code for tab
      // ...                 // more ASCII code for other escapes
      default: return str[i+1];
   }
}

After you have added this to the compiler, and generated a new compiler binary, you could then rewrite this into:

if( str[i] == '\\' ) { 
   switch( str[i+1] ) {
      case 'n': return '\n';
      case 't': return '\t';
      // ...
      default: return str[i+1];
   }
}

That would remove any knowledge about ASCII codes from the compiler source code, but the compiler would still magically generate the correct codes.

fishinear
  • 375
  • 2
  • 7
19

Bootstrapping is definitely the standard way to build a compiler today. But remember that you don't need a compiler or interpreter to write a program in a language. For instance, Christopher Strachey wrote a famous AI program that was able to play Checkers in CPL before there was a compiler for CPL. He had to translate the program to machine code "manually", which is tedious and error-prone, but not really difficult (that's why computers can do it so well).

nikie
  • 6,303
  • 4
  • 36
  • 39
10

I hope this is not out of topic, but I wanted to point out that, once you have one C compiler for one platform X, bootstrapping for other platforms can be done by using cross-compilation:

  • You have a C compiler c1 for architecture X that runs on architecture X.
  • You write a C compiler c2 for architecture Y, written in C.
  • You compile the compiler c2 on X using c1 and obtain the binary for compiler c2 that runs on X.
  • You use the binary for c2 that runs on X to compile itself and obtain a binary of c2 that will run on Y.

In other words, when you have the first egg, it is easy to make more eggs.

Giorgio
  • 19,486
  • 16
  • 84
  • 135