74

C is one of the most widely used languages in the world. It accounts for a huge proportion of existing code and continues to be used for a vast amount of new code. It's beloved by its users, it's so widely ported that being able to run C is to many the informal definition of a platform, and it's praised by its fans for being a "small" language with a relatively clean set of features.

So where are all the compilers?

On the desktop, there are (realistically) two: GCC and Clang. Thinking about it for a few seconds, you'll probably remember Intel exists as well. There are a handful of others, far too obscure for the average person to name and almost universally not bothering to support a recent language version (or often even a well-defined language subset, just "a subset"). Half of the members of this list are historical footnotes; most of the rest are very specialized and still don't actually implement the full language. Very few actually seem to be open-source.

Scheme and Forth - other small languages that are beloved by their fans for it - probably have more compilers than actual users. Even something like SML has more "serious" implementations to choose between than C. Meanwhile, the announcement of a new (unfinished) C compiler aiming at verification actually draws some pretty negative responses, and veteran implementations struggle to attract enough contributors even to catch up to C99.

Why? Is implementing C so hard? It isn't C++. Do users simply have a very skewed idea about what complexity group it falls in (i.e. that it actually is closer to C++ than Scheme)?

  • 64
    MSVC still counts, as a C89 compiler at least. Probably more popular than Intel even. – Rufflewind Feb 19 '15 at 02:50
  • 23
    [Wikipedia](http://en.wikipedia.org/wiki/List_of_compilers#C_compilers) lists quite a few C compilers. They get *very* common when you find yourself in the embedded realm. –  Feb 19 '15 at 02:51
  • 114
    how many compilers do you need to compile your C code? – Bryan Chen Feb 19 '15 at 02:57
  • 11
    I can't keep all of C's stupid little idiosyncrasies and gotchas in my head. SML is a relatively simple language by comparison. Also, there's a ton of C code out there that may or may not be using those obscure language features or GCC extensions or relying on implementation-defined behavior or even undefined-behavior and the author *will* complain about how your compiler does things the standard says it's allowed to do but breaks their code. Meanwhile many of the SML compilers may be academic experiments to see what the language would be like if you added X feature. – Doval Feb 19 '15 at 03:51
  • 4
    If an excellent modern optimizing compiler exists for a target platform, why bother re-inventing the wheel? Also, lots of time and energy is being dedicated to inventing new languages ( Go, Scala, Clojure, Dart, Swift, etc. ) and writing compilers for them, and more energy is being dedicated to Java and the JVM. C by comparison is very mature. – Robert Munn Feb 19 '15 at 08:58
  • 3
    Writing a Scheme compiler/interpreter in your newly designed language is a simple way to test it (and its compiler). tiny spec, much good sample code with which to test. So there are as many Scheme implementations out there as there are languages, more or less. – itsbruce Feb 19 '15 at 10:32
  • 3
    What about [CodeWarrior](http://en.wikipedia.org/wiki/CodeWarrior) and [Borland C++](http://en.wikipedia.org/wiki/Borland_C%2B%2B)? – Peter Mortensen Feb 19 '15 at 17:44
  • 6
    @Rufflewind The biggest advantage of the MSVC Compiler is not the compiler but the debugger. IMHO, if the debugger wasn't that excellent, almost no C code would be compiled with MSVC anymore. But no debugger could keep up with the MSVC Debugger. It's outstanding. – eckes Feb 19 '15 at 18:21
  • 79
    The question is based upon a false premise. Analog Devices, armcc, Bruce's C Compiler, the Bare-C Cross Compiler, the Borland compiler, the clang compiler, the Cosmic C compiler, the CodeWarrior compiler, the dokto compiler, the Ericsson compiler, and I'm not even out of the first five letters of the alphabet yet. There is an *insanely large* number of C compilers. The question is "why are there so few C compilers, if we don't count these several dozens as real C compilers?" You have defined away the vast majority of C compilers as not interesting, which is why there are not very many of them. – Eric Lippert Feb 19 '15 at 18:45
  • 19
    "Why" questions are bad questions for this site at the best of times, and "why not?" questions are worse. If I were to meet you at a party and ask "so, why don't you race sailboats?" I think you'd rightly find it to be an odd question. You don't need to provide a justification for NOT engaging in a technically difficult, physically risky and very expensive hobby. Writing any non-trivial piece of software is expensive, difficult and risky and therefore requires an *enormous* motivator. A better question would be "why are there so many C compilers?" It is surprising that there is more than one. – Eric Lippert Feb 19 '15 at 18:53
  • 7
    @BryanChen Compiler monoculture leads to things like Ken Thompson's "trusting trust" virus. The more compilers in use, the easier it is to use David A. Wheeler's "diverse double-compiling" construction to [defeat "trusting trust"](http://programmers.stackexchange.com/questions/184874/is-ken-thompsons-compiler-hack-still-a-threat). – Damian Yerrick Feb 20 '15 at 03:42
  • In context I should probably mention that this question was actually inspired by me being *unable* to find an existing C implementation that suited a very unusual requirement (spent ages looking for a C interpreter - I have my reasons - that at least *tried* to be complete and compliant. Far as I can tell it doesn't exist. Yes I know this is a ridiculous edge case). –  Feb 20 '15 at 04:03
  • 5
It's all a question of perception: For embedded devices (which take > 98% of the annual world-wide CPU production, about 50% still being 8-Bit MCUs or DSPs) there are hundreds of commercial C compilers. You probably have never heard about Greenhills or Tasking. However, both are dominant in the automotive industry. Given that even an ordinary car today contains 30–90 built-in microprocessors, there are probably more CPUs in the world running code from one of these compilers than, let's say, code generated by MSVC or clang. – Daniel Feb 20 '15 at 09:01
  • 40 something, free, is not few: [Free C/C++ Compilers and Interpreters](http://www.thefreecountry.com/compilers/cpp.shtml), [Free C/C++ Compilers for Handheld Devices, Micro-controllers, Embedded Systems and Calculators](http://www.thefreecountry.com/compilers/cpp-microcontrollers-pda.shtml) both at thefreecountry.com –  Feb 20 '15 at 09:43
@tepples: Any idea why my answer on that "trusting trust" question you linked got -4 votes? It suggests a procedure which should allow one to ensure that one is using a reliable compiler. – supercat Feb 20 '15 at 19:42
  • 6
    It's a bit like asking "why are there so few browsers, apart from Firefox and Chrome?" --- there are many more, but they are platform-specific, less widespread and/or with less features. It seems a common trend that in every area the community tends to converge on a very small number of big players. Another example is search engines, or Linux GUIs. – Federico Poloni Feb 20 '15 at 19:44
  • 2
    "There are a handful of others, far too obscure for the average person to name"...isn't that *every* SML compiler? – Paul Draper Feb 22 '15 at 02:40
  • I can't improve on Eric Lippert's answer regarding this question's false premise; this question is very naive in that it completely disregards (or is ignorant of) the 97% of CPUs (literally 10s of billions of them) that run in embedded systems. Not everything is a PC or a server... BTW the 97% and 10s of billions comes from a study that I read within the last year or two (Gartner?), I'm not just pulling the figures out of thin air, but I don't have the time to track it down right now, sorry... – Radian Feb 23 '15 at 20:27
  • @Radian proportion of different types of CPU doesn't really affect the question at all. Even assuming each one has its own compiler, those still aren't competitive with GCC/Clang in GCC/Clang's operational space (or more likely don't exist in it at all). –  Feb 23 '15 at 23:00
  • @Leushenko - I suppose if the question was worded, "Why are there so few C compilers ***on the desktop***" I would see the point. And I'd be very careful talking about being "competitive" (assuming you mean based on code generation) - cross-compilers that I use such as IAR or Keil routinely generate smaller & faster code than GCC (haven't benchmarked Clang as a cross-compiler). Do you have a lot of experience with cross compilers? I'm not sure I see your point, particularly when GCC (and perhaps Clang) /is/ used in the same (operational?) space as the many cross-compilers available. – Radian Feb 24 '15 at 02:36
  • You might enjoy OTCC - 2048 bytes - https://bellard.org/otcc/ – Thorbjørn Ravn Andersen Dec 16 '21 at 19:59

5 Answers

160

Today, a real C compiler needs to be an optimizing compiler, notably because C is no longer a language close to the hardware: current processors are incredibly complex (out-of-order, pipelined, superscalar, with complex caches & TLBs), so they require instruction scheduling and much more from the compiler. Today's x86 processors are not like the i386 processors of the previous century, even if both are able to run the same machine code. See the paper *C Is Not a Low-Level Language* (*Your computer is not a fast PDP-11*) by David Chisnall.

Few people are using naive non-optimizing C compilers like tinycc or nwcc, since they produce code which is several times slower than what optimizing compilers can give.

Coding an optimizing compiler is difficult. Notice that both GCC and Clang are optimizing some "source language-neutral" code representation (Gimple for GCC, LLVM for Clang). The complexity of a good C compiler is not in the parsing phase!

In particular, making a C++ compiler is not much harder than making a C compiler: parsing C++ and transforming it into some internal code representation is complex (because the C++ specification is complex) but well understood, while the optimization parts are even more complex. Inside GCC, the middle-end optimizations (source-language- and target-processor-neutral) form the majority of the compiler, with the rest balanced between front-ends for several languages and back-ends for several processors. Hence most optimizing C compilers are also able to compile some other languages, like C++, Fortran, D, ... The C++-specific parts of GCC are about 20% of the compiler...

Also, C (or C++) is so widely used that people expect their code to be compilable even when it does not exactly follow the official standards, which do not define the semantics of the language precisely enough (so each compiler may have its own interpretation of it). Look also into the CompCert proved C compiler and the Frama-C static analyzer, which care about more formal semantics of C.

And optimizations are a long-tail phenomenon: implementing a few simple optimizations is easy, but they won't make a compiler competitive! You need to implement a lot of different optimizations, and to organize and combine them cleverly, to get a real-world compiler that is competitive. In other words, a real-world optimizing compiler has to be a complex piece of software. BTW, both GCC and Clang/LLVM have several internal specialized C/C++ code generators. And both are huge beasts (several million source lines of code, with a growth rate of several percent each year) with a large developer community (a few hundred people, working mostly full-time, or at least half-time).

Notice that there is no (to the best of my knowledge) multi-threaded C compiler, even if some parts of a compiler could be run in parallel (e.g. intra-procedural optimization, register allocation, instruction scheduling... ). And parallel build with make -j is not always enough (especially with LTO).

Also, it is difficult to get funding for coding a C compiler from scratch, and such an effort needs to last several years. Finally, most C or C++ compilers are free software today (there is no longer a market for new proprietary compilers sold by startups) or at least are monopolistic commodities (like Microsoft Visual C++), and being free software is nearly required for compilers (because they need contributions from many different organizations).

I'd be delighted to get funding to work on a C compiler from scratch as free software, but I am not naive enough to believe that is possible today!

Basile Starynkevitch
  • 32,434
  • 6
  • 84
  • 125
  • Maybe I'm overestimating the number of people who still enjoy working in C as an app-level language enough to settle for the kind of `-O1`-or-worse performance a smaller compiler like TCC can give. – Alex Celeste Feb 19 '15 at 08:08
  • 1
But *tinycc* doesn't give `gcc -O1` performance at all (rather `-O0` or worse) – Basile Starynkevitch Feb 19 '15 at 08:14
  • Could have sworn the docs claimed it did, but they don't... don't know where I got that idea now. – Alex Celeste Feb 19 '15 at 08:51
  • 14
    `(there is no more a market for proprietary compilers` Tell that to the Visual Studio team... – Mason Wheeler Feb 19 '15 at 10:15
  • 19
    Microsoft has a monopoly. I meant that small companies developing new C compilers won't sell a lot of them. Can you name a recent proprietary competitor to MSVC? – Basile Starynkevitch Feb 19 '15 at 10:23
  • 15
    There are many proprietary compilers in the HPC world. PGCC, NAG, and ICC are the most widely used. – Davidmh Feb 19 '15 at 13:09
  • 40
    @MasonWheeler: VS is given away for free nowadays (as in beer). The non-free versions add tooling, but the C compiler in VS2013 is the same in all versions. There just isn't a market, not even for them. – MSalters Feb 19 '15 at 16:06
  • 2
    Re no parallel/multithreaded C compilers, perhaps this is because most people break their projects into many independent files, which can be compiled in parallel with "make -j". – jamesqf Feb 19 '15 at 18:36
But for many translation units, compilation lasts long enough to make parallel compilation worthwhile (even if it is extremely difficult to do) – Basile Starynkevitch Feb 19 '15 at 18:38
  • 1
    Optimizing C++ might actually be easier than optimizing C, since C++ contains mechanisms that are designed to be optimized. – Robert Harvey Feb 19 '15 at 20:11
  • @RobertHarvey: in practice both GCC & LLVM are optimizing a quite low level common internal representation, so I don't think that C++ helps... – Basile Starynkevitch Feb 20 '15 at 05:38
  • 3
    @BasileStarynkevitch: C++ actually helps in at least two ways: making more code (longer instruction sequences) available to the optimizer, and reducing false aliasing (`char*` aliases everything, `std::string` virtually nothing). The optimizer may be shared, but it works on different inputs. – MSalters Feb 20 '15 at 12:12
  • 3
    But both GCC & LLVM are operating on much lower representations, and they optimize likewise C++ & C (& Ada & Fortran, for GCC) code. I would on the contrary say that C++ requires more optimization (notably when compiling code using its STL) than C! – Basile Starynkevitch Feb 20 '15 at 12:17
  • 1
    @Basile What MSalters is saying is that C++ code will generate different intermediate language than what equivalent C code would. Sure everybody works on some form of SSA, etc. these days, but that doesn't change much. To use MSalter's example: If the compiler has to assume that two variables might alias it will have to reload one after writing to the other - if the c++ code allows the compiler to conclude that they can't, this will lead to different SSA form. – Voo Feb 20 '15 at 23:09
When you look into [GIMPLE](https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html) (or Gimple/SSA), most of the representation is common to C & C++; however, GIMPLE added some instructions for C++ exceptions. – Basile Starynkevitch Feb 21 '15 at 06:53
  • 3
    _"In particular, making a C++ compiler is not much harder than making a C compiler"_ You must be joking. – Lightness Races in Orbit Jan 03 '16 at 20:40
  • 2
    No, most of GCC code is in middle-end, and that is both independent of source language and of target processor. More than a half of GCC is middle-end, a quarter is all the front-ends, and a quarter is the back-ends. C++ parsing is less than 20% of GCC. – Basile Starynkevitch Jan 03 '16 at 21:49
  • Modern x86 processors don’t need instruction scheduling created by the compiler, nor does POWER. – gnasher729 Oct 26 '17 at 07:12
  • 1
    They don't *need* it but they take *advantage* of it. Bad ordering of instruction would slow down the processor pipeline. – Basile Starynkevitch Oct 26 '17 at 07:49
  • 1
    Out-of-order execution does make instruction scheduling *much* less important than in the past (or present where in-order ARM is still a thing, and Knight's Corner Xeon Phi uses in-order cores, like Atom pre-silvermont. But mainstream x86 is all out-of-order, even low-power Silvermont has a small OoO window). Robust front-ends in modern CPUs (like Sandybridge-family) again make ordering to avoid fetch/decode problems not usually an issue. Moving independent instructions around without changing the pattern of dependencies usually has only a tiny effect, if any. – Peter Cordes Dec 02 '17 at 12:44
72

I would like to contest your underlying assumption that there are only a small number of C implementations.

I don't even know C, I don't use C, I am not a member of the C community, and yet, even I know far more than the few compilers you mentioned.

First and foremost, there is the compiler which probably completely dwarfs both GCC and Clang on the desktop: Microsoft Visual C. Despite the inroads that both OSX and Linux have been making on the desktop, and the market share that iOS and Android have "stolen" away from former traditional desktop users, Windows is still the dominant desktop OS, and the majority of Windows desktop C programs are probably compiled using Microsoft tools.

Traditionally, every OS vendor and every chip vendor had their own compilers. Microsoft, as an OS vendor, has Microsoft Visual C. IBM, as both an OS vendor and a chip vendor, has XLC (which is the default system compiler for AIX, and the compiler with which both AIX and i/OS are compiled). Intel has their own compiler. Sun/Oracle have their own compiler in Sun Studio.

Then, there are the high-performance compiler vendors like PathScale and The Portland Group, whose compilers (and OpenMP libraries) are used for numbercrunching.

Digital Mars is also still in business. I believe Walter Bright has the unique distinction of being the only person on the planet who managed to create a production-quality C++ compiler (mostly) by himself.

Last but not least we have all the proprietary compilers for embedded microcontrollers. IIRC, more microcontrollers are sold every year than desktop, mobile, server, workstation, and mainframe CPUs have been sold in the entire history of computing combined. So, those are definitely not niche products.

An honorary mention goes out to TruffleC, a C interpreter(!) running on the JVM(!) written using the Truffle AST interpreter framework that is only 7% slower than GCC and Clang (whichever is fastest on any given particular benchmark) across the Computer Language Benchmarks Game, and faster than both on microbenchmarks. Using TruffleC, the Truffle team was able to get their version of JRuby+Truffle to execute Ruby C extensions faster than the actual C Ruby implementation!

So, these are 6 implementations in addition to the ones you listed which I can name off the top of my head, without even knowing anything about C.

Jörg W Mittag
  • 101,921
  • 24
  • 218
  • 318
  • 1
    Outside of Microsoft Visual C, most of the C compilers you are mentioning are rarely used. – Basile Starynkevitch Feb 19 '15 at 06:36
  • 7
    MSVC is the big C++ compiler, but for C it's hard to use and permanently stuck in C89; microcontroller compilers are usually target-specific, stuck in C89, and quirky; TruffleC doesn't appear to be available yet (but is interesting, thanks). Pathscale and Digital Mars seem more like the kind of counterexamples I was looking for though. – Alex Celeste Feb 19 '15 at 08:05
@Leushenko So what? I can write C89 code perfectly fine. You're missing out on some nice things, but if you want to be portable, you most likely still rely on C89 anyway, especially considering the many servers out there still using some older version of GCC. – Mario Feb 19 '15 at 08:15
  • 8
    @Mario my meaning isn't that C89 is broken, but C89 is not the up-to-date form of the language; and that does mean fewer compilers *that are up-to-date* exist. – Alex Celeste Feb 19 '15 at 08:42
  • 8
    @Leushenko MSVC isn't *permanently* stuck in C89. There have been some discussions and more C99 features should be added. For starters, most of the C99 library is supported as of MSVC 2015 and a few language features too (mainly the things needed for C++11 though). – Morwenn Feb 19 '15 at 09:28
  • 1
    By contrast, I think there really are only two Java compilers (although I do recognise that compiling to bytecode is a completely different ball game) - C actually has more than other languages. – mjaggard Feb 19 '15 at 23:27
  • 5
    @Morwenn: Microsoft's policy appears to be that C99 solves no problems that C++ had not already solved, and that if you're doing system programming you should be using the C-like subset of C++ (anything that doesn't require the runtime or where you can't control where the compiler is going to put things - important if you need to ensure that code or data isn't paged out from states where paging is disabled). The only features from C99 will be things required in later C++ specs, and those which are no-brainers to implement. – Mike Dimmick Feb 20 '15 at 10:25
  • @Leushenko TruffleC itself is not available, but [Truffle is](http://lafo.ssw.uni-linz.ac.at/), and can be used for C if you write a parser that generates an AST. – nanofarad Feb 21 '15 at 17:29
  • 1
    @hexafraction: I believe TruffleC is part of the JRuby+Truffle implementation, which has already been merged into the main JRuby repository. – Jörg W Mittag Feb 22 '15 at 01:05
  • 1
    @hexafraction: Or maybe not, I couldn't find it at first glance. It would probably be possible to ask one of the Jruby+Truffle developers ([Chris Seaton](http://chrisseaton.com/rubytruffle/)) about it. Note also that there is an upcoming conference paper *Dynamically Composing Languages in a Modular Way: Supporting C Extensions for Dynamic Languages.*. – Jörg W Mittag Feb 22 '15 at 01:11
  • @hexafraction: "[Our C extension work is an early experiment, so the source code isn't available at the moment.](http://chrisseaton.com/rubytruffle/cext/)" Sorry. He does say, though, that they might mail you a preprint copy of the paper if you ask for it. – Jörg W Mittag Feb 22 '15 at 01:19
  • @JörgWMittag That's fine, no need to apologize. I'm actually currently [trying out Truffle independently of C](http://hextruffle.wordpress.com) but am far behind what is happening with Ruby at the moment. It does appear to be a promising technology in terms of actually executing C and other languages in the future. – nanofarad Feb 22 '15 at 01:26
  • 1
    @hexafraction: A couple of years ago, the Rubinius folks had this crazy idea of hooking up Clang to the Rubinius JIT (which is after all LLVM-based) in order to cross-language JIT compile C extensions at runtime together with the Ruby code that uses them (LLVM bitcode produced by Clang would be JITted and optimized together with LLVM bitcode produced by the Rubinius JIT, LLVM doesn't care where it comes from), but it was never more than a crazy idea. Well, turns out, it wasn't actually crazy after all! – Jörg W Mittag Feb 22 '15 at 01:30
  • 1
    Update: MSVC supports C11 and C17. By default, it's C89 plus extensions, some of which are in C99. – Adrian McCarthy Feb 21 '21 at 16:16
8

How many compilers do you need?

If they have different feature sets, you create a portability problem. If they're commoditised, you choose the "default" (GCC, Clang or VS). If you care about the last 5% of performance, you have a benchmark-off.

If you're doing programming language work recreationally or for research purposes, it's likely to be in a more modern language. Hence the proliferation of toy compilers for Scheme and ML. Although OCaml seems to be getting some traction for non-toy non-academic uses.

Note this varies a lot by language. Java has essentially the Sun/Oracle toolchain and the GNU one. Python has various compilers, none of which is really respected compared to the standard interpreter. Rust and Go have exactly one implementation each. C# has Microsoft and Mono.

pjc50
  • 10,595
  • 1
  • 26
  • 29
  • 1
    It's obvious that there are more interesting reasons to develop an ML compiler... I just thought that the C community being probably three orders of magnitude bigger would balance that effect out. But you might be right, `1000 * 0` is still `0`. – Alex Celeste Feb 19 '15 at 10:46
  • Creating a new compiler is often linked with fragmentation of the community (either caused by or causing). For example, the egcs vs gcc maintainer split. Also, C source compatibility tends to be below 100%. – pjc50 Feb 19 '15 at 11:28
  • @pjc50: The way the standard is written effectively subdivides C into a number of disjoint dialects based upon things like the basic type of `int`, and will require different compilers to interpret the same source code in very different ways. – supercat Feb 19 '15 at 20:09
  • 5
    I believe, Go has two implementations (the `6g`/`8g`/… toolchain and gccgo). There also used to be a very interesting proprietary commercial implementation called erGo, which was a) a native Windows implementation of Go at a time when neither gccgo nor the original Go compiler worked very well on Windows, b) a company betting on Go, long before it even became 1.0, and c) the first implementation of Go written in Go (gccgo and 6g/8g are both written in C). Both the project and the company vanished, however, before they even got out of closed beta. – Jörg W Mittag Feb 20 '15 at 00:22
7

C/C++ is unique amongst compiled languages in that it has 3 major implementations of a common specification.

Going by the rule of dismissing anything that's not used much, every other compiled language has 0 to 1.

And I think JavaScript is the only reason you need to specify 'compiled'.

soru
  • 3,625
  • 23
  • 15
  • 2
The label "C" is applied to a number of different languages; some define the code `uint16_t a=48000u; uint32_t b=(a*a)/2;` as assigning to `b` the value 8192. Some define it as assigning 1152000000. Most nowadays regard it as Undefined Behavior, and likely to store 3299483648 but make no promise in that regard. – supercat Feb 19 '15 at 20:05
  • 1
    @supercat: Ah, a good weird one with overflows and integer promotion rules. It hinges on using `2` or `2u` apparently. – Zan Lynx Feb 19 '15 at 23:46
  • 1
    @ZanLynx: I don't think there are any cases where 2 versus 2u *legitimately* matters; the only case I know where it might matter involves Undefined Behavior with both 2 and 2u. – supercat Feb 19 '15 at 23:54
  • 1
    @ZanLynx: If I were in a position to design a language standard, my goal would be to write it such that most code which would run consistently on a variety of C dialects would compile and run without modification, almost all code which compiled cleanly would run the same as it did on common C implementations, and any code which would compile under existing C implementations could easily be made to run under the new standard (if nothing else by enclosing it in a directive specifying that certain rules should be assumed for the enclosed code). – supercat Feb 19 '15 at 23:58
  • 3
    @supercat: how would you get undefined behavior from `/2u` ? Unsigned overflow is defined (as modulo 2^N for implementation-defined N) but division can't even overflow. – MSalters Feb 20 '15 at 12:17
  • 2
    The Undefined Behavior would come from the multiplication of values which would get promoted to signed `int`, but whose product would not fit in that type. Coercing that result to unsigned int would likely change the interpretation of the resulting value, but would not negate the Undefined Behavior from the preceding calculation. – supercat Feb 20 '15 at 16:34
5

So what is your target language?

SML compilers are often targeting C or something like LLVM (or as seen in your link, the JVM or JavaScript).

If you're compiling C, it's not because you're going to the JVM. You're going to something worse than C. Far worse. And then you get to duplicate that minor hell a bunch of times for all your target platforms.

And sure, C isn't C++, but I'd say that it's closer to C++ than Scheme. It does have its own subset of undefined-behavior evilness (I'm looking at you, sizes of built-in types). And if you screw up that minutiae (or do it "correctly" but unexpectedly) then you have decades of existing code on vital systems that will tell you how terrible you are. If you screw up an SML compiler, it just won't work - and someone might notice. Someday.

Telastyn
  • 108,850
  • 29
  • 239
  • 365
  • SML/NJ and PolyML are both compiling to machine code... – Basile Starynkevitch Feb 19 '15 at 06:31
  • 2
    How is int size " Undefined Behavior" ? And why would be UB be a burden on compiler vendors anyway? The only real burden for compiler writers is that int widths are implementation defined, not unspecified, so you have to document what you did. – MSalters Feb 19 '15 at 16:11
  • @MSalters In reality, compiler writers for an established platform have the burden of matching what others that went before them did. Sometimes this is documented and standardized, sometimes not. It's easy to find what size an int is, but harder to find what is done with register values and where arguments are stored when calling a function (which may change depending on the argument types and return type of the function), struct layout rules, etc. – Random832 Feb 19 '15 at 18:07
  • @MSalters Most people expect `int` to be 32 or 64 bits but it can be as small as 16 bits. It's not hard at all to produce a number outside the range of `[−32767, +32767]` and `int` overflow is UB. There's also `char`/`short` getting promoted to `int` *or* `unsigned int` depending on whether `int` can represent every value of the original type, which can further trigger a conversion from `int` to `unsigned int` if the operands had different types and got converted differently, plus potentially another conversion when you assign the result to a variable. – Doval Feb 19 '15 at 18:36
  • @MSalters There's enough leeway in the size of the standard types and enough implicit conversions that I'd bet that for just about any non-trivial C program there's a choice of legal integer sizes that will cause it to do the wrong thing or cause undefined behavior. – Doval Feb 19 '15 at 18:37
  • That's *implementation-defined behaviour*, not undefined behaviour. – Morwenn Feb 20 '15 at 16:48
  • @Morwenn: For many C programs, Undefined Behavior. If `short` is half the length of long, for example, `(unsigned short)(-1)*(unsigned short)(-1)` will yield full-fledged Undefined Behavior, since the `unsigned short` values will get promoted to `signed int` values which, when multiplied, will exceed the range of the `signed int` type; a compiler is allowed to do anything it likes in response to such overflow, including launching a virus to target and destroy all copies of the painting "Dogs Playing Poker". – supercat Feb 20 '15 at 17:55