
Ken Thompson Hack (1984)

Ken Thompson outlined a method for corrupting a compiler binary (and other compiled software, like the login program on a *nix system) in 1984. I was curious to know whether modern compilation has addressed this security flaw.

Short description:

Rewrite the compiler's source code to contain two flaws:

  • When compiling its own binary, the compiler must reproduce these flaws
  • When compiling certain other preselected code (the login function), it must insert an arbitrary backdoor

Thus the compiler appears to work normally: when it compiles the login program or similar, it can create a security backdoor, and when it compiles newer versions of itself in the future, it retains the previous flaws. Because the flaws exist only in the compiler binary, and in no source code, they are extremely difficult to detect.
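To make the two flaws concrete, here is a minimal, purely illustrative sketch. This is not Thompson's actual code (his lived inside the C compiler binary); here a "compiler" is reduced to string rewriting so the logic is visible, and every name (`evil_compile`, `check_password`, the recognizer functions) is hypothetical:

```python
# Toy model of the Thompson hack. A real attack hides in the compiler
# *binary*; the compiler's published source stays clean.

BACKDOOR = 'return password == stored or password == "magic"  # backdoor'

def is_login_source(source: str) -> bool:
    # Recognizer for the preselected target (the login program).
    return "def check_password" in source

def is_compiler_source(source: str) -> bool:
    # Recognizer for the compiler's own source.
    return "def evil_compile" in source

def evil_compile(source: str) -> str:
    if is_login_source(source):
        # Flaw 2: inject a backdoor into the login program.
        return source.replace("return password == stored", BACKDOOR)
    if is_compiler_source(source):
        # Flaw 1: when compiling the clean compiler source, splice this
        # whole match-and-inject logic back into the output (Thompson's
        # paper shows how to do this quine-style), so both flaws survive
        # recompilation without appearing in any source file.
        return source + "\n# (self-reproducing payload spliced in here)"
    return source  # everything else compiles honestly
```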

Questions:

I could not find any answers to these on the web:

  • How does this relate to just-in-time compilation?
  • Are functions like the program handling logins on a *nix system compiled when they are run?
  • Is this still a valid threat, or have there been developments in the security of compilation since 1984 that prevent this from being a significant issue?
  • Does this affect all languages?

Why do I want to know?

I came across this while doing some homework, and it seemed interesting but I lack the background to understand in a concrete way whether this is a current issue, or a solved issue.

Reference material: Ken Thompson, ["Reflections on Trusting Trust"](https://dl.acm.org/doi/10.1145/358198.358210), Communications of the ACM 27(8), August 1984.

svick
  • The Diverse Double Compiling strategy is a reasonably reliable way of detecting the presence of a RoTT-rigged compiler. – dmckee --- ex-moderator kitten Jan 26 '13 at 00:54
  • I imagine the NSA have put a lot of work into this sort of attack. – Paul M Sep 07 '13 at 23:57
  • This [has been mentioned on reddit](https://reddit.com/r/netsec/comments/3lefc6/). In 2009, there was [a virus infecting Delphi installations](https://nakedsecurity.sophos.com/2009/08/18/compileavirus/) and [compiling itself into any new executable](https://nakedsecurity.sophos.com/2009/08/19/w32induca-spread-delphi-software-houses/). More recently, [malware distributed in a pirated Xcode that compiles itself into iOS apps](http://researchcenter.paloaltonetworks.com/2015/09/novel-malware-xcodeghost-modifies-xcode-infects-apple-ios-apps-and-hits-app-store/) was discovered. – Denilson Sá Maia Sep 18 '15 at 11:58
  • The essence of the talk is that at some point you end up having to trust the work of other people. These days we trust the CPU vendors, as modern CPUs are tiny computers all by themselves instead of being raw hardware as in the old days, and they can choose to do anything they like when running your code. Incidentally, this is also something that e.g. the EU might want to tighten up on if they decide their secrets should only be run on controlled hardware. – Thorbjørn Ravn Andersen Feb 19 '20 at 11:29

11 Answers

119

This hack has to be understood in context. It was published at a time and in a culture where Unix running on all kinds of different hardware was the dominant system.

What made the attack so scary was that the C compiler was the central piece of software for these systems. Almost everything in the system went through the compiler when it was first installed (binary distributions were rare due to the heterogeneous hardware). Everyone compiled stuff all the time. People regularly inspected source code (they often had to make adjustments to get it to compile at all), so having the compiler inject backdoors seemed to be a kind of "perfect crime" scenario where you could not be caught.

Nowadays, hardware is much more compatible, and compilers therefore have a much smaller role in the day-to-day operation of a system. A compromised compiler is no longer the scariest scenario - rootkits and a compromised BIOS are even harder to detect and get rid of.

Michael Borgwardt
  • Or, since most people don't compile anything from source (say, on Windows), your average trojan will suffice :) (I'm agreeing that a compromised compiler is way overkill) – Andres F. Jan 25 '13 at 22:18
  • The modern open source software tradition is still hugely reliant on source distribution and gcc, so within that ecosystem I think the theoretical vulnerability is still there. Consider what would have happened if e.g. the first Red Hat Linux distro had this back door. – Russell Borogove Jan 26 '13 at 00:24
  • This was one of the motivations for git's security model. Somebody did compromise one of the Linux kernel repositories, inserted some malicious code and modified the logs to hide the change. – Martin Beckett Jan 26 '13 at 03:45
  • @RussellBorogove - So according to you, code available for *anyone* to see and investigate has a higher chance of a backdoor going unnoticed than a non-free proprietary binary-only compiler? – ArjunShankar Jan 26 '13 at 04:08
  • @ArjunShankar: A non-free proprietary binary-only compiler does not need, and cannot have, *this* backdoor. *This* backdoor only applies to compilers that you compile yourself from source code. – ruakh Jan 26 '13 at 04:48
  • @ruakh - Thank you for the emphasis on "*this*". – ArjunShankar Jan 26 '13 at 10:35
  • Except for the desktop, Unix, and all its variants, is still the dominant operating system. – Rob Jan 26 '13 at 12:44
  • @ruakh: maybe I do not understand your emphasis on 'this', but I happen to disagree. If this backdoor had been introduced at the company that owns the non-free, proprietary compiler, and that company uses this compiler to compile new versions of the same compiler, this backdoor would have a much worse impact than in the original scenario. You'd only need one attack vector to infect all. – orithena Jan 26 '13 at 15:14
  • @Rob: yes, but nowadays the hardware is much more homogeneous, and software is more commonly distributed in binary form. – Michael Borgwardt Jan 26 '13 at 15:25
  • @ArjunShankar - I didn't say anything whatsoever about probability; it's obviously quite low. The whole concept of the backdoor in question is that it *doesn't appear in the source*, so I don't see that "many eyes on the source" is even relevant. – Russell Borogove Jan 26 '13 at 18:45
  • @RussellBorogove - I misunderstood your original comment. You are right about *this* back-door. Cheers! – ArjunShankar Jan 26 '13 at 19:00
  • Imagine someone compromises an Ubuntu build server and replaces the compiler without changing any source. It might take a little time for this to be found out, and by that time Ubuntu images would be pushed out to people all over with the compromised compiler built into them (along with compromised login assemblies or what have you). I think this is still a perfectly valid concern. – Jimmy Hoffa Jan 28 '13 at 15:11
  • @JimmyHoffa: In that scenario, the compiler isn't necessarily the most interesting target; you could just as well change some other component involved in producing the images to insert precompiled backdoors. What made this hack interesting is really just that it disproves the idea "if I compile everything from source and the source is inspected for backdoors beforehand, I'm safe". – Michael Borgwardt Jan 28 '13 at 15:25
  • @MichaelBorgwardt yes, there would be other ways to slip something in if you compromised such a server, but this is a valid one, and I'm merely disagreeing with this answer: done this way, it is actually quite a valid and dangerous attack. Granted, so are others. Though the kernel repository was compromised at one point and someone placed malicious code into the kernel, this would be a more effective attack, as one wouldn't simply find malicious code in their source. So, in summation: more effective than attacks that have actually happened, so yes, still a threat. – Jimmy Hoffa Jan 28 '13 at 15:34
  • I've found this talk is far more important now, in the context of things like Node.js, where people npm install thousands of modules that they have NO idea what they do, in order to get a project done. Not only are people trusting someone's module, but every other person whose module they included. – Rahly Aug 11 '16 at 07:50
  • @Rahly: I agree that that is a serious threat, but it doesn't really have anything to do with this question and answer. – Michael Borgwardt Aug 11 '16 at 08:12
  • Sure it does. A lot of the node projects do code generation (aka transpiling), and this is similar in that projects like Babel or TypeScript could inject rogue JavaScript into your application and most would never know. It's not compiling in the traditional sense, but it can have a far wider reach, as it would run in everyone's browser. – Rahly Aug 11 '16 at 22:12
  • @Rahly: No, it has absolutely nothing to do with this question and answer, and it does not have "a far wider reaching aspect" at all. Sure, all that automatic dependency resolution tends to make people gloss over just how many other people's code they are trusting, but you still get all the code and can theoretically inspect it before you ever run it. The "special" thing about Ken Thompson's idea is that it leads to a scenario where you can recompile your entire system from source after inspecting all the source code with perfect understanding, and *still* not get rid of the threat. – Michael Borgwardt Aug 12 '16 at 09:04
  • Except it was fairly easy to trick the compiler into not injecting the code. And in general it is the same, because you still have access to the machine-generated code, which is not much different from obfuscated code: you could see what the machine code was doing, but people just didn't look. Code generating code, no matter the platform, will have this issue. – Rahly Aug 12 '16 at 09:12
  • @Rahly: yes, it was always a rather theoretical threat - but in theory it could also have changed every tool you might use to inspect the source code to lie to you - and rootkits actually do something like that. The core point is this: once your system is compromised, you cannot reliably use that system to "uncompromise" itself. – Michael Borgwardt Aug 12 '16 at 09:47
  • I thought the core point was that unless you know 100% of your tools, there is no way for you to even know whether your system is compromised. – Rahly Aug 12 '16 at 09:52
  • @Rahly - not here, but it is an entirely valid point. – Michael Borgwardt Aug 12 '16 at 09:55
94

The purpose of that speech wasn't to highlight a vulnerability that needs to be addressed, or even to propose a theoretical vulnerability that we need to be aware of.

The point was that, when it comes to security, we would like not to have to trust anyone, but unfortunately that's impossible. You always have to trust someone (hence the title: "Reflections on Trusting Trust").


Even if you're the paranoid type who encrypts your desktop hard drive and refuses to run any software you didn't compile yourself, you still need to trust your operating system. And even if you compile the operating system yourself, you still need to trust the compiler you used. And even if you compile your own compiler, you still need to trust that compiler! And that's not even mentioning the hardware manufacturers!

You simply can't get away with trusting no one. That's the point he was trying to get across.

  • If one has an open-source compiler whose behavior does not depend upon any implementation-defined or unspecified behavior, compiles it using a variety of independently-developed compilers (trusted or not), and then compiles one program using all the different compiled versions of that open-source one, every compiler should produce exactly the same output. If they do, that would suggest that the only way a trojan could be in one would be if it was identically in all. That would seem rather unlikely. One of my peeves with much of .NET, though, is that many of the compilers generally produce different output every time they are run, making comparisons of compiled code essentially impossible. – supercat Jan 26 '13 at 19:07
  • @supercat: You seem to be missing the point. You're saying that the hack Ken Thompson presented can be worked around. I am saying that the particular hack he chose doesn't matter; it was just an example, to demonstrate his larger point that you must always trust *someone*. That's why this question is somewhat meaningless - **it completely misses the forest for the trees.** – BlueRaja - Danny Pflughoeft Jan 26 '13 at 19:32
  • If one digs a variety of antique computers out of one's cellar along with C compilers for them, and if one feeds into those computers a copy of an open-source compiler package which one has inspected personally (and perhaps, for good measure, tweaked slightly), and if the two-step process described above yields the same output on all computers, whom would one have to trust, really, aside from oneself, to be sure the set of executables that all computers produced identically was "clean"? – supercat Jan 26 '13 at 21:01
  • @supercat: It's highly unlikely that different compilers would produce the same bytecode for any non-trivial program, due to different design decisions, optimizations, etc. This raises the question - how would you even know that the binaries are identical? – ankit Jan 27 '13 at 21:01
  • @AnkitSoni: My answer goes into more detail. Feeding a suitably-written open-source compiler/linker through different compilers should yield different executables that will *behave identically*. If the executables do in fact behave identically, they will produce the same output if the code for the open-source compiler/linker is passed through them. To compare the files, one could copy them to a floppy disk and use an antique computer to compare them. – supercat Jan 27 '13 at 21:19
  • @AnkitSoni: If one was worried that the antique computer might claim the files were identical when they really weren't, one could program each version of the file into a different serial flash chip, and wire up a board with a couple of buttons, an oscillator, and a few simple logic gates to check whether the chips contained identical contents. There's no way an adversary could insert into a quad-NAND chip a circuit that would make it mostly behave normally but not reveal that programs which should match, don't. – supercat Jan 27 '13 at 21:27
  • @AnkitSoni: Not that I'm really suggesting anyone should do that, but rather to suggest that one doesn't really have to rely upon the honesty of anyone other than oneself if one doesn't want to. Using antique PCs would be taking things to a low enough level for my taste, but if someone didn't trust those, one could build stuff out of gates or even transistors [or, for that matter, home-built vacuum tubes and core memory]. Once one gets to a low enough level, there's really no place for a virus to hide. – supercat Jan 27 '13 at 22:29
  • I only need to trust my hardware. The first thing I *always* do when I get a computer is the following: Write an assembler. Build it. Write a C compiler in assembly. Build it. Use my C compiler. Build my OS. – Thomas Eding Jan 31 '13 at 19:19
  • Wouldn't some of this conversation just mean that, for the things you tested, the binaries/hardware behaved as expected? There could still be something in it you *didn't* test for and are unaware of. – Bart Silverstrim Feb 01 '13 at 16:04
55

No

The attack, as originally described, was never a threat. While a compiler could theoretically do this, actually pulling off the attack would require programming the compiler to

  • Recognize when the source code being compiled is that of a compiler, and
  • Figure out how to modify that arbitrary source code to insert the hack into it.

This entails figuring out, from the source code alone, how the target compiler works, so that the hack can modify it without breaking anything.

For instance, imagine that the linking format stores the lengths or offsets of the compiled machine code somewhere in the executable. The compiler would have to figure out for itself which of these need to be updated, and where, when inserting the exploit payload. Subsequent (innocuous) versions of the compiler can arbitrarily change this format, so the exploit code would effectively need to understand these concepts.

This is high-level self-directed programming, a hard AI problem (last I checked, the state of the art was generating code practically determined by its types). Few humans can even do this: you would have to learn the programming language and understand the code base first.

Even if the AI problem is solved, people would notice if compiling their tiny compiler results in a binary with a huge AI library linked into it.

Analogous attack: bootstrapping trust

However, a generalization of the attack is relevant. The basic issue is that your chain of trust has to start somewhere, and in many domains its origin could subvert the entire chain in a hard-to-detect way.

An example that could easily be pulled off in real life

Your operating system, say Ubuntu Linux, ensures security (integrity) of updates by checking downloaded update packages against the repository's signing key (using public-key cryptography). But this only guarantees authenticity of the updates if you can prove that the signing key is owned by a legitimate source.

Where did you get the signing key? When you first downloaded the operating system distribution.

You have to trust that the source of your chain of trust, this signing key, isn't evil.

Anyone that can MITM the Internet connection between you and the Ubuntu download server—this could be your ISP, a government that controls Internet access (e.g. China), or Ubuntu's hosting provider—could have hijacked this process:

  • Detect that you're downloading the Ubuntu CD image. This is simple: see that the request is going to any of the (publicly-listed) Ubuntu mirrors and asks for the filename of the ISO image.
  • Serve the request from their own server, giving you a CD image containing the attacker's public key and repository location instead of Ubuntu's.

Thenceforth, you will get your updates securely from the attacker's server. Updates run as root, so the attacker has full control.

You can prevent the attack by making sure the original is authentic. But this requires that you validate the downloaded CD image using a hash (few people actually do this)—and the hash must itself be downloaded securely, e.g. over HTTPS. And if your attacker can add a certificate on your computer (common in a corporate environment) or controls a certificate authority (e.g. China), even HTTPS provides no protection.
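As a rough illustration of that mitigation, here is a minimal sketch of checking a downloaded image against a published digest. The filename and digest are placeholders, and, for the reasons given above, the expected digest itself must come over a trusted channel (HTTPS to a site you trust, or a verified GPG signature):

```python
# Verify a downloaded image against a known SHA-256 digest.
import hashlib

EXPECTED_SHA256 = "0000...placeholder..."  # obtained over a separately trusted channel

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

if sha256_of("distro-image.iso") != EXPECTED_SHA256:
    raise SystemExit("checksum mismatch: the image may have been tampered with")
print("checksum OK")
```

Note that this only relocates the trust problem: the check is exactly as good as the channel the expected digest came from.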

Mechanical snail
  • This is false. The compiler only has to determine when it is compiling a very specific source file from its own source code with very specific contents, not when it is compiling *any* compiler whatsoever! – Kaz Jan 25 '13 at 23:45
  • @Kaz - At some point, aboveboard modifications to the compiler or login program might get to the point where they defeat the backdoor's compiler-recognizer/login-recognizer, and subsequent iterations would lose the backdoor. This is analogous to a random biological mutation granting immunity to certain diseases. – Russell Borogove Jan 26 '13 at 00:27
  • The first half of your answer has the problem that Kaz describes, but the second half is so good that I'm +1'ing anyway! – ruakh Jan 26 '13 at 01:03
  • An evil compiler that only recognizes its very own source is easy to build, but relatively worthless in practice - few people who already have a binary of this compiler would use it to recreate said binary. For the attack to be successful for a longer period, the compiler would need more intelligence, to patch newer versions of its own source, thus running into the problems described in the answer. – user281377 Jan 26 '13 at 21:34
  • A recognizer for a specific compiler could be quite general, and unlikely to break in the face of new versions. Take for instance gcc - many lines of code in gcc are very old, and haven't changed much. Simple things like the name almost never change. Before the recognition goes awry, it's likely the injected code would. And in reality, both of those problems are largely theoretical - in practice a malware author would have no trouble keeping up to date with the (slow) pace of compiler development. – Eamon Nerbonne Jan 27 '13 at 15:15
  • In fact, with a bit of trivial heuristics, you could probably get quite close to a generalized compiler detector (or rather, linker detector) for a particular platform. Such software is likely to contain several magic strings related to the way the OS loads and recognizes executables. Since this is a compiler, you're likely to have lots of infrastructure to make this detection easier too - e.g. reachability analysis and a well-annotated syntax tree, so you might even be able to guess which system write call is the receiver of those magic strings. – Eamon Nerbonne Jan 27 '13 at 15:21
  • @Kaz's point is dead true; more realistically, one could imagine your origination point of Ubuntu having its build server compromised by someone who alters their compiler to do such a thing, so that the images it pushes out have been compiled by such a compiler, and thus have a compiler of such compromise built into them. This is a perfectly reasonable attack vector for a hacker. I would say that in this way Ken Thompson's original hack actually is still viable. Imagine what the effect would be if someone snuck this into the Ubuntu build server. It would get caught soon, but not soon enough. – Jimmy Hoffa Jan 28 '13 at 15:07
  • @RussellBorogove Imagine if the backdoor also contacted (or made an attempt to contact) a central server once a month or so for updated detection/injection heuristics. Maintaining such a backdoor would then still be both mutation-resistant and significantly easier than the hard AI problem. – Daniel Wagner Jun 09 '13 at 02:12
  • @DanielWagner - but much easier to detect when your soft firewall alerts you that your compiler is calling the mothership ;) – Russell Borogove Jun 10 '13 at 19:47
  • Not true; the hacker doesn't need to understand how the compiler works. They can, for instance, just prepend some source code at the beginning of some file before compiling it - just string processing before the compiler starts to compile. – CoffeDeveloper Apr 21 '15 at 22:07
  • @DarioOO If the compiler prepends a fixed source-code string at the beginning of each build, then that code will result in a fixed signature in the binary executable, which is easier to detect. – Prahlad Yeri Jun 08 '15 at 17:40
  • You could add random variation; it takes no time to replace (1) with (arccos(cos(1))) or (maxFromList(-4,-6,1)). It's easier to detect random variation if each run outputs a different executable, but again, you could base the random seed on the source code's checksum ^^. Everything is detectable sooner or later, but since detectors are code, you can easily cheat them (also the reason why detecting viruses is becoming ever harder). Who actually spends time detecting? Luckily, software engineering teaches that such a complex virus system has a high chance of failing ;) (unless people put big money into it) – CoffeDeveloper Jun 08 '15 at 23:20
  • I suppose the big fat "No" has been disproven. Today 344 apps on the Apple App Store were exposed as having been infected with XcodeGhost, malware injected through a modified compiler. In all, 500 million people are estimated to be affected. [More details here.](https://plus.google.com/+PaulLammertsma/posts/UfXU3w7GKaa) – Paul Lammertsma Sep 21 '15 at 11:52
26

First, my favorite writeup of this hack is called Strange Loops.

This particular hack could certainly (*) be done today in any of the major open source OS projects, particularly Linux, *BSD, and the like. I would expect it would work almost identically. For example, you download a copy of FreeBSD whose compiler has been exploited to modify openssh. From then on, every time you upgrade openssh or the compiler from source, you will perpetuate the problem. Assuming the attacker has exploited the system used to package FreeBSD in the first place (likely, since the image itself is corrupted, or the attacker is in fact the packager), then every time that system rebuilds the FreeBSD binaries, it will reinject the problem. There are lots of ways for this attack to fail, but they're not fundamentally different from the ways Ken's attack could have failed (**). The world really hasn't changed that much.

Of course, similar attacks could just as easily (or more easily) be injected by their owners into systems like Java, the iOS SDK, Windows, or any other system. Certain kinds of security flaws can even be engineered into the hardware (particularly weakening random number generation).

(*) But by "certainly" I mean "in principle." Should you expect that this kind of hole exists in any particular system? No. I would consider it quite unlikely for various practical reasons. Over time, as the code changes and changes, the likelihood that this kind of hack would cause strange bugs increases. And that raises the likelihood that it would be discovered. Less ingenious backdoors would require conspiracies to maintain. Of course we know for a fact that "lawful intercept" backdoors have been installed in various telecommunications and networking systems, so in many cases this kind of elaborate hack is unnecessary. The hack is installed overtly.

So always, defense in depth.

(**) Assuming Ken's attack ever actually existed. He just discussed how it could be done. He didn't say he actually did it as far as I know.

Rob Napier
  • Regarding your second footnote, Ken said ["build and not distributed."](https://skeptics.stackexchange.com/a/6399/35634) – 8bittree May 21 '19 at 20:34
18

Does this affect all languages?

This attack primarily affects languages that are self-hosting. That is, languages whose compiler is written in the language itself. C, Squeak Smalltalk, and the PyPy Python interpreter would be affected by this; Perl, JavaScript, and the CPython Python interpreter would not.

How does this relate to just-in-time compilation?

Not very much. It is the self-hosting nature of the compiler that allows the hack to be hidden. I don't know of any self-hosting JIT compilers. (Maybe LLVM?)

Are functions like the program handling logins on a *nix system compiled when they are run?

Not usually. But the question isn't when it is compiled, but by which compiler. If the login program is compiled by a tainted compiler, it will be tainted. If it is compiled by a clean compiler, it will be clean.

Is this still a valid threat, or have there been developments in the security of compilation since 1984 that prevent this from being a significant issue?

This is still a theoretical threat, but is not very likely.

One thing you could do to mitigate it is to use multiple compilers. For example, an LLVM compiler that is itself compiled by GCC will not pass along a back door. Similarly, a GCC compiled by LLVM will not pass along a back door. So, if you are worried about this sort of attack, you could compile your compiler with another breed of compiler. That means the evil hacker (at your OS vendor?) will have to taint both compilers to recognize each other; a much more difficult problem.

Sean McMillan
  • Your last paragraph isn't, strictly speaking, true. In theory, code could detect the compiler being compiled and output the back door appropriately. This is of course impractical in the real world, but there's nothing that inherently prevents it. But then, the original idea was not about real practical threats but rather a lesson in trust. – Gort the Robot Jan 25 '13 at 22:29
  • Fair point. After all, the hack carries along a backdoor for login, and a mod for the compiler, so it can carry a mod for another compiler too. But it becomes increasingly unlikely. – Sean McMillan Jan 30 '13 at 14:10
  • Just-in-time compilation could be a threat. If some code has a vulnerability only when a particular piece is JIT-compiled, it may go unnoticed. (Just pure theory.) – CoffeDeveloper Apr 21 '15 at 22:10
15

There's a theoretical chance for this to happen. There is, however, a way of checking whether a specific compiler (with available source code) has been compromised: David A. Wheeler's Diverse Double-Compiling.

Basically, use both the suspected compiler and another, independently developed compiler to compile the source of the suspect compiler. This gives you two binaries of the suspect compiler: SC_sc (built by itself) and SC_T (built by the trusted compiler). Now, compile the suspect source again using both of these binaries. If the resulting binaries are identical (with the exception of a variety of things that may well legitimately vary, like assorted timestamps), the suspect compiler was not actually abusing trust.
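Here is a rough sketch of how that check might be driven, assuming hypothetical compiler binaries `suspect-cc` and `trusted-cc` that take a source file and a `-o` output flag (real DDC, per Wheeler's paper, also has to normalize legitimately varying bytes such as embedded timestamps):

```python
# Diverse double-compiling, stage by stage.
import filecmp
import subprocess

def build(compiler: str, source: str, output: str) -> None:
    # Hypothetical invocation; flags vary by toolchain.
    subprocess.run([compiler, source, "-o", output], check=True)

SOURCE = "suspect-cc-source.c"  # source code of the suspect compiler

# Stage 1: compile the suspect compiler's source with itself and with
# an independently developed compiler, yielding SC_sc and SC_T.
build("./suspect-cc", SOURCE, "SC_sc")
build("./trusted-cc", SOURCE, "SC_T")

# Stage 2: compile the same source again with both stage-1 binaries.
# If SC_sc and SC_T are functionally identical, these outputs should be
# byte-for-byte identical even though SC_sc and SC_T themselves differ.
build("./SC_sc", SOURCE, "stage2_a")
build("./SC_T", SOURCE, "stage2_b")

if filecmp.cmp("stage2_a", "stage2_b", shallow=False):
    print("outputs identical: any backdoor would have to be in both compilers")
else:
    print("outputs differ: investigate")
```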

Damian Yerrick
Vatine
  • Either that or the trusted compiler isn't as trustworthy as the user thought. But for two independent implementations of a language, the probability that they contain the same backdoor is negligible. – Damian Yerrick Sep 10 '17 at 16:06
  • Or the diff tool you're using to compare them was compromised also ;) – iCodeSometime Aug 15 '18 at 21:00
  • @kennycoc However, writing a "are these two files identical" comparison tool is not, all things considered, that difficult (as in, given a syscall reference, it should be doable in 2-16 hours in binary machine code). – Vatine Aug 30 '18 at 01:22
3

As a specific attack, it's as much of a threat as it ever was, which is pretty much no threat at all.

How does this relate to just-in-time compilation?

Not sure what you mean by that. Is a JITter immune to this? No. Is it more vulnerable? Not really. As a developer, YOUR app is more vulnerable simply because you can't validate that it hasn't been done. Note that your as-yet-undeveloped app is basically immune to this and all practical variations; you only have to worry about a compiler that is newer than your code.

Are functions like the program handling logins on a *nix system compiled when they are run?

That's not really relevant.

Is this still a valid threat, or have there been developments in the security of compilation since 1984 that prevent this from being a significant issue?

There is no real security of compilation, and can't be. That was really the point of his talk, that at some point you have to trust someone.

Does this affect all languages?

Yes. Fundamentally, at some time or another, your instructions have to be turned into something the computer executes, and that translation can be done incorrectly.

jmoreno
0

For all we know, this is happening at this very moment:

(image)

It's hard to know for sure...

Dan
-2

David Wheeler has a good article: http://www.dwheeler.com/trusting-trust/

Me, I'm more worried about hardware attacks. I think we need a complete VLSI design toolchain with FLOSS source code, one that we can modify and compile ourselves, and that lets us build a microprocessor with no backdoors inserted by the tools. The tools should also let us understand the purpose of every transistor on the chip. Then we could pop open a sample of the finished chips and inspect them with a microscope, making sure they had the same circuitry that the tools said they were supposed to have.

paul
-3

Systems in which the end users have access to the source code - open-source systems, in today's world - are the ones in which this type of attack would have to be hidden. The problem is that, although all Linux systems depend on essentially one compiler, the attack would have to get onto the build servers of all the major Linux distributions. Since those servers don't download compiler binaries directly for each compiler release, the source of the attack would have had to be on their build servers in at least one previous release of the compiler. Either that, or the very first version of the compiler that they downloaded as a binary would have to have been compromised.

  • Your answer scratches at the surface of the question, but doesn't really address what's being asked. –  Jan 29 '13 at 16:41
-4

If one has source code for a compiler/build system whose output should not depend on anything other than the content of the supplied source files, and if one has several other compilers and knows that they do not all contain the same compiler hack, one can make sure one gets an executable that depends upon nothing other than the source code.

Suppose one has source code for a compiler/linker package (say the Groucho Suite), written in such a way that its output will not depend upon any unspecified behaviors, nor on anything other than the content of the input source files, and one compiles/links that code with a variety of independently produced compiler/linker packages (say the Harpo Suite, the Chico Suite, and the Zeppo Suite), yielding a different executable for each (call them G-Harpo, G-Chico, and G-Zeppo). It would not be unexpected for these executables to contain different sequences of instructions, but they should be functionally identical. Proving that they are functionally identical in all cases, however, would likely be an intractable problem.

Fortunately, such proof won't be necessary if one only uses the resulting executables for one single purpose: compiling the Groucho Suite again. If one compiles the Groucho Suite using G-Harpo (yielding G-G-Harpo), G-Chico (G-G-Chico), and G-Zeppo (G-G-Zeppo), then all three resulting files, G-G-Harpo, G-G-Chico, and G-G-Zeppo, should be byte-for-byte identical. If the files match, that would imply that any "compiler virus" that exists in any of them must exist identically in all of them (since the three files are byte-for-byte identical, there's no way their behaviors could differ in any way).

Depending upon the age and lineage of the other compilers, it may be possible to ensure that such a virus could not plausibly exist in them. For example, if one uses an antique Macintosh to feed a compiler that was written from scratch in 2007 through a version of MPW that was written in the 1980s, the 1980s compilers wouldn't know where to insert a virus in the 2007 compiler. It may be possible for a compiler today to do fancy enough code analysis to figure it out, but the level of computation required for such analysis would far exceed the level of computation required to simply compile the code, and could not very well have gone unnoticed in a marketplace where compilation speed was a major selling point.

I would posit that if one is working with compilation tools where the bytes in an executable file to be produced should not depend in any way upon anything other than the content of the submitted source files, it is possible to achieve reasonably good immunity from a Thompson-style virus. Unfortunately, for some reason, non-determinism in compilation seems to be regarded as normal in some environments. I recognize that on a multi-CPU system it may be possible for a compiler to run faster if it is allowed to have certain aspects of code generation vary depending upon which of two threads finishes a piece of work first.

On the other hand, I'm not sure I see any reason that compilers/linkers shouldn't provide a "canonical output" mode where the output depends only upon the source files and a "compilation date" which may be overridden by the user. Even if compiling code in such a mode took twice as long as normal compilation, I would suggest that there would be considerable value in being able to recreate any "release build", byte for byte, entirely from source materials, even if it meant that release builds would take longer than "normal builds".
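As a sketch of what such a "canonical output" check could look like: build the same source twice with the timestamp pinned and compare the outputs. (The `SOURCE_DATE_EPOCH` environment-variable convention is a real one from the reproducible-builds effort, honored by gcc for `__DATE__`/`__TIME__`; the compiler invocation here is otherwise hypothetical.)

```python
# Check whether a build is deterministic ("canonical output" mode).
import hashlib
import os
import subprocess

def build_and_hash(output: str) -> str:
    env = dict(os.environ, SOURCE_DATE_EPOCH="0")  # pin the "compilation date"
    subprocess.run(["cc", "program.c", "-o", output], check=True, env=env)
    with open(output, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

if build_and_hash("build1") == build_and_hash("build2"):
    print("deterministic: a release build can be recreated byte-for-byte")
else:
    print("nondeterministic: binaries cannot be compared against source")
```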

supercat
  • -1. I don't see how your answer addresses the core aspects of the question. –  Jan 29 '13 at 16:39
  • @GlenH7: Many older compilation tools would consistently produce bit-identical output when given bit-identical input [outside things like __TIME__, which could be tweaked to report an "official" compile time]. Using such tools, one could pretty well protect against compiler viruses. The fact that some popular development frameworks provide no way of "deterministically" compiling code means that techniques that could have protected against viruses in older tools cannot be effectively used with newer ones. – supercat Jan 29 '13 at 17:55
  • Have you tried this? 1. Lead with your thesis. 2. Use shorter paragraphs. 3. Be more explicit about the difference between "functionally identical" (the result of the first stage) and "bit identical" (the result of the second), possibly with a list of all compiler binaries produced and their relationships to one another. 4. Cite David A. Wheeler's DDC paper. – Damian Yerrick Feb 20 '15 at 22:15