23

I'm taking a course in college where one of the labs has us perform buffer overflow exploits on code they give us. This ranges from simple exploits, like changing the return address of a function on the stack so that it returns to a different function, all the way up to code that changes a program's register/memory state but then returns to the function you called, meaning that the function you called is completely oblivious to the exploit.
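For concreteness, here's a minimal sketch of the kind of vulnerable code these labs tend to target (the function name is made up; any fixed-size buffer filled without a bounds check behaves the same way):

```c
#include <stdio.h>

/* Illustrative only: reading more than 16 bytes (including the
   terminating NUL) writes past name[] and can overwrite the saved
   return address on the stack. */
void get_name(void)
{
    char name[16];
    gets(name);                 /* no bounds check at all */
    printf("Hi, %s\n", name);
}
```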

I did some research into this, and these kinds of exploits are still used practically everywhere, in things like running homebrew on the Wii and the untethered jailbreak for iOS 4.3.1.

My question is: why is this problem so difficult to fix? It's obviously one major exploit used to hack hundreds of things, but it seems like it would be easy to fix by simply truncating any input past the allowed length and sanitizing all the input you take.

EDIT: Another perspective that I'd like answers to consider - why do the creators of C not fix these issues by reimplementing the libraries?

Bill the Lizard
ankit

9 Answers

35

They did fix the libraries.

Any modern C standard library contains safer variants of strcpy, strcat, sprintf, and so on.

On C99 systems - which is most Unixes - you will find these under names like strncat and snprintf, the "n" indicating that the function takes an argument giving the size of a buffer or a maximum number of elements to copy.

These functions can be used to handle many operations more securely, but in retrospect their usability is not great. For example, some older snprintf implementations (Microsoft's _snprintf, notably) don't guarantee that the buffer is null-terminated on truncation. strncat takes the maximum number of characters to append, but many people mistakenly pass the size of the destination buffer.
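To illustrate both points, a sketch (the function and variable names are just for illustration):

```c
#include <stdio.h>
#include <string.h>

void demo(const char *name)
{
    char dst[16] = "Hello, ";

    /* The classic strncat mistake: its third argument is the maximum
       number of characters to APPEND (and it always writes a '\0'),
       so passing sizeof dst can write past the end of dst:
           strncat(dst, name, sizeof dst);    // WRONG               */

    /* Correct: leave room for what's already there plus the '\0'. */
    strncat(dst, name, sizeof dst - strlen(dst) - 1);

    /* snprintf is usually easier to get right: */
    char out[32];
    snprintf(out, sizeof out, "Hello, %s!", name);
    puts(out);
}
```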

On Windows, one often finds strcat_s, sprintf_s, and so on, the "_s" suffix indicating "safe". These too have found their way into the C standard library, as C11's optional Annex K, and they provide more control over what happens in the event of an overflow (truncation vs. an assertion, for example).

Many vendors provide still more non-standard alternatives, like asprintf in the GNU libc, which allocates a buffer of the appropriate size automatically.
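For example (a sketch; asprintf is a GNU extension, so this isn't portable C, and the function name here is made up):

```c
#define _GNU_SOURCE     /* expose asprintf in glibc */
#include <stdio.h>
#include <stdlib.h>

void log_user(const char *user, int id)
{
    char *msg;

    /* asprintf allocates a buffer of exactly the right size,
       so there is nothing to overflow - but you must free it. */
    if (asprintf(&msg, "user %s has id %d", user, id) < 0)
        return;                /* allocation failed */
    puts(msg);
    free(msg);
}
```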

The idea that you can "just fix C" is a misunderstanding. Fixing C is not the problem - and has already been done. The problem is fixing decades of C code written by ignorant, tired, or hurried programmers, or code that has been ported from contexts where security didn't matter to contexts where security does. No changes to the standard library can fix this code, although migration to newer compilers and standard libraries can often help identify the problems automatically.

  • 11
    +1 for pinning the problem on the programmers, not the language. – Nicol Bolas Feb 19 '12 at 03:41
  • 8
    @Nicol: Saying "the problem [is] the programmers" is unfairly reductionist. The problem is that for years (decades) C made it easier to write unsafe code than safe code, particularly as our definition of "safe" evolved faster than any language standard, and that that code is still around. If you want to try to reduce that to a single noun, the problem is "1970-1999 libc", not "the programmers." –  Feb 19 '12 at 16:14
  • 1
    It's still the responsibility of programmers to use the tools they have now to *fix* these problems. Take a half-day or so and do some grepping through the source code for these things. – Nicol Bolas Feb 19 '12 at 16:21
  • That's assuming you have programmers available (some of this code shipped 20 years ago and hasn't been read since - it does no good to assign "responsibility" to someone who doesn't exist!) and a half-day of less important things to do. It's a systemic problem and needs a systemic solution, not half an idle day. –  Feb 19 '12 at 16:31
  • 1
    @Nicol: Although it's trivial to detect a potential buffer overflow, it's often not trivial to be certain it's a real threat, and less trivial still to work out what should happen if the buffer ever is overflowed. Error handling is/was often not considered, and it's not possible to "quickly" implement an improvement, since you can change the behavior of a module in unexpected ways. We have just done this in a multi-million-line legacy code base, and although it was a worthwhile exercise, it cost a lot of time (and money). – mattnz Feb 20 '12 at 03:53
  • 4
    @NicolBolas: Not sure what kind of shop *you* work in, but the last place I wrote C for production use required amending the detailed design doc, reviewing it, changing the code, amending the test plan, reviewing the test plan, performing a complete system test, reviewing the test results, then re-certifying the system at the customer's site. This is for a telecom system on a different continent written for a company that doesn't exist anymore. Last I knew, the source was in an RCS archive on a QIC tape that *should* still be readable, if you can find a suitable tape drive. – TMN Feb 20 '12 at 13:28
19

It's not inaccurate to say that C is "error-prone" by design. Aside from a few grievous mistakes like gets, the C language can't really be any other way without losing the primary feature that draws people to C in the first place.

C was designed as a systems language to act as a sort of "portable assembly." A major feature of the C language is that unlike higher-level languages, C code often maps very closely to the actual machine code. In other words, ++i is usually just an inc instruction, and you can often get a general idea of what the processor will be doing at run-time by looking at the C code.

But adding in implicit bounds checking adds a lot of extra overhead - overhead which the programmer didn't ask for and might not want. This overhead goes way beyond the extra storage required to store the length of each array, or the extra instructions to check array bounds on every array access. What about pointer arithmetic? Or what if you have a function that takes in a pointer? The runtime environment has no way of knowing if that pointer falls within the bounds of a legitimately allocated memory block. In order to keep track of this, you'd need some serious runtime architecture that can check each pointer against a table of currently allocated memory blocks, at which point we're already getting into Java/C#-style managed runtime territory.
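A sketch of that pointer problem (the names here are illustrative):

```c
#include <stddef.h>

/* The runtime has no way to validate dst here: it might point to the
   start of an array, into the middle of one, or at nothing valid. */
void fill(int *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 0;     /* correct only if the caller's n fits the
                           real allocation */
}

void caller(void)
{
    int a[8];
    fill(a, 8);         /* fine */
    fill(a + 4, 8);     /* overflows a[] - and nothing in the language
                           lets fill() detect that */
}
```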

Charles Salvia
  • 12
    Honestly when people ask why C isn't "safe" it makes me wonder if they'd complain that assembly isn't "safe". – Ben Brocka Feb 19 '12 at 01:45
  • 5
    The C language is a lot like portable assembly for the Digital Equipment Corporation PDP-11. At the same time, the Burroughs machines had array bounds checking in the CPU, which made it much easier to get programs right. Hardware array checking lives on in Rockwell Collins hardware (mostly used in aviation). – Tim Williscroft Feb 20 '12 at 22:14
15

I think the real problem isn't that these kinds of bugs are hard to fix, but that they're so easy to make: if you use strcpy, sprintf, and friends in the (seemingly) simplest way that can work, you've probably opened the door for a buffer overflow. And nobody will notice it until someone exploits it (unless you have very good code reviews). Now add the fact that there are many mediocre programmers, and that they're under time pressure most of the time, and you have a recipe for code so riddled with buffer overflows that it will be hard to fix them all, simply because there are so many of them and they hide so well.
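For example, a minimal sketch of that "simplest way that can work" (names are illustrative):

```c
#include <stdio.h>

/* The obvious version: overflows buf whenever name is long enough. */
void greet_bad(const char *name)
{
    char buf[16];
    sprintf(buf, "Hello, %s!", name);               /* no bound */
    puts(buf);
}

/* The fix is one extra argument - but nothing forces you to write it. */
void greet_good(const char *name)
{
    char buf[16];
    snprintf(buf, sizeof buf, "Hello, %s!", name);  /* truncates */
    puts(buf);
}
```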

nikie
  • 3
    You don't really need "very good code reviews". You just need to ban sprintf, or re-#define sprintf to something that uses sizeof() and errors out on the size of a pointer, or the like. You don't even need code reviews; you can do this kind of thing with SCM commit hooks and grep. –  Feb 18 '12 at 21:17
  • 1
    @JoeWreschnig: `sizeof(ptr)` is 4 or 8, generally. That's another C limitation: there's no way to determine the length of an array, given just the pointer to it. – MSalters Feb 20 '12 at 09:36
  • @MSalters: Yes, an array of int[1] or char[4] or whatever may be a false positive, but in practice you're never handling buffers of that size with those functions. (I'm not speaking theoretically here - I worked on a large C code base for four years that used this approach. I never hit the limitation of sprintfing into a char[4].) –  Feb 20 '12 at 09:43
  • @JoeWreschnig - I don't fully understand. Since `sizeof(ptr)` will most likely not give us the correct array length, in what context is it used? – BlackJack Feb 21 '12 at 15:02
  • 5
    @BlackJack: Most programmers aren't stupid - if you force them to pass the size, they'll pass the right one. It's just most also won't pass the size unless forced to. You can write a macro that will return the length of an array if it's static or auto sized, but errors if given a pointer. Then you re#define sprintf to call snprintf with that macro giving the size. You now have a version of sprintf that works only on arrays with known sizes, and forces the programmer to call snprintf with a manually specified size otherwise. –  Feb 21 '12 at 15:31
  • 1
    One simple example of such a macro would be `#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]) / (sizeof(a) != sizeof(void *)))`, which will trigger a compile-time divide-by-zero when given a pointer. Another clever one I first saw in Chromium is `#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]) / !(sizeof(a) % sizeof((a)[0])))`, which trades the handful of false positives for some false negatives - unfortunately it's useless for char[]. You can use various compiler extensions to make it even more reliable, e.g. http://blogs.msdn.com/b/ce_base/archive/2007/05/08/so-long-to-dim-array-size-and.aspx. –  Feb 21 '12 at 15:40
  • Ah okay, I see what you meant. I was familiar with the array size macro, but the context makes much more sense. Thanks! – BlackJack Feb 21 '12 at 15:44
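Putting the pieces from this comment thread together, a sketch of the technique (the macro names are illustrative, and how loudly the divide-by-zero is reported varies by compiler):

```c
#include <stdio.h>

/* Divides by zero - and so fails noisily at compile time on most
   compilers - if a is pointer-sized rather than a real array. */
#define ARRAY_SIZE(a) \
    (sizeof(a) / sizeof((a)[0]) / (sizeof(a) != sizeof(void *)))

/* A sprintf replacement that only works on arrays of known size. */
#define SPRINTF(buf, ...) snprintf((buf), ARRAY_SIZE(buf), __VA_ARGS__)

int main(void)
{
    char msg[32];
    SPRINTF(msg, "%d bottles", 99);   /* ok: msg is a real array */
    /* char *p = msg;
       SPRINTF(p, "%d", 1);              divide-by-zero: rejected */
    puts(msg);
    return 0;
}
```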
7

It's difficult to fix buffer overflows because C provides virtually no useful tools for addressing the problem. It's a fundamental language flaw that native buffers provide no protection, that it's virtually (if not completely) impossible to replace them with a superior product, as C++ did with std::vector and std::array, and that buffer overflows are hard to find even in debug mode.

DeadMG
  • 13
    "Language flaw" is an awfully biased claim. That the libraries did not provide bounds-checking was a flaw; that the language did not is a conscious choice to avoid overhead. That choice is part of what allows higher-level constructs like `std::vector` to be implemented efficiently. And `vector::operator[]` makes the same choice for speed over safety. The safety in `vector` comes from making it easier to cart around the size, which is the same approach modern C libraries take. –  Feb 18 '12 at 15:15
  • This answer is pretty accurate. The reason C is prone to buffer overflows is simply because C just doesn't provide any sort of dynamically-expanding buffers as part of the standard library. So the most convenient thing for a programmer to do is often just type `char buf[1024]` and do manual bounds checking. – Charles Salvia Feb 18 '12 at 18:55
  • Luckily, C++ doesn't provide the tools *virtual*ly either... ;) – leftaroundabout Feb 18 '12 at 18:57
  • 1
    @Charles: "C just doesn't provide any sort of dynamically-expanding buffers as part of the standard library." No, this has nothing to do with it. First, C does provide them via `realloc` (and C99 also lets you size a stack array with a runtime-determined value, which is almost always preferable to `char buf[1024]`). Second, the problem has nothing to do with expanding buffers; it has to do with whether or not buffers carry their size with them and check that size when you access them. –  Feb 18 '12 at 20:50
  • 5
    @Joe: The problem isn't so much that native arrays are broken. It's that they're impossible to replace. For a start, `vector::operator[]` does do bounds checking in debug mode- something native arrays can't do- and secondly, there's no way in C to swap out the native array type with one that *can* do bounds checking, because there's no templates and no operator overloading. In C++, if you want to move from `T[]` to `std::array`, you can practically just swap out a typedef. In C, there's no way to achieve that, and no way to write a class with equivalent functionality, let alone interface. – DeadMG Feb 18 '12 at 20:57
  • @DeadMG: The issue isn't bounds checking vs. not bounds checking. That we're talking about bounds is somewhat of a red herring. It's about whether the standard library makes writing safe code as easy as writing unsafe code. Practically speaking, the buffer overflow is never _in_ `snprintf`, it's because someone called it wrong or more likely called `sprintf` instead. With the new standard library functions in C99 and C11, and especially now with `_Generic`, the problem is not the language nor the standard library but existing code and bad education. –  Feb 18 '12 at 21:14
  • C++ still makes it easier because you can't _not_ have your size handy when you have a std::vector, or even a standard array. But within the capabilities _C_ provides and should provide - and C still has a place even with C++ around - we could be writing code much safer than the average C code now, I think even better than average C++ now (which in turn is worse than what C++ we could be writing, etc.). –  Feb 18 '12 at 21:15
  • Also, with struct hacks, you can totally make something that looks like an array but carts its size around, and then write functions that operate on that, and _that_ calls unchecked functions like snprintf with the size it is sure of. C's type system unfortunately means that arrays can also _look like it_, but you can usually put some magic key in front of it in debug mode to make that a near non-issue. –  Feb 18 '12 at 21:21
  • 3
    @Joe: Except it can never be statically sized, and you can never make it generic. It's impossible to write any library in C which accomplishes the same role that `std::vector` and `std::array` do in C++. There would be no way to design and specify any library, not even a Standard one, that could do this. – DeadMG Feb 18 '12 at 22:56
  • 1
    I'm not sure what you mean by "it can never be statically sized." As I'd use that term, `std::vector` can also never be statically sized. As for generic, you can make it as generic as good C needs it to be - a small number of fundamental operations on void* (add, remove, resize) and everything else written specifically. If you're going to complain that C doesn't have C++-style generics, that's way outside the scope of safe buffer handling. –  Feb 19 '12 at 16:19
  • 1
    @DeadMG: C++ implementations often offer bounds checking for `vector::operator[]` in debug mode, but that's not part of the Standard. Similarly, nothing in the Standard prevents somebody from producing a debug version of C that does bounds checking. – David Thornley Feb 20 '12 at 15:48
  • @JoeWreschnig: But `std::array` *is* statically sized. Between them, they offer everything native arrays do. Operating on `void*` is not at all the same, because you have to go through another indirection. It's not type-safe and slower. Not having generics is relevant because it means you can't reasonably introduce your own safe buffer handling, which is the solution found by a C-child language. – DeadMG Feb 20 '12 at 20:46
  • @DavidThornley: Except it's *vastly* easier to add a buffer bounds check in `operator[]` than it is to re-write the compiler to add the check. Especially since their buffers don't even know their size. How can you write a check for a buffer whose size you don't know? – DeadMG Feb 20 '12 at 21:05
  • @DeadMG: Having a handful of small, generic functions that take a void* does not mean using a void* in every context. It also doesn't require another indirection, although it does create aliasing preventing some (relatively obscure) optimizations. This isn't theoretical. Implementing generic arrays via the struct hack is commonplace and gets you 90% of the way to `std::vector` with none of the downsides. Growable buffers are _not that hard_, it's just that for so long C did not have any standard way to pass sizes around at all, and even when it did teaching materials didn't cover it. –  Feb 20 '12 at 22:03
  • @Joe: But they *are* horrifically type unsafe, which is not a replacement for native arrays. and there is definitely no way to create them statically sized. – DeadMG Feb 21 '12 at 05:52
  • There are plenty of situations in which I will take a half dozen small, audited, straightforward, but type-unsafe functions over C++ and `std::vector`. Type-safety is not an end unto itself. –  Feb 21 '12 at 15:50
  • @Joe: That's true. But it is a solid advantage, and `vector` has no other downsides. – DeadMG Feb 22 '12 at 07:31
  • The real issue isn't so much that C is flawed, as it is that C is really only suitable for a tiny subset of the things it gets used for, and is used for other purposes because no single better language has become sufficiently dominant to replace it in those roles for which it isn't suitable. – supercat Apr 28 '14 at 22:44
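For reference, a sketch of the "struct hack" approach mentioned in this thread (the type and function names are made up; C99's flexible array member is the modern spelling of the hack):

```c
#include <stdlib.h>
#include <string.h>

/* A buffer that carries its own capacity. */
typedef struct {
    size_t cap;
    char   data[];      /* C99 flexible array member */
} buf_t;

buf_t *buf_new(size_t cap)
{
    buf_t *b = malloc(sizeof *b + cap);
    if (b)
        b->cap = cap;
    return b;
}

/* Bounded copy that takes its limit from the buffer, not the caller. */
int buf_set(buf_t *dst, const char *src)
{
    size_t need = strlen(src) + 1;
    if (need > dst->cap)
        return -1;              /* would overflow: refuse */
    memcpy(dst->data, src, need);
    return 0;
}
```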
7

The problem isn't with the C language.

IMO, the single major obstacle to overcome is that C is just plain taught badly. Decades of bad practice and wrong information have been institutionalized in reference manuals and lecture notes, poisoning the minds of each new generation of programmers from the beginning. Students are given a brief description of "easy" I/O functions like gets¹ or scanf and then left to their own devices. They aren't told where or how those tools can fail, or how to prevent those failures. They aren't told about using fgets and strtol/strtod, because those are considered "advanced" tools (a sketch of what using them looks like follows below). Then they're unleashed on the professional world to wreak their havoc.

Not that many of the more experienced programmers know any better, because they received the same brain-damaged education. It's maddening. I see so many questions here, on Stack Overflow, and on other sites where it's clear that the person asking is being taught by someone who simply doesn't know what they're talking about, and of course you can't just say "your professor is wrong," because he's a Professor and you're just some guy on the Internet.
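A sketch of that "advanced" version (the function name is made up):

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* Read an integer: fgets bounds the read, strtol reports bad input.
   Compare with scanf("%d", &x), which does neither reliably. */
int read_long(long *out)
{
    char line[64];
    char *end;

    if (!fgets(line, sizeof line, stdin))
        return -1;                      /* EOF or read error */

    errno = 0;
    long v = strtol(line, &end, 10);
    if (end == line || errno == ERANGE)
        return -1;                      /* no digits, or out of range */

    *out = v;
    return 0;
}
```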

And then you have the crowd that disdains any answer beginning with, "well, according to the language standard..." because they're working in the real world and according to them the standard doesn't apply to the real world. I can deal with someone who just has a bad education, but anyone who insists on being ignorant is just a blight on the industry.

There would be no buffer overflow problems if the language were taught correctly with an emphasis on writing secure code. It's not "hard", it's not "advanced", it's just being careful.

Yes, this has been a rant.


¹ Which, thankfully, has finally been yanked from the language specification, although it will lurk in 40 years' worth of legacy code forever.
John Bode
  • 1
    While I mostly agree with you, I think you're still being a bit unfair. What we consider "safe" is also a function of time (and I see you've been a professional software developer much longer than me, so I'm sure you're familiar with this). Ten years from now someone will be having this same conversation about why the hell everyone in 2012 used DoS-able hash table implementations, didn't we know anything about security? If there's a problem in teaching, it's a problem that we focus too much on teaching "best" practice, and not that best practice itself evolves. –  Feb 21 '12 at 15:55
  • 1
    And let's be honest. You _could_ write safe code with just `sprintf`, but that doesn't mean the language wasn't flawed. C _was_ flawed and _is_ flawed - like any language - and it's important that we admit those flaws so we can continue to fix them. –  Feb 21 '12 at 15:58
  • @JoeWreschnig - While I agree with the larger point, I think there's a qualitative difference between DoS-able hash table implementations and buffer overruns. The former can be attributed to circumstances evolving around you, but the second has no excuses; buffer overruns are coding errors, period. Yes, C has no blade guards and will cut you if you're careless; we can argue over whether that's a flaw in the language or not. That's orthogonal to the fact that very few students are given *any* safety instruction when they're learning the language. – John Bode Feb 21 '12 at 16:29
5

The problem is as much one of managerial shortsightedness as of programmer incompetence. Remember, a 90,000-line application needs only one insecure operation to be completely insecure. It is almost beyond the realm of possibility that any application written on top of fundamentally insecure string handling will be 100% perfect - which means that it will be insecure.

The problem is that the costs of being insecure are either not charged to the right addressee (the company selling the app will almost never have to refund the purchase price) or not clearly visible at the time decisions are made ("We have to ship in March no matter what!"). I'm fairly certain that if you factored in long-term costs and costs to your users, rather than just your company's profit, writing in C or related languages would be much more expensive - probably so expensive that it would clearly be the wrong choice in many fields where conventional wisdom nowadays says it is a necessity. But that won't change unless much stricter software liability is introduced - which nobody in the industry wants.

Kilian Foth
  • -1 : Blaming management as the root of all evil is not particularly constructive. Ignoring history a little less so. The answer is nearly redeemed by the last sentence. – mattnz Feb 20 '12 at 04:04
  • Stricter software liability could be introduced by users interested in security and willing to pay for it. Arguably, it could be introduced by having severe penalties for security breaches. A market-based solution would work if users would be willing to pay for security, but they aren't. – David Thornley Feb 20 '12 at 15:06
4

One of the great powers of using C is that it lets you manipulate memory in whatever way you see fit.

One of the great weaknesses of using C is that it lets you manipulate memory in whatever way you see fit.

There are safe versions of all the unsafe functions. However, programmers and compilers do not strictly enforce their use.

2

why do the creators of C not fix these issues by reimplementing the libraries?

Probably because C++ already did this, and is backward compatible with C code. So if you want a safe string type in your C code, you just use std::string and write your C code using a C++ compiler.

The underlying memory subsystem can also help to prevent buffer overflows by introducing guard blocks and validity-checking them: every allocation gets, say, 4 bytes of 0xFE added around it, and when those bytes are written to, the system can throw a wobbler. It's not guaranteed to prevent a bad memory write, but it will show that something has gone wrong and needs to be fixed.
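A minimal sketch of that idea, assuming a debug-only wrapper around malloc/free (names and layout are illustrative; real implementations such as the MSVC debug heap are more elaborate):

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define GUARD_SIZE 4
static const unsigned char GUARD[GUARD_SIZE] = { 0xFE, 0xFE, 0xFE, 0xFE };

/* Layout: [size_t n][front guard][n user bytes][rear guard] */
void *dbg_malloc(size_t n)
{
    unsigned char *p = malloc(n + 2 * GUARD_SIZE + sizeof(size_t));
    if (!p) return NULL;
    memcpy(p, &n, sizeof n);                           /* remember size */
    memcpy(p + sizeof n, GUARD, GUARD_SIZE);           /* front guard   */
    memcpy(p + sizeof n + GUARD_SIZE + n, GUARD, GUARD_SIZE); /* rear   */
    return p + sizeof n + GUARD_SIZE;
}

void dbg_free(void *ptr)
{
    unsigned char *p = (unsigned char *)ptr - GUARD_SIZE - sizeof(size_t);
    size_t n;
    memcpy(&n, p, sizeof n);
    /* If either guard was overwritten, something wrote out of bounds. */
    assert(memcmp(p + sizeof n, GUARD, GUARD_SIZE) == 0);
    assert(memcmp((unsigned char *)ptr + n, GUARD, GUARD_SIZE) == 0);
    free(p);
}
```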

I think part of the problem is that the old strcpy etc. routines are still present. If they were removed in favour of strncpy etc., that would help.

gbjbaanb
  • 1
    Removing strcpy etc. entirely would make incremental upgrade paths even more difficult, which in turn would result in people not upgrading at all. The way it's done now you can switch to a C11 compiler, then start using _s variants, then ban non-_s variants, then fix existing usage, over whatever period of time is practically viable. –  Feb 22 '12 at 20:09
-2

It is simple to understand why the overflow problem isn't fixed: C was flawed in a couple of areas. At the time, those flaws were seen as tolerable, or even as features. Now, decades later, they are unfixable.

Some parts of the programming community don't want those holes plugged. Just look at all the flame wars that start over strings, arrays, pointers, garbage collection...

mhoran_psprep
  • 5
    LOL, terrible and wrong-headed answer. – Heath Hunnicutt Feb 18 '12 at 19:03
  • 1
    To expound on why this is a bad answer: C does indeed have many flaws, but allowing buffer overflows etc. has very little to do with them; it stems from the basic language requirements. It would not be possible to design a language that does C's job and doesn't allow buffer overflows. Parts of the community don't want to give up the capabilities C allows them, often with good reason. There are also disagreements about how to avoid some of these problems, which shows only that we don't have a complete understanding of programming language design, nothing more. – David Thornley Feb 21 '12 at 14:55
  • 1
    @DavidThornley: One could design a language to do C's job but make it so that the normal idiomatic ways of doing things would at least *allow* a compiler to check buffer overflows reasonably efficiently, should the compiler choose to do so. There's a huge difference between having `memcpy()` available and having it be the only standard means of efficiently copying an array segment. – supercat Apr 28 '14 at 22:41