30

I've been reviewing C programming and there are just a couple things bothering me.

Let's take this code for example:

int myArray[5] = {1, 2, 2147483648, 4, 5};
int* ptr = myArray;
int i;
for(i=0; i<5; i++, ptr++)
    printf("\n Element %d holds %d at address %p", i, myArray[i], ptr);

I know that an int can hold a maximum value of positive 2,147,483,647. So by going one over that, does it "spill over" to the next memory address which causes element 2 to appear as "-2147483648" at that address? But then that doesn't really make sense because in the output it still says that the next address holds the value 4, then 5. If the number had spilled over to the next address then wouldn't that change the value stored at that address?

I vaguely remember from programming in MIPS Assembly and watching the addresses change values during the program step by step that values assigned to those addresses would change.

Unless I am remembering incorrectly then here is another question: If the number assigned to a specific address is bigger than the type (like in myArray[2]) then does it not affect the values stored at the subsequent address?

Example: We have int myNum = 4 billion at address 0x10010000. Of course myNum can't store 4 billion so it appears as some negative number at that address. Despite not being able to store this large number, it has no effect on the value stored at the subsequent address of 0x10010004. Correct?

The memory addresses just have enough space to hold certain sizes of numbers/characters, and if the size goes over the limit then it will be represented differently (like trying to store 4 billion into the int but it will appear as a negative number) and so it has no effect on the numbers/characters stored at the next address.

Sorry if I went overboard. I've been having a major brain fart all day from this.

stumpy
  • 417
  • 1
  • 4
  • 5
  • 10
    You might be getting confused with [string overruns](https://en.wikipedia.org/wiki/Buffer_overflow). – Robbie Dee Jan 20 '16 at 12:56
  • 19
    Homework: Modify a simple CPU so that it *does* spill. You'll see that the logic gets far more complex, all for a "feature" that would guarantee security holes everywhere without being useful in the first place. – phihag Jan 20 '16 at 15:05
  • 4
    If you need really huge numbers, it is possible to have a number representation that increases how much memory it uses to fit large numbers. The processor itself can't do this, and it's not a feature of the C language, but a library can implement it - a common C library is [the GNU Multiple Precision arithmetic library](https://gmplib.org/). The library has to manage memory to store the numbers which has a performance cost on top of the arithmetic. A lot of languages have this kind of thing built in (which doesn't avoid the costs). –  Jan 20 '16 at 15:10
  • 1
    write a simple test, I'm not a C programmer but something along the lines of `int c = INT.MAXINT; c+=1;` and see what happened to c. – JonH Jan 20 '16 at 20:41
  • 2
    @JonH: The problem is that overflow is Undefined Behavior. A C compiler may spot that code, and deduce that it's unreachable code **because** it unconditionally overflows. Since unreachable code doesn't matter, it can be eliminated. End result: no code left. – MSalters Jan 21 '16 at 07:57
  • @MSalters: can it be circumvented with `c=INT.MAXINT - 10; c+=user_inputed_value;` ? What can the compiler decide there? Shouldn't it just allow the operation to proceed (and at runtime, then, overflow can be had with an inputed value > 10) ? – Olivier Dulac Jan 21 '16 at 14:43
  • @OlivierDulac: The compiler may only eliminate code that _certainly_ overflows. In your example, there's no overflow for inputs 0-10. But keep in mind that the compiler now knows for certain that `user_inputed_value<=10`. – MSalters Jan 21 '16 at 16:07
  • @MSalters: the compiler may *assume* that (and, for example, reserve only 1 byte (if user_inputed_value is an integer) to hold that value as it *assumes* (wrongly) that it won't need more), but it can't *know for certain*, unless it also modifies all the code that reads user_inputed_value, which it shouldn't (/can't). So it should allow one to force the overflow by inputting a number >10. – Olivier Dulac Jan 21 '16 at 16:54
  • 1
    @OlivierDulac: Actually, the compiler may look at the input routine, see that it repeatedly scans for digits, and unroll that loop to look only at the last two digits. This is why some people say that Undefined Behavior can travel back in time. That potential overflow can affect the code leading up to it. – MSalters Jan 21 '16 at 17:02
  • @MSalters interesting! – Olivier Dulac Jan 21 '16 at 17:19
  • @MSalters, not directly related to the topic at hand, but a compiler may optimize away a boolean expression that tests whether or not an overflow occurred. (Don't ask me how I know!) – Solomon Slow Jan 21 '16 at 20:12

5 Answers

48

No, it does not. In C, variables have a fixed set of memory addresses to work with. If you are working on a system with 4-byte ints, and you set an int variable to 2,147,483,647 and then add 1, the variable will usually contain -2147483648 (on most systems; strictly speaking, the behavior is undefined). No other memory locations will be modified.

In essence, the compiler will not let you assign a value that is too big for the type. This will typically generate a compiler warning or error. If you force it to with a cast, the value will be truncated.

Looked at in a bitwise way, if the type can only store 8 bits, and you try to force the value 1010101010101 into it with a cast, you will end up with the bottom 8 bits, or 01010101.

In your example, regardless of what you do to myArray[2], myArray[3] will contain 4. There is no "spill over". If you try to put in something that needs more than 4 bytes, it will just lop off everything on the high end, leaving the bottom 4 bytes. On most systems, this will result in -2147483648.
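
A minimal sketch you can run to see this, using the array from the question (the cast just silences the out-of-range warning; the exact value of element 2 is implementation-defined, but the neighbors are never touched):

#include <stdio.h>
int main(void)
{
    /* 2147483648 does not fit in a 32-bit int; on most implementations the
       conversion simply keeps the low 32 bits, giving -2147483648 */
    int myArray[5] = {1, 2, (int)2147483648LL, 4, 5};
    for (int i = 0; i < 5; i++)
        printf("Element %d holds %d at address %p\n",
               i, myArray[i], (void *)&myArray[i]);
    /* elements 3 and 4 still print 4 and 5: nothing "spilled over" */
    return 0;
}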

From a practical standpoint, you want to just make sure this never, ever happens. These sorts of overflows often result in hard-to-solve defects. In other words, if you think there is any chance at all your values will be in the billions, don't use int.
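
If you do expect values in the billions, a simple sketch of the safer choice (long long is guaranteed to be at least 64 bits):

#include <stdio.h>
int main(void)
{
    long long big = 4000000000LL; /* fits comfortably in a 64-bit type */
    printf("%lld\n", big);
    return 0;
}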

SuperBiasedMan
  • 263
  • 3
  • 9
Gort the Robot
  • 14,733
  • 4
  • 51
  • 60
  • 52
    *If you are working on a system with 4-byte ints, and you set an int variable to 2,147,483,647 and then add 1, the variable will contain -2147483648.* => **No**, it's **Undefined Behavior**, so it might loop around or it might do something else entirely; I've seen compilers optimizing checks based on the absence of overflow and got infinite loops for example... – Matthieu M. Jan 20 '16 at 15:43
  • Sorry, yes, you are correct. I should have added a "usually" in there. – Gort the Robot Jan 21 '16 at 09:33
  • @MatthieuM from a *language* perspective, that's true. In terms of execution on a given system, which is what we're talking about here, it's absolute nonsense. – hobbs Jan 21 '16 at 15:42
  • @hobbs: The problem is that when the compilers mangle the program because of Undefined Behavior, actually running the program will indeed produce an unexpected behavior, comparable in effect to overwriting memory. – Matthieu M. Jan 21 '16 at 17:16
24

Signed integer overflow is undefined behavior. If this happens, your program is invalid. The compiler is not required to check this for you, so it may generate an executable that appears to do something reasonable, but there is no guarantee that it will.

However, unsigned integer overflow is well-defined. It will wrap modulo UINT_MAX+1. The memory not occupied by your variable will not be affected.
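
A small sketch of the well-defined unsigned case:

#include <limits.h>
#include <stdio.h>
int main(void)
{
    unsigned int u = UINT_MAX;
    u += 1;            /* well-defined: wraps modulo UINT_MAX + 1 */
    printf("%u\n", u); /* prints 0 */
    unsigned int v = 0;
    v -= 1;            /* also well-defined: wraps around to UINT_MAX */
    printf("%u\n", v);
    return 0;
}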

See also https://stackoverflow.com/q/18195715/951890

Vaughn Cato
  • 1,069
  • 7
  • 12
  • signed integer overflow is just as well-defined as is unsigned integer overflow. if the word has N bits, the upper boundary of signed integer overflow is at 2^(N-1) - 1 (where it wraps around to -2^(N-1)), whereas the upper boundary for unsigned integer overflow is at 2^N - 1 (where it wraps around to 0). same mechanisms for addition and subtraction, same size of the range of numbers (2^N) that can be represented. just a different boundary of overflow. – robert bristow-johnson Jan 22 '16 at 06:00
  • 1
    @robertbristow-johnson: Not according to the C standard. – Vaughn Cato Jan 22 '16 at 06:07
  • well, standards sometimes are anachronistic. looking at the SO reference, there is one comment that hits it directly: "The important note here, though, is that there remain no architectures in the modern world using anything other than 2's complement signed arithmetic. That the language standards still allow for implementation on e.g. a PDP-1 is a pure historical artifact. – Andy Ross Aug 12 '13 at 20:12" – robert bristow-johnson Jan 22 '16 at 07:18
  • i suppose it *isn't* in the C standard, but i suppose there could be an implementation where regular binary arithmetic is not used for `int`. i s'pose they could use [Gray code](https://en.wikipedia.org/wiki/Gray_code) or [BCD](https://en.wikipedia.org/wiki/Binary-coded_decimal) or [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC). dunno why anybody would design hardware to do arithmetic with Gray code or EBCDIC, but then again, i dunno why anyone would do `unsigned` with binary and do signed `int` with anything other than 2's complement. – robert bristow-johnson Jan 22 '16 at 07:24
14

So, there are two things here:

  • the language level: what are the semantics of C
  • the machine level: what are the semantics of the assembly/CPU you use

At the language level:

In C:

  • overflow and underflow are defined as modulo arithmetic for unsigned integers, thus their value "loops"
  • overflow and underflow are Undefined Behavior for signed integers, thus anything can happen

For those who want an example of what "anything" can mean, I've seen:

for (int i = 0; i >= 0; i++) {
    ...
}

turn into:

for (int i = 0; true; i++) {
    ...
}

and yes, this is a legitimate transformation.

It means that there are indeed potential risks of overwriting memory on overflow due to some weird compiler transformation.

Note: on Clang or gcc use -fsanitize=undefined in Debug to activate the Undefined Behavior Sanitizer which will abort on underflow/overflow of signed integers.

Or it means that you can overwrite memory by using the result of the operation to index (unchecked) into an array. This is unfortunately far more likely in the absence of underflow/overflow detection.

Note: on Clang or gcc use -fsanitize=address in Debug to activate the Address Sanitizer which will abort on out-of-bounds access.
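
To make both notes concrete, here is a deliberately broken sketch (the names table and i are made up for illustration; the comments assume the overflow wraps in practice, which the standard does not guarantee):

#include <limits.h>
#include <stdio.h>
int main(void)
{
    int table[16] = {0};
    int i = INT_MAX;
    i += 2;             /* signed overflow: Undefined Behavior;
                           -fsanitize=undefined (UBSan) aborts here */
    table[i % 16] = 42; /* on a typical build i is now negative, so i % 16 is
                           negative too and this write is out of bounds;
                           -fsanitize=address (ASan) catches it */
    printf("%d\n", table[0]);
    return 0;
}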


At the machine level:

It really depends upon the assembly instructions and CPU you use:

  • on x86, ADD will wrap using two's complement on overflow/underflow, and set the OF (Overflow Flag); a portable way to detect this from C is sketched below
  • on the future Mill CPU, there will be 4 different overflow modes for Add:
    • Modulo: two's complement modulo
    • Trap: a trap is generated, halting computation
    • Saturate: value gets stuck to min on underflow or max on overflow
    • Double Width: the result is generated in a double-width register

Note that whether things happen in registers or memory, in neither case does the CPU overwrite memory on overflow.
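
From C you cannot portably read the CPU's overflow flag, but GCC and Clang expose checked arithmetic as a builtin. A hedged sketch (this is a compiler extension, not standard C):

#include <stdbool.h>
#include <stdio.h>
int main(void)
{
    int a = 2147483647, b = 1, result;
    /* computes a + b as if in infinite precision, stores the wrapped value
       in result, and returns true if it did not fit; no Undefined Behavior */
    bool overflowed = __builtin_add_overflow(a, b, &result);
    if (overflowed)
        printf("overflow detected, wrapped result = %d\n", result);
    else
        printf("sum = %d\n", result);
    return 0;
}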

Matthieu M.
  • 14,567
  • 4
  • 44
  • 65
  • Are the last three modes signed? (Doesn't matter for the first one, as it's 2-complement.) – Deduplicator Jan 20 '16 at 17:55
  • 1
    @Deduplicator: According to [Introduction to the Mill CPU Programming Model](http://millcomputing.com/topic/introduction-to-the-mill-cpu-programming-model-2/) there are different opcodes for signed addition and unsigned addition; I expect that both opcodes would support the 4 modes (and be able to operate on various bit-width and scalar/vectors). Then again, it's vapor hardware for now ;) – Matthieu M. Jan 20 '16 at 18:04
4

To further @StevenBurnap's answer, the reason this happens is because of how computers work at machine-level.

Your array is stored in memory (e.g. in RAM). When an arithmetic operation is performed, the value in memory is copied into the input registers of the circuit that performs the arithmetic (the ALU: Arithmetic Logic Unit). The operation is then carried out on the data in the input registers, producing a result in the output register. That result is then copied back to the correct address in memory, leaving other areas of memory untouched.

Pharap
  • 551
  • 3
  • 13
4

First (assuming C99 standard), you may want to include the <stdint.h> standard header and use some of the types defined there, notably int32_t, which is exactly a 32-bit signed integer, or uint64_t, which is exactly a 64-bit unsigned integer, and so on. You might want to use types like int_fast16_t for performance reasons.
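
For instance (a minimal sketch; needs a C99 toolchain, and the PRI* format macros come from <inttypes.h>):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
int main(void)
{
    int32_t      a = INT32_MAX;  /* exactly 32 bits, signed */
    uint64_t     b = UINT64_MAX; /* exactly 64 bits, unsigned */
    int_fast16_t c = 1000;       /* at least 16 bits, fastest available */
    printf("%" PRId32 " %" PRIu64 " %" PRIdFAST16 "\n", a, b, c);
    return 0;
}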

Read the other answers explaining that unsigned arithmetic never spills (or overflows) into adjacent memory locations. Beware of undefined behavior on signed overflow.

Then, if you need to compute exactly with huge integer numbers (e.g. you want to compute the factorial of 1000 with all its 2568 decimal digits), you want bigints, a.k.a. arbitrary-precision numbers (or bignums). Algorithms for efficient bigint arithmetic are very clever and usually require specialized machine instructions (e.g. some add-with-carry instruction, if your processor has one). Hence I strongly recommend in that case using an existing bigint library like GMPlib.
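
For example, a minimal sketch with GMPlib (assuming the library and its headers are installed; link with -lgmp):

#include <gmp.h>
#include <stdio.h>
int main(void)
{
    mpz_t result;
    mpz_init(result);
    mpz_fac_ui(result, 1000);            /* 1000! exactly, all 2568 digits */
    gmp_printf("1000! = %Zd\n", result);
    mpz_clear(result);
    return 0;
}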

Basile Starynkevitch
  • 32,434
  • 6
  • 84
  • 125