28

I see and work with a lot of software, written by a fairly large group of people, and a LOT of the time I see integer type declarations that are wrong. The two examples I see most often: a regular signed integer is created when there can be no negative numbers, and the size of the integer is declared as a full 32-bit word when a much smaller size would do the trick. I wonder if the second has to do with compiler word alignment to the nearest 32 bits, but I'm not sure whether that's true in most cases.

When you create a number, do you usually create it with the size in mind, or do you just use whatever the default "int" is?

edit - Voted to reopen, as I don't think the answers adequately deal with languages that aren't C/C++, and the "duplicates" are all C/C++-based. They fail to address strongly typed languages such as Ada, where there cannot be bugs due to mismatched types: the code will either not compile, or, if the mismatch can't be caught at compile time, it will throw an exception. I purposely left out naming C/C++ specifically, because other languages treat different integers very differently, even though most of the answers seem to be based around how C/C++ compilers act.

prelic
  • 37
    A signed, 32 bit integer is the one numeric type that works in the largest number of common programming cases. Using it as the type that fulfills those common cases is not wrong, despite your assertion that it is. – Robert Harvey Jul 17 '17 at 01:46
  • Robert, that's true, but what's your stance on situations where, if the number got above say 8 or 16 bits, something is terribly wrong by that point anyway, and people still use 32-bit numbers? Enumerations as numbers, for example. – prelic Jul 17 '17 at 01:49
  • 7
    I don't understand what you just said. Make a clear assertion and I'll try to refute it. – Robert Harvey Jul 17 '17 at 01:49
  • I guess I'll be more clear: say you have a variable which represents some thing. If that thing's value ever required more than 8 or 16 bits, then something terrible has already happened and overwriting memory is the least of your problems, so why use 32 bits? – prelic Jul 17 '17 at 01:52
  • 11
    That seems too hypothetical and abstract to consider a serious possibility. 32 bits is used, not because it can cause enums to be too large, but because it's the "common numeric type," and so it can be utilized in the largest number of programming cases without requiring endless casting/numeric conversions. – Robert Harvey Jul 17 '17 at 01:53
  • Well, using a signed integer might be somewhat more "safe" if the code is not 100% bug free. For example, decrementing 0 would give -1, which is smaller than 0, and not 2^32 - 1, which is the largest unsigned 32-bit integer. For the size, `int` is often used because most computation is done using it anyway. – Phil1970 Jul 17 '17 at 01:55
  • 6
    I think you're expecting numeric type ranges (i.e. int, uint, ushort, etc) to provide the necessary numeric range restrictions to make you safe. That's not how it works. – Robert Harvey Jul 17 '17 at 01:55
  • I guess that's my question. Your usual every-day number needs 32 bits? Not 8 or 16 or whatever is reasonable in the context? – prelic Jul 17 '17 at 01:55
  • 3
    Your usual everyday number can fit in 32 bits. Go to 8 or 16 bits for your common numeric type, and you will exclude many common, valid use cases. 16 unsigned bits will only count to 65535, and will prevent negative numbers. 8 bits only counts to 255. You can't even count range in miles in most cars with 8 bits. – Robert Harvey Jul 17 '17 at 01:56
  • But what if I'm not counting miles in cars but colors of traffic lights? – prelic Jul 17 '17 at 02:09
  • 18
    @prelic Then you're saving 3 bytes of space for a massive complication in development workflow. Using `int` for individual variables without spending a moment's thought on it is the smart thing to do, because it wastes almost nothing and saves a whole lot - moments of thought (for programmers) are in fact one of the most expensive currencies imaginable. – Kilian Foth Jul 17 '17 at 05:49
  • 3
    Because unsigned support sucks in many languages (e.g. C#) – CodesInChaos Jul 17 '17 at 08:06
  • 1
    @CodesInChaos Can you elaborate please? – Robbie Dee Jul 17 '17 at 09:37
  • 3
    @RobbieDee [None of the .NET APIs use unsigned ints](https://stackoverflow.com/questions/3935165/why-does-net-framework-not-use-unsigned-data-types) because unsigned integers are not supported by all .NET languages. Even if you're not interested in working between .NET languages, coding 'against' the way the framework works is always going to suck. I think they're there for things like bitwise math, not for actually representing non-negative numbers through your code. – Nathan Cooper Jul 17 '17 at 11:29
  • 3
    If you insist on being pedantic and want to define an integer variable with a range from 42 to 93, because those are the only valid values that it can take in your app, you can always program in Pascal. (Except there are a lot of other good reasons why nobody does that any more....) – alephzero Jul 17 '17 at 12:04
  • @NathanCooper While this is undoubtedly true, there is a subtle difference between it not being used in the framework and a lack of support for the construct within the language... – Robbie Dee Jul 17 '17 at 13:09
  • 1
    The current answers do a good job of refuting a performance-centric argument for using more specific integer types, but there's also a safety/correctness argument. It'd be nice to get compile-time safety against passing a negative number to a method that is only valid for positives. – Ben Aaronson Jul 17 '17 at 15:57
  • @Ben - this was something I'd hoped to see addressed, but didn't really see anyone talking about it, so I will give my opinion. We use Ada for some things, and we have a strong type system so things like radians and degrees cannot be mixed up. In a similar vein, if I use a signed number because that's all I need, I cannot mistakenly compare it to an unsigned number. So even if your argument is "just use 32-bit signed integers always, and you won't have problems", that doesn't address differences in units (radians/degrees, ft/inches, etc). I think that's a hugely overlooked benefit. – prelic Jul 23 '17 at 20:16

8 Answers

60

Do you see the same thing?

Yes, the overwhelming majority of declared whole numbers are int.

Why?

  1. Native ints are the size your processor does math with*. Making them smaller doesn't gain you any performance (in the general case). Making them larger means that (depending on your processor) they may not be operated on atomically, leading to potential concurrency bugs.
  2. 2 billion and change is big enough to ignore overflow issues for most scenarios. Smaller types mean more work to address them, and lots more work if you guess wrong and you need to refactor to a bigger type.
  3. It's a pain to deal with conversion when you've got all kinds of numeric types. Libraries use ints. Clients use ints. Servers use ints. Interoperability becomes more challenging, because serialization often assumes ints - if your contracts are mismatched, subtle bugs suddenly crop up when they serialize an int and you deserialize a uint (a short sketch of this follows below).
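
As a hedged illustration of point 3 (the values and the memcpy-based round trip are invented for illustration, not taken from any particular serialization library): the same four bytes mean very different things once one side treats them as signed and the other as unsigned.

#include <stdio.h>
#include <string.h>

int main(void)
{
    int sent = -1;          /* the sender serializes a signed int */
    unsigned int received;

    /* the receiver reads the same bytes back, but as an unsigned type */
    memcpy(&received, &sent, sizeof received);

    printf("sender meant %d, receiver sees %u\n", sent, received);
    /* on a typical two's-complement machine this prints:
       sender meant -1, receiver sees 4294967295 */
    return 0;
}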

In short, there's not a lot to gain, and some non-trivial downsides. And frankly, I'd rather spend my time thinking about the real problems when I'm coding - not what type of number to use.

*- these days, most personal computers are 64 bit capable, but mobile devices are dicier.

Telastyn
  • 3
    "*I'd rather spend my time thinking about the real problems when I'm coding*" - integer overflows are a significant cause of code security issues, frequently ignored by developers. The real reason developers use ints everywhere: it's just easier. – adelphus Jul 17 '17 at 12:59
  • 1
    Re #3, even outside of serialization, trying to do calculations with mixed integer types turns into a nightmare of casting (and, if you care about handling unexpected fault cases, of checking to make sure everything will fit across the cast). Except in edge cases (microcontrollers with limited amounts of RAM, enormous databases, interfacing with systems using something else), using signed int32 or int64 everywhere is preferable to reduce the amount of extra work needed. – Dan Is Fiddling By Firelight Jul 17 '17 at 14:13
  • Just as in the 80's, "Not every computer is a VAX"; these days, not every computer is a personal computer. I write for computers with all of the following word lengths: 4, 8, 16, 24 and 32 bits. The 24-bit char makes for some interesting debugging. – uɐɪ Jul 17 '17 at 15:01
  • @adelphus I've seen similar assertions elsewhere, but I'm rather skeptical of the claim. Can you point to a significant exploit found in the wild that relied on an integer overflow? Preferably more than one. – Kevin Jul 17 '17 at 16:31
  • You appear to misunderstand the first half of the question. He's asking why he sees "int" in places where "unsigned int" is correct, because the quantity in question cannot be negative. (For it to be negative would be an error.) – John R. Strohm Jul 17 '17 at 16:40
  • @adelphus: Use Swift. No integer overflow is ignored. Instead, it crashes. Guaranteed. – gnasher729 Jul 17 '17 at 16:48
  • @Kevin are you being serious? Just go to the CVE site and enter "integer overflow" as the search parameter. And those are just the vulns that are openly reported - the real figure is likely to be much higher. – adelphus Jul 17 '17 at 20:08
  • @adelphus no need to be a dick about it. I understand there are plenty of reported potential vulnerabilities, but I don't see anywhere on CVE that indicates whether they were actually exploited as a zero-day. – Kevin Jul 17 '17 at 20:52
  • Overflowing integers is sometimes intended behavior, and not an error condition. For example, lots of protocols use alive counters which are free to overflow. – prelic Jul 17 '17 at 23:51
  • @Telastyn - I still don't fully agree that regular default ints should always be used, but I appreciate your explanation on why this is the norm. Marked as the answer. – prelic Jul 17 '17 at 23:52
  • Even on modern 64-bit CPUs, 32-bit operations are often more efficient in some manner - for example, in x86-64, the instructions are shorter. – mtraceur Jan 02 '20 at 01:09
19

Regarding size, you are operating under the mistaken impression that "smaller is better", which is simply not true.

Even if we completely ignore issues like programmer time or propensity for error, smaller integer types can still have the following disadvantages.

Smaller types = more work

Processors don't work on arbitrarily sized data; they do operations in registers of specific sizes. Trying to do arithmetic with less precision than the registers hold can easily require you to do extra work.

For example, if a C program does arithmetic in uint8_t — an unsigned 8-bit integer type where overflow is specified to be reduction modulo 256 — then unless the processor has specialized assembly instructions to handle the special case, your program will have to follow every arithmetic operation with a mask by 0xff, unless the compiler is capable of outright proving that the mask is unnecessary.
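
A small sketch of the extra truncation this implies (assuming a typical C compiler; whether the mask appears after every operation or only on the final store depends on how the compiler applies C's integer promotion rules):

#include <stdint.h>

/* The addition is carried out in (at least) int width; storing the
   result back into a uint8_t forces a reduction modulo 256, which the
   compiler typically implements as a mask or a byte-sized move. */
uint8_t wrapping_add(uint8_t a, uint8_t b)
{
    uint8_t sum = a + b;   /* e.g. 200 + 100 stores 44 here, not 300 */
    return sum;
}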

Smaller types = inefficient memory access

Memory is not uniform. It is fairly common on processors that accessing memory on addresses that are multiples of 4 bytes (or more!) is much more efficient than accessing memory on other addresses.

You may think that using a 1-byte field rather than a 4-byte field is helping you, but the reality may be that it's actually harming you, because misaligned memory accesses run more slowly than aligned ones.

Of course, compilers know all about this, and in many places will insert the necessary "wasted" padding space to make things faster:

#include <stdint.h>

struct this_struct_is_64_bits_not_40_bits
{
    uint32_t x;  /* 4 bytes */
    uint8_t y;   /* 1 byte, typically followed by 3 bytes of padding */
};
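
A quick way to see the padding on your own platform (a sketch; the value 8 is typical for common desktop targets, not something the standard guarantees):

#include <stdio.h>
#include <stddef.h>

/* assumes the struct definition above is in scope */
int main(void)
{
    /* commonly prints "size = 8, offset of y = 4" (not size 5): the
       compiler pads after y so that x stays 4-byte aligned in arrays */
    printf("size = %zu, offset of y = %zu\n",
           sizeof(struct this_struct_is_64_bits_not_40_bits),
           offsetof(struct this_struct_is_64_bits_not_40_bits, y));
    return 0;
}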

Signed integers = more optimization opportunities

A peculiarity of C and C++ is that signed integer overflow is undefined behavior, which allows the compiler to make optimizations without regard to what effect the optimization might have in the case of overflow.

Optimization guides often outright recommend the use of signed integers in many places for exactly this reason. For example, from the CUDA Best Practices Guide:

Note: Low Medium Priority: Use signed integers rather than unsigned integers as loop counters.
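
A minimal sketch of why this matters to the optimizer (the function is hypothetical, and the behaviour described is typical of mainstream C/C++ compilers rather than guaranteed):

/* On a 64-bit target the compiler wants to address data[i] with 64-bit
   arithmetic. Because signed overflow is undefined, it may assume the
   32-bit int counter never wraps and widen it to a 64-bit induction
   variable once. With a 32-bit unsigned counter, wraparound back to 0
   is defined behaviour, so the compiler may have to re-truncate the
   index on every iteration, which can block such transformations. */
void scale_by_two(float *data, int n)
{
    for (int i = 0; i < n; ++i)
        data[i] *= 2.0f;
}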

  • 2
    " if a C program does arithmetic in uint8_t" - they don't. C promotes arguments smaller than int to int. This rule directly avoids the `&0xff` masking you're assuming. Instead, it's only on the final store to a `uint_8` where masking must be done. – MSalters Jul 17 '17 at 10:08
  • 2
    @MSalters: Sure, but one often stores values in objects of type `uint8_t` when doing arithmetic with objects of type `uint8_t`. –  Jul 17 '17 at 10:17
  • Also: "Smaller types = greater possibility of [silent] overflow," often with catastrophic consequences. – Mike Harris Jul 17 '17 at 18:34
  • On your point of alignment, wouldn't that mean there would be no reason for struct packing and padding? Why do people put bigger size types at the top of structs then, if everything is aligned at a boundary anyway? There may be situations where even small types are aligned to the nearest boundary, but that is certainly not always the case. – prelic Jul 17 '17 at 23:57
  • 1
    @prelic: I don't mean to imply there are *never* times when using smaller integral types can be beneficial; sometimes the amount of memory used is more important than how efficiently it's accessed (e.g. if you're limited by total memory or available bandwidth), there are times when you have specialized instructions (e.g. x86 vector extensions), or other things. The main point I want to get across is that benefits are not automatic and detriments are common -- the choice to use a smaller integer type is something that should be a deliberate and informed decision, not an automatic habit. –  Jul 18 '17 at 03:45
18

Using a signed 32-bit int "just works" in all of these cases:

  • Loops
  • Integer arithmetic
  • Array indexing and sizing
  • Enumeration values
  • Size of objects in memory (most reasonably sized things)
  • Image dimensions (reasonably-sized images)

Yes, not all of these uses require a sign or 32 bits of data, but the compatibility of a signed 32-bit int with most use cases makes it an easy choice. Picking any other integer type takes consideration that most people don't want to spend the time on. And with the availability of memory today, we enjoy the luxury of wasting a few bytes here and there. Standardizing on a common integer type makes everybody's life a bit easier, and most libraries default to signed 32-bit integers, so choosing other integer types would be a hassle from a casting/converting standpoint.

Samuel
  • Hey, I do it all the time too. But don't you feel it's kind of a cop-out to say we use them because it's easiest and it's usually been done that way? – prelic Jul 17 '17 at 02:13
  • 12
    @prelic We do it that way because it is easier and there is no compelling reason NOT to do it that way. – Craig T Jul 17 '17 at 06:35
  • 7
    Also because if you do something else you make people stop and wonder why. The fewer things that make me wonder in code the faster I can read it and be done with it. I've had my fill of needless mysteries. – candied_orange Jul 17 '17 at 06:43
  • One place to be very careful is the use of integers for array sizes or sizes of objects in memory: if I as a user can cause your system to parse a file with carefully crafted contents, I can likely exploit such code to overwrite the stack, and a simple-minded n < (some value) check will not stop me if n is signed..... Integers have their place, but you always need to be aware that they are signed and will wrap in unfortunate (and undefined) ways, and that sometimes such things can be exploitable. However, that very undefinedness of the wrap behaviour can give optimisation opportunities. – Dan Mills Jul 17 '17 at 10:10
  • @DanMills for array sizes, you want to be using a `size_t` which _will_ be an unsigned type, as opposed to an `int`, signed or not. – Baldrickk Jul 17 '17 at 10:36
  • @Baldrickk Indeed, and I usually end up using the stuff in <stdint.h> when writing in C; it comes from an embedded background, but the point is not unique to C. – Dan Mills Jul 17 '17 at 14:55
  • Actually, using signed int, as opposed to unsigned int, does NOT "just work" in array indexing. A few years ago, I watched a guy spend three days working on a crash that turned out to be an array write corrupting memory, because the subscript was -1. If the subscript variable had been declared "unsigned", this would have caused an immediate crash and been much easier to find. – John R. Strohm Jul 17 '17 at 16:42
  • It would probably also "just work" if you allocated way more memory than you need, way more processor time than you need, etc. I mean, sure, "just works" is a damn good reason, but it still feels a little silly to use 32 or more bits to store data which I know _damn well_ will never exceed an 8-bit number. – prelic Jul 17 '17 at 23:55
10

There are still many millions, or billions, of embedded processing devices out there where the "default" integer is 16 bits or eight bits (a few even smaller), and where the assumption that a plain signed int is big enough is not a valid one. (I work with them all of the time.)

If you are dealing with any form of communications protocol, you should be thinking about:

  • Sizes, (8 bits, 16, 32, 64, others),
  • Signed/Unsigned
  • Endianness
  • Packing/Alignment

So while I see people just using int all over the place, in my field of work we have specific rules against it (MISRA), and we deliberately design our communications protocols, types and data stores with these pitfalls in mind, and we have to reject such code before it gets into production.
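
As a hedged sketch of what that looks like in practice (the message layout and field names are invented for illustration): every field gets an explicit width and signedness, and multi-byte fields are serialized byte by byte in a declared byte order instead of relying on the host's struct layout.

#include <stdint.h>

/* Invented example message: explicit widths, explicit signedness. */
typedef struct
{
    uint16_t length;       /* big-endian on the wire */
    uint8_t  msg_id;
    int16_t  temperature;  /* signed, tenths of a degree */
} sensor_msg_t;

/* Serialize multi-byte fields explicitly so the wire format does not
   depend on the host's endianness or structure padding. */
static void encode_u16_be(uint8_t *out, uint16_t value)
{
    out[0] = (uint8_t)(value >> 8);
    out[1] = (uint8_t)(value & 0xFFu);
}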

Steve Barnes
5

I would like to post an answer which goes in the opposite direction to most of the others. I argue that using int for everything is not good, at least in C or C++.

  1. int does not carry much semantic meaning. In a strictly typed language you should convey as much meaning as possible with your types. So, if your variable represents a value for which it makes no sense to be negative, why not convey this using unsigned int? (See the sketch after this list.)
  2. Similar to the above, even more precise types are available than int and unsigned int: in C, the size of an object should be size_t, the offset of a pointer should be ptrdiff_t, etc. They will all really be translated to the appropriate int types by the compiler, but they convey some additional, useful information.
  3. Precise types can allow some architecture-specific optimisation (e.g. uint_fast32_t in C).
  4. Normally, a 64-bit processor can operate on one 64-bit value at a time or on two 32-bit values. In other words, in one clock cycle, you can for example perform one 64-bit sum, or two 32-bit sums. This effectively doubles the speed of most math operations if 32-bit integers are enough for you. (I cannot find a text quote for this, but iirc it was said by Andrei Alexandrescu in a CppCon talk, which would make for a quite authoritative source.)
  5. If you use a 32-bit unsigned integer instead of a 64-bit signed integer, for a value which can only be positive anyway, you have effectively halved the memory required to hold that value. It might not be so important in the grand scheme of things, if you think about how cheap RAM is nowadays on most platforms, but it can make the difference if you are doubling the quantity of data that fits into your L1 cache, for example!
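
A small sketch of points 1 and 2 (the function and names are invented for illustration): the types alone document that one value is a size/count that can never be negative and that the others are exactly 32 bits wide.

#include <stddef.h>
#include <stdint.h>

/* size_t says "this is a size/count, never negative";
   int32_t says "exactly 32 bits, sign needed". */
size_t count_matches(const int32_t *values, size_t n, int32_t key)
{
    size_t matches = 0;
    for (size_t i = 0; i < n; ++i)
        if (values[i] == key)
            ++matches;
    return matches;
}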
Alberto Santini
  • 1
    "`int` does not have any semantics." - you mean it doesn't have the semantics you want (in some cases)? Surely it must have *some* semantics. – npostavs Jul 17 '17 at 15:27
  • 1
    Point 4 does not sound entirely accurate. Was Andrei Alexandrescu perhaps talking about [SIMD](https://en.wikipedia.org/wiki/SIMD) or [vector](https://en.wikipedia.org/wiki/Automatic_vectorization) operations (e.g. [SSE2 for x86](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions))? Most common architectures cannot perform separate, simultaneous operations on different parts of a general purpose register. Things like SSE2 can, but with the restriction that you have to perform the same operation on all parts. You can't multiply the upper half of a register while XORing the lower half. – 8bittree Jul 18 '17 at 00:15
1

There are several reasons why it's usually slightly simpler to use signed numbers in C during calculations. But these are just recommendations which apply to calculations/loops in C-like languages, not to designing your abstract data types, communication protocols, or anything where storage is concerned.

  1. In 99% of cases, your variables will operate much closer to zero than to INT_MAX, and in these cases using a signed int often makes it simpler to ensure correctness:

    // if i is unsigned, this loops forever: i >= 0 is always true,
    // and --i wraps from 0 to the maximum value instead of going negative
    while (--i >= 0)
    { /* do something */ }
    
  2. Integer promotion in C is a rule which tries to promote all smaller-than-int operands into a (signed) int, if they fit. This means that your smaller unsigned variables (uint8_t or uint16_t) will be treated as an int during operations:

    uint8_t x = 1;
    uint8_t y = 2;
    
    // with conversion warnings enabled (e.g. gcc/clang -Wconversion),
    // this can produce a warning: the result of `x + y` is an `int`,
    // and you're placing it into a `uint8_t` without explicitly
    // casting:
    
    uint8_t result = x + y;
    

    At the same time, by using smaller types, you probably haven't gained anything in terms of performance, because compilers usually choose int to match the word size of the target architecture, so CPU registers won't really care if you are using anything smaller.

Obviously, this doesn't mean you should waste space on 32-bit ints in struct fields if all you need is a uint8_t.

vgru
0

Regarding the signed/unsigned thing: remember that unsigned arithmetic has totally different semantics compared to signed arithmetic. Unsigned arithmetic is mod 2^n (where n is the number of bits of your unsigned type). However, such arithmetic is often undesired, and it is better to handle an overflow as an error.
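
For example (a minimal sketch of that mod 2^n behaviour):

#include <stdio.h>

int main(void)
{
    unsigned int a = 0u;
    a -= 1u;            /* well-defined: wraps around modulo 2^n */
    printf("%u\n", a);  /* prints 4294967295 where unsigned int is 32 bits */
    return 0;
}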

As far as C++ is concerned, also note that there is some regret in the standards committee about using unsigned data types all over the standard library. See this video at 9:50, 42:40, 1:02:50.

sigy
0

Signed vs. unsigned, or 16 bits vs. 32 bits, are only a few cases of specifying exact boundaries for integer variables.

C has no way to specify these boundaries, like in Ada:

subtype My_Index is Integer range 2 .. 7;

In C, int, short, char, long, and unsigned are only a convenient way of optimizing storage size. They are not intended to carry strict semantics.
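
In C the closest you can usually get is a convention plus a run-time check (a sketch; the type and function names are invented):

#include <assert.h>

typedef int my_index_t;  /* intended range: 2 .. 7, by convention only */

static my_index_t make_my_index(int value)
{
    assert(value >= 2 && value <= 7);  /* enforced at run time, not by the compiler */
    return (my_index_t)value;
}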

mouviciel