22

I came across an interesting point today in a review over on Code Review. @Veedrac recommended in this answer that variable-size types (e.g. int and long) be replaced with fixed-size types like uint64_t and uint32_t. Quoting from the comments on that answer:

The sizes of int and long (and thus the values they can hold) are platform-dependent. On the other hand, int32_t is always 32 bits long. Using int just means that your code works differently on different platforms, which is generally not what you want.
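
For instance, a quick check makes the difference concrete (a minimal sketch; the first two printed widths vary by platform):

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void)
{
    /* int and long vary by platform: e.g. long is 64 bits on 64-bit
     * Linux (LP64) but 32 bits on 64-bit Windows (LLP64). */
    printf("int:     %zu bits\n", sizeof(int) * CHAR_BIT);
    printf("long:    %zu bits\n", sizeof(long) * CHAR_BIT);
    /* int32_t is exactly 32 bits on every implementation that provides it. */
    printf("int32_t: %zu bits\n", sizeof(int32_t) * CHAR_BIT);
    return 0;
}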

The reasoning behind the standard not fixing the common types is partially explained here by @supercat. C was written to be portable across architectures, in contrast to assembly, which was usually used for systems programming at the time.

I think the design intention was originally that each type other than int be the smallest thing that could handle numbers of various sizes, and that int be the most practical "general-purpose" size that could handle +/-32767.

As for me, I've always used int and not really worried about the alternatives. I've always thought of it as the type with the best performance, end of story. The only place I thought fixed-width types would be useful is when encoding data for storage or for transfer over a network. I've rarely seen fixed-width types in code written by others either.

Am I stuck in the 70s or is there actually a rationale for using int in the era of C99 and beyond?

jacwah
  • Some people just imitate others. I believe most fixed-width-type code was written without much thought; there was no real reason either to fix the size or not to. I have code written primarily for 16-bit platforms (MS-DOS and Xenix in the '80s) that compiles and runs today on any 64-bit system and benefits from the new word size and addressing, just by recompiling. Which is to say: serialization for exporting and importing data is a very important part of an architecture if you want to keep it portable. – Luciano Jun 18 '15 at 18:13
  • Related: http://stackoverflow.com/questions/24444356/why-arent-the-c-supplied-integer-types-good-enough-for-basically-any-project – dan04 Jun 25 '15 at 23:28

2 Answers

7

There is a common and dangerous myth that types like uint32_t save programmers from having to worry about the size of int. While it would be helpful if the Standards Committee were to define a means of declaring integers with machine-independent semantics, unsigned types like uint32_t have semantics which are too loose to allow code to be written in a fashion which is both clean and portable; further, signed types like int32_t have semantics which are, for many applications, defined needlessly tightly, and thus preclude what would otherwise be useful optimizations.

Consider, for example:

#include <stdint.h>

/* Computes n^(2^exponent) mod 2^32 (see the comments below). */
uint32_t upow(uint32_t n, uint32_t exponent)
{
  while(exponent--)
    n*=n;
  return n;
}

/* The same computation on a signed 32-bit type. */
int32_t spow(int32_t n, uint32_t exponent)
{
  while(exponent--)
    n*=n;
  return n;
}

On machines where int either cannot hold 4294967295, or can hold 18446744065119617025, the first function will be defined for all values of n and exponent, and its behavior will not be affected by the size of int; further, the Standard will not require that it yield different behavior on machines with different sizes of int. Some values of n and exponent, however, will cause it to invoke Undefined Behavior on machines where 4294967295 is representable as an int but 18446744065119617025 is not.

The second function will yield Undefined Behavior for some values of n and exponent on machines where int cannot hold 4611686014132420609, but will yield defined behavior for all values of n and exponent on all machines where it can (the specification for int32_t implies two's-complement wrapping behavior on machines where it is smaller than int).
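
The danger is not hypothetical. The same promotion trap is easy to reproduce today with a narrower type: wherever int is wider than 16 bits, uint16_t promotes to signed int. A deliberately minimal sketch (the function names are just for illustration):

#include <stdint.h>

/* On a typical platform with 32-bit int, both operands promote to
 * signed int, so 65535 * 65535 = 4294836225 overflows INT_MAX:
 * Undefined Behavior, even though every declared type is unsigned. */
uint16_t square16(uint16_t n)
{
    return n * n;
}

/* Forcing the arithmetic into unsigned int avoids the trap, since
 * unsigned arithmetic is defined to wrap; the conversion back to
 * uint16_t is likewise well-defined (reduction mod 2^16). */
uint16_t square16_safe(uint16_t n)
{
    return (unsigned)n * n;
}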

Historically, even though the Standard said nothing about what compilers should do with int overflow in upow, compilers would have consistently yielded the same behavior as if int had been large enough not to overflow. Unfortunately, some newer compilers may seek to "optimize" programs by eliminating behaviors not mandated by the Standard.

supercat
  • Anyone happening to want to manually implement `pow`, remember this code is just an example and does not cater for `exponent=0`! – Mark Hurd Jun 15 '15 at 06:42
  • I think you should be using the prefix decrement operator, not the postfix; currently it does one extra multiplication. E.g. `exponent=1` will result in n being multiplied by itself once, since the decrement is performed after the check; if the decrement is performed before the check (i.e. `--exponent`), no multiplication will be performed and n itself will be returned. – ALXGTV Jun 15 '15 at 06:48
  • @MarkHurd: The function is poorly named, since what it actually computes is `N^(2^exponent)`, but computations of the form `N^(2^exponent)` are often used in the computation of exponentiation functions, and mod-4294967296 exponentiation is useful for things like computing the hash of the concatenation of two strings whose hashes are known. – supercat Jun 15 '15 at 13:54
  • @ALXGTV: The function was meant to be illustrative of something that computes something power-related. What it actually computes is N^(2^exponent), which is part of efficiently computing N^exponent, and may well fail even if N is small (repeated multiplication of a `uint32_t` by 31 won't ever yield UB, but the efficient way to compute 31^N entails computations of 31^(2^N), which will). – supercat Jun 15 '15 at 13:59
  • I don't think this is a good argument. The aim isn't to make functions defined for all inputs, sensible or not; it's to be able to reason about sizes and overflow. `int32_t` sometimes having defined overflow and sometimes not, which is what you seem to be mentioning, seems of minimal importance relative to the fact that it lets me reason about preventing overflow in the first place. And if you do want defined overflow, chances are you're wanting the result modulo some fixed value - so you're using fixed-width types anyway. – Veedrac Jun 15 '15 at 19:31
  • And on the topic of performance - on normal architectures there will be a direct correspondence between fixed-width and variable-width types, so there's no performance loss at all. Further, in the cases where there would be a performance loss, the variable-width types have a good chance of simply being *wrong*. If you really care about performance this much, use the `int_fastXX_t` types - although they seem mostly pointless to me. – Veedrac Jun 15 '15 at 19:52
  • @Veedrac: Part of the design intention with unsigned types is that operations with them are defined on all inputs; it is common for things like hash functions to use unsigned types to perform modular arithmetic. Having a Standard say that `n*=n;` is always allowed to compute the result according to mod-4294967296 arithmetic but is only required to do so when `n` is 3037000499 or less is silly. As for signed types, the problem is that having overflow defined for signed types smaller than `int` means that... – supercat Jun 15 '15 at 20:04
  • ...code which stores a computation in a smaller integer type and then uses that value cannot simply use the value cached in a register unless it first truncates and sign-extends it. The programmer might not care whether oversize values are truncated or left as-is, but the Standard requires the truncation in any case where it would be detectable. – supercat Jun 15 '15 at 20:11
  • @supercat Doing modular arithmetic against *some modulus you don't know* is silly (except in the case of hashes since you don't actually care about the value of the result). If you want modular arithmetic, *neither* type is sufficient as-is. – Veedrac Jun 15 '15 at 20:22
  • @Veedrac: Using `uint32_t`, the modulus will be `2^32` *in all cases where the result is defined*, but the Standard presently allows 64-bit compilers to negate the laws of time and causality for some values of `n`. Also, in some cases it may be just fine to perform calculations with a modulus which is known to be an arbitrary unknown *multiple of a modulus one is interested in*, and only reduce the result to the modulus of interest after all other calculations are complete, since reducing at the end will yield the same result (in defined cases) as reducing at every step. – supercat Jun 15 '15 at 20:31
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/24846/discussion-between-veedrac-and-supercat). – Veedrac Jun 15 '15 at 20:49
4

For values closely related to pointers (and thus, to the amount of addressable memory) such as buffer sizes, array indexes, and Windows' lParam, it makes sense to have an integer type with an architecture-dependent size. So, variable-sized types are still useful. This is why we have the typedefs size_t, ptrdiff_t, intptr_t, etc. They have to be typedefs because none of the built-in C integer types need be pointer-sized.
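
A minimal sketch of those typedefs in use (the function name is hypothetical; note that intptr_t is itself optional in the standard):

#include <stddef.h>   /* size_t, ptrdiff_t */
#include <stdint.h>   /* intptr_t, where provided */

void zero_buffer(char *buf, size_t len)
{
    /* size_t can index any object, however large the address space. */
    for (size_t i = 0; i < len; i++)
        buf[i] = 0;

    /* ptrdiff_t is the type of a pointer subtraction. */
    ptrdiff_t n = (buf + len) - buf;

    /* intptr_t is an integer wide enough to round-trip an object pointer. */
    intptr_t addr = (intptr_t)(void *)buf;

    (void)n;
    (void)addr;
}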

So the question is really whether char, short, int, long, and long long are still useful.

IME, it's still common for C and C++ programs to use int for most things. And most of the time (i.e., when your numbers are in the range ± 32 767 and you don't have rigorous performance requirements), this works just fine.

But what if you need to work with numbers in the 17-32 bit range (like the populations of large cities)? You could use int, but that would be hard-coding a platform dependency. If you want to strictly adhere to the standard, you could use long, which is guaranteed to be at least 32 bits.

The problem is that the C standard doesn't specify any maximum size for an integer type. There are implementations on which long is 64 bits, which doubles your memory usage. And if these longs happen to be elements of an array with millions of items, you'll thrash memory like crazy.

So, neither int nor long is a suitable type to use here if you want your program to be both cross-platform and memory-efficient. Enter int_least32_t.

  • Your I16L32 compiler gives you a 32-bit long, avoiding the truncation problems of a 16-bit int.
  • Your I32L64 compiler gives you a 32-bit int, avoiding the wasted memory of a 64-bit long.
  • Your I36L72 compiler gives you a 36-bit int.
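
A minimal sketch of what that buys you (alloc_populations is a hypothetical name; the widths in the comment assume the mappings in the list above):

#include <stdint.h>
#include <stdlib.h>

/* Millions of entries: int_least32_t guarantees the 17-32-bit range
 * fits, while letting each platform pick its smallest suitable type
 * (long on an I16L32 compiler, int on an I32L64 one). */
int_least32_t *alloc_populations(size_t count)
{
    return malloc(count * sizeof(int_least32_t));
}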

OTOH, suppose you don't need huge numbers or huge arrays, but you have a need for speed. int may be large enough on all platforms, but it isn't necessarily the fastest type: 64-bit systems usually still have 32-bit int. You can use int_fast16_t instead and get whatever type the implementation considers fastest, whether that's int, long, or long long.
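
For example, a sketch (the printed width is whatever the implementation chose; commonly 64 bits with glibc on x86-64):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* "At least 16 bits, favoring speed": the implementation picks the width. */
    int_fast16_t acc = 0;
    for (int_fast16_t i = 0; i < 100; i++)  /* sum fits even in 16 bits */
        acc += i;

    /* PRIdFAST16 supplies the right printf length modifier for the chosen type. */
    printf("acc = %" PRIdFAST16 ", int_fast16_t is %zu bits\n",
           acc, sizeof(int_fast16_t) * CHAR_BIT);
    return 0;
}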

So, there are practical use cases for the types from <stdint.h>. The standard integer types don't mean anything. Especially long, which may be 32 or 64 bits, and may or may not be large enough to hold a pointer, depending on the whim of the compiler writers.

dan04
  • A problem with types like `uint_least32_t` is that their interactions with other types are even more weakly specified than those of `uint32_t`. IMHO, the Standard should define types like `uwrap32_t` and `unum32_t`, with the semantics that any compiler which defines the type `uwrap32_t` must promote it as an unsigned type in essentially the same cases as it would be promoted if `int` were 32 bits, and any compiler which defines the type `unum32_t` must ensure that the basic arithmetic promotions always convert it to a signed type capable of holding its value. – supercat Aug 06 '15 at 19:47
  • Additionally, the Standard could also define types whose storage and aliasing were compatible with `intN_t` and `uintN_t`, and whose *defined* behaviors would be *consistent* with `intN_t` and `uintN_t`, but which would grant compilers some freedom in case code assigned values outside their range [allowing semantics similar to those that were perhaps intended for `uint_least32_t`, but without uncertainties like whether adding a `uint_least16_t` and an `int32_t` would yield a signed or unsigned result]. – supercat Aug 06 '15 at 19:53