26

I have a very simple question that has baffled me for a long time. I deal with networks and databases, so a lot of the data I handle consists of 32-bit and 64-bit counters (unsigned) and 32-bit and 64-bit identification IDs (which likewise have no meaningful mapping for a sign). I practically never deal with any real-world quantity that could be expressed as a negative number.

My co-workers and I routinely use unsigned types like uint32_t and uint64_t for these values, and because it happens so often we also use them for array indexes and other common integer uses.

At the same time, various coding guides I have been reading (e.g. Google's) discourage the use of unsigned integer types, and as far as I know neither Java nor Scala has unsigned integer types.

So I cannot figure out what the right thing to do is: using signed values in our environment would be very inconvenient, yet at the same time the coding guides insist on doing exactly that.

Mael
zzz777
  • 1
    related: [Is integer used too much as a data type?](https://softwareengineering.stackexchange.com/q/95863/31260) – gnat Jul 17 '17 at 08:26

7 Answers

33

There are two schools of thought on this, and neither will ever agree.

The first argues that there are some concepts that are inherently unsigned, such as array indexes. It makes no sense to use signed numbers for those, as it may lead to errors. It can also impose unnecessary limits: an array that uses signed 32-bit indexes can only access 2 billion entries, while switching to unsigned 32-bit indexes allows 4 billion entries.

The second argues that in any program that uses unsigned numbers, sooner or later you will end up doing mixed signed-unsigned arithmetic. This can give strange and unexpected results: casting a large unsigned value to signed gives a negative number, and conversely casting a negative number to unsigned gives a large positive one. This can be a big source of errors.

Simon B
  • 9
    Mixed signed-unsigned arithmetic issues are detected by the compiler; just keep your build warning-free (with a high enough warning level). Besides, `int` is shorter to type :) – rucamzu Jan 29 '14 at 22:30
  • 7
    Confession: I'm with the second school of thought, and though I understand the considerations for unsigned types: `int` is more than enough for array indexes 99.99% of times. The signed - unsigned arithmetic issues are far more common, and thus take precedence in terms of what to avoid. Yes, compilers warn you about this, but how many warnings do you get when compiling any sizable project? Ignoring warnings is dangerous, and bad practice, but in the real world... – Elias Van Ootegem Jan 30 '14 at 08:30
  • 14
    +1 to the answer. **Caution**: _Blunt Opinions Ahead_: 1: My response to the second school of thought is: I'd bet money that anyone who gets unexpected results out of _unsigned_ integral types in C will have undefined behavior (and not the purely-academic kind) in their non-trivial C programs that use _signed_ integral types. If you don't know C well enough to think that unsigned types are the _better_ ones to use, I advise avoiding C. 2: There's exactly one correct type for array indexes and sizes in C, and that's `size_t`, unless there's a special-case good reason otherwise. – mtraceur Apr 08 '16 at 07:08
  • 1
    Also: At least in C (don't know about C++), assigning an unsigned value to a signed type which can't represent that exact value has _implementation-defined_ results [C89 3.2.1.2](http://port70.net/~nsz/c/c89/c89-draft.txt). [C99 6.3.1.3](http://port70.net/~nsz/c/c99/n1256.txt) and [C11 6.3.1.3](http://port70.net/~nsz/c/c11/n1570.txt) explicitly include "raising a signal" among the possible implementation behaviors. In other words, even assuming that your large unsigned integer will convert into a negative number is an error. – mtraceur Apr 08 '16 at 07:28
  • 6
    You run into trouble without mixed signedness. Just calculate unsigned int minus unsigned int. – gnasher729 Jun 13 '16 at 19:06
  • 5
    Not taking issue with you Simon, only with the *first school of thought* that argues that "there are some concepts that are inherently unsigned - such as array indexes." specifically: "There's exactly one correct type for array indexes ... in C, " **Bullshit!**. We DSPers use negative indices all of the time. particularly with even or odd-symmetry impulse responses that are non-causal. and for LUT math. i'm in the second school of thought, but i think that it is useful to have **both** signed and unsigned integers in C and C++. – robert bristow-johnson Feb 22 '18 at 01:24
  • @gnasher729 but don't you *always* run into trouble when arithmetic overflows? – RonJohn Jun 15 '23 at 17:18
22

First of all, the Google C++ coding guideline is not a very good one to follow: it shuns things like exceptions, Boost, etc., which are staples of modern C++. Secondly, just because a certain guideline works for company X doesn't mean it will be the right fit for you. I would continue using unsigned types, as you have a good need for them.

A decent rule of thumb for C++ is: prefer int unless you have a good reason to use something else.

bstamour
  • I do believe that using exceptions is counterproductive anywhere outside library development. And the main reason is that 80% of application engineers miss at least 20% of corner cases. – zzz777 Jan 31 '14 at 15:56
  • And it is not only Google: John Lakos [link](http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620) gives exactly the same advice on not using unsigned integers. – zzz777 Jan 31 '14 at 15:57
  • Exceptions are one of the cleanest ways to signal failure from constructors. They're one of the corner-stones of the RAII idiom. As for unsigned integers - I actually agree that you should avoid them unless you have a good reason not to, as I said in my answer. – bstamour Jan 31 '14 at 18:14
  • @bstamour, I agree with you that the constructor initialization list is the only place where mere mortals can use exceptions. But in any case, one would have to make the constructor private and have a public factory function call the constructor and catch the exception. So there is no point in dealing with exceptions even there. – zzz777 Feb 02 '14 at 00:27
  • 8
    That's not what I mean at all. Constructors are for establishing invariants, and since they are not functions they cannot simply `return false` if that invariant is not established. So, you can either separate things and use init functions for your objects, or you can throw a `std::runtime_error`, let stack unwinding happen, and let all of your RAII objects auto-clean themselves and you the developer can handle the exception where it is convenient for you to do so. – bstamour Feb 02 '14 at 15:51
  • If you have more than 200 developers and your business is application development (as opposed to library/infrastructure work), there is a great probability that your plan would not work at all. – zzz777 Feb 03 '14 at 02:47
  • 5
    I don't see how the type of application makes a difference. Any time you call a constructor on an object you are establishing an invariant with the parameters. If that invariant cannot be met, then you need to signal an error else your program is not in a good state. Since constructors cannot return a flag, throwing an exception is a natural option. Please give a solid argument as to why a business application would not benefit from such a coding style. – bstamour Feb 03 '14 at 13:14
  • 2
    Even Microsoft agrees with me (gross!) http://msdn.microsoft.com/en-us/library/vstudio/hh279678%28v=vs.110%29.aspx – bstamour Feb 03 '14 at 13:24
  • If one is writing libraries, (a) it is easier to write exception-safe code, and (b) one can expect that library writers are capable of writing exception-safe code. Neither of which is true for application development: it is much harder to write an app in an exception-safe manner, and we can expect that 50% of application developers are simply incapable of writing exception-safe code. – zzz777 Feb 03 '14 at 13:46
  • 8
    I highly doubt that half of all C++ programmers are incapable of using exceptions properly. But anyways if you think that your co-workers are incapable of writing modern C++ then by all means stay away from modern C++. – bstamour Feb 03 '14 at 14:06
  • Staying away is not an option for growing companies with shipping products. In some areas C++ is the only reasonable tool, and there are only a few developers in the world capable of reliably writing exception-safe code. That is why Google does not allow exceptions. – zzz777 Feb 03 '14 at 15:13
  • 4
    I said stay away from *modern* C++, not C++ in general. Modern C++ style is full of exceptions and other scary things, apparently. – bstamour Feb 03 '14 at 15:19
  • According to you, 'modern' means so modern that it is not worth using. Maybe C++14 will allow the development of static analysis tools capable of dealing with exception safety; that will take a few years, and then another 10 years to become common practice. – zzz777 Feb 03 '14 at 15:47
  • 6
    @zzz777 Don't use exceptions? Have private constructors that are wrapped by public factory functions which catch the exceptions and do what - return a `nullptr`? Return a "default" object (whatever that may mean)? You didn't solve anything - you have just hidden the problem under a carpet and hope nobody finds out. – Mael Feb 22 '18 at 07:34
  • @Mael Yes, return nullptr or crash the box - these are the only two practical options available to companies staffed with ordinary developers. – zzz777 Feb 22 '18 at 20:46
  • 5
    @zzz777 If you are going to crash the box anyway, why do you care if it happens from an exception or `signal(6)`? If you use an exception, the 50% of developers that know how to deal with them can write good code, and the rest can be carried by their peers. – IllusiveBrian Feb 22 '18 at 22:23
  • @Mael Sorry, I did not have time. You are mixing up crashing the box with crashing the process, and there is a sea of difference between them. 99.9% of developers are mentally unfit to write exception-safe code, and most people who feel they can do it are simply overestimating their abilities by a wide margin. – zzz777 Feb 28 '18 at 20:04
6

The other answers lack real-world examples, so I will add one: one of the reasons why I (personally) try to avoid unsigned types.

Consider using standard size_t as an array index:

for (size_t i = 0; i < n; ++i)
    // do something here;

OK, perfectly normal. Then consider that we decide to change the direction of the loop for some reason:

for (size_t i = n - 1; i >= 0; --i)
    // do something here;

And now it does not work. If we had used int as the iterator, there would be no problem. I've seen such an error twice in the past two years. Once it happened in production and was hard to debug.

Another reason for me is the annoying warnings, which make you write something like this every time:

int n = 123;  // for some reason n is signed
...
for (size_t i = 0; i < size_t(n); ++i)

These are minor things, but they add up. I feel like the code is cleaner if only signed integers are used everywhere.

Edit: Sure, the examples look dumb, but I have seen people make this mistake. If there's such an easy way to avoid it, why not use it?

When I compile the following piece of code with VS2015 or GCC, I see no warnings with the default warning settings (even with -Wall for GCC). You have to ask for -Wextra to get a warning about this in GCC. This is one of the reasons you should always compile with -Wall and -Wextra (and use a static analyser), but in many real-life projects people don't do that.

#include <vector>
#include <iostream>


void unsignedTest()
{
    std::vector<int> v{ 1, 2 };

    // The signed loop terminates correctly.
    for (int i = v.size() - 1; i >= 0; --i)
        std::cout << v[i] << std::endl;

    // The unsigned loop never terminates: i >= 0 is always true for
    // size_t, so i wraps past zero and indexes out of bounds.
    for (size_t i = v.size() - 1; i >= 0; --i)
        std::cout << v[i] << std::endl;
}

int main()
{
    unsignedTest();
    return 0;
}
  • You can get it even more wrong with signed types... And your example-code is so brain-dead and glaringly wrong any decent compiler will warn if you ask for warnings. – Deduplicator Jun 13 '16 at 15:15
  • 2
    I have in the past resorted to such horrors as `for (size_t i = n - 1; i < n; --i)` to make it work right. – Simon B Jun 13 '16 at 15:55
  • @SimonB awesome ;) I will try to get this through code review next time, just for fun :D – Aleksei Petrenko Jun 13 '16 at 16:24
  • @Deduplicator see the updated sample in the code. Would be great if you could give less brain dead example on how one can get it "more wrong" with signed types. – Aleksei Petrenko Jun 13 '16 at 16:39
  • Personally, I just try not to be too smart, and that's it. All my loops look like inline for-each loops. If I absolutely have to do something special, I comment it heavily, review it several times and make sure it gets exercised during testing. – zzz777 Jun 27 '16 at 20:50
  • UI coordinate calculations also involve subtractions and negative numbers and might be a better example. – rwong Oct 15 '16 at 19:25
  • 3
    Speaking of for-loops with `size_t` in reverse, there is a coding guideline in the style of `for (size_t revind = 0u; revind < n; ++revind) { size_t ind = n - 1u - revind; func(ind); }` – rwong Oct 15 '16 at 19:27
  • 2
    @rwong Omg, this is ugly. Why not just use `int`? :) – Aleksei Petrenko Feb 21 '18 at 16:56
  • @AlexeyPetrenko [That's what I said](https://softwareengineering.stackexchange.com/a/338107/620); my comment was simply saying that some coding guideline had shown examples for using `size_t` the ugly way. – rwong Feb 21 '18 at 19:45
  • Btw, in light of SPECTRE, my code example for for-loop in reverse would be very bad: even if `n` is zero (which means the loop body shall not be entered), speculative execution will proceed anyway, trying to execute the body of the inlined `func()` with value `n - 1u`. – rwong Feb 21 '18 at 19:48
  • 1
    @AlexeyPetrenko - note that neither the current C nor C++ standards guarantee that `int` is large enough to hold all valid values of `size_t`. Particularly, `int` may allow numbers only up to 2^15-1, and commonly does so on systems that have memory allocation limits of 2^16 (or in certain cases even higher). `long` may be a safer bet, although still not *guaranteed* to work. Only `size_t` is guaranteed to work on all platforms and in all cases. – Jules Feb 22 '18 at 09:48
  • There's `long long` and `int64_t`. Frankly, I think it's okay to use size_t for loop iteration with an index. You just need to be careful) – Aleksei Petrenko Feb 22 '18 at 09:58
  • In Java, Arrays.binarySearch returns either the non-negative index of the found key, or the negative complement of the insertion index when the key is not found. Admittedly a bizarre usage. – Joop Eggen Feb 22 '18 at 15:36
  • Regardless of signed, unsigned, or direction of iteration, the valid range for a loop is the same (`begin <= i && i < end` assuming `begin <= end`). Unsigned iteration has the advantage that when `begin` is zero, then that check can be dropped. So a forward loop becomes `for( size_t i = 0; i < size; ++i )` and reverse loop becomes `for( size_t i = size - 1; i < size; --i )`. – Phernost Nov 06 '19 at 23:38
  • Arguable, but I feel like relying on overflow in such a simple case makes your code less elegant. – Aleksei Petrenko Nov 08 '19 at 05:49
6
for (size_t i = v.size() - 1; i >= 0; --i)
    std::cout << v[i] << std::endl;

The problem here is that the loop is written in a way that leads to the erroneous behavior. The loop is constructed the way beginners are taught for signed types (which is OK and correct there), but it simply does not fit unsigned values. That, however, cannot serve as a counter-argument against using unsigned types; the task is simply to get the loop right. And it can easily be fixed to work reliably with unsigned types, like so:

for (size_t i = v.size(); i-- > 0; )
    std::cout << v[i] << std::endl;

This change simply reverses the order of the comparison and the decrement operation and is, in my opinion, the most effective, unobtrusive, clean and shortest way to handle unsigned counters in backward loops. You would do the very same thing (intuitively) when using a while loop:

size_t i = v.size();
while (i > 0)
{
    --i;
    std::cout << v[i] << std::endl;
}

No underflow can occur, the case of an empty container is covered implicitly, as in the well-known variant of the signed counter loop, and the body of the loop may stay unaltered in comparison to a signed counter or a forward loop. You just have to get accustomed to the at-first somewhat strange-looking loop construct. But after you've seen it a dozen times, there's nothing unintelligible about it anymore.

I would be glad if beginner courses showed the correct loop not only for signed but also for unsigned types. That would avoid a couple of errors that should, IMHO, be blamed on the unwitting developer rather than on the unsigned type.

HTH

Don Pedro
1

Unsigned integers are there for a reason.

Consider, for example, handling data as individual bytes, e.g. in a network packet or a file buffer. You may occasionally encounter such beasts as 24-bit integers. These are easily assembled by bit-shifting three 8-bit unsigned integers, not so easily with 8-bit signed integers.

Or think about algorithms using character lookup tables. If a character is an 8-bit unsigned integer, you can index a lookup table by a character value. However, what do you do if the programming language doesn't support unsigned integers? You would have negative indexes to an array. Well, I guess you could use something like charval + 128 but that's just ugly.

Many file formats, in fact, use unsigned integers and if the application programming language doesn't support unsigned integers, that could be a problem.

Then consider TCP sequence numbers. If you write any TCP processing code, you will definitely want to use unsigned integers.

Sometimes, efficiency matters so much that you really need that extra bit of unsigned integers. Consider, for example, IoT devices that ship in the millions; spending a lot of programming resources on micro-optimizations can then be justified.

I would argue that the justification to avoid using unsigned integer types (mixed sign arithmetic, mixed sign comparisons) can be overcome by a compiler with proper warnings. Such warnings are usually not enabled by default, but see e.g. -Wextra or separately -Wsign-compare (auto-enabled in C by -Wextra, although I don't think it's auto-enabled in C++) and -Wsign-conversion.

Nevertheless, if in doubt, use a signed type. Many times, it is a choice that works well. And do enable those compiler warnings!

juhist
  • 2,579
  • 10
  • 14
0

There are many cases where integers don't actually represent numbers, but for example a bit mask, an id, etc. Basically cases where adding 1 to an integer doesn't have any meaningful result. In those cases, use unsigned.

There are many cases where you do arithmetic with integers. In these cases, use signed integers, to avoid misbehaviour around zero. See plenty of examples with loops, where running a loop down to zero either uses very unintuitive code or is broken because of the use of unsigned numbers. There is the argument "but indices are never negative" - sure, but differences of indices for example are negative.

In the very rare case where indices exceed 2^31 but not 2^32, you don't use unsigned integers, you use 64 bit integers.

Finally, a nice trap: in a loop `for (i = 0; i < n; ++i) a[i] ...`, if `i` is unsigned 32-bit and memory exceeds 32-bit addresses, the compiler cannot optimise the access to `a[i]` into a pointer increment, because at `i = 2^32 - 1`, `i` wraps around. This holds even when `n` never gets that large. Using signed integers avoids this, because signed overflow is undefined behaviour and the compiler may assume it never happens.

gnasher729
-6

Finally, I found a really good answer here: "Secure Programming Cookbook" by J. Viega and M. Messier (http://shop.oreilly.com/product/9780596003944.do).

Security issues with signed integers:

  1. If a function requires a positive parameter, it is easy to forget to check the lower range.
  2. Unintuitive bit pattern from negative integer size conversions.
  3. Unintuitive bit pattern produced by the right shift operation of a negative integer.

Update:

A. There are problems with signed<->unsigned conversions so it is not advisable to use a mix.

B. If you are using unsigned integers it is easy to check for overflow.

zzz777
  • 1
    Why is it a good answer? What is recipe 3.5? What does it say about integer overflow etc? – Baldrickk Sep 05 '18 at 12:51
  • In my practical experience it is a very good book with valuable advice in all the other aspects that I have tried, and it is pretty firm in this recommendation. Compared to that, the dangers of integer overflows on arrays longer than 4G seem pretty weak. If I had to deal with arrays that big, my program would need a lot of fine-tuning to avoid performance penalties. – zzz777 Sep 05 '18 at 12:59
  • 1
    it's not about whether the book is good. Your answer doesn't provide any justification for the use of the recipe, and not everyone will have a copy of the book to look it up. Look at the examples of how to write a good answer – Baldrickk Sep 05 '18 at 13:01
  • FYI, I just learned about another reason for using unsigned integers: one can easily detect overflow: https://www.youtube.com/watch?v=JhUxIVf1qok&list=PLcH4RogOL4rHSsCnDZtqCgPAfyKHjaRMm&index=70 – zzz777 Sep 26 '19 at 23:02