3

In C# (and other languages), we can define a numerical variable as a short, an int, or a long (among other types), mostly depending on how big we expect the numbers to get. Many mathematical operations (e.g., addition +) evaluate to an integer (int), and so require an explicit cast to store the result in a short, even when operating on two shorts. It is much easier (and arguably more readable) to simply use ints, even if we never expect the numbers to exceed the storage capacity of a short. Indeed, I'm sure most of us write for loops using int counters, rather than short counters even when a short would suffice.
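
For concreteness, here is the kind of friction I mean (a minimal C# snippet; the variable names are just for illustration):

```csharp
short a = 1, b = 2;

// short + short is promoted to int, so this line does not compile:
// short sum = a + b;         // error CS0266: cannot implicitly convert 'int' to 'short'

short sum = (short)(a + b);   // explicit cast required
int easierSum = a + b;        // with int, no cast is needed
```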

One could argue that using an int is simply future-proofing, but there are certainly cases where we know the short will be big enough.

Is there a practical benefit to using these smaller datatypes that compensates for the additional casting and the decrease in readability? Or is it more practical to just use int everywhere, even when we are sure the values will never exceed the capacity of a short (e.g., the number of axes on a graph)? Or is the benefit only realized when we absolutely need the space or performance of those smaller datatypes?

Edit to address dupes:

This is close, but is too broad - and speaks more to CPU-performance than memory-performance (though there are a lot of parallels).

This is close, too, but doesn't get to the practical aspect of my question. Yes, there are times when using a short is appropriate, which that question does a good job of illuminating. But, appropriate is not always practical, and increases in performance (if any) may not actually be realized.

mmathis
  • 5,398
  • 23
  • 33
  • Possible duplicate of [Is micro-optimisation important when coding?](https://softwareengineering.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) – gnat Mar 22 '18 at 21:54
  • 6
    @gnat: That question doesn't apply to every instance of every question where someone is trying to decide whether to use a foo or a bar. – Robert Harvey Mar 22 '18 at 22:34
  • Related: https://stackoverflow.com/q/1097467 – Robert Harvey Mar 22 '18 at 22:35
  • 5
    @gnat Here's a better duplicate: [Which are the cases when 'uint' and 'short' datatypes are a better fit than the standard int(32)?](https://softwareengineering.stackexchange.com/questions/264464/which-are-the-cases-when-uint-and-short-datatypes-are-a-better-fit-than-the) – Robert Harvey Mar 22 '18 at 22:36
  • Depending on the language, it may make your code more readable, in that the intent is more clear. However, most languages have higher level data types than primitives, which you use to replace primitives in API calls (think enums, limited value types). It is more of an internal optimization, for example, when you need to reduce memory usage. – Frank Hileman Mar 23 '18 at 00:26
  • Many people spend large amounts of time worrying about cache sizes, and if data accessed in CPU intensive algorithms will be contained in those caches. Data sizes can have a huge effect on performance, and on the number of operations or users the machine can support simultaneously. It is impossible to create a blanket answer, because the average instance on the heap consumes an infinitesimal amount of memory. – Frank Hileman Mar 24 '18 at 21:56

5 Answers

6

Regarding the primitive types when we feel we have to use them:

Local variables generally perform at the same speed, or just slightly worse, when using smaller data types, so shrinking them adds nothing (and sometimes costs something) in the context of statements: for and while counters, other loop variables, other locals, and even parameters. Further, you should try to use the native int size, the one that matches the pointer size, or else you risk numeric overflow when iterating over collections of arbitrary size.

The only place to even consider concerning yourself with smaller primitive sizes is in data structures that have very high object volumes. Even there, most languages have a minimum object size that has to do with alignment in memory allocation. So using a single short vs. a single int may buy you nothing (depending on the language implementation) because the unused space simply goes to alignment-oriented padding, whereas using two shorts can often save space over two ints because the shorts can be packed together.
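
A rough C# sketch of the packing effect (the struct names are made up, and the exact sizes depend on the runtime and platform):

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative structs; C# structs default to sequential layout.
struct ShortThenInt { public short A; public int B; }   // the short gets padded so the int stays aligned
struct TwoShorts    { public short A; public short B; } // the two shorts pack together
struct TwoInts      { public int A;   public int B; }

static class SizeDemo
{
    static void Main()
    {
        // Marshal.SizeOf reports the unmanaged layout size; the managed layout
        // can differ, but the padding effect is the same in spirit.
        Console.WriteLine(Marshal.SizeOf<ShortThenInt>()); // typically 8, not 6
        Console.WriteLine(Marshal.SizeOf<TwoShorts>());    // typically 4
        Console.WriteLine(Marshal.SizeOf<TwoInts>());      // typically 8
    }
}
```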

Erik Eidt
  • 33,282
  • 5
  • 57
  • 91
  • 1
    Seconded. While larger types actually have relevance (e.g., better precision or the need to store larger numbers), smaller types alone aren't an actual advantage. However, when creating native arrays of such types, the story is quite different (and I'm talking about a *real* difference, with objects several hundreds of MB in size). So, TL;DR, don't over-optimize like that when dealing with individual variables, but consider it seriously when declaring arrays or complex types (e.g., structs). – Jesus Alonso Abad Mar 22 '18 at 23:05
  • You can get overflow with any integer type; they are equally unsafe. – Frank Hileman Mar 23 '18 at 00:38
  • @FrankHileman, of course, good point! Let me clarify for others that the recommendation for `int` of same size as pointer is for pointer arithmetic and indexing into arrays. As long as you know the object exists or the array elements exist such pointer arithmetic and indexing is safe from overflow (generally speaking, modulo some signed vs. unsigned issues on 16-bit machines and potentially on 32-bit operating systems with unusual address space layout). – Erik Eidt Mar 23 '18 at 00:46
  • Perhaps we are talking about C/C++? Pointer arithmetic could be a+b, which is generally unsafe regardless of the number of bits representing a or b. Indexing would be unsafe in any language... My point mainly was that you need explicit checks for these things; relying on the number of bits in an integer is not safe, unless the wraparound is an expected part of the behavior: for example, using an 8 bit integer (byte) and wanting it to wrap. – Frank Hileman Mar 23 '18 at 01:02
  • +1 for a perfect example of when this is a practical matter - single-value properties vs arrays – mmathis Mar 23 '18 at 01:20
  • Indexing is safe if you know the array exists and you only access elements that exist, provided you use the same number of bits for indexes as for pointers (on small machines the indexes have to be unsigned). – Erik Eidt Mar 23 '18 at 01:30
  • @mmathis this is why your question is downvoted (note: I'm not the downvoter): you are asking a question that has been answered many times before. My recommended answer is [Sunsetquest's answer on Stack Overflow](https://stackoverflow.com/a/31899978): *"If the variable is used in an array of roughly ... as long as it makes sense."* – rwong Mar 23 '18 at 02:20
  • If "you only access elements of that array that exist" means you are checking the range of the index, then the number of bits in the index is irrelevant. I understand your argument: when we choose an index representation, we must consider the maximum number of array elements that may possibly exist. This is to be able to access those elements at all. The number-of-bits choice does nothing to ensure safety or correctness of access. – Frank Hileman Mar 24 '18 at 21:52
  • @FrankHileman, (1) There are many algorithms & data structure, like sorting, like priority queue/heap, etc, that work in C and Fortran by being correct; *if the algorithm is correct there is no need to range check indexing operations*. (2) Index size will never exceed pointer size (i.e. address space), so there is a simple limit to the size of index needed. For example, we would never need a 64-bit index in a 32-bit address space. (Of course, it is important to check that your allocations succeed, but if they do you don't have to worry about using an int the size of the address space.) – Erik Eidt Mar 25 '18 at 00:18
3

Erik Eidt provided the primary case where small data types should be used--big arrays of them.

However, there is one other case where it's worthwhile: when you have a few megabytes of them in a data structure that will be accessed very heavily. The issue here is the CPU cache; there is a considerable benefit to keeping your working data in cache rather than having to go to main memory. The times when this is relevant are rare, though.
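
A back-of-the-envelope C# illustration of the cache argument (the cache size here is an assumption; real numbers vary by CPU):

```csharp
using System;

static class CacheFit
{
    static void Main()
    {
        const int l2CacheBytes = 256 * 1024;  // assumed L2 size; varies widely by CPU
        const int elementCount = 100_000;

        long asInts   = (long)elementCount * sizeof(int);    // 400,000 bytes: spills out of this L2
        long asShorts = (long)elementCount * sizeof(short);  // 200,000 bytes: fits in this L2

        Console.WriteLine($"int[]   : {asInts:N0} bytes, fits in L2: {asInts <= l2CacheBytes}");
        Console.WriteLine($"short[] : {asShorts:N0} bytes, fits in L2: {asShorts <= l2CacheBytes}");
    }
}
```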

Loren Pechtel
  • 3,371
  • 24
  • 19
2

There are a lot of file formats that actually use shorts and other small data types. If you need to read a field defined as a short, you don't want to read an int, because that would pull in bytes that don't belong to that field.
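
For example, a hypothetical record layout with a 2-byte count followed by a 4-byte offset, sketched with BinaryReader (the file name and fields are made up):

```csharp
using System;
using System.IO;

static class RecordReader
{
    static void Main()
    {
        using var reader = new BinaryReader(File.OpenRead("data.bin")); // hypothetical file

        short count = reader.ReadInt16(); // read exactly the 2 bytes the format defines
        int offset  = reader.ReadInt32(); // reading 4 bytes for the count instead would
                                          // swallow half of this field

        Console.WriteLine($"count={count}, offset={offset}");
    }
}
```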

2

There's unlikely to be any benefit unless you can predict tens of millions of bytes "saved", i.e., big arrays, or many shortened items packed into structures of which millions of copies will coexist. Note that just declaring a bunch of byte variables in a structure won't guarantee that they'll be packed; compilers and languages vary in how they handle this.
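
In C#, for instance, layout is only tight if you ask for it; a small sketch (the sizes shown are typical, not guaranteed):

```csharp
using System;
using System.Runtime.InteropServices;

// Default sequential layout: the byte is padded so the int stays aligned.
struct DefaultLayout { public byte Flag; public int Value; }

// Pack = 1 requests tight packing (possibly at a cost in access speed).
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedLayout { public byte Flag; public int Value; }

static class PackingDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf<DefaultLayout>()); // typically 8
        Console.WriteLine(Marshal.SizeOf<PackedLayout>());  // 5
    }
}
```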

ddyer
  • 4,060
  • 15
  • 18
-1

Of course. Doubling the size of your data types may double the amount of RAM needed, which may make your program unusable. It all depends on how many instances you need in RAM.

Sometimes it is not an optimization, but rather a signal to the API user that a value must be in a certain range.
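
A tiny hypothetical example of that signaling use (the class and method names are made up):

```csharp
public class Display
{
    // The byte parameter itself tells the API user the valid range is 0-255;
    // no caller can accidentally pass 300 or -1.
    public void SetBrightness(byte level)
    {
        // apply the level ...
    }
}
```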

Frank Hileman
  • 3,922
  • 16
  • 18
  • 1
    The 1st paragraph is a mix of wrong and "even if it were right it probably doesn't matter". The 2nd paragraph is a good insight. – user949300 Mar 22 '18 at 23:30
  • 1
    You have never run out of memory? Some people find it annoying, or deadly, depending on the circumstance. – Frank Hileman Mar 23 '18 at 00:23
  • 1
    Since I have coded since there were 8080s and 6502s, yes, I have run out of memory. Changing every int to a long will never "double" the RAM needed, unless you have zero code and no overhead for object instances. It is worth worrying about data types for huge **arrays,** which you fail to explicitly mention. If _object instances alone_ (without large arrays) are running you out of memory, you are probably doing something wrong. – user949300 Mar 23 '18 at 05:32
  • It just depends on the type of application. For the types I work on, instance sizes are typically important -- but only for a few types of instances. While there is overhead for instances, instances themselves usually use most of the space in the heap. This is not only a problem for arrays. "Data type" in my answer does not refer only to primitives but the types containing other types. – Frank Hileman Mar 24 '18 at 21:38
  • @user949300 I enjoyed working in 6502 assembly (apple II) as a teenager. – Frank Hileman Mar 24 '18 at 22:02