0

I'm writing C++ code, where the standard library has an idiomatic type for representing sizes: std::size_t.

Now, I'm writing a function which counts certain kinds of objects. These objects have indices, used as their IDs, which start from 0; some indices in the middle could in principle be skipped due to device issues, although in practice they are never missing. Say these objects' index type is idx_t. Also, to make this question non-trivial, idx_t is typically different from, and smaller than, std::size_t. It is guaranteed that the number of objects is less than 2^(CHAR_BIT * sizeof(idx_t)), i.e. there are idx_t values to spare.

My "philosophical" design question: Which type should a count_objects() function return:

  • a std::size_t, the standard size type, or
  • an idx_t, the objects' numeric index type?

Please argue in favor of your choice.

Notes:

  • The objects' index type is a constraint, and cannot be altered, e.g. due to interaction with a device driver or third-party library.
  • I believe this question isn't really C++ specific, but I wanted to make it less vague, so I didn't generalize much.
  • idx_t might be int, or unsigned, or short, or std::int16_t, or std::uint16_t, or std::uint32_t, or std::int32_t, etc.
einpoklum
  • auto difference = x.count() - y.count() will produce nonsense if idx_t is unsigned. That makes this a trap waiting to be sprung when you port the software or upgrade your compiler. – gnasher729 Jan 15 '22 at 15:50
  • So, essentially this is like an ordered list, where each ID of an element is its position within the list? Conventions are there for a reason. If you follow convention, you ensure better interoperability and avoid potential issues due to type mismatches, etc., particularly outside of your code. – Berin Loritsch Jan 16 '22 at 13:38
  • @BerinLoritsch: Basically, yes, like an ordered list, but I don't want to commit to that and make the question ultra-specific. As for conventions - the thing is we have conflicting conventions: In the standard library, `std::size_t` is conventional; in the code for working with these objects, it's never used. – einpoklum Jan 16 '22 at 14:06

4 Answers

5

If idx_t is an alias for a standard unsigned integral type, consider using it. But size_t is a perfectly fine default choice that you shouldn't avoid without reason.

That collections generally define sizes in terms of size_t is important for generality: by definition, a memory space can contain no more than SIZE_MAX contiguous objects/addresses. This means that size_t will always be large enough to report the size of any array of objects – or, for that matter, the size of any single object.
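
As a minimal sketch of that guarantee (the helper name array_length is hypothetical, not something from the answer), the following checks that sizeof yields a std::size_t and that the length of a built-in array is itself expressible as one:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// sizeof yields std::size_t, whose largest value is SIZE_MAX.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields std::size_t");
static_assert(std::numeric_limits<std::size_t>::max() == SIZE_MAX,
              "SIZE_MAX is the largest std::size_t value");

// Hypothetical helper: the element count of a built-in array is a
// std::size_t template argument, so it always fits the return type.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) noexcept {
    return N;
}

// Small self-check that can be called from main().
inline void demo_array_length() {
    int values[1000] = {};
    assert(array_length(values) == 1000u);
    assert(sizeof(values) == 1000u * sizeof(int));
}
```

Calling demo_array_length() from main() runs the checks.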

Depending on the context, though, size_t might not be a good choice for representing sizes. In practice, many collections will be far smaller, so that large parts of the representable range are wasted. A size_t might also be larger than the platform's native integral type. With a segmented memory model, we also get the curious effect that size_t could be smaller than a pointer, and that collections that are not array-like could contain more than SIZE_MAX elements.

Thus, there are many reasons why you wouldn't want to use size_t in specific cases. Leaving aside those reasons that relate to exotic architectures (anything non-64-bit), the main reason to avoid size_t would be that it wastes memory. This can matter if you have many objects that contain a size_t field, although it will not generally matter for a size_t return value that is passed via a fixed-size register. So whether this matters depends on the wider context of your program.
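
To make the memory argument concrete, here is a rough sketch comparing two record layouts that differ only in the width of their count field. The struct names and the bytes_for helper are made up for illustration, and the exact sizes depend on the platform's alignment rules:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical records: same payload, different width of the count field.
struct RecordWithSizeT {
    std::size_t count;
    std::uint16_t tag;
};

struct RecordWithU16 {
    std::uint16_t count;
    std::uint16_t tag;
};

// The narrow variant can never be the larger of the two.
static_assert(sizeof(RecordWithU16) <= sizeof(RecordWithSizeT),
              "a 16-bit count field should not enlarge the record");

// Bytes needed to store n records of the given size.
constexpr std::size_t bytes_for(std::size_t n, std::size_t record_size) {
    return n * record_size;
}

// Small self-check that can be called from main().
inline void demo_record_sizes() {
    assert(bytes_for(1000, sizeof(RecordWithU16)) <=
           bytes_for(1000, sizeof(RecordWithSizeT)));
}
```

On a typical 64-bit platform the first record is likely padded to 16 bytes while the second takes 4, so the difference only becomes noticeable when many such records are stored.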

Indications that you shouldn't use idx_t include:

  • the type is not a primitive integer type
  • it is a signed integer type, and there can be more than 2^(CHAR_BIT * sizeof(idx_t) - 1) - 1 objects in the collection
  • weak indication: idx_t is equivalent to char, since this type is not treated as an integer in some contexts
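
The checklist above could be approximated at compile time. The trait below is a hypothetical sketch, not part of the answer, and it is deliberately conservative: it flags every signed type, even though a signed idx_t is only a problem when the count can exceed its maximum value.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true when none of the warning signs apply to Idx.
// Conservative assumption: all signed types are flagged.
template <typename Idx>
struct looks_like_count_type
    : std::integral_constant<
          bool,
          std::is_integral<Idx>::value              // a primitive integer type
              && std::is_unsigned<Idx>::value       // no negative range
              && !std::is_same<Idx, bool>::value    // integral, but not a count
              && !std::is_same<Idx, char>::value    // character types are often
              && !std::is_same<Idx, unsigned char>::value> {};  // treated as text

static_assert(looks_like_count_type<std::uint16_t>::value, "unsigned 16-bit is fine");
static_assert(looks_like_count_type<std::uint32_t>::value, "unsigned 32-bit is fine");
static_assert(!looks_like_count_type<std::int16_t>::value, "signed types are flagged");
static_assert(!looks_like_count_type<unsigned char>::value, "char-like types are flagged");
static_assert(!looks_like_count_type<float>::value, "non-integers are flagged");

// Small self-check that can be called from main().
inline void demo_looks_like_count_type() {
    assert(looks_like_count_type<unsigned>::value);
    assert(!looks_like_count_type<int>::value);
}
```

Note that std::uint8_t is commonly an alias for unsigned char, in which case this sketch flags it as well.
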
amon
  • Added information to the question about `idx_t`. I did mean to imply it is an alias for a standard integer type - but not necessarily unsigned. Still, its _values_ are never negative. – einpoklum Jan 14 '22 at 23:19
1

You'll want to consider how the size() function is used. It's likely that template functions (such as those in std::views objects) may call size() and expect it to return an unsigned type. So returning a possibly-signed type could cause compilation to emit warnings you don't want.

If it's reasonable for your class to act as a standard container, then it's helpful (least surprise) to your users to complete the job - so declare a size_type as well as value_type in your container. That also makes changing your mind easier if you discover you've gone down the wrong path!

Personally, I would begin with

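#include <cstddef>  // std::size_t

class MySet
{
public:
    // Start with the standard size type; narrow it later only if needed.
    using size_type = std::size_t;

    // Number of objects currently in the set.
    size_type count() const noexcept;
};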

and stick with that unless/until you find some good reason to switch to a smaller type such as std::make_unsigned_t<idx_t>.
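
A slightly fuller sketch of the same idea follows. The idx_t alias, the vector storage, and the remaining_capacity helper are assumptions made for illustration. The point is that generic code which names the container's own size_type keeps compiling if that alias is later changed to something like std::make_unsigned_t<idx_t>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = std::int16_t;  // assumption: stand-in for the real index type

class MySet {
public:
    using value_type = idx_t;
    using size_type = std::size_t;  // the only line to edit when switching types

    void insert(value_type id) { ids_.push_back(id); }
    size_type count() const noexcept { return static_cast<size_type>(ids_.size()); }
    size_type size() const noexcept { return count(); }

private:
    std::vector<value_type> ids_;  // assumption: simple backing storage
};

// Hypothetical generic helper: it refers to Container::size_type instead of
// hard-coding std::size_t, so it follows whatever the container declares.
template <typename Container>
typename Container::size_type remaining_capacity(
    const Container& c, typename Container::size_type limit) {
    using size_type = typename Container::size_type;
    return c.size() < limit ? static_cast<size_type>(limit - c.size())
                            : size_type{0};
}

// Small self-check that can be called from main().
inline void demo_my_set() {
    MySet s;
    s.insert(0);
    s.insert(1);
    assert(s.count() == 2u);
    assert(remaining_capacity(s, 5) == 3u);
}
```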

Toby Speight
  • Can you give an example of `std::views` code that will balk at a `MySet::size_type` which is signed? – einpoklum Jan 17 '22 at 19:05
  • Also, as you may be aware, there is no single return type for `std::size()` - it takes a container and returns its size type. – einpoklum Jan 17 '22 at 20:03
  • Not to mention the fact that we now have `std::ssize()` too... – einpoklum Jan 17 '22 at 20:04
  • No, I don't have an example available, and that kind of failure is more likely from third-party code than from well-reviewed Standard Library code, which really should be using the container's `size_type`. – Toby Speight Jan 17 '22 at 20:22
1

A variant on @gnasher729's suggestion:

  • Insist on signedness of the result type, to allow x.count() - y.count() to not produce junk.
    • ... and thus std::size_t goes out the window.
    • ... and perhaps idx_t too, unless you can rely on its signedness.
  • Stick to being size_t-like

And consequently:

using ssize_t = std::make_signed_t<std::size_t>;

the signed size type we know on Unix systems from <sys/types.h>.
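
A small sketch of how this alias might be used. The counting namespace and the count_difference helper are hypothetical; the namespace is there only so that the alias cannot clash with the POSIX ::ssize_t on systems that declare it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace counting {  // hypothetical namespace, avoids clashing with ::ssize_t

using ssize_t = std::make_signed_t<std::size_t>;

static_assert(std::is_signed<ssize_t>::value, "ssize_t is signed");
static_assert(sizeof(ssize_t) == sizeof(std::size_t),
              "ssize_t has the same width as std::size_t");

// Hypothetical helper: how many more objects `a` holds than `b`.
// The result may be negative, which an unsigned type could not express.
inline ssize_t count_difference(ssize_t a, ssize_t b) noexcept {
    return a - b;
}

}  // namespace counting

// Small self-check that can be called from main().
inline void demo_count_difference() {
    assert(counting::count_difference(3, 5) == -2);
}
```

The price is half of the representable range: the largest count this type can hold is roughly SIZE_MAX / 2.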

einpoklum
  • Although you've reached a different conclusion to mine, you've presented reasoning which can help others to make an informed choice, and I've upvoted for that. – Toby Speight Jan 17 '22 at 17:42
-1

I’ll just take a signed 64-bit integer. It’s large enough to count the bits on 10,000 large hard drives. Being signed means the difference of two counts is valid, avoiding stupid bugs with unsigned numbers. I haven’t written 32-bit code in years, so I don’t worry about some wasted bits.
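
The kind of bug meant here can be sketched as follows. The two helper functions are hypothetical and use std::uint32_t and std::int64_t as example count types.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned subtraction wraps around modulo 2^32 instead of going negative.
inline std::uint32_t unsigned_difference(std::uint32_t x, std::uint32_t y) {
    return x - y;
}

// Signed 64-bit subtraction yields the expected negative value.
inline std::int64_t signed_difference(std::int64_t x, std::int64_t y) {
    return x - y;
}

// Small self-check that can be called from main().
inline void demo_differences() {
    assert(unsigned_difference(3u, 5u) == 4294967294u);  // 2^32 - 2, not -2
    assert(signed_difference(3, 5) == -2);
}
```

One caveat: for an unsigned type narrower than int, such as std::uint16_t on typical platforms, integer promotion makes the subtraction happen in int, so auto difference = x - y would be a negative int there. The behaviour therefore changes with the width of the type, which is arguably its own kind of trap.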

gnasher729
  • So, you're suggesting rejecting both alternatives I suggested? Why, considering they are both able to count the number of objects? – einpoklum Jan 15 '22 at 07:52
  • So you are downvoting me, not because there's something wrong with my answer, but because I don't agree with you? Seriously? – gnasher729 Jan 15 '22 at 15:47
  • I didn't downvote you, so I'm not sure who you were referring to... – einpoklum Jan 15 '22 at 19:14