31

I am a C++ programmer with limited experience.

Supposing I want to use an STL map to store and manipulate some data, I would like to know if there is any meaningful difference (also in performance) between these two data structure approaches:

Choice 1:
    map<int, pair<string, bool> >

Choice 2:
    struct Ente {
        string name;
        bool flag;
    };
    map<int, Ente>

Specifically, is there any overhead using a struct instead of a simple pair?

Philip Kendall
Marco Stramezzi
  • 18
    A `std::pair` *is* a struct. – Caleth Mar 24 '17 at 11:26
  • Possible duplicate of [Is micro-optimisation important when coding?](http://softwareengineering.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) – gnat Mar 24 '17 at 13:20
  • 3
    @gnat: General questions like that are seldom suitable dupe targets for specific questions like this one, especially if the specific answer doesn't exist on the dupe target (which is unlikely in this case). – Robert Harvey Mar 24 '17 at 15:21
  • 18
    @Caleth - `std::pair` is a **template**. `std::pair<string, bool>` is a struct. – Pete Becker Mar 24 '17 at 15:21
  • 1
    Don't forget, too, that if you use them as keys you'll need a custom comparison function and for std::pair it's easier to create a generic templated one. – JAB Mar 24 '17 at 16:11
  • 6
    `pair` is entirely devoid of semantics. Nobody reading your code (including you in the future) will know that `e.first` is the name of something unless you explicitly point it out. I am a firm believer in that `pair` was a very poor and lazy addition to `std`, and that when it was conceived nobody thought "but some day, everybody is going to use this for *everything* that is two things, and nobody will know what anybody's code means". – Jason C Mar 24 '17 at 16:45
  • 1
    @JasonC I agree, but there are always exceptions, just like "use meaningful variable names" has the exception of loop counters. –  Mar 24 '17 at 16:48
  • 2
    @Snowman Oh, definitely. Still, it's too bad things like `map` iterators aren't valid exceptions. ("first" = key and "second" = value... really, `std`? Really?) – Jason C Mar 24 '17 at 16:52
  • @JasonC, it is possible to `std::tie()` them or use structured bindings as of C++17. The first version will be slightly verbose though. – Incomputable Mar 24 '17 at 21:25
  • 1
    @JasonC, I think the general agreement that I've seen regarding tuples (in any language) is that they shouldn't *typically* be exposed publicly, but they are very handy in internal code. Especially in languages that have shorthand for constructing tuples. They're useful for some cases, though, where the semantics are obvious (or where a tuple is the exact semantic). Consider generic interfaces. An example is Numpy's `shape` field, which returns the size of all dimensions of a matrix. That's naturally a tuple. – Kat Mar 28 '17 at 18:32
  • @Kat Definitely. Then again, the Numpy example, say, is a great example of where a tuple makes sense... until you start mixing other semantically different tuples into the same code. Then it becomes a clarity / maintenance / error-risk issue. On the other hand the same could be said about arrays of primitives, so this isn't really a strong argument, but still worth mentioning. There are lots of situations when `pair`, etc. is appropriate, but there are *way* more times when it's not, yet the temptation to use it is great. C++'s design in general throws implicitly enforced sanity to the wind. – Jason C Mar 28 '17 at 18:39

3 Answers

37

Choice 1 is OK for small, use-once things. Essentially, std::pair is still a struct. As stated in this comment, choice 1 will lead to really ugly code somewhere down the rabbit hole, like thing.second->first.second->second, and no one really wants to decipher that.

Choice 2 is better for everything else, because it is easier to read what the things in the map mean. It is also more flexible if you want to change the data (for example, when Ente suddenly needs another flag). Performance should not be an issue here.
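To make the readability point concrete, here is a minimal sketch of the same lookup written against both choices. The aliases PairMap and EnteMap and the two helper functions are hypothetical names introduced only for this illustration; Ente is the struct from the question.

```cpp
#include <map>
#include <string>
#include <utility>

// Choice 1: the mapped value is an anonymous pair.
using PairMap = std::map<int, std::pair<std::string, bool>>;

// Choice 2: the mapped value is a named struct (Ente is from the question).
struct Ente {
    std::string name;
    bool flag;
};
using EnteMap = std::map<int, Ente>;

// Hypothetical helper: returns the name stored under `key`, or "" if absent.
std::string name_in_pair_map(const PairMap& m, int key) {
    auto it = m.find(key);
    // it->second is the mapped pair; .first is the name only by convention.
    return it == m.end() ? std::string() : it->second.first;
}

// Same lookup against the struct-based map.
std::string name_in_ente_map(const EnteMap& m, int key) {
    auto it = m.find(key);
    // it->second is the Ente; .name states what is being read.
    return it == m.end() ? std::string() : it->second.name;
}
```

Both helpers do the same work; the difference is only in what a reader has to remember to understand the last line of each.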

risingDarkness
18

Performance:

It depends.

In your particular case there will be no performance difference because the two will be similarly laid out in memory.
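One way to check the layout claim on your own toolchain is to compare sizes and alignments at compile time. This is a sketch, not a guarantee: the standard does not promise that the two types have identical layout, it is only what mainstream implementations are expected to produce. The names EntePair, same_size and same_alignment are hypothetical.

```cpp
#include <string>
#include <utility>

// The struct from the question.
struct Ente {
    std::string name;
    bool flag;
};

// The pair-based equivalent (alias name is an assumption for this example).
using EntePair = std::pair<std::string, bool>;

// Assumption: true on mainstream standard libraries, but not mandated
// by the standard, so check it rather than rely on it.
constexpr bool same_size = sizeof(Ente) == sizeof(EntePair);
constexpr bool same_alignment = alignof(Ente) == alignof(EntePair);
```

If both constants are true, neither choice costs more memory per map node than the other.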

In a very specific case (if you were using an empty struct as one of the data members) then the std::pair<> could potentially make use of Empty Base Optimization (EBO) and have a lower size than the struct equivalent. And lower size generally means higher performance:

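#include <iostream>
#include <string>
#include <tuple>
#include <utility>

struct Empty {};
struct Thing { std::string name; Empty e; };

int main() {
    // Sizes are implementation-specific; compare them against each other.
    std::cout << sizeof(std::string) << "\n";
    std::cout << sizeof(std::tuple<std::string, Empty>) << "\n";
    std::cout << sizeof(std::pair<std::string, Empty>) << "\n";
    std::cout << sizeof(Thing) << "\n";
}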

Prints: 32, 32, 40, 40 on ideone.

Note: I am not aware of any implementation that actually uses the EBO trick for regular pairs; however, it is generally used for tuples.


Readability:

Apart from micro-optimizations, however, a named structure is more ergonomic.

I mean, map[k].first is not that bad while get<0>(map[k]) is barely intelligible. Contrast with map[k].name which immediately indicates what we are reading from.

It's all the more important when the types are convertible to one another, since swapping them inadvertently becomes a real concern.
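A small sketch of that swapping hazard, using a hypothetical (count, mean) record that is not part of the original question. The pair version accepts the arguments in the wrong order without complaint, because each argument is implicitly converted to the corresponding member type.

```cpp
#include <utility>

// Hypothetical record: (sample count, mean value).
using SamplePair = std::pair<int, double>;

struct Sample {
    int count;
    double mean;
};

// Arguments accidentally swapped. This still compiles: the pair
// constructor converts 2.5 to int and 10 to double.
SamplePair make_swapped_pair() {
    return SamplePair(2.5, 10);  // meant: count = 10, mean = 2.5
}

// With named members, each assignment says which field it targets.
Sample make_sample() {
    Sample s;
    s.count = 10;
    s.mean = 2.5;
    return s;
}
```

Here make_swapped_pair() yields first == 2 and second == 10.0, so the fractional part is silently lost. As a side note, brace-initializing the struct as Sample{2.5, 10} should be rejected by the compiler as a narrowing conversion, which is one more safety net the pair constructor does not give you.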

You might also want to read about Structural vs Nominal Typing. Ente is a specific type that can only be operated on by things that expect an Ente, whereas anything that can operate on std::pair<std::string, bool> can operate on any such pair, even when the std::string or bool does not contain what it expects, because std::pair has no semantics associated with it.
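The distinction can be sketched with type traits. FileEntry, NameFlag, PathIsDir and is_flagged are hypothetical names invented for this example; only Ente comes from the question.

```cpp
#include <string>
#include <type_traits>
#include <utility>

struct Ente { std::string name; bool flag; };

// Hypothetical second type with the same shape but a different meaning.
struct FileEntry { std::string path; bool is_directory; };

// Nominal typing: same members, still two unrelated types.
static_assert(!std::is_same<Ente, FileEntry>::value,
              "structs with identical members are distinct types");
static_assert(!std::is_convertible<FileEntry, Ente>::value,
              "a FileEntry cannot be passed where an Ente is expected");

// Structural typing: both aliases name the very same type.
using NameFlag = std::pair<std::string, bool>;
using PathIsDir = std::pair<std::string, bool>;
static_assert(std::is_same<NameFlag, PathIsDir>::value,
              "aliases of the same pair are interchangeable");

// Accepts only an Ente.
bool is_flagged(const Ente& e) { return e.flag; }

// Accepts any pair<string, bool>, whatever it was meant to represent.
bool is_flagged(const NameFlag& p) { return p.second; }
```

Passing a PathIsDir to is_flagged compiles and returns its is-directory bit as if it were the flag, which is exactly the kind of mix-up the named struct rules out.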


Maintenance:

In terms of maintenance, pair is the worst. You cannot add a field.

tuple fares better in that regard: as long as you append the new field, all existing fields are still accessed by the same index. That is as inscrutable as before, but at least you don't need to go and update them.

struct is the clear winner. You can add fields wherever you feel like it.
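As a rough sketch of what this means in practice, suppose Ente later gains a priority field (a hypothetical addition, as are the alias and function names below). Member access by name keeps working wherever the field is inserted, while tuple access by index keeps working only if the field is appended.

```cpp
#include <string>
#include <tuple>

// Second version of the struct, with a hypothetical field added in the middle.
struct Ente {
    std::string name;
    int priority;  // new in this version
    bool flag;
};

// Tuple equivalents: the new element has to go last to keep indices stable.
using EnteTupleV1 = std::tuple<std::string, bool>;
using EnteTupleV2 = std::tuple<std::string, bool, int>;

// Written for the first version of Ente; needs no change for the second.
bool flag_of(const Ente& e) { return e.flag; }

// Works for both tuple versions only because index 1 still holds the flag.
template <class Tuple>
bool flag_of_tuple(const Tuple& t) { return std::get<1>(t); }
```

One caveat for the struct: code that relies on positional brace initialization, such as Ente{"a", true}, does depend on member order, so inserting in the middle can still break such call sites.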


In conclusion:

  • pair is the worst of both worlds,
  • tuple may have a slight edge in a very specific case (empty type),
  • use struct.

Note: if you use getters, you can apply the empty base trick yourself without clients having to know about it, as in struct Thing: Empty { std::string name; };. This is why Encapsulation is the next topic you should concern yourself with.
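A minimal sketch of that trick, assuming a compiler that applies the empty base optimization (mainstream ones do). The class names ThingWithMember and ThingWithBase and the getter names are hypothetical.

```cpp
#include <string>
#include <utility>

struct Empty {};

// Empty stored as a member: it must occupy at least one byte, plus padding.
struct ThingWithMember {
    std::string name;
    Empty e;
};

// Empty stored as a private base: it can take no space at all, and
// clients only see the getters, not how the data is laid out.
class ThingWithBase : private Empty {
public:
    explicit ThingWithBase(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }
    const Empty& empty() const { return *this; }

private:
    std::string name_;
};
```

Because access goes through name() and empty(), switching between the two layouts later would not require touching client code.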

Matthieu M.
  • 3
    You cannot use EBO for pairs, if you are following the Standard. Elements of pair are stored in _members_ `first` and `second`, there is no place for Empty _Base_ Optimisation to kick in. – Revolver_Ocelot Mar 24 '17 at 15:47
  • 2
    @Revolver_Ocelot: Well, you cannot *write* a C++ `pair` that would use EBO, but a compiler could provide a built-in. Since those are supposed to be members, however, it may be observable (checking their addresses, for example) in which case it would not be conforming. – Matthieu M. Mar 24 '17 at 15:52
  • 1
    C++20 adds `[[no_unique_address]]`, which enables the equivalent of EBO for members. – underscore_d Jan 01 '19 at 21:45
3

Pair shines the most when used as the return type of a function, together with destructured assignment using std::tie or C++17's structured bindings. Using std::tie:

struct Ente {/*...*/};
std::map<int, Ente> map;
auto inserted_position = map.end();
auto was_inserted = false;
std::tie(inserted_position, was_inserted) = map.emplace(1, Ente{});
if (!was_inserted) {
    //handle insertion error
}

Using C++17's structured binding:

struct Ente {/*...*/};
std::map<int, Ente> map;
auto [inserted_position, was_inserted] = map.emplace(1, Ente{});
if (!was_inserted) {
    //handle insertion error
}

A bad example of usage of a std::pair (or tuple) would be something like this:

using player_data = std::tuple<std::string, uint64_t, double>;
player_data player{};
/* ... */
auto health = std::get<2>(player);
/* ... */

because it is not clear when calling std::get<2>(player) what is stored at index 2. Remember that readability matters: it should be obvious to the reader what the code is doing. Consider that this is much more readable:

struct player_data
{
    std::string name;
    uint64_t player_id;
    double current_health;
};
player_data player{};
/* ... */
auto health = player.current_health;
/* ... */

In general you should think about std::pair and std::tuple as ways to return more than one object from a function. The rule of thumb that I use (and have seen many others use as well) is that objects returned in a std::tuple or std::pair are only "related" within the context of making a call to a function that returns them, or in the context of a data structure that links them together (e.g. std::map uses std::pair for its storage type). If the relationship exists elsewhere in your code, you should use a struct.
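A sketch of that rule of thumb, using a hypothetical div_mod helper (the standard library already offers std::div; this one exists only to illustrate the pattern). The quotient and remainder belong together only as the result of one call, so a pair is a reasonable return type, and the caller unpacks it into named variables right away.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical helper: returns (quotient, remainder) of an integer division.
std::pair<int, int> div_mod(int dividend, int divisor) {
    assert(divisor != 0);
    return std::make_pair(dividend / divisor, dividend % divisor);
}

// The caller gives the two results meaningful names immediately,
// so .first and .second never leak into the rest of the code.
int remainder_of(int dividend, int divisor) {
    int quotient = 0;
    int remainder = 0;
    std::tie(quotient, remainder) = div_mod(dividend, divisor);
    (void)quotient;  // unused here; unpacked only to show the pattern
    return remainder;
}
```

With C++17 the two declarations and the std::tie call collapse into a single structured binding, in the same way as the map.emplace example earlier in this answer.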

Related sections of the Core Guidelines: