How to optimize reusing a large std::unordered_map as a temporary in a frequently called function?

Question

Simplified question with a working example: I want to reuse a std::unordered_map (let's call it umap) multiple times, similar to the following dummy code (which does not do anything meaningful). How can I make this code run faster?

#include <iostream>
#include <unordered_map>
#include <time.h>

unsigned size = 1000000;

void foo(){
    std::unordered_map<int, double> umap;
    umap.reserve(size);
    for (int i = 0; i < size; i++) {
        // in my real program: umap gets filled with meaningful data here
        umap.emplace(i, i * 0.1);
    }
    // ... some code here which does something meaningful with umap
}

int main() {

    clock_t t = clock();

    for(int i = 0; i < 50; i++){
        foo();
    }

    t = clock() - t;
    printf ("%f s\n",((float)t)/CLOCKS_PER_SEC);

    return 0;
}

In my original code, I want to store matrix entries in umap. In each call to foo, the key values start from 0 up to N, and N can be different in each call to foo, but there is an upper limit of 10M for indices. Also, values can be different (contrary to the dummy code here which is always i*0.1).

I tried to make umap a non-local variable, for avoiding the repeated memory allocation of umap.reserve() in each call. This requires to call umap.clear() at the end of foo, but that turned out to be actually slower than using a local variable (I measured it).

What is your goal? Please provide more context. In particular its unclear from your question if map could be reused. — Basilevs, Jan 24 '19 at 03:43
But you show us no code. Is there anything to speed up? I suggest to at least explain an algorithm you are trying to implement. — Basilevs, Jan 24 '19 at 03:56
What are the types of your keys and values? Clearing or removing N items will take O(N) if your keys and/or values are non-trivial, which requires doing cleanup on N items. When there are millions of items the cleanup time will add up. — rwong, Jan 24 '19 at 05:12
@rwong I added a piece of code with the type of keys and values. That's true. The other implementation that I explained, sparsepp, takes less than 4 times, in this exmple, to clear(). I was hoping that there is something better I could try to get better performance. — Abaris, Jan 24 '19 at 06:52
@DocBrown Since my functions are big, I tried to simplify. In the part "// piece of code to use umap ...", umap is being used to update the parameters in the class that foo() belongs to. — Abaris, Jan 24 '19 at 07:11
@DocBrown Exactly, I added Edit2 about that. umap is used to store entries of a sparse matrix, so keys are for storing indices in 1 dimension (which is actually of type unsigned int in my code) and they range from 0 up to let's say 10M. — Abaris, Jan 24 '19 at 07:59
@Abaris: question is better now, but you actually should have left the comments in about the real intentions of that code, otherwise it is hard to understand what you are after. I took the freedom to add some, please double check if I got it right. Moreover, I think you should not have stripped the original idea of making `umap` a non-local variable out of the question, and your observations about the `clean` function. — Doc Brown, Jan 25 '19 at 07:19
@DocBrown I appreciate it, Doc. From this post, I've also learned how to explain my coding questions in a better way. — Abaris, Jan 25 '19 at 07:46

Doc Brown · Accepted Answer · 2019-01-25T07:28:25.390

This depends heavily on how umap is used inside, but

if you have a fixed known upper limit for the size,
the umap items are really just pairs of ints and doubles (or of similar simplicity)
the access pattern for umap is really that simple (just adding, content retrieving, and clearing "all at once")
you have perfomance requirements which justify this effort

then you could implement your own simplified hash table, based on an array or vector of int/double pairs of large-enough size. That is actually not too hard, and it will give you much better control over the memory allocation strategy (like "allocate all at once, "free all at once"), something you don't have when you pick a general-purpose container.

An unordered map from the standard lib is intended to serve multiple purposes (like deleting arbitrary elements again, or not allocating too much memory beforehand). That is why its internal implementation might be more complex than necessary for your specific case. In such a situation sometimes hand-optimized data structures can be the faster choice. However, they come for the price of increased implementation and debugging effort, and they are typically less extensible. So better check carefully if you really have those performance requirements which are worth the additional effort.

How to optimize reusing a large std::unordered_map as a temporary in a frequently called function?

1 Answers1