
As the title says: How do you properly test and benchmark different implementations of mutexes in c++?

Essentially, I wrote my own std::mutex-like class for a project running on a 2-core ARMv7, with the aim of minimizing overhead in the uncontested case. Now I'm considering using said mutex in more places and also on different architectures, but before I do this I'd like to make sure that

  • it is actually correct
  • there aren't any pathological cases in which it performs much worse than a standard std::mutex.

Obviously, I wrote a few basic unit tests and micro-benchmarks and everything seems to work, but in multi-threaded code "seems to work" doesn't give me great comfort.

  • So, are there any established static or dynamic analysis techniques?
  • What are common pitfalls when writing unit tests for mutex classes?
  • What are typical edge cases one should look out for (performance-wise)?

I'm only using standard library types for the implementation, which includes non-sequentially-consistent load and store operations on atomics. However, I'm mainly interested in implementation-agnostic advice, since I'd like to use the same test harness for other implementations, too.

MikeMB
  • I know it is not required, but I'd appreciate it if downvoters would comment on what the problem with this question is. I'm new to SE and not completely familiar with the specifics of this site. – MikeMB Jun 18 '17 at 13:31
  • Not the downvoter, but I will say that this site is particularly bad for anonymous downvotes on perfectly good questions. It feels to me like a lot of people downvote based on what I'd refer to as "religious" reasons. That said, one possibility is that you're asking for recommendations on tools, which I believe is [frowned on](https://softwareengineering.stackexchange.com/help/on-topic) here. But that's just a guess. And many people have discussed such tools in other questions, so make of that what you will. – user1118321 Jun 18 '17 at 16:32
  • In fact, check out [this meta post](https://softwareengineering.meta.stackexchange.com/questions/8547/downvoting-because-we-dont-agree-with-askers-approach-or-logic?cb=1) titled "Downvoting because we don't agree with asker's approach or logic." – user1118321 Jun 18 '17 at 16:38
  • @user1118321: that meta post does not fit this question, since IMHO there is no flawed assumption in this question. However, two of the three close votes I currently see use the predefined "third party resource request" close reason. MikeMB, you can try to edit your question and remove those parts from it; however, in its current form, I guess the community could also close it for being too broad. If you narrow the focus of the question down and ask specifically *what* you want to test and what you have tried so far, you may increase the chance of your question surviving. – Doc Brown Jun 18 '17 at 17:50
  • One problem with this question is that "Questions asking us to find or recommend tools, libraries, programming languages, resources (including books, blogs, tutorials, and examples), or projects to undertake are off-topic here as they attract opinionated answers that won't have lasting value to others." – David Hammen Jun 18 '17 at 17:51
  • Thanks everyone for the explanation. I modified my question and hope it is now in line with the site's requirements. – MikeMB Jun 18 '17 at 18:54
  • Not enough for an answer but for testing correctness, building your whole application (if possible) with ThreadSanitizer might be valuable. (Unless your mutex implementation uses such low-level features that ThreadSanitizer won't understand them.) – 5gon12eder Jun 19 '17 at 00:39
  • @5gon12eder: That is good advice; I added a comment about the implementation. Do you know whether TSan is platform-aware, i.e. does it detect races according to the C++ standard, even if they can't occur on the current platform (e.g. due to the strong memory-ordering model on x86)? – MikeMB Jun 19 '17 at 08:32
  • @MikeMB Hi, did you figure this out yet? I need a benchmark for my mutex implementations too. If not, maybe we can cooperate? – Carlo Wood May 02 '18 at 16:22

2 Answers


The issue is complex. Some sources of complexity include:

  • How many context switches take place: this matters a great deal depending on the platform the tests run on, since some platforms handle context switches much better than others.
  • Are the functions in which the mutex is used inlined or not? I.e., does the mutex perform well only in well-optimized (or optimizable) code?
  • Is the mutex designed for cache locality? Will cache misses before and after the mutex is entered significantly reduce performance or cause more context switches?
  • Will the mutex itself cause loss of cache locality? I.e., is the mutex state data dynamically allocated?
  • Will the mutex perform well when context switches occur inside the critical section, e.g. due to I/O, malloc, etc.?
  • Will the mutex perform well when kernel time is spent inside the critical section, e.g. dynamic memory allocation and deallocation?
  • Does the performance hold when running inside VMs?
  • Is construction or destruction of the mutex expensive, e.g. because state data lives in dynamically allocated memory?
  • Not sure if I agree with the construction/destruction part. If a program is creating and destroying your mutexes all the time, there is (imho) something wrong with the design. But otherwise, thank you for the pointers. – MikeMB Jan 16 '18 at 14:59

Your idea is very interesting: a compliance benchmark against which a mutex implementation could be tested.

Unfortunately, as far as I can see, there is no widely known compliance benchmark for mutex implementations. So I guess you have on your hands the very interesting problem of creating a proposal for such a compliance benchmark.

And, since you've been involved in the creation of a mutex implementation, you're the right person for the job.

If you allow me a suggestion, maybe you could start this research with the POSIX standard for threads on one side, and some study of the theoretical literature on concurrency on the other, such as CSP (Communicating Sequential Processes). Such papers usually deal with the classical concurrency problems, like the Dining Philosophers.

An implementation of those problems could be an interesting part of your compliance benchmark, I guess.

  • I didn't downvote you, but this doesn't seem to answer any of my questions. – MikeMB Nov 27 '17 at 18:14
  • Thanks for not downvoting. And sorry for not answering your questions. Would you mind if I ask whether you are considering creating a compliance benchmark for mutexes? – Hilton Fernandes Nov 28 '17 at 00:24
  • Unlikely. And even if I did, the only standard I care about is the C++ standard (although that might be the same as POSIX with regard to mutexes). – MikeMB Nov 28 '17 at 06:47
  • To qualify my previous statement: if I should come up with a good test suite for my own mutex, I'll most likely make it open source, but I very much doubt it will have the quality, or be complete enough, to become an actual "compliance" benchmark; that might be something better handled by static analysis anyway. – MikeMB Nov 28 '17 at 10:10
  • I agree with you that there's not a good test suite for mutex primitives. I suppose it should come from three distinct sources: the theory of concurrent processing, the specification of a POSIX mutex, and concurrent algorithms expressed using mutexes. Do you agree with that ? – Hilton Fernandes Nov 28 '17 at 12:23
  • As I said before, the only standard that is relevant to me is the ISO C++ standard (and we can debate which revision). If that refers to any of the other specifications, then they become important too; otherwise: no, I don't care what POSIX or some research papers say about mutex semantics. – MikeMB Nov 28 '17 at 14:13