When do micro-benchmarks make sense?

Question

I've been researching micro-benchmarks (e.g. using JMH or Caliper, though not necessarily limited to Java). There are many articles which describe how to write a benchmark, but I haven't found much on when to write one (and when not to). A few statements go in that direction, e.g.

Continuous delivery, continuous testing… is one thing, but continuous measuring is another step.

or

Always Measure

Before you start optimizing, make sure you have a problem that you need to solve. Make sure you can accurately measure your existing performance, or you won't be able to measure the benefit of the alternatives you try.

But that's only of limited value to me.

So my question is: In which scenarios do micro-benchmarks make sense? When don't they? What should and shouldn't they be used for? And does anyone here have real life experience with micro-benchmarking in a large project?

It's not clear to me what, if anything, your bullets have to do with *micro*-benchmarking, specifically. Running a profiler to find the hot spots is always an option. — Robert Harvey, Jun 24 '15 at 19:44
@RobertHarvey I guess all of the **measure** bullets would be as valid for a macro-benchmark as for a micro-benchmark, though on a different scale of course. The **don't measure** bullets are a bit more specific to micro-benchmarks since macro-benchmarking a network application can make sense if the outside dependencies have an average time much lower than the total time you're measuring. And the time limit is of course very micro-benchmark specific. — blalasaadri, Jun 24 '15 at 19:49
@RobertHarvey Oh, and running a profiler is great to find hot spots of course. I'd see micro-benchmarks as a tool to compare various ways of fixing those hot spots. Not as replacements for a profiler. — blalasaadri, Jun 24 '15 at 19:53
related: [Is running “milli”-benchmarks a good idea?](http://programmers.stackexchange.com/questions/154207/is-running-milli-benchmarks-a-good-idea) — gnat, Jun 24 '15 at 19:56
This bullet point, "Don't measure ... *if the operation will take longer than a few milliseconds to complete*," appears contradictory to the bullet point right above it, "*if the measured piece of code won't be called very often (or if you're unsure) unless your application is REALLY time sensitive*". Any clarifications? — rwong, Jun 24 '15 at 19:58
@rwong The first of those is related to http://programmers.stackexchange.com/questions/154207/is-running-milli-benchmarks-a-good-idea in that microbenchmarks are designed to measure code snippets that run only a very short time - the longer a piece of code takes the less those pitfalls that microbenchmark frameworks try so hard to avoid will matter. The second point is basically saying "If you hardly ever run that piece of code, don't optimize it. It won't make any difference." Does that clear things up? — blalasaadri, Jun 24 '15 at 20:02
@blalasaadri It doesn't mean longer operations aren't worth measuring. It simply means (1) organize your microbenchmarks by orders of magnitude, i.e. run your milliseconds-level tests in one suite, seconds-level tests in another, and long-running tests in yet another. (2) Microbenchmarks need to implement a timeout mechanism, because a coding error can make a function that used to take milliseconds run forever. This timeout mechanism can be brutal, e.g. killing the testing process. — rwong, Jun 24 '15 at 20:10
@rwong I'm not saying that long operations aren't worth measuring. I just have the impression that micro-benchmarks are not the right tool to do so. Also, I see micro-benchmarks as a tool for analyzing specific problems and evaluating possible solutions to them. Therefore, unlike unit tests, I wouldn't run them regularly. Is that view too limited? — blalasaadri, Jun 24 '15 at 20:17
@blalasaadri I agree with your viewpoint. Each project has its own optimal choices. — rwong, Jun 24 '15 at 20:19
Seems you already have your answer. I suggest you cut those parts from the question, post them as an answer, and see whether you get up- or downvotes. — Doc Brown, Jun 25 '15 at 15:14

score 4 · Answer 1 · edited Apr 12 '17 at 07:31

Micro-benchmarks can be used in addition to a profiler to find and fix problematic parts of your code: run a profiler to identify the problematic regions, then test those specific regions with micro-benchmarks.

Generally speaking, micro-benchmark frameworks (such as Caliper and JMH) are designed to prevent many of the common mistakes and make it easier to create meaningful benchmarks (as is described in the answer to this question). These are in many cases problems which don't occur in the same form when you're testing larger pieces of code; therefore it makes sense to keep micro-benchmarks small. (That's why they're called micro-benchmarks after all.)
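
To illustrate two of the mistakes such frameworks guard against, here is a minimal hand-rolled sketch in plain Java. It is not JMH or Caliper code; the class `NaiveBenchmark`, the method `averageNanos`, and the workload `sumOfSquares` are all made up for this example. It runs a warm-up phase before timing and stores every result in a volatile field so the measured work cannot simply be discarded as unused.

```java
import java.util.function.IntSupplier;

class NaiveBenchmark {

    // Every result is written here so the JIT cannot drop the measured work as dead code.
    private static volatile int sink;

    /**
     * Returns the average wall-clock time per call in nanoseconds.
     * The warm-up calls are executed first and are not included in the timing.
     */
    public static double averageNanos(IntSupplier task, int warmupCalls, int measuredCalls) {
        // Warm-up: give the JIT a chance to compile the hot path before timing starts.
        for (int i = 0; i < warmupCalls; i++) {
            sink = task.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredCalls; i++) {
            sink = task.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredCalls;
    }

    // Hypothetical workload standing in for a hot spot the profiler pointed at.
    public static int sumOfSquares(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        double nanos = averageNanos(() -> sumOfSquares(1_000), 20_000, 20_000);
        System.out.printf("sumOfSquares(1000): about %.1f ns per call%n", nanos);
    }
}
```

Even this version leaves problems open, for example constant folding of the fixed argument, run-to-run variance, and timer granularity. That is the argument for using a real framework rather than a loop like this one.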

Some things I've come up with are the following.

Measure:

- if you've found a bottleneck and you may have a better solution
- if you're comparing several technologies which can do the same job (e.g. XML parsers) and want to know which one is fastest before using them in your real code
- if you have nothing else to improve about your code

Don't measure:

- if the code has outside dependencies with unreliable timing (e.g. it talks over a network)
- if the measured piece of code won't be called very often (or if you're unsure), unless your application is REALLY time sensitive. (In other words: if you rarely run a piece of code, don't try to optimize it. It probably won't be worth the time.)
- if the operation will take longer than a few milliseconds to complete
- just because your boss, manager or someone else who is in charge thinks that it's a good idea to do so
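
The point about rarely executed code can be made concrete with Amdahl's law: if a piece of code accounts for a fraction p of the total runtime and you make it s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). Below is a small sketch of that arithmetic; the class and method names are invented for this example, and the numbers in `main` are hypothetical, not measurements.

```java
class SpeedupEstimate {

    /**
     * Amdahl's law: overall speedup when a part that takes `fraction` of the
     * total runtime (0.0 to 1.0) becomes `localSpeedup` times faster.
     */
    public static double overallSpeedup(double fraction, double localSpeedup) {
        if (fraction < 0.0 || fraction > 1.0 || localSpeedup <= 0.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1] and localSpeedup > 0");
        }
        return 1.0 / ((1.0 - fraction) + fraction / localSpeedup);
    }

    public static void main(String[] args) {
        // Hypothetical: code that takes 1% of the runtime, made 10 times faster.
        System.out.println(overallSpeedup(0.01, 10.0));
        // Hypothetical: a hot spot that takes 60% of the runtime, made 2 times faster.
        System.out.println(overallSpeedup(0.60, 2.0));
    }
}
```

With these made-up numbers, the tenfold improvement to the rarely run code gains less than 1% overall, while merely doubling the speed of the hot spot gains roughly 43%. The fraction p is exactly what the profiler tells you, which is why it comes first.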

Also, some people run micro-benchmarks regularly, as you would run unit tests. I use them only to analyze problems I already know about (thanks to the profiler), so I don't need to run them very often. There may be situations in which running them regularly makes sense (and if you know of one, please comment!), but that would probably be a totally different way of using such tools.
