35

I have recently learned about the little-known and not widely used annotation @RepeatedTest that, as the name implies, repeats the very same test n times. Baeldung provides a short guide to this feature; however, I have found no explanation anywhere of why this annotation exists in the first place.
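For reference, minimal usage looks like this (standard JUnit 5; the class and method names are just for illustration):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.RepeatedTest;

class RepeatedTestExample {

    // Executed five times within a single test run; each repetition
    // shows up as a separate entry in the test report.
    @RepeatedTest(5)
    void runsFiveTimes() {
        assertEquals(4, 2 + 2);
    }
}
```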

I initially assumed the reason was to test a feature that involves some randomization. However, in that case, I would personally mock the randomness by extracting the random generation into a dependency and testing only boundary and fixed values to keep the tests predictable.
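By that extraction I mean roughly the following (a sketch; all names are made up):

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Production code receives its source of randomness as a dependency
// instead of calling new Random() internally.
class Dice {
    private final IntSupplier roller;

    Dice(IntSupplier roller) {
        this.roller = roller;
    }

    static Dice standard() {
        Random random = new Random();
        return new Dice(() -> random.nextInt(6) + 1);
    }

    boolean rollIsSix() {
        return roller.getAsInt() == 6;
    }
}

// In a test, the randomness is replaced by fixed boundary values:
//   assertTrue(new Dice(() -> 6).rollIsSix());
//   assertFalse(new Dice(() -> 1).rollIsSix());
```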

Of course, that wouldn't work if I were writing my own random number generator, but the test-repeating feature was hardly implemented solely for that case.

I'm curious why the identical test should be run several times, from a unit and integration testing perspective. Could you provide me with a few examples of when this feature is required?

  • 3
    It is often useful to test code with random, unpredictable tests, mainly because bugs are usually not caused by something you can see and predict (which would be covered by your extraction technique) but by things you completely missed. It's not a guarantee of catching a bug, just one of the many tools available to us. One of the BSD distros famously did an audit a few decades ago by piping a random 8-bit stream into the stdin of standard Unix utilities like cat and grep, and found (and fixed) several crashing bugs. – slebetman Dec 23 '21 at 15:06
  • 4
    @slebetman Exactly. [Fuzzing](https://en.wikipedia.org/wiki/Fuzzing) is a well-known technique used by security researchers to find ways to exploit bad inputs to existing tools. They find a lot of bugs each year. IMHO having fuzz testing run for every build doesn't really make much sense. *But* I believe that having a session of fuzzing, maybe before releasing a milestone/new version, would make a lot of sense. So your CI/CD pipeline doesn't have to stop for random test failures, but once in a while you still try to throw everything at it and see what happens. – Bakuriu Dec 23 '21 at 15:15
  • Then again, tests should really focus on: 1) "standard" values (to verify that at least for the "normal inputs" the code works) 2) boundary conditions. You should really test one set of input for each "category", but in complex code these can easily become too many. Thus fuzzing once in a while will find all the actual bugs for you. – Bakuriu Dec 23 '21 at 15:17
    @Bakuriu If you have a CI/CD pipeline you really should not care about the time to run tests unless your project's tests take something like 12 hours to run. It's better to run the fuzzing tests on every CI test run, because each time you run them there is no guarantee that a test will trigger a bug (it's probabilistic). Thus running them more often than once per day will increase your chances of catching an issue. – slebetman Dec 23 '21 at 15:28

3 Answers

41

This is to allow you to detect the difference between passed, failed and flaky tests in a single test run.

A flaky test is one that intermittently fails, which can be due to a number of reasons, sometimes outside your control. Rather than running the build again and shrugging your shoulders when it passes this time, you run tests multiple times and only count them as passing if there are no failures.

For example, you might have a "click a button on the webpage and check the div appears" test. But due to load on the box and the load order of elements on the page, there is an unknown delay before the div appears, so in your test you have to choose a timeout.
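Such a test might look roughly like this (a sketch in Selenium 4 style; the URL and element ids are made up):

```java
import java.time.Duration;

import org.junit.jupiter.api.RepeatedTest;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

class ButtonShowsDivTest {

    // Repeating means one lucky run under light load cannot turn the
    // build green on its own.
    @RepeatedTest(5)
    void clickingButtonShowsDiv() {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.test/page");          // hypothetical page
            driver.findElement(By.id("show-details")).click(); // hypothetical ids
            // The 5-second timeout is a guess: under heavy load the div may
            // appear later, failing the test although the code is correct.
            new WebDriverWait(driver, Duration.ofSeconds(5))
                    .until(ExpectedConditions.visibilityOfElementLocated(By.id("details")));
        } finally {
            driver.quit();
        }
    }
}
```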

Or the order in which the tests run affects the test result. You can't find the exact cause of this rare failure, but you need to know which tests are affected by it.

Or you may simply have an intermittent bug you are ignoring but want to avoid having to rerun your builds multiple times to get a pass.

Ewan
  • 70,664
  • 5
  • 76
  • 161
  • 11
    In a perfect deterministic world this would never be needed. So in addition to the wonderful points you already made, this also clearly signals that the unit under test is not deterministic. That there is some magic left to be squeezed out of it. – candied_orange Dec 22 '21 at 14:52
  • 1
    @candied_orange This is my exact thought. In the case where a `<div>` should appear, I would rather do a periodic check within the test with a `while` loop and a timeout in case nothing really loaded in a reasonable time - the test itself would only be executed once. – Nikolas Charalambidis Dec 22 '21 at 16:04
  • 1
    @NikolasCharalambidis That's not what this does. This lets you run 1 test X times and get X reports of what happened. Do your own looping and you have to cram every run into one report. – candied_orange Dec 22 '21 at 16:11
  • 2
    The question isn't so much "what should you do when the flaky test fails" but "what _do_ you do when the flaky test passes". Running it multiple times allows you to ignore it as flaky when it fails, but also to not allow the build to pass when it passes – Ewan Dec 22 '21 at 16:14
  • @Ewan I'm reading between the lines here but I think by that last line you mean you can use the annotation to keep a flaky test from passing the build just because it got lucky once. – candied_orange Dec 22 '21 at 16:44
    Yeah, I mean say it only fails 1 in 10 times, you still want to fail every build – Ewan Dec 22 '21 at 17:15
  • 14
    You can argue the philosophy of whether you want to count flaky tests as passes or fails, but the practical matter is that you need to identify them as flaky before you can get to that decision – Ewan Dec 22 '21 at 17:17
  • I like this answer, except flip the logic. Sometimes you just need a test to pass once. One pass out of ten times is a success. I think this annotation is mostly useful in automating a user interface, where the nature of what you are testing causes lots of failures that are not application defects, but tests that are not stable. Some tests just will never be stable, due to the environment they are run in. – Greg Burghardt Dec 22 '21 at 17:44
  • I'm going to have to agree with Greg here. If there was a way to run it 10 times and clearly decide to pass or fail every time then just loop. This thing's job is to make this code be more than one test. Only really makes sense with a human in the loop making a judgement call. – candied_orange Dec 22 '21 at 18:17
  • @candied_orange : Maybe. I have a deterministic computation performed by an external process (which passes it through a third party's process to a fourth party's process containing a library that actually does the work). Repeatedly sending the same computation through ~900 times (min observed: 850, max observed: 1020) causes Cython faults. My process guards against this by only allowing 400 (varying) computations before switching to a new chain of external processes. I run a long test on every release of the third/fourth party's code, hoping for a fix to finally happen. – Eric Towers Dec 23 '21 at 19:43
16

@Ewan posted a nice answer, but let me add that this feature isn't only useful for potentially unstable or intermittently failing tests.

By using the RepetitionInfo argument, this can be used as a simple way of writing parametrized tests, where the increasing repetition index is the parameter that controls them. The index might be used directly as an input to the method under test, or indirectly, by picking input values from an array or some other source, or by calculating different input values. It may be possible to achieve similar functionality through JUnit's @ParameterizedTest feature, but I guess that wouldn't necessarily be simpler.
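A minimal sketch of that pattern (`RepetitionInfo` and `getCurrentRepetition()` are standard JUnit 5 API; the arrays and the `square` method are made-up stand-ins):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.RepeatedTest;
import org.junit.jupiter.api.RepetitionInfo;

class SquareTest {

    // Hypothetical inputs and expected results; the repetition index
    // picks one pair per run.
    private static final int[] INPUTS = {0, 1, -3};
    private static final int[] EXPECTED = {0, 1, 9};

    @RepeatedTest(3)
    void squaresInput(RepetitionInfo info) {
        // getCurrentRepetition() is 1-based, so shift it to index the arrays.
        int i = info.getCurrentRepetition() - 1;
        assertEquals(EXPECTED[i], square(INPUTS[i]));
    }

    // Stand-in for the method under test.
    private static int square(int x) {
        return x * x;
    }
}
```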

Another use case is automated tests with potential side effects (for example, a test which runs against some database, maybe a small local file-based one), where one wants to validate the idempotency of a certain function that modifies the database. In the @BeforeAll method, the test database is initialized; then the test is repeated two or three times and validates that the outcome is still the same.
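A sketch of this, with a plain map standing in for the test database (all names made up):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.RepeatedTest;

class IdempotencyTest {

    // Stand-in for the test database; static, so it survives repetitions.
    private static final Map<Long, String> DB = new HashMap<>();

    @BeforeAll
    static void initDatabase() {
        DB.put(42L, "ACTIVE");
    }

    // Hypothetical function under test: applying it twice must give
    // the same result as applying it once.
    private static void deactivate(long id) {
        DB.put(id, "INACTIVE");
    }

    // No per-repetition reset on purpose: repetitions 2 and 3 run
    // against the state left behind by repetition 1.
    @RepeatedTest(3)
    void deactivateIsIdempotent() {
        deactivate(42L);
        assertEquals("INACTIVE", DB.get(42L));
    }
}
```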

A third use case I can think of is performance tests or benchmarks. Though JUnit was most probably not originally intended for this purpose, one can utilize it for this, since the total running time for all tests (and for individual ones) is a standard result one gets from each run. Repeating the same test several times might then be used to measure the average running time of a certain process, for example.
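For instance (the busy-work loop is just a stand-in for the real process being measured):

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.RepeatedTest;

class ExpensiveOperationBenchmark {

    // Busy-work stand-in for the process whose running time is of interest.
    private long expensiveOperation() {
        long sum = 0;
        for (long i = 0; i < 50_000_000L; i++) {
            sum += i;
        }
        return sum;
    }

    // Each of the ten repetitions is reported with its own running time,
    // so the report gives a rough view of average and variance.
    @RepeatedTest(10)
    void measureExpensiveOperation() {
        assertTrue(expensiveOperation() > 0);
    }
}
```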

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • 1
    The first example with a loop index sounds rather like a parametrized test. The second one is spot on! Validating idempotency requires at least two runs. Thanks for the answer. – Nikolas Charalambidis Dec 23 '21 at 07:33
    @NikolasCharalambidis: the first example does not only *"sound rather like a parametrized test"*, it **is** a form of implementing a parametrized test, that is exactly the point. Not sure if JUnit has a simpler alternative for providing the same functionality through these `@ParameterizedTest` attributes. – Doc Brown Dec 23 '21 at 08:25
  • 2
    It is starting to make sense. Basically, `@RepeatedTest` is the same as `@ParameterizedTest`, except it is solely up to you to differentiate each run. – Nikolas Charalambidis Dec 23 '21 at 09:26
  • 1
    @NikolasCharalambidis: I came up with another use case, see my edit. – Doc Brown Dec 23 '21 at 10:26
5

I initially assumed the reason was to test a feature that involves some randomization.

Randomization or other non-determinism is indeed a very common reason to run a unit test multiple times.

However, in that case, I would personally mock the randomness by extracting the hardcoded generation as a dependency

This is indeed a good idea: if you don't, failures may be hard to reproduce. But it is often not possible: there are many circumstances where you have no control over the source of nondeterminism.

A very common case is concurrent programming, where the nondeterminism comes from the thread scheduler. If the test is multithreaded and has a race condition, running it multiple times increases the chance that the race will happen.
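For example, a minimal sketch (the `Counter` class is a made-up unit under test with a deliberate race condition):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.RepeatedTest;

class CounterRaceTest {

    // Hypothetical unit under test: an intentionally unsynchronized counter.
    static class Counter {
        private int value;
        void increment() { value++; }   // not atomic: read, add, write
        int value() { return value; }
    }

    // A single run may get lucky with the scheduler; 100 repetitions make
    // it far more likely that at least one run loses an update and fails.
    @RepeatedTest(100)
    void incrementsAreNotLost() throws InterruptedException {
        Counter counter = new Counter();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        assertEquals(20_000, counter.value());
    }
}
```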

You may also want to run multiple instances of the same test concurrently.

It can also happen that the nondeterminism comes from a library or an external program that's out of your control.