
I've read the questions regarding the use of random values in unit-tests and, well, I still don't quite understand what the argument against random values is.

I'm trying to understand because I had an entire course dedicated to testing while I was doing my Master's in SE. The course focused heavily on random value generation, set theory and Hoare logic. Maybe those ideas were without merit, but quite frankly I don't see why.

Why using random values is a good idea

The reasoning during the course was as follows.

If I have a Unit Under Test (UUT) which has a string parameter in its signature, then the UUT's post-condition predicate should remain true when I pass any element from the set of all strings.

Hence, I should be able to pass the UUT a random string value, i.e. a randomly selected element from the set of all strings.

If the UUT has a precondition predicate for the string parameter, then the UUT's post-condition predicate should remain true when I pass any element from the set of all strings for which the precondition predicate is true.

Hence, I should be able to pass the UUT a random string value for which the precondition predicate is true, i.e. a randomly selected element from the set of all strings for which the precondition predicate is true.
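Sketched in code, the reasoning looks roughly like this (a minimal Python sketch; `normalize` and its pre/post-condition predicates are hypothetical stand-ins for a UUT, and the seed value is arbitrary):

```python
import random
import string

def normalize(s: str) -> str:
    """Hypothetical UUT: collapses the string to lowercase."""
    return s.lower()

def precondition(s: str) -> bool:
    # Hypothetical precondition: the input must be non-empty.
    return len(s) > 0

def postcondition(s: str, result: str) -> bool:
    # Hypothetical post-condition: the result is lowercase
    # and has the same length as the input.
    return result == result.lower() and len(result) == len(s)

def test_normalize_random():
    rng = random.Random(42)  # fixed seed keeps the test reproducible
    for _ in range(100):
        # Randomly select an element from the set of all strings
        # for which the precondition predicate holds.
        length = rng.randint(1, 50)
        s = "".join(rng.choice(string.printable) for _ in range(length))
        assert precondition(s)
        assert postcondition(s, normalize(s))

test_normalize_random()
```

The generator only draws from the subset of inputs satisfying the precondition, and the assertion checks the post-condition for each draw.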

So what is the argument against random values?

Following this logic, I do not understand what the argument against using random values in unit-tests is. If my UUT has a parameter, then its value should be expected to vary; parameters are variables, after all.

If the parameter value should not be expected to vary, then the parameter can be removed, because it is effectively a constant.

If I only test a UUT with a constant then, what am I really testing except for a single value from an ocean of valid values?

A unit-test should be written in such a way that it can never fail merely because you've used a random value. If it does fail, it does not do so because of the random value, unless the random value generator no longer generates values that match the precondition predicate, or never did.

The only argument I could make against using random values is that people do not understand how the values are being generated. Hence, they find it "hard to debug". However, this is not an argument against random values in unit-tests. It is a reason to improve people's understanding.

Note that I am disregarding the "repeatability" of unit-tests on purpose. It is a given that a proper seed is to be used.
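For instance, a seeded generator makes the "random" inputs identical on every run (Python sketch; the seed value is arbitrary):

```python
import random

# Two generators constructed with the same seed produce the same
# sequence, so a failing test can always be replayed exactly.
rng_a = random.Random(1234)
rng_b = random.Random(1234)

seq_a = [rng_a.randint(0, 10**6) for _ in range(5)]
seq_b = [rng_b.randint(0, 10**6) for _ in range(5)]
assert seq_a == seq_b
```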

Thanks for any responses in advance.

Byebye
    This is a confusion about words. Input from a source *called* "random", but actually pseudo-random because it's controlled by a seed, is not actually random, but deterministic. Therefore, none of the arguments commonly made about randomness in testing apply. – Kilian Foth Jan 06 '22 at 13:30
  • Why would you want to use random values for an *automated test*? Either all possible values should be tested, or a subset chosen on some rational basis. If random values are acceptable, then why not just execute the test coverage partially and randomly too? One day it might pass, another it might fail, who can know!? – Steve Jan 06 '22 at 13:31
  • "Random" means different things to different people, and you're getting caught up on this ambiguity. As a simple example, if I were to tell you my surname, that'd be just a random name to you. But to me, that name is not random. It is my father's surname, and it was always set in stone that I (as the child of my parents) would have that surname. In your question, you are using "random" to mean "any of these, without further selection between them". However, what you missed is that if you have a bug in which **some** of the valid values cause errors, the test would yield inconsistent results. – Flater Jan 06 '22 at 13:42
  • [..] In other words, your test suite should be consistent in what it tests so that you can identify a bug; and randomization is inherently the antithesis of consistency. It needlessly hinders you, and you're doing it for no other reason than "because it should theoretically be okay", even though that claim makes no sense when dealing with bugs and behavior that does not follow your expectations. – Flater Jan 06 '22 at 13:45
  • Using a deterministic pseudo-random generator implies you are going to generate more input values than you would do manually. But that implies that in several cases you cannot effectively specify the expected outcome of a test beforehand any more (at least not manually). That does not mean that pseudo-random tests are useless - as you correctly wrote, when the UUT (or SUT, subject-under-test) has inbuilt checks for postconditions (or invariants), this can result in a useful group of tests. However, this is a restricted subset among other kinds of tests, and one has to evaluate ... – Doc Brown Jan 06 '22 at 13:58
  • ... if the time invested in creating pseudo-random tests would not be better invested into other kinds of tests (which may sometimes be the case, sometimes not). – Doc Brown Jan 06 '22 at 13:59
  • ... let's say you test your newest string concatenation algorithm. You can invest time to write a pseudo-random test here concatenating 10000 different string combinations. Or you use a more white-box approach, knowing that the implementation does not actually care about any specific characters, and focus on testing special cases like concatenation of all kinds of combinations of empty strings, short strings, and long strings, strings with equal and unequal lengths etc. In reality, you have only a limited time frame and have to pick whatever test brings you the most "bang for the buck". – Doc Brown Jan 06 '22 at 14:08
  • Most unit tests use a test-by-example strategy where you hardcode the expected result. Randomness is then very fragile. But other strategies like **[property based testing](https://hypothesis.works/articles/what-is-property-based-testing/)** are just what you're trying to do: instead of an example scenario, you define properties that should hold for all inputs, and strategies to generate inputs. The test runner then tries to find the smallest example that violates your properties, i.e. causes the test to fail. This can be extremely useful, but it's not yet mainstream. – amon Jan 06 '22 at 14:53
  • @Steve Randomness and automation aren't mutually exclusive. As I explained, a string parameter in your signature comes down to: "any element in the set of all strings". That means that a UUT should be able to handle any string value. Which you can interpret as: "It doesn't matter, so I always pick the same one." or, "It doesn't matter, I will pick one from this set at random.". – Byebye Jan 06 '22 at 15:21
  • @Steve Basically you're matching against a pattern. The pattern is part constant (it is a string), part variable (any length, any character). I argue that if you're matching a pattern, the variable part should be allowed to vary. Whereas you're saying it should not. Because the outcome is no longer deterministic. Which is not true. The test should never fail if you create a random string for a string parameter unless, what you actually wanted was a value that is an element of a subset of all strings. If it does, your code is wrong and you just found a bug. – Byebye Jan 06 '22 at 15:26
  • @Byebye: the real question one has to ask here is: does (pseudo-)random test data make sense *for the specific case*? Sometimes it does, sometimes it is a waste of time. – Doc Brown Jan 06 '22 at 15:47
  • @DocBrown "Using a deterministic pseudo-random generator implies you are going to generate more input values than you would do manually. [...]". Yes, that is exactly my point. A unit test to me is much like an experiment with independent and dependent variables. I vary the independent variables and verify if the dependent variables are as I expect, the rest may, nay, must be utterly random. If not, they are a confounding factor. – Byebye Jan 06 '22 at 15:48
  • @Byebye: did you read the next sentence after that one you cited from me? And understand its implications? – Doc Brown Jan 06 '22 at 15:49
  • @DocBrown I did. You are saying that one should add pseudo-random values on a "case by case" basis. Whereas I am saying they should be the de-facto standard unless, you want to control for certain factors. – Byebye Jan 06 '22 at 15:59
  • Well, it seems you did not get my point. You wrote *"and verify if the dependent variables are as I expect,"* - my point is, to really define what you expect can be pretty cumbersome when you have a huge amount of (pseudo-)random input data. – Doc Brown Jan 06 '22 at 16:14
  • And not only be cumbersome, but often a reimplementation of the same logic you were trying to test ("marking your own homework"). Tests should really be _substantially simpler_ than the implementation, so it's easier to confirm that they're correct. Things like _parameterised_ testing, where you hard-code multiple inputs (which may be generated randomly at a given point in time) and their expected outputs, then run through all of them as separate cases, can help here. – jonrsharpe Jan 06 '22 at 16:56
  • @Byebye, I'm still not following the logic of automated tests using random values (in the sense of random values that vary on each run of the test). Most processing doesn't fail for random inputs, it fails on thresholds and edge cases. If you can't be exhaustive by testing every possible input, then clearly there needs to be a manual analysis of the code to try and create a sample of values that is most likely to catch (current or future) error. Generating random values on each run just seems absurd, as it provides neither the exhaustive guarantee, nor the analytical guarantee. (1/2) – Steve Jan 06 '22 at 19:05
  • It is not even an efficient way of accumulating an exhaustive guarantee over time, since the random nature means some values will be tested multiple times unnecessarily, others perhaps never, and the random values used on each run would have to be logged (otherwise the test results would be unreproducible) so in that sense it might eventually involve more overall processing than just doing an exhaustive test outright. (2/2) – Steve Jan 06 '22 at 19:11
  • @Steve: I am with you concerning unit tests, where every new random input value would be used for a new, fresh test. Picking "interesting" values deliberately will typically be way more effective than picking values randomly. However, using (pseudo-)randomness is useful for [Fuzz testing](https://en.wikipedia.org/wiki/Fuzzing), or when one tests larger modules with complex internal state. Here, one test case alone consists of a stream of random data, trying to break a program. But that's usually not unit testing any more. – Doc Brown Jan 07 '22 at 07:23
  • @DocBrown, I realise that in some sense random values must be better than *no attempt to test at all*, but honestly, if the system is so complex that it can't be tested either exhaustively (because of too many possible states) or analytically (because it's too complex), then what real chance is there that the system is designed correctly? If the random test happens to fail, then you just swing your fist into the cake again until the test passes for one random sample, then it seems to work for a while until at a random time the system fails again, or until the test is found faulty. (1/2) – Steve Jan 07 '22 at 08:10
  • So I really think random testing is a tacit realisation that the system exceeds the understanding of its own creators, which is a really serious failure of the test that developers need to know what they are doing when writing the system in the first place. Otherwise it's a thousand monkeys at a thousand typewriters - the very blurst of times. (2/2) – Steve Jan 07 '22 at 08:13
  • @Steve: as I wrote, this is not for unit tests, but for larger modules. The real chance that a system or subsystem of a real world software is bug-free is indeed zero, that's old wisdom. Do you think "log4j" is bug-free in its current version? Probably not. Would randomized test data have helped to identify that "log4shell" issue you surely have heard of? Most probably not, too, since for detecting such kinds of bugs, specifically designed test data is necessary. But can we imagine fuzz testing to be useful for finding other bugs in "log4j"? I think the answer is "yes, absolutely". – Doc Brown Jan 07 '22 at 09:05
  • @DocBrown, well you aren't making a very compelling case for yourself. *"Let's look at bugs in 'log4j' and 'log4shell'. Would random testing have helped? Probably not."*. – Steve Jan 08 '22 at 10:46
  • @Steve: what I am questioning here is your idea that a system which is "too complex to be tested exhaustively or analytically" must be maldesigned. But in fact *any* system of a certain size can't be tested exhaustively or analytically, and any such system has bugs. To get those bugs under control, it is a really good idea to apply different testing approaches. For certain systems, utilizing (pseudo-)random data for certain tests makes perfect sense; there is neither a point in generally rejecting this nor in overstating its value (like the OP). – Doc Brown Jan 08 '22 at 11:56
  • @DocBrown, I think the problem I have, is that random-each-time testing is like trying to shoot game birds by firing randomly into the sky. Logically, it is not impossible to catch birds that way, and someone somewhere will declare that he has, but it cannot possibly be worth the time and the bullets compared to time spent on basic hunting techniques. Even when the birds are located randomly and can teleport, a search strategy based on a systematic sweep has *no less* chance of success as a random shot. For real birds, a systematic approach is *always* more fruitful. – Steve Jan 08 '22 at 14:47

2 Answers


> It is a given that a proper seed is to be used.

The real world is very different from your masters course. People in the real world very often don't do this, and then it's a problem for the reasons you know about.

Philip Kendall

Unit tests are meant to be reproducible, i.e. unless the code changed somehow, their results must not change.

For unit tests, it's much more helpful to devise test cases for border conditions (for example, does the code handle long strings correctly, possibly by throwing an exception?). If your code needs to handle special cases, for example attempted division by zero, you better check these cases reliably every time you run your test suite instead of randomly whenever your RNG feels like outputting a zero for that parameter.
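For illustration, such special cases are typically pinned down with fixed inputs so they are exercised on every run (a Python sketch; `safe_divide` is a hypothetical function):

```python
def safe_divide(a: float, b: float) -> float:
    """Hypothetical UUT: division that rejects a zero divisor."""
    if b == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    return a / b

def test_safe_divide_edge_cases():
    # The division-by-zero branch is exercised on every run,
    # not only when an RNG happens to produce a zero.
    try:
        safe_divide(1.0, 0.0)
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("expected ZeroDivisionError")
    # An ordinary case with a hard-coded expected result.
    assert safe_divide(6.0, 3.0) == 2.0

test_safe_divide_edge_cases()
```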

There's a use for random values in fuzz testing, where you call code with weird parameter values trying to break things. Once you've found such parameter combinations, you've probably got some more border conditions to handle or off-by-one errors to fix.

Hans-Martin Mosner
  • Unit tests are still reproducible if you provide the same seed as was used for the test run. If your UUT has the predicate that it should fail if a 0 is provided, the generator should match that predicate and not output a 0. This is why I explicitly said that a generator should randomly pick an element from a set that matches the predicate. Essentially you are picking a random value from a well-defined set. This means your test will only fail if at some point you find a bug. Which is much better than using the same value every test-run. – Byebye Jan 06 '22 at 15:32
  • Boundary value analysis works on ordered data. A partition may or may not have a least / greatest / supremum / infimum. Generating 'random' values in an equivalence partition may be a useful way to, if not 'prove' that a partition is an equivalence partition, at least increase the chance of finding out that it is not, without doing exhaustive testing. – Kristian H Jan 06 '22 at 19:29