14

Doing R&D work, I often find myself writing programs with a large degree of randomness in their behavior. For example, when I work in Genetic Programming, I often write programs that generate and execute arbitrary random source code.

A problem with testing such code is that bugs are often intermittent and can be very hard to reproduce. This goes beyond just setting a random seed to the same value and starting execution over.

For instance, code might read a message from the kernel ring buffer, and then make conditional jumps based on the message contents. Naturally, the ring buffer's state will have changed when one later attempts to reproduce the issue.
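
A stripped-down sketch of the kind of code I mean (the `dmesg` call and the branching here are only placeholders for illustration):

```python
import random
import subprocess

def react_to_kernel_messages():
    # Illustration only: read the kernel ring buffer and branch on whatever
    # happens to be in it at this moment.
    messages = subprocess.run(["dmesg"], capture_output=True,
                              text=True).stdout.splitlines()
    if not messages:
        return "idle"
    line = random.choice(messages)       # stochastic choice of a message
    if "error" in line.lower():
        return "recover"                 # branch taken only for some buffer states
    return "proceed"
```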

Even though this behavior is a feature, it can trigger other code in unexpected ways, and thus often reveals bugs that unit tests (or human testers) don't find.

Are there established best practices for testing systems of this sort? If so, some references would be very helpful. If not, any other suggestions are welcome!

John Doucette
  • Can't you mock the kernel ring buffer too? And other random aspects of your code? – Jonathan Merlet Oct 18 '12 at 14:21
  • @JonathanMerlet Potentially, but the issue is that, when deployed, the code will have access to the _real_ ring buffer (indeed, to a real OS). So if I only test on a mocked up version, then I'm just deferring the discovery of these bugs until later. – John Doucette Oct 18 '12 at 14:28
  • It seems to me that the problem is not related to the program's *random* behaviour (since this can be controlled by the random seed) but to particular states of this 'kernel ring buffer'. So your question is actually 'how do I test a program that depends on external state', right? – AakashM Oct 19 '12 at 09:51
  • @AakashM, yeah, that's a better way to phrase it. To be more specific, a program with an external state, which stochastically accesses or alters the external state. – John Doucette Oct 19 '12 at 11:36

4 Answers

7

It is useful to add hooks, as suggested, to recreate exact states. Also, instrument the system so that it can dump its "seeds" (in your case, the PRNG seed, the contents of the kernel ring buffer, and any other sources of nondeterministic input).

Then run your tests both with true random input, and regression-style with any previously-discovered interesting cases.
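
As a sketch of what that instrumentation could look like (Python here; `real_read_ring_buffer()` is a hypothetical stand-in for however your program actually reads the buffer):

```python
import json
import random

class Recorder:
    """Capture every nondeterministic input so a failing run can be replayed.

    Sketch only: real_read_ring_buffer() is a hypothetical stand-in for
    the program's real access to the kernel ring buffer.
    """

    def __init__(self, replay_file=None):
        if replay_file:                          # replay mode: reuse a dumped run
            with open(replay_file) as f:
                self.log = json.load(f)
            self._replay = iter(self.log["inputs"])
        else:                                    # record mode: fresh randomness
            self.log = {"seed": random.randrange(2**32), "inputs": []}
            self._replay = None
        self.rng = random.Random(self.log["seed"])   # hand this RNG to the program

    def read_ring_buffer(self):
        if self._replay is not None:
            return next(self._replay)            # replay the recorded value
        value = real_read_ring_buffer()          # hypothetical real access
        self.log["inputs"].append(value)
        return value

    def dump(self, path):
        with open(path, "w") as f:
            json.dump(self.log, f)
```

When a randomized run fails, dump the log and check it into your regression suite; constructing the recorder from that file replays the exact same seed and external inputs.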

In the particular case of your access to the kernel, I'd recommend making a mock in any case. Use the mock to force equivalence classes that are less likely to show up in practice, in the spirit of "empty" and "full" for containers, or "0, 1, 2^n, 2^n+1, many" for countable things. Then you can test with the mock and with the real thing, knowing that you have handled and tested the cases you've thought of so far.
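
A sketch of such a mock (pytest and the `process_messages` entry point are assumptions on my part; the classes themselves are only examples):

```python
import pytest

class MockRingBuffer:
    """Mock of the kernel ring buffer, forcing a chosen equivalence class."""

    def __init__(self, messages):
        self.messages = list(messages)

    def read(self):
        return list(self.messages)

# Equivalence classes in the spirit of "empty"/"full" and "0, 1, many":
EQUIVALENCE_CLASSES = {
    "empty":    [],
    "single":   ["usb 1-1: device descriptor read error"],
    "many":     ["message %d" % i for i in range(1000)],
    "oversize": ["x" * 8192],
}

@pytest.mark.parametrize("name,messages", EQUIVALENCE_CLASSES.items())
def test_handles_ring_buffer_state(name, messages):
    result = process_messages(MockRingBuffer(messages))   # hypothetical entry point
    assert result is not None                              # replace with real invariants
```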

Basically, what I'm suggesting amounts to a mix of deterministic and nondeterministic inputs, with the deterministic ones being a mix of those you can think of and those you were surprised by.

6

One reasonable thing to do is to seed the random number generator with a constant value for the tests, so that you get deterministic behavior.
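
For example (a minimal sketch; `run_one_generation()` is a hypothetical entry point into your system):

```python
import random

def test_generation_is_deterministic_with_fixed_seed():
    random.seed(1234)                  # fix the standard-library PRNG
    first = run_one_generation()       # hypothetical entry point into the system
    random.seed(1234)                  # reset to exactly the same state
    second = run_one_generation()
    assert first == second             # same seed, same behavior
```

Note that every PRNG the code touches (e.g. NumPy's global generator, if you use it) has to be seeded the same way, or the runs will still diverge.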

Dima
2

I think statistical testing is the only way. Just as random number generators are "tested" for randomness with statistical tests, algorithms that rely on random behavior need to be tested the same way.

Simply run the algorithm multiple times, with either the same or different inputs, and compare the results to each other. The problem with this approach is the massive increase in computational time required to finish the testing.
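
As a sketch of what such a statistical test can look like (the `mutate` operator here is invented just to have something measurable):

```python
import random

def mutate(bits, p=0.1, rng=random):
    """Stand-in for the randomized operation under test:
    flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def test_mutation_rate_statistically(runs=10_000, length=64, p=0.1):
    rng = random.Random(0)             # seeded so the *test itself* is repeatable
    flips = 0
    for _ in range(runs):
        flips += sum(mutate([0] * length, p, rng))
    observed = flips / (runs * length)
    # Normal approximation: the observed flip rate should lie within a few
    # standard errors of p over runs * length Bernoulli trials.
    stderr = (p * (1 - p) / (runs * length)) ** 0.5
    assert abs(observed - p) < 5 * stderr, (observed, p)
```

The cost is exactly the one mentioned above: thousands of runs behind a single assertion.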

Euphoric
  • Not necessarily, because you can choose a small "spanning" set of inputs and run multiple times on them; the number of inputs required to ascertain reliability may be smaller. This "spanning" set should enter every branch of the code, initialize all objects, etc. – Daniel Moskovich Dec 04 '19 at 14:04
2

I'm not a specialist in this domain, but there is scientific literature on stochastic program testing.

If you cannot easily create test classes, a statistical test can be used, as @Euphoric said. Borning et al. compare a traditional approach with a statistical one. A generalisation of the statistical tests suggested by @Euphoric could be the one discussed by Whittaker. He suggested creating a stochastic model of the desired (in your case, stochastic) behavior and then generating specific test cases from this model (see his dedicated paper).
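
As a very rough sketch of Whittaker's idea (in Python; the states and probabilities below are invented for illustration, not taken from the paper):

```python
import random

# Usage model as a Markov chain over abstract states of the system under test.
USAGE_MODEL = {
    "idle":           [("reading_buffer", 0.6), ("idle", 0.4)],
    "reading_buffer": [("handling_error", 0.2), ("processing", 0.8)],
    "handling_error": [("idle", 1.0)],
    "processing":     [("idle", 0.7), ("reading_buffer", 0.3)],
}

def generate_test_sequence(model, start="idle", length=20, rng=random):
    """Random walk through the usage model; each walk is one abstract test case."""
    state, sequence = start, [start]
    for _ in range(length):
        states, weights = zip(*model[state])
        state = rng.choices(states, weights=weights)[0]
        sequence.append(state)
    return sequence

# Each sequence is then mapped onto concrete calls against the system under
# test, and the observed behavior is compared with what the model predicts.
suite = [generate_test_sequence(USAGE_MODEL) for _ in range(100)]
```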

Glorfindel
mgoeminne
  • Thanks! Looks very helpful. For those outside academic institutions, a preprint version of the paper can be pulled from the author's google code repository here: http://team4model.googlecode.com/svn/trunk/resources/paper/Stochastic%20software%20testing.pdf – John Doucette Oct 19 '12 at 11:59