19

I have a number of functions that are pretty close to the mathematical definition of a function. For example, a simplified version of one of these functions may look like:

int foo(int a, int b) {
    return a * b;
}

I'd like to unit test them, but I'm not sure of the best way to approach the tests, as these functions accept very large ranges of parameter values. Every approach I've come up with feels lacking:

  • If I test with a discrete pre-defined subset of possible values, I fear the set I choose may not be representative of the whole, leading to a potentially endless cycle of adding new values to the subset every time an unexpected value is found. This seems to defeat the point of having a unit test in the first place.
  • If I test with a discrete subset of possible values that's randomly determined, I inherit all of the problems of picking the subset manually as well as the real chance that tests will fail randomly every run.
  • If I test with every possible value (e.g. 0 to 2^64), the tests will take forever.

My final thought on how to approach the problem was to avoid testing the implementation of the function entirely, either by:

  • Only testing that the function accepts two integers and returns an integer, or
  • Not testing the function at all

But if I'm not testing the functions at all, whether in practice or in principle, I'm not sure how I can guarantee they remain correct through an eventual refactor.

What is the best way to test functions that have completely deterministic output, but can operate on a large range of possible inputs?

Evan
  • 1,245
  • 1
  • 9
  • 18
  • 3
    I don't see how the first approach (test some reasonable values, add new ones as failures are exposed by other means) defeats the point of having a unit test. Many unit tests are regression tests, and turning real bug reports into test cases is a valid, useful and widespread method of creating regression tests. I don't see much potential for that here, though, if the implementation really is just a straightforward transliteration of a formula that already does what you want. In that case a few manually picked values (try to choose edge cases) make a perfectly fine unit test. –  Feb 11 '15 at 22:48
  • 4
    `I'm not sure how I can guarantee the correctness of the functions in the case of an eventual refactor.` Tests never *guarantee* correctness. That aside, an issue you didn't touch upon is that comparing floating point numbers is a complicated business, so even if you know what the result ought to be, comparing that to the output of the function isn't exactly trivial. And then there's [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability); sometimes a naive implementation of a function results in errors accumulating. – Doval Feb 11 '15 at 22:50
  • @Doval Right, which is why I'm thinking not testing the function at all (or just testing the signature) is not a good idea. – Evan Feb 11 '15 at 22:57
  • @delnan If testing a subset and adding new cases as they're found ad infinitum is how people generally approach this problem, I'm prepared to accept that as the way it is. The purpose of my question is, after all, to double-check that with other programmers because it just seems a bit icky to me. – Evan Feb 11 '15 at 22:59
  • 1
    @Evan Testing can never rule out bugs, only find them. Generally this process does not continue ad infinitum, though, since you keep the accumulated tests around, and hence (should) never re-introduce any of the bugs already discovered, thus steadily improving reliability. That's what *regression* tests are all about. There's really no reasonable way to test *all* inputs. Testing methodology is about getting the most bang for the buck, and a bug which has already occurred in the wild is hard to beat at that: you already know that the problem can occur, and the test case is practically free. –  Feb 11 '15 at 23:08
  • @delnan Is there a reason why you're not providing your thoughts on this as an answer so that others can vote on it and I can accept it if need be? – Evan Feb 11 '15 at 23:14
  • Since you are dealing with Mathematical functions, have you considered *proving* them correct? – Giorgio Feb 19 '15 at 21:47

5 Answers

19

Create one or two typical test cases, and then focus your remaining testing on boundary conditions.

For example, how does your function behave with zeroes? with ones? with int.MaxValue? What are the largest positive or negative numbers that can be used in the function before it overflows or underflows? What happens when an overflow or underflow occurs?

And so forth.
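For concreteness, here is a minimal sketch of that approach in Haskell, with foo an assumed port of the question's function; the maxBound expectation is likewise an assumption, since you'd first have to decide what the desired overflow behaviour is:

import Control.Monad (unless)

-- assumed Haskell port of the question's foo
foo :: Int -> Int -> Int
foo a b = a * b

-- fail loudly if a case doesn't produce the expected result
check :: String -> Int -> Int -> IO ()
check name actual expected =
  unless (actual == expected) $
    error (name ++ ": got " ++ show actual ++ ", expected " ++ show expected)

main :: IO ()
main = do
  -- one or two typical cases
  check "typical" (foo 3 4) 12
  -- then the boundaries
  check "zero" (foo 0 42) 0
  check "one" (foo 1 42) 42
  check "negative" (foo (-1) 42) (-42)
  check "maxBound" (foo maxBound 1) maxBound  -- assumed: multiplying by 1 never overflows
  putStrLn "all boundary checks passed"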

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • 1
    My thoughts exactly: have a few "good" tests to validate your typical inputs produce expected results. Then try your hardest to _break_ your code. Boundary values, invalid inputs (see if it throws an exception, maybe), NaN, +/- infinity, etc. –  Feb 11 '15 at 23:35
12

It's very important that tests fail or pass reproducibly, so forget about any purely random input approach. Say you're testing the sin(x) function: you know its result is always in the range [-1, 1], and you assert that for 100 random numbers. Your assertion is mathematically right, but some dirty, fast sin(x) implementation may produce a 1.00000000000001, and your test will then fail on roughly every thousandth run.

Try to find good input parameters:

  • values producing nice round results
  • inflection points
  • relative and absolute boundaries

Parametrized functions sometimes keep those properties when only a specific parameter changes (for example, the zero crossings of a function with a scaling parameter), so test some odd parameter values as well.

And reduce boilerplate code with unit-test data providers, if your framework offers them; a sketch of the idea follows.
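For example, a data-provider-style test might look like this sketch in Haskell (foo is an assumed port of the question's function): the input/expected-output rows live in one table, and the assertion logic is written once.

-- assumed port of the question's foo
foo :: Int -> Int -> Int
foo a b = a * b

-- one row per case: (a, b, expected result)
cases :: [(Int, Int, Int)]
cases =
  [ (3, 4, 12)   -- nice round result
  , (0, 5, 0)    -- absolute boundary: zero
  , (1, 7, 7)    -- identity
  , (-2, -3, 6)  -- sign handling
  ]

main :: IO ()
main = mapM_ run cases
  where
    run (a, b, expected)
      | foo a b == expected = putStrLn ("ok: foo " ++ show (a, b))
      | otherwise           = error ("failed on " ++ show (a, b))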

Aitch
  • 709
  • 4
  • 13
  • Thanks: both Robert Harvey's answer and yours generally suggest the same thing, but I'm accepting yours because you more directly addressed some of the concerns I had mentioned. – Evan Feb 12 '15 at 01:02
  • You can use random values by remembering the seed and using it to reset your random generator if the test fails. Just save the current seed of your generator when the test is first created and reset the generator with the saved seed when the test is set up. – Leandro Caniglia Feb 12 '15 at 17:58
Yeah, I know what you mean. Normally you'd derive a seed for good pseudo-random numbers from the best random value you can get, or at least from some time-dependent input, so that seeds are unique. For testing, you'd always use the same seed to get exactly the same set of numbers. The point is: why would you expect those numbers to test something better than a user-defined set? This approach also implies that 101 numbers give better coverage than 100, so you end up back at testing all possible parameters. – Aitch Feb 12 '15 at 20:58
  • 1
    You don't necessarily need the seed. You can also output the randomly generated inputs in the error message. If the test fails, you create a new reproducible test with the inputs of the failing test. This method has proven very effective for me. – Reinstate Monica Dec 29 '16 at 16:33
12

Mathematical functions are especially amenable to property-based testing.

With property-based testing, you declaratively define high-level invariants that your API must satisfy. The test framework generates a large number of test cases and tries to find a counterexample to your invariants.

Let's write some properties for your example - a function * which multiplies two numbers. Property-based testing originated in the functional programming community, so I'm going to use QuickCheck (a Haskell-based tool), but similar test frameworks are now available for most imperative languages.

We define properties by writing functions which take arbitrary values as arguments. For each of these functions, QuickCheck will randomly generate 100 combinations of arguments and check that the property holds for each of them.

prop_commutative x y = (x * y) == (y * x)

prop_associative x y z = (x * y) * z == x * (y * z)

prop_distributive x y z = x * (y + z) == (x * y) + (x * z)

prop_identity x = (1 * x) == x

prop_zero x = (0 * x) == 0

Our tests look like mathematical definitions! They say "For all x and y, x * y equals y * x" and so on. If, say, you were developing a set of trigonometric functions, you could write a similar set of properties testing things like the double-angle formulae. Property-based testing really comes into its own for mathematical programming.
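For instance, a double-angle property for a hypothetical trig library might look like the following sketch. Note the tolerance: floating-point results are only approximately equal, so comparing them with (==) would be too strict.

-- sin(2x) should equal 2 sin(x) cos(x), up to floating-point error
prop_doubleAngle :: Double -> Bool
prop_doubleAngle x = abs (sin (2 * x) - 2 * sin x * cos x) < 1e-9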

The output of running the multiplication properties looks like this:

=== prop_commutative ===
+++ OK, passed 100 tests.

=== prop_associative ===
+++ OK, passed 100 tests.

=== prop_distributive ===
+++ OK, passed 100 tests.

=== prop_identity ===
+++ OK, passed 100 tests.

=== prop_zero ===
+++ OK, passed 100 tests.

When a property is untrue, QuickCheck finds a counter-example. This property doesn't hold for numbers less than 1:

prop_multiplyingMakesItBigger x y = (x * y) > x

QuickCheck spots our mistake and prints out the values of x and y which disproved our property:

=== prop_multiplyingMakesItBigger ===
*** Failed! Falsifiable (after 1 test):  
0
0

QuickCheck tests your properties with values you hadn't anticipated, which helps you find bugs.

You can also use property-based testing to test against a model. If you have a naïve implementation of a function which you know is correct (but which is perhaps too slow), you can use it as a model for regression-testing an improved version. Here's an example where we check that * is equivalent to repeated addition:

prop_repeatedAddingModel (NonNegative (Small x)) y = (x * y) == (foldr (+) 0 (replicate x y))

Here I'm also demonstrating how to use QuickCheck to constrain test values. Our model doesn't work for negative x, and is prohibitively slow for large x, so we tell QuickCheck that we want x to be small and non-negative.

Real World Haskell has a great chapter on QuickCheck, with lots of examples.

Benjamin Hodgson
  • 4,488
  • 2
  • 25
  • 34
  • Property-based testing libraries exist for Python ([Hypothesis](https://hypothesis.readthedocs.io/)) and JavaScript ([FastCheck](https://github.com/dubzzz/fast-check)) too. – Ninjakannon Oct 05 '22 at 21:17
2

Perfect is definitely the enemy of good when it comes to testing. Testing is rarely easy. You have to really understand what your function/formula is supposed to do. Even something as simple as multiplying two numbers has several properties we can check.

You may find that some of them hold up only if you ignore certain edge cases.

// False assumption: multiplying two numbers gives a result greater
// than their sum
foo(a, b) > a + b
// fails on (2, 2): 4 > 4 is false
// Weakened assumption; note it still only holds when both a and b are >= 2
foo(a, b) >= a + b

// A negative times a negative is a positive
isneg(a) and isneg(b) implies ispos(foo(a, b))

// Test for zero
foo(a, 0) = 0
// Test for one (multiplicative identity)
foo(a, 1) = a
// Commutative property
foo(a, b) = foo(b, a)

// Associative property
foo(foo(a, b), c) = foo(a, foo(b, c))

Being able to do this means you don't have to test every single parameter combination.

These kinds of things get caught manually in the business world all the time. Anyone with enough domain knowledge can help formulate these assumptions. I may not know the quarterly report figures, but it's a safe bet they will never exceed the year-to-date figures.

jscs
  • 828
  • 9
  • 17
JeffO
  • 36,816
  • 2
  • 57
  • 124
2

You mention the dilemma between testing a fixed set of cases (which might just happen to "miss" the values your function fails on) and testing many randomly generated cases (which can pass one run and fail the next, and is slow to boot).

Both of these techniques are useful, and they are not mutually exclusive. It can be useful to have one (fast) suite which uses a fixed set of cases, and another (slow) suite which generates many, many random cases and 1) checks whether the outputs obey certain invariants, and 2) watches for failed assertions or uncaught exceptions.

Run the fast suite on every commit. Run the slow (randomized, generative) suite as often as you want to, perhaps before each release. If the slow suite discovers an input which "breaks" your function, add that input as a new test case in the fast suite.

When the randomized suite starts, it should generate a new random seed and print it to the console. It should also provide a way to manually set the seed, for cases where you want to repeat a previous run exactly.
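Here is a minimal sketch of that seed discipline in Haskell, using System.Random; the identity invariant and foo are stand-ins for whatever your slow suite actually checks:

import System.Environment (getArgs)
import System.Random (mkStdGen, randomIO, randoms)

-- stand-in for the function under test
foo :: Int -> Int -> Int
foo a b = a * b

main :: IO ()
main = do
  args <- getArgs
  -- fresh seed by default; pass a seed on the command line to replay a run
  seed <- case args of
            (s : _) -> return (read s)
            []      -> randomIO
  putStrLn ("random suite seed: " ++ show seed)
  let inputs = take 100000 (randoms (mkStdGen seed)) :: [Int]
  mapM_ checkIdentity inputs
  putStrLn "randomized suite passed"
  where
    -- example invariant: multiplying by one is the identity
    checkIdentity x
      | foo x 1 == x = return ()
      | otherwise    = error ("invariant failed for input " ++ show x)

If a run fails, the printed seed reproduces it exactly, and the offending input can then be promoted to a fixed case in the fast suite.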

Alex D
  • 1,308
  • 9
  • 14