64

Having worked on complex solutions that had unit tests and integration tests in the CI/CD pipeline, I recall having a tough time with tests that failed randomly (either because random values were being injected or because of the async nature of the process under test, which from time to time resulted in some weird race condition). Having this random behavior in the CI pipeline was not a good experience: we could never say for sure whether the change a developer committed was really causing the build failure.

I was recently introduced to AutoFixture, which helps with the creation of tests by randomly generating input values. Surprisingly, I was the only one who did not feel it was a great idea to introduce it into all the tests in our CI pipeline.

I mean, I understand fuzz testing, monkey testing, etc., but I believe these belong outside the CI/CD pipeline, which is the place where I want to ensure my business requirements are being met by sturdy, solid, to-the-point tests. Non-deterministic tests like these (and load testing, black-box testing, penetration testing, etc.) should be run outside the build pipeline, or at least should not be directly linked to code changes.

If these side tests ever find a behavior that is not expected, a fix should be created and a new concrete and repeatable test case should be added to avoid going back to the previous state.

Am I missing something?

  • 5
    it seems `AutoFixture` at least supports property-based testing, which is what it looks like you're looking for. [`QuickCheck`](https://hackage.haskell.org/package/QuickCheck) popularized this approach, which allows you to test classes of inputs rather than using random inputs. – Nathaniel Ford Jun 23 '21 at 00:19
  • 2
    In digital design we do constrained random verification all the time. It’s unlikely that a random test is going to fail after you’ve run it (hundred)thousands of times and it’s too much effort to write good, dedicated non-random tests just for CI. If a random test does fail in CI: Re-run with the same seed and see if it’s a real bug. If it’s not: fix the test bench or randomness constraints. – Michael Jun 23 '21 at 05:41
  • @DocBrown, thanks! I have not found that before. I believe the core of the question is indeed the same. Although I can see value in both deterministic and non-deterministic, having a non deterministic test linked to the CI pipeline is my main concern. – Vinicius Scheidegger Jun 23 '21 at 11:22
  • 2
    In _tests_: yes, have some random tests (that record the seed as part of the results). In _unit tests_: no, but any failures from the random tests should be used to construct new unit tests that exercise the discovered failure condition. – Toby Speight Jun 24 '21 at 13:37
  • If the test failed randomly because of the random data, that meant the test did have a need to discriminate on the values. So in that sense it was helpful. This can happen if it is hard to pick boundary conditions or if you want to stress-test concurrency effects. In both cases however it is good to have that only as additional test runs with value logging or pre-specified seeds to make it replayable. – eckes Jun 24 '21 at 22:22

5 Answers

78

Yes, I agree that randomness shouldn't be part of a testing suite. What you want is to mock any real randomness, to create deterministic tests.

Even if you genuinely need bulk random data, more than you can be bothered generating by hand, you should generate it randomly once, and then use that (now set in stone) data as the "random" input for your tests. The source of the data may have been random but because the same data is reused for each run, the test is deterministic.
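
As a minimal sketch of that idea (assuming xUnit; `Total` and the seed value are arbitrary stand-ins, not anything from the question): the input is produced by a pseudo-random generator, but because the seed is hard-coded, every run sees exactly the same data.

```csharp
using System;
using System.Linq;
using Xunit;

public class FrozenRandomDataTests
{
    // Hypothetical system under test: totals a list of prices.
    private static decimal Total(decimal[] prices) => prices.Sum();

    [Fact]
    public void Total_matches_an_independently_computed_sum_for_bulk_data()
    {
        // Fixed seed: the generated data is identical on every run, so the
        // test is deterministic even though the data was produced "randomly".
        var rng = new Random(20210623);
        var prices = Enumerable.Range(0, 1000)
            .Select(_ => rng.Next(1, 10000) / 100m)
            .ToArray();

        // Expected value computed independently of the method under test.
        decimal expected = 0m;
        foreach (var price in prices)
            expected += price;

        Assert.Equal(expected, Total(prices));
    }
}
```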

However, what I've said so far applies to the tests you knowingly wrote and want to run. Fuzz testing, and therefore a tool like AutoFixture, has a separate value: bug discovery. Here, randomization is actually desirable because it can help you find edge cases that you hadn't even anticipated.

This does not replace deterministic testing, it adds an additional layer to your test suite.

Some bugs are discovered through serendipity rather than by intentional design. AutoFixture can help cycle through a wide range of values for a given input, in order to find edge cases that you likely wouldn't have stumbled on with a limited set of hand-crafted test data.

If and when fuzz tests discover a bug, you should use that as the inspiration to write a new deterministic test to now account for this new edge case.

In short, think of fuzz tests as a dedicated QA engineer who comes up with the craziest inputs to stress test your code, instead of just testing with expected or sensible data.

Flater
  • 10
    The secret is to control the random, not be at its mercy. If you let it go wild be sure to capture it. Because the test is perfectly deterministic and repeatable if you know exactly what the random was that caused the behavior you observed. Don't make it hard to match the record of the random with the behavior. – candied_orange Jun 22 '21 at 19:04
  • 28
    Also, rather than store all the random data, you can use a deterministic pseudo random number generator and only store the seed. But obviously in such a way that test ordering doesn't influence the values (so reset the seed every test). – Gregory Currie Jun 23 '21 at 02:22
  • 2
    @GregoryCurrie: Yes. And it's a good idea to manually change the seed and see if the tests still pass. I've seen tests which were suspiciously tailored to only accept a specific list of random values. And also check if the tests really test something... Tests don't bring much if they can never fail. – Eric Duminil Jun 23 '21 at 09:14
  • 2
    https://xkcd.com/221/ – Eric Duminil Jun 23 '21 at 11:58
  • 2
    The problem with fuzz testing is that if your total inputs are more than about 32 bits, you can't exhaustively test every possible input. If there are corner cases right at the maximum input value or something, random is unlikely to generate that. (e.g. the last time I was playing with a bigint multiply that used 2 chunks, random products didn't care whether I propagated carry-out from the lo x hi cross products into the hi x hi product, or something like that. I had to craft that corner-case input by hand. Although I think I was using bash `(($RANDOM * $RANDOM))` which can't make UINT_MAX) – Peter Cordes Jun 23 '21 at 17:24
  • @PeterCordes `$RANDOM` is an extremely crude way of obtaining random values. For randomised testing to be successful, you should use application-aware generators. QuickCheck is excellent at that, in part benefitting from the fact that Haskell types tend to express intent more precisely than would be feasible in other languages. – leftaroundabout Jun 23 '21 at 23:29
  • @leftaroundabout: Indeed, that was just a one-off test for [an SO answer I was writing](https://stackoverflow.com/questions/33217920/32-bit-extended-multiplication-via-stack/67922154#67922154) (a 2x2 limb integer multiply in asm, very simple and I already understood the math, just wanted to generate some examples for the answer and manually check the corner cases), not for production code. The randomness was just an afterthought using bash to drive the command-line instead of writing more C. – Peter Cordes Jun 23 '21 at 23:57
  • 3
    @PeterCordes Regardless of random testing or not, a few deterministic tests like "all 1 bits" or "all 0 bits" are obvious things to use for a big-int arithmetic package. IMO. – alephzero Jun 24 '21 at 03:56
  • @alephzero: Yes, that was what I did first, like I said. And BTW, what inspired me to think about the problem in the first place (and to do actual testing) was a corner-case bug on a similar answer for the same algorithm, pointed out in 2013, a year after it was posted, and again in another answer. [multiply two 32-bit numbers to get a 64-bit number, on a 8086 (32x32 => 64-bit with 16-bit multiplies)](https://stackoverflow.com/posts/comments/27980378) (I edited that answer to fix the bug, as well as posting my optimized version on a probable duplicate of the question.) – Peter Cordes Jun 24 '21 at 03:56
  • 1
    @PeterCordes: I get your point but my answer wasn't rooted in the assumption that fuzz testing catches _all_ remaining bugs. It's just an extra check, with an extra chance to catch any leftover bugs, not unlike the QA engineer I compared it to. To be honest I rarely fuzz test, unless I already have reason to believe I'm working with old systems or homebrew external services which may have validation quirks or shoddy build quality. – Flater Jun 24 '21 at 08:00
  • 1
    Good, we agree on that, then. Your answer didn't explicitly say so, so I commented. Manual corner-case tests are important. (Unless you really can test the entire problem space against a known-good reference implementation, like for a function of a single 32-bit input, such as `float` `log` or `sin`. Bruce Dawson: [There are Only Four Billion Floats–So Test Them All!](https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/).) – Peter Cordes Jun 24 '21 at 14:12
  • 1
    @PeterCordes excellent blog post, thanks. I've never thought about that. – Eric Duminil Jun 27 '21 at 17:25
51

I've worked on projects that use anywhere from no randomness to extensive randomness in their tests, and I'm generally in favour of it.

The most important thing to remember is that the randomness must be repeatable. In the current project we use pytest-randomly with a seed based on the pipeline run ID in CI, so it's trivial to repeat a failing run identically, even though each pipeline run is different. This may be a showstopper if you want to run tests in parallel, because I could not find a (pytest) framework which will split tests into parallel runs reproducibly.

The randomness is used in two ways:

First, tests are run in random order. This virtually guarantees that any test interdependencies will eventually be discovered, and avoids those situations where a test fails when running the whole suite but passes when run on its own. When that happens, it can take much longer to find and fix the actual issue: you're effectively debugging two things at once, you can't be sure whether the failure is caused by a bad test or by bad production code, and each test run to check a potential fix can take a long time.

Second, we use generator functions for any inputs which are irrelevant to the test result. Basically we have a bunch of functions like any_file_contents (returns bytes), any_past_datetime (datetime.datetime), any_batch_job_status (enum member) and any_error_message (str), and we use them to provide any required input which should not affect the test results. This can surface some interesting issues with both tests and production code, such as inputs being relevant when you thought they weren't, data not being escaped, escaped in the wrong way, or double escaped, and even ten-year-old core library bugs showing up in third party libraries. It is also a useful signal to whoever reads the test, telling them exactly which inputs are relevant for the result.
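
The helpers in this answer are Python (pytest), but the pattern translates to other stacks. A rough, hypothetical C# analogue might look like the sketch below; `TEST_SEED` is an assumed environment variable that the pipeline would set, and the helper names simply mirror the ones above.

```csharp
using System;

// Hypothetical C# equivalents of the any_* helpers described above: they
// supply valid but arbitrary values for inputs that must not affect the
// result of a test, and all of them draw from one seeded generator.
public static class Any
{
    public static readonly int Seed;
    private static readonly Random Rng;

    static Any()
    {
        // Seed taken from an (assumed) environment variable set by the CI
        // pipeline, e.g. the run ID, with a fixed fallback for local runs.
        var fromEnv = Environment.GetEnvironmentVariable("TEST_SEED");
        Seed = int.TryParse(fromEnv, out var parsed) ? parsed : 1234;
        Rng = new Random(Seed);
    }

    public static string AnyErrorMessage() =>
        $"error-{Rng.Next(1, 1_000_000)}";

    public static DateTime AnyPastDateTime() =>
        // Anchored to a fixed date so the value depends only on the seed.
        new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc)
            .AddSeconds(-Rng.Next(1, 100_000_000));

    public static byte[] AnyFileContents()
    {
        var bytes = new byte[Rng.Next(1, 4096)];
        Rng.NextBytes(bytes);
        return bytes;
    }
}
```

Logging `Any.Seed` at the start of a run is what makes a CI failure reproducible locally.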

While this approach is not a replacement for fuzz testing, it is a much cheaper way to achieve what I would expect are similar results. It's not going to be sufficient if your software requires much more extensive fuzzing, such as a parsing library, but it should be a simple way to improve run-of-the-mill tests.

I don't have the numbers to back this up, but I believe this is the single most reliable test suite (in terms of false positives or negatives) I've worked on.

l0b0
  • 10
    I don't know why someone downvoted this. Thanks for the voice of experience, and good point about all randomness needing to be reproducible—it's unfortunate that so many answers, both here and at the linked question, raise a point like "you wouldn't be able to reproduce it" with no sign of having considered the fact that reproducible pseudorandom functions exist, and are in fact what are present in most standard libraries etc. – ShreevatsaR Jun 23 '21 at 07:26
  • 3
    +1 - i do it the same most of the time. Inputs that are not relevant (or maybe only a subset of possible values should behave equally) i'll do randomly. E.g. `randomFirstname()`, `anyOf()`, ... I even had tests succeed when they shouldn't because someone used the same ID for multiple mocked input objects. And with this setup i still had flaky tests - but never due to the random fill-ins. – marstato Jun 23 '21 at 07:47
  • 1
    +1. This covered the key point : repeatability. When a CI test fails, the first thing to do is dig its seed out of the logs and re-run the previous commit with that seed. – user_1818839 Jun 23 '21 at 09:57
  • 8
    And of course, once you have a failing test, you should create a new test with the specific data, or property of the data, that generated the failure, and make that part of the CI flow. – Gregory Currie Jun 23 '21 at 09:58
  • @ShreevatsaR to be the devil's advocate, perhaps people disagree that it brings the least amount of false positives? Technically having a random string means the test might work by accident, or that an error could occur one in a million times (so even if a test fails, you might have problem reproducing it and it could pass next time) – Pierre Arlaud Jun 23 '21 at 14:35
  • This. A common test we have here is "one million random key presses." If the test succeeds, there is little point in repeating the test with the exact same set, but if it fails, we want to know which they were, and the bug is closed by verifying that the software now passes the test with exactly this set. The same test is also applied to stable releases that are still supported to check if they have the same bug. – Simon Richter Jun 23 '21 at 15:15
  • 2
    @PierreArlaud The point of this answer is that you should make the "random" test perfectly reproducible. That is, it's ok to use an external input like pipeline id or time-of-day or whatever as the seed for whatever randomness is in your tests, as long as you have a good way of rerunning the tests with the same seed, to get the same result. Then if a test fails, you won't have trouble reproducing it. (I will note though that your team must have a culture of taking each failure seriously and not just retrying flaky tests to pass!) – ShreevatsaR Jun 24 '21 at 00:08
  • +1. If you are going to land a spacecraft on the Moon or Mars, or make a spacecraft rendezvous with another, you are going to be building a Kalman filter. A unit test on a Kalman filter is just about worthless. The only way to test a Kalman filter is to throw lots of pseudo randomness at it and see how it performs over time. The goal isn't even to pass all of the Monte Carlo runs. Passing 99.73% of the test runs is good enough. The filter will not come close to that passing rate initially. It helps to be able to exactly reproduce a run that failed, so pseudo-randomness. – David Hammen Jun 24 '21 at 17:40
  • 1
    @PierreArlaud If you've already covered all of your non-random tests that you were going to, then adding additional random tests can't create any additional false positives. (If it falsely passes the non-random + random tests, then it would have also falsely pass just the non-random tests....) – user3067860 Jun 24 '21 at 18:14
  • 2
    It's 2023 - I randomly generate the seed value and output it in CI before each test suite gets run - I don't tie it to any pipeline ids or other heuristic, I generate a random one for every single test suite every time a build happens, and I log it so I can see it. If a test suite fails, I can just copy paste the seed from CI into my local and reproduce the failure. In the rarest of cases it's a real failure, which is awesome. If it's a non application-breaking failure, I've still learned that I need to improve my test suite. I am a very very strong advocate of random mock data in tests – Adam Jenkins Jun 08 '23 at 22:24
12

No. Random values in unit tests make them non-repeatable. As soon as a test passes on one run and fails on the next without any code change, people lose confidence in the suite, undermining its value. Printing a reproduction script is not enough.

That said, randomized edge-case testing and fuzz testing can provide value. They're just not unit tests at that point. And personally, I like linking them to CI even if they don't necessarily block a deployment or run on every commit.

Telastyn
  • 9
    Only non repeatable if the unit tests are stupid. You set a hard coded seed value at the beginning of each test and the random sequence is perfectly reproducible. – gnasher729 Jun 23 '21 at 07:12
  • 5
    @gnasher729 It isn't random then is it? – Blaž Mrak Jun 23 '21 at 07:37
  • 6
    @BlažMrak: That's why they are called *pseudorandom* number generator. – Eric Duminil Jun 23 '21 at 09:15
  • @EricDuminil I know what they are called, my point is that "random sequence" is not "perfectly reproducible"... by definition. – Blaž Mrak Jun 23 '21 at 09:37
  • @BlažMrak: What's the alternative that you'd suggest? Heisenbugs & Heisentests, which are both green and red? – Eric Duminil Jun 23 '21 at 09:47
  • @EricDuminil I never said that this is not a valid approach and that there needs to be an alternative. What I said was that gnasher's comment was basically irrelevant to the answer. The point of the author was that you should not do tests that are not repeatable. How you achieve that is irrelevant: might be a generator with a seed, an array of numbers or hardcoding each number directly where it is needed. I hope you notice that all of the above are the same as far as the test is concerned. BUT using a generator is "easier" for bulk data. That is the only difference between the three. – Blaž Mrak Jun 23 '21 at 10:05
  • 8
    @BlažMrak gnasher729's comment is effectively a frame challenge. I assume that none of us are thinking that anyone is using a hardware RNG (or anything that creates actual random numbers). Therefore we *are* (even in the premise of the question) talking about PRNGs. Gnasher's (quite valid in my opinion) point is that *pseudo*random numbers do not cause unit tests to be unrepeatable *unless* you are using them wrong. Or, to frame it in terms of your first comment, the data was never "random" to begin with, it was merely (by design) changing between each run. – Jon Bentley Jun 23 '21 at 15:19
  • @JonBentley you are reaching really hard with the "but these are not actually truely random numbers". As far as I understand the answer, it is saying that having no control over the generated values (basically having values that change every run) causes tests to not be repeatable -> you will not know why your test failed, which is bad. If you frame it like that you are saying that the data is effectively random... Would "unpredictable" explain better what random means in the programming context, so we can move past the "it isn't really random, but just changes" discussion? – Blaž Mrak Jun 23 '21 at 19:28
  • @JonBentley my point is that if no two runs are meant to be the same, the data is random, otherwise it is not. – Blaž Mrak Jun 23 '21 at 19:30
  • 4
    @BlažMrak And my point is that choosing PRNG data as the input is not done so that "no two runs are the same", but rather so that you can get a variety of input coverage. Following from that, you can achieve that goal (variety of input coverage) *without* "no two runs being the same", simply by using the PRNG correctly. Seeing this as a nitpick between random vs unpredictable is missing the point being raised in these comments. – Jon Bentley Jun 23 '21 at 19:52
  • One example: MATLAB’s RNG default uses 0 for the PRNG seed. It’s effectively random, but if you start up two separate sessions and run the same code each time that uses values sourced from its RNG, you will end up with identical results between runs, even though each run appears random. – fyrepenguin Jun 23 '21 at 22:58
  • @gnasher729 -`printing a reproduction script is not enough`. People won’t even look at the seed if they know “just rerun it” unblocks them. And if the seed is truly hard coded, why not just hardcode the values? – Telastyn Jun 23 '21 at 23:35
  • 1
    Why not just hard-code the values? Because maybe you want a *lot* of test cases, like loop over 10k random values, and copy/pasting a huge initializer into source code, or having a data file you open and read, is a lot bulkier and uglier than just `srand(1234)` / call `rand()`. At least if you can trust that `rand()` is the same across all your test environments. It's not always *totally* trivial to guarantee a repeatable sequence, but it's certainly going to be reproducible on that test machine, in the worst case you do need to dig into it and generate the sequence for use elsewhere. – Peter Cordes Jun 24 '21 at 00:09
  • 1
    @PeterCordes I don't believe rand() is deterministic across multiple machines. – Gregory Currie Jun 24 '21 at 02:36
  • @PeterCordes Nobody uses `rand()` anymore. Or at least, nobody should use `rand()` anymore. There are notoriously bad implementations of `rand()`. In my industry, one would probably get fired as incompetent for using `rand()`. No one will get fired (in my industry) for using Mersenne Twister, even though there are better algorithms. There are industries that demand cryptographically secure random number generators, and one would probably get fired for using Mersenne Twister in that context. – David Hammen Jun 24 '21 at 17:50
  • @DavidHammen: Mea culpa, I knew I shouldn't have tried to save typing by using actual srand/rand as the example. :/ I was mostly thinking of them as placeholders for the reasons you mentioned, but of course I didn't say that, and not literally everyone knows those reasons. Bad examples are dangerous. It could be ok if you know your test platforms use glibc where rand() = random(), but yeah best to use something that's guaranteed to be the same everywhere, even copy/paste a simple LCG or xorshift. RNG quality isn't important here, e.g. a no repeat LCG with period = 2^64-1 is a feature. – Peter Cordes Jun 24 '21 at 18:02
  • @BlažMrak I am testing my "add two numbers" function. Every time I test it, I roll two dice and enter what they show into the input boxes. Then I write down 1) the inputs, and 2) the output, and compare the output against the expected results. If I have a failure, I don't roll more dice to try and repeat it, instead I look at what I wrote down and re-use those same values to test if I fixed it (by writing a fixed test). Are you telling me that either 1) I can't re-enter the same numbers I wrote down, or 2) dice aren't random? – user3067860 Jun 24 '21 at 18:23
  • @Telastyn It's also possible for people to disable all of the tests (no failures) or not even write any (even better, less work AND no failures). That's certainly a thing that happens often with non-random tests... But "developers might not bother to care about stuff" doesn't seem like a real strike against random tests. – user3067860 Jun 24 '21 at 18:25
  • @user3067860 you hardcode the numbers into tests so they will always remain the same. The same test will not run with different numbers each time. We are talking about the randomness from test's perspective. So while your numbers were generated "randomly", your tests will always run the same numbers. Also, it is not that "developers might not bother to care", but you do not know why the test failed once and why it passed all other times -> a flaky test. – Blaž Mrak Jun 24 '21 at 20:35
  • @BlažMrak It does _not_ run with the same numbers every time, I roll two dice completely separately for every time I want to run the test. Why it failed is easy... you look at the results, last time it passed and it says "passed, input 2,3, output 5" and this time it failed and it says "failed, input 4,4, output 9"...go run it locally with 4,4 and see why it turns into 9. – user3067860 Jun 24 '21 at 20:42
  • @user3067860 Good job, you produced a flaky test. If it passes it tells you nothing, because it might fail the next run. If it fails, should you really care? if not it is useless, if you do, why not make a seperate test with the input hardcoded? I'm not saying that you should never write such tests, but they should not be a part of your "regular" test suite, because the intention of the test is not to test that the behavior of the program didn't change, but to explore the input space for inputs that produce bugs. – Blaž Mrak Jun 24 '21 at 21:56
  • @BlažMrak Because I didn't know what input was going to cause it to fail. I already captured the ones I thought would fail in separate tests, but obviously--well, no, I am perfect, it's other people on my team who aren't, right? (Or maybe it's our users, lets blame them for not entering the same data every time, that seems pretty safe.) But of course I care that it failed. If a user reports something failed when they did X but worked when they did Y, do we care? It is "working now", or it "works on my machine"... Well--they might *need* to do X... definitely it is the users' fault. – user3067860 Jun 25 '21 at 11:40
  • ...your non-random unit tests don't guarantee that the behavior of the system hasn't changed since last time the tests ran. They say it hasn't changed *for the test values*. So it's not sufficient to, say, randomly test once and call it a day. You have created tests for 1,2,3,5,6...but someone accidentally introduces a bug only impacting 4... oops... You could run the entire range of all possible inputs with every test cycle, but of course that would be quite..slow...... – user3067860 Jun 25 '21 at 11:43
  • @user3067860 I don't know what we disagree on... You have a deterministic test suite that runs on every commit. The more tests you have in that test suite, the less valuable this flaky test becomes. Saying that my tests don't guarantee that the behavior hasn't changed is a straw man once again. They guarantee that the system still satisfies the requirements that we expected from it. The tests will never be able to guarantee that a new bug was not introduced. Even with your random tests. There is a chance, but a very slim one. – Blaž Mrak Jun 25 '21 at 12:40
  • And yes, sometimes when you develop you do not care that the user entered X doesn't work, because it might be to expensive to implement a proper solution. That is how development works in real world at least when developing for customers with limited budget. It is not that "it works on my machine", but that it brings no additional business value to support all user's inputs. – Blaž Mrak Jun 25 '21 at 12:42
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/126856/discussion-between-user3067860-and-blaz-mrak). – user3067860 Jun 25 '21 at 13:26
  • I have used randomised data not because I wanted lots of test cases, but ONE test with a lot of cases. To check: What does my address book do if I have 10,000 users in it? Or 100,000? Does it crash? Does it practically freeze? Or does it run acceptably fast? – gnasher729 Jun 25 '21 at 14:14
  • @Telastyn: Using a random number generator with a fixed seed, I can easily run a million test cases. If I put all the numbers into the unit test, the whole million, my compiler might not agree with that. – gnasher729 Jun 25 '21 at 14:17
  • What you can also do is use one series of random numbers with a fixed seed, and one where you derive the seed from the build number or the current date, something that you could hard-code if you need to. – gnasher729 Jun 25 '21 at 14:23
6

I would recommend covering "obvious" edge cases with explicit test data inputs, rather than hoping the fuzz testing will catch them. E.g. for a function that operates on arrays, handle empty arrays, single-entry arrays, and arrays with multiple (e.g. 5) items.
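
For instance, a small xUnit sketch (with `CountPositive` as a hypothetical stand-in for the function under test) that pins those shapes down explicitly:

```csharp
using Xunit;

public class ArrayEdgeCaseTests
{
    // Hypothetical function under test.
    private static int CountPositive(int[] values)
    {
        var count = 0;
        foreach (var value in values)
            if (value > 0) count++;
        return count;
    }

    [Theory]
    [InlineData(new int[0], 0)]                  // empty array
    [InlineData(new[] { 7 }, 1)]                 // single entry
    [InlineData(new[] { -1, 0, 2, 3, 9 }, 3)]    // several entries
    public void CountPositive_handles_the_obvious_shapes(int[] input, int expected)
    {
        Assert.Equal(expected, CountPositive(input));
    }
}
```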

This way, your fuzz tests are strictly additive to a baseline level of solid test coverage.

One way to help reduce the pain is to ensure that your CI logs contain enough information to fully reproduce a test case locally.
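
As a hedged sketch of that, assuming xUnit v2: generate a fresh seed each run, but always write it to the test output so a CI failure can be re-run locally with the same seed. The seed source and the test body here are placeholders.

```csharp
using System;
using Xunit;
using Xunit.Abstractions;   // xUnit v2; ITestOutputHelper writes into the test log

public class SeedLoggingTests
{
    private readonly Random _rng;

    public SeedLoggingTests(ITestOutputHelper output)
    {
        // A fresh seed on every run, but it always ends up in the CI output,
        // so a failing run can be repeated locally with the exact same data.
        var seed = Environment.TickCount;
        output.WriteLine($"Random seed: {seed}");
        _rng = new Random(seed);
    }

    [Fact]
    public void Handles_arbitrary_quantities()
    {
        // Placeholder body: a real test would feed _rng-generated values
        // into the system under test and assert on the outcome.
        var quantity = _rng.Next(1, 1000);
        Assert.InRange(quantity, 1, 999);
    }
}
```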

> Having this random behavior in the CI pipeline was not a good experience: we could never say for sure whether the change a developer committed was really causing the build failure.

Think of the flip side: if the fuzz testing weren't there, nothing else would have caught that bug, so you'd have a false green. Sure, it wouldn't disturb your development/shipping experience, but it would disturb production instead.

Alexander
0

To quote AutoFixture:

"...designed to minimize the 'Arrange' phase of your unit tests in order to maximize maintainability. Its primary goal is to allow developers to focus on what is being tested rather than how to setup the test scenario.."

So I can see why you wouldn't want a test such as:

x = random int
actual = SquareRoot(x)
Assert(actual * actual == x)

You would want to explicitly test max int, negative numbers, etc and be sure that the test is repeatable.

However, this isn't what AutoFixture is proposing. They are more interested in tests like:

x = new Customer
x.firstname = ...
x.lastname = ..
x.middlename = ...
x.Address = new Address()
x.Address.Street = ...
....
x.Account = new Account()
...
etc 
repo.Save(x)
actual = repo.Load(x.Id)
Assert(actual == x);

Now you can see that this test is unlikely to fail because of the particular values you assign to the various Customer fields and its sub-objects. That's not really what you are testing.

But! it would save you a lot of typing and unimportant code if you could auto-populate all those fields.
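
As a rough sketch of what that looks like with AutoFixture (assuming the AutoFixture and xUnit packages; `Customer` and `Address` are stand-ins for the types in the pseudocode above):

```csharp
using System;
using AutoFixture;   // newer package namespace; older releases used Ploeh.AutoFixture
using Xunit;

// Stand-ins for the Customer/Address types in the pseudocode above.
public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
}

public class Customer
{
    public Guid Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Address Address { get; set; }
}

public class AutoFixtureSketch
{
    [Fact]
    public void Create_populates_the_whole_object_graph()
    {
        var fixture = new Fixture();

        // One call replaces the long hand-written arrange block: every
        // property, including the nested Address, receives a valid dummy value.
        var customer = fixture.Create<Customer>();

        Assert.NotNull(customer.Address);
        Assert.False(string.IsNullOrEmpty(customer.FirstName));
    }
}
```

As noted in the comments below, the generated strings are GUID-based by default, so the values are valid-but-arbitrary rather than deliberately adversarial.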

Ewan
  • The examples on the Github page are really bad then because they only seem to show how important expected values are generated. The example is basically `Assert.Equal(expectedNumber, sut.Echo(expectedNumber));` where `expectedNumber` is generated. I figured they didn't want to show anything more complex because then they would need to reimplement the SUT's logic in the test. – kapex Jun 23 '21 at 08:54
  • yeah its difficult to see how that would work – Ewan Jun 23 '21 at 09:31
  • So, from your example, if repo.Save limits the number of chars in x.Address, the test may or may not fail, depending on the random value generated. (i.e. if the generated random value is short the test passes, if the value is long the test fails) I see the value of finding such edge cases, but such tests cannot be considered regression tests, they are fuzz tests, and I struggle to believe they should belong in the CI pipeline – Vinicius Scheidegger Jun 23 '21 at 11:00
  • 1
    @ViniciusScheidegger AutoFixture seems to generate UUIDs by default for all stings, so they are all of the same length. The library doesn't seem to be about finding edge case or fuzz testing. The purpose is rather to generate valid dummy values. For example if the customer name can't be null but otherwise doesn't affect the test, then instead of hardcoding a dummy name, you let AutoFixture generate a value. – kapex Jun 23 '21 at 13:03
  • @ViniciusScheidegger yes, I'm using a simple example here; you would have to assume that the generated fields fall within the expected range. But note this question, which tells us that AutoFixture will use data annotations to pick the correct length. It's not trying to break your test, it's trying to save you typing: https://stackoverflow.com/questions/10125199/autofixture-configure-fixture-to-limit-string-generation-length – Ewan Jun 23 '21 at 14:43
  • I understand the primary goal of the library is not finding edge cases, but helping filling out dummy values. Doing so with random values is the key. I also get the usage of fixed length UUID and the possibility of limiting the generated values, this only limits the amount of the impact (in the test writer's idea, in a good way). The outcome is still non-deterministic. My point is that CI tests should be synonym for regression tests and use only deterministic "limit" values to find the expected behaviors (pass, fail, throw exceptions, etc). Introducing any randomness affects this principle. – Vinicius Scheidegger Jun 24 '21 at 22:50
  • @ViniciusScheidegger yes I agree in principle. but you ask "am i missing something?". Yes you are. in practice the tests are deterministic and the library has made your tests shorter – Ewan Jun 25 '21 at 11:58