Flakiness comes from randomness, and that randomness can have several different sources.
But let's not focus on that; instead, here is a guide I learned from an awesome engineer, an SO user I need to credit: https://stackoverflow.com/users/273699/dnnx.
The first thing to do when you spot a failing spec: note down the seed and all the files that were executed in the suite. This can be your whole project, or, in huge codebases where the suite is split into parallel parts, just a subset of all the tests. Below I'll assume spec/* is the set and 9000 is the seed.
The simplest thing to do is to run the failing spec with this seed:
rspec spec/your_failing_spec.rb:33 --seed 9000 # assuming line 33 is the failing assertion
If it fails, you are probably...
Using rand to initialize the state
e.g. let(:something) { create(:foo, status: FOO::VALID_STATUSES.sample) }. Then you deal with it accordingly.
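How you deal with it depends on the spec. As a rough sketch, reusing the hypothetical :foo factory and FOO::VALID_STATUSES from above (the expectation is made up), you can either pin the value or cover every value explicitly:

    # Option A: pin the status so the setup is deterministic.
    let(:something) { create(:foo, status: FOO::VALID_STATUSES.first) }

    # Option B: if the behaviour must hold for every status, cover them all.
    FOO::VALID_STATUSES.each do |status|
      context "when status is #{status}" do
        let(:something) { create(:foo, status: status) }

        it 'behaves the same regardless of status' do
          expect(something).to be_valid
        end
      end
    end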
If you're not lucky, no worries yet. Run the whole set with the --bisect flag:
rspec spec/* --seed 9000 --bisect
If you're lucky and this command gives you a minimal command to reproduce the failure, you probably have...
State leaking from other specs
For example, I recently had a search spec that indexed data into Elasticsearch and didn't properly clean up after itself, causing another search spec to fail whenever the examples ran in the right order.
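The fix is to clean up the shared resource after every example that touches it. A minimal sketch, assuming the elasticsearch-ruby client, a made-up test index name foos_test, and search specs tagged with :search metadata:

    RSpec.configure do |config|
      config.after(:each, :search) do
        # Drop the test index so no documents leak into the next example.
        Elasticsearch::Client.new.indices.delete(index: 'foos_test', ignore: 404)
      end
    end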
Sometimes --bisect hangs indefinitely
In those desperate times, do it manually. Remove half of the files from your set and run again.
- If it still fails: remove half again and repeat.
- Otherwise, bring back what you've removed, remove the other half, and run again.
- Rinse and repeat until the set is small enough for you to analyze with your brain. You can stop at any point and try re-running the automatic rspec --bisect on the smaller set; sometimes that's enough to unblock it.
Sometimes, for whatever reason, you don't have the SEED
What I do is run a script like this after I finish work:
while rspec spec/*; do :; done
If you're lucky, after some time you'll get a failing run together with its seed.
There are too many reasons why one spec can make another fail to enumerate them all here, but usually the preceding spec leaves something behind in the app state. It could be junk in the DB (use DatabaseCleaner), a monkey-patched class that was never reverted, etc.
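For the DB part, a typical DatabaseCleaner setup looks roughly like this (a sketch assuming ActiveRecord and the database_cleaner gem; pick strategies that fit your stack):

    RSpec.configure do |config|
      config.before(:suite) do
        # Start the run from an empty database, then rely on transactions.
        DatabaseCleaner.clean_with(:truncation)
        DatabaseCleaner.strategy = :transaction
      end

      config.around(:each) do |example|
        # Roll back whatever the example wrote so nothing leaks to the next one.
        DatabaseCleaner.cleaning { example.run }
      end
    end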
Bisect says the failure is not order-dependent.
(or the manual bisect got you nowhere)
That's a tough one. I don't remember if I even solved such a case.
It might not be the seed that's the source of the random failures. Are you running specs in parallel? If so - are they 100% separated?
If they use the same DB, they're not separated. If they store tmp/cache files in the same directories, they might be interfering with each other.
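As an illustration, assuming the parallel_tests gem (which sets TEST_ENV_NUMBER per worker) and a Rails app, you can give each worker its own scratch directory; SPEC_SCRATCH_DIR is a made-up variable that your code would have to read:

    require 'fileutils'

    RSpec.configure do |config|
      config.before(:suite) do
        # parallel_tests sets TEST_ENV_NUMBER to '', '2', '3', ... per worker.
        worker  = ENV['TEST_ENV_NUMBER'].to_s
        scratch = Rails.root.join('tmp', "spec_scratch#{worker}")
        FileUtils.mkdir_p(scratch)
        ENV['SPEC_SCRATCH_DIR'] = scratch.to_s # made up: point temp-file code here
      end
    end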
Consider timezones
We had a lot of TZ-related flakiness. Specs would fail on Fridays because the code under test scheduled something for the next working day, which on Fridays is Monday rather than tomorrow.
Use something like timecop and freeze time in those examples.
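For instance, a sketch using the timecop gem, freezing on a known Friday (next_working_day is a hypothetical method standing in for the real scheduling code):

    it 'schedules for the next working day' do
      Timecop.freeze(Time.utc(2021, 10, 1, 12, 0)) do # 2021-10-01 was a Friday
        # With time frozen, the example behaves the same on any day of the week.
        expect(described_class.next_working_day).to eq(Date.new(2021, 10, 4)) # Monday
      end
    end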
That's just a basic guide, and I might have forgotten something, but it should be enough to get you started and weed out most of the flakiness. You'll have to be creative with the rest.
If flakiness is a big or growing problem in your project
Coach your team on how to hunt them down, and make killing a flaky spec a celebrated achievement. When people learn how to fix them, they also learn how not to introduce them and how to spot them in reviews.
I don't think there's any way to prevent them the way you prevent regressions. In a big enough project, they will happen. They will grow if you ignore them, but you can keep them in check by killing all the ones that are easy to track down. You can ignore the really pesky ones or solve them as a hobby if you want.
In my current work, thanks to this approach, on a project with 700+ models, a business layer involving 5000+ complex actions, and more than 6000 spec files, we managed to reduce this issue from an annoying PITA disturbing everyone a few times a month to a minor nuisance: we're "lucky" if we see a flaky spec once a quarter.