
We're currently working on a Ruby on Rails codebase; testing is done with RSpec, and we use Buildkite to run the tests.

We are in a situation where tests will sometimes fail and then, upon a retry or two, pass. In other words, we have flaky tests.

Initially, we thought there would be some quick way to add a related RSpec gem that recorded all test failures and surfaced the issue that way. But upon some research, we are somewhat surprised that there aren't any established or obvious solutions to this problem.

Do you all have any good tips on how to track and deal with flaky tests in RSpec?

Julien Chien

3 Answers


Flakiness comes from randomness, and the sources of that randomness vary. But let's not focus on that; let me give you a guide I learned from an awesome engineer and SO user I need to credit: https://stackoverflow.com/users/273699/dnnx.

The first thing to do when you spot a failing spec: note down the seed and the set of files that were executed in the run (this can be your whole project, or, in huge codebases where the suite is split into parallel parts, just a subset of all the tests; below I'll assume spec/* is the set and 9000 is the seed).

The simplest thing to do is to run the failing spec with this seed:

rspec spec/your_failing_spec.rb:33 --seed 9000 # assuming line 33 is the single failing example

If it fails, you are probably...

Using rand to initialize the state

e.g. let(:something) { create(:foo, status: FOO::VALID_STATUSES.sample) }. Then you deal with it accordingly.
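
For instance, you can make the value deterministic, or exercise every status explicitly. A minimal sketch, assuming FactoryBot's create and the FOO::VALID_STATUSES constant from the example above:

# Flaky: the sampled status changes with every seed, so a rarely chosen
# status makes the example fail at random.
# let(:something) { create(:foo, status: FOO::VALID_STATUSES.sample) }

# Deterministic: test every status explicitly instead of hoping the
# sampler eventually picks the interesting one.
FOO::VALID_STATUSES.each do |status|
  it "handles a foo with status #{status}" do
    expect(create(:foo, status: status)).to be_valid
  end
end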

If you're not lucky, no worries yet. Run the whole set with the --bisect flag:

rspec spec/* --seed 9000 --bisect

If you're lucky and this command gives you a minimal command to reproduce the failure, you probably have...

State leaking from other specs

For example, I recently had one search spec that indexed data in Elasticsearch without properly cleaning up after itself, causing another search spec to fail when the order was right.
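
A minimal sketch of the fix; SearchIndex and its methods are hypothetical stand-ins for whatever wraps your Elasticsearch client:

RSpec.describe "searching" do
  after do
    # Remove whatever this spec indexed so later specs see a clean index.
    SearchIndex.delete_all_documents
  end

  it "finds the indexed record" do
    SearchIndex.index(id: 1, name: "needle")
    expect(SearchIndex.search("needle")).not_to be_empty
  end
end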

Sometimes --bisect hangs indefinitely

In those desperate times, do it manually: remove half of the files from your set and run.

  • If it still fails: remove half again and repeat.
  • Otherwise, bring back what you've removed, remove the other half, and run again.
  • Rinse and repeat until the set is small enough to analyze with your brain. You can stop at any point and try re-running the automatic rspec --bisect on the smaller set; sometimes that's enough to unblock it. (A rough scripted version of this halving loop is sketched below.)
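
If you'd rather script the halving, here is a rough Ruby sketch; it reuses the failing file and seed assumed earlier and, like the manual procedure, assumes a single culprit file is leaking the state:

# Binary-search the rest of the suite for the file whose leaked state
# makes the known-failing spec fail.
failing = 'spec/your_failing_spec.rb'
files = Dir.glob('spec/**/*_spec.rb') - [failing]

while files.size > 1
  half = files.first(files.size / 2)
  # system returns true on a zero exit status, so a reproduced failure is false.
  reproduced = !system('rspec', *half, failing, '--seed', '9000')
  files = reproduced ? half : files - half
end

puts "Suspect file: #{files.first}"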

Sometimes, for whatever reason, you don't have the seed

What I do is run a script like this after I finish work:

while rspec spec/*; do :; done # re-run the suite until a run fails

If you're lucky, after some time you'll have a failing run, along with its seed.

There are too many reasons why one spec would make another fail to enumerate here, but usually the preceding spec leaves something behind in the app state. It could be junk in the DB (use DatabaseCleaner; a typical setup is sketched below), a monkey-patched class that was never reverted, etc.
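
A typical DatabaseCleaner setup looks roughly like this (adjust the strategy if you run JS-driven feature specs, which usually need truncation):

# spec/rails_helper.rb
require 'database_cleaner/active_record'

RSpec.configure do |config|
  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation) # start the suite from a blank slate
  end

  config.around(:each) do |example|
    DatabaseCleaner.strategy = :transaction
    DatabaseCleaner.cleaning { example.run } # roll back whatever the example wrote
  end
end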

Bisect says the failure is not order-dependent

(or the manual bisect got you nowhere). That's a tough one; I don't remember if I've ever solved such a case.

It might not be the seed that's the source of the random failures. Are you running specs in parallel? If so, are they 100% separated? If they use the same DB, they're not separated. If they store tmp/cache files in the same directories, they might be interfering with each other.

Consider timezones

We had a lot of TZ-related flakiness. Specs would fail on Fridays because some code was supposed to schedule something for the next working day, and on Fridays that's Monday. Use something like timecop and freeze time in those examples.
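
A minimal sketch with timecop; Scheduler and run_at are hypothetical stand-ins for the code under test, and Time.zone assumes Rails' ActiveSupport:

it "schedules the job for the next working day" do
  # Freeze on a Friday; the block form restores real time afterwards.
  Timecop.freeze(Time.zone.local(2023, 6, 9, 12, 0)) do
    job = Scheduler.schedule_next_working_day
    expect(job.run_at.monday?).to be(true)
  end
end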

That's just a basic guide, and I might have forgotten something. But it should be enough to get you started and weed out most of the flakiness. You'll have to be creative with the rest.

If flakiness is a big or growing problem in your project

Coach your team on how to hunt them down, and make killing a flaky spec a celebrated achievement. When people learn how to fix them, they also learn how not to introduce them, and how to spot them in reviews.

I don't think there's any way to prevent them the way you prevent regressions. In a big enough project, they will happen. They will grow if you ignore them, but you can keep them in check by killing all those that are easy to track down. You can ignore the really pesky ones, or solve them as a hobby if you want.

In my current work, thanks to this approach, on a project with 700+ models, a business layer involving 5000+ complex actions, and more than 6000 spec files, we managed to reduce this issue from an annoying PITA disturbing everyone a few times a month to a minor nuisance: we're "lucky" if we see a flaky spec once a quarter.

Greg

A couple of tracking options I have seen:

Roll it yourself:

  • output JUnit XML reports using a gem like rspec_junit_formatter
  • save them somewhere like S3
  • write a custom script to analyze the XML files, pushing the results to a CSV (a sketch of such a script follows below)
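
A rough sketch of such a script, using Ruby's standard-library REXML and assuming the rspec_junit_formatter layout (one <testcase> element per example, with a nested <failure> element when it failed) and reports already downloaded under reports/:

#!/usr/bin/env ruby
require 'rexml/document'
require 'csv'

failures = Hash.new(0)
runs = Hash.new(0)

Dir.glob('reports/**/*.xml').each do |path|
  doc = REXML::Document.new(File.read(path))
  doc.elements.each('//testcase') do |testcase|
    id = "#{testcase.attributes['classname']}##{testcase.attributes['name']}"
    runs[id] += 1
    failures[id] += 1 if testcase.elements['failure']
  end
end

# Most-failing examples first; a failure rate between 0 and 1 suggests a flake.
CSV.open('flaky_tests.csv', 'w') do |csv|
  csv << %w[example failures runs failure_rate]
  failures.sort_by { |_, n| -n }.each do |id, n|
    csv << [id, n, runs[id], (n.to_f / runs[id]).round(3)]
  end
end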

Open source:

  • Flexport's quarantine is Ruby-specific and stores flaky-test data in DynamoDB or Google Sheets
  • rspec-retry will help you live with the flaky tests; there may be a way to fire a callback on retry that records the flake (a configuration sketch follows below)
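
For the callback idea, a sketch of what the rspec-retry configuration might look like; the log destination is just an assumption:

# spec/spec_helper.rb
require 'rspec/retry'
require 'time'

RSpec.configure do |config|
  config.verbose_retry = true                # print each retry in the output
  config.display_try_failure_messages = true # show why each try failed

  config.around(:each) do |example|
    example.run_with_retry retry: 3
  end

  # Runs before each retry - one place to record the flake.
  config.retry_callback = proc do |example|
    File.open('tmp/flaky_specs.log', 'a') do |log|
      log.puts "#{Time.now.utc.iso8601} #{example.full_description}"
    end
  end
end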

SaaS:

  • TestRecall does all the XML report analysis and shows the data and visuals on flaky tests over time, making it really easy to find flaky tests and track trends. There is a Ruby example that boils down to adding that RSpec JUnit gem and uploading the results. Full disclosure: I work on TestRecall.

In case you don't have a seed and Grzegorz's script didn't work for you, you can try executing this script:

for i in $(seq 50); do rspec spec || break; done # up to 50 runs, stopping at the first failure

This script was taken from here.

lennon310