The motivating concept here is that the fewer tests you run, the faster your test suite finishes. This may sound like I'm just describing smoke tests, but I think smoke tests and other tests are commonly treated as entirely separate test suites: separate steps in a pipeline. Broadly speaking, we may have smoke, integration, and unit test suites, but they're all separate from one another.
Imagine a build pipeline set up such that an integration test covering components A, B, and C runs first, and since it's designed to be somewhat comprehensive in its checking, it's deemed safe to skip the individual unit test suites under components A and B so long as that test does not fail.
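For concreteness, here's a minimal sketch of what such a dispatch step could look like as a pipeline driver; the suite paths and file names are invented, and pytest just stands in for whatever runner applies:

```python
# Hypothetical pipeline driver: paths and suite names are invented, and
# pytest is only a stand-in for whatever test runner is actually in use.
import subprocess
import sys

def suite_passes(args):
    """Run one pytest invocation and report whether it passed."""
    return subprocess.run(["pytest", *args]).returncode == 0

# The broad integration test over components A, B, and C runs first.
if suite_passes(["tests/integration/test_abc.py"]):
    sys.exit(0)  # it covers enough that the A and B unit suites are skipped

# Only on failure do we fall back to the finer-grained unit suites,
# which help localize the breakage; the pipeline still fails overall.
suite_passes(["components/a/tests", "components/b/tests"])
sys.exit(1)
```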
In my experience this level of granularity over dispatch is not generally provided in test frameworks.
What we can expect to see is that during relatively stable periods of development, only this integration test needs to run (serving, in effect, as a smoke test), which offers potentially big gains in pipeline throughput and latency because the unit test suites are skipped whenever nothing goes wrong. The system could still opt in to executing all unit tests in times of low load, on a schedule, and/or on demand.
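That opt-in could be as small as an override flag that a scheduled or manually triggered job sets to bypass the gating entirely; something like this hypothetical check at the top of a driver like the one above:

```python
import os
import subprocess
import sys

# Hypothetical override: a nightly/weekend schedule or a manual trigger sets
# FORCE_FULL_SUITE=1, and the driver runs everything without any gating.
if os.environ.get("FORCE_FULL_SUITE") == "1":
    sys.exit(subprocess.run(["pytest", "tests/", "components/"]).returncode)
```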
This helps in situations where a heavily tested component of a project approaches maturity and the overhead of executing its tests begins to eat into the CI resources available for other jobs. This system would be a way to provide finer-grained control over that. Being able to skip tests for components that have not changed is essentially the base case that this generalizes.
It could be represented as a skip-dependency graph: tests could optionally be marked as being superseded by zero or more other tests, and thus be skippable if all of those pass.
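As plain data, with invented test names, that graph might look like the sketch below; a test with no superseders is never skippable, and one with several is only skipped when all of them pass:

```python
# Skip-dependency graph: each test maps to the tests that supersede it.
# All names here are made up for illustration.
SUPERSEDED_BY = {
    "integration_abc":   [],                                   # never skipped
    "unit_a_parser":     ["integration_abc"],
    "unit_b_cache":      ["integration_abc"],
    "unit_c_quick":      [],                                   # never skipped
    "unit_c_exhaustive": ["integration_abc", "unit_c_quick"],  # needs both
}

def skippable(test, passed):
    """Skippable only if the test has superseders and every one of them passed."""
    superseders = SUPERSEDED_BY.get(test, [])
    return bool(superseders) and all(s in passed for s in superseders)

# Example: after only the integration test has passed, the A and B unit
# suites are skippable, while the C tests still need to run.
passed = {"integration_abc"}
to_run = [t for t in SUPERSEDED_BY if t not in passed and not skippable(t, passed)]
# -> ["unit_c_quick", "unit_c_exhaustive"]
```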
I like this concept because, in my experience, when working on complex systems, testing (and the instrumentation that goes along with it) can be prohibitively resource-intensive. That cost forces us to dismantle, in some significant way, that instrumentation from the build or test pipelines of the system once we finish working on a particular component. Practically speaking, pressures materialize in proportion to how comprehensively a given component is tested, and although this does balance (in a well-adjusted organization!) against the relative importance of that component (and is perhaps a healthy natural force that regulates over-engineering of tests), I sometimes end up feeling that these pressures don't actually need to exist.
It seems that a prohibitively expensive full test suite could and should exist for the entire system, and wiring it up internally so that it short-circuits the bulk of low-level testing under almost all typical circumstances means it could retain all of its potential exhaustiveness while still being fast enough to actually use all the time. If I have a very low-level, exhaustive test of an algorithm that generates a huge number of cases to validate, consuming a lot of time and energy to execute, it makes a lot of sense to automatically gate a test like this on the success of a series of less expensive tests of the same algorithm. If that can be done, then the exhaustive test can remain in the system, hooked up and ready to be called upon to provide its value when it is needed, without its drawbacks (time and energy) impacting the pipeline otherwise.
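As a rough illustration of that gating inside a single test session, here is a sketch using pytest hooks; the `gated_on` marker is something I'm making up (it's not built in), and it assumes the cheaper tests are collected, and therefore run, before the exhaustive one:

```python
# conftest.py -- sketch of gating one test on the success of others within a
# single pytest session. The "gated_on" marker is invented for illustration.
import pytest

_passed = set()

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "gated_on(*names): skip if all named tests already passed")

def pytest_runtest_logreport(report):
    # Record every test whose call phase passed, keyed by its bare name.
    if report.when == "call" and report.passed:
        _passed.add(report.nodeid.split("::")[-1])

def pytest_runtest_setup(item):
    marker = item.get_closest_marker("gated_on")
    if marker and marker.args and all(name in _passed for name in marker.args):
        pytest.skip(f"superseded by passing gate tests: {marker.args}")


# test_algorithm.py -- usage: the exhaustive sweep only actually executes
# when the quick test did not pass (or in an un-gated full run).
def test_algorithm_quick():
    ...  # a handful of representative cases

@pytest.mark.gated_on("test_algorithm_quick")
def test_algorithm_exhaustive():
    ...  # the huge generated case sweep
```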
Are there test frameworks that provide an implementation of this concept? If so, does it have a name?
In either case, what might be some good approaches to avoid over-marking tests as supersedable and over-culling what actually gets run?
Perhaps I could capture the underlying concept without something as general as requiring all tests to live under one umbrella, where the highest-level smoke tests can assert relations over the lowest-level unit tests that perform exhaustive checks. That seems overambitious and rife with potential for dependency nightmares, so maybe it is (and should be) sufficient to merely afford the gating dependency across one level of the "test stack", as it were.
As such, instead of one test runner to unite them all (and all the practical challenges that come with that), all we'd need is a mechanism at every level of the test stack that can specify a subset of the tests to run at the next level down. That does seem a lot more tractable. I can't decide whether this has much chance of working out in reality, though, because it's probably not reasonable to expect to engineer the high-level tests (especially if we're simultaneously trying to optimize them to complete quickly) such that their failure is the precondition for even being able to run those lower-level tests. But even if we had to relegate a full, un-gated run (both "full" and the cadence being arbitrary here) to once a week during off hours on the weekend, that's still far better than actually dismantling every test that failed to make the cut, which would be a loss of capability. You'd at least have a one-week resolution on those results!
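To make that "one level down" handoff concrete, the higher-level stage could simply emit the subset of lower-level suites that still need to run and let the next pipeline stage consume it; the mapping and file name below are invented:

```python
# Higher-level stage: run the gate tests and write out which lower-level
# suites still need to run. The mapping and output file are invented.
import json
import subprocess

GATES = {
    # higher-level test -> lower-level suites it supersedes when it passes
    "tests/integration/test_abc.py": ["components/a/tests", "components/b/tests"],
    "tests/integration/test_cd.py":  ["components/c/tests", "components/d/tests"],
}

still_required = []
for gate, covered in GATES.items():
    if subprocess.run(["pytest", gate]).returncode != 0:
        still_required.extend(covered)

# The next stage reads this file and runs only what is listed in it.
with open("next_stage_targets.json", "w") as f:
    json.dump(still_required, f)
```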
And I'll also offer the notion that things could get a lot more nuanced than simply predicating on test success vs. failure: you can have routines that measure performance to help catch performance regressions, and all the same logic as above applies there too. Dependent actions could also encompass more than just running further tests; they could perform automated VCS bisection to help pinpoint where a regression was introduced, for example. These additional dimensions of test logic would add to resource consumption and make the proposed mechanisms for being smarter about what to run, and when, all the more relevant.
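The bisection case, for instance, could be little more than a wrapper around git's own `bisect run`, which drives the search using the failing test's exit code; the revisions and test path below are placeholders:

```python
# Sketch of a dependent action beyond "run more tests": on a gate failure,
# bisect to the commit that introduced the regression. The revisions and the
# failing test path are placeholders; `git bisect run` is standard git.
import subprocess

def bisect_regression(good_rev, bad_rev, failing_test):
    subprocess.run(["git", "bisect", "start", bad_rev, good_rev], check=True)
    # git bisect run repeatedly checks out candidate commits and uses the
    # command's exit code (0 = good, non-zero = bad) to narrow down the culprit.
    subprocess.run(["git", "bisect", "run", "pytest", failing_test], check=True)
    subprocess.run(["git", "bisect", "reset"], check=True)
```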
We could continue even further by trying to bring machine learning into the picture and start really handwaving, but I'll stop here.