The motivating concept here is that the fewer tests you run, the faster your test suite finishes. This may sound like I'm just describing smoke tests, but I think smoke tests and other tests are commonly treated as entirely separate test suites: separate steps in a pipeline. Broadly speaking, we may have smoke, integration, and unit test suites, but they're all separate from one another.
Imagine a build pipeline set up such that an integration test covering components A, B, and C runs first, and since it's designed to be somewhat comprehensive in its checking, it's deemed safe to skip the individual unit test suites under components A and B so long as that test does not fail.
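For concreteness, here's a minimal sketch of what such a dispatch step could look like as a pipeline driver; the suite paths and file names are invented, and pytest just stands in for whatever runner applies:

```python
# Hypothetical pipeline driver: paths and suite names are invented, and
# pytest is only a stand-in for whatever test runner is actually in use.
import subprocess
import sys

def suite_passes(args):
    """Run one pytest invocation and report whether it passed."""
    return subprocess.run(["pytest", *args]).returncode == 0

# The broad integration test over components A, B, and C runs first.
if suite_passes(["tests/integration/test_abc.py"]):
    sys.exit(0)  # it covers enough that the A and B unit suites are skipped

# Only on failure do we fall back to the finer-grained unit suites,
# which help localize the breakage; the pipeline still fails overall.
suite_passes(["components/a/tests", "components/b/tests"])
sys.exit(1)
```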
In my experience this level of granularity over dispatch is not generally provided in test frameworks.
What we can expect to see is that during relatively stable periods of development, only this integration test needs to run (serving, in effect, as a smoke test), which offers potentially big gains in pipeline throughput and latency because the unit test suites are skipped whenever nothing goes wrong. The system could still opt in to executing all unit tests in times of low load, on a schedule, and/or on demand.
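That opt-in could be as small as an override flag that a scheduled or manually triggered job sets to bypass the gating entirely; something like this hypothetical check at the top of a driver like the one above:

```python
import os
import subprocess
import sys

# Hypothetical override: a nightly/weekend schedule or a manual trigger sets
# FORCE_FULL_SUITE=1, and the driver runs everything without any gating.
if os.environ.get("FORCE_FULL_SUITE") == "1":
    sys.exit(subprocess.run(["pytest", "tests/", "components/"]).returncode)
```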
This helps in situations where a heavily tested component of a project approaches maturity and the overhead of executing its tests begins to eat into the CI resources available for other jobs. This system would be a way to provide finer-grained control over that. Being able to skip tests for components that have not changed is essentially the base case that this generalizes.
It could be represented as a skip-dependency graph: tests could optionally be marked as being superseded by zero or more other tests, and thus be skippable if all of those pass.
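As plain data, with invented test names, that graph might look like the sketch below; a test with no superseders is never skippable, and one with several is only skipped when all of them pass:

```python
# Skip-dependency graph: each test maps to the tests that supersede it.
# All names here are made up for illustration.
SUPERSEDED_BY = {
    "integration_abc":   [],                                   # never skipped
    "unit_a_parser":     ["integration_abc"],
    "unit_b_cache":      ["integration_abc"],
    "unit_c_quick":      [],                                   # never skipped
    "unit_c_exhaustive": ["integration_abc", "unit_c_quick"],  # needs both
}

def skippable(test, passed):
    """Skippable only if the test has superseders and every one of them passed."""
    superseders = SUPERSEDED_BY.get(test, [])
    return bool(superseders) and all(s in passed for s in superseders)

# Example: after only the integration test has passed, the A and B unit
# suites are skippable, while the C tests still need to run.
passed = {"integration_abc"}
to_run = [t for t in SUPERSEDED_BY if t not in passed and not skippable(t, passed)]
# -> ["unit_c_quick", "unit_c_exhaustive"]
```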
I like this concept because, in my experience, when working on complex systems, testing (and the instrumentation that goes along with it) can be prohibitively resource-intensive. That cost forces us to dismantle, in some significant way, that instrumentation from the build or test pipelines of the system once we finish working on a particular component. Practically speaking, pressures materialize in proportion to how comprehensively a given component is tested, and although this does balance (in a well-adjusted organization!) against the relative importance of that component (and is perhaps a healthy natural force that regulates over-engineering of tests), I sometimes end up feeling that these pressures don't actually need to exist.
It seems that a prohibitively expensive full test suite could and should exist for the entire system, and wiring it up internally so that it short-circuits the bulk of low-level testing under almost all typical circumstances means it could retain all of its potential exhaustiveness while still being fast enough to actually use all the time. If I have a very low-level, exhaustive test of an algorithm that generates a huge number of cases to validate, consuming a lot of time and energy to execute, it makes a lot of sense to automatically gate a test like this on the success of a series of less expensive tests of the same algorithm. If that can be done, then the exhaustive test can remain in the system, hooked up and ready to be called upon to provide its value when it is needed, without its drawbacks (time and energy) impacting the pipeline otherwise.
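As a rough illustration of that gating inside a single test session, here is a sketch using pytest hooks; the `gated_on` marker is something I'm making up (it's not built in), and it assumes the cheaper tests are collected, and therefore run, before the exhaustive one:

```python
# conftest.py -- sketch of gating one test on the success of others within a
# single pytest session. The "gated_on" marker is invented for illustration.
import pytest

_passed = set()

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "gated_on(*names): skip if all named tests already passed")

def pytest_runtest_logreport(report):
    # Record every test whose call phase passed, keyed by its bare name.
    if report.when == "call" and report.passed:
        _passed.add(report.nodeid.split("::")[-1])

def pytest_runtest_setup(item):
    marker = item.get_closest_marker("gated_on")
    if marker and marker.args and all(name in _passed for name in marker.args):
        pytest.skip(f"superseded by passing gate tests: {marker.args}")


# test_algorithm.py -- usage: the exhaustive sweep only actually executes
# when the quick test did not pass (or in an un-gated full run).
def test_algorithm_quick():
    ...  # a handful of representative cases

@pytest.mark.gated_on("test_algorithm_quick")
def test_algorithm_exhaustive():
    ...  # the huge generated case sweep
```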
Are there test frameworks that provide an implementation of this concept? If so, does it have a name?
In either case, what might be some good approaches to avoid over-marking tests as supersedable and over-culling what actually gets run?
Perhaps I could capture the underlying concept without something as general as requiring all tests to live under one umbrella, where the highest-level smoke tests can assert relations over the lowest-level unit tests that perform exhaustive checks. That seems overambitious and rife with potential for dependency nightmares, so maybe it is (and should be) sufficient to merely afford the gating dependency across one level of the "test stack", as it were.
As such, instead of one test runner to unite them all (and all the practical challenges that come with that), all we'd need is a mechanism at every level of the test stack that can specify a subset of the tests to run at the next level down. That does seem a lot more tractable. I can't decide whether this has much chance of working out in reality, though, because it's probably not reasonable to expect to engineer the high-level tests (especially if we're simultaneously trying to optimize them to complete quickly) such that their failure is the precondition for even being able to run those lower-level tests. But even if we had to relegate a full, un-gated run (both "full" and the cadence being arbitrary here) to once a week during off hours on the weekend, that's still far better than actually dismantling every test that failed to make the cut, which would be a loss of capability. You'd at least have a one-week resolution on those results!
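To make that "one level down" handoff concrete, the higher-level stage could simply emit the subset of lower-level suites that still need to run and let the next pipeline stage consume it; the mapping and file name below are invented:

```python
# Higher-level stage: run the gate tests and write out which lower-level
# suites still need to run. The mapping and output file are invented.
import json
import subprocess

GATES = {
    # higher-level test -> lower-level suites it supersedes when it passes
    "tests/integration/test_abc.py": ["components/a/tests", "components/b/tests"],
    "tests/integration/test_cd.py":  ["components/c/tests", "components/d/tests"],
}

still_required = []
for gate, covered in GATES.items():
    if subprocess.run(["pytest", gate]).returncode != 0:
        still_required.extend(covered)

# The next stage reads this file and runs only what is listed in it.
with open("next_stage_targets.json", "w") as f:
    json.dump(still_required, f)
```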
And I'll also offer the notion that things could get a lot more nuanced than simply predicating on test success vs. failure: you can have routines that measure performance to help catch performance regressions, and all the same logic as above applies there too. Dependent actions could also encompass more than just running further tests; they could perform automated VCS bisection to help pinpoint where a regression was introduced, for example. These additional dimensions of test logic would add to resource consumption and make the proposed mechanisms for being smarter about what to run, and when, all the more relevant.
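The bisection case, for instance, could be little more than a wrapper around git's own `bisect run`, which drives the search using the failing test's exit code; the revisions and test path below are placeholders:

```python
# Sketch of a dependent action beyond "run more tests": on a gate failure,
# bisect to the commit that introduced the regression. The revisions and the
# failing test path are placeholders; `git bisect run` is standard git.
import subprocess

def bisect_regression(good_rev, bad_rev, failing_test):
    subprocess.run(["git", "bisect", "start", bad_rev, good_rev], check=True)
    # git bisect run repeatedly checks out candidate commits and uses the
    # command's exit code (0 = good, non-zero = bad) to narrow down the culprit.
    subprocess.run(["git", "bisect", "run", "pytest", failing_test], check=True)
    subprocess.run(["git", "bisect", "reset"], check=True)
```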
We could continue even further by trying to bring machine learning into the picture and start really handwaving, but I'll stop here.