
Let's say I want to add tests to a piece of software that has some flaws/bugs.

I want to add the tests before I fix the bugs. And I want to actually merge them into the main branch, not just have them in a feature branch or bugfix branch somewhere.
(EDIT: See below as to why I might want to do this.)

If this is relevant, I am working with phpunit. But I am looking for more general answers.

I see the following options:

  1. Create failing tests that describe the desired behavior, and merge those into the main branch. This means any QA pipeline that requires a green test run has to be suppressed or configured to tolerate the failures.
  2. Create passing tests that describe the current (flawed) behavior, and merge those into the main branch. Add @todo comments if possible, and update them later as part of the bug fix (a sketch of this follows the list).
  3. Use a fancy mechanism that allows switching between "desired" mode and "current" mode. Running the tests in "desired" mode would make them fail, but running them in "current" mode would make them pass.
  4. Don't merge the tests until the bugs are fixed.
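
For illustration, option 2 could look roughly like the following PHPUnit sketch (the `Slugifier` class, the expected strings, and the issue number are made up for the example):

```php
<?php

use PHPUnit\Framework\TestCase;

final class SlugifierTest extends TestCase
{
    /**
     * @todo Issue #123: consecutive separators should collapse into a single
     *   dash ("hello-world"). Update the expectation when the bug is fixed.
     */
    public function testConsecutiveSeparatorsDocumentCurrentBehavior(): void
    {
        // Pins the current (known wrong) output so that any change to it,
        // intended or not, shows up in CI.
        self::assertSame('hello--world', (new Slugifier())->slugify('hello  world'));
    }
}
```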

Personally, for my own projects, I like options 2 and 3.

But when working in a team, I want to be able to justify my strategies with online references, instead of just personal opinion.

So, is there any named "best practice" or "pattern" that matches option 2 or 3 above? Or am I missing something, and there are even better strategies available?

EDIT: Why merge the tests before fixing the bug?

(I am adding this section to avoid having to put all of it into the comments.)

Why would I want to merge the tests into the main branch before the bug fix? This has been a point of contention in the comments, so I am summarizing possible reasons here.

Possible reasons why we can't fix the bug now:

  • The bug fix might be risky or disruptive, perhaps we want to postpone it for a new major version.
  • The bug fix needs dedicated effort by the maintainer, whereas the new test can be developed by a contributor.
  • There is no budget / resources to fix the bug now.
  • The existing behavior is not fully understood, and we don't really understand the implications that fixing it would have.
  • We know the current behavior is wrong, but we don't have a precise spec for the desired behavior. Perhaps we need feedback from the client or management.
  • The bug has been discovered while doing something else, and fixing it now is out of scope.

So why would we want to merge the tests anyway?

(assuming option 2 from above)

  • The tests cover more than just this one bug.
  • We want to reward and credit contributors who provide tests, even before fixing a bug.
  • We want to provide a starting point for contributors who attempt to fix the bug.
  • We want a git diff of the fix to give a precise overview of behavior change.
  • We want to detect when the bug might be fixed as a side effect of other changes.
  • We want to detect when the wrong behavior changes to a different flavor of wrongness, as a side effect of other changes.
  • If a user of the software package discovers something that looks broken, they can look into the tests to find the behavior documented and marked as a "known issue", with a link to the issue queue.
  • The git history will reveal more information about the difference in behavior before and after the fix. It will also reveal in which version the problem was fixed.
donquixote
  • Why do you want to add the tests to your main branch before you fix the bug? – mmathis Jan 14 '22 at 03:11
  • Because then I can also catch side effects from other changes I might do before that bug is fixed. Perhaps fixing it is more complicated, and the tests cover a wider range of functionality. – donquixote Jan 14 '22 at 03:16
  • Also, perhaps some users or other packages depend on aspects of the old "faulty" behavior. So the test for the old behavior can reveal the full scope of change from the bug fix. – donquixote Jan 14 '22 at 03:23
  • IMO, if one of the roles associated with the tests is to *describe* behavior, then they should describe the *correct* behavior; however if you're refactoring, you shouldn't change the overall behavior of the system, just the structure and the way different components (old or newly created) interact. So you can write a temporary "safety net" test that allows you to do the restructuring first, to get you to a point where you can begin to understand what the hell is going on in the code - this sort of test could do something "stupid" like concat all the output in a string, and then check 1/2 – Filip Milovanović Jan 14 '22 at 08:24
  • ... if you get the same string when you shuffle code around (so, you're not even thinking what the behavior really means). Once you get to a point where you have established concepts (workable methods/classes) in your code that you can reason about on some level, you can write tests that *describe* (or, I guess, prescribe) the correct behavior for those, one at a time (probably not a good idea to do it all at once), and bring the system into shape bit by bit. Eventually throw away the ad hoc "safety net" test. 2/2 – Filip Milovanović Jan 14 '22 at 08:24
  • There is a podcast by Brian Okken where he and a guest discuss a feature called xfail (tests which are known to fail) https://share.fireside.fm/episode/DOAjrBV2+iHYGDqvh . The podcast is about pytest but as far as I remember they also discuss quite generally the question if failing tests may or shouldn’t be merged into the main line. – Hartmut Braun Jan 14 '22 at 12:22
  • @mmathis: [test-first programming (TDD)](http://en.wikipedia.org/wiki/test-driven_development) requires writing tests that are expected to fail before fixing bugs. TDD is part of agile software development. – David Cary Jan 14 '22 at 17:57
  • @DavidCary But you don't write the (failing) test and *merge it* to your main branch without actually fixing the bug, do you? Sure you write the test first, but you fix the bug at the same time... – mmathis Jan 14 '22 at 18:11
  • @mmathis: Many people recommend ["commit early, commit often"](https://softwareengineering.stackexchange.com/questions/83837/when-to-commit-code), so I definitely commit after I write a (failing) test and before I fix the bug. That test doesn't make the application any "less shippable" than it was before (it has zero effect on the end-user experience), so ["merge early, merge often"](https://softwareengineering.stackexchange.com/questions/395021/is-it-better-to-merge-often-or-only-after-completion-do-a-big-merge-of-feature) seems to imply merging right away. – David Cary Jan 14 '22 at 19:34
  • @DavidCary Relevant to "should I merge a failing test?": https://softwareengineering.stackexchange.com/questions/201743/tdd-and-version-control - Overall consensus seems no. – donquixote Jan 14 '22 at 20:10
  • @donquixote: If I'm reading that right, overall consensus seems to be, when "the test doesn't even compile", don't merge. I don't see how that's relevant to test cases that compile and run (and print out "failed") and the rest of the application is just as shippable as it was before. At least one commenter there recommends "For a large bugfix, commit after each improvement (e.g. refactoring), even if the bug is not fixed yet." and I claim adding a test that fails is an "improvement" that should be treated the same as a refactoring. – David Cary Jan 14 '22 at 23:52
  • @HartmutBraun Very nice episode! xfail seems a good idea, but then the test won't describe in detail the current behavior. We want to detect if the buggy behavior changes over time, even if it is still wrong, but in a different flavor of wrong. – donquixote Jan 15 '22 at 00:44
  • @DavidCary In [this reply](https://softwareengineering.stackexchange.com/a/201773/113434) I see "test driven development - which says nothing about committing code.". But I think the devil is in the detail: We don't want these failing tests to block the CI for other development. So if we can somehow mark them as "known to fail", in a way that is understood by CI, then we can merge them. – donquixote Jan 15 '22 at 00:52
  • @donquixote then you need two tests: one that describes the correct behaviour and is marked as xfail, and another test that describes the current behaviour which is run normally. The second test should then be marked as „known problem“ and requires a reference to the correct-behaviour-test. – Hartmut Braun Jan 15 '22 at 06:32
  • Are you asking about unit tests, integration tests or functional tests? The type and complexity of tests can make a difference. – Greg Burghardt Jan 15 '22 at 20:01
  • @GregBurghardt When asking the question I was mostly thinking of integration tests and functional tests, and some unit tests for non-trivial components. – donquixote Jan 16 '22 at 03:24
  • Ok. I think you are getting a variety of answers because most of us are thinking unit tests. Integration tests or functional tests change the question. You might get a better answer by posting another question scoped specifically to those kinds of tests. Plus a little more info about the complexity of the tests. – Greg Burghardt Jan 16 '22 at 13:59
  • @GregBurghardt Or alternatively, it could be part of an answer to distinguish when to use which strategy. There might be a line somewhere, but I don't think it always has to be between integration vs functional vs unit tests. I think it applies to any case where the thing you test is too complex or too important to fix immediately. For a lot of unit tests, this is not the case, but for some it might be. Btw, I might post an answer myself, once I have developed a consistent opinion about it. – donquixote Jan 16 '22 at 15:25

3 Answers


Add the tests at the same time that you fix the bug (your option 4). The final tests would then describe the desired, non-buggy behavior because they would pass. This is pretty typical for bug-fixing.

Unless you're dealing with a massive bug, I can't see much reason to spend effort writing and merging tests that you know are incorrect, only to have to come back at some point in the future and update those tests again. There's also the possibility that you look at the passing tests and close the bug ticket as OBE (overcome by events) or not reproducible or some such - maybe not likely, sure, but why chance it?

Edit to clarify: you can certainly write the tests with the current, buggy behavior; fix the bug; and update the tests to be sure you're not breaking anything else (although those other things have tests, too, right?). But don't spend the effort merging tests with incorrect expected results into your main branch. And definitely don't merge failing tests into your main branch!

mmathis
  • When you say "wrong tests" do you mean failing tests or tests with "wrong" expectations? – donquixote Jan 14 '22 at 03:23
  • @donquixote You should never merge failing tests to your main branch, but I was referring here to tests with "wrong" expectations. Updated the answer to make that more clear – mmathis Jan 14 '22 at 03:26

Short answer: They can do both.

Slightly longer short answer:

  • Regression Tests describe the current system,
  • Progression Tests describe the target system.

Long Answer:

Option 1 is called Progressive Tests. More useful at the end-to-end/integration/use-case level, these tests serve as goals to be achieved. It is understandable that these tests fail for a long time, and the QA part of the build pipeline should be aware of them, i.e. no build with a failing progressive test goes to production, but it can still be used for local testing.
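
In PHPUnit, one way to make the pipeline aware of such tests is to tag them with a test group and exclude that group from the gating run. A minimal sketch (the `CheckoutFlow` class and the group name `progression` are just examples, not PHPUnit built-ins):

```php
<?php

use PHPUnit\Framework\TestCase;

final class CheckoutFlowTest extends TestCase
{
    /**
     * Describes the desired end-to-end behavior; expected to fail until the
     * fix lands, so it is kept out of the gating pipeline via its group.
     *
     * @group progression
     */
    public function testGuestCheckoutSendsConfirmationEmail(): void
    {
        $result = (new CheckoutFlow())->placeGuestOrder('cart-42');

        self::assertTrue($result->confirmationEmailSent);
    }
}
```

The gating pipeline would then run `phpunit --exclude-group progression`, while a separate, non-blocking job runs `phpunit --group progression` to track progress toward the goal.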

Option 2 is refactoring. It is particularly good in areas of code that have descended into murkiness. Write a suite of tests and (very importantly) refactor the code back into legibility before doing any code fixes.

Option 3. ... It's horrible... Please don't do this. How are you supposed to know in which "special" mode the test is working? How many times do you have to re-run the test to ensure each of its modes works? How will you even know to re-run the test? If there is an anti-pattern here, this is it.

But I could be misreading you; see option 5 below.

Option 4 is what general development is all about. Make the changes to the code and to the tests necessary to describe the behavior change. When it is working, check it in.

There is an option 5:

Option 5. It's useful when you want to preserve old behaviour even for a short amount of time. Write separate tests for each behaviour in the different supported modes. Each test is responsible for configuring the feature toggles for setting up the mode it is testing.

Should you no longer support a mode, delete the tests and the code behind the toggle. Simple.
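
A minimal sketch of what that could look like in PHPUnit, assuming a hypothetical `FeatureToggles` helper and a `Formatter` under test:

```php
<?php

use PHPUnit\Framework\TestCase;

final class FormatterTest extends TestCase
{
    public function testLegacyRoundingMode(): void
    {
        // Old behaviour, kept alive behind a toggle for existing callers.
        FeatureToggles::set('precise_rounding', false);

        self::assertSame('10.5', (new Formatter())->money(10.549));
    }

    public function testPreciseRoundingMode(): void
    {
        // New, desired behaviour behind the same toggle.
        FeatureToggles::set('precise_rounding', true);

        self::assertSame('10.55', (new Formatter())->money(10.549));
    }

    protected function tearDown(): void
    {
        // Each test configures its own mode; reset so tests stay independent.
        FeatureToggles::reset();
    }
}
```

Retiring the legacy mode then means deleting `testLegacyRoundingMode()` together with the toggle and the code behind it.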

Kain0_0
  • "Option 3. ... Its horrible... But i could be miss reading you". In a test, I might write `if (knownIssueExists()) { doThis(); } else { doThat(); }`. Later, when fixing the bug, we replace the entire if/else with just ` doThat();`. There could be a cli/env param to make `knownIssueExists()` return FALSE - but by default, or in a CI pipeline, it would return TRUE, and perhaps write to a log somewhere to turn the result yellow (not red). – donquixote Jan 14 '22 at 21:35
  • More advanced, we could have different bugs represented as classes or methods, and then `if (KnownIssue12345::exists()) { assertThis(); } else { assertThat(); }`. Then a developer could look into the known-issues directory and find documented bugs. When fixing a bug, they would delete the respective class and all references to it, leaving only `assertThat();`. Later you could look into the git history and find when exactly a documented bug was fixed. – donquixote Jan 14 '22 at 21:38
  • Tbh, I have not seen this kind of stuff anywhere else, and have not really thought it through to the end. It seems in most projects this would be a rare case anyway, because most bugs would be fixed directly. A simple `// @todo` comment might be good enough for the remaining cases. – donquixote Jan 14 '22 at 21:41
  • Btw.. seems like a "Progression Test" becomes a "Regression Test" once the feature is implemented. – donquixote Jan 14 '22 at 23:38
  • Refined version of the above: `try { self::assertDesired(); self::fail('Fixed, remove this'); } catch (AssertionFail $e) { KnownBug12345::report($e); self::assertActualBadBehavior(); }`. The reference to KnownBug12345 can be used for "find usages" search. – donquixote Jan 15 '22 at 01:05
  • @donquixote: since your main concern is integration and functional tests, feature toggles on the tests are your best option. There is no easy answer, and you will likely need a custom solution. – Greg Burghardt Jan 16 '22 at 16:20
  • @donquixote I'm not saying you can't write it. I'm not even saying it won't work. But if I didn't know of this particular code pattern in use in the code base, my first presumption would be that the tests are flaky and unreliable, because when run with one flag they work, and when run with the other they don't. True, it is like the Progression/Regression case I highlighted, but there is a difference: those are part of QA/test culture; if something was titled progression, I would understand that its failure would be expected, because it is being worked on now. – Kain0_0 Jan 16 '22 at 22:27

Most of the time I agree with mmathis: submit the updated tests along with the bug fix. There are cases, though, where you might find a defect but don't have time to fix it, or where fixing that defect is out of scope for your current task. Not every test needs to pass or fail. Many test frameworks come with a third outcome called "inconclusive", "pending" or some similar name: basically a test that neither passes nor fails.

I work mostly in .NET using MS Test. I have written failing unit tests before, but instead of letting them fail, I call Assert.Inconclusive("should begin passing when bug #1234 is fixed") at the beginning of the test. A quick search suggests that in PHPUnit, $this->markTestIncomplete("should begin passing when bug #1234 is fixed") might be a good choice.

This usually doesn't fail the build, but the test run might not appear "green", nor will it appear "red". When encountering these "in between" test outcomes, some test runners will report that the tests neither passed nor failed; the test run would appear "yellow". This might require some explanation to teammates or management.

The real challenge is to eventually fix the application and get that test running and passing, but in the meantime at least you have a test that will start failing as soon as the markTestIncomplete() line is commented out or removed. That gives the future bug fixer an easy way to jump into correcting the code.
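
A sketch of how that could look in PHPUnit (the `ReportGenerator` class and the bug number are placeholders):

```php
<?php

use PHPUnit\Framework\TestCase;

final class ReportGeneratorTest extends TestCase
{
    public function testTotalsIncludeRefundedOrders(): void
    {
        // Reported as "incomplete" (yellow) instead of failing the build.
        // Remove this line as part of the fix for bug #1234.
        $this->markTestIncomplete('Should begin passing when bug #1234 is fixed.');

        // Desired behavior; not reached until the line above is removed.
        self::assertSame(150, (new ReportGenerator())->total(true));
    }
}
```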

Greg Burghardt
  • Hm. With "->markTestIncomplete()", further assertions in that test won't be reached, even if these might cover valid behavior. E.g. perhaps the bug is about a string output being wrong in some cases, but the rest is still ok. Would be nice to have something that just logs or prints the warning, instead of completely failing. Or something where you can pass a cli param to switch to a different mode. – donquixote Jan 14 '22 at 04:16
  • Some test frameworks also have an "expected to fail" annotation that you can use on a testcase. That way, you can merge the testcase with the correct expected behavior in it before the bug is fixed and without breaking the mainline builds. – Bart van Ingen Schenau Jan 14 '22 at 07:18
  • @donquixote: in a case like this, I would advise *not* making multiple assertions in a single test. Ideally it should be single test, single assertion. Especially in this situation. – Greg Burghardt Jan 14 '22 at 12:26
  • And focusing on a single assertion allows you to give that failing test a good name, which can be just as important as writing the test in the first place. This is an opportunity to better describe the situation. – Greg Burghardt Jan 14 '22 at 12:30
  • @GregBurghardt If a test case simulates a process with multiple steps, it seems a good idea to have multiple assertions along the way. Or it can build up a "log" of different observations, and later assert the contents of the log against an expected log. The bug might only concern one part of this process, e.g. a wrong message being shown somewhere. If we want to split this up, we have to faithfully simulate the intermediate state before each step of the process. – donquixote Jan 14 '22 at 21:26