
Let's assume that you are trying to refactor legacy code to make it easier to understand and more testable. How can you do that?

In the book "Working Effectively with Legacy Code", characterization/regression tests are recommended. First, you start with a test that invokes the part of the code you want to refactor (e.g. a big method). What this test tells you is that, for that input, you should expect that output. Therefore, if you refactor and the output changes, something was broken.
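For concreteness, a characterization test might look like the sketch below (pytest style). The module and function names are hypothetical stand-ins for the legacy code, and the expected value is simply whatever the current implementation produced when the test was first written.

```python
# test_characterization.py -- pins the current behaviour of the legacy code
from legacy_billing import calculate_invoice_total  # hypothetical legacy module


def test_invoice_total_matches_current_behaviour():
    # The expected value was captured by running the legacy code once.
    # It documents what the code does today, not what it should do.
    result = calculate_invoice_total(customer_id=42, items=[("widget", 3)])
    assert result == 107.50
```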

The assumption here is that the legacy code is harder to understand and thus harder to test. You can find this case explained in another question here: Writing tests for code whose purpose I don't understand

Now you have refactored code that you can start writing unit tests for (e.g. you broke the big method into smaller, testable ones).

Unit tests might help you uncover bugs that you need to fix. Fixing those bugs will make your unit tests pass but will break the characterization/regression tests: those tests expect a certain output, namely the output of the code before the bug was found and fixed.
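To illustrate the conflict (again with hypothetical names): suppose the refactoring extracted an `apply_discount` helper, and a new unit test encodes the correct behaviour while the characterization test above still pins the buggy output.

```python
# test_discount.py -- a unit test written after the refactoring
from legacy_billing import apply_discount  # hypothetical helper extracted from the big method


def test_discount_is_applied_once():
    # Correct behaviour: a 10% discount on 100.0 yields 90.0.
    # The legacy code applied the discount twice, so the characterization
    # test still expects the buggy total and will fail once this is fixed.
    assert apply_discount(100.0, rate=0.10) == 90.0
```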

So what is the appropriate course of action here? Can I just remove the test or comment it out? I have seen in "Working Effectively with Legacy Code" an example of a test that was removed after refactoring. Does this apply here?

  • 1
    Do you mean "what happens if your refactoring uncovers existing bugs"? The same as when you identify any other existing bug - you change the test first to match what the result _should_ have been, and then use that to verify the bugfix. – Useless Apr 22 '20 at 14:07
  • 2
    The regression testing helps ensure that the behaviour doesn't change. When the behaviour *needs to change*, whether because you found a bug or are implementing a new feature, common sense should tell you the test needs updating accordingly. – jonrsharpe Apr 22 '20 at 14:16
  • The purpose of a "characterization" test is not to uncover bugs. Its goal is to help you refactor legacy code (i.e. code that has no tests and is difficult to test) while making sure you did not break it. After this phase you should end up with better code that you can write unit tests for. Later on, you can start adding unit tests. My question is: fixing any bugs that are uncovered by the unit tests will break the "characterization" test. So how do you deal with that? Are the "characterization" tests still needed, or can you just comment them out? – Abdelrahman Shoman Apr 22 '20 at 14:19
  • 1
    Be aware that if your new unittests uncover a bug that would break the characterization test, there is also the possibility that there is a second bug that reverses the negative effects of the first bug. This is most likely the case when the experts are absolutely sure that the expected results for the characterization test are correct. – Bart van Ingen Schenau Apr 23 '20 at 06:16
  • "Refactoring" always, absolutely always means changing the code without changing any behaviour. Including bugs. Fixing bugs is a second step. I once had refactored code that said literally "if (condition1 and condition2 and condition3) crash();". I fixed the bug by removing that line, _after_ refactoring. – gnasher729 Sep 08 '20 at 05:36

2 Answers

7

Won't a characterization/regression test fail when a bug is fixed?

Yes, usually.

So what is the appropriate course of action here? Can I just remove the test or comment it out?

"It depends".

You shouldn't be thinking of the characterization test as something that lives forever. That style of test is brittle when measuring a system whose intended behavior is changing over time. So we want to retire (deprecate) those tests when they are no longer providing value -- which is to say, when other (less brittle) tests have made the characterization test redundant.

If a deprecated test fails, then you of course remove it.

The problem case occurs when the characterization test fails (because of an intended change in the behavior) and that characterization test is not entirely duplicated by our stable tests.

The right thing: The "golden master" is, in effect, a prediction about what output we should see from the test subject under specific conditions. The thing that we are really doing with the test is checking that the current behavior of the system matches our current prediction.

(It just happens, by accident, that our existing prediction is that the system under test will behave the same way it did yesterday.)

So when you are intending to change the behavior, the correct thing to do is to treat it as a test-first problem: update the golden master (by hand, if necessary) so that it predicts the new behavior that you want, verify that the new prediction is not satisfied by the old implementation, patch the bug, then verify that the new prediction is satisfied by the new implementation.

Because you have made changes to the prediction by hand, the prediction acts as a check against other unintended changes in the behavior. In effect, it acts as a way to measure how well you can predict the real effects of your change.
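As a sketch of that test-first flow (the `generate_report` function and the master file path are assumptions, not taken from the question): the golden master file is edited by hand to describe the corrected output, the test is run against the old implementation to see it fail, then the bug is fixed and the test passes again.

```python
# test_golden_master.py
from pathlib import Path

from reporting import generate_report  # hypothetical system under test

GOLDEN_MASTER = Path("golden/report_2020.txt")  # hand-edited to predict the fixed output


def test_report_matches_prediction():
    # 1. Edit golden/report_2020.txt so it contains the output we *want*.
    # 2. Run this test against the old code: it should fail.
    # 3. Fix the bug and run again: it should pass.
    assert generate_report(year=2020) == GOLDEN_MASTER.read_text()
```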

In practice, you are more likely to see people fix the implementation (hopefully adding new tests for the specific change), generate a new copy of the output, and then run an audit on the differences between the old master and the new one -- when those differences are finally satisfactory, you replace the master and use the updated characterization test until you can finally retire it.
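A rough sketch of that audit step, assuming the same hypothetical `generate_report` and master file as above: regenerate the output, review a diff against the old master, and only overwrite the master once the differences are judged acceptable.

```python
# audit_master.py -- review differences before replacing the golden master
import difflib
from pathlib import Path

from reporting import generate_report  # hypothetical system under test

old_master = Path("golden/report_2020.txt")
candidate = generate_report(year=2020)

# Print a reviewable diff between the old master and the new output.
diff = difflib.unified_diff(
    old_master.read_text().splitlines(),
    candidate.splitlines(),
    fromfile="old master",
    tofile="candidate",
    lineterm="",
)
print("\n".join(diff))

# Only after the diff has been reviewed and accepted:
# old_master.write_text(candidate)
```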

VoiceOfUnreason
  • 32,131
  • 2
  • 42
  • 79
  • Upvoted. However, I have never seen a case where we were so convinced of our unit tests that it made sense to retire our regression tests. Quite the opposite: regression tests for fairly complex calculation or data processing steps always provided additional value for us, and we usually extend both - small unit tests as well as larger regression tests. – Doc Brown Apr 22 '20 at 15:38
  • ... let me add that I can especially confirm the last paragraph from practical experience. This is often a very cost-effective tactic, and it works best when the output is tailored to be "diff"-able. Of course, you need devs with the discipline not to blindly replace the old output with the new one. – Doc Brown Apr 22 '20 at 15:48
  • @DocBrown this is a special case of using regression tests. I am using them while refactoring legacy code so I make sure I do not break anything. However, not breaking anything can mean you are preserving bugs in the code. This is fine as long as your objective is refactoring, but after that you will need to fix those bugs, which will break some of those tests. In this case I believe it's more appropriate to call them "characterization" tests rather than regression tests. They characterize what the legacy code currently does, but they do not tell you whether what it does is correct. – Abdelrahman Shoman Apr 24 '20 at 10:52
  • 2
    @AbdelrahmanShoman: I know precisely what you mean, have been there myself. My point is, you should not expect that once your refactoring is done, you won't need those tests any more, whatever you call them. Long-living applications will be extended over their lifetime, and you need to expect the necessity of extending and refactoring the application in a recurring manner. It is tempting to throw away a test which doesn't work because of a bug or because of changed behaviour, but sooner or later you will regret that you did not invest this little bit of effort ... – Doc Brown Apr 24 '20 at 12:29
  • 2
    .. to bring the test up-to-date. So even if you start those tests as "characterization tests" only, over time they will become regression tests, or integration tests, or both in one. So where I disagree with this (otherwise good) answer is the sentence *You shouldn't be thinking of the characterization test as something that lives forever* - IMHO you will be better off thinking about how you can *keep* the characterization test as something which has long-lasting value. – Doc Brown Apr 24 '20 at 12:30
0

You are looking at the wrong problem. “My test failed” is not a problem, it’s a feature. “My code changed its behaviour” is the problem.

With legacy code, changing the behaviour can be a problem. Even changing behaviour by fixing a bug can be a problem. You need to carefully investigate if the change is acceptable. And then you change the behaviour and the test, or you don’t.

gnasher729
  • 42,090
  • 4
  • 59
  • 119