14

I am curious whether there is any data on whether code coverage actually improves code quality. Are there any research studies?

If so, at what percent does it become a case of diminishing returns?
If not, why do so many people treat it as a religious doctrine?

My skepticism is anecdotal, brought on by two projects I was involved with that both implemented the same reasonably complex product. The first one just used targeted unit tests here and there; the second one had a mandated 70% code coverage. If I compare the number of defects, the second one has almost an order of magnitude more of them. The two products used different technologies and had different sets of developers, but I am still surprised.

gnat
  • 21,442
  • 29
  • 112
  • 288
AngryHacker
  • 715
  • 1
  • 6
  • 11
  • I can imagine "any 70%" including the easiest to test. So "the best 50%" could be better – Richard Tingle Nov 15 '16 at 08:27
  • 14
    As stated, "code coverage", or "writing more tests" does not change a single bit of the actual code, so by itself it cannot change code quality whatsoever. – RemcoGerlich Nov 15 '16 at 08:47
  • 1
    Do you know the difference between improving quality on the one side, and improving the metrics implemented for that purpose on the other side? Well, if you force people to do something and they are rated according to some metrics, they will improve the metrics instead... – Bernhard Hiller Nov 15 '16 at 09:46
  • 2
  • It does matter whether the code is compiled or interpreted. In Python you can get some really embarrassing bugs in production, e.g. misspelled variable names, that would have been caught by any test covering that code or by a compiler. But there are also linters that help. – RemcoGerlich Nov 15 '16 at 10:42
  • @RemcoGerlich: That has absolutely nothing to do with compilation or interpretation. You said it yourself: such bugs can happen with Python. Well, all currently existing implementations of Python have a compiler, so according to your own reasoning, such bugs should be impossible in Python. And, considering that there exist interpreters for C++, C, Haskell, those languages should be unsafe as well, right? – Jörg W Mittag Nov 15 '16 at 11:38
  • @JörgWMittag: Of course, and yet there is a clear difference between languages where such things are found at compile time and where they are found at runtime, and I think most people understand me when I describe the difference like that. – RemcoGerlich Nov 15 '16 at 11:40
  • I think that the only thing code coverage does is provide initial guidance about which parts of the code possibly require testing. But you can reach high code coverage without any testing that the code actually performs correctly, so it definitely shouldn't be the only guideline for writing tests, and alone it can't tell much about code quality. – Predelnik Nov 15 '16 at 11:50
  • 2
    http://thedailywtf.com/articles/at-least-there-s-tests – cbojar Nov 15 '16 at 12:05
  • Related: http://softwareengineering.stackexchange.com/questions/322256/time-difference-between-developing-with-unit-tests-vs-no-tests/322327#322327 – Jared Smith Nov 15 '16 at 12:10
  • 2
    @RemcoGerlich: I think the distinction is not interpreter-based versus compiler-based implementation but rather dynamically typed versus statically typed. E.g. a Haskell interpreter will type-check your code before executing it while even a Python compiler can find much fewer type errors. – Giorgio Nov 15 '16 at 12:44
  • @RemcoGerlich Come on! - there is a pretty clear assumption that bugs found will be fixed. If everything we wrote had to be protected against all pedantry, all writing would be as readable as legal contracts. I think S. J. Gould's later writing suffered from excessive pedantry-preemption. – sdenham Nov 15 '16 at 14:50

6 Answers

19

Code coverage tells you how much of your code is covered by tests; it does not tell you much about the quality of those tests. For example, a code coverage of, say, 70% might be obtained by automated tests that exercise trivial functionality like getters and setters while leaving out more important things, like verifying that some complex computation delivers correct results or handles corner cases. Even with 100% code coverage, your tests might not consider special inputs that cause the code to fail. So a relatively high code coverage does not necessarily imply that the code is well tested, and important defects may still slip past the tests, as the sketch below illustrates.
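
To make this concrete, here is a made-up minimal sketch (the function, names, and numbers are all invented for illustration): the test below executes most of the pricing logic, so a line-coverage tool reports high coverage, yet it contains no assertions and so can never fail:

import unittest

def shipping_cost(weight_kg, express):
    # Pricing logic with a corner case (non-positive weight).
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    base = 5.0 + 2.0 * weight_kg
    return base * 2 if express else base

class ShippingTest(unittest.TestCase):
    def test_shipping(self):
        # Executes the happy-path lines, so the coverage report
        # looks good -- but there are no assertions, so this test
        # passes no matter what shipping_cost returns, and the
        # ValueError corner case is never exercised at all.
        shipping_cost(1, express=False)
        shipping_cost(1, express=True)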

On the other hand, a low code coverage means that a lot of the code is not tested at all, so it can be that some important modules are not properly verified. Sometimes it makes sense to have a relatively low code coverage for automated tests, e.g. it can be more effective to click a GUI button and verify that the appropriate dialog opens (a manual test) than to write a corresponding automated test. Nevertheless, even in this scenario the combined coverage of automated and manual tests would be high.

So, IMO code coverage alone is not a good indicator of the quality of your tests because it only works in one direction:

  1. a low code-coverage score can correctly point out code that is not tested, and may be buggy or even dead code;
  2. a high code-coverage score can hide poor testing and can give you too much confidence in the quality of your code.

NOTE

Thanks to gnat for pointing me at code coverage for manual tests.

Giorgio
  • 19,486
  • 16
  • 84
  • 135
  • Do you know of any code coverage tool that can link code coverage with the cyclomatic complexity of the method? It would be interesting to be able to filter out straightforward methods and have the coverage of things where there are branches instead. – Matthieu M. Nov 15 '16 at 11:42
  • @MatthieuM. I guess this wouldn't lead to a more useful metric than coverage alone, as cyclomatic complexity doesn't necessarily correlate with intuitive complexity, and thus bug-proneness. – Cedric Reichenbach Nov 15 '16 at 11:59
  • 1
    I am confused by the mentioning of "automated tests" because as far as I know it doesn't really matter if these are automatic or manual. I've been in projects that used [combined coverage](http://softwareengineering.stackexchange.com/a/264672/31260) of all tests, including manual, and as far as I could tell this worked fairly well – gnat Nov 15 '16 at 12:51
  • @gnat: This is a good point: I have never used code coverage for manual tests and so my answer assumes code coverage only for automated tests. In fact, my argument was that what is not covered by automated tests will be covered by manual tests. This would imply that the combined code coverage is again high. Then a low code coverage would mean that a lot of code is not tested or that there is a lot of dead code. I have to think if considering code coverage for manual tests as well gives more useful information about code quality. Can I make the answer invisible while I think about it? – Giorgio Nov 15 '16 at 14:10
  • it's not accepted, so regular "delete" link should work (and "undelete" after you decide on it should work too). At 10+k rep you are even free to edit deleted post any way you want – gnat Nov 15 '16 at 14:13
  • 1
    Additionally, at least in the code coverage tools I've used, code coverage is counted if a test simply "passes through" a branch; it does not take into account whether or not a relevant assert is made. In fact, I believe that if a test method proceeded through a complicated branch of logic and made ***zero*** asserts, it would still count toward your code coverage. Code coverage can be used as one way to measure quality, but not by itself; it has to have relevant asserts to go along with the coverage – Kritner Nov 15 '16 at 15:26
13

I'm assuming you are referring to a code coverage metric in the context of unit testing. If so, I think you have already indirectly answered your own question here:

The first one just used targeted unit tests here and there; the second one had a mandated 70% code coverage. If I compare the number of defects, the second one has almost an order of magnitude more of them.

In short: no, a code coverage metric by itself does not improve the quality of a project at all.

There's also a common belief that code coverage reflects the quality of the unit tests, but it doesn't, and it doesn't tell you which parts of your system are properly tested either. It only says what code has been executed by your test suite. The only thing you know for sure is which parts of your system are not tested at all.

However, the code coverage metric may relate to overall code quality if you are sure of the quality of your unit tests. The quality of a unit test can be defined as its ability to detect a change in your code base that breaks a business requirement. In other words, every change that breaks a particular requirement (acceptance criterion) should be detected by good quality tests (such tests should simply fail). One of the simplest automated approaches to measuring the quality of your test suite, which does not require much additional effort on your side, is mutation testing.
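
To illustrate the idea, here is a minimal, made-up Python sketch (this is the concept only, not the workflow or API of any real tool such as mutmut or PIT): a mutation testing tool makes small changes ("mutants") to the production code, e.g. flipping + to -, and reruns the test suite; if the suite still passes, the mutant "survives", which shows the tests are too weak.

def price_with_tax(price, rate):
    # Production code under test.
    return price * (1 + rate)

def price_with_tax_mutant(price, rate):
    # A "mutant" a tool would generate: '+' flipped to '-'.
    return price * (1 - rate)

def suite_passes(fn):
    # A weak test suite: it only checks the rate == 0 case.
    return fn(100, 0.0) == 100

assert suite_passes(price_with_tax)         # the original passes, as expected
assert suite_passes(price_with_tax_mutant)  # the mutant survives: tests too weak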

UPDATE:

http://martinfowler.com/bliki/TestCoverage.html

  • 1
    Honestly I don't see how you can make such a bold statement without any research showing results one way or the other. It could very well be that having more tests correlates with fewer bugs, even when the tests are of lower quality. Or it could be the opposite. How would we know? – Davor Ždralo Nov 15 '16 at 12:13
  • 3
    @DavorŽdralo Just having a metric for something has no effect. A metric is a standard of measurement. Using that metric to enforce a minimum amount of tests might start to have some effect, but it's quite easy to generate 100% code coverage without actually doing any useful testing. Software quality should be a collaborative effort between everyone who is working on the product rather than an arbitrary guideline that is somehow attempted to be imposed. – Cronax Nov 15 '16 at 13:22
  • @DavorŽdralo There is no research required to say the above. It's a purely empirical observation. Even the author of the question stated (see the citation) that enforcing a high code coverage threshold did not improve the quality of the code, but good quality unit tests did. Maybe [this article](http://martinfowler.com/bliki/TestCoverage.html) will convince you :) – Marcin Kłopotek Nov 15 '16 at 13:37
  • OK, you two can pretend that making bold claims without evidence is totally cool if you want. – Davor Ždralo Nov 16 '16 at 20:57
13

As a reductio ad absurdum: the following test covers 60% of the lines of the function:

def abs(x):
    if x < 0:
        return -x
    else:
        return x

assert abs(-10) == 10

whereas in this example, we have 100% coverage:

def abs(x):
    if x < 0:
        return -x

assert abs(-10) == 10

Of course, only the latter has a bug.

RemcoGerlich
  • 3,280
  • 18
  • 22
  • 1
    Yes. That's why we need code coverage *and* requirements coverage. – Thomas Weller Nov 15 '16 at 14:46
  • @ThomasWeller: good luck getting 100% requirements coverage. Or more precisely, good luck getting a complete list of requirements to test. – RemcoGerlich Nov 15 '16 at 14:48
  • 1
    In the second example you can remove the conditional statement altogether and simply return `-x`. – Reinstate Monica Nov 15 '16 at 14:49
  • @ABoschman: I think that's what Remco meant. There should be a return statement for the positive case. – Thomas Weller Nov 15 '16 at 14:51
  • @ABoschman: the idea of the example is that removing lines that were really necessary makes the coverage go up. – RemcoGerlich Nov 15 '16 at 14:54
  • @RemcoGerlich Okay, either way your point is made. It just stood out to me because in some languages this wouldn't compile, because not all code paths return a value. – Reinstate Monica Nov 15 '16 at 14:59
  • In Python a function returns None if no explicit return statement is reached. – RemcoGerlich Nov 15 '16 at 15:04
  • If you can't get a complete list of requirements to test, how did you get a complete list of requirements to build? – JeffO Nov 15 '16 at 17:35
  • Your code will continue passing all the tests if you remove all uncovered code. Someone should write a tool that does exactly that, it would really improve compile times, maintenance cost and so on. – gnasher729 Jun 25 '19 at 21:22
2

Code coverage can help, but on its own is not a good indicator.
Where it can help is that it forces people to consciously work with the code in order to write the tests that provide that coverage, and that's likely to cause them to see potential problems and fix them.

But if the people doing this aren't actually interested in the code, they can mechanically build test code that covers everything without bothering to think about what the code actually does and whether it is correct.

As a result, it can lead to a false sense of security. But if the team is properly motivated and interested in delivering quality, it's a good way to help them find areas of the code that are suspicious and need to be looked at for potential problems.

And just counting covered lines isn't enough for that; you also need branch coverage, for example, which tests the different paths through conditional statements and all their possible outcomes, as the made-up example below shows.
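
Here is a minimal sketch (the function is invented for illustration): a single test executes every line of the function, so line coverage reports 100%, yet only one of the two outcomes of the condition is ever taken, so branch coverage is only 50%:

def apply_discount(price, is_member):
    # Members pay half price; everyone else pays full price.
    if is_member:
        price = price / 2
    return price

# This one test executes every line of the function: 100% line coverage.
assert apply_discount(100, True) == 50

# But the branch where is_member is False is never taken (50% branch
# coverage), so a bug in the non-member path would go unnoticed.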

jwenting
  • 9,783
  • 3
  • 28
  • 45
1

No. Code Coverage doesn't improve code quality.

Put simply, code coverage tells you how many of your lines of code were executed by test methods.
It does not tell you whether the results of your production code were asserted or not.

I think this cannot give you information about the quality of the production code.

If you write code in TDD style, then you don't need code coverage at all: you only ever write code that is already covered by the tests you wrote beforehand.

Fabio
  • 3,086
  • 1
  • 17
  • 25
  • Please explain the downvote - I will be glad to improve the answer – Fabio Nov 15 '16 at 12:30
  • 2
    this doesn't seem to offer anything substantial over points made and explained in prior 3 answers (not to mention [more answers in the duplicate question](http://softwareengineering.stackexchange.com/questions/192/is-test-coverage-an-adequate-measure-of-code-quality)) – gnat Nov 15 '16 at 12:32
  • @gnat - agreed, but all the answers used phrases like _what parts of your system are not **tested**_ - in my answer I want to point out that code coverage doesn't show which lines are tested - it shows which lines are **executed** - and that is where I see a big difference (given how seriously code coverage can be taken) – Fabio Nov 15 '16 at 12:36
  • @gnat - really, with this kind of explanation all the other answers could be downvoted too :) – Fabio Nov 15 '16 at 13:08
  • @Fabio: If you use TDD for all your code you may have to write long tedious tests for functionality that can be tested much more easily using a manual test. – Giorgio Nov 15 '16 at 14:49
  • @Giorgio - I strongly disagree. What do you mean by "manual test"? Is starting the whole application, with connections to a database or services, in order to test one method really easier and faster than a unit test that is written once? In addition, if your application uses a database, you need to be sure the expected values are configured there too. – Fabio Nov 15 '16 at 15:04
  • @Fabio: My typical example is GUI logic: you want to verify that when you push a button the corresponding dialog opens. Just start the application, push the button, and verify that the dialog opens. – Giorgio Nov 15 '16 at 15:10
  • @Giorgio - agree but only for UI testing. – Fabio Nov 15 '16 at 15:13
1

Was the 70% code coverage requirement put in because programmers were not writing unit tests? If so, I expect the result has more to do with the attitude of the programmers on the project than with the 70% code coverage rule.

Code coverage is a good tool to help with targeting unit tests, but only when it is used by people who believe in the benefit of unit tests and are skilled at writing them.

Ian
  • 4,594
  • 18
  • 28
  • This percent came about because Salesforce platform won't let you deploy to production unless your code has 70% code coverage. So naturally, at the last minute, developers write a lot of dumb unit tests just to meet this requirement. – AngryHacker Nov 15 '16 at 21:13
  • @AngryHacker, a unit test is only useful if it is written by someone who is trying to break the code..... – Ian Nov 15 '16 at 22:05