The general approach to measuring these figures is:
- Establish a test plan with sufficient coverage.
- Execute the formal test plan (automated or manual tests), and record the failed tests and, where applicable, the bug reports issued after root cause analysis.
- Compare these figures with the KLOC counts, which can be computed automatically from the source code (a minimal sketch of this bookkeeping follows the list).
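
To make that concrete, here is a minimal Python sketch of the bookkeeping; the `Release` class and `defect_density` function are invented for illustration, not taken from any standard tool:

```python
# Minimal sketch of the bugs-per-KLOC bookkeeping; all names here are
# illustrative, not taken from any standard tool.
from dataclasses import dataclass

@dataclass
class Release:
    name: str
    confirmed_bugs: int   # bugs confirmed after root cause analysis of failed tests
    kloc: float           # thousands of lines of code, per your chosen counting rule

def defect_density(release: Release) -> float:
    """Bugs per KLOC for a single release."""
    return release.confirmed_bugs / release.kloc

releases = [
    Release("v1.0", confirmed_bugs=42, kloc=120.0),
    Release("v1.1", confirmed_bugs=35, kloc=128.5),
]
for r in releases:
    print(f"{r.name}: {defect_density(r):.2f} bugs/KLOC")
```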
Needless to say: with a manual, ad-hoc test approach you won't get consistent bug numbers; as you've mentioned, many bugs aren't discovered immediately. However, formal test plans with unit, integration and acceptance tests are very common for larger and mission-critical software. TDD puts even more emphasis on the tests, providing very detailed unit tests that can check and diagnose the promised functionality and all the invariants that your code is supposed to respect.
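
To give an idea of what such an invariant-checking unit test looks like, here is a small sketch; the `Account` class and its tests are entirely invented for illustration:

```python
# Hypothetical TDD-style tests checking both the promised behaviour and an
# invariant; the Account API below is invented for this example.
import unittest

class Account:
    """Toy account whose invariant is a non-negative balance."""
    def __init__(self, balance: float = 0.0):
        self.balance = balance

    def withdraw(self, amount: float) -> None:
        if amount < 0 or amount > self.balance:
            raise ValueError("invalid withdrawal")
        self.balance -= amount

class AccountTest(unittest.TestCase):
    def test_withdraw_reduces_balance(self):
        acc = Account(100.0)
        acc.withdraw(30.0)
        self.assertEqual(acc.balance, 70.0)          # promised functionality

    def test_overdraft_is_rejected_and_invariant_holds(self):
        acc = Account(10.0)
        with self.assertRaises(ValueError):
            acc.withdraw(50.0)
        self.assertGreaterEqual(acc.balance, 0.0)    # invariant: never negative

if __name__ == "__main__":
    unittest.main()
```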
There's also the question of whether the results of preventive tests run by a developer before submitting their code for integration should be counted or not. The same question applies to issues discovered in peer reviews.
The definition of a bug is also an issue. The word is overused in everyday language, and the boundary is not clear: is it a non-compliance of the code with the specification? Or does it also cover issues caused by unclear requirements? Here, standards with precise definitions, such as ISO 9126, can really help.
Finally, the KLOC is a concept that was introduced at a time when the dominant languages were line-oriented (e.g. Fortran, COBOL). So it's a real question nowadays what should count as a LOC: empty lines? Comment lines? Conditionally compiled lines? Active lines, or active instructions? Etc.
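
To make the counting question concrete, here is a small sketch; the blank/comment/code classification below is just one possible convention, not a standard:

```python
# Rough sketch showing how the chosen counting rule changes the LOC figure;
# this blank/comment/code split is one possible convention among many.
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    blank = comment = code = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith(comment_prefix):
            comment += 1
        else:
            code += 1
    return {"blank": blank, "comment": comment, "code": code}

sample = """\
# configuration module

TIMEOUT = 30  # seconds
RETRIES = 3
"""
print(count_lines(sample))   # {'blank': 1, 'comment': 1, 'code': 2}
```

Depending on whether you count only the "code" bucket, or code plus comments, or everything, the same file yields quite different LOC figures.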
All this being said, your absolute figures will of course vary depending on your precise definitions and methodology. But if you remain consistent, interesting facts may emerge when you look at the evolution of these metrics rather than at the absolute figures.
There are companies that keep statistics on a huge number of software projects and have developed predictive models that estimate the bug rate based on the evolution of the metric over the course of a project. They then use this prediction when deciding whether or not to release to market (I think I read a paper from HP on this some years ago, but I couldn't find it again). Such predictions of course only have statistical value: the fact that the model is meaningful in general doesn't prevent a particular project from completely contradicting it.
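
As a purely illustrative and much-simplified sketch of that kind of trend-based prediction (the real models are certainly far more elaborate than a straight line), assuming per-release defect densities as input:

```python
# Illustrative only: fit a least-squares line to past defect densities and
# extrapolate one release ahead. Not a real industrial prediction model.
def predict_next(densities: list[float]) -> float:
    """Project the defect density of the next release from past releases."""
    n = len(densities)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(densities) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, densities))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * n   # projected density for the next release

history = [0.35, 0.31, 0.28, 0.26]   # bugs/KLOC over the last four releases
print(f"projected for next release: {predict_next(history):.2f} bugs/KLOC")
```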
Personally, I'm not sure these prediction methods still make sense in an era of agile and TDD, where bugs are prevented in early stages and on the fly. However, I have to admit that introducing such structured metrics on subcontracted projects (i.e. software built according to well-specified requirements) made it possible to quickly spot and address reliability issues with some contractors.