
Where I define Cyclomatic complexity density as:

Cyclomatic complexity density = Cyclomatic complexity / Lines of code
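To make the definition concrete, here is a rough Python sketch of how such a density could be computed for a single function. Which AST node types count as decision points is my own assumption; real tools use more refined rules.

    import ast

    # Node types counted as decision points -- an assumption; real
    # complexity tools use more refined counting rules.
    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                    ast.ExceptHandler, ast.And, ast.Or)

    def cyclomatic_complexity(func: ast.FunctionDef) -> int:
        # McCabe's CC = number of decision points + 1
        return 1 + sum(isinstance(node, BRANCH_NODES)
                       for node in ast.walk(func))

    def complexity_density(source: str) -> float:
        func = ast.parse(source).body[0]  # assumes one function in source
        loc = func.end_lineno - func.lineno + 1
        return cyclomatic_complexity(func) / loc

For example, a three-line function containing a single if has a cyclomatic complexity of 2 (one decision point plus one) and hence a density of 2/3.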

I was reading previous discussions about cyclomatic complexity, and there seems to be a rough consensus that it has mixed usefulness, so there probably isn't a strong motive for using it over a simple lines of code (LOC) metric. That is, as the size of a class or method grows beyond some threshold, the probability of defects, and of poor design choices having been made, goes up.

It seems to me that cyclomatic complexity (CC) and LOC will tend to be correlated, hence the argument for the simpler LOC metric. However, there may be outlier cases where complexity is concentrated in some region of code, i.e. some pieces of code have a higher density of execution branches than average, and I'm wondering whether that tends to be correlated with the presence of defects.

Is there any evidence for or against, or are there any experiences around the use of such a complexity density metric?

Or perhaps a better approach is to set both a LOC threshold and a CC threshold, and treat exceeding either one as bad.
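As a sketch of that dual-threshold idea (the threshold values here are illustrative assumptions, not recommendations):

    LOC_THRESHOLD = 100   # illustrative assumption
    CC_THRESHOLD = 15     # illustrative assumption

    def is_suspect(loc: int, cc: int) -> bool:
        # Flag a method that exceeds either threshold.
        return loc > LOC_THRESHOLD or cc > CC_THRESHOLD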

redcalx
  • "Whenever I hear of attempts to associate some type of code-based metric with software defects, the first thing that I think of is McCabe's cyclomatic complexity. .." ([Experiments correlating code metrics to bug density](http://programmers.stackexchange.com/a/149027/31260)) – gnat Jul 21 '15 at 22:40
  • There's probably a citation in Code Complete somewhere, but complexity metrics like this do have a (very loose) correlation with bugs. Personally, cyclomatic complexity is one of the few I've actually used, because it tells you the minimum number of unit tests required to get complete code coverage (see the sketch after these comments). – Ixrec Jul 21 '15 at 22:46
  • Use cyclomatic complexity for hot-spot detection. It tells you where learning difficulties will occur and where changes are more expensive. Density has little to do with it. – BobDalgleish Jul 22 '15 at 00:59
  • @Ixrec that would be "Code Complete, p. 457, Steve McConnell says that 'control-flow complexity is important because it has been correlated with low reliability and frequent errors'" from the answer in [Experiments correlating code metrics to bug density](http://programmers.stackexchange.com/a/161910/40980) –  Jul 22 '15 at 01:01
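To make Ixrec's comment concrete, here's a minimal sketch: a function with two decision points has a cyclomatic complexity of 3, and basis-path testing needs three test cases, one per linearly independent path.

    def classify(x: int) -> str:
        # Two decision points (if, elif), so McCabe's CC = 3.
        if x < 0:
            return "negative"
        elif x == 0:
            return "zero"
        return "positive"

    # Three tests cover the three linearly independent paths:
    assert classify(-1) == "negative"
    assert classify(0) == "zero"
    assert classify(1) == "positive"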

2 Answers


There are no good software quality metrics; at least, none are known yet. Years of research haven't produced one.

So the answer to whether any of your suggested metrics is a good metric for software quality is a disappointing "no."

There are some metrics that are reasonable indicators of bad software, but the absence of such signs doesn't make software good. Besides, those indicators are fuzzy, with lots of exceptions, so automatically rejecting bad code is impractical.

Researchers debate those metrics and correlate them with bugs, but usually those correlations are no stronger than "larger programs contain more bugs." As the Wikipedia entry on cyclomatic complexity admits:

The essence of this observation is that larger programs (more complex programs as defined by McCabe's metric) tend to have more defects. Although this relation is probably true, it isn't commercially useful.

As a result, those metrics are barely used in the commercial world - they save neither time nor money.

This should not stop researchers from looking for better ones, but so far, in my opinion, it's a theoretical exercise with barely any practical use.

During my 20 years working in commercial ICT, I've encountered two metrics used in practice, and neither can be automated:

  • "Number of paying customers satisfied," also known as "How much does it earn us?"
  • "WTFs per minute" when peers are reading your code.

Which is why this image is popular - I've seen it in multiple shops: [image: "WTFs per minute" comic]

Sjoerd
  • +1 for the graphic. In general the cyclomatic complexity is a widely adopted way to roughly gauge if your code might be prone to bugs. My use of vague language here is by design ;-) – Roman Jul 22 '15 at 00:02
  • 1
    this doesn't even attempt to address the question asked, see [answer] – gnat Jul 22 '15 at 00:18
  • 3
    @gnat It seems pretty clear to me his answer is an elaborate "no". That answers the question, even if you disagree. – Doval Jul 22 '15 at 01:13
  • 1
    @Doval - I disagree with you in that this answer is heavy on opinion but light on supporting research / evidence. And it doesn't answer the salient points of the question. The OP mentions "years of research" but doesn't cite any specifics. Likewise, no examples are provided to back the claim "there are exceptions to every one of them." Finally, the suggested metrics are ... weak and subject to even more critique than the metric in question (cyclomatic complexity). –  Jul 22 '15 at 01:21
  • 3
    @gnat The answer is stronger than the question: The question asks whether a particular metric is a good one; The answer states that no metrics are good (which includes the mentioned one). – Sjoerd Jul 22 '15 at 01:21
  • 2
    @Sjoerd - A "strong" answer needs stronger evidence than this to back it. The evidence may be there to back your claims, but you have not presented that within your answer. –  Jul 22 '15 at 01:23
  • @GlenH7 Yes, the suggested metrics are weak - that's exactly the point of my answer! If you know any better ones, let us know in an answer. – Sjoerd Jul 22 '15 at 01:23
  • 1
    Reworded my answer, made the 'no' explicit, and added a reference. – Sjoerd Jul 22 '15 at 10:49

Consider the following:

function dispatch_message (message_id, message_contents)
    if (message_id == MESSAGE_ID_1)
        FirstMessageId(message_contents).dispatch()
    else if (message_id == MESSAGE_ID_2)
        SecondMessageId(message_contents).dispatch()
    ...
    else if (message_id == MESSAGE_ID_N)
        NthMessageId(message_contents).dispatch()
    else
        raise_or_throw_an_error
    end if
end function

This function comprises 2N+5 lines of code and has a cyclomatic complexity of N+1 or N+3, depending on how you count raise_or_throw_an_error. Suppose N is 2000. The complexity density is around 1/2. What does that mean? On the other hand, a function with a line count of 4005 and a complexity of over 2000 means something.
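To spell out the arithmetic (using the N+1 count): the density is (N+1)/(2N+5), which tends to 1/2 as N grows; with N = 2000 it is 2001/4005 ≈ 0.4996.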

Despite having a complexity of 2000+, my function isn't completely terrible. (Okay, it's terrible; there are more modern ways to do this.) Surprisingly, this is a fairly common construct in very high reliability systems, and done right, it's fairly obvious what the function is doing.
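One such "more modern way", sketched here in Python (the message-id constants and handler classes are the hypothetical names from the pseudocode above), is a dispatch table. The branching moves into a data structure, so the function's cyclomatic complexity stays roughly constant no matter how many message types there are:

    # Map message ids to handler classes; the ids and handlers are the
    # hypothetical names from the pseudocode above.
    HANDLERS = {
        MESSAGE_ID_1: FirstMessageId,
        MESSAGE_ID_2: SecondMessageId,
        # ...
        MESSAGE_ID_N: NthMessageId,
    }

    def dispatch_message(message_id, message_contents):
        handler = HANDLERS.get(message_id)
        if handler is None:
            raise ValueError(f"unknown message id: {message_id}")
        handler(message_contents).dispatch()

With this shape, dispatch_message itself has a cyclomatic complexity of about 2 regardless of N.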

One problem with dividing SLOC by complexity is that there's a strong correlation between SLOC and complexity. My function is an anomaly, and all your metric will show is that it's anomalous. At the other extreme, consider a very long auto-generated function with a cyclomatic complexity of one. Such functions also aren't problematic in and of themselves. (It's the generator you need to look at, not the 40,000-line function.)

The two extremes of the SLOC-to-complexity ratio aren't the places to look for bugs; it's somewhere in the middle, and unfortunately that's where you'll find most of your functions. SLOC and complexity do give false alarms, but those false alarms are worth investigating. I don't see the same applying to your complexity-density metric: the buggiest functions will most likely be hidden among a huge number of non-alarming functions with a similar SLOC-to-complexity ratio.

David Hammen
  • Regarding generated code, see Pitrat's blog entry [The meta-bug, curse of the bootstrap](http://bootstrappingartificialintelligence.fr/WordPress3/2014/06/the-meta-bug-curse-of-the-bootstrap/) – Basile Starynkevitch Jul 22 '15 at 04:54
  • 1
    "but those false alarms are worth investigating" - Good points. You can hone in on outliers identified by any number of metrics, and you'll find auto-generated code and other 'oddities', but that still leaves you with a majority of code that isn't an outlier in any metric but might be good or bad. I do still think that very large classes and functions are correlated with poor code, they /might/ be high quality and bug free, and smaller classes/functions may be bug heavy, but I believe LoC works as a reasonably good prior estimate of quality. – redcalx Jul 22 '15 at 10:05