15

In 1977, Maurice Howard Halstead introduced his complexity measures for software systems, which included measurements of program vocabulary, program length, volume, difficulty, effort, and an estimated number of bugs in a module. According to Wikipedia, difficulty relates to how hard the program is to understand when reading or writing it, and effort can be translated into the time it takes to code an application, where Time = Effort / 18 seconds.

A measurement is useless unless the data and calculations relate to some aspect of software development. However, I haven't found any work showing that a difficulty above a certain value corresponds to a statistically significant increase in defects, or establishing a relationship between difficulty and time to read code (e.g., a difficulty of N yields an average of M hours spent understanding the code base), or analyzing whether computing Time after the fact is useful for determining quality (especially since the time to write should already have been recorded as a measurement). I'm especially interested in Halstead's bug estimation (which is not mentioned on Wikipedia): the number of bugs in an application can be estimated as Volume/3000 or Effort^(2/3)/3000.
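For reference, the formulas above can be sketched in a few lines. The operator/operand counts passed in below are hypothetical; a real tool would obtain them by tokenizing source code:

```python
# Sketch of Halstead's measures, computed from operator/operand counts.
from math import log2

def halstead(n1, n2, N1, N2):
    """n1/n2: distinct operators/operands; N1/N2: total occurrences."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    time_seconds = effort / 18   # Halstead's Stroud-number divisor
    bugs = volume / 3000         # alternative form: effort ** (2 / 3) / 3000
    return volume, difficulty, effort, time_seconds, bugs

# Hypothetical counts for a small module:
V, D, E, T, B = halstead(n1=10, n2=15, N1=40, N2=60)
```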

I'm looking for a few things:

  • Has anyone used Halstead's software complexity measures in a real-world application to assess software quality? If so, how did you apply them and did they turn out to be a useful, valid, and/or reliable measurement?
  • Is there any academic research in the form of surveys, analyses, or case studies that discuss the validity (or invalidity) of Halstead complexity measures when applied to software quality?
  • Is there any academic research in the form of surveys, analyses, or case studies that demonstrate the use of Source Lines of Code (SLOC) to compute something similar to the Halstead metrics of Volume, Difficulty, Effort, Time, and Bugs? I would suspect that Volume might just correspond to a SLOC count and Difficulty might correspond to cyclomatic complexity (and possibly other measures). I'm also well aware that measuring effort, productivity, or time in SLOC is potentially misleading.
Thomas Owens
  • You're going to have some trouble finding results in the last 15 years, since Halstead's metrics work was done more like 30-40 years ago, and the strong correlation with SLOC was discovered almost immediately. (This is from memory, from a talk by a new Ph.D. faculty candidate at UT Austin ca. 1977.) – John R. Strohm Sep 14 '11 at 01:42
  • Thanks for that. I'll update the question and refocus my search efforts on older papers. – Thomas Owens Sep 14 '11 at 02:31

3 Answers

5

Microsoft Research has done some work in this area. Check out this page: http://research.microsoft.com/en-us/people/nachin/. Though not specifically based on Halstead, Nachi and his team have done some investigation using Halstead, cyclomatic complexity, code churn, and other measures to assess relative risk and fragility when making changes in areas of code. There's also an interesting paper about how organizational effectiveness plays a big role, but that's off topic. :)

nithins
  • I'll have to read through some of those. Definitely something I'm interested in, but I am (at least right now) particularly interested in Halstead, so I'll be focusing there. I bookmarked the site so I can read it when I get some more time, but here's a +1 for the time being. – Thomas Owens Sep 13 '11 at 20:26
  • McCabe's cyclomatic complexity has been shown, on real code, to be very strongly correlated with raw SLOC, to the point that there is no incremental value whatsoever in computing it. – John R. Strohm Feb 11 '16 at 21:54
0

There are quite a few such studies. Google is your FRIEND.

Halstead's metrics fell out of favor when it was demonstrated that all of them were strongly correlated with raw SLOC (source lines of code). At that point, it becomes easier to measure SLOC and be done with it.

Here's a result from Google Books.

John R. Strohm
  • I have been Googling since before I asked this question and have yet to find any published papers or other reputable sources. Also, I fail to see how a metric related to SLOC can be poor. SLOC/time is a poor measure of productivity, but other SLOC-based metrics are usually considered valid, an example being defects/SLOC. – Thomas Owens Sep 13 '11 at 20:18
  • @Thomas: It isn't that Halstead's metrics are "related" to SLOC, it is that they are strongly correlated. Statistics 102. Saying that Y and X are strongly correlated means that Y is essentially a fixed linear function of X across datasets. When that is the case, there is no point in measuring Y if it is easier to measure X, because Y isn't really telling you anything you don't already know from X. – John R. Strohm Sep 14 '11 at 01:33
  • That makes sense. Halstead's metrics are based on the number of distinct and total operators and operands. It's common sense that a longer program will have more total operators/operands and will be more likely to have more distinct operators/operands. The metrics of Volume and Difficulty could be obtained using SLOC instead of operators/operands. However, that doesn't address the validity, applications, and meaning (or lack of meaning) of the Effort, Time, and Bugs metrics. Even when computed with SLOC instead of operators/operands, do these metrics say anything meaningful about the system? – Thomas Owens Sep 14 '11 at 02:28
  • SLOC is easier to count, and probably more useful. Estimates of SLOC are used by several cost estimation techniques, tracked in the PSP and TSP, and can be used in other metrics such as defect density. That, to me, says counting SLOC might be better than counting operators/operands. Second, and unanswered so far, is the validity of the metrics of Effort, Time, and Bugs, regardless of what measurements are used to compute them. I agree that computing them with SLOC might be better, but even if I did, would they mean anything? – Thomas Owens Sep 14 '11 at 02:30
  • @ThomasOwens: Probably not. If Effort, Time, and Bugs are all strongly correlated to SLOC, then it tells you that all programs of a given size take about the same time and effort and have the same number of bugs. The first two are what SLOC-based estimating (e.g. COCOMO) is based upon, and are like saying water is wet. The third doesn't really help you. – John R. Strohm Aug 07 '12 at 14:42
  • -1: The "Google is your FRIEND" statement isn't very friendly especially because "FRIEND" is in all caps. This essentially indicates an LMGTFY attitude, which we don't want here. Furthermore, the OP has demonstrated research effort, so such a response is not justified. See also on Meta Stack Exchange: [Ban LMGTFY (let me google that for you) links](http://meta.stackexchange.com/questions/15650) – bwDraco Oct 25 '14 at 18:39
0

That Halstead Volume is correlated with SLOC is interesting but of limited use. Basic statistics: linear correlation is not transitive. X being correlated with Y and Y being correlated with Z does NOT mean that X is correlated with Z.

user1704475
  • When X and Y are merely correlated, and Y and Z are merely correlated, yes, X and Z are not necessarily correlated, because of the relatively high noise levels in the first two correlations. When X and Y are strongly correlated, and Y and Z are strongly correlated, there is very, very little noise involved, and it becomes highly probable in any given case that X and Z will be found to be correlated. – John R. Strohm Feb 11 '16 at 21:52