
I'm looking for references about hypothesis testing in software management. For example, we might wonder whether "crunch time" leads to an increase in defect rate - testing this turns out to be surprisingly difficult.

There are many questions on how to measure quality - that isn't what I'm asking. And there are books like Kan which discuss various quality metrics and their utility. I'm not asking about that either. I want to know how one applies these metrics to make decisions.

E.g. suppose we decide to go with critical errors / KLOC. One of the problems we'll have to deal with is that this is not a normally distributed data set (almost all patches have zero critical errors). And further, it's not clear that we really want to examine the difference in means. So what should our alternative hypothesis be?
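One way to sidestep the normality assumption entirely is a permutation test on the raw per-patch counts: it makes no distributional assumptions, and the test statistic can be whatever comparison we actually care about. A minimal sketch, with made-up defect counts purely for illustration:

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical critical-error counts per patch; zero-inflated, as in practice.
normal_period = [0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0]
crunch_period = [0, 1, 0, 3, 0, 2, 0, 0, 4, 1, 0, 2]

# Test statistic: difference in mean defects per patch (crunch - normal).
observed = mean(crunch_period) - mean(normal_period)

# Under H0 the period labels are exchangeable, so shuffle the pooled data
# and see how often a difference at least this large arises by chance.
pooled = normal_period + crunch_period
n = len(normal_period)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[n:]) - mean(pooled[:n]) >= observed:
        extreme += 1
p_value = extreme / trials

print(f"observed difference = {observed:.2f}, one-sided p ~ {p_value:.3f}")
```

Swapping the mean for a median, a trimmed mean, or the proportion of patches with any critical error changes the alternative hypothesis without changing the machinery, which is one answer to "what should the statistic be": pick the comparison that matters and let the permutation distribution calibrate it.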

(Note: Based on previous questions, my guess is that I'll get a lot of answers telling me that this is a bad idea. That's fine, but I'd request that it's based on published data, instead of your own experience.)

Xodarap

1 Answer


I hope this answer is not too basic, but a simple yet, I believe, effective method for evaluating metrics is the control chart. This chart shows an expected performance range, and when the metric goes outside it, that is easy to spot. Informally, you could plot defects per hundred committed changes against day within a 28-day sprint, which would show whether changes early in the sprint were relatively defect-free while things turned ugly at the end. It might also be interesting to compare before-and-after-Agile versions of similar plots for days after the deadline vs. days before the deadline.
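For defect counts per fixed-size batch of commits, the classic c-chart limits (mean plus or minus three times the square root of the mean) are a reasonable starting point. A minimal sketch, with invented per-batch counts:

```python
from math import sqrt
from statistics import mean

# Hypothetical defects observed in successive batches of 100 committed changes.
defects_per_batch = [3, 1, 4, 2, 0, 3, 2, 5, 1, 2, 3, 12]

# c-chart: center line is the mean count; limits are +/- 3 * sqrt(mean),
# with the lower limit floored at zero since counts cannot be negative.
c_bar = mean(defects_per_batch)
ucl = c_bar + 3 * sqrt(c_bar)
lcl = max(0.0, c_bar - 3 * sqrt(c_bar))

out_of_control = [(i, c) for i, c in enumerate(defects_per_batch)
                  if c > ucl or c < lcl]

print(f"mean = {c_bar:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
print("out-of-control batches:", out_of_control)
```

The three-sigma limits assume the counts are roughly Poisson within a batch, which is itself an assumption worth checking against real data; the point of the sketch is only to show how mechanically the "expected range" gets drawn.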

DeveloperDon
  • Not at all too basic. I do think that aggregating per hundred committed changes like you said is a good idea (because of CLT it becomes more normal), but significance on your aggregated data set is different from significance on your original data set, in a way which is hard to determine (for me at least). So going above the UCL doesn't mean what you might think it means... – Xodarap Nov 12 '12 at 17:32
  • What happens if you have a data point per day (or hour) that shows the defects/commits during the time period. Perhaps this is not a realistic thing to measure because you need to tie the defect to the commit and except for an initial run of a unit test, there might not be a good way to do this. – DeveloperDon Nov 13 '12 at 03:57