45

It’s a well-known fact in software engineering that the cost of fixing a bug increases exponentially the later in development that bug is discovered. This is supported by data published in Code Complete and adapted in numerous other publications.

However, it turns out that this data never existed. The data cited by Code Complete apparently does not show such a cost / development time correlation, and similar published tables only showed the correlation in some special cases and a flat curve in others (i.e. no increase in cost).

Is there any independent data to corroborate or refute this?

And if true (i.e. if there simply is no data to support this exponentially higher cost for late discovered bugs), how does this impact software development methodology?

Konrad Rudolph
  • This sounds logical, as a bug discovered later in most cases also involves data corruption. Moreover, corrupted data costs a business a lot if it is discovered later in the process of fixing the bug. – Yusubov Sep 04 '12 at 21:30
  • @ElYusubov It does indeed. But common sense can be **very** deceptive. Our minds are easily tricked by apparent logic when it’s actually the other way round. That’s why evidence-based software engineering is so important. – Konrad Rudolph Sep 04 '12 at 21:32
  • @KonradRudolph: I agree that our minds are easily tricked but the problem is there isn't much good evidence and software engineering is a huge and constantly changing field. – Guy Sirton Sep 05 '12 at 04:02
  • related: http://programmers.stackexchange.com/q/133824/41381 – Ryathal Sep 05 '12 at 12:18
  • For the record (and mentioned in my answer), the earliest mention of this I have been able to find is well before Code Complete. The work of Fagan and Stephenson (independently) in 1976 is the earliest reference to this that I can find. The first edition of Code Complete wasn't published until 1993, almost 20 years later. I would expect that the work of Barry Boehm in the 1980s led to the increase in popularity of this idea - Boehm's work was very influential to software engineering process in the 1980s and even into the late 2000s. – Thomas Owens Sep 05 '12 at 13:32
  • It's axiomatic that any statement about software engineering statistics is wrong, including this one. (The bugs you find later are generally the more complex bugs. And fixing them is complicated more by the "controls" put in place re late-term fixes.) – Daniel R Hicks Sep 06 '12 at 16:03
  • The cost vs. time of discovery would probably also be influenced by how clean the code is. If it is done right, the bug might be a simple and equally cheap thing to fix, but if the code base is a mess the fix might require many more changes throughout the system, which will be harder, more expensive, and more time-consuming the more places the effects of the bug could have reached. – LJNielsenDk Jan 25 '14 at 09:00

7 Answers

25

Does software testing methodology rely on flawed data?

Yes, demonstrably. Examining the Agile Cost of Change Curve shows that part of Kent Beck's work on XP (I'm not sure whether it was part of his motivation or his justification) was to "flatten the curve" of defect costs, based on knowledge of the "exponential" curve that lies behind the Code Complete table. So yes, work on at least one methodology - the one that did most to popularise test-first development - is at least in part based on flawed data.

Is there any independent data to corroborate or refute this?

Yes, there certainly is other data you can look to - the largest study I'm aware of is the analysis of defects done at Hughes Aircraft as part of their CMM evaluation program. The report from there shows how defect costs varied with phase for them, though the data in that report don't include variances, so you need to be wary of drawing too many "this thing costs more than that thing" conclusions. You should also notice that, independent of methodology, there have been changes in tools and techniques between the 1980s and today that call the relevance of these data into question.

So, assuming that we do still have a problem justifying these numbers:

how does this impact software development methodology?

The fact that we've been relying on numbers that can't be verified didn't stop people making progress based on anecdotes and experience: the same way that many master-apprentice trades are learned. I don't think there was a Journal of Evidence-Based Masonry during the middle ages, but a whole bunch of big, impressive and long-lasting buildings were nonetheless constructed with some observable amount of success. What it means is that we're mainly basing our practice on "what worked for me or the people I've met"; no bad thing, but perhaps not the most efficient way to improve a field of millions of people who provide the cornerstone of the current technological age.

I find it disappointing that a so-called engineering discipline doesn't have a better foundation in empiricism, and I suspect (though clearly cannot prove) that we'd be able to make better, clearer progress at improving our techniques and methodologies were that foundation in place - just as clinical medicine appears to have been transformed by evidence-based practice. That's based on some big assumptions though:

  • that the proprietary nature of most software engineering practice does not prevent enough useful and relevant data being gathered;
  • that conclusions drawn from these data are generally applicable (because software engineering is a skilled profession, personal variances in experience, ability and taste could affect such applicability);
  • that software engineers "in the field" are able and motivated to make use of the results thus obtained; and
  • that we actually know what questions we're supposed to be asking in the first place. This is obviously the biggest point here: when we talk about improving software engineering, what is it that we want to improve? What's the measurement? Does improving that measurement actually improve the outcome, or does it game the system? As an example, suppose the management at my company decided we were going to decrease the ratio between actual project cost and predicted project cost. I could just start multiplying all my cost estimates by a fudge factor and I'd achieve that "goal". Should it then become standard industry practice to fudge all estimates?
  • Awesome meta-answer about evidence-based engineering. Thanks. – Konrad Rudolph Sep 05 '12 at 08:00
  • Damn, I just realised that this is coming straight from the horse’s mouth. Hehe. Awesome. – Konrad Rudolph Sep 05 '12 at 08:07
  • I get the feeling that everyone's interpreting the use of "flawed data" as "completely untrue, the opposite is true", but I get the feeling that your position is simply to point out that it *may be* untrue. Is this correct? – Daniel B Sep 05 '12 at 08:09
  • @DanielB Correct. Show me evidence that it's _actually_ wrong and I might change my mind; until then I only know that it isn't _demonstrably_ right. –  Sep 05 '12 at 08:11
  • @GrahamLee OK, I agree with that. I do think that phrasing it as "the table is completely false" sends the wrong message though. – Daniel B Sep 05 '12 at 08:16
  • @DanielB Well at least in some case(s?) there seems to be a flat curve, i.e. a direct contradiction of the table shown. But in general I think we all agree that we don’t have sufficient data either way. – Konrad Rudolph Sep 05 '12 at 08:23
  • @KonradRudolph Yes, and the extra publicity on evidence based software engineering is certainly welcome. Personally, I just thought the counter-evidence was sufficient for saying "this doesn't seem to be true in all cases, and I can't verify the sources" as opposed to effectively claiming false play. – Daniel B Sep 05 '12 at 08:29
  • @DanielB the table does not show "that, for example, an architecture defect that costs $1000 to fix when the architecture is being created can cost $15,000 to fix during system test", which is the claim in CC2E, nor does it show "that fixing bugs at the end of a project is the most expensive way to work" which is the claim in my book. It also doesn't reflect the data presented in the citations offered by McConnell. In these senses, the table is completely false. I stand by that assertion. –  Sep 05 '12 at 08:57
  • I think the "for them" is extremely important here. There is huge difference between developing aircraft controllers where a bug can easily become really expensive safety issue and developing applications where you can dismiss many bugs as not important, because they only happen sometimes, have an easy workaround, users won't know they are bugs and such. – Jan Hudec Sep 05 '12 at 09:23
  • @JanHudec what I meant was not "everyone else gets to ignore bugs", but that their numbers are situated in their engineering teams and processes. What takes them 0.36 days to fix might not take you 0.36 days to fix, the way they do system test might not be the way you do system test, and so on. It's important to notice that they've reported on how their apples work and I don't know whether you have apples, oranges or shish kebabs. –  Sep 05 '12 at 10:35
  • @GrahamLee I see your point (just think that the phrasing might have been a bit unnecessarily aggressive). Out of curiosity, I found the Fagan paper [here](http://www-sst.informatik.tu-cottbus.de/~db/doc/People/Broy/Software-Pioneers/Fagan_hist.pdf), and it says "...allows rework ... near their origin ... 10 to 100 times less expensive than if it is done in the last half of the process". I don't see any citations near this text though. – Daniel B Sep 05 '12 at 10:49
  • @DanielB thanks, I'm planning to go through the Fagan paper (and Grady 1999) today to see what I can add to my post. –  Sep 05 '12 at 10:50
  • To follow up on the Fagan paper - it seems that the comment is lacking citations because it comes from an introduction to the rest of the paper. The rest of the paper seems to go into some detail on how they came to these conclusions. – Daniel B Sep 05 '12 at 10:57
  • IIRC, Kent Beck did not say the curve is flat after the first rise, only that it is "almost flat" but still rises slightly. – herby Sep 05 '12 at 11:56
  • @DanielB an interesting note on the Fagan method: "The coding was then done, and when the code was brought to the level of the first clean compilation, it was subjected to a code inspection, I2." In other words the process used called for all code to be written _before they even attempted to compile it_. I imagine that doesn't reflect the way you work; I certainly compile early and often (and have warnings available pre-compilation in my IDE, too). –  Sep 05 '12 at 12:25
  • @GrahamLee Good point, I'm guessing their compile times were measured in minutes and hours (not counting getting the source to the server), thus the difference in the development process. I wonder if that makes their data unusable - e.g. certain classes of "errors" may be completely trivial today, yet building the wrong product (an error in specification) is still going to cause a ripple-effect. – Daniel B Sep 05 '12 at 12:59
  • @DanielB given such a workflow even "missing semicolon" would be a bug that needs reporting and reworking (and could easily be caught early by the sort of code inspection detailed in the paper). –  Sep 05 '12 at 15:17
  • Eclipse compiles as you type, and of course a huge amount of software is written in non-compiled languages. – MebAlone Sep 13 '12 at 05:58
  • "When we talk about improving software engineering, what is it that we want to improve?" That is certainly an interesting question. What makes software "improved"? And if it is, whose lives are improved? Users? Software business owners? Software business workers? – MebAlone Sep 13 '12 at 06:04
8

For my part, the answer to "how does this impact software development methodology" is "not much".

Whether caught by the developer or the end user, whether it takes more money to fix it after it's been caught by the user or not, the fact remains that a bug has been found in the system. If caught by the developer, hopefully it's a quick fix. The same hope holds for bugs caught by the user.

Regardless of actual developer-hour cost to fix a bug caught by an end user, there is the intangible cost of maintaining the stereotype that coders suck at what they do. When a user finds a bug, it's the developer's fault. Therefore, each bug the end user finds reduces the user's confidence in the system. It's like touring a home you want to buy, and seeing a water stain showing through the ceiling in one corner of the house. That, in itself, is an easy fix, but you wonder what caused it, and what else that root cause may have affected. What's your peace of mind worth? You may have to tear the walls down back to the studs and visually inspect everything to ensure the stain was from an isolated incident that has been fixed. Knowing that might be a possibility doesn't make you very confident in the home. Similarly, if the developer who wrote your software missed this very obvious (to you) cosmetic flaw, what's wrong deeper down?

These intangible costs are avoided the sooner the bug is caught, which is the stated purpose of TDD-style methodologies. A bug caught during typing by the developer or partner in a pair, one caught at compile time, or one caught by unit/integration testing (the layer added by TDD), is a bug that the user never has to know about, that your project manager never has to apologize for, and that you don't have to be pulled off of whatever you're doing right this second to switch gears into defect-fixing mode on a part of the system you thought you'd left behind weeks ago.

KeithS
  • Interesting answer. I try to have my users understand that development is an iterative process of both refining and improving. I can give them something very quickly, and if they find problems or want improvements I can turn those changes around very quickly, too (minutes or hours, not days or weeks). The system becomes more stable over time, but they come away with a trust in the development process and end result, rather than the specification process and first build. (Of course it depends on the environment that you work in - I'm writing line-of-business apps, so if they break, they get fixed.) – Kirk Broadhurst Sep 04 '12 at 23:41
  • Unfortunately, the original evidence - that requirements errors found when the product is fielded are the more costly than implementation errors found when the product is fielded - implies a need for better validation, not better verification. TDD - using testing to verify the product against the requirements - is simply not relevant to finding these bugs. – Pete Kirkham Jan 18 '13 at 11:43
6

I'm going to preface this with the fact that most of what I'm finding comes from the 1970s and early 1980s. During this time, sequential process models were far more common than iterative and/or incremental approaches (the Spiral model or the agile methods). Much of this work is built on these sequential models. However, I don't think that destroys the relationship, since one of the benefits of iterative/incremental approaches is to release features (an entire vertical slice of an application) quickly and correct problems in them before many dependencies are introduced and the complexity of each phase is high.


I just pulled out my copy of Boehm's Software Engineering Economics and found a reference to the data behind this chart in Chapter 4. Boehm cites "Design and Code Inspections to Reduce Errors in Program Development" by M.E. Fagan (IEEE, PDF from UMD), E.B. Daly's "Management of Software Engineering", W.E. Stephenson's "An Analysis of the Resources Used in Safeguard System Software Development" (ACM), and "several TRW projects".

...the relative cost of correcting software errors (or making other software changes) as a function of the phase in which the corrections or changes are made. If a software requirements error is detected and corrected during the plans and requirements phase, its correction is a relatively simple matter of updating the requirements specification. If the same error is not corrected until the maintenance phase, the correction involves a much larger inventory of specifications, code, user and maintenance manuals, and training material.

Further, late corrections involve a much more formal change approval and control process, and a much more extensive activity to revalidate the correction. These factors combine to make the error typically 100 times more expensive to correct in the maintenance phase on large projects than in the requirements phase.

Boehm also looked at two smaller, less formal projects and found an increase in cost, but far less significant than the 100 times identified in the larger projects. Given the chart, the difference appears to be 4 times greater to fix a requirements defect after the system is operational than in the requirements phase. He attributed this to the smaller inventory of items that comprise the project and to the reduced formality, which made it possible to implement simpler fixes faster.

Based on Boehm in Software Engineering Economics, the table in Code Complete is rather bloated (the low end of the ranges is often too high). The cost to make any change within the same phase is indeed 1. Extrapolating from Figure 4-2 in Software Engineering Economics, a requirements change should cost 1.5-2.5 times as much to make in architecture, 2.5-10 in coding, 4-20 in testing, and 4-100 in maintenance. The amount depends on the size and complexity of the project as well as the formality of the process used.
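To make those ratios concrete, here is a minimal sketch (Python). The multiplier ranges are the ones extrapolated above from Figure 4-2; the $1,000 baseline is a made-up number purely for illustration.

    # Illustrative only: the phase multipliers below are the ranges quoted above
    # (extrapolated from Boehm's Figure 4-2); the baseline dollar figure is hypothetical.
    BASELINE = 1000  # assumed cost to fix a requirements defect in the requirements phase

    # (low, high) cost multipliers relative to fixing the defect in the phase it was introduced
    MULTIPLIERS = {
        "requirements": (1, 1),
        "architecture": (1.5, 2.5),
        "coding": (2.5, 10),
        "testing": (4, 20),
        "maintenance": (4, 100),
    }

    for phase, (low, high) in MULTIPLIERS.items():
        print(f"{phase:>12}: ${BASELINE * low:,.0f} to ${BASELINE * high:,.0f}")

The spread within each phase is the point: project size and process formality decide whether you land near the low or the high end of each range.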


Appendix E of Barry Boehm and Richard Turner's Balancing Agility and Discipline contains a small section on the empirical findings regarding the cost of change.

The opening paragraphs cite Kent Beck's Extreme Programming Explained, quoting Beck. Beck says that if the cost of changes rose slowly over time, decisions would be made as late as possible and only what was needed would be implemented. This is known as the "flat curve", and it is what drives Extreme Programming. However, what previous literature found was the "steep curve", with small systems (<5 KSLOC) having a cost-of-change ratio of 5:1 and large systems a ratio of 100:1.

The section cites the University of Maryland's Center for Empirically Based Software Engineering (sponsored by the National Science Foundation). They performed a search of available literature and found that the results tended to confirm a 100:1 ratio, with some results indicating a range of 70:1 to 125:1. Unfortunately, these were typically "big design up front" projects and managed in a sequential manner.

There are samples of "small commercial Java projects" run using Extreme Programming. For each story, the amount of effort in error fixing, new design, and refactoring was tracked. The data show that as the system is developed (more user stories are implemented), the average effort tends to increase at a non-trivial rate. Effort in refactoring increases by about 5% and effort in error fixing by about 4%.

What I'm learning is that system complexity plays a great role in the amount of effort needed. By building vertical slices through the system, you slow down the rate of the curve by adding complexity gradually instead of adding it in piles. Rather than dealing with the mass of complexity of the requirements, followed by an extremely complex architecture, followed by an extremely complex implementation, and so on, you start very simply and add on.

What impact does this have on the cost to fix? In the end, perhaps not much. However, it does have the advantage of allowing for more control over complexity (through the management of technical debt). In addition, the frequent deliverables often associated with agile mean that the project might end sooner - rather than delivering "the system", pieces are delivered until the business needs are satisfied or have changed so drastically that a new system (and therefore a new project) is needed.


Stephen Kan's Metrics and Models in Software Quality Engineering has a section in Chapter 6 about the cost effectiveness of phase defect removal.

He starts off by citing Fagan's 1976 paper (also cited in Software Engineering Economics) to state that rework done in high level design (system architecture), low-level design (detailed design), and implementation can be between 10 and 100 times less expensive than work done during component and system level testing.

He also cites two publications, from 1982 and 1984, by Freedman and Weinberg that discuss large systems. The first is "Handbook of Walkthroughs, Inspections, and Technical Reviews" and the second is "Reviews, Walkthroughs, and Inspections". The application of reviews early in the development cycle can reduce the number of errors that reach the testing phases by a factor of 10. This reduction in the number of defects leads to reduced testing costs by 50% to 80%. I would have to read the studies in more detail, but it appears that the cost also includes finding and fixing the defects.

A 1983 study by Remus, "Integrated Software Validation in the View of Inspections/Review", studied the cost of removing defects in different phases, specifically design/code inspections, testing, and maintenance, using data from IBM's Santa Teresa Laboratory in California. The cited results indicate a cost ratio of 1:20:82. That is, a defect found in design or code inspections has a cost-to-change of 1. If the same defect escapes into testing, it will cost 20 times more. If it escapes all the way to a user, it will multiply the cost-to-correct by up to 82. Kan, using sample data from IBM's Rochester, Minnesota facility, found the defect removal cost for the AS/400 project to be similar, at 1:13:92. However, he points out that the increase in cost might be due to the increased difficulty of finding a defect.

Gilb's 1993 ("Software Inspection") and 1999 ("Optimizing Software Engineering Specification and Quality Control Processes") publications on software inspection are mentioned to corroborate the other studies.


Additional information might be found in Construx's page on Defect Cost Increase, which provides a number of references on the increase in defect-repair cost. It should be noted that Steve McConnell, author of Code Complete, founded and works for Construx.


I recently listened to a talk, Real Software Engineering, given by Glenn Vanderburg at Lone Star Ruby Conference in 2010. He's given the same talk at Scottish Ruby Conference and Erubycon in 2011, QCon San Francisco in 2012, and O'Reilly Software Architecture Conference in 2015. I've only listened to the Lone Star Ruby Conference talk, but the talk has evolved over time as his ideas were refined.

Vanderburg suggests that all of this historical data is actually showing the cost to fix defects as time progresses, not necessarily as a project moves through phases. Many of the projects examined in the previously mentioned papers and books were sequential "waterfall" projects, where phase and time moved together. However, a similar pattern would emerge in iterative and incremental projects - if a defect was injected in one iteration, it would be relatively inexpensive to fix in that iteration. However, as the iterations progress, lots of things happen - the software becomes more complex, people forget some of the minor details about working in particular modules or parts of the code, requirements change. All of these will increase the cost of fixing the defect.

I think that this is probably closer to reality. In a waterfall project, the cost increases because of the amount of artifacts that need to be corrected due to an upstream problem. In iterative and incremental projects, the cost increases because of an increase in complexity in the software.

Thomas Owens
  • @AndresF. one of the problems I found in tracking down these citations is what Bossavit described as the "needle in a haystack" problem in the book you linked to. Citing a book is great obfuscation - even if it's still in print when you go to read the citation, you've got a few hundred pages to read looking for the little nugget that backs the citing author's claim. –  Oct 02 '12 at 15:22
3

It's just simple logic.

An error is introduced in the spec.

Case : Error found while reviewing UseCase/Function spec.
Actions:
        Rewrite paragraph in error.

Case : Error found during unit test.
Actions:
        Fix code.
        (Possibly) rewrite paragraph in spec.
        Rerun unit test.

Case : Error found in integration test.
Actions:
        Fix code.
        (Possibly) rewrite paragraph in spec.
        Rerun unit test.
        Build new release.
        Rerun integration test for whole system.

Case : Error found in UAT.
Actions:
        Fix code.
        (Possibly) rewrite paragraph in spec.
        Rerun unit test.
        Build new release.
        Rerun integration test for whole system.
        Deploy release to UAT.
        Rerun UAT tests.

Case : Error found in production.
Actions:
        Fix code.
        (Possibly) rewrite paragraph in spec.
        Rerun unit test.
        Build new release.
        Rerun integration test for whole system.
        Deploy release to UAT.
        Rerun UAT tests.
        Deploy release to production.

As you can see, the later the error is detected, the more people are involved and the more work has to be redone, and in any "normal" environment the paperwork and bureaucracy increase exponentially once you hit UAT.

This is all without including the costs a business could incur due to an error in production software (lost sales, over-ordering, hacked-off customers, etc.).

I don't think anyone has ever managed to write a non-trivial system which never had bugs in production, but anything you can do to catch bugs early will save you time and effort in the long run. Specification reviews, code reviews, extensive unit testing, using different coders to write the tests, and so on are all proven methods of catching bugs early.

James Anderson
  • This only covers one case: error detected in spec, i.e. an error that is introduced at the very beginning. But errors can be introduced in all stages of development (including post-deployment bug fixing) and fixing *those* errors will be considerably easier because they will probably influence a smaller part of the system. – Konrad Rudolph Sep 05 '12 at 06:54
  • But the problem is bug fixes can have unexpected side effects, so unless you can absolutely guarantee the fix will only affect a particular subset of components you are stuck with redoing SIT, UAT, etc. Also the paper trail remains equally burdensome no matter how small the change. – James Anderson Sep 05 '12 at 08:23
  • I’m still not convinced that this shows that bugs will always be more expensive to fix when discovered late. I’d claim that a bug gets more expensive to fix with the time passing *after its introduction*. I.e. a bug introduced late, discovered soon after and fixed is cheaper than a bug introduced very early and discovered early (but with a longer delay than in the first case). At least I could imagine that this is how it works. – Konrad Rudolph Sep 05 '12 at 08:25
  • @KonradRudolph Could you elaborate? This post is pretty much my understanding as well, and I'm not seeing why time would matter but phase doesn't. To me, the measure of time in a project is your current phase (and sometimes your timeboxed iteration to go through all phases). I don't see the difference between work done in Day 3 of detailed design and Day 300 - the detailed design work product has not been used to make any other work products, so a defect injected in detailed design only exists in one place and only requires a change there. I don't see how the passage of days matters. – Thomas Owens Sep 05 '12 at 10:28
  • @Thomas I’m only hypothesising. But time matters because *most* pieces of code or spec features introduced will influence more components as time goes by, unless they are highly specialised and nothing else will ever depend on them, either directly or indirectly. So a bug that’s around for long, regardless of which phase it’s introduced in, will potentially influence many parts of the system and its removal requires making sure that no other component is broken by that process. – Konrad Rudolph Sep 05 '12 at 11:41
  • @KonradRudolph I see where you're coming from with that. I'll see if any papers mention this when I'm researching my answer, but it appears that much of the research is done under a waterfall-like methodology. The idea of "defect found in system test" means that "implementation is 100% done", and this is how these figures came out. I think that an iterative/incremental approach adds another layer on top of the increasing cost of the system, which is the increasing complexity (and therefore dependencies), which is indeed mentioned in my answer already. – Thomas Owens Sep 05 '12 at 11:49
  • @Konrad -- I really do get the point about the time lapse. If enough time has lapsed the programmers have forgotten the intricacies of the code, the analysts have only vague memories of what the feature was for so the "fix" takes much longer. – James Anderson Sep 07 '12 at 01:35
2

I believe this is, and has always been, about risk management and economics: what is the cost of reducing the number of defects vs. the present value of the impact of future defects? The trajectory of the yellow bird being slightly off in Angry Birds does not equate to the trajectory of a Tomahawk cruise missile being off. Software developers in either project can't make decisions based on that table. In this regard, nothing changes.
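As a minimal sketch of that trade-off (Python; every number below is hypothetical and is not taken from any study cited here):

    # Hypothetical numbers only: weigh the cost of extra defect-prevention effort
    # against the expected (probability-weighted) cost of the defects it would prevent.
    extra_prevention_cost = 20_000    # e.g. additional inspections and testing
    defects_prevented = 10            # defects that would otherwise reach the field
    probability_defect_matters = 0.3  # chance a shipped defect actually hurts anyone
    cost_per_field_defect = 15_000    # average impact if it does (support, rework, lost sales)

    expected_field_cost = defects_prevented * probability_defect_matters * cost_per_field_defect
    print(f"expected field cost avoided: ${expected_field_cost:,.0f}")
    print(f"extra prevention cost:       ${extra_prevention_cost:,.0f}")
    print("worth it" if expected_field_cost > extra_prevention_cost else "not worth it")

The same arithmetic with an Angry Birds-sized cost_per_field_defect gives one answer and with a cruise-missile-sized one gives the opposite, which is why no single cost table can make this decision for a project.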

The way I think this tends to work is via feedback: expensive bugs in the field cause companies to tighten their quality processes, while no complaints from the field cause companies to relax them. So over time software development companies will tend to converge or oscillate around something that works for them (+/-). Code Complete may influence some initial values or may pull companies slightly one way or another. A company that spends too much effort removing defects that no one would notice is probably going to lose business to a competitor that has a more optimized approach. On the other hand, a company releasing buggy products will also go out of business.

Some relevant papers from a quick search (read the complete papers, do more research, and form your own opinion):

A Systematic Literature Review of Software Quality Cost Research (2011)

"While the community has thus developed a sound understanding of the research domain’s structure, empirical validation is often lacking. Only about a third of the analyzed articles presents a case study or more extensive empirical results. This appears to be insufficient for software quality cost research, which strongly relies on quantitative data to generate new findings. There is thus a need for novel approaches to gather quality cost data, as well as stronger cooperation between industry and research to make such data available."

Evaluating the Cost of Software Quality (1998)

"Finally, we have seen that it is important to monitor software conformance and nonconformance costs so that conformance policies can be adjusted to reduce the total costs of software quality."

The cost behavior of software defects (2004)

Abstract ... "The current research attempts to update our knowledge of the way in which defects and the expense of correcting them (or alternatively, leaving them uncorrected) influences the final cost of software" ... "uncorrected defects become exponentially more costly with each phase in which they are unresolved"

Test Coverage and Post-Verification Defects: A Multiple Case Study (2009)

"We also find that the test effort increases exponentially with test coverage, but the reduction in field problems increases linearly with test coverage. This suggests that for most projects the optimal levels of coverage are likely to be well short of 100%."

Bridge the Gap between Software Test Process and Business Value: A Case Study (2009)

Guy Sirton
0

I cannot answer your first part of the question, as I simply have not checked. But I can formulate an answer to your second question, and perhaps hint at a possible answer to the first.

It should go without saying that some of the most important factors in the cost of fixing a bug, barring intrinsically hard-to-use development tools, are the intrinsic complexity of the product and how well the user can understand that product.

Focusing for a second on code, under the assumption that code is typically written and maintained by developers capable of dealing with the intrinsic complexities of their code (which may not be entirely true and may deserve its own debate), I would dare suggest that of critical importance in maintenance, and thus in fixing bugs, is the maintainers' ability to understand said code.

The ability to understand code is greatly enhanced by the use of proven software engineering tools that are, unfortunately, mostly under-used or improperly used. Using the right level of abstraction, modularity, enhancing module cohesion, and reducing module coupling are critical tools for coping with complexity, and they need proper use. Coding to interfaces, avoiding (in OOP) the over-use of inheritance in favor of composition, and packaging by feature are some of the techniques that are often given insufficient attention in coding.
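For instance, here is a minimal sketch (Python, with invented names used purely for illustration) of two of those techniques: coding to an interface and favoring composition over inheritance.

    # Illustrative only: all names below are made up for the example.
    from typing import Protocol

    class PaymentGateway(Protocol):
        """The interface callers depend on, rather than a concrete class."""
        def charge(self, cents: int) -> bool: ...

    class AcmeGateway:
        def charge(self, cents: int) -> bool:
            return True  # real call elided

    class CheckoutService:
        """Composes a gateway instead of inheriting from one."""
        def __init__(self, gateway: PaymentGateway) -> None:
            self._gateway = gateway

        def checkout(self, cents: int) -> bool:
            return self._gateway.charge(cents)

    # Swapping the gateway (e.g. for a test double) needs no change to CheckoutService.
    print(CheckoutService(AcmeGateway()).checkout(500))

Because CheckoutService only knows about the interface, a maintainer can understand and change it without reading every gateway implementation - exactly the kind of coupling reduction described above.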

I believe that the realities of competition in the industry put a negative force on the employment of quality enhancing methods to developing software, keeping low the intrinsic quality of software as a measure of ongoing success.

Consequently, I believe that, in the industry, software tends to suffer more from bug-fixing costs the bigger it grows. In such products, bugs become harder to fix over time because the system becomes harder to understand as it grows. The concerns introduced by each feature are overly coupled with other concerns, making the system hard to understand. Or the right level of abstraction was not employed, making it hard for the maintainer to formulate a proper model of the system and reason about it. Lack of documentation certainly doesn't help.

There are exceptions. I'm sure Google isn't functioning at its pace without some solid practices upheld by stellar developers. And others are likely in the same situation. But for a majority of the software, I wouldn't be surprised if the data did in fact confirm the claim in Code Complete.

Mihai Danila
  • I stand by my answer even with the negative rating. I recently interviewed a candidate who maintains the online banking tool of one of the top banks. During casual chat, he suggested not to use it, because of heavy copy-paste reuse and otherwise shoddy structure. At a previous job, I was a developer at a company writing analysis tools for banks like Lehman, MS, UBS, and we had to act as domain experts, figuring out the next thing to put on the else branch from at most sparse documentation. Even if in disagreement with the specific practices, the overall message re: industry is true. – Mihai Danila Sep 05 '12 at 13:40
-1

Another answer! This time to address the title question, "Does software testing methodology rely on flawed data?"

The real answer is "there is no data". As in, there is no large, reliable body of data on software projects: their defects, successes, time to market, etc.

All attempts to gather such data have been underfunded, statistically flawed, or so specific to a particular project that it is not possible to derive general conclusions from them.

Furthermore, I do not think there ever will be; the software development process is too subjective and slippery for strict measurement. The organizations best placed to gather such data (the large software houses and systems integrators) know in their hearts that any figures gathered from their performance would be deeply embarrassing.

The only organizations that publish numbers on the cost and success of software projects are government departments, and only then because they have to - and yes, these numbers are deeply embarrassing no matter how much they massage the figures.

So, in conclusion, all software studies are necessarily purely subjective, because there is no real data on which to base an objective conclusion.

James Anderson
  • Nah, I don’t buy this. First off, there *is* data although you may be right that it’s flawed. But this requires an individual critique of each data set, not a general dismissal. And I’m deeply suspicious of the contention that there will never be data, and of such reasons as “it’s too subjective”. That’s essentially an [argument from lack of imagination](http://en.wikipedia.org/wiki/Argument_from_ignorance). I don’t pretend that gathering reliable statistics here is easy, but I contend that it’s entirely feasible. In other fields, way more complicated systems are successfully analysed. – Konrad Rudolph Sep 13 '12 at 06:53
  • @Konrad -- take something basic and simple like "defect count", some shops count unit test failures, some shops don't start tracking defects till UAT, some shops only track defects in the code, some shops include documentation, configuration, and deployment scripts in their defect tracking process. Does having the wrong background color count as a defect? Some projects will track it as a defect others will ignore it. – James Anderson Sep 14 '12 at 01:43
  • Those are all parochial – i.e. solvable – problems. They don’t place fundamental constraints on what’s possible, they just add difficulties that require solving. – Konrad Rudolph Sep 14 '12 at 09:00