Are there any actual case studies on rewrites of software success/failure rates?

Question

I've seen multiple posts about rewrites of applications being bad, people's experiences with them here on Programmers, and an article I've read by Joel Spolsky on the subject, but no hard evidence or case studies. Other than the two examples Joel gave and some other posts here, what do you do with a bad codebase, and how do you decide what to do with it based on real studies?

As a case in point, there are two clients I know of that both have old legacy code. They keep limping along with it because, as one of them found out, a rewrite was a disaster: it was expensive and didn't really improve the code much. That customer has some very complicated business logic, as the rewriters quickly found out.

In both cases, these are mission-critical applications that bring in a lot of revenue for the company. The one that attempted the rewrite felt that they would hit a brick wall at some point in the future if the legacy software didn't get upgraded. To me, that kind of risk warrants research and analysis to ensure a successful path.

Have there been actual case studies that have investigated this? I wouldn't want to attempt a major rewrite without knowing some best practices, pitfalls, and successes based on actual studies.

Aftermath: okay, after more searching, I did find three interesting articles on case studies:

Rewrite or Reuse: a study of a Cobol app that was converted to Java.
Software Reuse: Developers Experiences and Perceptions.
Reuse or Rewrite: another study, on the costs of maintenance versus a rewrite.

I recently found another article on the subject: The Great Rewrite. There the author seems to hit on some of the major issues. One idea in it was to prototype with the proposed new technology stack and measure how quickly the devs picked it up, all as a prelude to a rewrite, which I thought was a great idea!

I don't know about case studies, but I think the standard answer for an approach is probably "add unit tests and refactor." — Jerry Coffin, Mar 27 '12 at 16:42
Yes, but if it's a major revenue producing application, do you really want to risk your job on just that? Almost everyone involved in the rewrite disaster is gone from that company and upper management still has a bad taste in their mouth over it. — , Mar 27 '12 at 16:47
It strikes me as quite possibly the lowest risk approach available. Of course, I don't know enough details to be certain it's the right approach, but based on what I know so far, I don't know of much that's particularly likely to be a lot better. — Jerry Coffin, Mar 27 '12 at 16:51
If they do unit tests (properly -- truly capturing all the requirements) it's difficult to come up with a way for refactoring to result in a true disaster. About the worst that can happen is that you don't make as much progress as fast as you'd like. At the same time, if their code base is in as bad of shape as the question implies, chances are pretty good that *serious* effort will be needed to enable progress, regardless of the route taken. — Jerry Coffin, Mar 27 '12 at 17:01
In retrospect I'm convinced. The complexity of the app won't change that. That is the correct answer. That gets to the root of my question. Thanks! — , Mar 27 '12 at 17:10
So many projects are private that it would be difficult for a study to have a really good sample set. You may be limited to anecdotal evidence. — mike30, Mar 27 '12 at 17:14
Why do you think that rewriting business software would be any different in terms of risk than creating a new system (assuming factors such as technology maturity and staff competency are the same in both cases)? — NoChance, Mar 27 '12 at 17:53
You can find some very similar question here on PSE like this one: http://programmers.stackexchange.com/questions/6255/have-you-ever-been-involved-in-a-big-rewrite or this one: http://programmers.stackexchange.com/questions/6268/when-is-a-big-rewrite-the-answer — Doc Brown, Mar 27 '12 at 20:37
The risk is that nobody really knows or can state why the legacy stack actually works well, so all the unit tests will test the wrong things. The refactor will pass all the tests and fail due to something else that nobody thought was important, or that maybe was even thought to be a bug to be "fixed". Multiply by magnitudes. Add to that the temptation to 100% "do it right" this time. Many companies are rumored to have hidden their failures from any case study visibility. — hotpaw2, Mar 28 '12 at 03:16
Emmad, because as I said these were revenue producing legacy apps! One of my favorite sayings is: "You don't test code in production!", but I know of two cases where that happened. Sometimes subtle bugs show up at the worst time! — , Mar 30 '12 at 19:53
The problem isn't rewriting the entire codebase. The problem is wanting to rewrite the whole damn thing all at once and then press a button and *plong* all your headaches have gone away. In most cases, you can't afford the time lost to your competitors adding new features, bugs being left to fester, and customers being left to cancel. However much of a complete disaster a codebase is, it should be possible to identify pain-points and modularize the right pieces of the spaghetti out as you go. — Erik Reppen, Jun 15 '13 at 03:20

score 6 · Answer 1 · edited Mar 30 '12 at 16:15

I skimmed Working Effectively with Legacy Code by Michael Feathers a while back and found it offered some good insights into the real-world practice of maintaining legacy code, including writing tests (even when you don't know what the code was for) and of course refactoring/rewriting. It's a bit dated but highly rated on Amazon.

answered Mar 27 '12 at 22:27 by Will · edited Mar 30 '12 at 16:15 by svick

score 6 · Accepted Answer · edited Mar 28 '12 at 21:02

I can't take credit for these great comments, but they were never put into an answer by the original author, so I'm marking it community wiki.

I don't know about case studies, but I think the standard answer for an approach is probably "add unit tests and refactor."

It strikes me as quite possibly the lowest risk approach available. Of course, I don't know enough details to be certain it's the right approach, but based on what I know so far, I don't know of much that's particularly likely to be a lot better.

If they do unit tests (properly -- truly capturing all the requirements) it's difficult to come up with a way for refactoring to result in a true disaster. About the worst that can happen is that you don't make as much progress as fast as you'd like. At the same time, if their code base is in as bad of shape as the question implies, chances are pretty good that serious effort will be needed to enable progress, regardless of the route taken.
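
As a rough illustration of "add unit tests and refactor", here is a minimal sketch of a test that pins down what legacy code does today before anyone touches it. Everything in it is hypothetical: `legacy_shipping_fee` stands in for whatever tangled routine you actually have, and the recorded values are assumed to come from running the existing code, not from a spec.

```python
import unittest


def legacy_shipping_fee(order_total, country):
    """Hypothetical stand-in for legacy code whose rules nobody fully remembers."""
    fee = 5.0
    if country != "US":
        fee += 10.0
    if order_total > 100:
        # Looks like a bug (free international shipping?), but customers may rely on it.
        fee = 0.0
    return fee


class ShippingFeeCharacterization(unittest.TestCase):
    """Pins what the code does today, not what anyone thinks it should do."""

    # Expected values were recorded by running the legacy code itself.
    RECORDED = [
        ((50, "US"), 5.0),
        ((50, "DE"), 15.0),
        ((150, "US"), 0.0),
        ((150, "DE"), 0.0),
    ]

    def test_behavior_is_unchanged(self):
        for args, expected in self.RECORDED:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_fee(*args), expected)
```

Run it with `python -m unittest` before and after each refactoring step. If a recorded value changes, that is a decision for the business to make, not something to "fix" silently.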

I would assume that rewrite projects aren't significantly different in failure rates from projects in general, and would refer to the latest CHAOS report for the best info.

code that needs to be rewritten usually is written in a way that is untestable (tight coupling, classes that do too much, etc.) which puts you in the odd position of needing to refactor before you can unit test. — Kevin, Mar 29 '12 at 03:58
Yes, that is why the comment about the book: "Working Effectively with Legacy Code" was very appropriate. I had to do something like this last year and taking a more methodical approach like what is explained in that book would have saved me a lot of time and left a documentation/testing trail that would have been useful. I feel like just getting it to work was great, but wasn't enough. The objects had so many dependencies I felt my testing could have had better coverage. — , Mar 29 '12 at 16:29
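
To make the "refactor before you can unit test" point concrete, here is a minimal sketch of one small dependency-breaking change: letting the constructor accept its collaborator instead of building it internally. All names (`ErpGateway`, `InvoiceBuilder`, `FakeErp`) are made up for illustration and are not taken from any real system.

```python
class ErpGateway:
    """Hypothetical dependency that would talk to a real ERP system."""

    def fetch_po_number(self, order_id):
        raise RuntimeError("needs a live ERP connection")


class InvoiceBuilder:
    # Before: the constructor did `self._erp = ErpGateway()` with no way to swap it.
    # After: the dependency is an optional argument, so existing callers are unchanged.
    def __init__(self, erp=None):
        self._erp = erp if erp is not None else ErpGateway()

    def reference_line(self, order_id):
        return f"Ref: {self._erp.fetch_po_number(order_id)}"


class FakeErp:
    """Test double that returns canned data instead of hitting the ERP."""

    def fetch_po_number(self, order_id):
        return f"PO-{order_id}"


builder = InvoiceBuilder(erp=FakeErp())
print(builder.reference_line(42))
```

The change is deliberately tiny and mechanical, so it is low risk to make without tests, and once it is in place the class can be tested without the real dependency.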

score 4 · Answer 3 · answered Mar 31 '12 at 09:22

Speaking from experience, and working in a company that has had a poorly conceived enterprise architecture from the beginning, I can honestly say the biggest issue is developing a comprehensive understanding of the system.

This idea that a system can be broken down into pieces and understood individually is flawed. At some point in time a single individual, or several individuals, had to be able to conceive of the whole problem in its entirety. If that problem is a series of business problems and the technologies that drive them, it may take a person in the company several years to understand all the systems to a level where replacing them without a disaster or a missed requirement is possible. This was certainly the case at my company when I took over as Director of Technology. If it weren't for the fact that I am myself a coder, I wouldn't have been able to slowly understand all the details of the poorly organized, tightly coupled architecture and integration of technology. If small, hidden, undocumented details like "We put the eBay order number in the 'SYSOENT.PO_NUMBER' field in the ERP system because that's what Wendell the VB coder from Florida decided to do" are overlooked, the results can be disastrous, and the only way to know this is to slowly discover it all.

If someone is asked to replace the engine on an aircraft while in flight, or face certain death - given the proper amount of tools and resources that individual would need to know how to override the sensors, hydraulic system, how to re-route fuel flow, manipulate systems that were never designed to be changed or manipulated externally or from the cockpit. This is often what the prospect of rewriting a business-application is like. The application is usually pounded into the business between many different other technologies.

I guess my primary point is that the system must be understood in nearly its full complexity, and at some point it must be understood in its completeness, in order for the new system to be properly "engineered" and not just "made".

Cookies are made, software (should be) engineered.

+1 for the need to understand so much about the business system up-front. However, as you know, the challenge here is the time and the resources (from the business) as well as the change factors. For medium and large systems, understanding the full complexity up-front is not always possible. — NoChance, Mar 31 '12 at 09:35

score 2 · Answer 4 · answered Jun 02 '13 at 10:13

There are many examples of companies that died of a rewrite, like Netscape. There are also companies that survived a rewrite without big trouble, like Twitter.

There aren't any quantified case studies, because it is not feasible to have a control experiment where you look at the business success of not rewriting versus the business success of rewriting. Every application is different.

There are some obvious cases where a rewrite makes sense and many cases where it doesn't. I've cooked up a little recipe to figure out if a rewrite would make sense in your case.

I think that rewrites make more sense nowadays because we're coding swiftly on the shoulders of ever-improving, invasive frameworks like Rails, Grails, and AngularJS. If you want to move from plain JS to Angular, a rewrite is about all you can do. It might still make tons of sense. If you're replacing one DIY implementation with another (like all the examples in Joel Spolsky's article), you're probably crazy.

jmoreno · Answer 5 · 2012-03-29T01:48:30.163

You aren't going to find many non-biased case studies that are anything other than after-action reports -- most people are not going to pay to have the same work done at least twice (once as a rewrite, once as an upgrade; the best case would be multiple rewrites/upgrades by different teams).

The case study that you found seems to have been produced by Fujitsu, and unsurprisingly the result was that it was better to use Fujitsu's tools.

From an organizational point of view, a rewrite is only clearly a failure when either the rewritten application doesn't work, or the project is canceled before it is finished -- otherwise it's impossible to know whether the delay in releasing the rewritten version was the cause of whatever loss you suffered or merely coincidental. If canceled before finished, it's generally considered a total waste of time and resources. An upgrade can potentially have the same problem, but being incremental is unlikely to be on the same scale (you get something for your money).

Best case from a programming perspective is to do both -- incremental upgrades while rewriting, both supporting and inspiring each other. Unless you have different teams this will naturally take longer.
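
One way the two efforts can support each other is to run the old and new code on the same inputs and look at where they disagree. The sketch below is only an illustration under my own assumptions: `compare_implementations`, `old_discount` and `new_discount` are hypothetical names, and the discount rules are invented.

```python
def compare_implementations(old_fn, new_fn, inputs):
    """Run both versions on the same inputs and collect any disagreements."""
    mismatches = []
    for value in inputs:
        old_result = old_fn(value)
        new_result = new_fn(value)
        if old_result != new_result:
            mismatches.append((value, old_result, new_result))
    return mismatches


def old_discount(total_cents):
    # Hypothetical legacy rule: 10% discount, always rounded down.
    return total_cents // 10


def new_discount(total_cents):
    # Hypothetical rewrite that rounds half up instead.
    return (total_cents + 5) // 10


diffs = compare_implementations(old_discount, new_discount, [10, 15, 18, 100])
for value, old_result, new_result in diffs:
    print(f"input={value}: old={old_result}, new={new_result}")
```

Each mismatch is either a bug in the rewrite or an undocumented rule in the old system that somebody needs to make a decision about.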

Note that this provides a rough guideline for when you should consider a total rewrite -- the project is small enough that you can do so easily, or the organization is large enough that it can afford to do both, counting the effort or the learning cost worthwhile. Also note that if the rewrite never catches up to the rework, that can mean several things, but all else being equal, it probably means that a rewrite was unnecessary.

Rewrite can be a failure when it runs massively over budget and is cancelled before reaching completion. Money down the drain. The possibility of this eventuality is why rewriting is considered more risky than refactoring. — MarkJ, Mar 28 '12 at 16:15
@MarkJ: I thought that was covered under "doesn't work", but I'll edit to make that more clear. — jmoreno, Mar 29 '12 at 01:35
