97

I have worked at two software product companies for three years in a row.

The first is a small company maintaining a fairly small management system with a monolithic legacy code base (almost twenty years old). Tightly coupled code is everywhere, without sufficient unit test coverage. However, management usually does not want developers to refactor the legacy code.

The second is a fairly big company maintaining a large domain-specific system with a huge monolithic Java legacy code base (over ten years old). The layered architecture does decouple the infrastructure from the business logic. However, the business layer also contains giant classes with more than three thousand lines of code, and developers still continuously inject more and more code into those legacy classes. Developers are allowed to refactor their own fairly new feature code, but are warned not to refactor these giant spaghetti classes. Experienced senior developers say that changes or refactoring of those classes might be disastrous due to the lack of regression tests.

However, I have personally read practical books about clean code and refactoring. Most of them strongly recommend that developers refactor actively. So why are real-world companies against this?

So I would like to collect answers from very experienced developers: why do these two companies I worked for prefer to keep the ancient legacy code unrefactored? Isn't this disastrous?

Kate Gregory
Rui
  • `Most of the books strongly recommend developers to refactor actively. But why in real world companies are against this?` -- You already know the answer to this. The code isn't covered by unit tests, and it's too brittle to refactor without unit tests to cover it, so instead of doing it the right way and spending the money to write unit tests and refactor, the companies simply say "Do not touch it." It's not necessarily a bad strategy; if the code works, and it's never touched again, then it's never going to break, is it? (fingers crossed). – Robert Harvey Sep 25 '20 at 21:45
  • But those old classes are still growing gradually, and so is the complexity. – Rui Sep 25 '20 at 21:47
  • Has anything bad happened? – Robert Harvey Sep 25 '20 at 21:47
  • No, but to add code to those existing classes, developers need time to read and understand the legacy code. The time each developer spends understanding the legacy code is growing as well. – Rui Sep 25 '20 at 21:49
  • Moreover, none of the books I have read told developers not to touch legacy code. – Rui Sep 25 '20 at 21:50
  • There are books, and there is reality. – Robert Harvey Sep 25 '20 at 21:51
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/113412/discussion-between-robert-harvey-and-rui). – Robert Harvey Sep 25 '20 at 21:51
  • One thing that some of the answers hint at, but don't directly state, is that the existing application is not just an application. It also contains the only documentation for probably thousands of requirements that were implemented but never written down in the project documentation. It is the product of hundreds of thousands or even millions of hours of testing by real-world usage. Nothing you can do can allow large-scale refactoring without risking breaking things. Sometimes, in coding, two wrongs can make a right, and if you fix one of the wrongs and not the other, you break things. – Itsme2003 Sep 27 '20 at 20:23
  • I have yet to see the testing (automated or not) which can compare to running ten years in a live environment. Nothing you can build in a year will be nearly as tough as code that has been challenged and fixed for ten years - it has seen corner cases you can't dream of. It can still be better to build something new; just know that it will take time to become as resilient and battle-hardened as the old junk. – Falco Sep 28 '20 at 12:56
  • The thing that books tend to neglect is that many things are tradeoffs. Every developer should *want* to refactor things, but also realise when it's not worth it because it's too risky or too time-consuming for too little benefit. – Bernhard Barker Sep 28 '20 at 20:18
  • Sans continuous refactoring, fixing mildly "off" code necessarily requires more of the same. Over enough time, virtually all changes, even trivial ones, break code. Non-fix fixes are OK because customers knew it would not be fixed anyway. Thank goodness for that, because most bugs are not reliably reproducible. Refactoring is impossible because the code is anti-code: no structure to work within, it's plasma. I could write a (non-academic) rather lengthy pamphlet! – radarbob Sep 29 '20 at 02:57
  • `Do not touch it. It's not necessarily a bad strategy` I know what I'm gonna say might sound controversial, but aren't we professionals? Any decaying code is all about professionalism, or a lack of it. Even if my company/customer doesn't care, I have to insist on how pernicious tech debt is, for all of us, including the company. Somehow, I came to the conclusion that **no, it's not correct**, because if we are not part of the solution, we are part of the problem. It's, to me, a matter of self-esteem. I mean, if I don't give value to my job, I cannot expect others to do so. – Laiv Oct 23 '20 at 11:00
  • @Laiv So do you mean I should insist on refactoring? – Rui Oct 23 '20 at 11:08
  • You should insist on solving technical debt as a part of the development life cycle and as part of your daily routine. The same way you write the test without anybody asking you for it, you contribute code quality improvements as @Karl Bielefeldt suggests, starting with small ones toward the same end. Even if you solve 0.1% of the tech debt and improve the code coverage by 0.1%, that already makes the project less unsustainable. – Laiv Oct 23 '20 at 11:17
  • `you should insist on solving technical debt as a part of the development life cycle and as part of your daily routine` For me this is rather simple to do because I work for a consultancy, so I make estimations including this, and if they dislike my estimations, my answer is always the same: "Then ask someone else, because I don't know how to make it faster, cheaper and reliable." – Laiv Oct 23 '20 at 11:22
  • @Laiv But in my current company, each commit has to be reviewed by a module owner. For instance, I recently made a refactoring commit and it took about a week to be merged. And honestly, the module owner did not check it seriously but only gave superficial comments, meaning the code structure is not reviewed seriously. Probably because of this, most developers don't refactor; senior developers especially say "I will not change your code" out of respect for others, and managers even think refactoring is an excuse for delaying the work. So I eventually decided to leave. – Rui Oct 23 '20 at 11:44
  • Two things: 1. "I will not change your code" out of respect: respecting others is not leaving you naked in front of ten-year-old spaghetti code. That's pretty much like telling you "Look at this wonderful minefield. Now run." That's as disrespectful as not leaving the toilet as you would wish to find it :-). 2. You decided to leave. That's pretty much (IMO) a matter of self-esteem. You didn't want to be part of the problem, and I applaud that. – Laiv Oct 23 '20 at 12:21
  • @Laiv The problem is that the manager judges a developer's skill merely on whether his/her code works fine **AT THE MOMENT**. Then in the future, if the code is difficult to maintain, it is probably the next developer's turn to deal with the mess. As a result, it is difficult to differentiate a senior developer from a junior developer. The manager mainly likes developers who can show and talk more. – Rui Oct 23 '20 at 14:48
  • @Laiv Senior guys on the team asked questions like why I made some class without the `public` keyword, and they intend to add logic to *data objects*, which should not contain any business logic. They like to make a fat REST API controller by including lots of business logic here and there. All this makes me doubt their professional skills. But the managers have no understanding of the tech and trust the seniors, as their code works even though it is difficult to maintain. I intended to learn from the senior developers, but it seems to be the opposite. I am actually a bit sad about this. – Rui Oct 23 '20 at 14:54
  • `senior developers especially say "I will not change your code" out of respect for others, and managers even think refactoring is an excuse for delaying the work. So I eventually decided to leave` - HAH! I've heard that. It's not about respect, man! It's about experience, fear, laziness, and code quality. When someone says they won't touch your code, it's almost never about respect. They know that if they change something they don't fully understand, they may make subtle mistakes, and if they leave the change to the author, the risk of mistakes will be smaller. It's just that. – quetzalcoatl Oct 28 '20 at 09:35
  • @quetzalcoatl So is such a mindset among senior developers correct at all? – Rui Oct 28 '20 at 09:58
  • `Three thousand lines of code.` I'm currently sat at a file with 7,800 lines, which I am adding to! D: Although the main creator of the file does appreciate my code refactorings, it doesn't hurt to ask if you can split classes across different files, which most languages will let you do. – Connor Blakey Oct 28 '20 at 10:31
  • In my old firm it was considered easier to just start again rather than upgrade. – ThirdPrize Oct 28 '20 at 13:31
  • @Rui Of course not. But people do a lot of things that are "not correct". I understand this behavior, I see where it comes from, and I see its immediate benefits; sometimes it is good. If they point at my code telling me, explicitly or not, that they don't understand it and won't touch it, that's a hint for me that something's wrong: maybe their skill, or maybe my coding style. Both hints are useful to me. If I were a junior told to fix my own code, that's (probably) a positive thing as well. I hope you get the overall idea, that this is not a 100% bad thing. – quetzalcoatl Oct 28 '20 at 13:41
  • @Rui But aside from that, making "no one can touch this except the author" a general rule is a very bad idea. Taken to the extreme, it dramatically increases the bus factor of the project. It doesn't spread the knowledge of how things work across the people involved. It doesn't allow you to really maintain the thing as a team, because each line of code is owned by this guy or that girl and we know whom to blame. And of course, there's little chance of cleaning out any technical debt, because any person with some free time can only clean up their own code. So it needs to be used carefully. – quetzalcoatl Oct 28 '20 at 13:43
  • In my experience, the biggest driver not to refactor is the pressure for more and more new features. But eventually it gets to the point where the difficulty, and moreover the risk, of adding new features without refactoring (and unit testing, of course) shows that it must be done to move forward. The biggest effort in refactoring, apart from the nauseating work of creating unit tests, is the segregation of the code behind interfaces. This causes thousands of compilation errors which must be waded through and fixed. The unit testing helps ensure no foul-ups, but it must also be system tested too. – Nick Oct 28 '20 at 13:48
  • By definition you cannot refactor code that is not under test. Without a test you cannot be certain you have not changed its behavior. – mattnz Oct 29 '20 at 00:59
  • I did some research on tackling such problems; this is my current thinking: https://timwise.co.uk/2020/07/09/approaches-to-refactoring-and-technical-debt/ – Tim Abell Oct 29 '20 at 10:00
  • @TimAbell The link is a really great resource for me! Thanks a lot. I definitely need to learn how to deal with the Big Ball of Mud :D – Rui Oct 29 '20 at 12:28
  • @Rui I'd add that it's important to be aware of different perspectives. There are always going to be unique conditions at each organisation, with their priorities and resources influencing what effort they put in where. So you have to try to get a reliable source of those priorities. Decisions on things like what's worth refactoring require input from management, developers, and others besides. Yet too often the effective communication that's needed between all parties to make those decisions isn't there, especially in organisations already in that spaghetti situation... – S__ Oct 29 '20 at 18:32
  • ...I'd say to be wary of developers alone - senior or otherwise - telling you refactoring would be disastrous. Maybe it's the case that the software isn't going to receive future changes and isn't worth the effort. Other times, developers have been involved in creating the mess over time, and it doesn't affect them to the same degree as others coming in later. Many best practices are about helping other people more than ourselves, or avoiding issues that creep in over time. It's often those less immediate issues that miss people's attention when they're head down day-to-day doing their thing... – S__ Oct 29 '20 at 18:34
  • ...It gets easier to spot the good and bad sources of truth the more projects you work on. If, as you say, the code base is already in a state and yet developers still continuously inject more code into it, then yes, that doesn't tend to end well. – S__ Oct 29 '20 at 18:34

9 Answers

145

It's a question of risk management:

  • Refactoring a system always creates the risk of breaking something that worked before.

  • The larger the system, the higher its complexity, and the higher the risk of breaking something.

  • With spaghetti-code (or any other poorly structured code) the real structure of the code remains fuzzy, and the dependencies might be hidden. Any change in one place could easily have impacts anywhere else. This increases the risks of breaking something to the highest level.

  • With TDD, or any other technique guaranteeing a comprehensive set of test cases, you can quickly verify that refactored parts (and dependent parts) still work. Of course, this is effective only with the help of proper encapsulation.

  • Unfortunately, tests are often missing for legacy code, or their coverage or depth is insufficient.

In other words, with large legacy spaghetti code bases, refactoring creates a high risk of breaking something that worked before, and the impact of this risk cannot be reduced with automated tests. The significant risk of refactoring simply outweighs refactoring benefits in this case.

Additional remark: an alternative, lower-risk approach is: don't touch the running system, but implement new and replacement features in state-of-the-art, testable code with clear boundaries. This more evolutionary approach is not always feasible, but it can provide significant short-term and long-term benefits.
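A minimal sketch of this evolutionary approach, with all class and method names invented for illustration: the legacy code stays untouched, while new features are written as testable code behind a clear boundary.

```python
# Hypothetical sketch: new features go into fresh, testable code behind a
# clear boundary, while the legacy module is only ever called through a
# thin wrapper. All names here are made up for the example.

class LegacyBillingSystem:
    """Stands in for the untouched legacy code."""
    def compute_invoice(self, customer_id):
        # ... imagine decades of battle-tested spaghetti here ...
        return 100.0

class BillingFacade:
    """Clear boundary: new code talks to this interface, never to the
    legacy class directly. New features can be implemented (and unit
    tested) here without modifying the legacy internals."""
    def __init__(self, legacy):
        self._legacy = legacy

    def invoice_total(self, customer_id):
        return self._legacy.compute_invoice(customer_id)

    def invoice_total_with_discount(self, customer_id, rate):
        # A *new* feature, written in testable code, composed on top of
        # the legacy behavior instead of being injected into it.
        return self.invoice_total(customer_id) * (1 - rate)

facade = BillingFacade(LegacyBillingSystem())
print(facade.invoice_total_with_discount("c-42", 0.1))  # 90.0
```

Over time, more and more behavior migrates behind such boundaries, and the untouched legacy core shrinks in relative importance.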

Robert Harvey
Christophe
  • Well said, and I would add that we as software developers often adopt an all-or-nothing stance on software quality that is unlikely to be appreciated by the business folks. A perfectly designed, state-of-the-art system makes a lot of money, but a crappy, outdated system still makes *some money* - money that is at risk of stopping flowing when we tinker with the existing codebase too much in the name of code quality. – crizzis Sep 26 '20 at 16:57
  • A perfectly designed, state-of-the-art system first needs to pay for itself. But the crappy outdated system already did, so it actually makes *more money*, especially in the time-frame that decides the next bonus. – Jan Hudec Sep 26 '20 at 19:51
  • Note that even one of the most comprehensively tested pieces of code in the world, [sqlite](https://sqlite.org/testing.html), has had a couple of bugs fixed in each release. So while a decent test suite is a requirement to be able to touch the code without breaking it to pieces, it still does not eliminate the risk completely. – Jan Hudec Sep 26 '20 at 19:55
  • @JanHudec It's not only about money: it can also be continuity of essential services. And indeed, the tests reduce the risk significantly but cannot eliminate it completely ;-) – Christophe Sep 26 '20 at 20:24
  • @Christophe: Nor is it just about the code working or not working. If you replace working code X that's been around quite a while with new code Y that works (in the sense that results are correct) but does things differently, you irritate and/or confuse your user base. – jamesqf Sep 27 '20 at 04:46
  • I'd accept that the risk can outweigh the refactoring benefits. But if you constantly need to work in or around the legacy code, that adds risk and cost too, and that can easily outweigh the refactoring risk. It is just hard to quantify sometimes, especially since many managers work on the assumption "we won't touch that ever again and it will keep running", whereas many developers know that the likelihood it needs to be touched - either because requirements change or simply because security issues appear - can be substantial and is often under-estimated. – Frank Hopkins Sep 27 '20 at 19:02
  • @FrankHopkins I understand; I just explain the general case. And I'm not dogmatic at all: when I say it's a risk/benefit tradeoff, I also mean that if you can show that the odds are the other way round, you have a better chance of getting heard. My additional remark offers an evolutionary path in this regard: if you end up with more and more new code that is well engineered, and less and less radioactive code, you may at some point reach the level where it could finally make sense to refactor some older parts in contact with the new ones, and there go the dominoes ;-) – Christophe Sep 27 '20 at 19:26
  • @Christophe Ah, I totally like your additional remark; I just felt the statement that the risk outweighs the benefits sounded a bit fixed/general, so that was merely meant as a suggestion to make the importance of context more clear. And as a reference hint, the evolutionary approach you suggest should match the strangler pattern (mostly used with microservices, but it applies generally IMHO); you might want to include that as a reference or distinguish your approach from it. All just suggestions; feel free to take whatever makes sense to you ;) – Frank Hopkins Sep 27 '20 at 19:46
  • @jamesqf That is very prescient. I've worked on projects that replaced existing systems (as per the additional remark). In practice, they always have to be bug-for-bug replacements. In situations where it would be significantly more work to replicate a bug than to just leave it out, you generally have to engage with users and other stakeholders to ensure they're happy with the change. – James_pic Sep 28 '20 at 09:19
  • AKA before you refactor to round out those square wheels, make sure you [check the ground they're rolling on](https://en.wikipedia.org/wiki/Square_wheel#/media/File:Rolling-Square.gif) – Tacroy Sep 28 '20 at 17:24
  • @Tacroy Excellent! First time that I've seen that! – Christophe Sep 28 '20 at 17:27
  • @James_pic: It's not even re-implementing old bugs that everyone's used to; it's pointless changes. E.g., it seems like every year or two my various on-line things like bank and credit card web sites decide to "improve" their interface. Everything I do worked correctly before the change, but the new version puts things in different places, or maybe no longer supports my preferred browser. It may be correct, but it's a PITA to deal with the changes. – jamesqf Sep 29 '20 at 06:50
  • I fully agree, and I would add that exposing your product to users (or production uses, whatever they may be) is a practice of (bad) testing. And sometimes that spaghetti structure comes from making piecemeal fixes to real issues. Testing the old system may be a hard-fought battle of testing and fixing. That can be mitigated by good TDD from the start, but never eliminated. – xaviersjs Oct 28 '20 at 17:50
  • Plus, how many of us have thought "The whole problem is this mess..." and then written an implementation that had just as many issues, just not the ones you've seen already? I know I've done it more than once. – xaviersjs Oct 28 '20 at 17:51
  • Just to add on risks: you are evaluating only "technical" risks (i.e., only from the product's point of view). I think you should factor in others. The first that pops into my mind is: what are the chances that good programmers get frustrated about the s**t code and leave (together with the knowledge of what the spaghetti code does)? And what are your chances of getting a good new programmer to work on old code? As I see it, programmers are a scarce resource now, so they can choose, and they won't choose a workplace where they are not allowed to work properly. – marco6 Oct 29 '20 at 13:23
  • @marco6 Interesting. Indeed, my examples are about technical risks. Nevertheless, these translate directly into business risks (imagine, for example, the consequences of an airline system down for an hour). But you are right: risks and benefits need to be evaluated globally (e.g. also considering security risks and sourcing risks); my final conclusion is neutral in this regard. This being said, job attractiveness is not only about the task content ;-) – Christophe Oct 29 '20 at 13:39
  • @Christophe Sourcing is the first that popped into my mind; there are others. Of course, there are some companies (like airline systems) where breaking something in production may risk lives. I don't think the majority of programmers are working at such companies, though. I agree with you in this regard: **risks and benefits need to be evaluated globally**. – marco6 Oct 29 '20 at 13:48
  • On top of that, there is the risk that the "old" system has bugs which are compensated for at a higher level. Replacing the "old" system could have repercussions on the higher-level code base. This implies that a top-down or bottom-up refactoring does not always work if there is bug-compensation in top-level code; it only works when you rewrite the entire codebase in a single go... Sometimes it is better to close your eyes and move on. – kvantour Oct 30 '20 at 11:41
100

One reason is that it's really difficult to measure the loss of productivity that messy code is causing, and difficult to estimate the work it will take to clean it up properly and fix any regressions.

The other reason is that many developers incorrectly call any rewriting refactoring. True refactoring is very methodical, almost mechanical. You make very small changes that improve the structure but don't change the behavior, and you verify the behavior didn't change using automated tests. If there aren't good tests, you add them first.

It makes experienced developers nervous when someone says they don't understand some code so they need to rewrite it. Changing code you don't understand is a recipe for disaster unless you are making dead simple changes.

So if you want to refactor some code, make your change so dead simple that anyone reviewing the pull request can instantly see the behavior is preserved. My first pull request in really messy legacy code is usually nothing but tests. My second pull request is purely mechanical changes like extracting repeated code into a function, and more tests that take advantage of those changes. Then on the third pull request I might start to rewrite some (now much smaller) functions for clarity, now that I have thorough tests. By this time, I have a fairly thorough understanding of the code, with all its quirks, and on the fourth pull request, I might make changes that affect the bigger picture.

If someone tried to skip straight to the fourth pull request, I would argue strongly against it, whereas I've never seen anyone argue against a pull request that just adds tests. If they won't let you make a high-risk change, make a low-risk change that moves you in the same direction.
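The progression above can be sketched in miniature; this is a hypothetical Python illustration (the function names and pricing rules are invented for the example), showing a characterization test followed by a purely mechanical extraction.

```python
# Step 1: pin down the current behavior with characterization tests.
# Step 2: make a purely mechanical change (extract the duplicated block
# into a function) that a reviewer can verify preserves behavior at a glance.

# --- The messy legacy function, before any change ---
def shipping_cost_legacy(weight_kg, express):
    if express:
        cost = weight_kg * 4.0
        if cost < 10.0:
            cost = 10.0
        return cost
    else:
        cost = weight_kg * 2.0
        if cost < 10.0:
            cost = 10.0
        return cost

# --- Step 1: characterization tests freeze the current behavior ---
assert shipping_cost_legacy(1, express=False) == 10.0   # minimum charge applies
assert shipping_cost_legacy(10, express=True) == 40.0

# --- Step 2: mechanical refactoring, duplicated block extracted ---
def _apply_minimum(cost, floor=10.0):
    return max(cost, floor)

def shipping_cost(weight_kg, express):
    rate = 4.0 if express else 2.0
    return _apply_minimum(weight_kg * rate)

# The same tests pass against both versions, so behavior is preserved.
assert shipping_cost(1, express=False) == shipping_cost_legacy(1, express=False)
assert shipping_cost(10, express=True) == shipping_cost_legacy(10, express=True)
```

Each step is small enough that a reviewer can confirm behavior preservation without having to understand the whole module.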

Karl Bielefeldt
  • "Changing code you don't understand is a recipe for disaster unless you are making dead simple changes." – This is a variation of [*Chesterton's Fence*](https://www.chesterton.org/taking-a-fence-down/). If you can't see why something is there, don't touch it. Only when you understand why something is there can you delete it. Also, [this story](http://www.catb.org/~esr/jargon/html/magic-story.html) from the Jargon File is relevant. – Jörg W Mittag Sep 26 '20 at 05:00
  • For the part of the code I have refactored, I do have an understanding of it, as I have been maintaining the legacy code base for years. – Rui Sep 26 '20 at 07:32
  • To add to the measurement problem: measuring the correctness of your changes is often mighty difficult. This is especially true if the company hasn't been spending the money to maintain the kinds of testing and requirements which make it easier. – Cort Ammon Sep 26 '20 at 17:49
  • As you say, it is often unclear what negative impact the legacy code has. At the same time, it is obvious that refactoring will occupy resources lacking elsewhere, and as always the initial estimation probably underestimates the effort. This can be combined into the statement that *the net benefit of the refactoring is not apparent to management*, and indeed the management may be right! – Peter - Reinstate Monica Sep 27 '20 at 12:55
  • Re *"many developers incorrectly call any rewriting refactoring"*: yes, indeed (refactoring has now become cool - refactoring is a synonym for "good"). Often what is really (incorrectly) meant is any improvement that is not adding new features: restructuring, ***changing/improving*** interfaces, and removing parts of the system that ***seem*** not to be used any more (it is also now cool to delete code, even if you don't fully understand it). (Real story - I wasted more than one year of my work life on this account.) – Peter Mortensen Sep 28 '20 at 04:01
  • Note that this mechanical method of refactoring spaghetti code may be the most efficient way to gain an understanding of what the code does. Sufficiently huge spaghetti code is impossible to comprehend, but once you start factoring out the repetitions, sorting them into classes, and turning bunches of features into abstractable behavior, you start to see structure. Nevertheless, it may be a giant effort, and it may leave you owing the company (http://www.geekherocomic.com/2008/10/09/programmers-salary-policy/index.html). – cmaster - reinstate monica Sep 28 '20 at 13:26
  • *True refactoring is very methodical, almost mechanical.* Not really; you can *refactor to patterns* (e.g. the book by Joshua Kerievsky), which is neither methodical nor mechanical. It requires a deep understanding of what the code does and what pattern it tries to reinvent. – Wiktor Zychla Oct 22 '20 at 16:02
  • "*My first pull request in really messy legacy code is usually nothing but tests*" - But what if the code was never designed to be testable? – J. Mini May 03 '23 at 18:07
50

It depends on your definition of "correct practice".

I'm currently working on just such old spaghetti code; much of it is old enough to drink. It's a safety-critical system.

Changes to this code have to be audited and signed-off by an independent third party. The owners of the company consider it to be "correct practice" for the company not to budget money on re-factoring this code because it's working the way it is and has been for years.

And yes, some of the code is pretty much garbage, but it's well-tested and trusted garbage. We don't disassemble and re-assemble riveted bridges just because nuts and bolts are nowadays best practice.

[Image: rivets]

Also, what's considered "best practice" in software engineering changes with the weather. Another 10 years and goto-spaghetti might be back in fashion ;)

Peter Mortensen
Kingsley
  • +1 Nice answer with practical real-life arguments :-) About your provocative final statement, I doubt that goto with global variables will be in again. But after all, the spaghetti was not the initial aim of the former design: it emerged over time. So who knows: we might well discover in 10 years that one or another modern practice from today also had drawbacks that were not yet anticipated... – Christophe Sep 27 '20 at 18:25
  • We don't keep adding new pillars to old bridges either; often enough we build a new bridge to temporarily replace the old one, then rebuild the old bridge with modern tech and remove the temporary bridge again. @Christophe And most refactoring isn't about introducing new tech; it's about redesigning once multiple changes have messed up the original intent of the code and made it harder to read and understand. So yes, keep old systems running, no issue, but the more you need to modify them regularly, the more refactoring makes sense. It might make the auditing even cheaper in such a case. – Frank Hopkins Sep 28 '20 at 02:48
  • @FrankHopkins The analogy with construction work has its limits. Take the Eiffel Tower: it's a legacy that was initially supposed to be temporary. Maintenance periodically changes parts because of corrosion. There is no equivalent in software: software doesn't corrode. And it's not refactoring, because the structure remains unchanged. There were, however, extensions, like additional elevators, which were added with the latest technology. But it's done in a way that changes the overall structure as little as possible. When the elevators can no longer be maintained or upgraded, they are replaced. – Christophe Sep 28 '20 at 06:31
  • @FrankHopkins So what happens is that over time, the legacy remains unchanged. Its extensions and the parts in contact with the extensions are re-engineered more often. But as said, although we can see some similarities, software is not concrete nor iron, and there's still a lot of COBOL out there ;-) – Christophe Sep 28 '20 at 06:36
  • @Christophe Software corrodes like hell. It is called software rot: https://en.wikipedia.org/wiki/Software_rot – Eriks Klotins Sep 28 '20 at 21:00
  • @EriksKlotins Interesting thought, but a different phenomenon. Corrosion happens to a material without anybody making a change to it: the iron of the bridge just rusts in contact with its environment, without the intervention of any blacksmith. Software doesn't rot just because it's executed and input is thrown at it by the user. [About the dangers of analogy in engineering and science](https://invertedpassion.com/thinking-in-analogies-is-dangerous/) – Christophe Sep 28 '20 at 21:18
  • @Christophe For another bad analogy, a computer doesn't "think" like a human brain. Human memory is nothing like computer memory. The goal of AI is NOT to re-create how a human thinks, because nothing about human thinking is anything like a computer processor. Here is an excellent paper on [your brain is not a computer](https://aeon.co/essays/your-brain-does-not-process-information-and-it-is-not-a-computer). – Nelson Sep 29 '20 at 04:53
23

I once had the joy of watching someone “refactoring” some legacy code that I had written, about two years earlier. My code was complicated, because it covered about two dozen corner cases that had been found through intensive testing. Each corner case handled was heavily documented.

The new code was a beauty. It was a quarter of the size, and very easy to understand. I took an old version of the code for reference and tried the first corner case. It didn’t work. I showed the new developer and he couldn’t understand why it didn’t work. I showed him exactly the code which made it work and which he had removed, ignoring the comments. Then I tried the second corner case and it didn’t work. I left him to it. I was on a different project now, and that was between him and his manager. Never checked what they did.

Now that happened with the author of the original code still there. Now imagine ten year old code, where the original author is gone, the guy working with him is gone, the one picking up after him is gone, and nobody knows what the code does and why.

I’m not saying you can’t refactor it, but you need someone who is really, really, really good, or you will break things in fantastic ways.
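One way to lower that risk before anyone touches such code is to pin every documented corner case with a characterization test first, so a rewrite cannot silently drop one. A minimal sketch in Java; the routine, its names, and the corner-case rule are all invented for illustration:

```java
// Hypothetical stand-in for a legacy routine whose comments document
// hard-won corner cases. Each documented case becomes one assertion,
// run before and after any refactoring attempt.
public class DateRangeTest {

    // Imagined legacy logic: an inverted range was deliberately
    // clamped to 0 after a production incident (a "corner case").
    static int daysBetween(int startDay, int endDay) {
        if (endDay < startDay) return 0; // corner case: inverted range
        return endDay - startDay;
    }

    public static void main(String[] args) {
        check(daysBetween(10, 5) == 0, "inverted range stays 0");
        check(daysBetween(7, 7) == 0, "empty range");
        check(daysBetween(2, 5) == 3, "normal case");
        System.out.println("all documented corner cases still pass");
    }

    static void check(boolean ok, String name) {
        if (!ok) throw new AssertionError("corner case broken: " + name);
    }
}
```

If the rewritten version fails any of these, the refactoring is rejected before it ever reaches a manager's desk.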

gnasher729
  • 42,090
  • 4
  • 59
  • 119
  • 2
    Nice real life story. In what way were the hidden cases manifest? Was it an artefact of the language used, or awkward 'business' logic. I.e. how did the side-effects and side-causes get into the system - could the code ever have been pure functional code? – Philip Oakley Oct 28 '20 at 11:44
  • 4
    I voted your answer up because it is fundamentally the same as what I was going to say. I hope you will consider editing to explicitly call out that _Formal requirements capture only that which is known in advance. Bugfixes, performance tweaks and modifications that extend or change the planned behaviour are absent from the spec. Often these augmentations are done in a hurry and are totally undocumented. Unless done by the same team while the issues are still large in their minds, a do-over abandons all this hard won knowledge and experience._ – Peter Wone Oct 29 '20 at 05:10
  • 2
    Probably the new coder should have written unit tests, or some kind of regression tests, from your old code comments, then updated the new code until they passed. – Raul Nohea Goodness Oct 29 '20 at 20:27
  • 2
    How the other coder did not find he broke those edge cases? You did not write unit tests for them? Unit Test > Documentation! :D – FuryFart Nov 02 '20 at 09:44
12

How do you define 'correct'?

I have seen more disasters caused by people trying to fix spaghetti code that was not actually broken.

There are times when you will be forced to make changes.

Some folks try to totally rewrite it 'correctly'. Those attempts usually take longer and have more problems than they estimated; and sometimes they fail completely.

I have been there, done that, and I say if it ain't broke, don't fix it. Leave it for the next schmoe to worry about after you have moved on to a better job somewhere else.

Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
yukfoo
  • 129
  • 2
7

You need a reason to be changing any code. That reason may be as simple as dead code (unreachable), making it easier to read (loops to LINQ for instance) or as complex as refactoring multiple modules for multiple interrelated reasons.
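The "loops to LINQ" kind of cleanup has a direct Java analog in streams. A hedged sketch with invented names, showing why this class of change is low-risk: the two forms can be kept side by side and compared before the old one is deleted.

```java
import java.util.List;

public class LoopCleanup {

    // Before: an explicit accumulator loop.
    static int totalLengthLoop(List<String> words) {
        int total = 0;
        for (String w : words) {
            if (w.startsWith("a")) {
                total += w.length();
            }
        }
        return total;
    }

    // After: the same logic as a stream pipeline. Behavior is unchanged,
    // which is what makes this an easy-to-review refactoring.
    static int totalLengthStream(List<String> words) {
        return words.stream()
                    .filter(w -> w.startsWith("a"))
                    .mapToInt(String::length)
                    .sum();
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "banana", "avocado");
        // Both forms must agree before the loop version is removed.
        System.out.println(totalLengthLoop(words) + " == " + totalLengthStream(words));
    }
}
```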

When refactoring important and large blocks of code, the first thing to determine isn’t how desirable or even necessary the change is, the first thing you want to determine is how safely the change can be made. The more important the code is to the business the higher the standards for safety. An existing bug has a known cost associated with it, and it hasn’t yet been fatal to the organization. You don’t want to replace it with a bug that drives you out of business.

A procedure that is several thousand lines long and extremely valuable to the organization (and not just initializing a bunch of UI controls’ positions and values) is not something that should have more than trivial changes made without first considering the risk.

That said, the first change I would consider making to such a large procedure is breaking it down into sub-procedures: if it’s 3,500 lines and you can turn that into 35 sequential procedures (proc1, proc2, ..., proc35), I would consider that a win.
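That mechanical split can be sketched roughly like this (the class, the stages, and the logging are all hypothetical). The point is that only structure changes, not behavior: the same statements run in the same order, just grouped under names.

```java
// Hypothetical sketch of the "35 sequential procedures" step.
public class OrderBatch {

    // Stands in for observable side effects, so the step order is testable.
    static final StringBuilder log = new StringBuilder();

    // Before (imagined): one 3,500-line processAll() doing everything.
    // After: the same statements grouped into sequential, named stages.
    static void processAll() {
        loadInput();
        validateInput();
        applyBusinessRules();
        writeOutput();
    }

    static void loadInput()          { log.append("load;"); }
    static void validateInput()      { log.append("validate;"); }
    static void applyBusinessRules() { log.append("rules;"); }
    static void writeOutput()        { log.append("write;"); }

    public static void main(String[] args) {
        processAll();
        System.out.println(log); // the stages ran in the original order
    }
}
```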

In short, the decision to leave it alone or not is not directly related to the code quality; it’s a function of weighing risk vs. reward. If you can eliminate the risk, then the reward is irrelevant; conversely, the risk could be so great as to again make the reward irrelevant. In most cases the risk will be moderate and the direct benefits minimal, which is why the changes get nixed. If you want to make changes, work on making them risk-free: smaller procedures, unit tests, and added documentation so the code is better understood and less opaque.

jmoreno
  • 10,640
  • 1
  • 31
  • 48
  • 4
    +1 for your first paragraph. A company doesn't exist to develop software; it exists to *make a profit*. If there's no return on investment in refactoring, or if the return isn't enough to justify the investment, then it shouldn't happen. I have exactly this at work - there's a laundry list of new features I'd like to add, but there's only me to do it, and I have other commitments too besides just cutting code. That's why management exists, to work out what's a priority and what isn't. – Graham Sep 27 '20 at 12:50
4

In addition to the good answers I'd like to add something from my experience.

A rewrite is not always feasible, for various reasons, for instance if you still need to add new features continuously. So if you take a step-by-step approach to refactoring, you will have to accept that there will still be legacy code in place for a long time (or even forever, depending on the size of the codebase and the lifetime of the software).

So most of the time you have to make trade-offs where to invest your time when performing refactorings because a huge code base cannot be refactored at once.

In this case I would be pragmatic and start with refactoring in places where the highest benefits are expected. This could be, for instance, extremely buggy parts of the code or parts of the system where changes happen a lot.

If changes happen a lot, refactoring that part can provide great benefits, as clean code can be changed more easily than messy code.

Code that does not change often and is considered correct (i.e. does not have known bugs) should be touched last on the refactoring journey. Or maybe it will never be touched at all...

I think most companies want to avoid unnecessary risks. As refactoring legacy code can be risky, the risk should be taken for the right reasons, starting with the code where it pays off.

Andreas Hütter
  • 490
  • 2
  • 5
2

From my experience, a long spaghetti code is a result of

  • reinventing things that can be done more easily, e.g. using built-in features. I once audited a fairly big JavaScript app where a lot of existing JavaScript core features were reimplemented as custom functions, and even those were used inconsistently.

  • reinventing design patterns that have idiomatic implementations or should be based on well-known specs. I often see DI containers built from scratch in a way that takes you a while to realize what it's all about. Another example: the Observer pattern, reinvented almost as if it lives in every Joe's subconscious. Another example: SSO protocols. It's been some 15 years since SAML was proposed, and OAuth2 and OpenID Connect have been around for a while too, but no, it's "Joe did it, he doesn't work here anymore, and we are afraid to touch it"

  • not following SOLID or GRASP recommendations. Design patterns? Nope, it's worse. It's like 3,000 lines of code refactored into 30 methods of 100 lines, named almost like Foo1 to Foo30, where each FooN calls FooN+1 in its last line

  • having zero or not enough unit tests covering the basic and corner cases. With sufficient tests you could pretty much do whatever you want with the code and just check whether the tests pass. Instead, with insufficient tests, people are afraid that the code does weird things in corner cases and that someone relies on those corner cases. And still, somehow, people are afraid to recreate those cases as unit tests so that they could refactor the code.

There's always quality vs. cost/time, and it's not that you should take care of everything. But any critical part can easily be isolated, tamed with unit tests and then refactored. A good rule of refactoring is: refactor until you are satisfied, and then turn each new bug into a new unit test that covers the buggy case. With this in mind, I did at least a couple of nasty refactorings where something completely messy turned into something that is totally under control.
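The "turn each new bug into a unit test" rule can be sketched like this; the class, the pricing logic, and the bug report are all invented for illustration. The workflow is: write a failing test that reproduces the report, fix the code until it passes, and keep the test forever as a regression guard.

```java
public class PriceCalc {

    // Imagined bug report: a discount over 100% (coming from bad input
    // data) produced a negative price. The Math.max clamp is the fix.
    static int discountedPrice(int price, int discountPercent) {
        int result = price - price * discountPercent / 100;
        return Math.max(0, result);
    }

    public static void main(String[] args) {
        // The regression test written from the bug report, plus a
        // normal case so the fix itself can't break ordinary pricing.
        check(discountedPrice(200, 110) == 0, "over-100% discount clamps to 0");
        check(discountedPrice(200, 25) == 150, "normal discount unchanged");
        System.out.println("regression tests pass");
    }

    static void check(boolean ok, String name) {
        if (!ok) throw new AssertionError("regression: " + name);
    }
}
```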

  • Thanks a lot, your suggestion is indeed pretty practical, and I am practicing such an approach most of the time :) – Rui Oct 23 '20 at 11:09
1

I agree with the answer of Karl Bielefeldt, there is just a small point to add to it.

In a massive monolith of spaghetti code, a partial refactoring is often impossible. Bits of code are so tightly coupled that a change at one point requires one or many changes to align the code at other points, in a chain of changes so big that you easily go beyond what is reasonable in a modern agile process. So, to change the code, people prefer to wait for a change justified by the business in some way.
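One step-by-step tactic for exactly this situation is introducing a seam, in the sense of Michael Feathers' "Working Effectively with Legacy Code": wrap the old behavior behind an interface at a single point so old and new implementations can coexist during the migration. A rough sketch with invented names:

```java
// Hypothetical sketch: instead of changing a tightly coupled call chain
// everywhere at once, introduce an interface ("seam") at one point.
public class SeamDemo {

    interface TaxRule {                // the new seam
        int taxFor(int amount);
    }

    // The old behavior, wrapped unchanged behind the seam.
    static class LegacyTaxRule implements TaxRule {
        public int taxFor(int amount) { return amount / 5; } // old 20% logic
    }

    // Callers now depend on the interface, so a future replacement
    // implementation only needs to change the wiring, not the callers.
    static int invoiceTotal(int amount, TaxRule rule) {
        return amount + rule.taxFor(amount);
    }

    public static void main(String[] args) {
        System.out.println(invoiceTotal(100, new LegacyTaxRule())); // 120
    }
}
```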

FluidCode
  • 709
  • 3
  • 10
  • 6
    Even extremely spaghettified code (with *conditional* goto statements depending on some *global* variables set far apart and whose values depend on other conditional goto statements) can be *mechanically* refactored (as Karl Bielefeldt mentioned) as the first step, putting linear sequences of code into functions/procedures. It may not be possible to find meaningful names for the functions at this first step, but that may be done later. Subsequent changes to the code are much easier and it may be possible to isolate the spaghetti part from the more structured part. – Peter Mortensen Sep 28 '20 at 04:33
  • (I have worked on such a system (freelance), porting from one kind of system to another, translating to another programming language in the process.) – Peter Mortensen Sep 28 '20 at 04:33
  • One of the 'mechanical' methods is described in "The Mikado Method". – Philip Oakley Oct 28 '20 at 11:46
  • 'Impossible' is too strong a statement. It is certainly conceivable, however, that the cost/effort required to reliably change such code is so large compared to the benefits derived, that it's not worth doing. – Steve Kidd Oct 30 '20 at 05:08