149

I am dealing with a pretty big codebase and I was given a few months to refactor the existing code. The refactoring is needed because we will soon need to add many new features to our product, and as of now we are no longer able to add any feature without breaking something else. In short: messy, huge, buggy code of the kind many of us have seen in our careers.

During refactoring, from time to time I encounter a class, method, or lines of code with comments like

Time out set to give Module A some time to do stuff. If its not timed like this, it will break.

or

Do not change this. Trust me, you will break things.

or

I know using setTimeout is not a good practice, but in this case I had to use it

My question is: should I refactor the code when I encounter such warnings from the authors (no, I can't get in touch with authors)?

kukis
  • 157
    Congratulations! You are now the author of the code. –  Apr 06 '17 at 16:22
  • 12
    You cannot fix it until you know both what it is supposed to do and what it actually does (you need to know the latter in order to understand the full consequences of your changes.) The problem appears to be in synchronization, and removing such problems should be refactoring job #1, or else things are just going to get worse with each change. The question you should be asking is how to do that. There is a fair amount of language- and platform-dependency in that question. e.g: http://stackoverflow.com/questions/3099794/finding-threading-bugs – sdenham Apr 06 '17 at 22:34
  • 14
    Following up on @sdenham's comment: if you don't already have automated unit tests which exercise your code base I suggest that you spend your time creating such unit tests, instead of wasting time on a "refactoring witch hunt" with AFAICT no clear goals in mind. Once you have a suite of unit tests which *thoroughly* exercise your code base you'll have an objective way to know if your code still works if/when you have to rewrite bits of it. Best of luck. – Bob Jarvis - Слава Україні Apr 06 '17 at 23:00
  • 1
    @BobJarvis In general, I agree, but in addition, concurrency bugs are particularly hard to test for, which is one reason why I think this is probably the most urgent problem. – sdenham Apr 06 '17 at 23:25
  • 17
    If the authors are so paranoid that the code will break if someone so much as breathes near it, it's probably already broken and the undefined behavior just happens to work out the way they want it to at the moment. That first one in particular sounds like they've got a race condition that they've kludged so it kind of works most of the time. sdenham has it right. Figure out what it's supposed to do and don't assume that's what it actually does. – Ray Apr 07 '17 at 02:18
  • Change it, see how it's broken, then fix that. Now you've cleaned up the edge-case bug that the previous developer band-aided. Congrats. – Ethan The Brave Apr 07 '17 at 15:13
  • 7
    @Ethan It might not be that simple. Changing code may break it in ways that aren't immediately apparent. Sometimes it breaks long after you've forgotten that this present refactor could have caused it, so all you have is a "new" bug with mysterious cause. This is *especially* likely when it's timing-related. – Paul Brinkley Apr 07 '17 at 17:03
  • @PaulBrinkley *Job security* –  Apr 07 '17 at 17:50
  • 5
    In my experience working with such "legacy code," I would recommend getting everything under source control, identifying seams, implementing *tests* so that you make changes confidently, and then and only then begin refactoring. I found the book "Working Effectively with Legacy Code" to be an excellent resource on this topic. – StockB Apr 07 '17 at 19:11
  • 3
    It's going to break when new functionality is added anyway. Better to have visibility and clarity earlier than later. – Dave Newton Apr 08 '17 at 05:10
  • 2
    The code comments you've quoted read like the actual code or the problem it is supposed to solve are not well understood. Setting timeouts is not bad practice. – user207421 Apr 08 '17 at 07:22
  • 2
    @EJP There are indeed valid uses for timeouts, but these comments leave little doubt that here they are being used for (or, rather, instead of) synchronization, and that is often ill-advised. It is certainly possible that the "cannot add features without breaking something else" problem could be caused by this. It would help if the questioner was able to tell us something about the concurrency architecture of the program(s), and also the language(s), platform and environment. – sdenham Apr 08 '17 at 17:29
  • 5
    At least you do not see comments like "Ask Frank.", where Frank has no idea what is being asked. – BЈовић Apr 10 '17 at 05:06
  • This is a "don't push the red button" moment – br3w5 Apr 11 '17 at 11:13
  • Check the version control history to see when that line was added? – user253751 Feb 28 '18 at 05:16

13 Answers

225

It seems you are refactoring "just in case", without knowing in detail which parts of the codebase will be changed when the new feature development takes place. Otherwise, you would know whether there is a real need to refactor the brittle modules, or whether you can leave them as they are.

To say this straight: I think this is a doomed refactoring strategy. You are investing your company's time and money in something where no one knows if it will really return a benefit, and you are at risk of making things worse by introducing bugs into working code.

Here is a better strategy: use your time to

  • add automatic tests (probably not unit tests, but integration tests) to the modules at risk. Especially those brittle modules you mentioned will need a full test suite before you change anything in there. (A minimal sketch of such a test follows this list.)

  • refactor only the bits you need to get the tests in place. Try to minimize any of the necessary changes. The only exception is when your tests reveal bugs - then fix them immediately (and refactor to the degree you need to do so safely).

  • teach your colleagues the "boy scout principle" (AKA "opportunistic refactoring"), so when the team starts to add new features (or to fix bugs), they should improve and refactor exactly the parts of the code base they need to change, not less, not more.

  • get a copy of Feathers' book "Working Effectively with Legacy Code" for the team.
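
Returning to the first point above: a characterization test simply pins down whatever a brittle module does today, without arguing about what it should do. A minimal sketch - the module name, input, and expected values are invented for illustration, and the test-runner syntax assumed here is Jest's, which your project may or may not use:

```javascript
// Characterization test: asserts the *current* behaviour of a brittle module,
// whatever that behaviour is, so later refactoring can be checked against it.
const { generateInvoice } = require('../src/billing'); // hypothetical module

test('generateInvoice reproduces today\'s output for a known input', async () => {
  const order = { id: 42, items: [{ sku: 'A-1', qty: 2, price: 9.99 }] };

  const invoice = await generateInvoice(order);

  // These expected values were captured by running the current code once,
  // not derived from a spec. If a refactoring changes them, investigate why.
  expect(invoice.total).toBe(19.98);
  expect(invoice.lines).toHaveLength(1);
});
```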

So when the time comes when you know for sure you need to change and refactor the brittle modules (either because of the new feature development, or because the tests you added in step 1 reveal some bugs), then you and your team are ready to refactor the modules, and more or less safely ignore those warning comments.

As a reply to some comments: to be fair, if one suspects a module in an existing product to be the cause of problems on a regular basis, especially a module which is marked as "don't touch", I agree with all of you. It should be reviewed, debugged and most probably refactored in that process (with support of the tests I mentioned, and not necessarily in that order). Bugs are a strong justification for change, often a stronger one than new features. However, this is a case-by-case decision. One has to check very carefully if it is really worth the hassle to change something in a module which was marked as "don't touch".

Doc Brown
  • 70
    I am tempted to downvote, but I'll refrain. I have seen this mentality on multiple projects lead to a much worse product. Once failures finally happen, they are often catastrophic and much more difficult to fix due to their entrenchment. I regularly fix the sludge I see, and it seems to cause at worst as many issues as it fixes or at best leaves no known issues; it is overall a huge plus. AND the code is then better for future development. A stitch in time saves nine, but a problem ignored can get worse. – Aaron Apr 05 '17 at 16:50
  • 98
    I would argue that any code marked with a comment "Time out set to give Module A some time to do stuff. If its not timed like this, it will break" *already* has a bug. It's just luck of the draw that the bug isn't being hit. – 17 of 26 Apr 05 '17 at 17:38
  • 10
    @17of26: not necessarily a bug. Sometimes when you're working with 3rd party libraries you get forced to make all kinds of "elegance" concessions in the goal to get it to work. – whatsisname Apr 05 '17 at 19:47
  • 30
    @17of26: if a module at stake is suspected to have bugs, adding tests before any kind of refactoring is even more important. And when those tests reveal the bug, then you have a need to change that module, and so a legitimate reason to refactor it immediately. If the tests don't reveal any bugs, better leave the code as it is. – Doc Brown Apr 05 '17 at 19:50
  • 17
    @Aaron: I don't understand why you think adding tests or following the boy scout principle consequently would lower the product quality. – Doc Brown Apr 05 '17 at 19:54
  • 2
    I am tempted to upvote but I can't, because after the steps you have listed are taken, the next step is to refactor all code commented as described in the question. This is because adding any new feature could change the race condition timings and introduce a bug - except you won't know that's what's going on, you don't know what timing is off, etc. – Harrichael Apr 05 '17 at 20:50
  • 4
    +1 to the recommendation for [Feathers' _Working Effectively With Legacy Code_](https://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052). I read it just as I started working with a codebase that's been continuously developed for 20 years and is still responsible for >>100000 trans/sec and >1B$ a year revenue. At first I thought it was somewhat over the top - such horrible things as he describes could never happen! Now, 1yr later ... _I know better_. An excellent book. – davidbak Apr 05 '17 at 21:23
  • 9
    @whatsisname that's a bug, it's just not *your* bug. – user253751 Apr 06 '17 at 04:53
  • 3
    @Aaron so you'd probably break existing code, use (note that I do not say "waste") the company's time and money without permission, and make someone else's life harder because this _unasked-for_ change will need to be merged in another supported version (and changes will be harder to track). Why? What Doc suggested is The Right Way to handle this problem in the real world. Everyone will be happier if you document why changing "this and that" will break things and if you write tests to ensure everything works. **There will be a time** for refactoring but it seldom can be an "open" task for large code bases – Adriano Repetti Apr 06 '17 at 08:59
  • 2
    +1 For creating tests before doing any refactoring at all. You have to be able to tell if you're breaking something. – Pierre.Sassoulas Apr 06 '17 at 09:12
  • 1
    I have mixed feelings about this answer. Making a code change should never be a "forbidden" thing, but you should know exactly how the code you're changing works. Writing tests and fixing immediate bugs is one of the best things you can do to learn, but once you finish doing that you should be able to change the codebase without issues. The problem is NOT about changing code, but about not knowing what you're changing. – BgrWorker Apr 06 '17 at 09:48
  • 4
    It would be nice to just fix any bugs you find, but you actually have to be really careful about even that. I've fixed bugs that were there for so long that users expected the behavior. Yes, I really have removed "features" by fixing bugs. It's a nightmare and I wish OP the best of luck. – RubberDuck Apr 06 '17 at 10:04
  • 1
    @whatsisname: Alternatively, in embedded systems many components will have certain timing specifications (sometimes exact times, and sometimes ranges). Sometimes when trying to feed data to a device which can accept it *almost* as fast as a processor could feed it, polling for readiness before each byte can almost double execution time compared with padding things just enough that the best-case processor performance and worst-case hardware performance would yield acceptable timing. – supercat Apr 06 '17 at 19:09
  • 3
    @DocBrown Apologies; I did not elaborate enough on my complaint - I can see now that you can't tell what exactly I'm complaining about at first glance. I have no problem with the test cases. My problem was with the mentality of "If you don't see any problems, leave it alone." Even if no tests turn up any problems, crap code often has them in spades lurking in the darkness, and many of your end users are likely aware of them even if you are not. – Aaron Apr 06 '17 at 21:38
    @AdrianoRepetti Fortunately for me, it is with permission, at least to the extent that I do it. Nobody else's life is made harder, as I merge in these changes at the same time (not necessarily the same commit) as my other work. Not sure why you think changes would be harder to track; harder than what? "Why?"??? Because otherwise you get *catastrophic* failures in a few years that have all the problems you mention times 10. "There will be a time"?? The time is earlier rather than later. The sooner the better; it uses less time and makes everyone's life easier now and in the future. – Aaron Apr 06 '17 at 21:43
  • 1
    @Aaron I think this is due to the issues being architectural in nature. If you can't change (fix/modify/refactor) one module for fear of it breaking things in countless unknown others, there are serious architectural issues that can be addressed that aren't "Wasting time". The downside is it really requires you to absorb entire subsystems or even the entire codebase and redesign it. These things can't really be fixed in an ad hoc way. Often, even refactoring bad sections as you encounter them is horrible because you start introducing inconsistencies. – 8bitwide Apr 06 '17 at 21:59
  • While I agree with your first paragraph in general, this specific case clearly has systemic synchronization problems. Because concurrency bugs have global scope in their power to break things, and are very difficult to test for and diagnose, I not only think there is a strong case for fixing them now, but a strong case for fixing them before any other refactoring. – sdenham Apr 06 '17 at 22:48
  • 1
    @BgrWorker: "Making a code change should never be a "forbidden" thing": I agree, but there should always be a precise, concrete reason for changing code, as the answer suggests. I see way too often that programmers change code just to make it fit their taste without taking the time to really understand why it was written that way. I always follow two principles: (1) If it isn't broken, don't fix it and (2) open-closed p. (do not change the behaviour of existing code). IMO lots of new bugs would be avoided if programmers took more time to **read** the code before they start rewriting it. – Giorgio Apr 07 '17 at 05:18
    Basically I see (or interpret) @DocBrown's advice as "Be extra careful with it. Refactor only if necessary and do it by the book". I don't think it should be completely avoided; the last thing you want is a nail house[1] of code sticking out just because of a "beware of dogs" sign on the fence. That said.... Expect some angry pit bulls to come running at you as you cross the fence. ( [1]: http://io9.gizmodo.com/unbelievable-nail-houses-around-the-world-892781747) – Newtopian Apr 07 '17 at 13:44
  • @DocBrown If it's a timing issue, the tests may only reveal the bug randomly. I really don't think there's any need to have a debate about whether there's a bug or not when the code does not properly synchronize with other pieces. A heisenbug *is there*; the only question is the probability of it occurring. – jpmc26 Apr 07 '17 at 20:31
    @Aaron you refrain from downvoting because you do not have enough reputation; but I do, so -1 – Billal Begueradj Apr 08 '17 at 07:09
    Do it on YOUR OWN TIME if it's just to please yourself. If it's needed for business priorities, then it's paid time. Always create a branch clearly labelled as an experiment, so it doesn't affect anyone else. Once you have perfect hindsight (completed what you intended to do): if it works, then discuss with the business and agree on recompense to pass it back to them; if it failed, it becomes a "self education" cost to yourself; and if it's not needed yet, but maybe later, then be patient. I do this myself as a consultant but only if I am prepared not to get paid if they don't care as much as I do. – iheggie Apr 09 '17 at 08:08
    @iheggie no, not on your own time nor for your own pleasure on a business application. Pleasure and free time might be for personal projects. If you break just one thing you will be asked for explanations; the fact that you did it on your own time won't make you escape that. – Walfrat Apr 11 '17 at 11:36
  • @walfrat - re "if it breaks" is why I said "Always create a branch clearly labelled as an experiment, so it doesn't affect anyone else". Re "pleasure" - lets be real - even in technical areas like programming our personal likes and dislikes influence our choices and our opinion irrespective of if we are aware we are doing so or not - we are just not as logical in practise as we like to think. THEREFORE I feel it is better to acknowledge that and deliberately allow an expression of that in such a way that it is ISSOLATED from the main project - both in effect on code and effect on budget. – iheggie Apr 15 '17 at 04:01
    @walfrat - I didn't say in the original comment that you should discuss this with your boss/client first - for most projects personal experiments, clearly labelled and isolated, would not be a problem if discussed first. It would not be appropriate where projects are subject to significant legislation (eg health care systems), are life critical (medical systems), have tight deployment timing requirements (where the repo size matters), very limited resources (eg embedded systems) or have tight security requirements (eg banks). – iheggie Apr 15 '17 at 04:09
    In my experience these "`dont touch!`" comments are often overused and not really meant that way. In my own code I sometimes have such comments, but I would never think that my code can't be improved. – Penguin9 May 02 '17 at 13:06
    @RaisingAgent: this can heavily depend on the situation, the specific component, its complexity and the number of other components which depend on it, the person who wrote the component, the team's documentation culture etc. Without knowing the real case, I would be very cautious before giving such a recommendation. – Doc Brown May 02 '17 at 14:35
    @RaisingAgent Basically, "don't touch" without a proper explanation (sometimes there is one) means "it's beyond my skill" (and probably beyond the team's at the time it was written), which varies greatly depending both on the developers and on their experience with the technologies involved. – Walfrat May 02 '17 at 15:04
140

Yes, you should refactor the code before you add the other features.

The trouble with comments like these is that they depend on particular circumstances of the environment in which the code base is running. The timeout programmed at a specific point may have been truly necessary when it was programmed.

But there are any number of things that might change this equation: hardware changes, OS changes, unrelated changes in another system component, changes in typical data volumes, you name it. There is no guarantee that in the present day it is still necessary, or that it is still sufficient (the integrity of the thing it was supposed to protect might have been broken for a long time - without proper regression tests you might never notice). This is why programming a fixed delay in order to allow another component to terminate is almost always incorrect and works only accidentally.
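
To make that concrete, here is a hedged sketch of the difference (the module names and the idea that "Module A" could expose a completion signal are assumptions, not something the question confirms):

```javascript
// Fragile version, as the warning comment describes it: a fixed delay that
// merely happens to be long enough on today's hardware and load.
function runFragile(moduleA, moduleB) {
  moduleA.doStuff();
  setTimeout(() => moduleB.consume(moduleA.result), 3000); // "do not change this"
}

// More robust version: Module A signals completion explicitly (here via a
// Promise), so the caller waits exactly as long as needed - no longer, no shorter.
async function runRobust(moduleA, moduleB) {
  const result = await moduleA.doStuff(); // assumes doStuff() returns a Promise
  moduleB.consume(result);                // runs only once A has actually finished
}
```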

In your case, you don't know the original circumstances, and neither can you ask the original authors. (Presumably you also don't have proper regression/integration testing, or you could simply go ahead and let your tests tell you whether you broke something.)

This might look like an argument for not changing anything out of caution; but you say there will have to be major changes anyway, so the danger of upsetting the delicate balance that used to be achieved in this spot is already there. It is much better to upset the apple cart now, when the only thing you're doing is the refactoring, and be certain that if things break it was the refactoring that caused it, than to wait until you are making additional changes simultaneously and never be sure.

Kilian Foth
  • 47
    Best answer right here; too bad it's an underdog. The accepted answer above is not good advice, especially the emphasis on "doomed." I have spent the last year working on the largest (~10^6 LOC just for _my_ section of it), legacy-est, most awful code I have ever seen, and I make it a habit to fix the train wrecks I see when I have the time, even if it is not part of the component I'm working on. Doing this I have found and fixed bugs that the dev team was not even aware existed (though our end users probably were). In fact, I am instructed to. Product is improved as a result. – Aaron Apr 05 '17 at 16:46
  • 10
    I would not call that refactoring, but bug-fixing, though. – Paŭlo Ebermann Apr 05 '17 at 18:23
  • 7
    Improving the design of a codebase is refactoring. If the redesign reveals bugs that can be fixed, that's bug-fixing. My point is that the first often leads to the second, and that the second is often very hard without the first. – Kramii Apr 05 '17 at 18:36
  • 10
    @Aaron you're lucky to have management/developer support in doing that. In most environments the known and unknown bugs caused by poor design and code are accepted as the natural order of things. –  Apr 05 '17 at 19:05
  • @Aaron: you probably misunderstood me - see the edit of my answer. – Doc Brown Apr 06 '17 at 06:08
  • 2
    I'm tempted to upvote, but I can't, since this answer doesn't state that you should first ensure that you have sufficient test coverage before you start refactoring. Refactoring in the dark is one of the worst recipes for disaster. – Erwin Bolwidt Apr 06 '17 at 13:07
  • @ErwinBolwidt a counter-argument: simply testing the existing code as a black-box does not add any understanding. If they knew enough to know how to test any changes, they would probably know enough to know how to correct, refactor or replace the code in question. How can you write tests for something you do not understand? If you understand nothing, sometimes you just jump in and do OJT (on the job training), assuming it is not a system that dispatches ambulances or something. –  Apr 06 '17 at 14:43
  • 5
    @nocomprende I would argue that if you don't understand the system well enough to write some tests for it, you have no business changing how it works. – Patrick M Apr 06 '17 at 14:56
  • @PatrickM I agree that working blind is a bad idea. But if you are blind and you have to move forward, sitting still is not doing your job. My OJT idea included learning about the system, not just mucking with lines of code and crossing your fingers. I have been given large, complex code before with little or no access to the authors, and this *was* my job description: "Learn enough to take over managing this code." What do you do then? Resign? –  Apr 06 '17 at 15:13
  • @nocomprende No. At that point I would learn about the system, by reading the manual, talking to people who use it, or talking to the people who provide end-user support. If that fails, I would watch the system in action, to try to figure out what it's trying to do. Then I can write tests, whether they are automated, unit, integration, or even a test script meant for a human. You have to have some way of knowing how it currently acts, so that you know when stuff changes. Then, when something changes, you can decide if that change was good or bad. – Patrick M Apr 06 '17 at 15:39
  • @PatrickM I am glad we agree. I was also the product support person. It was a one-man team for a new product (lo, these many years ago.) There was a nice blue ring-binder full of info, which was good. I remember throwing the binder away, years after the product was discontinued. Nostalgia. Went on to create mission-critical stuff, until I was let go when the company was bought and the site I worked at was shut down. Pinged my server occasionally for a year after to see it still running, in the 2nd largest site of its kind in North America. Never worked as a programmer again. More nostalgia. –  Apr 06 '17 at 15:50
  • 3
    After raising the issue of refactoring a significant code base before starting new rounds of feature releases, I was told by development management that board members would reply "It don't make money!" to my repeated proposals. (My mgr. didn't support that, but was bound to follow board directives and merely stated probable attitudes.) My difficulty was in knowing both the extensive ( "bigly!") time needed for every tiny bug-fix and similar times for adding features for all future feature-release cycles. It don't make money, true; but it sure costs not to do it. – user2338816 Apr 07 '17 at 05:22
  • 1
    @user2338816 Perhaps a colorful metaphor explaining how much it would slow future progress to not clean things up now? Better roads don't make money, but worse roads slow everything and cost more in vehicle repairs. Programmers need to make use of analogies to communicate, because most people do not understand the issues. –  Apr 07 '17 at 12:26
  • 1
    +1. The OP says that they are "unable to add new features," and code like this is the most likely reason why. They are already upsetting the delicate balance even if they don't refactor. Much better to do it when the fallout is *expected* and budgeted for. – jpmc26 Apr 07 '17 at 20:35
  • @nocomprende Easy enough when you have access to the board (basically isolated venture capitalists for a startup). Harder when needing to communicate by proxy through management who agrees with you and needs no metaphor. The board was really only interested in selling the company, which they did for plenty. – user2338816 Apr 08 '17 at 07:34
  • @user2338816 Right, when it is a business issue (selling a company) software is not really any part of the decision. The company I worked for years back was bought for its 86% market share, and all of the hundred people I worked with were let go, sooner or later. –  Apr 09 '17 at 01:29
  • @nocomprende To clarify, this was a software company, providing network security and system auditing products. The software was a pretty fair sized part of the external interest (regardless of how much the new owners assured us that the "people" were the draw). Still, much of the core had aged and been layered over and over with "New! Improved!" feature sets. Each new one needed more spaghetti strands to integrate. – user2338816 Apr 13 '17 at 04:59
  • @user2338816 the place I worked at was a software company too, it created customized versions of our main product and had a hundred major customer sites with active service contracts. Again, it was bought for the market share and neither the code nor the staff were kept. This was long ago, I have no idea how it all worked out. But business does not really give a rip about software design, even when it is "the product". Business cares about money, and customers who stick. This is part of why I am no longer a programmer. –  Apr 13 '17 at 11:22
  • @user2338816 another thought: if your company was being bought for the software, and you knew that the code was basically end-of-life and needed rewriting, what did you do then? Jump ship, I would imagine, because someone was about to make a bad bargain. If the buyer knew the code needed rewriting, they were not buying the code. If they didn't, they were going to be unhappy soon. –  Apr 13 '17 at 11:31
  • @nocomprende I didn't jump immediately, though 80% of all employees were 'dumped' within 60 days. I stayed for a couple years and lobbied for core upgrades. The year after I finally resigned out of frustration, I was contacted to do the upgrades on contract, but I was too fully involved in other work. – user2338816 Apr 14 '17 at 00:04
60

My question is: should I refactor the code when I encounter such warnings from the authors

No, or at least not yet. You imply that the level of automated testing is very low. You need tests before you can refactor with confidence.

for now we are no longer able to add any feature without breaking something else

Right now you need to focus on increasing stability, not refactoring. You might do refactoring as part of stability increase, but it is a tool to achieve your real goal - a stable codebase.

It sounds like this has become a legacy codebase, so you need to treat it a little differently.

Start by adding characterization tests. Don't worry about any spec, just add a test that asserts on the current behavior. This will help prevent new work from breaking existing features.

Every time you fix a bug, add a test case that proves the bug is fixed. This prevents direct regressions.
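
For example (a hypothetical sketch - the bug, the helper function and the Jest-style test are invented for illustration):

```javascript
// Regression test committed together with a bug fix: it fails on the old code,
// passes on the fixed code, and keeps the bug from silently coming back.
const { parseDuration } = require('../src/util/time'); // hypothetical helper

test('issue #123: parseDuration accepts zero-length intervals', () => {
  // Before the fix this threw "Invalid duration"; the fix makes it return 0.
  expect(parseDuration('0s')).toBe(0);
});
```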

When you add a new feature, add at least some basic testing that the new feature works as intended.

Perhaps get a copy of "Working Effectively with Legacy Code"?

I was given a few months to refactor existing code.

Start by getting test coverage up. Start with areas that break the most. Start with areas that change the most. Then once you have identified bad designs, replace them one by one.

You almost never want to do one huge refactor, but instead a constant flow of small refactors every week. One big refactor has a huge risk of breaking things, and it's hard to test well.

Daenyth
  • 10
    +1 for adding characterization tests. It's pretty much the only way of minimizing regressions when modifying untested legacy code. If the tests start failing once you start making changes you can then check if previous behaviour was actually correct given the spec, or if the new changed behaviour is correct. – Leo Apr 05 '17 at 18:51
  • 2
    Right. Or if there is no usable spec check if the previous behaviour is actually desired by the people paying for or having an interest in the software. – bdsl Apr 06 '17 at 09:10
  • *Perhaps get a copy of "Working Effectively with Legacy Code"?* How many weeks will it take him to read that book? And when: during working hours at workplace, at night when everybody sleeps? – Billal Begueradj Apr 08 '17 at 12:13
23

Remember G. K. Chesterton's fence: do not take down a fence obstructing a road until you understand why it was built.

You can find the author(s) of the code and the comments in question, and consult them to gain that understanding. You can look at the commit messages, email threads, or docs if they exist. Then you'll either be able to refactor the marked fragment, or you will write down your knowledge in the comments so that the next person to maintain this code can make a more informed decision.

Your aim is to reach an understanding of what the code does, why it was marked with a warning back then, and what would happen if the warning were ignored.

Before that, I'd not touch the code which is marked "do not touch".

9000
  • 4
    OP says "no, I can't get in touch with authors". – FrustratedWithFormsDesigner Apr 05 '17 at 14:45
  • 5
    I can't get in touch with the authors, nor is there any documentation. – kukis Apr 05 '17 at 15:12
  • 1
    @kukis: Why can't you get in touch with the authors? – einpoklum Apr 05 '17 at 18:45
  • 21
    @kukis: You can't contact the authors, any domain experts, and can't find any emails / wiki / docs / test cases pertaining to the code in question? Well, then it's a full-blown research project in software archeology, not a simple refactoring task. – 9000 Apr 05 '17 at 18:55
  • 12
    It's not unusual that the code is old and the authors have left. If they've ended up working for a competitor then discussing it may violate someone's NDA. In a few cases the authors are permanently unavailable: you can't ask Phil Katz about PKzip because he died years ago. – pjc50 Apr 06 '17 at 08:57
  • 1
    Sometimes you can do some analysis, scratch your head and move on, then come back to it later. Two nights of sleep has cleared up more things for me than anything else I have ever tried. If your mind is on to the problem, it will find a way even if you are doing other stuff. Remember what happened to Blaise Pascal. –  Apr 06 '17 at 15:23
  • 1
    I think Chesterton's fence is actually the definitive correct answer for this. May I recommend copying the text from [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Chesterton's_fence) into the answer for those who are not familiar with that story? I do so love his wording. – Cort Ammon Apr 07 '17 at 00:38
  • 2
    If you replace "finding the author" with "understanding what problem the code solved" then this is the best answer. Sometimes these bits of code are the least horrible solution to bugs that took weeks or months of work to find and track down and deal with, often bugs that normal kinds of unit testing can't find. (Though sometimes they are the result of programmers who didn't know what they're doing. Your first task is to understand which it is) – grahamparks Apr 11 '17 at 18:17
  • @grahamparks: Thank you! Indeed, understanding is the key. The easiest way to obtain the understanding of a cryptic piece is often finding the author, or someone else in the know, and asking. When it is not possible, then research is the way. I've made an update. – 9000 Apr 11 '17 at 18:27
5

Probably not a good idea to refactor code with such warnings, if things work OK as they are.

But if you must refactor...

First, write some unit tests and integration tests to test the conditions that the warnings are warning you about. Try to mimic production-like conditions as much as possible. This means mirroring the production database to a test server, running the same other services on the machine, etc... whatever you can do. Then attempt your refactoring (on an isolated branch, of course). Then run your tests on your new code to see if you can get the same results as with the original. If you can, then it is probably OK to implement your refactoring in production. But be ready to roll back the changes if things go wrong.

FrustratedWithFormsDesigner
5

Code with comments like the ones you showed would be at the top of my list of things to refactor if, and only if, I have any reason to do so. The point is that the code stinks so badly, you can even smell it through the comments. It's pointless to try to kludge any new functionality into this code; such code needs to die as soon as it needs to change.

Also note that the comments are not helpful in the least: they only give a warning and no reason. Yes, they are the warning signs around a tank of sharks. But if you want to do anything in the vicinity, there is little point in trying to swim with the sharks. Imho, you should get rid of those sharks first.

That said, you absolutely need good test cases before you can dare to work on such code. Once you have those test cases, be sure to understand every tiny little bit you are changing, to ensure that you are really not changing behavior. Make it your top priority to keep all the behavioral peculiarities of the code until you can prove that they are without any effect. Remember, you are dealing with sharks - you must be very careful.

So, in the case of the time out: Leave it in until you understand precisely what the code is waiting for, and then fix the reason why the time out was introduced before you proceed to remove it.

Also, be sure that your boss understands what an endeavor it is that you are embarking on, and why it is needed. And if they say no, don't do it. You absolutely need their support in this.

  • 1
    Sometimes code ends up being a kludge-up mess because it has to operate correctly on pre-production samples of silicon whose actual behavior doesn't match the documentation. Depending upon the nature of the workarounds, *replacing* the code may be a good idea if code will never need to run on the buggy chips, but changing the code without the level of engineering analysis that would be required of new code is probably not a good idea. – supercat Apr 06 '17 at 19:20
  • @supercat One of my points was that the refactoring must perfectly preserve behavior. If it doesn't preserve behavior, it's more than refactoring in my book (and exceedingly dangerous when dealing with such legacy code). – cmaster - reinstate monica Apr 07 '17 at 06:20
5

I say go ahead and change it. Believe it or not, not every coder is a genius and a warning like this means it could be a spot for improvement. If you find that the author was right, you could (gasp) DOCUMENT or EXPLAIN the reason for the warning.

JES
4

I am expanding my comments into an answer because I think some aspects of the specific problem are either being overlooked, or used to draw the wrong conclusions.

At this point, the question of whether to refactor is premature (even though it will probably be answered by a specific form of 'yes').

The central issue here is that (as noted in some answers) the comments you quote strongly indicate that the code has race conditions or other concurrency/synchronization problems, such as discussed here. These are particularly difficult problems, for several reasons. Firstly, as you have found, seemingly unrelated changes can trigger the problem (other bugs can also have this effect, but concurrency errors almost always do.) Secondly, they are very hard to diagnose: the bug often manifests itself in a place that is distant in time or code from the cause, and anything you do to diagnose it may cause it to go away (Heisenbugs). Thirdly, concurrency bugs are very hard to find in testing. Partly, that is because of the combinatorial explosion: it is bad enough for sequential code, but adding the possible interleavings of concurrent execution blows it up to the point where the sequential problem becomes insignificant in comparison. In addition, even a good test case may only trigger the problem occasionally - Nancy Leveson calculated that one of the lethal bugs in the Therac 25 occurred in 1 of about 350 runs, but if you do not know what the bug is, or even that there is one, you do not know how many repetitions make an effective test. In addition, only automated testing is feasible at this scale, and it is possible that the test driver imposes subtle timing constraints such that it will never actually trigger the bug (Heisenbugs again).
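
Because a single green run of an intermittent test proves very little, one crude but common tactic is to repeat the suspect operation many times and fail on the first deviation. A sketch (the entry point, input and iteration count are invented; this raises the odds of surfacing a race but cannot prove its absence):

```javascript
// Crude stress loop for a suspected timing-related failure.
const assert = require('assert');
const { processBatch } = require('../src/pipeline'); // hypothetical entry point

async function stress(iterations = 500) {
  const input = [{ id: 1 }, { id: 2 }]; // minimal made-up payload
  for (let i = 0; i < iterations; i++) {
    const result = await processBatch(input);
    // Fail loudly on the first run whose outcome differs from the expected one.
    assert.strictEqual(result.status, 'ok', `unexpected result on iteration ${i}`);
  }
}

stress().catch(err => { console.error(err); process.exit(1); });
```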

There are some tools for concurrency testing in some environments, such as Helgrind for code using POSIX pthreads, but we do not know the specifics here. Testing should be supplemented with static analysis (or is that the other way round?), if there are suitable tools for your environment.

To add to the difficulty, compilers (and even the processors, at runtime) are often free to reorganize the code in ways that sometimes make reasoning about its thread-safety very counter-intuitive (perhaps the best-known case is the double-checked lock idiom, though some environments (Java, C++...) have been modified to ameliorate it.)

This code may have one simple problem that is causing all the symptoms, but it is more likely that you have a systemic problem that could bring your plans to add new features to a halt. I hope I have convinced you that you might have a serious problem on your hands, possibly even an existential threat to your product, and the first thing to do is to find out what is happening. If this does reveal concurrency problems, I strongly advise you to fix them first, before you even ask the question of whether you should do more general refactoring, and before you try to add more features.

sdenham
  • 1
    Where do you see "race condition" in the comments? Where is it even implied? You're making an awful lot of technical assumptions here without even the benefit of seeing the code. – Robert Harvey Apr 07 '17 at 16:22
  • 2
    @RobertHarvey "Time out set to give Module A some time to do stuff." - pretty much the definition of a race condition, and timeouts are not a roust way to handle them. I am not the only one to draw this inference. If there is not a problem here, that's fine, but the questioner needs to know, given that these comments are red flags for poorly-handled synchronization. – sdenham Apr 07 '17 at 17:03
3

I am dealing with a pretty big codebase and I was given a few months to refactor existing code. The refactor process is needed because soon we will need to add many new features to our product [...]

During refactoring, from time to time I encounter the class, method or lines of code which have comments like ["don't touch this!"]

Yes, you should refactor those parts especially. Those warnings were placed there by the previous authors to mean "don't idly tamper with this stuff, it's very intricate and there are a lot of unexpected interactions". As your company plans to further develop the software and add a lot of features, they've specifically tasked you with cleaning up this stuff. So you're not idly tampering with it, you're deliberately charged with the task of cleaning it up.

Find out what those tricky modules are doing and break it down into smaller problems (what the original authors should have done). To get maintainable code out of a mess, the good parts need to be refactored and the bad parts need to be rewritten.

DepressedDaniel
2

This question is another variation on the debate of when and how to refactor and/or clean up code, with a dash of "how to inherit code". We all have different experiences and work in different organizations with different teams and cultures, so there is no right or wrong answer except "do what you think you need to do, and do it in a way that doesn't get you fired".

I don't think there are many organizations that would take kindly to having a business process fall on its side because the supporting application needed code cleanup or refactoring.

In this specific case, the code comments are raising the flag that changing these sections of code should not be done. So if you proceed, and the business does fall on its side, not only do you not have anything to show to support your actions, but there is actually an artifact that works against your decision.

So as is always the case, you should proceed carefully and make changes only after you understand every aspect of what you are about to change, and find ways to test the heck out of it paying very close attention to capacity and performance and timing because of the comments in the code.

But even still, your management needs to understand the risk inherent in what you are doing and agree that what you are doing has a business value that outweighs the risk and that you have done what can be done to mitigate that risk.

Now, let us all get back to our own TODOs and the things we know could be improved in our own code creations, if only there were more time.

Thomas Carlisle
1

Yes, absolutely. These are clear indications that the person who wrote this code was unsatisfied with the code and likely poked at it until they got it to happen to work. It's possible they didn't understand what the real issues were or, worse, did understand them and were too lazy to fix them.

However, it's a warning that a lot of effort will be needed to fix them and that such fixes will have risks associated with them.

Ideally, you'll be able to figure out what the issue was and fix it properly. For example:

Time out set to give Module A some time to do stuff. If its not timed like this, it will break.

This strongly suggests that Module A doesn't properly indicate when it's ready to be used or when it has finished processing. Perhaps the person who wrote this didn't want to bother fixing Module A or couldn't for some reason. This looks like a disaster waiting to happen because it suggests a timing dependency being dealt with by luck rather than proper sequencing. If I saw this, I would very much want to fix it.

Do not change this. Trust me, you will break things.

This doesn't tell you much. It would depend on what the code was doing. It could mean that it has what appear to be obvious optimizations that, for one reason or another, will actually break the code. For example, a loop might happen to leave a variable at a particular value which another piece of code depends on. Or a variable might be tested in another thread and changing the order of variable updates might break that other code.

I know using setTimeout is not a good practice, but in this case I had to use it.

This looks like an easy one. You should be able to see what setTimeout is doing and perhaps find a better way to do it.

That said, if these kinds of fixes are outside the scope of your refactor, these are indications that trying to refactor inside this code may significantly increase the scope of your effort.

At a minimum, look closely at the code affected and see if you can at least improve the comment to the point that it more clearly explains what the issue is. That might save the next person from facing the same mystery you face.
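
As a sketch of what "improving the comment" might look like, every specific detail below - ticket number, measurement, hardware - is invented to show the shape of a useful comment, not taken from the question:

```javascript
// Instead of the original warning:
//
//   // Time out set to give Module A some time to do stuff.
//   // If its not timed like this, it will break.
//
// a comment that records the reason gives the next reader something to work with:
//
//   // Module A fills the shared cache asynchronously and exposes no completion
//   // callback (see ticket FOO-123). 3000 ms was the worst case measured on the
//   // old staging hardware in 2015. Before changing this, either add a real
//   // readiness signal to Module A or re-measure that worst case.
```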

David Schwartz
  • 2
    It is possible that the conditions have changed to the point where the old problems no longer exist at all, and the warning comments have become like the places on old maps that say, "Here be dragons." Dive in and learn what the situation was, or see that it has passed into history. –  Apr 06 '17 at 14:33
  • 2
    In the real world, some devices don't provide a reliable readiness indication but instead specify a maximum unready time. Reliable and efficient readiness indications are nice, but sometimes one is stuck using things without them. – supercat Apr 06 '17 at 19:12
  • 1
    @supercat Maybe, maybe not. We can't tell from the comment here. So it's worth investigating to, if nothing else, improve the comment with specifics. – David Schwartz Apr 06 '17 at 19:53
  • 1
    @DavidSchwartz: The comment certainly could be more helpful, but it's sometimes unclear how much time a programmer should spend trying to map out all the precise ways in which a device fails to abide by its specification, especially if it's unclear whether the problem is expected to be a temporary or permanent thing. – supercat Apr 06 '17 at 21:00
1

The author of the comments most likely did not fully understand the code themselves. If they had actually known what they were doing, they would have written actually helpful comments (or not introduced the race conditions in the first place). A comment like "Trust me, you will break things." to me indicates the author tried to change something, which caused unexpected errors they didn't fully understand.

The code was probably developed through guessing and trial and error, without a full understanding of what actually happens.

This means:

  1. It is risky to change the code. It will take time to understand, obviously, and it probably does not follow good design principles and may have obscure effects and dependencies. It is most likely insufficiently tested, and if nobody fully understands what the code does then it will be difficult to write tests to ensure no errors or changes in behavior are introduced. Race conditions (like those the comments allude to) are especially onerous - this is one place where unit tests will not save you.

  2. It is risky not to change the code. The code has a high likelihood of containing obscure bugs and race conditions. If a critical bug is discovered in the code, or a high-priority business requirement change forces you to change this code on short notice, you are in deep trouble. Now you have all the problems outlined in 1, but under time pressure! Furthermore, such "dark areas" in code have a tendency to spread and infect the other parts of the code they touch.

A further complication: unit tests will not save you. Usually the recommended approach to fixing such legacy code is to add unit or integration tests first, and then isolate and refactor. But race conditions cannot reliably be captured by automated testing. The only solution is to sit down and think through the code until you understand it, and then rewrite it to avoid the race conditions.

This means the task is much more demanding than just routine refactoring. You will have to schedule it as a real development task.

You may be able to encapsulate the affected code as part of regular refactoring, so at least the dangerous code is isolated.
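
One hedged sketch of such encapsulation (the module and function names are invented): hide the fragile timing hack behind a single intention-revealing wrapper, so the rest of the code depends on the wrapper rather than on the magic number, and the hack can later be replaced in exactly one place.

```javascript
// moduleAGateway.js - hypothetical wrapper that quarantines the fragile timing hack.
function waitForModuleA() {
  return new Promise(resolve => {
    // Known-fragile workaround inherited from the legacy code: Module A offers no
    // completion callback, so we wait a fixed delay that has worked so far.
    setTimeout(resolve, 3000);
  });
}

module.exports = { waitForModuleA };

// Callers now read as intent, not as a magic number:
//   await waitForModuleA();
//   useModuleAResult();
```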

JacquesB
-1

The question I would ask is why someone wrote, DO NOT EDIT in the first place.

I have worked on a lot of code and some of it is ugly and took a lot of time and effort to get working, within the given constraints at the time.

I bet in this case that is what happened, and the person who wrote the comment found that someone changed it and then had to repeat the whole exercise and put it back the way it was, in order to get the thing to work. After this, out of sheer frustration, they wrote... DO NOT EDIT.

In other words, I don't want to have to fix this again as I have better things to do with my life.

Saying DO NOT EDIT is a way of saying that we know everything we are ever going to know right now, and therefore that in the future we will never learn anything new.

There should at least be a comment about the reasons for not editing. It is like saying "Do Not Touch" or "Do Not Enter". Why not touch the fence? Why not enter? Wouldn't it be better to say "Electric Fence, Do Not Touch!" or "Land Mines! Do Not Enter!"? Then it is obvious why, and you can still enter, but at least you know the consequences before you do.

I also bet the system has no tests around this magic code and therefore no one can confirm that the code will work correctly after any change. Placing characterisation tests around the problem code is always a first step. See "Working Effectively with Legacy Code" by Michael Feathers for tips on how to break dependencies and get the code under test before tackling changes to the code.

I think, in the end, it is short-sighted to place restrictions on refactoring and on letting the product evolve in a natural and organic way.

BigA
  • 2
    this doesn't seem to offer anything substantial over points made and explained in prior 9 answers – gnat Apr 07 '17 at 08:06