50

I'm a software intern, and I'm assigned bugs to fix as well as features to add. When I add features, everything works well. My problem is with fixing bugs. I'm working on an extremely large codebase (spanning millions of lines) with poor documentation: there are no comments, tons of magic numbers, tight coupling between different classes, and so on. The only documentation provided is a Word document of 8 or so pages. Still, I feel like I could do a better job of bug fixing.

I'll give an example of a situation I've encountered, which I feel like I could have done better in:

  • I was assigned a bug to fix regarding extent calculations for a specific type of object (computer graphics).
  • I found the cause of the bug: the program populated a data structure (a contiguous array) with a 3D Cartesian point it seemingly should not have, so that point was being included in the extent calculations.
  • Removing the point did fix the bug I was assigned. However, another piece of code (in a different class) used pointer arithmetic to reach that same point and used it for another feature: making the mouse snap near the point when a certain option was enabled. So by removing the point I fixed my assigned bug but introduced a new one (a simplified sketch of the situation follows below).
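
Here is a simplified, hypothetical reconstruction of the situation (all names and types are invented for illustration; the real code is far larger):

```cpp
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };

struct Shape {
    std::vector<Point3> points;  // the contiguous array in question

    // Extents are computed over *all* points, so the extra point
    // skewed the result -- this was the bug I was assigned.
    Point3 maxExtent() const {
        Point3 m = points.front();
        for (const Point3& p : points) {
            if (p.x > m.x) m.x = p.x;
            if (p.y > m.y) m.y = p.y;
            if (p.z > m.z) m.z = p.z;
        }
        return m;
    }
};

// Elsewhere, a different class reached into the same array by offset.
// Removing the "extra" point shifted the layout and silently broke this.
const Point3* snapTarget(const Shape& s, std::size_t hiddenOffset) {
    return s.points.data() + hiddenOffset;  // pointer arithmetic
}
```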

What can I do to not have things like this occur? How can I improve? Is there some process I'm missing?

Jason
  • 371
  • 3
  • 7
  • 40
    The "10 out of 100 bugs fixed, 117 to go problem". If your human, you cannot eliminate it. regression tests, unit tests etc are needed for this reason. – mattnz Sep 14 '17 at 03:31
  • @mattnz Perhaps the problem occurs because I'm too hasty? If I had invested more time, I could probably have spotted this consequence. But then the question arises: how long exactly should one spend doing this? I don't want my supervisor to think I'm being careless, but I also don't want to waste company time. Which is better? – Jason Sep 14 '17 at 03:34
  • 3
    This is the exact use case I use when lobbying for unit testing and TDD. You can do TDD when refactoring, and it is the only way I feel confident making changes to legacy code. – bakoyaro Sep 14 '17 at 13:46
  • 7
    @mattnz I prefer the song, "99 cute little bugs in the code, 99 bugs in the code, take one down, patch it around, 137 bugs in the code." – Tristan Sep 14 '17 at 14:21
  • You did well; just keep doing it. The behaviour you describe sounds like the second feature (mouse snapping) relied on essentially buggy behaviour. See also: old versions of MacOS and apps, and more recently CSS styling for MS Internet Explorer. – KlaymenDK Sep 14 '17 at 14:40
  • 2
    @Jason As a general rule in software development and in life, when I ask myself "How do I..." and the answer is "be smarter" or "be more perceptive" or "have more willpower" or "be less forgetful", the most productive follow-up is usually: "Okay, but if I can't do that, how can I achieve the goal anyway?" In software development, the best answer is very rarely: "Okay, *next* time I'll be more careful". Examining the underlying cause of *why* working with your codebase requires superhuman levels of conscientiousness is likely to be much more helpful. – Ben Aaronson Sep 14 '17 at 16:56
  • 1
    If there was some way of fixing bugs while guaranteeing not to introduce new ones, all software would be bug-free. You'd just start with an empty file and fix all the bugs until you had a working word processor/compiler/video game/whatever it was supposed to be. – David Richerby Sep 14 '17 at 17:26
  • [Obligatory](https://vangogh.teespring.com/v3/image/HUSQXxzDCkAkbqn64tW9CrPX34c/480/560.jpg) – Darren H Sep 14 '17 at 20:29
  • I did an internship where I was fixing a large code base against a 250+ page requirements Word document. Step 1: Test it against the requirements. Step 2: Write up bugs and prioritize. Step 3: Fix the most pressing bug. Step 4: Return to step 1. I spent a lot of time reducing code duplication... some modules ended up around 10% of their original length while I _added_ functionality. – TemporalWolf Sep 14 '17 at 22:41
  • I can't add an answer, so here's a comment: Tests are great. Documentation and best practices are great. You're an intern. If you wrote tests for EVERYTHING you wouldn't have time to singlehandedly change the years of company culture that's also required. :| You may want to look into gold master testing. It sounds like you're working with a GUI, so this will be harder. But if there's a way to automatically call a bunch of functions, record the output, and then do it again after fixing something then you might be able to catch some unintended changes. – James S Sep 19 '17 at 18:37

7 Answers

74

You cannot be solely responsible for these kinds of defects. You are human, and it is impossible to think about large systems as a whole.

To help prevent "regression bugs" (bugs that are unintentionally introduced when modifying the system), you can do the following:

  • Develop a comprehensive suite of automated regression tests. Hopefully your system has a large suite of tests. Every time a bug is discovered in your system, you should write an automated test that reproduces it, then fix the bug. If the system ever regresses, your "regression test" will break and notify the developer (see the sketch after this list).
  • Inspect the code. Whenever you fix a bug, it's a good idea to inspect the usages of the function you've changed, to try to identify any breaking changes you've introduced.
  • Write more tests. If, when inspecting the code, you identify code that is not covered by any automated tests, this is a good opportunity to add some.
  • Familiarize yourself with the system, and work with the software over a long period. Practice makes perfect. Nobody expects a developer to avoid introducing bugs when they're new to a system, or when they're an intern. We've all done it.
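
For example, a regression test for the extents bug might look like the following minimal, assert-based sketch (Point3 and extentsMax are stand-ins; in a real suite you would call the production routine and, ideally, use a proper test framework):

```cpp
#include <cassert>
#include <vector>

struct Point3 { double x, y, z; };

// Stand-in for the production extents routine under test.
Point3 extentsMax(const std::vector<Point3>& pts) {
    Point3 m = pts.front();
    for (const Point3& p : pts) {
        if (p.x > m.x) m.x = p.x;
        if (p.y > m.y) m.y = p.y;
        if (p.z > m.z) m.z = p.z;
    }
    return m;
}

int main() {
    // The exact input that used to reproduce the bug: the phantom
    // point must no longer inflate the computed extents.
    std::vector<Point3> pts = {{0, 0, 0}, {2, 3, 4}};
    Point3 m = extentsMax(pts);
    assert(m.x == 2 && m.y == 3 && m.z == 4);
    return 0;  // silence means the regression has not returned
}
```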
Samuel
  • 9,137
  • 1
  • 25
  • 42
  • 1
    The system has no test suite unfortunately, but we have a QA team. – Jason Sep 14 '17 at 03:43
  • 14
    @Jason - QA or Test team? If they did QA, you would have a test suite. – mattnz Sep 14 '17 at 03:47
  • 35
    I understand that in your position it would be difficult to sway the company status quo, but not having automated tests is unacceptable. IMO developers can't have any confidence in anything they write if they're relying on a human tester (this question is evidence of that). And it's intractable for a human to reliably regression-test software. – Samuel Sep 14 '17 at 03:48
  • 8
    A maintainer is not responsible *at all* for these kinds of defects. Responsibility falls on the idiots who wrote tightly coupled code, and the idiots in suits who let them. – Kevin Krumwiede Sep 14 '17 at 06:52
  • 16
    Given OP's description about the level of coupling and other details, I'm willing to bet that being able to write automated test cases for said code base will require significant refactoring in and of itself b/c it was not developed with automated testing in mind, and will be extremely difficult to do otherwise. – code_dredd Sep 14 '17 at 07:47
  • 10
    @KevinKrumwiede it's tempting to think 'idiots', but what happens is you're told to throw together an MVP, don't worry about tests, just ship it. Then it's turned into a product. Then the engineers who wrote it leave, along with the managers who had them write it. Then the next group of engineers, who don't understand the code, get tasked with adding features by a new group of managers, with no time for tests. Rinse and repeat to 20 mil LoC. At each step the managers and engineers responsible did arguably the right thing *at the time* (for the business need). Only in retrospect is it irresponsible. – Jared Smith Sep 14 '17 at 13:26
  • 1
    @Jason Manual regression testing is ultimately infeasible because - unlike the size of the feature-development task - the size of the regression-testing task grows as the codebase grows. You need a suite of regression tests. The only way to achieve this when you don't already have one is "gradually." – Ant P Sep 14 '17 at 14:32
  • 1
    @Samuel To further your "human tester" point: Most human testers would pull their hair out if they had to test simple, stupid things that could go into an automated unit test. What a waste of a brain! – corsiKa Sep 14 '17 at 19:48
10

You cannot eliminate this in any full sense. Here are some steps to lessen the impact and lessen the likelihood:

  1. Add a unit test and a regression test for the functionality or bug you are changing or fixing. If you don't have a testing framework or harness set up for your project, set one up so that it's part of your process. This will improve things over time.
  2. Improve your code design so that code is more loosely coupled. That means following the SRP even if it means sacrificing some DRY. In particular, in OOP systems your code and data should live in the same class, and the data should always be private (see the sketch after this list). If this doesn't make sense, there is an impedance mismatch between the problem domain and the design. It should not be possible for a change to a data structure in one class to affect the behavior of another class.
  3. Talk to your coworkers about your proposed fix. This could take the form of code reviews but sometimes that won't catch the regressions. Code reviews aren't usually for catching bugs. As a corollary, make sure the rest of the team knows about the changes you're making and give them time to give you feedback on what other parts of the system might rely on the code. Do this even if you're the senior person on the team.
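
A rough sketch of point 2, assuming a C++-style codebase like the asker's (all names here are invented):

```cpp
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };

class Shape {
public:
    // The only sanctioned ways to reach the points. Other classes can
    // no longer depend on the array's internal layout, so removing a
    // point becomes a visible interface change rather than a silent
    // reinterpretation of memory.
    std::size_t pointCount() const { return points_.size(); }
    const Point3& pointAt(std::size_t i) const { return points_.at(i); }

private:
    std::vector<Point3> points_;  // layout is now free to change
};
```

With the data private, the mouse-snapping feature would have to go through the accessor, and an out-of-range index fails fast (std::vector::at throws) instead of quietly reading the wrong point.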

Hope this helps.

RibaldEddie
  • 3,168
  • 1
  • 15
  • 17
9

There are many good suggestions in other answers. I would add to them: your organization probably has a problem at the management level.

  • Management may be prioritizing new feature work over paying down technical debt. The bug you mentioned is evidence that past poor practices are making it more expensive to develop bug-free software today. Collect data points like these to show management that there is an ongoing problem caused by unaddressed debt, and that it would be wise to schedule a "quality milestone" in which no new features are added but existing debt is decreased.

  • Management is likely ignorant of the scope of the problem. Static analysis tools that find real bugs are expensive, but they are a lot cheaper than failing in the marketplace because you shipped software that loses user data. A good static analyzer will tell you that you have tens of thousands of unaddressed bugs, where they are, and how to fix them. This lets you get a handle on the size of the problem, and how fast you're mitigating it.

Eric Lippert
  • 45,799
  • 22
  • 87
  • 126
5

In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

-- G. K. Chesterton

As well as all the other ideas raised here, another valuable thing to do is to find out how and why a bug was introduced. Assuming that your team uses Git or a similar source control system, tools like git blame or GitHub's Blame button can help with this. Once you've identified the bad line or lines of code that caused the bug, use your source control tools to find out when it was added, by whom, and as part of what broader change.
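
For instance (the file name and search string here are hypothetical; the git flags themselves are standard):

```
# Who last touched lines 120-140 of this file, and in which commits?
git blame -L 120,140 src/extents.cpp

# When was the string "snapPoint" added or removed anywhere in history?
git log -S snapPoint --oneline

# Inspect the full commit (message and diff) once you have its hash
git show <commit>
```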

Whenever I fix a bug, I try to write up, on the organisation's bug tracker, a narrative of how I think the bug came to exist. Sometimes that's a silly and slightly embarrassing story, like "It looks like Bob commented out line 96 for testing purposes, accidentally committed it in commit 5ab314c, and it got merged because we were rushing to get the Frobnicator deployed and weren't reviewing stuff properly that week." Even so, learning that at the moment you're fixing the bug is valuable, because it reassures you that there's probably no good reason for the broken code to be the way it is, and lets you fix it with more confidence. But sometimes you delve into the git blame and find that the "obviously broken" line of code you were about to change was introduced in a commit that purports to fix some other bug you hadn't thought of, and suddenly you realise that your "fix" is going to do some damage and that you need to do something cleverer.

Of course, as many other answers here note, it would be nice if the previous developer had left behind a test that would fail if you naively reintroduced an old bug - a kind of automated warning that the fence you're about to tear down has a purpose that you can't see. But you can't control how many tests your codebase's previous developers wrote, because that's the past; delving into the code's history is one of the few techniques that are available to you at the moment of fixing an existing bug to reduce your risk of breaking things.

Mark Amery
  • 1,241
  • 13
  • 27
  • 1
    +1 just for the Chesterton quote :) I wish I could upvote more than once. – David Sep 14 '17 at 16:13
  • 1
    The Chesterton quote contains wisdom. The rant about the wonderful features of git does not, since most of these bugs probably predate the existence of git by decades. Even in "high-tech" industries like aerospace, it's not unknown for code that was *first written more than 50 years ago* to still be in use. After all, the laws of physics haven't changed in 50 years, so neither have the technical requirements of the code itself. But porting it through many different hardware and software systems doesn't magically eradicate the 50-year-old bugs and features that are still hiding in there! – alephzero Sep 14 '17 at 16:56
2

I also work on a fragile legacy application. My number one goal for any change is "don't break anything that was previously working". (The number two goal is that the user must always be able to finish their job; even if it gets a little clunky or they have to invoke some workaround, they can't get absolutely stuck at any point.)

  1. There is a book, Working Effectively with Legacy Code by Michael Feathers. It's quite useful, but I will warn you that it's 100% about adding automated testing. It is technically always possible (and he explains how to do it for a lot of tricky situations) but it's not always workable from the business perspective. The book is also useful because it has amusing little sidebars about how bad things can get.
  2. Tedious manual inspection of everything touching the point of change. If you changed two functions, foo and bar, do a full-text search across your entire codebase (you do have it ALL checked out, right?) for the words foo and bar; a sketch follows this list. (Note: you will want to become a power user of some search tool such as grep.) It is important not to try to simplify this search... don't just search in Java files, because your function may be called via reflection from something kicked off in some front-end JavaScript... just an example. Keep back-tracking until you've fully tested everything that calls your change.
  3. Reverse your search technique and try to find points in your application doing the same thing without calling your point of change. Search for copied-and-pasted code or same-thing-done-two-ways and make sure you've changed all points that needed to be changed. (Then repeat the previous step.)
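
For example, with grep (the function names are the hypothetical foo and bar from point 2; the flags are standard):

```
# Recursively search the entire checkout, printing line numbers
grep -rn 'foo' .
grep -rn 'bar' .

# -w matches whole words only, cutting false hits such as "foobar"
grep -rnw 'foo' .
```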

It does sound kind of horrible, but these difficult to maintain legacy applications are usually only still around and being updated because they're doing something important and someone really needs them to work. Try to get some satisfaction out of this.

user3067860
  • 773
  • 5
  • 7
1

"another piece of code (in a different class) used pointer arithmetic to get this point and use it for another feature, to make the mouse snap near this point when a certain feature in the software was enabled."

Yeah, you've been handed a high-maintenance codebase. It sounds like, in this case, rather than regular coupling between objects through obvious interfaces, there was some "magic" going on, which of course stopped working as soon as the underlying data changed.

There is no magic trick to this. A lot of software development techniques focus on managing complexity in order to make it possible to spot what the impact of a change will be - but these have not been followed here. This is the fault of the previous developers and the seniors overseeing the project.

pjc50
  • 10,595
  • 1
  • 26
  • 29
0

You might refactor the code to hide direct access to that specific member behind an accessor. This will break any code that relies on direct access to the member, so you'll know every place that uses it.

As a bonus, relocate the member and fill its previous place with garbage - this will break code that accesses it via pointer arithmetic (see the sketch below).
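
A sketch of the idea, assuming a C++ codebase (the class and member names are illustrative):

```cpp
#include <cstring>

struct Point3 { double x, y, z; };

class Geometry {
public:
    Geometry() : snapPoint_{} {
        // Fill the member's old location with a recognizable garbage
        // pattern so stale pointer-arithmetic reads fail loudly.
        std::memset(oldSlot_, 0xAB, sizeof oldSlot_);
    }

    // The accessor is now the only sanctioned way to reach the point;
    // code that touched the raw member no longer compiles, which
    // flags every call site for review.
    const Point3& snapPoint() const { return snapPoint_; }

private:
    unsigned char oldSlot_[sizeof(Point3)];  // where the member used to live
    Point3 snapPoint_;                       // the relocated member
};
```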

As a developer, you want your code to break early, break often. "Works 99.99% of the time" is much worse for bug fixing than "doesn't work at all". (Of course, in production you don't want "break early, break often".)

im_chc
  • 103
  • 2