35

Bugs creeping into code as it is written can be minimized, but not entirely eliminated: programmers are, although many would disagree, only human.

When we do detect an error in our code, what can we do to weed it out? How should we approach it to make most effective use of our valuable time and enable us to spend less time trying to find it and more time coding? Also, what should we avoid when debugging?

Note here that we're not talking about preventing bugs; we're talking about what to do when bugs do appear. This is a wide field, I know, and may be highly dependent on language, platform and tools. If so, keep to encompassing answers such as mindsets and general methods.

gablin
  • 1
    I think the approach is actually simple. If you developed it alone, you know everything about it. You may even fix the bug without debugging. With that in mind, the best way is to spend your time coding something else until someone who knows a lot about it can answer your question on how to fix it; or let it rest and code other things until an idea for a fix comes to mind, so you won't lose time or energy. My guess is your question is about enterprise team management. – Aquarius Power Nov 18 '14 at 18:38
  • I think Raid. Off-the-shelf, bug killing spray. Is this a philosophical question? Books are made from the mere preponderance... – ejbytes Jul 06 '16 at 11:49

8 Answers

38

The mindset and attitude to debugging is perhaps the most important part, because it determines how effectively you'll fix the error, and what you'll learn from it — if anything.

Classics on software development like The Pragmatic Programmer and Code Complete basically argue for the same approach: every error is an opportunity to learn, almost always about yourself (because only beginners blame the compiler/computer first).

So treat it as a mystery which will be interesting to crack. And cracking that mystery should be done systematically, by expressing our assumptions (to ourselves, or to others) and then testing our assumptions, one-by-one if need be — using every tool at our disposal, especially debuggers and automated test frameworks. Then after the mystery is solved, you can do even better by looking through all your code for similar errors you may have made; and write an automated test to ensure the error will not happen unknowingly again.
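As a rough sketch of "expressing our assumptions and then testing them one by one", the snippet below writes each assumption down as a named check and reports which ones do not hold. The `orders` data, the assumption names and the `check_assumptions` helper are all invented for illustration; they are not taken from either book.

```python
def check_assumptions(checks):
    """Run each named assumption in order and return the names that do not hold."""
    failed = []
    for name, predicate in checks:
        if not predicate():
            failed.append(name)
    return failed


# Hypothetical situation: a report shows a wrong total for these order lines.
orders = [{"qty": 2, "price": 9.99}, {"qty": "3", "price": 4.50}]

# Each assumption is stated explicitly instead of being held silently in our head.
assumptions = [
    ("there is at least one order", lambda: len(orders) > 0),
    ("every quantity is an integer",
     lambda: all(isinstance(o["qty"], int) for o in orders)),
    ("every price is non-negative",
     lambda: all(o["price"] >= 0 for o in orders)),
]

failed = check_assumptions(assumptions)
print(failed)  # ['every quantity is an integer']
```

Writing the checks down this way also gives you something concrete to show a colleague, and the failing one is a natural starting point for an automated test.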

One last note - I prefer to call errors "errors" and not "bugs" - Dijkstra chided his colleagues for using the latter term because it's dishonest, supporting the idea that pernicious and fickle bug-fairies planted bugs in our programs while we weren't looking, instead of being there because of our own (sloppy) thinking: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EWD1036.html

We could, for instance, begin with cleaning up our language by no longer calling a bug a bug but by calling it an error. It is much more honest because it squarely puts the blame where it belongs, viz. with the programmer who made the error. The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation. The nice thing of this simple change of vocabulary is that it has such a profound effect: while, before, a program with only one bug used to be "almost correct", afterwards a program with an error is just "wrong" (because in error).

limist
  • 7
    Actually I like the term "error" rather than "bug", not because it puts the blame on "the programmer who made the error", but because it makes it clear that it might *not* have been the programmer at fault. To me, "bug" implies error in the code; whereas "error" implies error *somewhere*. Maybe in the code, maybe in the environment setup, maybe in the requirements. Drives me nuts when my boss has a "bug list" where half the issues are requirements changes. Call it a task list, ferchrissakes! – Carson63000 Oct 09 '10 at 20:36
  • 2
    +1 for "every error is an opportunity to learn, almost always about yourself (because only beginners blame the compiler/computer first)" – Md Mahbubur Rahman Apr 07 '13 at 02:32
  • You are aware of the history of the term "bug", right? I mean, as used in software development. Of course, we don't have this problem today, but a bug actually did fly into the hardware of a computer unnoticed by the programmer and caused an issue. Lest someone thinks to correct me, I know that Edison used this term long before the moth incident, which is why I used the word 'history', not 'origin'. See http://www.computerworld.com/article/2515435/app-development/moth-in-the-machine--debugging-the-origins-of--bug-.html and https://en.wikipedia.org/wiki/Software_bug#Etymology – threed Feb 12 '16 at 18:58
  • 1
    @threed Of course. But for quite some time, insects have not caused the vast majority of software errors. – limist Feb 13 '16 at 09:06
18
  1. Write tests. Testing is not only great at preventing bugs (in my experience, TDD done right eliminates almost all trivial, stupid bugs), but also helps a lot in debugging. Testing forces your design to be rather modular, which makes isolating and replicating the problem a lot easier. Also, you control the environment, so there will be a lot fewer surprises. Moreover, once you get a failing test case, you can be reasonably sure that you've nailed the real reason for the behavior that is bothering you.

  2. Learn how to use a debugger. print statements may work reasonably well at some level, but a debugger most of the time is very helpful (and once you know how to use it, it is a lot more comfortable than print statements).

  3. Talk to someone about your problem, even if it's just a rubber duckie. Forcing yourself to express the problem you are working on in words really does miracles.

  4. Give yourself a time limit. If, for example, after 45 minutes you feel you are going nowhere, just switch to other tasks for some time. When you get back to your bug, you'll hopefully be able to see other possible solutions that you wouldn't have considered before.

MasterMastic
Ryszard Szopa
  • 2
    +1 for "Forcing yourself to express the problem you are working on in words really does miracles." – Md Mahbubur Rahman Apr 07 '13 at 02:34
  • And to add to (1), almost every bug that you see in the code implies that there's a bug - or at least an omission - in the test suite. Fix both at the same time and not only do you prove you've fixed the problem at hand, you're safe against it being reintroduced. – Julia Hayward Jan 22 '14 at 09:15
5

I like most of the other answers, but here are some tips on what to do BEFORE you do any of that. Will save you beaucoup de time.

  1. Determine if there really is a bug. A bug is ALWAYS a difference between system behavior and requirements; the tester ought to be able to articulate expected and actual behavior. If he is unable to provide support for the expected behavior, there is no requirement and there is no bug-- just someone's opinion. Send it back.

  2. Consider the possibility that the expected behavior is wrong. This could be due to a misinterpretation of the requirement. It could also be due to a defect in the requirement itself (a delta between a detailed requirement and a business requirement). You can send these back too.

  3. Isolate the problem. Only experience will teach you the fastest way to do this-- some people can almost do it with their gut. One basic approach is to vary one thing while keeping all other things constant (does the problem occur on other environments? with other browsers? in a different test region? at different times of the day?) Another approach is to look at stack dumps or error messages-- sometimes you can tell just by the way it is formatted which component of the system threw the original error (e.g. if it's in German you can blame that third party you work with in Berlin).

  4. If you have narrowed it down to two systems that collaborate, inspect the messages between the two systems via traffic monitor or log files, and determine which system is behaving to spec and which one is not. If there are more than two systems in the scenario, you can perform pairwise checks and work your way "down" the application stack.

  5. The reason why isolating the problem is so critical is that the problem may not be due to a defect in code that you have control over (e.g. third party systems or the environment) and you want to get that party to take over as quickly as possible. This is both to save you work and to get them on point immediately so that resolution can be achieved in as short a time frame as possible. You don't want to work on an issue for ten days only to find it's really an issue with someone else's web service.

  6. If you have determined that there really is a defect and it really is in code that you control, you can further isolate the problem by looking for the last "known good" build and inspecting source control logs for changes that may have caused the issue. This can save a lot of time.

  7. If you can't figure it out from source control, now is the time to attach your debugger and step through the code to figure it out. Chances are by now you have a pretty good idea of the problem anyway.
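The "last known good build" idea in step 6 can be turned into a binary search over the revision history. The sketch below is only an illustration: the revision names and the `is_bad` check are made up, and it assumes the revisions are ordered oldest to newest, the first is known good, the last is known bad, and the defect stays present once introduced. Git ships a built-in command for the same idea, `git bisect`.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad(revision) is true.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # defect already present, look earlier
        else:
            lo = mid  # still good, look later
    return revisions[hi]


# Hypothetical history; in practice is_bad would build and test the revision.
revisions = ["r100", "r101", "r102", "r103", "r104", "r105", "r106"]
broken_since = "r104"


def is_bad(rev):
    return revisions.index(rev) >= revisions.index(broken_since)


print(first_bad(revisions, is_bad))  # r104
```

With seven revisions this needs only two checks instead of up to five, which matters when each check means a full build and a manual test.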

Once you know where the bug is and can think of a fix, here's a good procedure for fixing it:

  1. Write a unit test that reproduces the problem and fails.

  2. Without modifying the unit test, make it pass (by modifying application code).

  3. Keep the unit test in your test suite to prevent/detect regression.
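A minimal sketch of that three-step procedure using Python's standard `unittest` module. The `page_count` function and its off-by-one defect are invented for the example; the point is only the shape of the test.

```python
import unittest


def page_count(items, page_size):
    """Number of pages needed to show `items` entries.

    The (hypothetical) defective version was `items // page_size`,
    which dropped the last, partially filled page.
    """
    return (items + page_size - 1) // page_size  # fixed: round up


class PageCountRegressionTest(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # Step 1: written first; it failed against the defective version,
        # which returned 2 here.
        self.assertEqual(page_count(21, 10), 3)

    def test_exact_multiple_is_unchanged(self):
        # Guards against the fix breaking the case that already worked.
        self.assertEqual(page_count(20, 10), 2)
```

Saved in a test module, this can be run with `python -m unittest`, and it stays in the suite afterwards (step 3) so the defect cannot return unnoticed.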

John Wu
3

There is an excellent book I read on this subject called Why Programs Fail, which outlines various strategies for finding bugs, ranging from applying the scientific method to isolate and resolve a bug, to delta debugging. The other interesting part of this book is that it does away with the term 'bug'. Zeller's approach is:

(1) A programmer creates a defect in the code. (2) The defect causes an infection. (3) The infection propagates. (4) The infection causes a failure.

If you want to improve your debugging skills, I highly recommend this book.
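To give a flavour of delta debugging, here is a deliberately simplified input minimizer. It is not the exact algorithm from the book, just a sketch of the idea: keep removing chunks of a failing input for as long as the failure persists. The `crashes` function stands in for a hypothetical program under test that fails whenever a "<" is followed later by a ">".

```python
def minimize(failing_input, fails):
    """Shrink failing_input while fails(candidate) keeps returning True.

    Simplified variant of delta debugging: try removing chunks, halve the
    chunk size after each sweep, and stop once no single element can be
    removed without making the failure disappear.
    """
    current = list(failing_input)
    if not fails(current):
        raise ValueError("the starting input must fail")
    chunk = max(len(current) // 2, 1)
    while True:
        i = 0
        removed = False
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if fails(candidate):
                current = candidate  # chunk was irrelevant, drop it
                removed = True
            else:
                i += chunk  # chunk is needed, keep it and move on
        if chunk > 1:
            chunk //= 2
        elif not removed:
            return current


def crashes(chars):
    """Hypothetical failure: a '<' followed somewhere later by a '>'."""
    text = "".join(chars)
    start = text.find("<")
    return start != -1 and ">" in text[start:]


smallest = minimize("foo <b>bar</b> baz", crashes)
print("".join(smallest))  # <>
```

The 18-character input shrinks to the two characters that actually matter, which is usually a much better starting point for the debugger than the original report.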

In my own experience, I've found plenty of bugs in our application, but management simply presses us onwards to get new features out. I've frequently heard "We found this bug ourselves and the client hasn't noticed it yet, so just leave it until they do". I think being reactive as opposed to proactive about fixing bugs is a very bad idea: by the time you actually have to put a fix in, you've got other issues that need resolving and more features that management wants out the door ASAP. You get caught in a vicious cycle that can lead to a great deal of stress and burnout and, ultimately, a defect-ridden system.

Communication is another factor when bugs are found. Sending an email or documenting the bug on the tracker is all well and good, but in my experience other developers find a similar bug and, rather than reuse the fix you already put in (as they've forgotten all about it), add their own versions, so you end up with 5 different solutions in your code and it looks more bloated and confusing as a result. So, when you do fix a bug, make sure a few people review the fix and give you feedback in case they have fixed something similar and found a good strategy for dealing with it.

limist mentioned the book The Pragmatic Programmer, which has some interesting material on fixing bugs. Using the example I gave in the previous paragraph, I'd look at this: Software Entropy, where the analogy of a broken window is used. If too many broken windows appear, your team may become apathetic towards ever fixing them unless you take a proactive stance.

Desolate Planet
  • I've heard "We found this bug ourselves and the client hasn't noticed it yet, so just leave it until they do" too many times as well. And having gone on site visits, often the client **has** noticed, but hasn't reported it. Sometimes because they think there's no point because it won't be fixed, sometimes because they are already looking at a competitor's replacement, and sometimes (rightly or wrongly) "well, it's all a steaming pile of crap anyway". – Julia Hayward Jan 22 '14 at 09:19
  • @JuliaHayward - This is very often the case, but in your situation, your clients may be satisfied with the functionality and not be too concerned with what's going on under the hood. The problem starts to surface when the client comes back asking for extra features and you need to add other enhancements such as making your app multilingual, mobile compliant blah blah; you start to look at what you have and see all the cracks in the wall. – Desolate Planet Jan 22 '14 at 18:41
  • Just shows you: all the books in the world on software design, testing and good communication, and a lot of the products you work on are still a sprawling mess. Despite knowing what's right, stress and unrealistic deadlines (in the face of your already messed up code) are the reasons why the code is in the state it is. I don't have any answers to it myself. I'm pretty distinguished in the office as a moaning face ****** as I kick and scream to keep code healthy and the development process smooth, but sometimes the team doesn't bond well together. – Desolate Planet Jan 22 '14 at 18:46
3

I think the reproduction of a bug is also important. All cases which reproduce the bug can be listed and then you can make sure that your bug fix covers all those cases.

aslı
3

Bug, error, problem, defect - whatever you want to call it, it doesn't make much difference. I'll stick to problem since that's what I'm used to.

  1. Figure out what the perception of the problem is: translate from a customer's 'Bob is still not in the system' to 'When I try to create a user record for Bob, it fails with a duplicate key exception, although Bob isn't already in there'
  2. Figure out if it's really a problem or just a misunderstanding (indeed, Bob isn't in there, there is nobody called Bob, and the insert should work).
  3. Try to get minimal reliable steps you can follow to reproduce the problem - something like 'Given a system with a user record 'Bruce', when a user record 'Bob' is inserted, then an exception occurs'
  4. This is your test - if possible, put it in an automated test harness that you can run again and again, this will be invaluable when debugging. You can also make it part of your test suite to ensure that that particular problem doesn't reappear later on.
  5. Get your debugger out and start putting breakpoints - figure out the code path when you run your test, and identify what's wrong. While you do that, you can also refine your test by making it as narrow as possible - ideally a unit test.
  6. Fix it - verify your test passes.
  7. Verify the original problem as described by the customer is also fixed (very important - you might just have fixed a subset of the problem). Verify you didn't introduce regressions in other aspects of the program.
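Steps 3 and 4 might look roughly like the sketch below. Everything here is hypothetical: the in-memory `UserStore`, the `DuplicateKeyError` exception, and in particular the guessed defect (a key derived only from the first letter of the name) are invented to make the 'Bruce'/'Bob' scenario reproducible.

```python
class DuplicateKeyError(Exception):
    """Raised when a record with the same key already exists."""


class UserStore:
    """Tiny in-memory stand-in for the real user table."""

    def __init__(self, key_for):
        self._key_for = key_for  # function that derives the key from a name
        self._rows = {}

    def insert(self, name):
        key = self._key_for(name)
        if key in self._rows:
            raise DuplicateKeyError(name)
        self._rows[key] = name


def buggy_key(name):
    return name[0].upper()  # hypothetical defect: only the first letter


def fixed_key(name):
    return name.lower()  # hypothetical fix: the whole name


def bob_can_be_inserted(key_for):
    """Given a store containing 'Bruce', when 'Bob' is inserted, no error occurs."""
    store = UserStore(key_for)
    store.insert("Bruce")
    try:
        store.insert("Bob")
    except DuplicateKeyError:
        return False
    return True


print(bob_can_be_inserted(buggy_key))  # False: the problem is reproduced
print(bob_can_be_inserted(fixed_key))  # True: the fix makes the check pass
```

The same check covers step 6 as well: it returns False before the fix and True after it, so it can go straight into the automated test suite.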

If you're very familiar with the code, or if the problem or fix is obvious, you can skip some of those steps.

How should we approach it to make most effective use of our valuable time and enable us to spend less time trying to find it and more time coding?

I take issue with that, as it implies that writing new code is more valuable than having a high-quality, working program. There is nothing wrong with being as effective as possible at fixing problems, but a program doesn't necessarily get better by just adding more code to it.

ptyx
1

When we do detect an error in our code, what can we do to weed it out? How should we approach it to make most effective use of our valuable time and enable us to spend less time trying to find it and more time coding? Also, what should we avoid when debugging?

Assuming that you are in a production environment, here is what you need to do:

  1. Describe the "error" correctly, and identify the events that cause it to happen.

  2. Determine if the "error" is a code error or a specification error. For example, entering a one-letter name may be considered an error in some systems but acceptable behavior in others. Sometimes a user reports something he/she thinks is a problem, but the behavior the user expected was never part of the requirements.

  3. If you have proved that there is an error and the error is due to the code, then you can determine which pieces of code need to be fixed to prevent the error. Also examine the effect of the behavior on current data and future system operations (impact analysis on code and data).

  4. At this point you would probably have an estimate of how many resources will be consumed to fix the bug. You can either fix it right away or schedule a fix within an upcoming release of the software. This also depends on whether the end user is willing to pay for the fix. You should also evaluate the different options available to fix the error. There may be more than one way. You need to select the approach that best suits the situation.

  5. Analyze the reasons that caused this bug to appear (requirements, coding, testing, etc.). Enforce processes that would prevent the condition from happening again.

  6. Document the episode adequately.

  7. Release the fix (or the new version)

NoChance
1

Here's how I do it:

  1. Use the same method every time to find the problem. This will improve your reaction time to errors.
  2. The best way is probably reading the code, because all the information is available in the code. You just need efficient ways to find the correct position, and the ability to understand all the details.
  3. Debugging is a very slow way, and only necessary if your programmers do not yet understand how the computer executes asm instructions or cannot understand call stacks and other basic stuff.
  4. Try to develop proof techniques, like using function prototypes to reason about the behaviour of the program. This will help you find the correct position faster.
tp1