How do you debug a complex application?

Question

I have an application that returns the wrong output when it is run with a particular input choice. I haven't been able to get anywhere near a diagnosis of the fault, despite spending about a day and a half on it. If it is run with Input A then it gets the correct output; if it is run with Input B then the output should be the same but is incorrect. Input B goes through some slightly different processes to Input A, in an attempt to optimise it, although most of the steps are in common.

I am trying to compare, side by side, an invocation of the program with each of the two inputs. It's a difficult way to debug as you have two application windows, two Visual Studio windows, and it's easy to get mixed up.

The software is about 1 million lines of code, and some of the code is fiendishly complicated. It has many nested loops and loops that run for a lot of iterations, with very complex variables. It caches a lot of data, so when you find that values in one mode are different from the other you have to work out whether the differences are significant. It's then hard to trap where the programs diverge, since the values have already been calculated and you have to keep restarting the programs and going back.

I understand there is no silver bullet here but I just wondered if anyone had any tips?

Are you familiar with the appropriate debugger for the programming language? — , Mar 10 '15 at 14:23
You're not going to have much luck as long as you have to look at the entire 1,000,000 lines of code. Any solution is going to involve ruling out large chunks of code where the bug can't be. — Doval, Mar 10 '15 at 14:38
Enable all the warnings in your compiler. Try to use more than one compiler (e.g. both [GCC](http://gcc.gnu.org/) and [Clang](http://clang.llvm.org/) and enable all warnings in both). BTW, you should tell more about your application and your environment (which compiler, which programming language, which OS). Use the `-fsanitize=` options of your compiler, and [valgrind](http://valgrind.org/) if applicable. — Basile Starynkevitch, Mar 10 '15 at 14:39
BTW, spending a few days on a bug is not that much. Some bugs take weeks to be found, if not more.... — Basile Starynkevitch, Mar 10 '15 at 14:44
As an aside, you may wish to read [I've inherited 200K lines of spaghetti code — what now?](http://programmers.stackexchange.com/questions/155488/ive-inherited-200k-lines-of-spaghetti-code-what-now) — , Mar 10 '15 at 14:49

Murph · Accepted Answer · 2015-03-12T09:14:30.947

7

The answer is simple, the execution is complex.

Basically what you're trying to do is identify the point at which the variance occurs, i.e. the point at which the actual result starts to move away from the expected result. (If this sounds like something that might show up through coded tests, "unit" and up, then that's not entirely surprising.) Similarly, you need to identify the point at which the variance stops, i.e. where the paths join again so that the result changes consistently from then on. Having identified those points you can work to further narrow down the possible locations for the "errors".

The brute force way is adding debug/trace statements, breakpoints and inspection and then running the code.

You improve on this in two ways:

First, by reducing the amount of code you have to run; there are various ways to do this. (As suggested, this means isolating/separating out the problematic code.)
Second, by wrapping the code in tests (which is probably the best of those ways, if feasible).

Sadly none of this is straightforward or necessarily quick.
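As a rough illustration of finding the point at which the variance occurs, here is a minimal sketch in Python (chosen only for brevity; the idea carries over to whatever language the application is written in). It assumes each run has written an ordered trace of checkpoint values. The checkpoint names and values below are made up for the example and are not taken from the application in the question.

```python
from itertools import zip_longest


def first_divergence(trace_a, trace_b):
    """Return (index, entry_a, entry_b) for the first differing entry, or None.

    If one trace is shorter than the other, the missing entry is reported
    as None, which also counts as a divergence.
    """
    for index, (entry_a, entry_b) in enumerate(zip_longest(trace_a, trace_b)):
        if entry_a != entry_b:
            return index, entry_a, entry_b
    return None


# Hypothetical traces: (checkpoint name, value) pairs recorded by each run.
trace_a = [("load", 10), ("normalise", 0.5), ("aggregate", 42.0), ("format", "42.0")]
trace_b = [("load", 10), ("normalise", 0.5), ("aggregate", 41.0), ("format", "41.0")]

print(first_divergence(trace_a, trace_b))
# -> (2, ('aggregate', 42.0), ('aggregate', 41.0))
```

The first differing checkpoint only bounds the search: the fault lies somewhere between the last matching checkpoint and the first mismatching one, so more checkpoints can then be added inside that range.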

edited Mar 12 '15 at 09:14

answered Mar 11 '15 at 16:40

Murph

1

This is the correct answer, but try to isolate the area where execution starts to produce the wrong answer and then drill down into it to find the real bug. – gbjbaanb Mar 12 '15 at 09:07
2

I used the Visual Studio debuggers to work out the area in the code where the bug occurred. Then I wrote logging code and compared the log files. That revealed the cause of the error. It took about a week, and according to other people that is not necessarily a long time. – Paul Richards Mar 12 '15 at 14:28
2

@Paul - Thank you for the update. It would be nice if others provided resolution to their issues also. From this experience, should you ever design a system in the future, I would hope that logging is considered a critical part of the system, because it is. On most systems I work on, errors occur very seldom and those reporting them have no idea what they did to make them happen. Without logs, we'd be twiddling our thumbs, taking wild guesses at what could have caused this vaguely worded anomaly and never knowing if we actually resolved the issue even if we fix a bug that we found. – Dunk Mar 12 '15 at 18:02

score 3 · Answer 2 · answered Mar 10 '15 at 14:36

It's a difficult way to debug as you have two application windows, two Visual Studio windows, and it's easy to get mixed up.

Really? You have two monitors. Put one variant on one screen and the other on the other. If you don't have two monitors, then welcome to the 21st century. They're like $100.

I understand there is no silver bullet here but I just wondered if anyone had any tips?

So it sounds as though you have an overly complex (read: poorly designed/implemented) codebase. And it sounds as though you have no unit tests.

Ideally, I would look to simplify the codebase. Pass in dependencies to loosen coupling. Break big functions into smaller ones. Cut out side effects. There are piles of good practices to do that, but since that is "how to write good code" - I'm not going to go into it in depth. And really, you shouldn't simplify the codebase until you get the bug fixed.

Unit tests though will help. They can isolate the bug, and help you debug just that isolated part rather than the whole complex app. They provide clear input/expected-output pairs. They prevent later users from breaking things. And they will put pressure on you to write good re-usable code. Granted, it might be infeasible to make good unit tests if your code is overly complex.
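To make the "input/expected-output pairs" idea concrete, here is a minimal sketch of a test that checks an optimised path against a trusted one, which is the relationship the question describes between Input B and Input A. It is written in Python for brevity, and both functions are hypothetical stand-ins, not code from the application in question.

```python
def reference_sum_of_squares(values):
    """Hypothetical stand-in for the slower, trusted path (Input A)."""
    return sum(v * v for v in values)


def optimised_sum_of_squares(values):
    """Hypothetical stand-in for the optimised path (Input B)."""
    total = 0
    for v in values:
        total += v * v
    return total


def test_optimised_matches_reference():
    # Each case is one input; the trusted path supplies the expected output.
    cases = [[], [0], [1, 2, 3], [-4, 4], list(range(100))]
    for values in cases:
        expected = reference_sum_of_squares(values)
        actual = optimised_sum_of_squares(values)
        assert actual == expected, f"mismatch for input {values!r}"
```

A test runner such as pytest would pick up `test_optimised_matches_reference` by its name. The smaller the unit compared this way, the less code there is to step through when a case fails.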

Good points although I am not the designer of this code and I don't have the resources for wholesale refactoring. There are many unit tests although this particular area is probably not tested as much as it should be. — Paul Richards, Mar 10 '15 at 15:32
While unit tests would certainly be awesome to have, if they don't exist already and you have a million lines of code, I'm not sure there's much point in trying to create them. However, there is this great thing called logging that does wonders for figuring out the values of variables and the state of the system at various checkpoints in the code. Becoming proficient with it will probably rule out 999,000 of those lines of code as a possible source of the problem in an hour or less. And that includes writing the logging system too. — Dunk, Mar 10 '15 at 21:52

score 0 · Answer 3 · answered Mar 12 '15 at 12:28

I don't have much experience with Visual Studio, but in many IDEs you can set a watch on a variable and step through the code while keeping an eye on its value. Here is the MSDN documentation on a feature that seems to be equivalent to what I remember from other IDEs.

Input B goes through some slightly different processes to Input A, in an attempt to optimise it, although most of the steps are in common

It seems to me you should put a big, fat breakpoint at step 1 of the first 'slightly different' processes, and then step from that point on when debugging input B. (Compare its value to the value of input A at the previous step, i.e. before those 'slightly different' processes started, so as to be sure that the processing up to that point was the same.)

Another, quite subjective, piece of advice: if stepping through so many lines of code is really time-consuming, you may use a 'binary-search' approach once you know which set of code lines is involved (probably from line 1 of the 'slightly different' processing to the last line). Put a breakpoint in the middle of that interval and check whether A and B are different. If they are, put a breakpoint in the middle of the interval before that line; if they are not, put a breakpoint in the middle of the interval after that line. If you do not know which lines of code are involved, then stepping ahead from the first point where processing is different is your only option.
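The binary-search idea can be sketched as follows, in Python for brevity. This is an illustration under stated assumptions, not a recipe for any particular debugger: `differs` is a hypothetical predicate standing in for "run both inputs to this step and compare their state", the two runs are assumed to agree at the first step and disagree at the last, and they are assumed never to re-converge once they have diverged.

```python
def first_differing_step(first, last, differs):
    """Binary-search for the first step at which the two runs disagree.

    Invariant: the runs agree at `low` and disagree at `high`.
    """
    low, high = first, last
    while high - low > 1:
        mid = (low + high) // 2
        if differs(mid):
            high = mid   # divergence is at mid or earlier
        else:
            low = mid    # still identical here, so look later
    return high


# Hypothetical per-step state for each run; the runs diverge at index 5.
run_a = [1, 2, 3, 4, 5, 6, 7, 8]
run_b = [1, 2, 3, 4, 5, 60, 70, 80]

step = first_differing_step(0, len(run_a) - 1, lambda i: run_a[i] != run_b[i])
print(step)
# -> 5
```

Each probe halves the interval, so a range of N steps needs about log2(N) comparisons. If cached values let the two runs agree again after diverging, the monotonic assumption fails and the search can land on the wrong step.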

Let me note that this is pragmatic advice, meant to help solve the problem at hand. The real answer would be "you should never have gotten to the point where you have to step through 1 million lines of code to debug a result". Other answers point to practices to avoid this situation. — logc, Mar 12 '15 at 12:35
The difficulty is that parts of the program are executed multiple times and in many cases the results are correct. So if you debug it in Visual Studio you end up having to add conditional breakpoints (VERY slow) or put debug code in there so you know when the right iteration has been reached. I do use a binary search approach. The difficulty is that many values are cached and stored so you end up having to jump around in the program. Regarding your comment - I can't personally change the working practices of the whole company, I am a small cog in a big wheel. — Paul Richards, Mar 12 '15 at 14:31
@PaulRichards I see your situation is quite complex and you are aware of the tools you have. Sorry I can't offer any more advice -- and I totally understand that the responsibility lies with the team that left the code in such a state, not personally with you! — logc, Mar 12 '15 at 15:12
