Understanding already existing complex code base

Question

Possible Duplicate:
What is the most effective way to add functionality to unfamiliar, structurally unsound code?

Until now, I have only worked on Java projects that I built from scratch (mostly course projects and some hobby stuff). But now I have come across a huge code base of about 46000 lines spread across around 200 classes. Additionally, there are around 10 dependent libraries.

The problem is not only that I have never worked with someone else's code before, but also that I have never worked with such a huge code base.

I have been asked to completely understand the code base and suggest improvements to the existing design. Now I am kind of stuck: I start with some part of the code, and by the time I reach some other part, I have lost the working details of the previous part.

I could really use your suggestions and experiences to deal with this issue. How do I proceed/document the details of the classes to understand the code better?

Edit: This is a university-level research project. Students developed it over a couple of years, and the bad part is that the students who wrote it have already graduated. The existing documentation (only in the form of Javadocs, not UML stuff) is extensive but not helpful for understanding the application design.

Is there any documentation already? Can you speak to the guy/guys that wrote it originally? — rks, Apr 29 '12 at 08:28
It's actually a research project in a department at my university, so the guys who wrote it earlier were students. The code is written in a fashion so as to just work somehow. The documentation is there, but there are gaps and I would say that it is not of the best quality — Ankit, Apr 29 '12 at 16:24
Even if it is an elephant, you can still eat it one bite at a time. — , Apr 29 '12 at 17:59
Sorry to tell you, 46 KLOC is a small program in the commercial world. Many of us here work on programs in the order of millions of SLOC. — mattnz, Apr 29 '12 at 20:24
I think you should check out Michael Feathers' [Working Effectively With Legacy Code.](http://www.amazon.com/gp/aw/d/0131177052) It has a wealth of techniques for dealing with unknown and poor code bases. — Christian Horsdal, Apr 29 '12 at 18:06
For now, you could use mind mapping tools, but you have to find a way to do it fast: it's time-consuming, but golden and fun. That's why a visual programming environment that follows the flow of the code, with the help of AI, is the only way to go in the future. That will be a tireless, joyful game. — Eftekhari, Dec 26 '18 at 22:00

Oleksi · Answer 1 · 2012-04-29T18:14:23.693

Working with the code is the best way to learn it. There is no "express" way to learn a huge code base, but there are a few things you can do.

You can try modifying the code to add small features and to refactor and learn the code base in that way. Try focusing on small, localized sections of the code base, trying to learn it in little pieces.

If there are unit tests, you can try studying them to get a better insight into how the code is meant to work. If there aren't tests, writing them can be a fantastic way to understand parts of the code. By definition, unit tests are supposed to only test one unit, so you can focus on just that unit, mocking away any other dependencies. This will let you focus and learn one unit of code at a time, until you know more and more of the code base.
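
As a rough illustration of "focus on one unit and mock away the rest", here is a minimal sketch in plain Java with no test framework. All names in it (`RateSource`, `LegacyConverter`, `StubRateSource`) are made up for the example and stand in for whatever class you are studying and its dependency; with JUnit and a mocking library the same idea would be shorter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dependency of the unit under study.
interface RateSource {
    double rateFor(String currency);
}

// Hypothetical stand-in for a legacy class you are trying to understand.
class LegacyConverter {
    private final RateSource rates;

    LegacyConverter(RateSource rates) {
        this.rates = rates;
    }

    long toCents(double amount, String currency) {
        return Math.round(amount * rates.rateFor(currency) * 100);
    }
}

// Hand-written stub: returns a fixed rate and records which currencies were asked for.
class StubRateSource implements RateSource {
    final List<String> requested = new ArrayList<>();

    @Override
    public double rateFor(String currency) {
        requested.add(currency);
        return 2.0;
    }
}

class CharacterizationSketch {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        StubRateSource stub = new StubRateSource();
        LegacyConverter converter = new LegacyConverter(stub);

        long cents = converter.toCents(1.5, "EUR");

        // Pin down what the unit does today, in isolation from the real dependency.
        check(cents == 300, "1.5 at a rate of 2.0 should be 300 cents");
        check(stub.requested.size() == 1 && stub.requested.get(0).equals("EUR"),
                "the rate should be looked up exactly once, for EUR");
        System.out.println("characterization checks passed");
    }
}
```

Each check you write this way records one thing you have learned about the unit, and it keeps holding your understanding for you after you move on to the next part.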

Another great technique is running the code in a debugger and stepping through the whole execution of some use case. This will give you a good view of how the system responds and executes some feature.

Don't expect this to happen overnight. It'll take some time to understand the intricacies of the code base, but writing/understanding unit tests is a fantastic way to learn them.

+1, there is no short-cut with this stuff. I would add that it is very important to have clear goals in mind. To "Completely understand and then 'improve' the design" is a very general and hard-to-define goal. It should be broken down into tangible problems/bugs/feature-requests. — Angelo, Apr 29 '12 at 13:14
+1 for the tests part.. writing tests is an efficient way to simulate and understand how the application/system flows.. — Zuko, Jan 15 '15 at 07:10

score 18 · Answer 2 · answered Apr 29 '12 at 09:03

I have to do a lot of code reviews, and they include large and complex code bases.

I try to build a mind map of the components when possible.

If I have to work on or contribute to the project, something very effective I do is adding (or changing) comments in areas that were not clear in the first place.

Finally, a simple "architecture" overview of the project must be built for future code reviewers. The mind map will be very helpful for this step.

+1 Mindmaps. I use the following: https://bubbl.us/ – CodeART Apr 29 '12 at 09:18
@CodeWorks: thanks for the link! I knew a few online mind mapping tools, but none were really usable. – Apr 29 '12 at 10:57
This one does the job for me and it's free. You can also export mind maps :) – CodeART Apr 29 '12 at 11:11
I recommend FreePlane as it's open source and native – HaveAGuess Oct 26 '12 at 23:13
It may not be pretty but you can also just use Excel for your diagrams. You can also do quick calculations if elements are numerical, such as if nodes and edges have values in them. – ximiki Feb 11 '20 at 20:39

CodeART · Answer 3 · 2012-04-29T09:17:06.740

I was in a similar position after I finished my first year at university. Based on that experience, I think you have bitten off more than you can chew.

The following might make your life easier, but you'll need to put in a lot of work.

Gain a complete understanding of your problem domain. Talk to the business and model the application's features with use case diagrams or something similar. I assume you have unlimited time and resources to do that.
You need to break down your problem into smaller tasks. Set yourself a goal and work towards it, e.g. "I want to understand the authentication process." Find the code that you think does that and start working through it. If you get hung up on the entire system, you are very unlikely to get anywhere, in my opinion.
Get the contact details of people who have worked on this previously. If they are still in the same company, work with them after you have familiarised yourself with the system.
Going through the code and documenting what it does will take you a very long time. I doubt that you can afford that.
Deploy the system in a fresh environment. Hopefully it'll break and you'll have to debug lots of code to see what's happening.
Assuming you have the use cases, start to debug through the codebase.
Make sure you are clear on why the business wants you to review the codebase. Surely it's not just an expensive and pointless exercise, so they must have a good reason. This should give you a good idea of where to look.
There are tools that analyse a codebase and tell you whether it's tightly coupled and how much duplicate code there is. This might be useful, but not at the start of your project.
Check whether there are any unit tests. Start running those to see what the system is capable of doing.

Things to keep in mind:

It's not an easy task, so don't beat yourself up.
There will be a lot you won't understand, so don't get hung up on those things and keep moving forward.
This is normally quite demotivating, so it's important that you keep a note of your progress. I normally have this on a whiteboard in front of me.
Don't get hung up on big numbers. Yes, you can use that in self-defence, and 200 classes probably equates to quite a few responsibilities, but this doesn't make your life any easier, so let's look for solutions :)

Helpful comments on motivation - 'this is not an easy task'. — EleventhDoctor, Jun 05 '15 at 10:23

score 7 · Answer 4 · answered Apr 29 '12 at 18:05

Welcome to the world of software development. Most of the time you are going to:

Work with a large preexisting code base.
There will be limited documentation.
The original creators are long gone.
No one will really know the system end to end.

Most of the time you are going to be tossed in and expected to figure it out. Sure, there will be people along the way and some amount of documentation to help, but in the end it is up to you to make sense of things. I think that this making sense of things is different for each person and is a skill that really makes a difference.

From my experiences/observations of how others do it here is what I see that works:

Approaching it in a way that you find the skeleton of the system first and then start filling in the details.
Writing down call trees that show how to get from one place to another in the code. Many times this is just one particular area, not the whole system.
Use of tools that help you understand the code. I use the search function and references function of SlickEdit. I also think that Eclipse has some tools like this.
Experimentation on how the system works. Often I use a high level of tracing/printfs while testing different things to help me understand the flow.
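
The call-tree and tracing ideas above can be combined in a small sketch like the following. `CallTrace` is a hypothetical helper written for this example, not part of any library, and the `checkout`/`loadOrder`/`applyDiscount` methods are placeholders for real methods in your code base. You wrap the calls you care about and get an indented outline of who called what.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: records an indented line for every traced call.
class CallTrace {
    private final List<String> lines = new ArrayList<>();
    private int depth = 0;

    <T> T trace(String name, Supplier<T> body) {
        lines.add(indent(depth) + name);
        depth++;
        try {
            return body.get();
        } finally {
            depth--; // restore the depth even if the traced call throws
        }
    }

    List<String> lines() {
        return lines;
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }
}

class CallTraceDemo {
    static final CallTrace TRACE = new CallTrace();

    // Placeholder methods standing in for real code you want to follow.
    static int loadOrder() {
        return TRACE.trace("loadOrder", () -> 40);
    }

    static int applyDiscount(int total) {
        return TRACE.trace("applyDiscount", () -> total - 5);
    }

    static int checkout() {
        return TRACE.trace("checkout", () -> applyDiscount(loadOrder()));
    }

    public static void main(String[] args) {
        checkout();
        // Prints "checkout" followed by its two callees, each indented one level.
        for (String line : TRACE.lines()) {
            System.out.println(line);
        }
    }
}
```

This is only meant for throwaway experiments on one area at a time; a debugger or the IDE's call hierarchy view gives similar information without touching the code.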

In the end the thing that helps the most is repetition! The more times you go through things, the easier it is to know what's what. In industry we often say that it will take a few months before you can consider someone to have the knowledge to be effective. This is because of the learning curve.

Lastly in my own experience I feel like there are a few stages to understanding a large new code base:

Stage 1: Getting your bearings. In this stage it is all about learning where things are and getting familiar. I would say that you should expect to be able to make small changes successfully.
Stage 2: You are familiar with where things are and are capable of making medium sized changes with some luck. I would say that at this point you would need a fair amount of help to get things right but you can get things done.
Stage 3: You have a very good understanding of the code and are capable of making large complicated changes. The most common problem at this level is that an incomplete system level understanding causes you to introduce bugs. Still you are very effective.
Stage 4: You have a complete understanding and can bend the system to your will. I would say that very few actually ever get here.

ftr · Answer 5 · 2012-04-30T08:46:03.227

Unless something like this exists in documentation, I like to draw UML class and sequence diagrams to get a feel for static structure and dynamic behaviour of the code. These diagrams do not have to be formally correct, but give me something to look at when I'm having a "what is this doing again??" moment.

If they are already there, I also like to look at unit tests to see how it all comes together. If there aren't any, writing them is also a good way to get into the code base.

EDIT: I don't draw the diagrams for all the code, only the part that I'm currently working on and that confuses me (complex inheritance hierarchies etc).

answered Apr 29 '12 at 08:36

There is normally only so much that you can model. Resources are normally very limited and you can spend weeks making diagrams trying to understand the entire system. Unit tests are a good way forward :) – CodeART Apr 29 '12 at 09:09
Yes, I should have been more clear that I usually only do that for parts of the code that I'm currently working on, and there only the stuff that confuses me. – ftr Apr 30 '12 at 07:17

Maglob · Answer 6 · 2012-04-29T12:26:39.140

Focus on data structures -- that is where the rubber hits the road.

Data definition (e.g. database schema, file formats) takes far fewer lines (of code) than application logic, but is crucial for really understanding what the application is doing. Learn the data structures and relations. After that it's far easier to wrap your head around the rest of the program.

Knowing the data layout gives you a sense of direction while reading code. It's like having a map and compass while navigating in a forest.

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man-Month

PS. Data lives forever. Programs (and computers) are transitory.
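
For data that lives in Java classes rather than in a schema file, one rough way to get at the layout is to list each class's fields with reflection. The sketch below is only an illustration: `FieldOutline`, `SampleCustomer` and `SampleOrder` are invented for the example, and you would pass in your own domain classes instead.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical domain classes standing in for the real data model.
class SampleCustomer {
    String name;
    List<SampleOrder> orders;
}

class SampleOrder {
    long id;
    SampleCustomer owner;
}

class FieldOutline {
    // Returns one "name : type" line per instance field, sorted by field name.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not part of the data layout.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            lines.add(field.getName() + " : " + field.getType().getSimpleName());
        }
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("SampleCustomer " + describe(SampleCustomer.class));
        System.out.println("SampleOrder " + describe(SampleOrder.class));
    }
}
```

Fields whose type is another domain class (here `owner : SampleCustomer`) are the relations worth drawing first.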

Sundeep · Answer 7 · 2012-06-14T13:01:59.103

I have been working on a complex code base for more than a year now. See if my insights can help you:

Your observation is right: by the time you reach a different part of the code, you forget about the previous part. It can be a never-ending cycle. The important lesson to take away here is that the product cannot work without all the parts working properly. Even if only one part fails, the product doesn't work. See it from another angle: if you improve one part dramatically, it still MIGHT NOT result in the product working better, which is your main goal here.

So, first: Don't be a developer. Be a tester.

Don't try to understand it part by part. Understand the whole product and how it works when all the parts are together. Test the product in a production environment (i.e., a non-development environment with no debug points). Then, just like every tester does, log the problems you face in a bug tracker and assign a severity and priority to each. As this software has existed for quite some time, see if a bug tracker already exists. If there is one, you are lucky: add to it, and take the time to verify each of the existing entries. At the end of this cycle, you will understand the product from a user's point of view (you definitely shouldn't miss this) and also from a QA point of view. In due course, you might even realize that a single line of code will fix a bug, and that those who coded it didn't do so because there was no real need back then.

Second step: Wear your designer cape

Break the product into several parts (not literally or according to your convenience, but according to how they work together). Maybe your work up until now or your existing knowledge will come into play. Then try to understand how they work with each other as well as with the 10 dependent libraries. Then, for each tracked bug, write notes identifying the code entities involved (e.g., "This change involves modifying classes X, Y, Z"). By the end of this step, you will probably have a FEW hints about what the problems with the current architecture are and what can be improved.

Then you can decide whether the current architecture/design is sufficient and you can go on improving the software, or whether the product needs a better design or changes to the existing design.
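
If you keep the per-bug notes in a structured form, even a tiny tally can show which classes keep coming up. This is a minimal sketch under that assumption; the bug IDs and the class names X, Y, Z are placeholders, and `BugHotSpots` is a name made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BugHotSpots {
    // Counts in how many bug notes each class name appears.
    static Map<String, Integer> countClasses(Map<String, List<String>> classesByBug) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> classes : classesByBug.values()) {
            for (String name : classes) {
                counts.merge(name, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder notes: bug id -> classes the fix would touch.
        Map<String, List<String>> notes = new LinkedHashMap<>();
        notes.put("BUG-1", Arrays.asList("X", "Y"));
        notes.put("BUG-2", Arrays.asList("Y", "Z"));
        notes.put("BUG-3", Arrays.asList("Y"));

        System.out.println(countClasses(notes)); // {X=1, Y=3, Z=1}
    }
}
```

A class that shows up in most bug notes, like `Y` here, is a reasonable first candidate when you start looking at what the design could do better.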

House of Cards

Also, since complex products come with a lot of code, we might not be in a position to pick a few things and tweak or improve them. This is because the whole system can be intertwined in such a way that making a change to one of the classes is equivalent to changing the position of one card in a house of cards: you never know which end might break. In my experience, this has been true. I have picked a part and improved its code, unaware of the contracts it had with other parts of the code, and ended up abandoning the work and realizing my mistake. So, instead of trying to understand parts, try to understand it as a whole.

Prioritize your concerns

You need to keep in mind what you are trying to improve:

Do you want the product to be faster?

Of course you do. But is it the primary concern? Is the product actually slow? If yes, create performance criteria, identify bottlenecks and improve those parts. Test again.

Do you want to improve the usability?

Then it's pretty much the API/UI side.

Do you want to improve the security?

Then it's the boundaries you should be exploring.

I have provided only 3 examples, but there are many more to look for.

Latest and Best Documentation

I have read here in one of the posts that the latest and best documentation is the code itself. Even if you create a good amount of documentation today, it is history after a while. So the code is your latest piece of documentation. Whenever you browse through some code, write your understanding in the comments there. When you hand over the code base, caution your successors NOT to depend ONLY on comments!

score 1 · Answer 8 · edited Apr 12 '17 at 07:31

Related to this. But with the addition of a Lines of code component. And by the way, 46000 lines of code is a medium sized application in my experience.

answered Apr 29 '12 at 12:20

NWS


It might be small as far as industry projects are considered. But for me, this is the biggest I have worked on yet – Ankit Apr 29 '12 at 16:32
@Ankit Just remember it can always get bigger. Read the answers to the linked question, and just keep working hard and ask plenty of questions :) – NWS Apr 29 '12 at 19:17

score 1 · Answer 9 · edited May 23 '17 at 12:40

Some strategies:

Create an inheritance diagram. There are tools that'll do this for you, but even drawing it by hand could be helpful.
Of the 200 classes, you'll probably find that a dozen or so are really important. These will be the classes that are close to the root of the inheritance tree. The others will be variations on a theme, and once you know what their superclasses do, you'll have a pretty good idea how all those leaf classes work too.
Make notes. Using a wiki can be helpful here. Eventually, you might turn some of your notes into better documentation than currently exists.
Draw diagrams of the objects that are created and how they relate to each other.
Use a debugger to step through sections you don't understand.
Use a profiling tool to find performance issues without worrying too much about exactly what's going on in the code. Just put the application through its paces while you record performance metrics. Knowing where the application spends most of its time can a) give you insight into how it works and b) suggest areas for improvement without you needing to understand every detail.
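
For the inheritance diagram, if you have no tool at hand, a few lines of reflection can give you a text outline to start from. The sketch below is an assumption-laden example: `InheritanceOutline` and the `Sketch...` classes are invented for it, and it expects you to supply the list of classes yourself (it does not scan the classpath).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical classes standing in for part of the real hierarchy.
class SketchShape {}
class SketchCircle extends SketchShape {}
class SketchSquare extends SketchShape {}

class InheritanceOutline {
    // Groups each class under the simple name of its direct superclass.
    static Map<String, List<String>> childrenBySuperclass(List<Class<?>> classes) {
        Map<String, List<String>> children = new TreeMap<>();
        for (Class<?> c : classes) {
            Class<?> parent = c.getSuperclass();
            if (parent == null) {
                continue; // interfaces, primitives and Object itself have no superclass
            }
            children.computeIfAbsent(parent.getSimpleName(), k -> new ArrayList<>())
                    .add(c.getSimpleName());
        }
        return children;
    }

    public static void main(String[] args) {
        List<Class<?>> classes =
                Arrays.asList(SketchShape.class, SketchCircle.class, SketchSquare.class);
        // Each entry reads "superclass=[direct subclasses]".
        System.out.println(childrenBySuperclass(classes));
    }
}
```

Superclasses with long child lists are good candidates for the "dozen or so really important" classes mentioned above. Interfaces are ignored here; `Class.getInterfaces()` would be the place to look if the code base leans on them.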

score 1 · Answer 10 · answered Apr 30 '12 at 05:35

tl;dr: Write docs, write tests. Turn understanding into making.

Full version:

Not so long ago I too got to work on existing projects for the first time at a company, and had to get up to speed quickly.

I found that all projects invariably suffer from the same very serious issue: insufficient documentation and testing. This is very bad from the point of view of project success.

High bus factor. Few people really understand how stuff works, except for maybe one lead developer who's assigned to work on the project.
Communication is hard since there's no common picture of the architecture one can refer to. Well-informed architecture improvement decisions are hard to make.
It's hard on people. Newcomers struggle to understand a program.

So what worked for me is turning understanding into making.

If there's a particular task I have to do on a new project, I'd go about it in a way that allows me to gather the most information about the system.
If there's no task, but ‘just’ an overall understanding is required (which seems to be your case), then I'd write documentation.

Writing documentation to get general understanding

I found that it's hard to make the brain understand complex abstract systems. However, the same understanding comes easily if you're making something (it's probably somehow related to the concept of “flow”).

It's kind of a hack. I think most people enjoy making quality products and take pride in their work, so you can make documentation the end product. You'll get to learn new tools (documentation system like Sphinx, for example) and solve difficult problems of how to convey your understanding, as it emerges, to the target audience. You'll have an end-product to show yourself and others, and people will thank you later, which is a great motivator.

Don't forget the meta-documentation: it's frustrating when your docs go obsolete because others don't bother updating them. Document briefly the development workflow—versioning, testing, documentation and coding style guide, etc.

Writing tests

If I got a particular task at hand (a new feature or bugfix), I'd write a test for it first.

To write a test, I have to understand how the test suite works and have it up and running. This usually isn't obvious, so I briefly document it. The test suite often has a lot of its own problems, like unit tests and integration tests all mixed up, so I also note down ideas on how to fix it later. Sometimes the test suite doesn't work or all tests fail, so I fix it. There are cases when the test suite is entirely absent, which is a chance to learn how to set one up.

Then I write a test. If it's a large task, I write a high-level “integration” test first. When writing unit tests, the first task is to figure out which unit(s) should implement the necessary functionality. This will give you quite a bit of architecture understanding. After that, writing the code implementing the feature is easy.

Writing a test may well become a bigger task than implementing a feature or fixing a bug. However, it's rewarding later. Once there's a solid test suite, you (or other developers) can learn a lot by trying to improve random things and seeing if the tests pass.
