16

When dealing with a project that has many different files, I always seem to lose track of how the parts interact with one another. I've never really had much of a problem understanding smaller components in isolation, but as the complexity of the project increases, I find myself unable to mentally construct an understanding of what is going on. I notice this especially with OOP projects, as the number of methods and source files increases.

My background: I'm a self-taught web programmer. I've dealt mostly with Python for quick and dirty scripts, but I've also done a few basic Django projects. I like web frameworks such as Flask, because in the simplicity of a single-file layout, I can easily keep track (mostly) of what is going on.

I now find myself in a situation where I need to work with a large Zend Framework PHP project that someone else developed, and I'm overwhelmed trying to understand code spread across numerous files.

What techniques and processes have you found useful to understand a large code base that someone else has developed? Is there any particular diagram that you find helps you grasp the larger picture?

linqq

7 Answers

8

The trick to understanding a large code base is to not try to understand all of it. After a certain size, you can't hold a mental model in your head of the entire thing. You start with an anchor point that makes sense for whatever task you need to work on first, then branch from there, learning only the parts you need and trusting that the rest of it works as advertised. It's just like understanding recursion. If you try to hold the entire stack in your head your brain explodes.

Grep, debuggers, and intellisense are your friends here. If you don't know how a function ends up getting called, set a breakpoint on it and work your way down the stack trace.
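When a debugger is not at hand, the same "how did we get here?" question can be answered by capturing the call stack from inside the function. The following is a minimal sketch in Python (the asker's stated background); `charge_card`, `checkout`, and `handle_request` are hypothetical names invented for illustration, not part of any real project.

```python
import traceback


def charge_card(amount):
    """Hypothetical function whose callers we want to discover."""
    # extract_stack() lists frames oldest first; the last entry is this
    # function itself, so drop it and keep only the callers' names.
    frames = traceback.extract_stack()
    return [frame.name for frame in frames[:-1]]


def checkout(cart_total):
    # Hypothetical intermediate layer.
    return charge_card(cart_total)


def handle_request():
    # Hypothetical entry point, e.g. a controller action.
    return checkout(42)


call_chain = handle_request()
# Show the two nearest callers, outermost first.
print(" -> ".join(call_chain[-2:]))
```

In PHP, Xdebug's stack traces or the built-in `debug_backtrace()` function serve a similar purpose.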

The other thing to note is that large code bases don't spring up out of nowhere. The larger it is, the more programmers there are with experience on it, so ask them where to start, but be specific. Ask questions like, "I need to add a new payment provider. Where in the code should I be looking?" Focus on just that task instead of trying to understand the entire code base, and piece by piece your familiarity will grow.

Karl Bielefeldt
  • Thank you for your insights. I've been using vim w/ ctags along with grep. Still getting used to PHP's Xdebug. I think your last paragraph, however, is the most useful advice. – linqq Jan 06 '12 at 14:13
  • There is one last question I would ask you, though. Suppose that you learn the procedure for adding a new payment processor. Beyond storing it mentally, do you have a favorite way of keeping track of such information (e.g. a spreadsheet, flat text file, some have suggested UML) – linqq Jan 06 '12 at 14:18
  • I keep it simple. Short term goes on my whiteboard. For longer term, browser bookmarks and a project folder on a backed-up disk with relevant files in whatever format makes the most sense. I have word documents, pdfs, spreadsheets, plain text files, shortcuts, and saved emails in there. I've tried more integrated solutions like mind mapping software, wikis, evernote, etc. and I can never maintain it long term. – Karl Bielefeldt Jan 06 '12 at 15:46
  • "The larger it is, the more programmers there are with experience on it" they don't necessarily still work there, or they may not remember it well (management) – user1821961 Dec 17 '18 at 15:02
2

As requested, here is my comment as an answer.

When working with other people's code, I tend to create or, if possible, generate UML class diagrams to give me an overview of the static structure. The visual diagram helps especially when I have to go back later and have already forgotten the context of a class. I sometimes do the same for dynamic behaviour, to outline the interactions between collaborators, but not very often.
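If no diagramming tool is available, even a plain-text outline of classes, their parents, and their methods gives a rough view of the static structure. The sketch below is not a UML generator, just a stand-in that uses Python's `inspect` module; `PaymentProvider`, `CardProvider`, and `outline_classes` are hypothetical names made up for the example.

```python
import inspect


class PaymentProvider:
    """Hypothetical base class."""

    def charge(self, amount):
        raise NotImplementedError


class CardProvider(PaymentProvider):
    """Hypothetical subclass."""

    def charge(self, amount):
        return f"charged {amount}"

    def refund(self, amount):
        return f"refunded {amount}"


def outline_classes(classes):
    """Return one line per class: name, parents, and own methods."""
    lines = []
    for cls in classes:
        # Skip the implicit 'object' base to keep the outline readable.
        bases = [b.__name__ for b in cls.__bases__ if b is not object]
        # vars(cls) lists only what the class defines itself, not what it inherits.
        methods = sorted(
            name for name, member in vars(cls).items()
            if inspect.isfunction(member)
        )
        header = cls.__name__ + (f"({', '.join(bases)})" if bases else "")
        lines.append(f"{header}: {', '.join(methods) or '-'}")
    return lines


outline = outline_classes([PaymentProvider, CardProvider])
print("\n".join(outline))
```

An outline like this is quick to regenerate, so it does not go stale the way a hand-drawn diagram can.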

If the codebase contains tests (integration or unit), those are sometimes worth checking out as well.

ftr
  • I use this method at least once a week. For the drawings you can use https://excalidraw.com/ or https://draw.io both free and great tools for drawing. – Hop hop Sep 01 '22 at 05:37
2

There is no shortcut. You just have to suffer through it.

To answer your question about how to get diagrams, doxygen is what you want. AFAIK it works with PHP.

More generally, I go through roughly the following stages when encountering a new codebase:

  1. Understand what it does from a user point-of-view. Be able to actually use the application yourself like a power-user. Understand how the real end-users work with it. This could require sitting down with them until you gain a solid understanding of what they do.

  2. Communicate with the original developers, if possible. At first, you'll have architectural questions stimulated by the end-user experience. Later on, you'll have implementation questions about edge-cases and details. Being able to get answers from the developers will help far more than any comments or documentation (which are at best incomplete and often misleading or entirely absent).

  3. Learn about whatever framework you're using. At a minimum, you should be able to create a "hello world" or other simple application with that framework before diving into the production application.

  4. Get a grip on the entire deployment process (best done while original developers are holding your hand). If you can't take the current codebase and build it and deploy it through a test/validation/prod environment, you're toast. Even the smallest change will require jumping through all the hoops of deployment, so why not get this part down right away? In so doing, you'll get introduced to all the lovely servers, databases, services and scripts used by the app-- you will know "where it lives".

  5. Get a grip on the functional tests (if any). How do you know if the thing is running properly? What do the operations folks have to do for the care and feeding of the application?

  6. Understand the logs of the app. Although I've never worked with PHP, I'll take a wild guess and say that any serious PHP application will have some type of logging. If you understand the logs, you'll have a good starting point when the time comes for debugging problems.

---- Note that up until here, I haven't even mentioned looking closely at the codebase. There is A LOT you can learn about a large project without even looking at the code. At some point, of course, you have to get comfortable with the code. Here's what helps me:

  1. For diagrams, doxygen is an excellent tool that will generate call-graphs and other relationships for you. It happens to have PHP capability! If you haven't tried doxygen, you absolutely have to give it a spin. I can't vouch for how intelligible its output will be for code inside a framework, but it could help. Original developers are often shocked at what they see when presented with doxygen-generated docs of their code. The good news is that it really helps to jog their memory, which in turn lets them help you better.

  2. If you have a suite of unit tests, taking a close look at them should provide a window into the inner workings of the application. These will also be the first place to look for bugs you might have introduced while making changes.

  3. IDE bookmarks are invaluable for tagging hot-spots in the codebase. Being able to toggle through them rapidly will promote understanding.

  4. Reading recent bug-reports and their resolutions is also valuable for understanding hot-spots and will help you get up to speed on the most relevant portions of the codebase.

albert
Angelo
1

I am actually going to start doing this during the course of this week, as a new client needs enhancements to a product that was left by another developer. Below are the steps I plan to follow:

a) Identify the programming framework used, which helps in knowing how the application flows.

b) Identify common services (logging, exception handling, MVC, database connection, auditing, view/page generation), since these are the parts we will use the most.

c) Run through common user flows (in the application), then try to align them with how the code is laid out.

d) Try to make some changes and see how they come out. This is the biggest step, because until you start making changes the code is still a black box.

I will let you know what other ideas I get over the course of the next two weeks.

0

My thought is that you should read the documentation. I know hackers love to tell you "the code is the documentation" and use that as an excuse not to write any documentation, but they are wrong. Look at the Linux kernel, a massive software project of many millions of lines of code: I don't think anyone could really come in fresh without having read a book and just pick it up. If the code you are working with isn't documented (or well commented if a smaller project) then it probably is not good code.

adrianmcmenamin
  • The code is sparsely commented and otherwise undocumented. This is regrettable, but there's nothing I can do to change that short of documenting it myself. – linqq Jan 04 '12 at 21:16
  • Adding comments retrospectively is often pointless, as all you can do is re-write the code in English. You cannot get back the mind of the original coder, so you cannot write the important comments about **why** he did things the way he did. – MattDavey Jan 05 '12 at 10:29
0

If you're working with something really big with zero documentation (I've been there too, it's rough!), what I've found helps is to isolate the part you're working on. In that part of the code, figure out how data/events/messages/interactions pass in and out of that unit. In other words, reverse engineer the interface. Write it down. Next time you work on another unit (bonus if it talks to the one you worked on first), do the same thing. Keep all your documentation. After a few months of this you'll have a nice picture of how the thing flows.

Figure out the interface of the one small unit you are working on, and record it for later reference. Over time you will stitch together most of how it works. Find out what your program does and trace how that message flows. For example, if your system takes some input network message and sends an output message, trace how that message flows through the system without worrying about all the details; just see where it goes.
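One lightweight way to see where a message goes is to wrap the functions you suspect are involved and record the order in which they fire. This is only a sketch under made-up assumptions: the pipeline stages `receive`, `parse`, `route`, and `send` are hypothetical stand-ins for whatever units the real system has.

```python
import functools

# Order in which traced functions were entered.
trace_log = []


def traced(func):
    """Record each call to func by name, then run it unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace_log.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@traced
def receive(raw):
    # Hypothetical entry point for an incoming network message.
    return parse(raw)


@traced
def parse(raw):
    # Hypothetical normalisation step.
    return route(raw.strip().upper())


@traced
def route(message):
    # Hypothetical dispatch step.
    return send(message)


@traced
def send(message):
    # Hypothetical output step.
    return f"sent:{message}"


result = receive("  ping ")
print(" -> ".join(trace_log))
print(result)
```

The recorded order can be pasted straight into your notes as the documented path for that message type.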

anon
0

What I do is create a single UML model from all the files, reverse engineered from Java to UML. With this approach the model is no longer just an abstract view of the project, but the project itself, entirely mapped to MOF and therefore to UML.

What I get is one large model composed of multiple sub-models, each composed of packages, which are in turn composed of classifiers, and so on. Working at the multi-project level also allows me to trace each classifier and method call across projects: the same method can call one classifier in project A and another classifier in project B, so the only way to see the full structure is to reverse both projects at the same time. I don't have time to create component diagrams by hand, and the information in them is not really accurate anyway; I prefer to ask the computer to reverse the full project for me. I do a reverse at each iteration with the team, and all my diagrams are immediately updated. The reverse engineering is incremental and uses the Java-to-UML ID mapping: each Java element is mapped to a single, unique MOF element, which remains the same for the whole life of the project, even when refactored. This removes the usual limits on UML modeling and allows very large and complex projects to be modeled. For your information, I work with projects of more than 5,000,000 lines of OOP code. All my projects are reversed properly and graphical navigation is possible.

I only use class diagrams because from my UML model I can create as many views as needed which are always up to date. I can also model very complex projects.

UML_Guru