Best way to analyse a large class before refactoring it into smaller classes?

Question

Foreword

I'm not looking for a way to refactor a large spaghetti code class, that topic has been covered in other questions.

Question

I'm looking for techniques to begin to understand a class file written by another coworker that spans over 4000 lines and has a single huge update method that is more than 2000 lines.

Eventually I hope to build unit tests for this class and to refactor it into many smaller classes that follow DRY and the Single Responsibility Principle.

How can I manage and approach this task? I'd like to be able to eventually draw a diagram of the events that occur within the class and then move on to abstracting functionality, but I'm struggling to obtain a top-down view of what the class's responsibilities and dependencies are.

Edit: In the answers people have already covered topics relating to input parameters, this information is for newcomers:

In this case the main method of the class has no input parameters and runs a while loop until instructed to stop. The constructor takes no parameters either.

This adds to the confusion of the class, its requirements and dependencies. The class gets references to other classes through static methods, singletons and reaching through classes it already has reference to.

How is this different from most refactoring tasks? You're not planning on breaking the interface are you? — JeffO, Mar 18 '15 at 14:12
@JeffO I think, he wants a 30000 ft view, how to methodologically get grips onto what is the class doing and what needs to be done; not "extract method", "extract variable" etc. — Thomas Junk, Mar 18 '15 at 14:16
@ThomasJunk yes, that's what I'm looking for. I don't want to touch the actual code right now. I'm looking to discover what it does and how it fits into the entire system. In my mind I'm wondering if there are any tools or practices that can help me to do this methodically to avoid (where ever possible) confusion or excessive time wasting. — sydan, Mar 18 '15 at 14:31
Do not *eventually* build unit tests. Start building unit tests ***right now*** and in the tests encode every bit of information that you learn about the class as you go along. (Unfortunately, the tests might discover bugs in the class, which will end up adding confusion to the whole picture, but it is still better than trying to understand what's going on without any tests in place.) — Mike Nakis, Mar 18 '15 at 15:24
Could be a duplicate: http://programmers.stackexchange.com/questions/6395/how-do-you-dive-into-large-code-bases — JeffO, Mar 18 '15 at 15:31
@JeffO this is really useful related question. The main point of the top answer is less applicable to me though... at the moment I cannot step through the code in its environment. It is written in Code::Blocks as a code library with no host application. The compiled libraries are exported when built and then used when the machine is setup in a real time environment. — sydan, Mar 18 '15 at 15:49
Could you build a test environment, with a test host application that can use this class in an uncompiled form so you could step through the code? — JeffO, Mar 18 '15 at 16:05
@JeffO excellent! That was a suggestion I just discussed with one of my coworkers and we think that would be a good next step. — sydan, Mar 18 '15 at 16:19
possible duplicate of [Deciphering foreign code](http://programmers.stackexchange.com/questions/87460/deciphering-foreign-code) — gnat, Mar 18 '15 at 16:55
@gnat I feel that my question is targeted much more at the size of the code than the foreign nature of it. The answers to that question do not address the specific nature of the issue in the same way that the answers presented here do. — sydan, Mar 20 '15 at 08:13

score 10 · Answer 1 · answered Mar 18 '15 at 15:18

Interestingly, refactoring is the most efficient way I've found to understand code like this. You don't have to do it cleanly at first. You can do a quick and dirty analysis refactoring pass, then revert and do it more carefully with unit tests.

The reason this works is because refactoring done properly is a series of small, almost mechanical changes, that you can do without really understanding the code. For example, I can see a blob of code that's getting repeated everywhere and factor that out into a method, without really needing to know how it works. I might give it some lame name at first like step1 or something. Then I notice that step1 and step7 seem to frequently appear together, and factor that out. Then suddenly the smoke has cleared enough that I can actually see the big picture and make some meaningful names.

For finding overall structure, I've found creating a dependency graph often helps. I do this manually using graphviz as I've found the manual work helps me learn it, but there are probably automated tools available. Here's a recent example from a feature I'm working on. I've found this is also an effective way to communicate this information to my colleagues.

dependency graph

Damn i wish i would have read this before my interview today....I was given the task of refactoring a class, this would have helped me a lot. — Grim, Aug 04 '17 at 05:03

score 8 · Answer 2 · answered Mar 18 '15 at 14:52

First, a word of caution (even though it isn't what you're asking) before answering your question: A tremendously important question to ask is why are you wanting to refactor something? The fact that you mention "coworker" begs the question: Is there a business reason for the change? Especially in a corporate environment, there will always be both legacy and even newly committed code that you won't be happy with. The fact of the matter is that no two engineers will solve a given problem with the same solution. If it works and doesn't have horrible performance implications, then why not leave it be? Feature work is generally much more beneficial than refactoring because you don't particularly like an implementation. That said, there's something to be said for technical debt and cleanup of unmaintainable code. My only suggestion here is that you've already made the distinction and have buy in to make the changes, otherwise you'll likely be asked "why are you making these changes" after having sunk in a whole lot of effort and will likely then get much more (and understandable) push back.

Before tackling a refactor, I always want to make sure that I have a firm understanding of what it is I'm changing. There are two methods I generally employ when tackling this kind of thing: logging and debugging. The latter is facilitated by the existence of unit tests (and system tests if they make sense). As you don't have any, I'd strongly urge you to write tests with full branch coverage to ensure that your changes don't create unexpected behavioural changes. Unit tests and stepping through the code help you get a low level understanding of behaviour under a controlled environment. Logging helps you get a better understanding of behaviour in practice (this is especially useful in multithreaded situations).

Once I get a better understanding of everything that's going on through debugging and logging, then I'll start reading code. At this point, I generally have a much better grasp of what's going on and sitting down and reading 2k LoC should make much more sense at that point. My intention here is more to get an end-to-end view and make sure that there isn't any sort of odd edge cases that the unit tests I've written aren't covering.

Then I'll give the design a think and draw up a new one. If backwards compatibility is of any importance, I'll make sure that the public interface doesn't change.

With a firm understanding of what's going on, tests and a fresh new design in hand, I'll put it up for review. Once the design has been vetted by at least two people (hopefully familiar with the code), I'll start implementing the changes.

Hope that isn't too painfully obvious and helps.

Your answer is really useful to me. I actually hadn't thought to produce log results from the current class first however there are some issues: I don't believe I can unit test this class. All of its main functionality occurs in one method with no input parameters, this method picks up most of its state from other classes that are not passed in, its constructor takes no arguments as well, it's dependencies are accessed through singletons. Your suggestion is to unit test first then begin to redesign, how do I adapt if it turns out that a redesign is likely needed in order to unit test? — sydan, Mar 18 '15 at 15:10
An additional note on your word of caution. I understand this, I've seen people talk about how its often best to leave these things alone, however the class is currently still being developed and new developers are starting to work with it. With new modifications and no unit tests there is no way to understand how a change affects the class. I've already been tasked with abstracting one feature of the class and I completed this task, but this has made little difference to the mammoth size of the problem and I believe a lot more needs to be done. — sydan, Mar 18 '15 at 15:18
@sydan: Are you able to mock the interaction with other classes? I know this is easier in some languages than others. If that's a no-go, perhaps system tests to get started? — Demian Brecht, Mar 18 '15 at 15:33

Thomas Junk · Answer 3 · 2015-03-18T15:25:09.823

There is no general recipe, but some rules of thumb (supposing a statically typed language but that shouldn't really matter):

1) Take a look a the method signature. It tells you what goes in and hopefully comes out. From the overall state of this god-class, I suppose this to be a first pain point. I suppose, that more than one parameter is used.

2) Use the search function from your Editor / EDI to determine Exit-Points (usually a return statement_ is used)

From that you know, what the function needs for testing and what you expect in return.

So a simple first test would be calling the function with necessary parameters and expect that the result is non-null. That is not much, but a starting point.

From that you could enter a hermeneutic circle (a term coined by H.G. Gadamer - a german philosopher). The point is: you have now a rudimentary understanding of the class and update this understanding with new detail knowledge and have a new understanding of the whole class.

This combined with the scientific method: make assumptions and look if they hold.

3) Take one parameter and look, where in the class it is somehow transformed:

E.g. you are doing Java like me, there are usually getter and setter for which you can look. Searchpattern $objectname. (or $objectname\.(get|set) if you are doing Java)

You could now make further assumptions on what the method does.

Trace only the input parameters (first) each through the method. If necessary make some diagrams or tables, where you write down every change to each of the variables.

From that, you can write further tests, describing the behaviour of the method. If you have a rough understanding how each input parameter is transformed along the method, start experimenting: pass in null for one parameter or strange input. Make assumptions, check the outcome and vary input and assumptions.

If you do this a time, you have a "ton" of tests describing the behaviour of your method.

4) In a next step I would look for dependencies: what does the method need besides its input to work properly? Are there possibilities to reduce or restructure those? The less dependencies you have, you clearer see the points, where to make the first splits.

5) From there you could go down the whole refactoring road with refactoring patterns and refactoring to patterns.

Here a nice Vid: GoGaRuCo 2014- The Scientific Method of Troubleshooting It's about troubleshooting but nevertheless usefull for a general methodology of understanding how something works.

You mention, that the called function has no input parameters: In this special case i would try to first identify the dependencies and refactor them to parameters, so you could swap them in and out as you like.

Hi Thomas, thank you for your answer, like Demian's it is very useful and a good methodical approach. I believe I will start to move down this route by combining the logging that Demian suggest with the varying of inputs that you have suggested. Sadly the method this class runs has no input parameters. It is a clock task that runs an eternal while loop, it looks as though the dependencies are grabbed via singletons, static classes and then by reaching through classes to grab others. — sydan, Mar 18 '15 at 15:14
The first stage of establishing the methods inputs is likely going to be the hardest for me as there is not way to know which variables that the class uses are inputs, constants, outputs, temporary variables, or random debugging variables. — sydan, Mar 18 '15 at 15:15
Oh, that sounds like fun! Perhaps, you jump right into it and follow the first object you find. The circle then starts like described ;) — Thomas Junk, Mar 18 '15 at 15:17
That may be the best route... would a 'ctrl-f' for the most used variable seem like a strong starting point? Should I edit my original question to mention the lack of input parameters and the methods structure? — sydan, Mar 18 '15 at 15:20
You should mention it, because it makes your situation "special". — Thomas Junk, Mar 18 '15 at 15:23
@sydan document your few-days-long scientific systematic journey in a shareable **midmap**, including all your findings and all open questions. That's what works for me all the time, especially in complex long running tasks — xmojmr, Mar 19 '15 at 12:09

score 4 · Answer 4 · answered Mar 18 '15 at 15:40

4

You mention a 2000 lines update method.

I'd start to read this method top to bottom. When you find a set of statements which belong to each other, i.e. share one responsibility (e.g. create a user), extract these statements into a function.

This way, you don't change how the code works but you

shorten this huge method
gain a better, higher level understanding of this method
prepare the method for further refactorings

Repeat these steps for other methods.

answered Mar 18 '15 at 15:40

Oliver Weiler

2,448
19
28

1

This process when done non-destructively (just read, notice, plan) is sometimes called [code review](http://en.wikipedia.org/wiki/Code_review). Steven C. McConnell's book - Code Complete - has some nice [check list template](http://www.matthewjmiller.net/files/cc2e_checklists.pdf) for this occasion – xmojmr Mar 19 '15 at 15:54

Best way to analyse a large class before refactoring it into smaller classes?

4 Answers4