Refactoring long methods with a lot of cyclomatic complexity

Question

I'm attempting to refactor what is becoming a very large method -- currently 350 or so lines -- that contains a high degree of cyclomatic complexity.

I understand and subscribe to the theories that methods should be short and that concerns should be separated, and I've been reading over things like this and this among others. I'm struggling to refactor my method, though, for a couple of reasons. To help illustrate these, here is a simplified example of what the method looks like before refactoring:

private static void ProcessMessage(Message toProcess)
{
    var processingIssues = new StringBuilder();
    var transformedAttachments = new List<TransformedAttachmentData>();

    foreach (var attach in toProcess.Attachments)
    {
        //in reality there is quite a bit of nested logic here,
        //but I'll represent it as a single if-else block.
        if (true)
        {
            //in reality, a third-party library is used in multiple
            //levels of this logic hierarchy to produce the transformed data.
            transformedAttachments.Add(new TransformedAttachmentData(attach));
        }
        else
        {
            processingIssues.AppendLine(
                string.Format(@"Issue ""{0}"" occurred with " +
                @"attachment ""{1}"".", "blah blah problem", attach.Name));
        }
    }

    /*
     * do something else complex here with
     * processingIssues and transformedAttachments
     */
}

First, the cyclomatic complexity that concerns me is born of a complicated set of business rules which, applied together, produce a couple of shared sets of data (represented here as processingIssues and transformedAttachments). What this means to me is that not only should I break the foreach out into its own method, but I should also consider breaking the logic tree contained within it into additional methods in the interest of keeping my methods "short". In other words, it seems it's not enough just to break my current method into its two obvious outermost parts: "pre-processing" and "post-processing" of the attachments, if you will. How far do I need to go with this, given that I am dealing with a complicated logic tree whose concerns are not so "separate" from one another? Put differently, if "separation of concerns" is one reason to refactor to smaller methods, but so is "cyclomatic complexity" -- two concepts that seem to be orthogonal -- which is the "right" reason to refactor, and how should I do it in the latter case?

Second, in order to separate out some of the concerns to separate methods, those separate methods will have to return multiple variables (again, processingIssues and transformedAttachments), both of which are only needed within the separate methods themselves, so I'm faced with the choice: use output parameters, or create a small class to wrap the return values. Which is better?
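To make the two options concrete, here is a minimal sketch of both signatures side by side. Every name in it (`SketchAttachment`, `AttachmentResults`, `TransformWithOutParams`, `TransformWithResultClass`) is hypothetical and not taken from the real code, and the "has a name" check is only a placeholder for the actual business rules.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the real attachment type.
public class SketchAttachment
{
    public string Name { get; set; } = "";
}

// Option 2: a small class that wraps both results.
public class AttachmentResults
{
    public List<string> Transformed { get; } = new List<string>();
    public StringBuilder Issues { get; } = new StringBuilder();
}

public static class AttachmentProcessing
{
    // Option 1: output parameters. The caller receives two separate variables.
    public static void TransformWithOutParams(
        IEnumerable<SketchAttachment> attachments,
        out List<string> transformed,
        out StringBuilder issues)
    {
        transformed = new List<string>();
        issues = new StringBuilder();
        foreach (var attach in attachments)
        {
            // Placeholder rule standing in for the real logic tree.
            if (!string.IsNullOrEmpty(attach.Name))
                transformed.Add(attach.Name.ToUpperInvariant());
            else
                issues.AppendLine("Attachment has no name.");
        }
    }

    // Option 2: the same placeholder logic, returning one wrapper object.
    public static AttachmentResults TransformWithResultClass(
        IEnumerable<SketchAttachment> attachments)
    {
        var results = new AttachmentResults();
        foreach (var attach in attachments)
        {
            if (!string.IsNullOrEmpty(attach.Name))
                results.Transformed.Add(attach.Name.ToUpperInvariant());
            else
                results.Issues.AppendLine("Attachment has no name.");
        }
        return results;
    }
}
```

With the wrapper class, a third shared value could later be added as one more property without touching the method signature; with output parameters, every caller would have to change.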

Any guidance on this topic and my specific hang-ups would be much appreciated.

Your provided example already looks fine to me. If this is working code, I suggest that you post the entirety of your ProcessMessage function on http://codereview.stackexchange.com, and let them provide detailed feedback. — Robert Harvey, May 14 '15 at 19:03
It's difficult to make meaningful suggestions without seeing more of the code. If you want to improve the testability of the code (which I'm guessing is almost zero at the moment), then I'd try factoring out some new functions from the existing "God function". If your function takes up more than a screen of code -- some would even argue significantly less than that -- then your sense of code smell should be alerting you to potential design issues. — , May 14 '15 at 22:16
I agree with Robert Harvey, you should really post the complete code on codereview.stackexchange. One thing, though: it would be good to create a function that tries to transform a single attachment. This function should return e.g. an `Either` type indicating and storing either a successful transformation or a transformation failure. This would separate transforming an attachment from processing a message. More information about `Either` here: https://siliconcoding.wordpress.com/2012/10/26/either_in_csharp/ — valenterry, May 15 '15 at 00:11
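The `Either` idea from the last comment could look roughly like the sketch below. C# has no built-in `Either`, so this hand-rolled type is an assumption for illustration and is not the implementation from the linked article. `AttachmentTransformer.TryTransform` and its "empty name fails" rule are likewise hypothetical placeholders.

```csharp
using System;

// Minimal Either-like type: holds either a failure (Left) or a success (Right).
public sealed class Either<TLeft, TRight>
{
    private readonly TLeft _left;
    private readonly TRight _right;

    public bool IsRight { get; }

    private Either(TLeft left, TRight right, bool isRight)
    {
        _left = left;
        _right = right;
        IsRight = isRight;
    }

    public static Either<TLeft, TRight> Left(TLeft value) =>
        new Either<TLeft, TRight>(value, default(TRight), false);

    public static Either<TLeft, TRight> Right(TRight value) =>
        new Either<TLeft, TRight>(default(TLeft), value, true);

    // Runs exactly one of the two functions, depending on which case is stored.
    public T Match<T>(Func<TLeft, T> onLeft, Func<TRight, T> onRight) =>
        IsRight ? onRight(_right) : onLeft(_left);
}

public static class AttachmentTransformer
{
    // Hypothetical rule: an empty name is a failure; anything else
    // "transforms" to upper case. Left carries the issue text.
    public static Either<string, string> TryTransform(string attachmentName)
    {
        if (string.IsNullOrEmpty(attachmentName))
            return Either<string, string>.Left("Attachment has no name.");
        return Either<string, string>.Right(attachmentName.ToUpperInvariant());
    }
}
```

The caller's loop would then use `Match` to append to either the list of transformed attachments or the list of issues, so transforming one attachment no longer needs to know about either collection.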

Mike Nakis · Accepted Answer · 2015-05-15T10:35:59.507

Your scenario seems like you have placed an entire "message processing subsystem" inside a single function. In order to simplify the function, you will need to come up with an actual message processing subsystem consisting of several classes. Some, if not all, of these classes will need to be instantiated, allowed to run, and discarded on each invocation of your method.

This will be a subsystem which will have some state and some classes implementing operators (the actual rules.) The issue of returning multiple results is transformed into the task of modifying more than one item of state of the subsystem, and the issue of various rules interacting with each other is also transformed into having some rules modifying the state of the subsystem while other rules look at this modified state in order to figure out what to do.

Essentially, there would be a pretty close correlation between the set of local variables of your function and the set of member variables of the message processing subsystem class, though the subsystem might need to contain more member variables in order to capture information which is currently encoded in conditional branching (if statements) in your existing method.
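As a rough illustration of the shape described above, here is a minimal sketch in which the former local variables become members of a state object and each rule is a small class. All type names (`ProcessingState`, `IAttachmentRule`, `RejectUnnamedRule`, `TransformRule`, `MessageProcessor`) and the rules themselves are hypothetical, and attachments are reduced to plain strings to keep the example short.

```csharp
using System.Collections.Generic;
using System.Text;

// State of one processing run: the method's local variables become members.
public class ProcessingState
{
    public List<string> TransformedAttachments { get; } = new List<string>();
    public StringBuilder ProcessingIssues { get; } = new StringBuilder();

    // Extra member capturing what an if/else branch used to encode.
    public bool CurrentAttachmentRejected { get; set; }
}

// One business rule; it may read and modify the shared state.
public interface IAttachmentRule
{
    void Apply(string attachmentName, ProcessingState state);
}

// Hypothetical rule that modifies state: rejects attachments without a name.
public class RejectUnnamedRule : IAttachmentRule
{
    public void Apply(string attachmentName, ProcessingState state)
    {
        if (string.IsNullOrEmpty(attachmentName))
        {
            state.CurrentAttachmentRejected = true;
            state.ProcessingIssues.AppendLine("Attachment has no name.");
        }
    }
}

// Hypothetical rule that looks at state left behind by earlier rules.
public class TransformRule : IAttachmentRule
{
    public void Apply(string attachmentName, ProcessingState state)
    {
        if (!state.CurrentAttachmentRejected)
            state.TransformedAttachments.Add(attachmentName.ToUpperInvariant());
    }
}

// Instantiated, run, and discarded once per message.
public class MessageProcessor
{
    private readonly IReadOnlyList<IAttachmentRule> _rules;

    public ProcessingState State { get; } = new ProcessingState();

    public MessageProcessor(IReadOnlyList<IAttachmentRule> rules)
    {
        _rules = rules;
    }

    public void Process(IEnumerable<string> attachmentNames)
    {
        foreach (var name in attachmentNames)
        {
            State.CurrentAttachmentRejected = false; // reset per attachment
            foreach (var rule in _rules)
                rule.Apply(name, State);
        }
    }
}
```

Note that in this sketch the order of the rules matters: `TransformRule` only behaves correctly if `RejectUnnamedRule` has already run for the same attachment.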

Needless to say, the total number of lines in such a subsystem might easily exceed your current 350 lines by a factor of 2 or 3, so, before you embark on such a refactoring, consider whether you really need it.

@Ewan well, that opening part of my answer which stated that a 350 line method is not unreasonably long was rather unnecessary, so I removed it. If writing tiny methods works for you, good for you. However, my thoughts on this issue can be found here: [michael.gr - My notes on "Clean Code" (Prentice Hall)](http://blog.michael.gr/2013/03/my-notes-on-clean-code.html) (Scroll down to where it says "Page 35" and read the first paragraph.) — Mike Nakis, May 15 '15 at 10:28
Thank you for your thoughtful answer. I've definitely derived some useful guidance from it. — rory.ap, May 15 '15 at 13:11

Ewan · Answer 2 · 2015-05-15T19:02:29.177

I would post the whole function to code review.

But here are my views on your snippet and questions from an OOP viewpoint:

static ProcessMessage should be message.GetTransformedAttachments() and TransformedAttachment.Process()
if (logic) should be attach.IsLogicTrue
TransformedAttachmentData(attach) should be attach.GetTransformedData()

So that's 3 more methods already!

If you create the mini classes you need to return multiple values from the functions, and then move the functions to those classes, you can then use inheritance to remove conditional statements in the logic. This will reduce the length of your code (in methods) and cyclelowhatsit complexity.

Your method could end up just being:

for each x in y
   x.doSomething

If x is subclassed enough, you have no if statements and the object graph takes care of all your outputs!
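A minimal sketch of that idea in C#, assuming two hypothetical subclasses (`ValidAttachment` and `BrokenAttachment`) that stand in for the two branches of the original if-else. None of these names come from the real code.

```csharp
using System.Collections.Generic;

// Each subclass carries the behaviour that an if/else branch used to select.
public abstract class AttachmentBase
{
    public string Name { get; }

    protected AttachmentBase(string name)
    {
        Name = name;
    }

    public abstract void Process(ICollection<string> transformed, ICollection<string> issues);
}

// Former "if" branch: the attachment can be transformed.
public class ValidAttachment : AttachmentBase
{
    public ValidAttachment(string name) : base(name) { }

    public override void Process(ICollection<string> transformed, ICollection<string> issues)
    {
        transformed.Add(Name.ToUpperInvariant());
    }
}

// Former "else" branch: the attachment only produces an issue.
public class BrokenAttachment : AttachmentBase
{
    public BrokenAttachment(string name) : base(name) { }

    public override void Process(ICollection<string> transformed, ICollection<string> issues)
    {
        issues.Add("Issue occurred with attachment \"" + Name + "\".");
    }
}

public static class PolymorphicProcessing
{
    // The loop body contains no conditionals; dispatch picks the behaviour.
    public static void ProcessAll(
        IEnumerable<AttachmentBase> attachments,
        ICollection<string> transformed,
        ICollection<string> issues)
    {
        foreach (var attach in attachments)
            attach.Process(transformed, issues);
    }
}
```

The decision about which subclass to instantiate still has to be made somewhere, for example where the attachments are created, so the condition is moved out of the loop rather than eliminated.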

note: I'm not saying this is the best solution. A service class with a ton of ifs can sometimes be good if it matches the business definition of the process and lots of inheritance can make debugging a pain in the ass. But you should def give it a go and see if you like it.
