Refactoring into lots of methods - is this considered clean or not?

Question

So, I watched as my colleague complained a bit about a project he has inherited from someone who is, shall we say, not very experienced as a programmer (intern left to his own devices on a project).

At one point there was duplicate code, about 7-8 lines of code duplicated (so 14-16 lines total) in a method of say 70-80 lines of code. We were discussing the code, and he spotted a way to refactor this to remove the duplication by simply altering the structure a bit.

I said great, and then we should also move the code out to a separate method so the large method gets a little more readable.

He said 'no, I would never create a method for just 7-8 lines of code'.

Performance issues aside, what are your input on this? Do you lean against using more methods (which in c# pads code with about 3 lines) or larger methods, when that particular code will probably not be used anywhere else? So it is purely a readability issue, not a code-reuse one.

Cheers :)

I would **strongly** advise you to read a copy of Robert C. Martin's "Clean Code", which contains several specific metrics that advise you on how much/little to put in your methods. — STW, Jul 20 '11 at 12:49
It more about complexity (cyclomatic) than lines of code. Run the code through a complexity analysis tool. Usually anything under 10 (Not LoC, but complexity score) is good maintenance wise and anything over 25 should be put in the refactor TODO bin. Set a complexity score that your team feels comfortable with maintaining. Every decision point increases complexity, so in general small methodes with less decision points are easier to maintain that larger methods with more decision points. — Jon Raynor, Jul 20 '11 at 15:54
@Jon I just tried pulling one of my small, private projects through VS2010 Code Metrics - it reports the whole project as 56 in CC, but it seems this is just added up about 1 or 2 per method, so it seems that a larger project will always have higher complexity (duh). Are you talking about 10 per method or class or...? — Max, Jul 20 '11 at 16:47
Anyone who says something like, "I would never create a method for just 7-8 lines of code" is just as ignorant as the intern. I generally aim for my functions to be between 5-15 lines of code. 20-30 is pushing it, and above 30 is where I start rethinking things. — riwalk, Jul 20 '11 at 17:24
@Max- Method level complexity. I think with the VS2010 there is a hierarchy that you can drill into, Solution, Project, Class, Method. My general guidelines for method is <=10 no problem, 11 to 25 pushing it (could use refactoring), > 25 should be refactored. Unfortunately I've seen some numbers in the 500s for some past codebases, really ugly stuff. If you have a code base where every method is under 20, your probably not pulling your hair out. I want to stress numbers should be a guide not an absolute, but it gives you an idea of where the code smells are in the codebase. — Jon Raynor, Jul 20 '11 at 17:43
I might add that different languages can have have functions with different sizes before they become unmanageable. 7 lines long is a biggish Haskell function, while it's tiny for ASM. — Theo Belaire, Jul 20 '11 at 20:17

score 84 · Accepted Answer · answered Jul 20 '11 at 08:08

84

The LoC in a method is a completely pointless measure. The important things are separation of concerns and code duplication.

A method should only do one thing, and that one thing should be expressed in its name. Other things should be left to other methods.

The problems arising from code duplication cannot be overestimated (in other words, they are always bigger than you think). And that alone will justify even methods with 1 line of code.

On the other hand, a method that populates a big form with details from a big data object can easily have 50 or more lines of code, but without a single conditional statement. Why would you want to break that down into smaller pieces if there is no functional requirement (or code duplication)?

answered Jul 20 '11 at 08:08

wolfgangsz

5,373
1
22
24

7

All of this is true, but lines-of-code-per-method is not a pointless measure by any means. Many methods have enough looping and branching to make them much more difficult to read and understand if you can't see the whole thing on screen at once. It's easier to make the argument on readability that way than to venture into the subjective territory of separation-of-concerns. I agree that a method that is strictly linear and deals with exactly one concern is quite readable, but such methods are also rare. – jprete Jul 20 '11 at 13:20
1

+1 Goos answer. As usual, everything depends on the situation. Separation of concerns and code duplication are certainly a good way to split code in methods ! Plus most modern compiler will inline the code when apropriate so no performance hit (and anyway, thinking performance before profiling is usually a stupid idea). – deadalnix Jul 20 '11 at 17:20
I agree, but there is a caveat: when you establish and comply the rule "A method should only do one thing", then you can not write real software. You could write a library with utility stuff. But in any real software you sooner or later will have to write some kind of sequence: first do this, then do this task, and finally do this task. Right, eh? Probably that rule should just be reworded. – frunsi Jul 31 '11 at 17:44
3

@frunsi I think that Bob Martin, Kent Beck, and Martin Fowler would disagree. You *can* write "real" software and comply with the rule that "A method should only do one thing." It has been done. But it requires a clear understanding of levels of abstraction and what does "one thing" really mean. Running a sequence can be "one thing" as long as the abstraction is at the appropriate level. Example: a "purchaseTicket" function might "do one thing" by calling a string of methods to run a credit card, contact an airline, etc. as long as is only orchestrating subtasks (not every low-level detail). – Allan Dec 11 '13 at 18:56
@Allan so, the "a method should do one thing only" rule is just outpaced, right? What about introducing an "orchestrate" thing besides that "method" thing? You may state rules for methods, but if you state that a method does only one thing, then it is as you stated. Your rule states that a method can not be a sequence of things. How do you think that it is ok to state another thing that conflicts with your rule? And please do not just list random persons here. That "do one thing" thing seems to be either extremely tensile, or it may be just wrong. – frunsi Oct 23 '14 at 00:45
@Allan Trying to reword: "A method should do either only one thing, or should just execute a sequence of other methods in a controlled order.". WTF??? Yes, that "only do one thing" rule is wrong, incomplete, wrong, stupid, everything. And enumerating big names of software does not help either. I could even just state that "you" should just state only one opinion ever, never ever anything else, because you are a thing that should just only "do one thing". Strange, 'eh? But that is exactly what you did. – frunsi Oct 23 '14 at 01:00
@Allan One last hint: if your "only do one thing" rule is as flexible as you want it to be, then, that "one thing" could be anything: a logical expression, any expression at all, a sequence of any other things, a whole as-complex-as-it-could-be function in same process or in another process, maybe marshalled in & out of another process on another hardware in another location on this world or even on another world anywhere, anytime. The sky is the limit to "only do one thing" ;-) – frunsi Oct 23 '14 at 01:11
Woah, calm down there @frunsi. :) I didn't just mention "random" people here, but people who have written on this topic and could therefore be useful. But to address your issue, it's a question of abstraction. As you hint at, it revolves around understanding what is "one thing?". Is tying your left shoe one thing, or is it many things? Saying that a function should do "one thing" means it should have a single logical objective at a single level of abstraction. So `tieShoe()` can be "one thing" even though it might call other more-narrow functions to manipulate the laces, and such. – Allan Oct 23 '14 at 15:20
@frunsi (continued...) the idea is that when you look at a function, you should be able to easily state the one "thing" (and hence the name should say what that "thing" is). The use of the word "thing" is intentionally vague to cover the concepts of abstraction. The idea is to 1) avoid functions that "obviously do two different things" and 2) avoid functions that mixes high and low-level details. – Allan Oct 23 '14 at 15:30
@frunsi "Functions should only do one thing" can also be restated as the [Single Responsibility Principle](http://en.wikipedia.org/wiki/Single_responsibility_principle). Maybe you'd prefer that terminology? I mentioned Martin because he writes about this quite a bit in *Clean Code* chapter 3. Fowler addresses it indirectly in his *Refactoring* book. (See [this link](http://pragmaticcraftsman.com/2006/07/single-responsibility-principle/).) Can't find a good quote from Beck off-hand though; so maybe you're right. I should provide links, not just names. – Allan Oct 23 '14 at 15:46
@Allan I am steadied now ;-) However, that "only do one thing" rule is still very very vague and totally insufficient. It is a vague heuristic, not a rule. That is my point! Replace it with conclusive arguments and I will shut up :) – frunsi Oct 23 '14 at 19:30
@frunsi I think that will have to go outside the realm of comments then. I will recommend reading *Clean Code* since he probably does a better explanation than I could. But also, I think that heuristics may be better than hard-and-fast rules in cases like this. – Allan Oct 23 '14 at 22:45

Michael Borgwardt · Answer 2 · 2011-07-21T07:39:15.617

25

Shorter methods are very much more readable - fewer things to keep in mind to understand them. The most vocal advocates of clean code would probably say that almost all methods should be shorter than 8 lines. And pretty much every developer who's interested in the topic will agree that 70-80 lines of code in a method is too much and it should be broken up.

edited Jul 21 '11 at 07:39

answered Jul 20 '11 at 08:04

Michael Borgwardt

51,037
13
124
176

3

This is way too over-generalized. Anyone working with large database objects knows that keeping some methods under 80 lines is completely impossible. I often interface with external databases whose average tables have around 70 columns. – Morgan Herlocker Jul 20 '11 at 15:07
2

@ironcode: there are exceptions, sure. But you'd agree that it's *rare* to have an 80 line method that cannot or should not be broken up, right? – Michael Borgwardt Jul 20 '11 at 15:11
I think it largely depends on what the method is doing. Data access blocks for CRUD stuff could be 200 lines and I could care less (of course, it might indicate a poorly structured database), provided that no more than a few lines is actual logic. Sometimes code is fundamentally repetitive in these types of operations. I guess I could agree that a method should strive to have less than 8 lines (maybe even 3-4) of logic and/or calculations. – Morgan Herlocker Jul 20 '11 at 16:38
1

That is a statement, but you miss the reasoning in your answer. Why "shorter than 8 lines" is good? Why "70-80 lines of code in a method" is too much? Why? – frunsi Jul 21 '11 at 03:00
@frunsi: the reasoning is in the first sentence: the shorter the method, the fewer things you have to keep in mind to understand it. The number of factors (variables, control flow levels, etc.) humans can keep in mind while considering something else is severely limited (will add a link about that). – Michael Borgwardt Jul 21 '11 at 07:37
@Michael: but IMHO "to understand" a method does not require to keep a total "mental model" of a method in mind (which would justify your 8-line limit, BTW: you probably meant this link: http://en.wikipedia.org/wiki/Chunking_%28psychology%29). The point is, a chunk does not have to be a single line. This would be way too limiting - and there IMHO would not be much benefit by remembering single lines of a method. It is all about concept: A chunk should be a kind of sub-task of a method, a logical section = a logic chunk = a block of a few lines, a loop construct etc. – frunsi Jul 31 '11 at 17:27
@Michael: So, IMHO a more practical limit could be "a maximum of 7 logical tasks" in a single method. With that in mind, many methods should contain a single "logical task", but some should consist of a sequence of such tasks. On the extreme lower end: you could not write useful software when each method can only do a single task - you just could not write any "task sequences" at all - with that limit you could just write a library of utility methods, but no real software! On the extreme upper end: any method would have a max. number of 7 tasks, which we are able to remember in our minds. – frunsi Jul 31 '11 at 17:32
1

@frunsi: what makes methods complex is not so much "tasks" as "state" - Variables and control flow levels. A method that has neither can be long without being hard to understand, but most methods introduce quite a lot of state. – Michael Borgwardt Aug 01 '11 at 08:17
@Michael: you are right. But you also disproved your own statement. To state that the maximum number of lines should be a certain number is a stupid rule - and you know why. So why state that after all? – frunsi Aug 01 '11 at 23:16
@frunsi: because it's still a useful rule of thumb and correct for at least 95% of all methods. There are exceptions to most useful rules. – Michael Borgwardt Aug 02 '11 at 07:29
@Michael: exceptions to rules are a different thing. This rule is just a subjective guess. It is like stating, that there should be 5 woman in a mans life. I am afraid that we will never agree. – frunsi Aug 03 '11 at 07:48

score 8 · Answer 3 · answered Jul 20 '11 at 07:56

It's all about separation of concerns. The method should answer 1 question or make 1 decision. And that's all. So, having the methods of 100 lines of code means these method are just do too much. 7-8 LoC methods are absolutely great. The other question is - where are you going to put all these methods. Having 20 short methods in a single class is probably a topic for concern. Having 30 of them is, I believe, the good reason to understand something's wrong with abstractions. Just keep everything as short and simple as possible.

100 lines of code does not *mean* that the method or function is doing to much, it is an indicator. A pretty good indicator, but still not definitive. — Matt Ellen, Jul 20 '11 at 08:08

score 6 · Answer 4 · answered Jul 20 '11 at 08:04

6

There's a school of thought that says 7-8 LOC is a bit long for a method. I don't subscribe to that but I'd say go right ahead - anything less than a screen is good for me.

answered Jul 20 '11 at 08:04

mcottle

6,122
2
25
27

Depends on your screen size, ;). I think it's better to see it in terms of how long it takes to explain or rewrite it (based on the explanation). If your explanation goes like "It does X, then Y, and Z, too", then that's at least two methods you could split your code up with. – cthulhu Jul 24 '11 at 20:15

score 5 · Answer 5 · answered Jul 20 '11 at 12:47

5

Methods should be as simple as necessary, but no simpler.

answered Jul 20 '11 at 12:47

Andrew Lewis

533
1
4
12

+1 for the Einstein quote. – Morgan Herlocker Jul 20 '11 at 15:08
Yeah, I had to butcher it a bit. Methods can be made as simple as possible, but sometimes that's just too simple. – Andrew Lewis Jul 20 '11 at 15:43

score 5 · Answer 6 · edited May 24 '14 at 15:46

LoC is simply not a valuable metric. Instead, look to how much a method does. A method should generally do one thing. IMHO, it is sometimes ok to tack on 1 or 2 very simple operations. For example, calculating the perimeter of a rectangle:

It could be:

public int GetPerimeter(int side1, int side2)
{
    totalWidthLength = side1 * 2;
    totalHeightLength = side2 * 2;
    return totalWidthLength + totalHeightLength;
}

-OR-

public int GetPerimeter(int side1, int side2)
{
    return DoubleSide(side1) + DoubleSide(side2);
}

private int DoubleSide(int side)
{
    return side * 2;
}

Personally, I prefer the first method, even though it technically performs more than one operation. At the same time, you can also have methods with many lines of code that are simple to read and understand.

public void UpdateCustomer()
{
    customer Customer = new Customer();

    customer.Name = textboxName.Text;
    customer.AddrLine1= textboxName.Text;
    customer.AddrLine2 = textboxName.Text;
    customer.OfficialName = textboxName.Text;
    customer.Age = textboxName.Text;
    customer.BirthDay = textboxName.Text;
    customer.NickName = textboxName.Text;
    customer.LastLocation = textboxName.Text;
    customer.... = textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer.... = textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer.... = textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;
    customer....= textboxName.Text;

    db.InsertOnCommit(customer)
    db.SubmitChanges();
}

The above example is about 50 lines, but it could even be 200 lines, and would make no difference. Each property assignment on the customer object adds practically no complexity, so the 50 line method remains at the same complexity level as your average 4 line method. In fact, I would go so far as to say that it has basically the same readability as the method below:

public void UpdateCustomer()
{
    customer Customer = new Customer();

    customer.Name = textboxName.Text;

    db.InsertOnCommit(customer)
    db.SubmitChanges();
}

The 50 line method could probably be acheived in under 10 lines through the use of reflection and regex, iterating over each of the properties in Customer and searching for a textbox that has a matching LIKE value in its name. Obviously, that would be an example of code that is way too clever and resorts to wacky hacks to acheive an asthetic ideal. The bottom line is that LoC is an extremely unreliable complexity/readability metric. Everyone already knows this (my CRUD example is in no way uncommon), and yet while most people talk about how horrible LoC is at measuring productivity, we constantly hear about this LoC per method rules that are supposed to measure code quality. It really cannot be both ways.

I think breaking it down for `DoubleSide` function is good example of breaking something apart just only to break it apart and without forethought to how it can be reused (which helps prove your point that LoC isn't a valueable metric and I agree with that), since as I understand it I don't think doubling a value is that useful compared to just multiplying by 2. My preference would have been to have GetPerimeter be simplified to: `return side1 * 2 + side2 * 2;`. You could even simplify further if you wanted to, but it might make the code less clear: `return 2 * (side1 + side2);` — morbidhawk, Apr 24 '17 at 14:47

Refactoring into lots of methods - is this considered clean or not?

6 Answers6

Linked

Related