146

Generally I use private methods to encapsulate functionality that is reused in multiple places in the class. But sometimes I have a large public method that could be broken up into smaller steps, each in its own private method. This would make the public method shorter, but I'm worried that forcing anyone who reads the method to jump around to different private methods will damage readability.

Is there a consensus on this? Is it better to have long public methods, or break them up into smaller pieces even if each piece is not reusable?

Jordak
  • 921
  • 2
  • 7
  • 10
  • 14
    Possible duplicate of [Is it OK to split long functions and methods into smaller ones even though they won't be called by anything else?](https://softwareengineering.stackexchange.com/questions/195989/is-it-ok-to-split-long-functions-and-methods-into-smaller-ones-even-though-they) – gnat Jun 05 '17 at 15:37
  • 8
    All answers are opinion based. Original authors always think their ravioli code is more readable; subsequent editors call it ravioli. – Frank Hileman Jun 05 '17 at 17:03
  • 5
    I suggest you read Clean Code. It recommends this style and has lots of examples. It also has a solution for the "jumping around" problem. – Kat Jun 05 '17 at 17:08
  • 1
    I think this depends on your team and what lines of code are contained. – HopefullyHelpful Jun 05 '17 at 18:02
  • 1
    Isn't the entire framework of STRUTS based off of single-use Getter/Setter methods, all of which are single-use methods? – Zibbobz Jun 05 '17 at 20:27
  • 1
    Possible duplicate of [One-line functions that are called only once](https://softwareengineering.stackexchange.com/questions/107669/one-line-functions-that-are-called-only-once) – Kilian Foth Jun 06 '17 at 06:13
  • 1
    If a method has a clean interface: self documenting, single responsibility, clearly documented preconditions and side effects then you shouldn’t have to jump into it to see how it’s implemented. You can look at it (or glance at documentation) to understand the inputs and results and move on. By abstracting the code of the larger method into smaller parts it can become much more readable, bug-free, and easier to work with. Of course, that can go too far but where that line is drawn can be up to personal preference. –  Jun 06 '17 at 15:33
  • 1
    @jpaugh Ravioli is also harder to make and less messy to eat. Now Lasagne is the *real* dish. But Baklava is even better... –  Jun 06 '17 at 17:59
  • 1
    @nocomprende Wow! Maybe you don't understand the value of what you said! If it's true, baklava could **revolutionize** Silicon Valley! – jpaugh Jun 06 '17 at 22:04
  • Depends how complicated the smaller pieces are. If they're well named and it's somewhat obvious what they're doing then it may not harm readabiliy. I doubt the smaller functions would have absolutely no reusability though. – Pharap Jun 07 '17 at 01:40
  • Depends as well on the language. In C++, for example, it is often considered better to use unexported non-friend functions in an anonymous namespace, where what you are doing can be expressed in terms of the public interface of the class. This has a couple of advantages - it forces you to minimise the number of places where you're depending on your own implementation detail; and in C++, private methods form part of the public interface, so this avoids changing the public interface when implementations change. Feel free to steal this and make it part of your answer; I can't answer myself. – Muzer Jun 07 '17 at 14:41
  • If your language supports it, consider using local functions instead. – svick Jun 18 '17 at 11:19

6 Answers6

205

No, this is not a bad style. In fact it is a very good style.

Private functions need not exist simply because of reusability. That is certainly one good reason to create them, but there is another: decomposition.

Consider a function that does too much. It is a hundred lines long, and impossible to reason about.

If you split this function into smaller pieces, it still "does" as much work as before, but in smaller pieces. It calls other functions which should have descriptive names. The main function reads almost like a book: do A, then do B, then do C, etc. The functions it calls may only be called in one place, but now they are smaller. Any particular function is necessarily sandboxed from the other functions: they have different scopes.

When you decompose a large problem into smaller problems, even if those smaller problems (functions) are only used/solved once, you gain several benefits:

  • Readability. Nobody can read a monolithic function and understand what it does completely. You can either keep lying to yourself, or split it up into bite-sized chunks that make sense.

  • Locality of reference. It now becomes impossible to declare and use a variable, then have it stick around, and used again 100 lines later. These functions have different scopes.

  • Testing. While it is only necessary to unit test the public members of a class, it may be desirable to test certain private members as well. If there is a critical section of a long function that might benefit from testing, it is impossible to test it independently without extracting it to a separate function.

  • Modularity. Now that you have private functions, you may find one or more of them that could be extracted into a separate class, whether it is used only here or is reusable. To the previous point, this separate class is likely to be easier to test as well, since it will need a public interface.

The idea of splitting big code into smaller pieces that are easier to understand and test is a key point of Uncle Bob's book Clean Code. At the time of writing this answer the book is nine years old, but is just as relevant today as it was back then.

  • 7
    Don't forget maintaining consistent levels of abstraction and good names that ensure peeking inside the methods won't surprise you. Don't be surprised if an entire object is hiding in there. – candied_orange Jun 05 '17 at 15:43
  • 15
    Sometimes splitting methods can be good, but keeping things in-line can also have advantages as well. If a boundary check could sensibly done either inside or outside a block of code, for example, keeping that block of code in-line will make it easy to see whether the boundary check is being done once, more than once, or not at all. Pulling the block of code into its own subroutine would make such things more difficult to ascertain. – supercat Jun 05 '17 at 16:29
  • 7
    The "readability" bullet could be labelled "Abstraction". It's about abstraction. The "bite-sized chunks" *do one thing*, one low-level thing that you don't really care about the nitty-gritty details for when you're reading or stepping through the public method. By abstracting it to a function you get to step over it and keep focus on the big scheme of things / what the public method is doing at a higher level. – Mathieu Guindon Jun 05 '17 at 16:32
  • 7
    "Testing" bullet is questionable IMO. If your private members want to be tested, then they probably belong in their own class, as public members. Granted, first step to extracting a class/interface is to extract a method. – Mathieu Guindon Jun 05 '17 at 16:35
  • So, you mentioned "Clean Code" and "private methods" in the same text. If you read the book, you would know that it says not to add private methods. Instead, take that functionality into new class or functions. With rest I agree. – BЈовић Jun 06 '17 at 07:47
  • 6
    Splitting up a function purely by line numbers, code blocks and variable scope with no concern for the actual functionality is a terrible idea, and I'd argue the better alternative to that is just leaving it in one function. You should be splitting functions according to functionality. See [DaveGauer's answer](https://softwareengineering.stackexchange.com/a/350200/84237). You're somewhat hinting at this in your answer (with mentions of names and problems), but I can't help but feel like this aspect needs a whole lot more focus, given its importance and relevance in relation to the question. – Bernhard Barker Jun 06 '17 at 12:52
  • 3
    **Bug reporting.** When a user gives you a symbol reference from a stack trace (remember that line numbers are almost *never* included in a **real** stack-trace) it's easier to find the issue in an 8-line method than a 2000-line method. – Der Kommissar Jun 06 '17 at 16:25
  • 1
    Readability/Reads like a book: Not only is the function smaller (it's a chapter instead of a book), **it comes with a chapter title!** Now, before you even start looking at the code, you have some idea what the function does or at least is supposed to do, based on its (obviously well-chosen) name. Granted, like Dukeling points out, you shouldn't be splitting functions *just because* they are long, but in my experience long functions *tend to be good candidates* for splitting into smaller, bite-sized chunks. – user Jun 06 '17 at 20:55
  • I take issue with "the main function should read like a book" - the main function should have just as singular a responsibility as the others - if it reads like a book, you aren't done decomposing. – Ant P Jun 07 '17 at 07:17
  • 1
    "Descriptive names" should be written in font 50, bold. Many people do not realize that naming a function correctly is one of the most important steps to achieve readability – BgrWorker Jun 07 '17 at 09:11
  • @Michael - you are taking this too literally. "Read like a book" should mean that the body of the method reads like a (mostly) linear listing of things to do. – Roland Tepp Jun 08 '17 at 07:06
  • Another advantage of private functions is replacing one thing with other is easier. If you want to replace algorithm A with algorithm B, it is straightforward if the "huge" logic is split up into private functions. With a monolithic code, it is a lot more cumbersome and error-prone. I suppose this point is a kind of corollary to the point on unit testing. – Masked Man Jun 08 '17 at 09:35
36

It's probably a great idea!

I do take issue with splitting up long linear sequences of action into separate functions purely to reduce the average function length in your codebase:

function step1(){
  // ...
  step2(zarb, foo, biz);
}

function step2(zarb, foo, biz){
  // ...
  step3(zarb, foo, biz, gleep);
}

function step3(zarb, foo, biz, gleep){
  // ...
}

Now you've actually added lines of source and reduced total readability considerably. Especially if you're now passing lots of parameters between each function to keep track of state. Yikes!

However, if you've managed to extract one or more lines into a pure function that serves a single, clear purpose (even if called only once), then you've improved readability:

function foo(){
  f = getFrambulation();
  g = deglorbFramb(f);
  r = reglorbulate(g);
}

This likely won't be easy in real-world situations, but pieces of pure functionality can often be teased out if you think about it long enough.

You'll know you're on the right track when you have functions with nice verb names and when your parent function calls them and the whole thing practically reads like a paragraph of prose.

Then when you return weeks later to add more functionality and you find that you can actually re-use one of those functions, then oh, rapturous joy! What wondrous radiant delight!

DaveGauer
  • 1,287
  • 7
  • 12
  • 2
    I have heard this argument for many years, and it never plays out in the real world. I am speaking specifically of one or two line functions, called from only one place. While the original author always thinks it is "more readable", subsequent editors often call it ravioli code. This depends on the call depth, generally speaking. – Frank Hileman Jun 05 '17 at 17:00
  • 34
    @FrankHileman To be fair, I never saw any developer compliment the code of his or her antecessor. – T. Sar Jun 05 '17 at 17:13
  • 1
    Your function names remind me of the plumbus scene from Rick and Morty. – Captain Man Jun 05 '17 at 19:12
  • 4
    @TSar the previous developer is the source of all problems! – Frank Hileman Jun 06 '17 at 01:03
  • 18
    @TSar I've never even complimented the previous developer when I was the previous developer. – IllusiveBrian Jun 06 '17 at 13:48
  • 3
    The nice thing about decomposing is that you can then compose: `foo = compose(reglorbulate, deglorbFramb, getFrambulation);` ;) – Warbo Jun 06 '17 at 14:04
  • 1
    @Warbo one thing is for sure, after a programmer is done composing, he starts to decompose again. –  Jun 06 '17 at 18:02
  • @nocomprende We all [decompose](http://www.sciencemuseum.org.uk/hommedia.ashx?id=7285&size=Large) in the end - even the non-programmers! – Pharap Jun 07 '17 at 01:45
  • 1
    @Pharap [Monty Python - Decomposing Composers](https://www.youtube.com/watch?v=sjWPXybVjYE) – Steadybox Jun 07 '17 at 16:19
  • 1
    @FrankHileman I see it useful with complex statements which are difficult to understand: A complex shell call which requires a half dozen variables can be encapsulated in a `sign_mail(...)`, which makes it immediately clear what that is supposed to do, and you don't have to waste brain power on it unless you want to troubleshoot the mail signature. Compared to an inline 200 character long `subprocess.check_call(shlex.split("openssl smime -sign -binary..."))`. It also makes it easy to drop in a replacement signing method. – TemporalWolf Jun 07 '17 at 19:23
  • I fail to see why you wouldn't create a unique class to encapsulate State, and thus only pass one parameter. In addition, there is no doubt that refactoring improves readability and testability, and often allows or enhances code reuse as well. – Brinky Jan 31 '23 at 17:42
19

The answer is it highly depends on the situation. @Snowman covers the positives of breaking up large public function, but its important to remember there can be negative effects too, as you are rightly concerned about.

  • Lots of private functions with side-effects can make code difficult to read and quite fragile. This is especially true when these private functions depend upon the side effects of each other. Avoid having tightly-coupled functions.

  • Abstractions are leaky. While its nice to pretend how an operation is performed or how data is stored doesn't matter, there are cases where it does, and it's important to identify them.

  • Semantics and context matter. If you fail to clearly capture what a function does in its name, you again can reduce readability and increase fragility of the public function. This is especially true when you start creating private functions with large numbers of input and output parameters. And while from the public function, you see what private functions it calls, from the private function, you don't see what public functions call it. This can lead to a "bug-fix" in a private function that breaks the public function.

  • Heavily decomposed code is always clearer to the author than to others. This doesn't mean that it can't still be clear to others, but it's easy to say that it makes perfect sense that bar() must be called before foo() at the time of writing.

  • Dangerous re-use. Your public function likely restricted what input was possible to each private function. If these input assumptions are not properly captured (documenting ALL assumptions is hard), someone may improperly re-use one of your private functions, introducing bugs into the codebase.

Decompose functions based on their internal cohesion and coupling, not absolute length.

dlasalle
  • 842
  • 6
  • 12
  • 6
    Ignore readability, let's look at the bug reporting aspect: if the user reports that a bug happened in `MainMethod` (easy enough to get, usually symbols are included and you should have a symbol dereference file) which is 2k lines (line numbers are never included in bug reports) and it's an NRE which can happen at 1k of those lines, well I'll be damned if I'm looking into that. But if they tell me it happened in `SomeSubMethodA` and it's 8 lines and the exception was an NRE, then I'll probably find and fix that easy enough. – Der Kommissar Jun 06 '17 at 16:23
2

IMHO, the value of pulling out blocks of code purely as a means of breaking up complexity, would often be related to the difference in complexity between:

  1. A complete and accurate human-language description of what the code does, including how it handles corner cases, and

  2. The code itself.

If the code is a lot more complicated than the description, replacing the in-line code with a function call may make the surrounding code easier to understand. On the other hand, some concepts can be expressed more readably in a computer language than in human language. I would regard w=x+y+z;, for example, as more readable than w=addThreeNumbersAssumingSumOfFirstTwoDoesntOverflow(x,y,z);.

As a large function gets split up, there will be less and less difference there is between the complexity of sub-functions versus their descriptions, and the advantage of further subdivisions will decrease. If things are split up to the point that descriptions would be more complicated than code, further splits will make the code worse.

supercat
  • 8,335
  • 22
  • 28
  • 2
    Your function name is misleading. There is nothing on the w=x+y+z that indicates any sort of overflow control, while the method name by itself seem to imply otherwise. A better name for your function would be "addAll(x,y,z)", for example, or even just "Sum(x,y,z)". – T. Sar Jun 05 '17 at 17:11
  • @TSar: In C, a function to add three `int` values x, y, and z could be written as `(int)((unsigned)x+y+z)`, or as `(x+y)+z`, `x+(y+z)`, `(x+z+)y`, or on some compilers, `__UNSPECIFIED ? (x+y)+z : x+(y+z)` [the latter might allow optimizations not possible with `(x+y)+z`]. Those different ways of writing the function would offer different guarantees in corner cases and the name `Sum(x,y,z)` would give no clue as to which behavior would apply. I'm not sure how my name is misleading since it says nothing about what would happen if the indicated assumption is violated. – supercat Jun 05 '17 at 17:46
  • 6
    Since were are talking about _private methods_, this question isn't possibly referring to C. Anyway, that's not my point - you could just call the same function "sum" without the need of tackling the assumption on the method signature. By your logic, any recursive method should be called "DoThisAssumingStackDoesntOverflow", for example. Same goes for "FindSubstringAssumingStartingPointIsInsideBounds". – T. Sar Jun 05 '17 at 18:27
  • 1
    In other words, basic assumptions, whatever they may be (two numbers won't overflow, stack won't overflow, index was passed correctly) shouldn't be present on the method name. Those things should be documented, of course, if needed, but not on the method signature. – T. Sar Jun 05 '17 at 18:29
  • @TSar: The same principles would apply in e.g. C# when using a checked context, or in C++, though perhaps a somewhat better example might be a function to compute the average of three int values (rather than the sum). Some applications may need such a function to handle cases where the average exceeds 1/3 of the maximum integer value, and some might not. Someone who looks just at the function and not the caller might not realize that the caller would need it to handle large values. Someone who looks just at the call site might not realize that the function doesn't handle such values. – supercat Jun 05 '17 at 19:26
  • Again, adding basic assumptions to the signature of a method is highly unusual. I've never once saw an Math Api that had any of those "Assuming this" or "assuming that" as a clause on any calls. – T. Sar Jun 05 '17 at 20:29
  • 1
    @TSar: I was being a little facetious with the name, but the key point is that (switching to averaging because it's a better example) a programmer who sees "(x+y+z+1)/3" in context will be far better placed to know whether that is an appropriate way of computing the average given the calling code, than someone who is looking at either the definition or the call site of `int average3(int n1, int n2, int n3)`. Splitting out the "average" function would mean that someone who wanted to see whether it was suitable for requirements would have to look in two places. – supercat Jun 05 '17 at 21:22
  • 1
    If we take C# for example, you would only have a single "Average" method, which can be made to accept any number of parameters. [In fact, in c#, it works over any collection of numbers](https://msdn.microsoft.com/en-us/library/system.linq.enumerable.average(v=vs.110).aspx) - and I'm sure it way easier to understand than tackling crude math into the code. – T. Sar Jun 05 '17 at 22:48
  • @TSar: That "average" method returns "double" even when processing integers. If one is interested in computing the average of three integers, there are a variety of ways one can write a function to do so, some of which will only work for values within a limited range, and others of which will work for any three values of integer type but may be significantly slower. If code needs to average three non-negative integers that aren't too big, `(x+y+z+1)/3` is a clear and easy way to do it (or `(x+y+z+1)//3` in Python); if code does that a lot, the speed could be an advantage. – supercat Jun 06 '17 at 00:13
  • It isn't on several other languages. On C#, for example, that math line of yours could return _anything_ from a int to a float to a long to a double to a decimal, because C# picks the biggest number type used on the operation to determine what type the result will be. So, unless the developer knows beforehand that x and y and z are ints, it is impossible to be sure that the result will be also an int - and it is very easy to forgot that the language will drop the decimal cases in this case. Using the built-in average method, this doesn't happen. – T. Sar Jun 06 '17 at 09:56
  • 1
    Another point to consider - when writting (x+y+z+1)/3, it's _really easy_ to forget the Math rules for the language and assume that this will return something with decimal places, which it won't if every number here is an int. This causes _so many problems_ in _so many places_ in production code, in the form of big compounding rounding erros, and all of them could be avoided by using an Average method. – T. Sar Jun 06 '17 at 10:00
  • @TSar: I was assuming for purposes of the example that the programmer would know the types of x, y, and z, along with the arithmetic rules of the language. If what's needed *is* an integer, the aforementioned code will yield a correctly-rounded value in C, C++, C#, or Java, in cases where the sum is a non-negative value of that type. – supercat Jun 06 '17 at 19:48
  • Between knowing the types of several different variables that may have the same name but other types in different pieces of code, and knowing the type of a single method call, I'll stick to the method call as easier and more readable. I'm not advocating that you should use methods for everything, mind you, but there is a false sense of security in "simple math operations" when dealing with those languages. This type of math code is normally the one you want to unit test the most. – T. Sar Jun 06 '17 at 20:15
  • Which is, now that I stop to think about it, another point in favor of not using inline math code - if your math is segregated on a separated method, you can unit-test it more easily. – T. Sar Jun 06 '17 at 20:16
2

It is a balancing challenge.

Finer is better

The private method effectively provides a name for the contained code, and sometimes a meaningful signature (unless half of its parameters are completely ad-hoc tramp data with unclear, undocumented interdependences).

Giving names to code constructs is generally good, as long as the names suggest a meaningful contract for the caller, and the contract of the private method accurately corresponds to what the name suggests.

By forcing oneself to think of meaningful contracts for smaller portions of the code, the initial developer can spot some bugs and avoid them painlessly even before any dev testing. This only works if the developer is striving for concise naming (simple sounding but accurate) and is willing to adapt the boundaries of the private method so that concise naming is even possible.

Subsequent maintenance is helped, too, because additional naming helps make the code self-documenting.

  • Contracts over small pieces of code
  • Higher level methods sometimes turn into a short sequence of calls, each of which does something significant and distinctly named - accompanied with the thin layer of general, outermost error handling. Such a method can become a valued training resource for anyone who has to quickly become an expert on the overall structure of the module.

Until it gets too fine

Is it possible to overdo giving names to small chunks of code, and end up with too many, too small private methods? Sure. One or more of the following symptoms indicate that the methods are getting too small to be useful:

  • Too many overloads that don't really represent alternative signatures for the same essential logic, but rather a single fixed call stack.
  • Synonyms used just to refer to the same concept repeatedly ("entertaining naming")
  • Lots of perfunctory naming decorations such as XxxInternal or DoXxx, especially if there's no unified scheme in introducing those.
  • Clumsy names almost longer than the implementation itself, like LogDiskSpaceConsumptionUnlessNoUpdateNeeded
Jirka Hanika
  • 710
  • 6
  • 12
1

Contrary to what the others have said, I would argue that a long public method is a design smell that is not rectified by decomposing into private methods.

But sometimes I have a large public method that could be broken up into smaller steps

If this is the case then I would argue each step should be its own first class citizen with a single responsibility each. In an object-oriented paradigm, I would suggest creating an interface and an implementation for each step in such a way that each has a single, easily-identifiable responsibility and can be named in a way that the responsiblity is clear. This allows you to unit-test the (formerly) large public method, as well as each individual step independently of each other. They should all be documented as well.

Why not decompose into private methods? Here are a few reasons:

  • Tight coupling and testability. By reducing your public method's size, you have improved its readability, but all the code is still tightly coupled together. You can unit test the individual private methods (using advanced features of a testing framework), but you cannot easily test the public method independently of the private methods. This goes against the principles of unit testing.
  • Class size and complexity. You've reduced the complexity of a method, but you've increased the complexity of the class. The public method is easier to read, but the class is now more difficult to read because it has more functions that define its behavior. My preference is for small single-responsibility classes, so a long method is a sign that the class is doing too much.
  • Cannot be reused easily. It's often the case that as a body of code matures reusability is useful. If your steps are in private methods, they cannot be reused anywhere else without first extracting them somehow. Additionally it may encourage copy-paste when a step is needed elsewhere.
  • Splitting in this way is likely to be arbitrary. I would argue that splitting a long public method doesn't take as much thought or design consideration as if you were to break up responsibilties into classes. Each class must be justified with a proper name, documentation, and tests, whereas a private method doesn't get as much consideration.
  • Hides the problem. So you've decided to split your public method into small private methods. Now there is no problem! You can keep adding more and more steps by adding more and more private methods! On the contrary, I think this is a major problem. It establishes a pattern for adding complexity to a class that will be followed by subsequent bug fixes and feature implementations. Soon your private methods will grow and they will have to be split.

but I'm worried that forcing anyone who reads the method to jump around to different private methods will damage readability

This is an argument I've had with one of my colleagues recently. He argues that having the entire behavior of a module in the same file/method improves readability. I agree that the code is easier to follow when all together, but the code is less easy to reason about as complexity grows. As a system grows, it become intractable to reason about the entire module as a whole. When you decompose complex logic into several classes each with a single responsiblity, then it becomes much easier to reason about each part.

Samuel
  • 9,137
  • 1
  • 25
  • 42
  • 1
    I would have to disagree. Too much or premature abstraction leads to unnecessarily complex code. A private method can be the first step in a path that could potentially need to be interfaced and abstracted, but your approach here suggests wandering around places that may never need visiting. – WillC Jun 08 '17 at 17:51
  • 1
    I understand where you're coming from, but I think code smells like the one OP is asking about are signs that it *is* ready for a better design. Premature abstraction would be designing these interfaces before you even run into that problem. – Samuel Jun 08 '17 at 19:48
  • I agree with this approach. In fact, when you follow TDD, it's the natural progression. The other person says it's too much premature abstraction, but I find it much easier to quickly mock a functionality I need in a separate class (behind an interface), than having to actually implement it in a private method in order for the test to pass. – Eternal21 Nov 17 '17 at 12:41