50

We are designing coding standards, and are having disagreements as to if it is ever appropriate to break code out into separate functions within a class, when those functions will only ever be called once.

For instance:

f1()
{
   f2();  
   f4();
}


f2()
{
    f3()
    // Logic here
}


f3()
{
   // Logic here
}

f4()
{
   // Logic here
}

versus:

f1()
{
   // Logic here
   // Logic here
   // Logic here
}

Some argue that it is simpler to read when you break up a large function using separate single-use sub functions. However, when reading code for the first time, I find it tedious to follow the logic chains and optimize the system as a whole. Are there any rules typically applied to this sort of function layout?

Please note that unlike other questions, I am asking for the best set of conditions to differentiate allowable and non-allowable uses of single call functions, not just if they are allowed.

David
  • 1,839
  • 3
  • 14
  • 11
  • 24
    At this abstract level, no, not really. Single use functions are perfectly fine if separating out that piece of functionality improves the readability and/or testability of the code, which obviously you have to judge on a case-by-case basis. But long functions do have a habit of getting longer and more confusing over time, so when in doubt err on the side of small functions. – Ixrec Jan 23 '16 at 00:18
  • 1
    This can also depend on the language you're using. A functional language that doesn't use a classical object-oriented approach could benefit greatly from function composition. It also encourages single-responsibility and testability. However, in a classical language without a functional slant, it might not be worth the trouble (dealing with access modifiers, reduced testability due to access modifiers, non-first-class functions....etc). –  Jan 23 '16 at 02:45
  • Also - this is not an answer - but my comment above only applies to functions with return values. Void functions (subroutines) that purely create side effects fall squarely into the "not worth the trouble" category, in my opinion. –  Jan 23 '16 at 02:46
  • 6
    Is there an obvious name for the function that lets you know exactly what happens in it? Then it probably makes sense to make it separate. – RemcoGerlich Jan 23 '16 at 10:46
  • The answer is given here: https://www.cqse.eu/en/blog/the-real-benefits-of-short-methods/ – Nils Göde Jan 25 '16 at 06:22
  • This also gives you a natural location to put documentation. – Thorbjørn Ravn Andersen May 20 '22 at 17:58

8 Answers8

95

The rationale behind splitting functions is not how many times (or how often) they will be called, it's keeping them small and preventing them from doing several different things.

Bob Martin's book Clean Code gives good guidelines on when to split a function:

  • Functions should be small; how small? See the bullet bellow.
  • Functions should do only one thing.

So if the function is several screens long, split it. If the function does several things, split it.

If the function is made of sequential steps aimed to a final result, then no need to split it, even if it's relatively long. But if the function does one thing, then another, then another, and then another, with conditions, logically separated blocks, etc., it should be split. As a result of that logic, functions should be usually small.

If f1() does authentication, f2() parses input into smaller parts, if f3() makes calculations, and f4() logs or persist results, then they obviously should be separated, even when every one of them will be called just once.

That way you can refactor and test them separately, besides the added advantage of being easier to read.

On the other hand if all the function does is:

a=a+1;
a=a/2;
a=a^2
b=0.0001;
c=a*b/c;
return c;

then there is no need to split it, even when the sequence of steps is long.

Tulains Córdova
  • 39,201
  • 12
  • 97
  • 154
  • 1
    Thanks for the answer. I think we are internally at a disagreement because we have some long complex loading to do. As such, its very linear in nature. For me, breaking it out and following a tree is hard. I once found (due to revisions) a loading function that called 5-6 methods deep with no other callers on each step down. Once combined, the process simplified greatly. However, I see what you are saying too. It seems like a bit of a balancing act. – David Jan 23 '16 at 00:37
  • 1
    Again, the number of callers is not necessary the best indicator of when to split. In my answer I state that if things are linear, sequential and simple, there's no need to split for the sake of splitting. – Tulains Córdova Jan 23 '16 at 00:41
  • @user61852: I think the problem he's getting at is that you don't explain *why* you should do that. You give guidance, but no reasoning. – Nicol Bolas Jan 23 '16 at 02:51
  • 4
    @NicolBolas I wrote *"That way you can refactor and test them separatelly, besides the added advantage of being easier to read"*. – Tulains Córdova Jan 23 '16 at 02:54
  • While I agree with your post, number of times something is called does play into it too. – whatsisname Jan 23 '16 at 04:46
  • 1
    Also note: if you have difficulty splitting up a large function into smaller ones, that is an indicator that you should just leave it as one large function. (Or look for a way to refactor it so that it's more easily separable) – user253751 Jan 23 '16 at 06:11
  • @David Rules of thumb are just that... rules of thumb. There are times where it makes sense to break it out, and there are times where it makes sense to keep it all in one place. Breaking out a function creates a formal break in the stream of consciousness, as you noted. There are times where that is very beneficial. – Cort Ammon Jan 23 '16 at 06:47
  • Well, a switch-statement with a handful lines of prologue and/or epilogue may be multiple screens long, without splitting it being in any way useful for readability and maintainability. But, arguing from the exceptions is a bad idea... – Deduplicator Jan 23 '16 at 08:56
  • @Hulk My mistake, I just changed it from `print` to `return`. – Tulains Córdova Jan 23 '16 at 15:27
  • @David It's one of the major balancing acts, yes. But your example really sounds like "we broke up the methods because we have a guideline that methods shouldn't be longer than 20 lines of code". That's just bonkers - if you can't figure out a good, descriptive name for the function, it probably means that you're doing something wrong. And in turn, if you use good, descriptive names for all your methods, following what's happening rarely means having to understand the whole tree. Bonus points apply when using e.g. pure functions - you'd be surprised how trivial they are to reason about. – Luaan Jan 23 '16 at 17:14
  • @Luaan I reused the OP's original sample function names. – Tulains Córdova Jan 23 '16 at 17:24
  • I have to say, as a novice programmer my *very first* thought in creating a new function isn't *is this only doing its specific purpose?*; it's *how many times is this going to be called? Can I just **not** make a function if it's only called once?* It's a terrible habbit, but I understand why it's an issue for a lot of (especially new) programmers. – Chris Cirefice Jan 23 '16 at 18:56
  • 2
    I'd be hesitant to use "Clean Code" to prove this specific point, I had the impression that the particular example was very artificial and ended up making the code less readable. – Kos Jan 24 '16 at 15:26
  • 1
    Bob Martin's guideline should be taken with a grain of salt because it is not based on experimental evidence. I am convinced the actual reason behind that guideline is because Bob Martin is opposed to comments in source code. Splitting is a way to compensate for the lack of comments - comments are replaced by calls to named functions. – ZunTzu Jan 25 '16 at 13:58
  • @ZunTzu Don't you think most people can handle only so much detail at any given time? – Tulains Córdova Jan 25 '16 at 14:02
  • @user61852 Remember: I am assuming that comments are used. Do you have experimental evidence to back the claim that a commented long function is more difficult to read then several uncommented short functions? Bob Martin gives no such evidence. – ZunTzu Jan 25 '16 at 14:23
  • @ZunTzu "War and Peace", the novel, is longer than any function I know (although I have no evidence of that claim) and yet it can be read. Code is different. I've seen functions when the body of a `while loop` is 1500 lines long. I cannot post it here because it's not my property. No amount of comments can make that right. – Tulains Córdova Jan 25 '16 at 15:02
  • @user61852 and you would be right to split the function. I am just saying Martin's guideline should not be followed blindly. – ZunTzu Jan 25 '16 at 15:12
42

I think function naming is very important here.

A heavily dissected function can be very self-documenting. If each logical process within a function is split out into its own function, with minimal internal logic, then the behavior of each statement can be reasoned out by the names of the functions and the parameters they take.

Of course, there is a downside: the function names. Like comments, such highly specific functions can often get out of sync with what the function is actually doing. But, at the same time, by giving it a proper function name you make it harder to justify scope creep. It becomes more difficult to make a sub-function do something more than it clearly ought to.

So I would suggest this. If you believe a section of code could be split out, even though nobody else calls it, ask yourself this question: "What name would you give it?"

If answering that question takes you more than 5 seconds, or if the function name you pick is decidedly opaque, there's a good chance that it isn't a separate logical unit within the function. Or at the very least, that you're not sure enough about what that logical unit is really doing to properly split it out.

But there is an additional problem that heavily dissected functions can encounter: bug fixing.

Tracking down logical errors within a 200+line function is hard. But tracking them through 10+ individual functions, while trying to remember the relationships between them? That can be even harder.

However again, semantic self-documentation through names can play a key role. If each function has a logical name, then all you need to do to validate one of the leaf functions (besides unit testing) is to see if it actually does what it says it does. Leaf functions tend to be short and focused. So if every individual leaf function does what says it should, then the only possible problem is that someone passed them the wrong stuff.

So in that case, it can be a boon to bug fixing.

I think it really does come down to the question of whether you can assign a meaningful name to a logical unit. If so, then it can probably be a function.

Nicol Bolas
  • 11,813
  • 4
  • 37
  • 46
  • 15
    Being unable to think of a good name for a part of a function could also mean that you're trying to split too much at once. – Sebastian Redl Jan 23 '16 at 21:53
  • 1
    While not part of the OP's specific example, nesting depth may also be a factor (both positively and negatively). Viewing code with an abstracted hierarchy (i.e., function calling function calling function) can be easier than viewing code with a transparent hierarchy. (ISTM, an IDE should be able to "inline" functions on request, like some web interfaces provide "details" links which present the details inline when the link is followed.) Your five-second rule-of-thumb is nice, but I wonder if naming difficulty might sometimes suggest refactoring desirability vs. inherent complexity. IANAP. –  Jan 24 '16 at 17:31
13

Any time you feel the need to write a comment to describe what a block of text is doing, you have found an opportunity to extract a method.

Rather than

//find eligible contestants
var eligible = contestants.Where(c=>c.Age >= 18)
eligible = eligible.Where(c=>c.Country == US)

try

var eligible = FindEligible(contestants)
dss539
  • 899
  • 5
  • 8
5

DRY - Don't repeat yourself - is just one of several principles that have to be balanced.

Some other that come to mind here are naming. If the logic is convoluted not obvious to the casual reader, extraction into method/function whose name better encapsulated what and why it is doing it it can improve program readability.

Also aiming for less than 5-10 lines of code method/function may come into play, depending on how many lines the //logic becomes.

Also, a function with parameters can act as an api and can name the parameters appropriately to then make the code logic clearer.

Also you may find that over time collections of such functions do reveal dome useful grouping, e.g. admin and then they can be easily gathered under it.

Michael Durrant
  • 13,101
  • 5
  • 34
  • 60
  • 3
    Why is DRY relevant? Either way the code is not repeated. The question is just about how to break it up. – Martin Smith Jan 23 '16 at 17:37
  • 5
    @MartinSmith: DRY is relevant because it's implicit in the question. (The OP is asking: is there any point in splitting something out into a function when this won't help with DRY? And this answer is saying: yes, because DRY is just one of several reasons for splitting something out into a function.) – ruakh Jan 24 '16 at 01:34
5

The point about splitting functions is all about one thing: simplicity.

A reader of code cannot have more than about seven things in mind simultaneously. Your functions should reflect that.

  • If you build too long functions, they will be unreadable because you have much more than seven things inside your functions.

  • If you build a ton of one-line functions, readers similarly get confused in the tangle of functions. Now you would have to keep more than seven functions in your memory to understand the purpose of each one.

  • Some functions are simple even though they are long. The classic example is when a function contains one big switch statement with many cases. As long as the handling of each case is simple, your functions is not too long.

Both extremes (the Megamoth and the soup of tiny functions) are equally bad, and you have to strike a balance in between. In my experience, a nice function has something around ten lines. Some functions will be one-liners, some will exceed twenty lines. The important thing is that each function serves an easily understandable function while being equally understandable in its implementation.

4

It's all about separation of concerns. (ok, not all about it; this is a simplification).

This is fine:

function initializeUser(name, job, bye) {
    this.username = name;
    this.occupation = job;
    this.farewell = bye;
    this.gender = Gender.unspecified;
    this.species = Species.getSpeciesFromJob(this.occupation);
    ... etc in the same vein.
}

That function is concerned with one single concern: it sets the initial properties of a user from the provided arguments, defaults, extrapolation, etc.

This is not fine:

function initializeUser(name, job, bye) {
    // Connect to internet if not already connected.
    modem.getInstance().ensureConnected();
    // Connect to user database
    userDb = connectToDb(USER_DB);
    // Validate that user does not yet exist.
    if (0 != userDb.db_exec("SELECT COUNT(*) FROM `users` where `name` = %d", name)) {
        throw new sadFace("User exists");
    }
    // Configure properties. Don't try to translate names.
    this.username = removeBadWords(name);
    this.occupation = removeBadWords(translate(job));
    this.farewell = removeBadWords(translate(bye));
    this.gender = Gender.unspecified;
    this.species = Species.getSpeciesFromJob(this.occupation);
    userDb.db_exec("INSERT INTO `users` set `name` = %s", this.username);
    // Disconnect from the DB.
    userDb.disconnect();
}

Separation of concerns suggests this should be handled as multiple concerns; the DB handling, the user-existence validation, the setting of the properties. Each of these is very easily tested as a unit, but testing all of them in a single method makes for a very convoluted unit test set, which would be testing things as different as how it handled the DB going away, how it handles creation of empty and invalid job titles, and how it handles creating a user twice (answer: badly, there's a bug).

Part of the problem is that it goes all over the place in terms of how high-level it is: low level networking and DB stuff has no place here. That's part of the concern separation. The other part is that stuff that should be the concern of something else, is instead the concern of the init function. For example, whether to translate or apply bad language filters, might make more sense as a concern of the fields being set.


Later edit: this all means that in an ideal codebase there should be no methods or functions which are called only once. They should be called at the very least twice: once by the code they're intended for, and once by the unit tests.

Dewi Morgan
  • 253
  • 2
  • 7
  • 1
    I would like to see your “ensureConnected” method. – gnasher729 Aug 15 '19 at 20:20
  • 1
    @gnasher729 Well, given the quality of the rest of the code in my example, I think it'd likely be something like... `while (!(ping(DB_HOST)) { modem.getInstance().redial(); }` – Dewi Morgan Jul 14 '21 at 01:17
2

Pretty much depends on what your // Logic Here is.

If it's a one-liner, then probably you don't need a functional decomposition.

If, on the other hand, it's lines and lines of code, then it is much better to put it into a separate function and name it appropriately (f1,f2,f3 does not pass muster here).

This is all have to do with human brains on average are not very efficient with processing large amounts of data at a glance. In a sense, it does not matter what data: multi-line function, busy intersection, 1000-piece puzzle.

Be a friend of your code maintainer's brain, cut down on pieces in the entry point. Who knows, that code maintainer may even be you a few month later.

1

The "right" answer, according to the prevalent coding dogmas, is to split large functions into small, easy to read, testable and tested functions with self documenting names.

That said, defining "large" in terms of "lines of code" can seem arbitrary, dogmatic, and tedious, which can cause unnecessary disagreements, scruples, and tension. But, fear not! For, if we recognize that the primary purposes behind the lines-of-code limit are readability and testability, we can easily find the appropriate line-limit for any given function! (And start building the foundation for an internally relevant soft-limit.)

Agree as a team to allow the megalithic functions, and to extract lines into smaller, well-named functions at the first sign that the function is either difficult to read as a whole, or when subsets are uncertain in correctness.

... And if everyone is honest during the initial implementation, and if none of them are touting an IQ over 200, the limits of understandability and testability can often be identified before anyone else sees their code.

svidgen
  • 13,414
  • 2
  • 34
  • 60