8

I do programs for several years.
And now I know from my colleagues my pros and cons:
Pros: I can solve very complex problem
cons: I make overcomplicated solutions for simple tasks.

Now I'm trying to fix my cons and looking for generic principles and guidelines for this question:

When should I create an additional piece of code (a function, a class, a module, whatever). And when I should not.

I've seen an answer from some Python guy: whenever you can, don't create, make it as simple as possible.

Well this is a great advice, but having one big function isn't a good idea too.

The problem for me is that I have a principle when to create a separate function for sure, but I don't have an opposite one (when not create). The principle is:

Create a function when you'll have to copy-paste some code and as a result you'll have duplicated code.

(Why: duplicated code is bad, because you'll forget to fix some of copies when you fix a bug in it, you'll have to create times as many tests as you copy, and the is times more letters to read for your team and yourself later, etc...) This can sound as an example of what kind of answers I'm looking for.

So, how do you decide whenever to create a separate function or just extend an existing one?

Thanks everyone who already answered this question, but please add at least some principles when you should not split function (when you should better keep it as a whole).

Because for now most (all?) the answers here are about only when you should split. And my problem is that I make too many functions, I split too often, as coders tell me, - and this is why my question differs from this one Should I extract specific functionality into a function and why? .

Martin Spamer
  • 542
  • 2
  • 13
Yuri Yaryshev
  • 237
  • 2
  • 7
  • I like the principle of delegating responsibility to different methods/functions. A method should do one thing and it should do it well. When it comes to classes I sometimes need to look at the bigger picture and decipher what I'm trying to achieve. A class should in my opinion provide the possibility to resolve or process data of a specific functional area; e.g. authentication, database CRUD commands, object factory etc. –  Sep 01 '17 at 18:40
  • 2
    Possible duplicate of [Should I extract specific functionality into a function and why?](https://softwareengineering.stackexchange.com/questions/166884/should-i-extract-specific-functionality-into-a-function-and-why) – gnat Sep 02 '17 at 08:42
  • see also: [Rule of thumb for cost vs. savings for code re-use](https://softwareengineering.stackexchange.com/questions/127118/rule-of-thumb-for-cost-vs-savings-for-code-re-use) – gnat Sep 02 '17 at 08:42

9 Answers9

14

Seems like a pretty broad question, but I'll happily share my "wisdom" FWIW

  1. Portions of code that could be extracted to pure functions (output determined solely by inputs) should be extracted to pure functions. For example, if halfway through your function you discover that you need a string to be formatted a certain way, you can extract the string formatting logic to a string library. Even if you only use the function once, this will render the code more readable and reduce the complexity of your main function.

  2. If your function's name is ridiculous, maybe it needs to be split up, e.g. OpenFileParseContentsAndCreateObjects seems like it might be better off separated into three functions.

  3. If the function cannot be read and understood by an average developer within 1 minute, it probably could benefit from simplification or being broken into smaller bits.

  4. If the function requires ~20 or more unit tests in order to exercise 100% of the logic paths, it should be broken up.

  5. If the function is taller than a single viewable page in the IDE, unless there is a compelling reason, it should be considered for restructuring.

John Wu
  • 26,032
  • 10
  • 63
  • 84
  • I think OP is looking for an example of your "compelling reason". And I am too tbh. – Chris Wohlert Sep 13 '17 at 07:27
  • @ChrisWohlert I think, for example, on optimization as a "compelling reason". – Wikiti Sep 13 '17 at 09:54
  • @Wikiti, Sure, but there are others. I want them listed and explained. Reasons **not** to factor to a new function is the most important part of the question, according to OP. – Chris Wohlert Sep 13 '17 at 09:56
  • 1. Depending if the format is well-defined and named, as well as complex enough to matter. Often, either doesn't apply. 2. Reasonable. 3. Depends. Some algorithms are quite complex or domain-specific, and actually *need* commenting, even if only a pointer to external resources. 4-5. Acceptable as a rule-of-thumb. – Deduplicator Oct 08 '18 at 15:22
11

One way to tell if you should break a function up is when you see a pattern like this:

error_t getInfoFromFile(const char* filename)
{
    // Open the file
    ... 5 lines to open a file...

    // Check the magic number
    ... 3 lines read some bytes and make sure they match the magic number...

    // Read the header
    ... 5 lines to read the header

    // Find the offset of the info
    ... 10 lines to find the offset of the information you need

    // Read the info
    ... 5 lines to read the information you need

    // Reformat the info
    ... 10 lines to change the format of the information into whatever you need ...

    // etc.
}

Each of those comments probably indicates a separate function that could be written for the task at hand. You might not need all of them. Maybe you can leave the file opening code there bare, for example. But you'll probably have other code that needs to read the header and find an offset in the file, too, so those could be separate functions. Reading the info is what the function is supposed to do, so I'd leave that. Reformatting it should probably be a separate function, but it might be reasonable to call it from this function, depending on its scope. Basically, if there are multiple steps to perform for the task at hand, if those steps are more than a line or two, I'd put them into a function.

user1118321
  • 4,969
  • 1
  • 17
  • 25
  • I like the guideline: by introducing functions / methods / classes you create the language that you want to express your top-level solution in. And that closely matches the case described here. – Ralf Kleberhoff Sep 02 '17 at 12:57
6

Creating a function whenever you have some duplicated code is a bad idea. Most of that duplication is likely to be incidental, and subject to change, depending on how deep you look.

A better guideline is to create an abstraction (function, class, named constant, namespace, ...) when you have identified that something is a clearly defined building-block with a good name.

That good building-blocks being the stuff to build more things with, in addition to abstracting details away, reduces code duplication is just a happy side-effect.

Deduplicator
  • 8,591
  • 5
  • 31
  • 50
  • 1
    "Most of that duplication is likely to be incidental, and subject to change, depending on how deep you look." I disagree with that at the function level. I find that when I'm cutting and pasting within a function, it's almost *never* incidental duplication. There may be small changes between instances, but they can be encoded in the function I pull out, too. – user1118321 Sep 02 '17 at 03:31
  • 3
    @user1118321, Sometimes it is ok to have duplication code only because different instances of duplication have different reasons to change. – Fabio Sep 02 '17 at 05:17
  • The probability of copy-pasting code is greater than the one for identifying duplicated code and eliminate it, because it's the lazy option. – Adrian Iftode Sep 02 '17 at 07:13
  • 1
    Duplication is *never* incidental, unless someone incidentally happens to write the same lines of code the exact same way more than once. I wouldn't keep duplicate code because one version of it might change in the future. If that happens then you write a new method. Duplicating code because we *might* want one version to do something *later* is the opposite of reuse. Preventing needless duplication is exactly why 'building-blocks' exist. – Scott Hannen Sep 21 '17 at 18:53
  • @ScottHannen: It happens all the time, especially if specifications change. It's incidental that the gradient is 17%, and look there, VAT is too. The first might change due to a landscaping or road-building project, the latter needs a law. You really want to unify them? And yes, that does happen with more complex things than constants too, though the more complex the less likely. – Deduplicator Sep 24 '17 at 15:48
  • @Deduplicator - I wouldn't view two constants with the same value as code duplication. Duplication is when you have a number of lines of meaningful code that are repeated multiple times. If you have five lines of *identical* code sprinkled throughout multiple classes it's either copying-and-pasting (intentional duplication) or an astounding, unlikely coincidence. – Scott Hannen Sep 25 '17 at 15:05
  • @ScottHannen: As I said, the more complex the less likely is duplication incidental. And the more often it exists, the more likely at least two of the instances have the same meaning instead of only incidentally and possibly transitorally the same expression. – Deduplicator Sep 25 '17 at 15:21
  • @Deduplicator When we talk about code duplication, the "more complex" scenario (several or more lines of identical code) is really *all* we're talking about. `const SomeNumber = 1; const SomeOtherNumber = 1;` doesn't even count as code duplication. – Scott Hannen Sep 25 '17 at 15:42
2

Modules, classes functions are programmer's tools which we are using for building application.

Programmer should use tools freely without any restrictions for their amount. If during developing you end up with 100 functions and classes this mean that you decide you need them all - and it is ok.
Programmer by itself will never be able to answer a question is my solution overkill for this particular task.
That is why we have codereviews - where other programmers can see your approach and tell their opinion. Then putting your opinion and opinions of reviewers you end up with approach which satisfy your team.

Programming is all about context, only having full knowledge about concrete problem we can say which approach should be chosen and how much abstractions will be enough for this task.
But our problem is that in the beginning of the task we have less information about context of the problem(about domain of the problem) then in the end of the task. That is why usually after task is done, programmer will see/feel that it should be done in different way. Of course you can not rewrite your solution, because you need to start next task. Which lead after 1-2 years in building awful workarounds to make things done without massive rewriting of existing code.

That is why so important to write application which can be modified with less effort/cost.

Coupling, straight dependencies are few reasons of many which prevents efficient changing. You end up with situation where changes in one place will follow big amount of changes in others.

That is why very important to separate logic in different "blocks" which can be replaced, combined together with minimum effort.

SOLID principles can be very good guideline for programmer. Less programming experience you have more strictly you should follow this guideline.

In your particular case "Single responsibility principle" ("S" in SOLID) is your guideline - "class/function should have only one reason to change"

Useful approach for finding candidates for extracting to own function/class is writing code in "test first" approach (TDD or Test-Driven Development).

When you start writing tests for some function and end up with multiple different test cases, where test case configuration become complicated - you can put part of the logic behind abstractions which can be mocked up in the tests.

Fabio
  • 3,086
  • 1
  • 17
  • 25
2

In addition to the other answers, a few more indicators which suggest it might be a good idea to split a function/method:

  • Static code analysis tools warn about Cyclomatic Complexity
  • It violates the CQS Principle.
  • It contains logic which does not fit its name semantics - for example, a method called ReadFile which also executes a query against a SQL database.
  • It contains a switch with large or non-trivial case blocks.
  • It accepts boolean flag arguments which significantly alter its behaviour.
  • One or more of its arguments are only relevant in conditional nested branches or seem unrelated to the others - for example Save(File[] files, string sqlConnectionString, TextBoxControl directoryTextBox, IpAddress webServer).
Ben Cottrell
  • 11,656
  • 4
  • 30
  • 41
1

Apply the single responsibility principle to achieve a strong separation of concerns. This results in flexible and maintainable code.

Robert C. Martin expresses single responsibility principle as "A class should have only one reason to change". Do not think this only applies to OO code. Class should be considered as a meta-class type. The principle is applicable to any approach for code decomposition, modules, functions, methods, procedures could replace the word 'class' and the principle holds true.

Any piece of code should have the responsibility over a single part of the functionality. There is a corollary that the responsibility should be entirely encapsulated by the 'class' to minimise coupling. Any software service should be narrowly aligned with the responsibility.

In practice if you cannot describe the functionality in single short phrase then you should decompose the code. Divide the code on the word 'and'.

Given the requirement 'A Person is composed of a Name and an Age', would result in three separate pieces of code a Person, a Name and an Age.

This approach maximises reliability and maintainability while localising and minimising complexity, through a separation of concerns.

Martin Spamer
  • 542
  • 2
  • 13
1

If I understand correctly, the central question is when to modify an existing function and when to create a new one, when creating a new one might result in duplicated code.

First, the reason to create a new function is because you need a function that does something significantly different from what an existing function does. If the existing function serves its purpose, I wouldn't modify it to serve another purpose to avoid duplicating code.

But creating a new function does not mean that you must duplicate code. If your new function requires a chunk of code that's identical to what's in the existing function, you can extract that into its own function and call it from both the existing and new functions.

IOW, whether to modify an existing function and whether to duplicate code are usually unrelated questions.

If the duplicate code is required more than once in a given class than you can refactor it into a method. If it's required across classes then that's a good case for a new class. If a piece of code is a) long enough that you don't want to duplicate it and b) useful enough that you need it more than once, then it's likely worth unit testing.

Scott Hannen
  • 989
  • 1
  • 6
  • 13
0

In addition to what you already wrote about avoiding code duplication:

You should do it whenever you find a good name for the function or class. A good name tells the reader what the function or class does without having to look at the implementation.

Frank Puffer
  • 6,411
  • 5
  • 21
  • 38
-1

I agree with the other posters on this. I use the rule of 2. If I am doing something in 2 places, say within a class for example, that is when I turn that something into a method. Right away before I write another line of code.

  • 2
    So, did you write whole logic in one function in case when your code doesn't have duplications? ;) – Fabio Sep 02 '17 at 06:59