110

Moderator note
This question has already had seventeen answers posted to it. Before you post a new answer, please read the existing answers and make sure your viewpoint isn't already adequately covered.

I've been following some of the practices recommended in Robert Martin's "Clean Code" book, especially the ones that apply to the type of software I work with and the ones that make sense to me (I don't follow it as dogma).

One side effect I've noticed, however, is that the "clean" code I write, is more code than if I didn't follow some practices. The specific practices that lead to this are:

  • Encapsulating conditionals

So instead of

if(contact.email != null && contact.emails.contains('@')

I could write a small method like this

private Boolean isEmailValid(String email){...}
  • Replacing an inline comment with another private method, so that the method name describes itself rather than having an inline comment on top of it
  • A class should only have one reason to change

And a few others. The point being, that what could be a method of 30 lines, ends up being a class, because of the tiny methods that replace comments and encapsulate conditionals, etc. When you realize you have so many methods, then it "makes sense" to put all the functionality into one class, when really it should've been a method.

I'm aware that any practice taken to the extreme can be harmful.

The concrete question I'm looking an answer for is:

Is this an acceptable byproduct of writing clean code? If so, what are some arguments I can use to justify the fact that more LOC have been written?

The organization is not concerned specifically about more LOC, but more LOC can result in very big classes (that again, could be replaced with a long method without a bunch of use-once helper functions for readability sake).

When you see a class that is big enough, it gives the impression that the class is busy enough, and that its responsibility has been concluded. You could, therefore, end up creating more classes to achieve other pieces of functionality. The result is then a lot of classes, all doing "one thing" with the aid of many small helper methods.

THIS is the specific concern...those classes could be a single class that still achieves "one thing", without the aid of many small methods. It could be a single class with maybe 3 or 4 methods and some comments.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
Pablo Gonzalez
  • 843
  • 2
  • 7
  • 8
  • 98
    If your organization uses *only* LOC as a metric for your code bases, then justifying clean code there is hopeless to begin with. – Kilian Foth Mar 18 '19 at 12:12
  • 2
    It's not the only metric, but it's somewhat important because we are a very small team supporting a relatively large and undocumented code base (that we inherited), so some developers/managers see value in writing less code to get things done so that we have less code to maintain. – Pablo Gonzalez Mar 18 '19 at 12:15
  • I agree with your sentiment, although I would also say you shouldn't worry too much about it for the most part. I typically only to split off conditionals into private methods if there's a chance they'll be re-used. – Graham Mar 18 '19 at 12:59
  • 1
    If you believe that the code is not really clean now, then perhaps you should just encapsulate a little less to make it cleaner. No need to overthink this. – Christian Hackl Mar 18 '19 at 13:06
  • 25
    If maintainability is your goal, LOC is *not* the best metric to judge - it's one of them, but there's far more to consider than simply keeping it short. – Zibbobz Mar 18 '19 at 13:59
  • 3
    Trying to make code as short and concise as possible is rarely a good goal, thus suggesting a major flaw in the DRY principal. Sometimes you do want to repeat yourself to make things easier to read and to change. Cramming everything into regex-like lines often makes the code harder to read and change in the future. – Mark Rogers Mar 18 '19 at 14:22
  • 3
    @KilianFoth Given that it's been demonstrated multiple times that every "more worthwhile" metric is strongly correlated with LOC count, an organization could certainly do worse than to adopt "keeping LOC per feature ratio low" as their only metric. – Mason Wheeler Mar 18 '19 at 14:45
  • 1
    This is not a _Clean Code_ (the book/style) issue in particular. Almost _all_ of the advice for improving code readability out there uses more space than short, [Mel-Style](http://catb.org/jargon/html/story-of-mel.html) hacker code - and pretty much nobody would call the latter more readable. – R. Schmitz Mar 18 '19 at 14:48
  • 30
    Is not an answer, but a point to be made: There is a whole subcommunity about writing code with as few lines/symbols as possible. https://codegolf.stackexchange.com/ One can argue that most of the answers there are not as readable as they could be. – Antitheos Mar 18 '19 at 14:56
  • 14
    Learn reasons behind every best practice not just rules themselves. Following rules without reasons is Cargo cult. Every single rule has a reason of its own. – Gherman Mar 18 '19 at 15:40
  • 11
    Just as an aside, and using your example, sometimes pushing things out to methods will make you think: "Maybe there is a library function that can do this". For example, to validate an email address, you can create a **System.Net.Mail.MailAddress** which will validate it for you. You can then (hopefully) trust the author of that library to get it right. This means your codebase will have more abstractions, and decrease in size. – Gregory Currie Mar 18 '19 at 16:18
  • 5
    @Antitheos isn't that a bit of a strawman? There's a big difference between Paul Graham Revenge of the Nerds compression and shortest in bytes wins. I don't think anybody ever makes the argument that you should golf production code. Plenty of people make the argument that all other things being equal smaller is better. – Jared Smith Mar 18 '19 at 17:46
  • 1
    @Mason It's amazing how well that works in practice, even if people always go back to waay more complex metrics - that are generally simply worse (and really hard to explain to people). – Voo Mar 18 '19 at 17:46
  • @KilianFoth can you please expand? Not sure I understood the point and/or example you provided. – Pablo Gonzalez Mar 18 '19 at 17:49
  • 5
    Compilers and hardware are good enough these days that for most applications and uses, you can side with the philosophy that you are writing code *for other people* to read, not for the computer itself to read. Thus - code golf ends up being a net loss because a human will spend more time trying to parse out what is happening vs the miniscule gains you earn in the hardware via that kind of optimization. Make it easy to read, like a book, and you'll save yourself and more importantly, other people working with your code, many headaches – NKCampbell Mar 18 '19 at 18:39
  • Related presentation: [Object-oriented programming is embarassing: 4 short examples](https://www.youtube.com/watch?v=IRTfhkiAqPw) – user253751 Mar 18 '19 at 20:36
  • I think you should go one step further, and create an `Email` class. Make a private constructor, and a static method that takes a string, validates it, and returns an `Optional`. As soon as you get a string into your system (from the db, network, UI, etc.), construct an `Email` instance out of it. From then on, all other code that has an `Email` object knows its valid, without even needing to check (because it's not even possible to create an invalid `Email` object. – Alexander Mar 18 '19 at 21:56
  • As a counter example to lines of code, show the executives a simple function written in BrainFuck, and a similar function written in your language + style of choice. Ask the executives which one is easier to read. Preface that by showing a slide of dense English, and the equivalent well formatted document with the line numbers down the side. Ask them the same question which is easier to read? You can do similar examples with named constants/magic numbers, gigantic function/small functions+classes, etc... – Kain0_0 Mar 19 '19 at 00:50
  • 7
    The amount of strawman arguments in this comment section is absolutely astonishing. As if inlining some tiny private function to balance things for readability had actually something to do with code golf or LOC metrics. FWIW, endless extraction of helper functions is almost as bad as the opposite, and as far as I am concerned, the only real flaw immediately apparent in `if(contact.email != null && contact.emails.contains('@')` without knowing more about the actual context is the silent swallowing of a null-pointer bug. – Christian Hackl Mar 19 '19 at 06:58
  • 2
    @ChristianHackl the email check is only a simplified example, that's not the code that I'm dealing with but was simply an example on how to encapsulate conditionals. – Pablo Gonzalez Mar 19 '19 at 09:03
  • 2
    Lines of code is a meaningless measure of the amount of coding work involved. I put three blank lines between every line of code in all my work, and since my boss pays me per line of code, I earn four times as much as I should. Kapish? – Reversed Engineer Mar 19 '19 at 09:27
  • 2
    In most programming languages, the LOC of any program can be reduced to around 1 by simply removing all the linebreaks. See for yourself how much more readable fewer lines of code are. – Erik Mar 19 '19 at 10:55
  • Any of the answers [here](https://codegolf.stackexchange.com/questions/tagged/code-golf) are fully functioning, short and concise code. You'd be hard pressed to call _any_ of them "clean". – Davidmh Mar 20 '19 at 14:34
  • Your example code seems wrong in two ways: you reference the data member both as `contact.email` and `contact.emails` (which is it?) as a data member; but then in your method signature, you reference `(String email)` as an argument, not a member. If `isEmailValid()` is a method not a utility function, it should take no argument. (or at least, overload it for 0-arg and 1-arg versions). Or did you mean it should be a classmethod? – smci Mar 20 '19 at 23:22
  • On thinking further, if you want to verify both the data member, and arbitrary array of string, you'd need both a method, and a utility classmethod(/Java static method), and you can't delegate from one to the other, IIUC. So, this is not so trivial after all. (Where do we stick the utility functions? Do we duplicate them? Do we have a mixin class? etc.) – smci Mar 21 '19 at 00:16
  • One strong argument for encapsulating this code as a function(/method) is to prevent ever getting an exception from checking `.contains('@')` on a null string. – smci Mar 21 '19 at 00:17
  • 1
    @smci: Suppressing the exception you get from checking `.contains('@')` on a null string is, typically, a very bad idea. – Christian Hackl Mar 21 '19 at 06:37
  • @ChristianHackl: no, not necessarily, it depends. If `contact.email` is a data member of an object which is not required to have been initialized, then no. If it was required to have been initialized, then yes. It depends on the semantics of your object. – smci Mar 21 '19 at 07:05
  • @smci: Of course it depends, but generally, an uninitialised string should not end up in a string-verification routine. It does not make sense to validate something that isn't there. Getting null where you expect an object is a bug. Using null to represent a legal "is not yet initialised" state in your business logic is apparently becoming bad practice even in Java, which has only recently introduced an option type. – Christian Hackl Mar 21 '19 at 07:12
  • @ChristianHackl: objects can have multiple data members, some can be optional, not all are initialized (and indeed some could have been initialized with an explicit null, anyway). At least in Python it's actually correct coding practice for uninitialized members to be set to `None`, hence the idiom: `if self.email and '@' in self.email...` If this question is about Java-specific coding idiom then it should state that clearly. – smci Mar 21 '19 at 07:18
  • 2
    Apparently you feel you have to (justify writing clean code). This smells like a lack of autonomy or some sort of micro management. Or are you anticipating criticism that may never come? – Martin Maat Mar 21 '19 at 16:13
  • 1
    @MartinMaat neither. It was a discussion that we had with other devs in the team and I thought of bringing it up to the wider community. – Pablo Gonzalez Mar 21 '19 at 16:53
  • 1
    CRDev, maybe a better title statement would have been *"What should the right code metrics be to promote writing clean code, given some clean-code practices increase LOC?"*. To avoid getting into the employee-management end. But then that title makes it more subjective. Aargh. – smci Mar 22 '19 at 00:09
  • Don't use `private Boolean isEmailValid(String email){...}`, as it allows a `null` return value that'll never make sense and only create problems. Use a primitive `boolean` for the return type. – Ralf Kleberhoff Mar 24 '19 at 13:12

13 Answers13

155

Yes, it's an acceptable byproduct, and the justification is that it is now structured such that you don't have to read most of the code most of the time. Instead of reading a 30-line function every single time you are making a change, you are reading a 5-line function to get the overall flow, and maybe a couple of the helper functions if your change touches that area. If your new "extra" class is called EmailValidator and you know your problem isn't with email validation, you can skip reading it altogether.

It's also easier to reuse smaller pieces, which tends to reduce your line count for your overall program. An EmailValidator can be used all over the place. Some lines of code that do email validation but are scrunched together with database access code can't be reused.

And then consider what needs to be done if the email validation rules ever need to be changed—which would you rather: one known location; or many locations, possibly missing a few?

Infiltrator
  • 103
  • 2
Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
  • 10
    far better answer then the tiring “unit testing solves all your problems” – Dirk Boer Mar 19 '19 at 13:36
  • 15
    This answer hits on a key point that Uncle Bob and friends always seem to miss - refactoring into small methods only helps if you don't have to go read all the small methods to figure out what your code is doing. Creating a separate class to validate email addresses is wise. Pulling the code `iterations < _maxIterations` into a method called `ShouldContinueToIterate` is *stupid*. – BJ Myers Mar 20 '19 at 04:52
  • 1
    @DirkBoer, it's always nice to have a safe harbour away from those pesky folk telling you that unit tests are useful, eh? :) – David Arno Mar 20 '19 at 08:05
  • 4
    @DavidArno: "to be useful" != "solves all your problems" – Christian Hackl Mar 20 '19 at 12:17
  • 1
    @ChristianHackl indeed, just as neither my answer, nor anyone else’s, makes the claim that “unit testing solves all your problems”. But some folk love a good strawman so I thought I’d supply one too. :) – David Arno Mar 20 '19 at 12:25
  • 2
    @DavidArno: When one complains about people implying that unit testing "solves all your problems", they obviously mean people who imply that unit testing solves, or at least contributes to the solution of, almost all problems in software engineering. I think nobody accuses anyone of suggesting unit testing as a way to end war, poverty and disease. Another way to put it is that the extreme overrating of unit testing in many answers, not only to this question but on SE in general, is being (rightfully) criticised. – Christian Hackl Mar 21 '19 at 06:31
  • 2
    Hi @DavidArno, my comment was clearly a hyperbole not a strawman ;) For me it's like this: I'm asking how to get my car fixed and religious people come by and tell me I should live a less sinful life. In theory something worth discussing, but it is not really helping me to get better in fixing cars. – Dirk Boer Mar 21 '19 at 09:25
  • 1
    @BJMyers I agree. There seems to be at least two fallacies behind such an extreme position (which this answer avoids taking): 1) if X has its uses, then all X all the time is optimal; 2) small functions (and objects) are easier to understand, so reducing the size of a function (or object) must make the code easier to understand. For one thing, it ignores the fact that complexity also exists in the interaction between elements of code. – sdenham Mar 22 '19 at 14:25
132

... we are a very small team supporting a relatively large and undocumented code base (that we inherited), so some developers/managers see value in writing less code to get things done so that we have less code to maintain

These folk have correctly identified something: they want the code to be easier to maintain. Where they've gone wrong though is assuming that the less code there is, the easier it is to maintain.

For code to be easy to maintain, then it needs to be easy to change. By far the easiest way to achieve easy-to-change code is to have a full set of automated tests for it that will fail if your change is a breaking one. Tests are code, so writing those tests is going to swell your code base. And that is a good thing.

Secondly, in order to work out what needs changing, you code needs to be both easy to read and easy to reason about. Very terse code, shrunk in size just to keep the line count down is very unlikely to be easy to read. There's obviously a compromise to be struck as longer code will take longer to read. But if it's quicker to understand, then it's worth it. If it doesn't offer that benefit, then that verbosity stops being a benefit. But if longer code improves readability then again this is a good thing.

David Arno
  • 38,972
  • 9
  • 88
  • 121
  • 28
    "By far the easiest way to achieve easy-to-change code is to have a full set of automated tests for it that will fail if your change is a breaking one." This is simply not true. Tests require additional work for _every_ behavioural change because the _tests also need changing_, this is by design and many would argue makes the change safer, but it also necessarily makes changes harder. – Jack Aidley Mar 18 '19 at 12:48
  • 63
    Sure, but the time lost in maintaining these tests is dwarfed by the time you *would have lost* diagnosing and fixing bugs that the tests prevent. – MetaFight Mar 18 '19 at 12:50
  • 29
    @JackAidley, having to change the tests along with the code might give the appearance of more work, but only if one ignores the hard-to-find bugs that changes to untested code will introduce and that often won't be found until after shipping. The latter merely offers the illusion of less work. – David Arno Mar 18 '19 at 13:00
  • 8
    I'd add to this answer that the "more code" version proposed also makes it easier to test because the logic is now isolated in a single location that you can write tests for, i.e. test the `isEmailValid` method itself rather than indirectly by testing unrelated code that happens to implement some tests for validity. And if you ever change the rules for validity... – Colin Young Mar 18 '19 at 13:06
  • 31
    @JackAidley, I completely disagree with you. Tests make code easier to change. I'll concede though that badly designed code that's too tightly coupled and thus coupled tightly to tests can be hard to change, but well structured, well tested code is simple to change in my experience. – David Arno Mar 18 '19 at 13:29
  • 5
    @JackAidley I thing it is implicit that we are talking about changes that do not introduce new bugs. – Stop harming Monica Mar 18 '19 at 13:32
  • 22
    @JackAidley You can refactor a lot without changing any API or interface. It means you can go crazy while modifying the code without having to change a single line in unit or functional tests. That is, if your tests don't test a specific implementation. – Eric Duminil Mar 18 '19 at 13:53
  • 1
    @JackAidley: Okay. But in that case, using NetBeans/Eclipse/IntelliJ refactor tools can make it very easy. – Eric Duminil Mar 18 '19 at 14:02
  • 4
    *Integration* testing is mostly what you want. – pjc50 Mar 18 '19 at 15:28
  • 3
    It's worth noting that Paul Graham has [famously argued](http://www.paulgraham.com/power.html) that using more lines in your programming leads to more bugs, and therefore you should write all your programs using as few lines as possible. The broader software engineering community disagrees with him, but he's well-known and oft-read (and that essay has been around a long time), and so there's a non-trivial number of people out there who have followed his lead on that front. – Xiong Chiamiov Mar 18 '19 at 16:14
  • 4
    Just a note. Tests make safer to change existing code. Whether the process of changing code is easy or not depends on many things – Laiv Mar 18 '19 at 16:47
  • 4
    @JackAidley The fallacy here is that a behavioral change does not, in practice, require all the tests to be rewritten - not even close. What you gain in reusing tests vastly exceeds the cost of updating those tests that need it, even if it is more costly to update a test script than it is to run it manually each time you need to, which is not certain (the alternative is to not do that test at all, but that is ultimately the most costly option of all.) And all this ignores the biggest benefit of reusing automated tests, which is they *they have already been tested* (yes, tests can have errors!) – sdenham Mar 18 '19 at 20:55
  • 5
    @JackAidley Then you are making a pedantic and pointless claim, because in practice, it is only true if you don't care if the code works. Furthermore, that is not all you said - you said David's claim "by far the easiest way to achieve easy-to-change code is to have a full set of automated tests for it that will fail if your change is a breaking one" was false on account of it - but code with good tests is easier to change than that same code without them, if you care that the end result works. You might as well say that the David's goal is most easily reached by not writing anything. – sdenham Mar 18 '19 at 21:48
  • 2
    @JackAidley: Your observations match my experience exactly. Tests are not a free lunch, and you adapt them to new business requirements or API changes on a regular basis. I do think, however, that "easy-to-change code" does not necessarily imply "fast-to-change code". You could say that code is easier to change when the additional work imposed by the test adaptions forces you to take your time. – Christian Hackl Mar 19 '19 at 07:22
  • 1
    @JackAidley It is not pretending than those costs do not exist but akcnowledging that the costs of having more bugs that are harder to find, debug and fix also exist. – Stop harming Monica Mar 19 '19 at 10:15
  • 2
    In support of David Arno's position on the value of test code, I would add that all readers should take the **situational** clue that OP's team **inherited** a **large code base** with plenty of **existing** functionality. Given these clues, one could infer that the team also inherited a huge **software requirements list**, and that list isn't going to change haphazardly. Tests that ensure basic functionality aren't broken (automated, behavioral, regression, tests) are not going to be rendered useless anytime soon. Unless if the whole software system were abandoned. – rwong Mar 19 '19 at 12:05
  • 2
    If "a full set of tests" includes complete module tests which whitebox-test a specific implementation they will *not* make code changes easier; on the contrary, it's easy to end up in a lock-in situation where the work needed to adapt the module tests is a disincentive to code changes. Of course you'd be very sure that a given change is correct, after adapting the module tests; but that's a different statement than "its easy". (Often correctness could be shown by integration tests; and they shouldn't change much with code changes, so the lock-in is weak.) – Peter - Reinstate Monica Mar 19 '19 at 12:51
  • 2
    "Easier" is the contentious word here. I find it *easier* to make correct changes when tests break. – Leonardo Herrera Mar 19 '19 at 12:57
  • 1
    @JackAidley The specific claim I was replying to was indeed pointless and pedantic, and I am glad to see that you have abandoned it, but now you have hung your argument on the non-sequitur that released code is not bug-free, as if Dave's point depended on it being so. You have also changed 'test' to 'unit test', but Dave's claim is not predicated on that, and, in fact, it is more true for system-functionality tests... – sdenham Mar 19 '19 at 13:25
  • 1
    @JackAidley ...The point that you have avoided answering is that having a suite of scripted tests for a system beats running tests manually over the long term, for any effective level of testing, and the benefits accumulate the longer it goes on. This is so even if some older tests have to be modified to accommodate changed functionality, as the cost is amortized over the number of times they are run. – sdenham Mar 19 '19 at 13:26
  • 1
    @sdenham: this has descended into a pointless extended discussion. Goodbye. – Jack Aidley Mar 19 '19 at 13:56
  • @JackAidley There are badly written tests that really are nothing but "change detectors." If all a test does it detect that any change occurred, even if that change is not likely to introduce a bug, then it may be bad. For example, testing for exact error messages is not good. Consider exactly matching `"Could not start server, no value in environment variable PORT."` Then someone comes along and rewrites that in the real code, and BOOM the test is broken. A better test would just look for the error condition + wanting the output to contain `PORT` and `environment` (for example). – ErikE Mar 20 '19 at 01:05
34

Bill Gates was famously attributed to saying, "Measuring programming progress by lines of code is like measuring aircraft building progress by weight."

I humbly agree with this sentiment. This is to not say that a program should strive for more or less lines of code, but that this isn't ultimately what counts to create a functioning and working program. It helps to remember that ultimately the reason behind adding extra lines of code is that it is theoretically more readable that way.

Disagreements can be had on whether a specific change is more or less readable, but I don't think you'd be wrong to make a change to your program because you think by doing so you're making it more readable. For instance making an isEmailValid could be thought to be superfluous and unnecessary, especially if it is being called exactly once by the class which defines it. However I would much rather see an isEmailValid in a condition than a string of ANDed conditions whereby I must determine what each individual condition checks and why it is being checked.

Where you get into trouble is when you create a isEmailValid method which has side effects or checks things other than the e-mail, because this is worse than simply writing it all out. It is worse because it is misleading and I may miss a bug because of it.

Though clearly you're not doing that in this case, so I would encourage you to continue as you're doing. You should always ask yourself if by making the change, it is easier to read, and if that is your case, then do it!

Neil
  • 22,670
  • 45
  • 76
  • 1
    Aircraft weight is an important metric, though. And during the design the expected weight is monitored closely. Not as a sign of progress, but as a constraint. Monitoring lines of code suggest more is better, while in aircraft design less weight is better. So I think mister Gates could have chosen a better illustration for his point. – jos Mar 18 '19 at 14:05
  • 21
    @jos on the particular team OP is working with, it appears fewer LOC is deemed 'better'. The point that Bill Gates was making is that LOC is _not related_ to progress in any meaningful way, just like in aircraft construction weight is not related to progress in a meaningful way. An aircraft under construction may have 95% of its final weight relatively quickly, but it would just be an empty shell with no control systems it, it is not 95% complete. Same in software, if a program has 100k lines of code, that doesn't mean each 1000 lines provides 1% of the functionality. – Mr.Mindor Mar 18 '19 at 14:51
  • 7
    Progress monitoring is a tough job, isn't it? Poor managers. – jos Mar 18 '19 at 15:45
  • @jos: in code also it is better to have fewer lines for the same functionality, if all else is equal. – RemcoGerlich Mar 20 '19 at 12:49
  • @jos Read carefully. Gates doesn't say anything about whether weight is an important measure for an aircraft itself. He says that weight is an awful measure for *progress* of building an airplane. After all, by that measure as soon as you have the whole hull thrown on the ground you're basically done since that presumably amounts for 9x% of the whole plane's weight. – Voo Mar 21 '19 at 09:04
24

so some developers/managers see value in writing less code to get things done so that we have less code to maintain

This is a matter of losing sight on the actual goal.

What matters is lowering hours spent on development. That is measured in time (or equivalent effort), not in lines of code.
This is like saying that car manufacturers should build their cars with less screws, because it takes a non-zero amount of time to put each screw in. While that is pedantically correct, a car's market value is not defined by how many screws it does or doesn't have. Above all else, a car needs to be performant, safe, and easy to maintain.

The rest of the answer are examples of how clean code can lead to time gains.


Logging

Take an application (A) which has no logging. Now create application B, which is the same application A but with logging. B will always have more lines of code, and thus you need to write more code.

But a lot of time will sink into investigating issues and bugs, and figuring out what went wrong.

For application A, developers will be stuck reading the code, and having to continually reproduce the problem and step through the code to find the source of the issue. This means that the developer has to test from the beginning of the execution to the end, in every used layer, and needs to observe every used piece of logic.
Maybe he is lucky to find it immediately, but maybe the answer is going to be in the last place he thinks of looking.

For application B, assuming perfect logging, a developer observes the logs, can immediately identify the faulty component, and now knows where to look.

This can be a matter of minutes, hours or days saved; depending on the size and complexity of the codebase.


Regressions

Take application A, which is not DRY-friendly at all.
Take application B, which is DRY, but ended up needing more lines because of the additional abstractions.

A change request is filed, which requires a change to the logic.

For application B, the developer changes the (unique, shared) logic according to the change request.

For application A, the developer has to change all instances of this logic where he remembers it being used.

  • If he manages to remember all instances, he'll still have to implement the same change several times.
  • If he does not manage to remember all instances, you're now dealing with an inconsistent codebase that contradicts itself. If the developer forgot a rarely used piece of code, this bug may not become apparent to the end users until well into the future. At that time, are the end users going to identify what the source of the issue is? Even if so, the developer may not remember what the change entailed, and will have to figure out how to change this forgotten piece of logic. Maybe the developer doesn't even work at the company by then, and then someone else now has to figure it all out from scratch.

This can lead to enormous time wastage. Not just in development, but in hunting and finding the bug. The application can start behaving erratically in a way that developers cannot easily comprehend. And that will lead to lengthy debugging sessions.


Developer interchangeability

Developer A created application A. The code is not clean nor readable, but it works like a charm and has been running in production. Unsurprisingly, there is no documentation either.

Developer A is absent for a month due to holidays. An emergency change request is filed. It can't wait another three weeks for Dev A to return.

Developer B has to execute this change. He now needs to read the entire codebase, understand how everything works, why it works, and what it tries to accomplish. This takes ages, but let's say he can do it in three weeks' time.

At the same time, application B (which dev B created) has an emergency. Dev B is occupied, but Dev C is available, even though he doesn't know the codebase. What do we do?

  • If we keep B working on A, and put C to work on B, then we have two developers who don't know what they're doing, and the work is bering performed suboptimally.
  • If we pull B away from A and have him do B, and we now put C on A, then all of developer B's work (or a significant portion of it) may end up being discarded. This is potentially days/weeks of effort wasted.

Dev A comes back from his holiday, and sees that B did not understand the code, and thus implemented it badly. It's not B's fault, because he used all available resources, the source code just wasn't adequately readable. Does A now have to spend time fixing the readability of the code?


All of these problems, and many more, end up wasting time. Yes, in the short term, clean code requires more effort now, but it will end up paying dividends in the future when inevitable bugs/changes need to be addressed.

Management needs to understand that a short task now will save you several long tasks in the future. Failing to plan is planning to fail.

If so, what are some arguments I can use to justify the fact that more LOC have been written?

My goto explanation is asking management what they would prefer: an application with a 100KLOC codebase that can be developed in three months, or a 50KLOC codebase that can be developed in six months.

They will obviously pick the shorter development time, because management doesn't care about KLOC. Managers who focus on KLOC are micromanaging while being uninformed about what they're trying to manage.

Flater
  • 44,596
  • 8
  • 88
  • 122
23

I think you should be very careful about applying "clean code" practices in case they lead to more overall complexity. Premature refactoring is the root of many bad things.

Extracting a conditional to a function leads to simpler code at the point where the conditional was extracted from, but leads to more overall complexity because you now have a function which is visible from more points in the program. You add a slight complexity burden to all other functions where this new function is now visible.

I'm not saying you shouldn't extract the conditional, just that you should consider carefully if you need to.

  • If you want to specifically test the e-mail validation logic. Then you need to extract that logic to a separate function - probably even class.
  • If the same logic is used from multiple places in the code then you obviously have to extract it to a single function. Don't repeat yourself!
  • If the logic is obviously a separate responsibility, e.g. the email validation happen in the middle of a sorting algorithm. The email validation will change independent of the sorting algorithm, so they should be in separate classes.

In all of the above the is a reason for the extraction beyond it just being "clean code". Furthermore, you probably wouldn't even be in doubt if it was the right thing to do.

I'd say, if in doubt, always choose the simplest and most straightforward code.

JacquesB
  • 57,310
  • 21
  • 127
  • 176
  • 7
    I have to agree, turning every conditional into a validation method can introduce more unwanted complexity when it comes to maintenance and code reviews. You now have to switch back and forth in the code just to make sure your conditional methods are right. And what happens when you have different conditions for the same value? Now you might have a naming nightmare with several small methods that only get called once and look mostly the same. – pboss3010 Mar 19 '19 at 11:44
  • 7
    Easily the best answer here. Especially the observation (in the third paragraph) that complexity is not simply a property of the entire code as a whole but something that exists, and differs, on multiple abstraction levels simultaneously. – Christian Hackl Mar 19 '19 at 12:17
  • 2
    I think one way to put this is that, in general, extracting a condition should be done *only* if there's a meaningful, non-obfuscated name for that condition. This is a necessary but not sufficient condition. – JimmyJames Mar 19 '19 at 15:22
  • Re *"...because you now have a function which is visible from more points in the program"*: in [Pascal](https://en.wikipedia.org/wiki/Pascal_(programming_language)) it is possible to have local functions - *["...Each procedure or function can have its own declarations of goto labels, constants, types, variables, and other procedures and functions, ..."](https://en.wikipedia.org/wiki/Pascal_(programming_language)#Procedures_and_functions)* – Peter Mortensen Mar 20 '19 at 01:51
  • 2
    @PeterMortensen: It is also possible in C# and JavaScript. And that is great! But the point remains, a function, even a local function, is visible in a larger scope than an inline code fragment. – JacquesB Mar 20 '19 at 06:42
  • [Refactoring rule of three](https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)): *"The first time you do something, you just do it. The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway. The third time you do something similar, you refactor."* – ahiijny Mar 20 '19 at 17:39
9

I'd point out there is nothing inherently wrong with this:

if(contact.email != null && contact.email.contains('@')

At least assuming it's used this one time.

I could have problems with this very easily:

private Boolean isEmailValid(String email){
   return email != null && email.contains('@');
}

A few things I'd watch for:

  1. Why is it private? It looks like a potentially useful stub. Is it useful enough to be a private method and no chance of it being used more widely?
  2. I wouldn't name the method IsValidEmail personally, possibly ContainsAtSign or LooksVaguelyLikeEmailAddress because it does almost no real validation, which is maybe good, maybe not what is exected.
  3. Is it being used more than once?

If it is being used once, is simple to parse, and takes less than one line I would second guess the decision. It probably isn't something I'd call out if it wasn't a particular problem from a team.

On the other hand I have seen methods do something like this:

if (contact.email != null && contact.email.contains('@')) { ... }
else if (contact.email != null && contact.email.contains('@') && contact.email.contains("@mydomain.com")) { //headquarters email }
else if (contact.email != null && contact.email.contains('@') && (contact.email.contains("@news.mydomain.com") || contact.email.contains("@design.mydomain.com") ) { //internal contract teams }

That example is obviously not DRY.

Or even just that last statement can give another example:

if (contact.email != null && contact.email.contains('@') && (contact.email.contains("@news.mydomain.com") || contact.email.contains("@design.mydomain.com") )

The goal should be to make the code more readable:

if (LooksSortaLikeAnEmail(contact.Email)) { ... }
else if (LooksLikeFromHeadquarters(contact.Email)) { ... }
else if (LooksLikeInternalEmail(contact.Email)) { ... }

Another scenario:

You might have a method like:

public void SaveContact(Contact contact){
   if (contact.email != null && contact.email.contains('@'))
   {
       contacts.Add(contact);
       contacts.Save();
   }
}

If this fits your business logic and isn't reused there is not a problem here.

But when someone asks "Why is '@' saved, because that's not right!" and you decide to add actual validation of some sort, then extract it!

You'll be glad you did when you also need to account for the presidents second email account Pr3$sid3nt@h0m3!@mydomain.com and decide to just go all out and try and support RFC 2822.

On readability:

// If there is an email property and it contains an @ sign then process
if (contact.email != null && contact.email.contains('@'))

If your code is this clear, you don't need comments here. In fact, you don't need comments to say what the code is doing most the time, but rather why it's doing it:

// The UI passes '@' by default, the DBA's made this column non-nullable but 
// marketing is currently more concerned with other fields and '@' default is OK
if (contact.email != null && contact.email.contains('@'))

Whether the comments above an if statement or inside a tiny method is to me, pedantic. I might even argue the opposite of useful with good comments inside another method because now you would have to navigate to another method to see how and why it does what it does.

In summary: Don't measure these things; Focus on the principles that the text was built from (DRY, SOLID, KISS).

// A valid class that does nothing
public class Nothing 
{

}
ErikE
  • 1,151
  • 1
  • 8
  • 18
AthomSfere
  • 341
  • 1
  • 8
  • 4
    `Whether the comments above an if statement or inside a tiny method is to me, pedantic.` This is a "straw that broke the camel's back" problem. You're right that this one thing isn't particularly hard to read outright. But if you have a big method (e.g. a large import) which has dozens of these small evaluations, having these encapsulated in readable method names (`IsUserActive`, `GetAverageIncome`, `MustBeDeleted`, ...) will become a noticeable improvement on reading the code. The problem with the example is that it only observes one straw, not the entire bundle that breaks the camel's back. – Flater Mar 19 '19 at 08:38
  • @Flater and I do hope this is the spirit the reader takes from that. – AthomSfere Mar 19 '19 at 12:02
  • 1
    This "encapsulation" is an anti-pattern, and the answer actually demonstrates this. We come back to read the code for purposes of debugging and for purposes of extending the code. In both cases understanding what the code actually does is critical. The code block starting `if (contact.email != null && contact.email.contains('@'))` is buggy. If the if is false, none of the else if lines can be true. This is not at all visible in the `LooksSortaLikeAnEmail` block. A function containing a single line of code is not much better than a comment explaining how the line works. – Quirk Mar 19 '19 at 13:06
  • 1
    At best, another layer of indirection obscures the actual mechanics and makes debugging harder. At worst, the function name has become a lie in the same way comments become lies - the contents are updated but the name is not. This is not a strike against encapsulation in general, but this particular idiom is symptomatic of the great modern issue with "enterprise" software engineering - layers and layers of abstraction and glue burying the relevant logic. – Quirk Mar 19 '19 at 13:10
  • @quirk I think you're agreeing with my overall point? And with the glue, your going down a whole different problem. I actually use code maps when looking at a new teams code. It's scary bad what I've seen done for some large methods calling a series of large methods even at the mvc pattern level. – AthomSfere Mar 19 '19 at 13:13
  • @AthomSfere: I'm agreeing with your larger point, disagreeing quite strongly with Flater, and having something of an issue with the example that purports to show a place where wrapping a single line in a function is valid. Wrapping a single line in a function designed to be used once only is symptomatic of a codebase which is deteriorating into a swamp of glue. Sometimes it makes sense to decompose a complex evaluation into named variables, but in that case locality is preserved. – Quirk Mar 19 '19 at 14:37
  • @Quirk I'd agree 100% then. If the only reason it's split out is simply to be split out ... Your at best planning for further requirements that may never come and wasting cognitive load. – AthomSfere Mar 19 '19 at 14:47
  • @Quirk Comments are not for answers or opinions. If you want to provide your own answer, write it as an answer; there's very little quality control over comments, and they get buried (or deleted) easily. I'm sure you've already written more words in the comments than would be necessary for a good answer :) – Luaan Mar 20 '19 at 08:51
6

Considering the fact that the "is email valid" condition you currently have would accept the very much invalid email address "@", I think you have every reason to abstract out an EmailValidator class. Even better, use a good, well-tested library to validate email addresses.

Lines of code as a metric is meaningless. The important questions in software engineering are not:

  • Do you have too much code?
  • Do you have too little code?

The important questions are:

  • Is the application as a whole designed correctly?
  • Is the code implemented correctly?
  • Is the code maintainable?
  • Is the code testable?
  • Is the code adequately tested?

I've never given a thought to LoC when writing code for any purpose but Code Golf. I have asked myself "Could I write this more succinctly?", but for purposes of readability, maintainability, and efficiency, not simply length.

Sure, maybe I could use a long chain of boolean operations instead of a utility method, but should I?

Your question actually makes me think back on some long chains of booleans I've written and realize I probably should have written one or more utility method(s) instead.

Clement Cherlin
  • 1,243
  • 9
  • 13
6

Clean Code is an excellent book, and well worth reading, but it is not the final authority on such matters.

Breaking code down into logical functions is usually a good idea, but few programmers do it to the extent that Martin does - at some point you get diminishing returns from turning everything into functions and it can get hard to follow when all the code is in tiny pieces.

One option when it's not worth creating a whole new function is to simply use an intermediate variable:

boolean isEmailValid = (contact.email != null && contact.emails.contains('@');

if (isEmailValid) {
...

This helps keep the code easy to follow without having to jump around the file a lot.

Another issue is that Clean Code is getting quite old as a book now. A lot of software engineering has moved in the direction of functional programming, whereas Martin goes out of his way to add state to things and create objects. I suspect he would have written quite a different book if he'd written it today.

Rich Smith
  • 241
  • 1
  • 4
  • Some are concerned about the extra line of code near the condition (I am not, at all), but perhaps address that in your answer. – Peter Mortensen Mar 19 '19 at 11:11
3

On one level, they are right - less code is better. Another answer quoted Gate, I prefer:

“If debugging is the process of removing software bugs, then programming must be the process of putting them in.” – Edsger Dijkstra

“When debugging, novices insert corrective code; experts remove defective code.” – Richard Pattis

The cheapest, fastest, and most reliable components are those that aren’t there. - Gordon Bell

In short, the less code you have, the less can go wrong. If something isn't necessary, then cut it.
If there is over-complicated code, then simplify it until the actual functional elements are all that remain.

What is important here, is that these all refer to functionality, and only having the minimum required to do it. It doesn't say anything about how that is expressed.

What what you are doing by attempting to have clean code isn't against the above. You are adding to your LOC but not adding unused functionality.

The end goal is to have readable code but no superfluous extras. The two principles should not act against each other.

A metaphor would be building a car. The functional part of the code is the chassis, engine, wheels... what makes the car run. How you break that up is more like the suspension, power steering and so on, it makes it easier to handle. You want your mechanics as simple as possible while still performing their job, to minimise the chance of things going wrong, but that doesn't prevent you from having nice seats.

Baldrickk
  • 714
  • 5
  • 12
2

There's lots of wisdom in the existing answers, but I'd like to add in one more factor: the language.

Some languages take more code than others to get the same effect.  In particular, while Java (which I suspect is the language in the question) is extremely well-known and generally very solid and clear and straightforward, some more modern languages are much more concise and expressive.

For example, in Java it could easily take 50 lines to write a new class with three properties, each with a getter and setter, and one or more constructors — while you can accomplish exactly the same in a single line of Kotlin* or Scala.  (Even greater saving if you also wanted suitable equals(), hashCode(), and toString() methods.)

The result is that in Java, the extra work means you're more likely to reuse a general object that doesn't really fit, to squeeze properties into existing objects, or to pass a bunch of ‘bare’ properties around individually; while in a concise, expressive language, you're more likely to write better code.

(This highlights the difference between the ‘surface’ complexity of the code, and the complexity of the ideas/models/processing it implements.  Lines of code isn't a bad measure of the first, but has much less to do with the second.)

So the ‘cost’ of doing things right depends on the language.  Perhaps one sign of a good language is one that doesn't make you choose between doing things well, and doing them simply!

(* This isn't really the place for a plug, but Kotlin is well worth a look IMHO.)

gidds
  • 789
  • 3
  • 8
1

Let's assume that you are working with class Contact currently. The fact that you are writing another method for validation of email address is evidence of the fact that the class Contact is not handling a single responsibility.

It is also handling some email responsibility, which ideally, should be its own class.


Further proof that your code is a fusion of Contact and Email class is that you will not be able to test email validation code easily. It will require a lot of maneuverings to reach email validation code in a big method with the right values. See the method viz below.

private void LargeMethod() {
    //A lot of code which modifies a lot of values. You do all sorts of tricks here.
    //Code.
    //Code..
    //Code...

    //Email validation code becoming very difficult to test as it will be difficult to ensure 
    //that you have the right data till you reach here in the method
    ValidateEmail();

    //Another whole lot of code that modifies all sorts of values.
    //Extra work to preserve the result of ValidateEmail() for your asserts later.
}

On the other hand, if you had a separate Email class with a method for email validation, then to unit test your validation code you would just make one simple call to Email.Validation() with your test data.


Bonus Content: MFeather's talk about the deep synergy between testability and good design.

displayName
  • 137
  • 3
1

Reduction in LOC has been found to be correlated with reduced defects, nothing else. Assuming then that anytime you reduce LOC, you have reduced the chance of defects is essentially falling into the trap of believing that correlation equals causation. The reduced LOC is a result of good development practices and not what makes code good.

In my experience, people who can solve a problem with less code (at a macro level) tend to be more skilled than those who write more code to do the same thing. What these skilled developers do to reduce the lines of code is use/create abstractions and reusable solutions to solve common problems. They don't spend time counting lines of code and agonizing over whether they can cut a line here or there. Often the code they do write is more verbose than is necessary, they just write less of it.

Let me give you an example. I've had to deal with logic around time periods and how they overlap, whether they are adjacent, and what gaps exist between them. When I first started working on these problems, I would have blocks of code doing the calculations everywhere. Eventually, I built classes to represent the time periods and operations that calculated overlaps, complements etc. This immediately removed big swaths of code and turned them into a few method calls. But those classes themselves were not written tersely at all.

Plainly stated: if you are trying to reduce LOC by trying to cut a line of code here or there with more terse, you are doing it wrong. It's like trying to lose weight by reducing the amount of vegetables you eat. Write code that is easy to understand, maintain, and debug and reduce LOC through reuse and abstraction.

JimmyJames
  • 24,682
  • 2
  • 50
  • 92
1

You have identified a valid trade-off

So there is indeed a trade-off here and it is inherent to abstraction as a whole. Whenever anyone tries to pull N lines of code into its own function in order to name it and isolate it, they simultaneously make reading the calling site easier (by referring to a name rather than to all of the gory details that underlie that name) and more complex (you now have meaning that is entangled across two different parts of the codebase). "Easy" is the opposite of "hard," but it is not a synonym for "simple" which is the opposite of "complex." The two are not opposites, and abstraction always increases complexity in order to insert some form or another of ease.

We can see the added complexity directly when some change in the business requirements makes the abstraction start to leak. Maybe some new logic would have gone most naturally in the middle of the pre-abstracted code, say for instance if the abstracted code traverses some tree and you'd really like to collect (and perhaps act on) some sort of information while you are traversing the tree. Meanwhile, if you have abstracted this code, then there may be other call sites, and adding the required logic into the middle of the method might break those other call sites. See, whenever we change a line of code we only need to look at the immediate context of that line of code; when we change a method we must Cmd-F our entire source code looking for anything which may break as a result of changing the contract of that method, or hope that tests catch the breakage for us.

The greedy algorithm can fail in these cases

The complexity has also made the code in a certain sense less readable rather than more. In a prior job I dealt with an HTTP API which was very carefully and precisely structured into several layers, every endpoint is specified by a controller which validates the shape of the incoming message and then hands it to some "business-logic-layer" manager, which then made some request of some "data layer" which was responsible for making several queries to some "data access object" layer, which was responsible for creating several SQL Delegates which would actually answer your question. The first thing I can say about it was, something like 90% of the code was copy-and-paste boilerplate, in other words it was no-ops. So in many cases reading any given passage of code was very "easy", because "oh this manager just forwards the request to that data access object." But the fact that you had to jump between all of these different layers and read through a bunch of boilerplate to figure out if anything was customized to this particular case, meant that you were doing a lot of context-switching and finding files and trying to track information that you should never have been tracking, "this is called X at this layer, it becomes called X' at this other layer, then it is called X'' in this other other layer."

I think when I left off, this simple CRUD API was at the stage where if you printed it at 30 lines per page, it would take up 10-20 five-hundred-page textbooks on a shelf: it was a whole encyclopedia of repetitive code. In terms of essential complexity, I am not sure that there was even half of a textbook of essential complexity in there; we only had maybe 5-6 database diagrams to handle it. Making any slight change to it was a mammoth undertaking, learning it was a mammoth undertaking, adding new functionality got to be so painful that we actually had boilerplate template files that we would use to add new functionality.

So I have seen first-hand how making each part very readable and obvious can make the whole very unreadable and non-obvious. This means that the greedy algorithm can fail. You know the greedy algorithm, yes? "I am going to do whatever step locally improves the situation the most, and then I will trust that I find myself in a globally improved situation." It is often a beautiful first attempt but it can also miss in complex contexts. For example in manufacturing you might try to increase the efficiency of every particular step in a complex manufacturing process -- do bigger batches, yell at people on the floor who appear to not be doing anything to get their hands busy with something else -- and this can often destroy the global efficiency of the system.

Best practice: use DRY and lengths to make the call

(Note: This section title is somewhat of a joke; I often tell my friends that when someone is saying "we should do X because best practices say so" they are 90% of the time not talking about something like SQL injection or password hashing or whatever -- unilateral best practices -- and so the statement can be translated in that 90% of the time to "we should do X because I say so." Like they may have some blog article from some business which did a better job with X rather than X' but there is generally no guarantee that your business resembles that business, and there is generally some other article from some other business which did a better job with X' rather than X. So please do not take the title too seriously.)

What I would recommend is based on a talk by Jack Diederich called Stop Writing Classes (youtube.com). He makes several great points in that talk: for instance, that you can know a class is really just a function when it only has two public methods, and one of them is the constructor/initializer. But in one case he is talking about how a hypothetical library that he has string-replaced for the talk as "Muffin" declared its own class "MuffinHash" which was a subclass of the builtin dict type that Python has. The implementation was utterly empty—someone had just thought, "we might need to add custom functionality to Python dictionaries later, let's introduce an abstraction right now just in case."

And his defiant response was simply, "we can always do that later, if we need to."

I think we sometimes pretend like we're going to be worse programmers in the future than we are right now, so we might want to insert some sort of little thing that might make us happy in the future. We anticipate future-us's needs. "If traffic is 100 times larger than we think it will be, that approach won't scale, so we have to put the up-front investment into this harder approach that will scale." Very suspicious.

If we take that advice seriously then we need to identify when “later” has come. Probably the most obvious thing would be to establish an upper limit on the length of things for style reasons. And I think the remaining best advice would be to use DRY—don't repeat yourself—with these heuristics about line lengths to patch up a hole in the SOLID principles. Based on the heuristic of 30 lines being a "page" of text and an analogy with prose,

  1. Refactor a check into a function/method when you want to copy-paste it. Like there are occasional valid reasons to copy-paste but you should always feel dirty about it. Real authors do not make you reread a big long sentence 50 times throughout the narrative unless they're really trying to highlight a theme.
  2. A function/method should ideally be a "paragraph". Most functions should be about half a page long, or 1-15 lines of code, and only maybe 10% of your functions should be allowed to range to a page and a half, 45 lines or more. Once you're at 120+ lines of code and comments that thing needs to be broken down into parts.
  3. A file should ideally be a "chapter". Most files should be 12 pages long or less, so 360 lines of code and comments. Only maybe 10% of your files should be allowed to range to 50 pages long, or 1500 lines of code and comments.
  4. Ideally, most of your code should be indented with the baseline of the function or one level deep. Based on some heuristics about the Linux source tree, if you're religious about it, only maybe 10% of your code should be indented 2 levels or more within the baseline, less than 5% indented 3 levels or more. This means in particular that things which need to "wrap" some other concern, like error handling in a big try/catch, ought to be pulled out of the actual logic.

Like I mentioned up there, I tested these statistics against the current Linux source tree to find those approximate percentages, but they also sort of stand to reason in the literary analogy.

CR Drost
  • 179
  • 3