How do huge open source libraries get maintained while having code far from "clean code" practices?

Question

I'm still too inexperienced to write high-quality code, so I read books that address the issue, such as Clean Code by Robert C. Martin, and keep checking the code of well-known libraries to improve my skills.

Although many open source libraries have been maintained for years, which means that it's very unlikely that they aren't on the right path, I found the code in many of them to be far from the principles addressed to write clean code – e.g. methods containing hundreds of lines of code.

So my question is: Are the principles of clean code too restrictive, so that we can do without them in many libraries like these? If not, how are huge libraries being maintained without considering many of these principles?

I'll appreciate any brief clarification. I apologize if the question seems to be silly from a newbie guy.

EDIT

Check this example in the Butterknife library, one of the best-known libraries in the Android community.

You are suffering from a biased sample. You say you check code of "well-known" libraries. Well, the libraries that collapsed under their own weight because they weren't following best practices aren't "well-known", they vanished into obscurity. — Jörg W Mittag, Jul 11 '18 at 10:30
Oof, that is bad. But it's not particularly long-lived. How are you judging its popularity? — Ewan, Jul 11 '18 at 11:09
I think it's one of the best-known libraries in Android. I see it almost everywhere. It also has around 21k stars on GitHub. — Islam Salah, Jul 11 '18 at 11:15
The primary measure for a piece of software's value isn't how "clean" the code is, it's how well it fulfills some particular task. While some people like to write software for the sake of just writing something, for most people, the code is just a means to an end. — whatsisname, Jul 11 '18 at 14:56
No one disagrees with you. The question is how poor code can be maintained for years. Why hasn't it been cleaned up over those many iterations of evolution? — Islam Salah, Jul 11 '18 at 14:58
The premise of the question (that long-maintained open-source projects must inherently adhere to one particular book author's notion of best practices) is completely false and I don't know where you got it from. Could you expand on the premise of your question, please? — Lightness Races in Orbit, Jul 12 '18 at 13:56
Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/80190/discussion-on-question-by-islam-salah-how-do-huge-open-source-libraries-get-main). — yannis, Jul 15 '18 at 13:29
The butterknife example might look bad to you, but it's downright wonderful compared to some of the real-life code that I've seen. I'm talking about functions spanning several *thousand* lines, variable names along the lines of `vpg` (each individual letter has a meaning, this is a made-up example), functions taking *several tens of arguments*, not to mention copy-paste programming, use of global variables, etc. **We've come a long way from what code looked like 40 years ago, and while most current code is far from perfect, it's actually much farther from rotten-beyond-redemption.** — cmaster - reinstate monica, Jul 15 '18 at 13:57
Check what the author of Butterknife [said about this](https://github.com/JakeWharton/butterknife/issues/1308) — Islam Salah, Aug 21 '18 at 08:33

JacquesB · Answer 1 · 2018-07-11T14:02:58.660

159

The principles stated in "Clean Code" are not always generally agreed upon. Most of it is common sense, but some of the author's opinions are rather controversial and not shared by everybody.

In particular, the preference for short methods is not agreed on by everybody. If the code in a longer method is not repeated elsewhere, extracting some of it to a separate method (so you get multiple shorter methods) increases overall complexity, since these methods are now visible for other methods which should not care about them. So it is a trade-off, not an objective improvement.
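To make this trade-off concrete, here is a minimal Java sketch. All class and method names are hypothetical and not taken from the book or from any library. The first variant extracts a helper into a private method, which every other method of the class can then see and call. The second keeps the helper as a local lambda, so it is scoped to the one method that needs it.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical example; the names are illustrative only.
class ReportFormatter {

    // Variant 1: the helper is a private method. It is hidden from other
    // classes, but every method inside ReportFormatter can see and call it.
    String formatWithPrivateHelper(List<String> names) {
        StringBuilder out = new StringBuilder();
        for (String name : names) {
            out.append(normalize(name)).append('\n');
        }
        return out.toString();
    }

    private String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // Variant 2: the helper is a local lambda. It exists only inside this
    // method, so no other method can come to depend on it.
    String formatWithLocalHelper(List<String> names) {
        Function<String, String> normalizeLocally =
                raw -> raw.trim().toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        for (String name : names) {
            out.append(normalizeLocally.apply(name)).append('\n');
        }
        return out.toString();
    }
}
```

Both variants produce the same output; they differ only in how far the helper is visible, which is the complexity cost under discussion.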

The advice in the book is also (like all advice) geared towards a particular type of software: Enterprise applications. Other kinds of software like games or operating systems have different constraints than enterprise software, so different patterns and design principles are in play.

The language is also a factor: Clean Code assumes Java or a similar language - if you use C or Lisp a lot of the advice does not apply.

In short, the book is a single person's opinions about a particular class of software. It will not apply everywhere.

As for open source projects, code quality ranges from abysmal to brilliant. After all, anyone can publish their code as open source. But if you look at a mature and successful open source project with multiple contributors, you can be fairly sure they have consciously settled on a style that works for them. If this style is in contradiction to some opinion or guideline, then (to put it bluntly) it is the guideline that is wrong or irrelevant, since working code trumps opinions.

edited Jul 11 '18 at 14:02

answered Jul 11 '18 at 11:35

JacquesB


Totally agree it isn't a Bible. Could you please share any other organized rules followed in other situations as sometimes it seems that things aren't organized at all. check [this](https://github.com/JakeWharton/butterknife/blob/master/butterknife-compiler/src/main/java/butterknife/compiler/ButterKnifeProcessor.java#L1020) for instance. – Islam Salah Jul 11 '18 at 13:34
4

@IslamSalah: Numerous books (probably thousands) have been written about this subject, so this is far too big and broad a subject to answer in a comment. As for the specific example, you could write and ask the developer in question if this is a deliberate design. – JacquesB Jul 11 '18 at 13:55
19

+1 for "geared towards a particular type of software". This can be extended to most books on this and similar topics. Take everything you read with a grain of salt, it may be biased by the time it's written, the target environment, the development methodology, and all kinds of other factors. – Reginald Blue Jul 11 '18 at 15:19
16

Following that book strictly produces what many call "garbage code." – Frank Hileman Jul 11 '18 at 16:06
3

"If the code in a longer method is not repeated elsewhere, extracting some of it to a separate method (so you get multiple shorter methods) increases overall complexity, since these methods are now visible for other methods which should not care about them." -- That's generally true, but depends on the language features. For example, C# 7 has local functions (functions inside functions) which are not accessible outside the current method. – Nelson Rothermel Jul 12 '18 at 02:56
16

@FrankHileman: following none of the recommendations of that book even more. – Doc Brown Jul 12 '18 at 07:55
@NelsonRothermel right, and I daresay it's pretty obnoxious for a modern high-level language to not allow local functions in some way. Python and most other dynamic languages support them without much discussion, so do all functional languages. As you say C# has added support for this, and in C++ it can be done with lambdas. – leftaroundabout Jul 12 '18 at 10:20
2

I think "Understanding Software" is a much less controversial guide to maintainability. The author doesn't give hard and fast rules, he talks about the *purpose* behind rules such as those in Clean Code, and different ways to accomplish those purposes in the real world. *Very* useful guide that I think more people should read. – Wildcard Jul 12 '18 at 22:05
4

@IslamSalah "Could you please share any other organized rules..." This isn't the right way to make your code clean. There is no set of rules that automatically lead to good software. The principles in [this recent answer](https://softwareengineering.stackexchange.com/a/373662/92517) are good for all kinds of code. [This famous comic](https://www.reddit.com/r/ProgrammerHumor/comments/1f9df7/the_only_valid_measurement_of_code_quality/) has the right idea. Focus on using structures to *solve problems* in ways that match the intended use and on making your code obvious to readers, not on rules. – jpmc26 Jul 13 '18 at 01:35
5

@jpmc26 - Your linked answer pertains to a field that I am intimately familiar with, scientific programming. I recently got a wish list item granted, which was to make the gravitational model used in several Johnson Space Center simulations relativistically correct. Counting comments and blank lines, the code I wrote that calculates the relativistic perturbation to Newtonian gravity is 145 lines long, and it's all in one function. Normally I would cringe at seeing that I myself wrote a function that is 45 lines long, let alone 145. But not in this case. ... – David Hammen Jul 13 '18 at 04:09
12

... The function in question implements a single equation, equation X in journal paper Y, so it definitely follows the single purpose rule. (That the equation covers a quarter of a page is in the details.) There is no meaningful place to split this function into parts, and no meaningful reason to do so. The comments, which Uncle Bob despises? They are absolutely necessary in this case, and this is typical in scientific programming. While it's good to see the relevant journal references in the TeX documentation of the model, it's also good to see them in the implementation. – David Hammen Jul 13 '18 at 04:13
1

@leftaroundabout And before lambdas (introduced in C++11), one could already define a local class. Which could have any number of functions. – Deduplicator Feb 09 '19 at 15:52

Doc Brown · Accepted Answer · 2018-07-11T17:19:35.907

Good answer here already, but let me say a word about your butterknife example: though I have no idea what the code does, at a first glance, it does not look really unmaintainable to me. Variables and method names seem to be chosen deliberately, the code is properly indented and formatted, it has some comments and the long methods at least show some block structure.

Yes, it in no way follows Uncle Bob's "clean code" rules, and some of the methods are surely too long (probably the whole class is). But looking at the code I still see enough structure that they could easily be "cleaned up" by extracting those blocks into methods of their own (with a low risk of introducing bugs when using refactoring tools).

The real problem with such code is that adding one block and another block and another block works to some degree, sometimes for years. But every day the code gets a little bit harder to evolve, and it takes a little bit longer to modify and test it. And when you really have to change something that cannot be solved by "adding another block" but requires restructuring, you will wish someone had started to clean up the code earlier.

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/80189/discussion-on-answer-by-doc-brown-how-do-huge-open-source-libraries-get-maintain). — yannis, Jul 15 '18 at 13:28

Jens Bannmann · Answer 3 · 2018-07-14T10:24:32.863

Summary

As JacquesB writes, not everybody agrees with Robert C. Martin's "Clean Code".

The open source projects that you found to be "violating" the principles you expected are likely to simply have other principles.

My perspective

I happen to oversee several code bases that adhere very much to Robert C. Martin's principles. However, I do not really claim that they are right, I can only say they work well for us - and that "us" is in fact a combination of at least

- the scope and architecture of our products,
- the target market / customer expectations,
- how long the products are maintained,
- the development methodology we use,
- the organizational structure of our company and
- our developers' habits, opinions, and past experience.

Basically, this boils down to: each team (be it a company, a department or an open source project) is unique. They will have different priorities and different viewpoints, and of course they will make different tradeoffs. These tradeoffs, and the code style they result in, are largely a matter of taste and cannot be proven "wrong" or "right". The teams can only say "we do this because it works for us" or "we should change this because it doesn't work for us".

That said, I believe that to be able to successfully maintain large codebases over years, each team should agree on a set of code conventions they think are suitable for the aspects given above. That may mean adopting practices by Robert C. Martin, by another author, or inventing their own; it may mean writing them down formally or documenting them "by example". But they should exist.

Example

Consider the practice of "splitting code from a long method into several private methods".

Robert C. Martin says that this style allows for limiting the contents of each method to one level of abstraction - as a simplified example, a public method would probably only consist of calls to private methods like verifyInput(...), loadDataFromHardDisk(...), transformDataToJson(...) and finally sendJsonToClient(...), and these methods would have the implementation details.

- Some people like this because readers can get a quick overview of the high-level steps and can choose which details they want to read about.
- Some people dislike it because when you want to know all the details, you have to jump around in the class to follow along the execution flow (this is what JacquesB likely refers to when he writes about adding complexity).
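The style described above can be sketched in Java as follows. The four private method names come from the example; the class name, the signatures, and the placeholder bodies are assumptions made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the four step names come from the text above,
// everything else is invented for illustration.
class ExportHandler {

    private final List<String> sent = new ArrayList<>();

    // The public method stays on one level of abstraction:
    // it reads as a list of high-level steps.
    public void handleExport(String userId) {
        verifyInput(userId);
        String data = loadDataFromHardDisk(userId);
        String json = transformDataToJson(data);
        sendJsonToClient(json);
    }

    private void verifyInput(String userId) {
        if (userId == null || userId.isEmpty()) {
            throw new IllegalArgumentException("userId must not be empty");
        }
    }

    private String loadDataFromHardDisk(String userId) {
        // Placeholder: a real implementation would read a file here.
        return "data-for-" + userId;
    }

    private String transformDataToJson(String data) {
        // Placeholder: no escaping, good enough for this sketch only.
        return "{\"payload\":\"" + data + "\"}";
    }

    private void sendJsonToClient(String json) {
        // Placeholder: records the message instead of doing network I/O.
        sent.add(json);
    }

    // Exposes what was "sent" so the sketch can be inspected.
    List<String> sentMessages() {
        return sent;
    }
}
```

A reader who only wants the overview stops at `handleExport`; a reader who wants every detail has to visit four more methods, which is exactly the disagreement between the two groups.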

The lesson is: all of them are right, because they are entitled to have an opinion.

score 13 · Answer 4 · answered Jul 12 '18 at 16:21

Many open source libraries do in fact suffer from objectively poor coding practices and are maintained with difficulty by a small group of long-term contributors who can deal with the poor readability because they are very familiar with the parts of the code that they most frequently maintain. Refactoring code to improve readability after the fact is often a Herculean effort because everyone needs to be on the same page, it's not fun and it doesn't pay because no new features are implemented.

As others have said, any book about clean code stating anything at all necessarily contains advice that is not universally agreed upon. In particular, almost any rule can be followed with excessive zeal, replacing a readability problem with another one.

Personally, I avoid creating named functions if I don't have a good name for them. And a good name has to be short and describe faithfully what the function does to the exterior world. This is also tied with trying to have as few function arguments as possible and no globally writable data. Trying to cut down a very complex function into smaller functions often results in very long argument lists when the function was genuinely complex. Creating and maintaining readable code is an exercise in equilibrium between mutually conflicting common sense rules. Reading books is good, but only experience will teach you how to find false complexity, which is where the real readability gains are made.
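The point about argument lists can be illustrated with a small Java sketch. The class, the parameter names, and the pricing rules are all invented for this example. In the inline version every intermediate value is a local variable; in the split version the second helper needs most of the original context handed to it again.

```java
// Hypothetical example: the names and the pricing rules are invented.
class Pricing {

    // One longer method: every intermediate value is a local variable.
    static double totalInline(double unitPrice, int quantity, double discountRate,
                              double taxRate, double shippingFee, boolean express) {
        double subtotal = unitPrice * quantity;
        double discounted = subtotal * (1.0 - discountRate);
        double shipping = express ? shippingFee * 2 : shippingFee;
        return discounted * (1.0 + taxRate) + shipping;
    }

    // Split variant: the top-level method is shorter...
    static double totalSplit(double unitPrice, int quantity, double discountRate,
                             double taxRate, double shippingFee, boolean express) {
        double discounted = discountedSubtotal(unitPrice, quantity, discountRate);
        return addTaxAndShipping(discounted, taxRate, shippingFee, express);
    }

    private static double discountedSubtotal(double unitPrice, int quantity,
                                             double discountRate) {
        return unitPrice * quantity * (1.0 - discountRate);
    }

    // ...but this helper needs four arguments just to finish the calculation.
    private static double addTaxAndShipping(double discounted, double taxRate,
                                            double shippingFee, boolean express) {
        double shipping = express ? shippingFee * 2 : shippingFee;
        return discounted * (1.0 + taxRate) + shipping;
    }
}
```

With only six inputs the cost is modest; in a genuinely complex function with dozens of interdependent locals, each extracted piece tends to drag a similar share of them along.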

I would add: simply because something is “open source” doesn't mean that just anybody is a contributor. Oftentimes open-source projects are maintained by cliques, for better or for worse, who isolate their project from other contributors, and, unless it gets forked, nobody else contributes. If it doesn't get forked, whether because nobody needs to modify it or because nobody can understand how to, then the conventional style of the code will probably stay unchanged. — can-ned_food, Jul 14 '18 at 11:50

score 8 · Answer 5 · answered Jul 12 '18 at 10:10

Most open source projects are badly managed. There are obviously exceptions to that, but you will find a lot of junk in the open-source world.

This is not a critique of all the project owners/managers whose projects I am talking about, it is simply a matter of time used. These people have better things to do with their time, like their actual paying job.

In the beginning the code is the work of one person and is probably small. And small code doesn't need to be clean. Or rather, the effort needed to make the code clean is larger than the benefit.

As time goes by, the code is more a pile of patches by a lot of different people. The patch writers feel no ownership of the code, they just want this one feature added or this one bug fixed in the easiest way possible.

The owner does not have the time to clean things up and nobody else cares.

And the code is getting big. And ugly.

As it gets harder and harder to find your way around the code, people start adding features in the wrong place. And instead of fixing bugs, they add workarounds in other places in the code.

At this point it isn't just that people don't care, they no longer dare clean up since they are afraid of breaking things.

I have had people describing code bases as "cruel and unusual punishment".

My personal experiences aren't quite that bad, but I have seen a few very odd things.

If you eliminate the words "open" and "source" in this answer it will continue to be just as true. — Stephen M. Webb, Jul 12 '18 at 14:10
I would say that this is equally true for closed-source software. — Mark Rotteveel, Jul 14 '18 at 06:27

score 4 · Answer 6 · answered Jul 13 '18 at 06:25

It seems to me, you are asking how does this stuff even work if nobody is doing what they are supposed to be doing. And if it does work, then why are we supposed to be doing these things?

The answer, IMHO, is that it works "good enough", also known as the "worse is better" philosophy. Basically, despite the rocky history between open source and Bill Gates, they both de facto adopted the same idea: that most people care about features, not bugs.

This of course also leads us to "normalization of deviance" which leads to situations like Heartbleed, where, precisely as if to answer your question, a massive, overgrown spaghetti pile of open source code called OpenSSL went "uncleaned" for something like ten years, winding up with a massive security flaw affecting thousands of millions of people.

The solution was to invent a whole new system called LibreSSL, which was going to use clean-ish code, and of course almost nobody uses it.

So how are huge badly coded open source projects maintained? The answer is in the question. A lot of them aren't maintained in a clean state. They are patched randomly by thousands of different people to cover use cases on various strange machines and situations the developers will never have access to test on. The code works "good enough" until it doesn't, when everyone panics and decides to throw money at the problem.

So why should you bother doing something 'the right way' if nobody else is?

The answer is you shouldn't. You either do or you don't, and the world keeps turning regardless, because human nature doesn't change on the scale of a human lifetime. Personally, I only try to write clean code because I like the way it feels to do it.

Sooo many links... at first glance I thought this answer might have been laced with hover advertising or that it was a Wikipedia page. — Jonny Henly, Jul 14 '18 at 16:56

score 2 · Answer 7 · answered Jul 11 '18 at 18:51

What constitutes good code depends on the context, and classic books guiding you on that are, if not too old to discuss open-source, at least part of a tradition waging the neverending war against bad in-house codebases. So it's easy to overlook the fact that libraries have completely different aims, and they're written accordingly. Consider the following issues, in no particular order:

- When I import a library, or from a library, I'm probably not enough of an expert in its internal structure to know exactly which tiny fraction of its toolkit I need for whatever I'm working on, unless I'm copying what a Stack Exchange answer told me to do. So I start typing `from A import` (if it's in Python, say) and see what comes up. But that means what I see listed needs to reflect the logical tasks I'll need to borrow, and that's what has to be in the codebase. Countless helper methods that make it shorter will just confuse me.
- Libraries are there for the most inexpert programmer trying to use some algorithm most people have only vaguely heard of. They need external documentation, and that needs to precisely mirror the code, which it can't do if we keep refactoring everything to make short-method and do-one-thing adherents happy.
- Every library method people borrow could break code the world over with disastrous consequences if it's taken down or even renamed. Sure, I wish sklearn would correct the typo in Calinski-Harabasz, but that could cause another left-pad incident. In fact, in my experience the biggest problem with library evolution is when they try too hard to adopt some good-code new "improvement" to how they structure everything.
- In-house, comments are largely a necessary evil at best, for all manner of reasons I needn't regurgitate (although those points do exaggerate somewhat). A good comment says why the code works, not how. But libraries know their readers are competent programmers who couldn't, say, write-linear-algebra their way out of a paper bag. In other words, everything needs commenting re: why it works! (OK, that's another exaggeration.) So that's why you see signature line, 100-line comment block, 1 line of code that could literally have gone on the signature line (language permitting, of course).
- Let's say you update something on GitHub and wait to see whether your code will be accepted. It must be clear why your code change works. I know from experience that refactoring to make the campsite cleaner as part of a functional commit often means a lot of line-saving, rearrangement and renaming, which makes your salaryless reviewer's job harder, and causes other aforementioned problems.
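The remark in the list above that a good comment says why the code works, not how, can be illustrated with a short Java sketch. The class and method are hypothetical and not taken from any real library; the comment explains the reason for an otherwise surprising implementation choice rather than restating the loop.

```java
// Hypothetical example of a "why" comment; not taken from any library.
class Stats {

    /** Returns the arithmetic mean of the values, or NaN for an empty array. */
    static double mean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        // Why not simply sum / n: for large inputs of the same sign the
        // running sum can overflow to infinity even though the mean itself
        // is representable. Updating the mean incrementally keeps the
        // intermediate value in the same range as the inputs.
        double mean = 0.0;
        for (int i = 0; i < values.length; i++) {
            mean += (values[i] - mean) / (i + 1);
        }
        return mean;
    }
}
```

A "how" comment such as "loop over the array and update the mean" would add nothing a competent reader cannot see; the "why" comment is what a library user without the numerical background actually needs.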

I'm sure people with more experience than me can mention other points.

About first bullet point. That is why you have public/private methods. You expose public api that internally calls private or internal methods. Second bullet point is also inaccurate. I see no reason why you can't have documentation on a short public method and then call many small ones. — FCin, Jul 12 '18 at 06:53
@FCin That's a viable approach, as long as the maintainers remember to always use the correct keyword in front of every single method as they come and go. Or they could just do something that's easier and less error-prone. — J.G., Jul 12 '18 at 06:57
In languages such as C#, Java (which Uncle Bob usually talks about), access modifiers are the most basic tool used for writing any code really. Using correct keyword is part of writing any code. — FCin, Jul 12 '18 at 07:09
@FCin They're less frequently made explicit in some other languages, but I've worked even on in-house C# codebases where people didn't necessarily use the modifiers they should have. — J.G., Jul 12 '18 at 07:44
_"if not too old to discuss open-source"_ do you think open-source is some new thing? Open-source most certainly predates any books I know about clean code, refactoring, etc. — Mark Rotteveel, Jul 14 '18 at 06:32
@MarkRotteveel You're absolutely right. I'm no expert on the history of coding advice, but I suspect people started worrying about it when it was hard to search for and download open-source code, and software was small enough people frequently found it quicker to reinvent wheels. — J.G., Jul 14 '18 at 07:06

score 2 · Answer 8 · answered Jul 13 '18 at 12:35

There are already a lot of good answers - I want to give the perspective of an open source maintainer.

My perspective

I'm a maintainer of a lot of such projects with less than great code. Sometimes I am even prevented from improving such code because of compatibility concerns, since the libraries are downloaded millions of times every week.

It does make maintaining harder. As a Node.js core member, there are parts of the code I'm afraid to touch, but there is a lot of work to do regardless, and people use the platform successfully and enjoy it. The most important thing is that it works.

On readable code

When you say:

> I found the code in many of them to be far from the principles addressed to write clean code – e.g. methods containing hundreds of lines of code.

The number of lines of code is not a great measure of how readable it is. In the study I linked to, the Linux kernel was analyzed, and a survey of programmers found "regular" code (basically, code that people expect) and consistent code to be more understandable than "clean" code. This also aligns with my personal experience.

Some open source projects aren't too welcoming

Linus "famously" said that Linux shouldn't have a built-in debugger because people using debuggers aren't good enough to work on Linux and he doesn't want to attract more of them.

Personally I absolutely disagree with his stance there - but it is also something people do.

score 1 · Answer 9 · answered Jul 13 '18 at 13:07

Open source software does not necessarily mean that multiple authors are involved. When a piece of software (or a unit of it) is written by a single author, long functions appear frequently.

This comes from the nature of the development process. A simple method gets extended over time as new features are added and bugs are fixed.

Long methods make it much harder for new authors to understand the functionality. However, with a single author this is rarely a problem, and the issue tends to be overlooked. Another aspect of open source is that a lot of software is not actively developed, so there is no refactoring work that would, for example, split complex methods into multiple simple methods.

You haven't shown any examples but from my understanding this is also often connected to the development language. Some languages enforce strict linting rules from the beginning and heavy unit testing (or even TDD). Both linting and unit tests usually prevent that issue (it's hard to unit test complex/long methods).

In general, it's harder to make code clean if software is developed by a single author and other contributors are only fixing small issues.
