184

Clean Code suggests avoiding protected variables in the "Vertical Distance" section of the "Formatting" chapter:

Concepts that are closely related should be kept vertically close to each other. Clearly this rule doesn't work for concepts that belong in separate files. But then closely related concepts should not be separated into different files unless you have a very good reason. Indeed, this is one of the reasons that protected variables should be avoided.

What is the reasoning?

Thomas Owens
  • 79,623
  • 18
  • 192
  • 283
Matsemann
  • 1,918
  • 3
  • 16
  • 21
  • 5
    I think what he was saying was that having a lot of protected variables (not private variables) could be a sign of not keeping "closely related concepts" in one file. – programmer Aug 28 '12 at 15:10
  • 8
    show an example where you think you _need_ it – gnat Aug 28 '12 at 15:11
  • 7
    If you need protected variables OFTEN, then you're not prefering composition over inheritance, which is a whole other problem. If you need them occasionally then that's fine; he only says "avoid," not "never use." – pdr Aug 28 '12 at 15:21
  • @pdr Removed the 'need' part of my question as that's not what I'm really looking for. I'm wondering about the quote from the book. – Matsemann Aug 28 '12 at 15:27
  • 69
    I wouldn't take much in that book as gospel. – Rig Aug 28 '12 at 15:30
  • 40
    @Rig you are bordering on blasphemy in these parts with a comment like that. – Ryathal Aug 28 '12 at 15:47
  • 42
    I'm ok with that. I've read the book and can have my own opinion on it. – Rig Aug 28 '12 at 15:54
  • 12
    I've read the book and although I do agree with _most_ of what's in there, I wouldn't say I consider it gospel either. I think that every situation is different.. and being able to stick to the exact guidelines represented in the book are quite difficult for lots of projects. – Simon Whitehead Aug 29 '12 at 02:36
  • 1
    Why are public variables bad? Protected is just a more restricted kind of public. It's for the same reasons [getters and setters](http://stackoverflow.com/a/12108025/140719) [are bad](http://stackoverflow.com/questions/1568091/why-use-getters-and-setters#comment16169586_1568230). – sbi Aug 29 '12 at 13:45
  • 6
    @Rig I wouldn't take it as gospel either, on the other hand I thought the majority of his observations were reasonable, I'm interested in what your opinion is of the book? – Antonio2011a Aug 30 '12 at 08:19
  • 1
    @Jubbat unless it's a public API, you may not need documentation to see what the method should do. For instance, the method name should be descriptive enough. Or if the method is well written, reading the implementation should be as easy as reading some lines of documentation – Matsemann Jul 04 '13 at 13:10
  • 1
    @Jubbat if the code has a bug, the comment will probably lie/be wrong, and thus useless. – Matsemann Jul 04 '13 at 13:52
  • 1
    I think the emphasis is not on "protected" but on "variables". He's not saying you shouldn't use protected *anything*; he's saying to avoid protected *variables*. Variables don't have any validation built around them like methods or properties can. It's similar to why you shouldn't use public *variables*, which isn't to say you shouldn't use public methods or properties. – Kyralessa Jun 25 '15 at 23:50
  • @Jubbat In TDD, the tests perform the function that you would use a doc comment for. Exactly the same workflow, exactly the same use cases. (At least that's the idea). – Jørgen Fogh Mar 11 '16 at 13:27
  • 5
    "I don't get this one. I would think code can become outdated, not documentation." -- What a bizarre thought. It is code that is checked for consistency by the compiler, tested by unit and system tests and QA, observed by the client ... it is the code that *behaves*. The code cannot possibly be out of date with respect to the behavior, because they are one and the same thing (barring compiler bugs, hardware failures, etc.) Comments are just passive text. "In other words, comments never lie by definition." -- Ironically, that's a quite dishonest assertion. – Jim Balter Apr 16 '16 at 05:47

10 Answers10

296

Protected variables should be avoided because:

  1. They tend to lead to YAGNI issues. Unless you have a descendant class that actually does stuff with the protected member, make it private.
  2. They tend to lead to LSP issues. Protected variables generally have some intrinsic invariance associated with them (or else they'd be public). Inheritors then need to maintain those properties, which people can screw up or willfully violate.
  3. They tend to violate OCP. If the base class makes too many assumptions about the protected member, or the inheritor is too flexible with the behavior of the class, it can lead to the base class' behavior being modified by that extension.
  4. They tend to lead to inheritance for extension rather than composition. This tends to lead to tighter coupling, more violations of SRP, more difficult testing, and a slew of other things that fall within the 'favor composition over inheritance' discussion.

But as you see, all of these are 'tend to'. Sometimes a protected member is the most elegant solution. And protected functions tend to have fewer of these issues. But there are a number of things that cause them to be treated with care. With anything that requires that sort of care, people will make mistakes and in the programming world that means bugs and design problems.

samthebrand
  • 368
  • 2
  • 12
  • 27
Telastyn
  • 108,850
  • 29
  • 239
  • 365
  • Thanks for many good reasons. However, the book mentions it specifically in a context of formatting and vertical distance, it's that part I'm wondering about. – Matsemann Aug 28 '12 at 16:12
  • 2
    It seems that Uncle Bob is referring to vertical distance when someone is referring to a protected member *between* classes, and not in the same class. – Robert Harvey Aug 28 '12 at 16:18
  • 5
    @Matsemann - sure. If I remember the book accurately, that section focuses on the readability and discover-ability of code. If variables and functions work on a concept they should be close in code. Protected members will be used by derived types, which necessarily cannot be close to the protected member since they're in a completely different file. – Telastyn Aug 28 '12 at 16:32
  • @Telastyn - not "necessarily". Sometimes, an inheritance hierarchy is defined all in one go and within a single file (where the language doesn't force different classes into different files, at least). This tends to be a design pattern thing rather than an extensibility thing - of course OOP design patterns are intended to be extensible, but they can also be used in cases where extensibility seems unlikely to be an issue, for familiarity or other reasons. Vertical distance is still an issue if the classes are complex, of course - and huge files are a problem too, at least to some people. –  Aug 28 '12 at 21:46
  • 3
    @steve314 I can't imagine a scenario where the base class and _all_ of its inheritors will only ever live in one file. Without an example I am inclined to think it an abuse of inheritance or poor abstraction. – Telastyn Aug 28 '12 at 22:08
  • @Telastyn - one example I had was a multiway tree data structure. Nodes are either branch nodes or leaf nodes. Using late binding and recursion leads to neat implementations of the basic algorithms - no need for explicit is-this-a-leaf-node checks. It's not necessarily a great idea - in fact the multiway tree library I actually use is implemented quite differently - but it's within the realm of reasonable trade-offs. And of course a multiway tree data structure is highly unlikely to suddenly need a new node type next month. –  Aug 28 '12 at 22:41
  • @Talastyn - of course the vertical distance thing could only be respected because C++ allows method definitions to be separated from their declarations, so the insert method for the branch and leaf classes can be kept together etc. And I'm not advocating protected data here, so the relevance is limited. –  Aug 28 '12 at 22:43
  • I agree with this and upvoted it. My corollary is that I start with an implementation and use the "push up" refactoring when I need it—so I only end up with `protected` variables in the case that the need can be demonstrated, not that I thought it'd be useful. –  Aug 29 '12 at 06:44
  • You can rephrase point #4 as "They tend to violate [Composition over inheritance Principle](http://en.wikipedia.org/wiki/Composition_over_inheritance)" – abatishchev Aug 29 '12 at 14:50
  • @abatishchev I am not inclined to. That article is a bit too strong towards composition IMO. Favoring composition isn't (imo) a principle to follow as strongly as the SOLID principles, since there are a number of scenarios where inheritence is clearly better. Those scenarios tend to be exceptional cases for SOLID. – Telastyn Aug 29 '12 at 15:01
  • @Telastyn: Can't disagree. But I can't say though that CoI is a principle unreservedly to lead. It's sort of strategy you may want to review against each particular scenario. Shortly, a "nice to know" thing a thing to keep in mind when designing classes. – abatishchev Aug 29 '12 at 15:05
  • 4
    YAGNI=You Aint Gonna Need it LSP=Liskov Substitution Principle OCP=Open/Close Principle SRP=Single Responsibility Principle – Bryan Glazer Dec 11 '13 at 01:48
  • @Telastyn The inheriting approach is the base for accomplishing OCP, I'm interested in understanding how inheritance could break that principle. – guido Nov 17 '19 at 12:41
60

It's really the same reason you avoid globals, just on a smaller scale. It's because it's hard to find everywhere a variable is being read, or worse, written to, and hard to make the usage consistent. If the usage is limited, like being written in one obvious place in the derived class and read in one obvious place in the base class, then protected variables can make the code more clear and concise. If you're tempted to read and write a variable willy-nilly throughout multiple files, it's better to encapsulate it.

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
27

I haven't read the book, but I can take a stab at what Uncle Bob meant.

If you put protected on something, that means that a class can inherit it. But member variables are supposed to belong to the class in which they are contained; this is part of basic encapsulation. Putting protected on a member variable breaks encapsulation because now a derived class has access to the implementation details of the base class. It's the same problem that occurs when you make a variable public on an ordinary class.

To correct the problem, you can encapsulate the variable in a protected property, like so:

protected string Name
{
    get { return name; }
    private set { name = value; }
}

This allows name to be safely set from a derived class using a constructor argument, without exposing implementation details of the base class.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • You created a public property with protected setter. The protected field should be solved with a protected property with private setter. – Andy Aug 28 '12 at 22:32
  • 2
    +1 now. Setter could be protected too I suppose, but the point ids the base can enforce if the new value is valid or not. – Andy Aug 28 '12 at 22:39
  • Just to offer an opposing point of view - I make all my variables protected in a base class. If they are only to be accessed through a public interface then it shouldn't be a derived class, it should just contain a baseclass object. But I also avoid hierarchies more than 2 classes deep – Martin Beckett Aug 30 '12 at 04:42
  • 2
    There is a *huge* difference between `protected` and `public` members. If `BaseType` exposes a public member, that implies that all derived types must have that same member work the same way, since a `DerivedType` instance may be passed to code which receives a `BaseType` reference and expects to use that member. By contrast, if `BaseType` exposes a `protected` member, then `DerivedType` may expect to access that member in `base`, but `base` can't be anything *other* than a `BaseType`. – supercat Oct 08 '12 at 23:33
20

Uncle Bob's argument is primarily one of distance: if you have a concept that is important to a class, bundle the concept together with the class in that file. Not separated across two files on the disk.

Protected member variables are scattered in two places, and, kinda looks like magic. You reference this variable, yet it isn't defined here... where is it defined? And thus the hunt begins. Better to avoid protected altogether, his argument.

Now I don't believe that rule needs to be obeyed to the letter all the time (like: thou shalt not use protected). Look at the spirit of what he is getting at... bundle related things together into one file - use programming techniques and features to do that. I would recommend that you don't over-analyze and get caught up in the details on this.

samthebrand
  • 368
  • 2
  • 12
  • 27
J. Polfer
  • 545
  • 4
  • 7
9

Some very good answers already here. But one thing to add:

In most modern OO languages it is a good habit (in Java AFAIK its necessary) to put each class into its own file (please don't get nitty about C++ where you often have two files).

So it is virtually impossible to keep the functions in a derived class acessing a protected variable of a base class "vertically close" in source code to the variable definition. But ideally they should be kept there, since protected variables are intended for internal use, which makes functions acessing them often "a closely related concept".

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • 2
    Not in Swift. Interestingly, Swift doesn’t have “protected” but it has “fileprivate” which replaces both “protected” and C++ “friend”. – gnasher729 Feb 27 '18 at 15:25
  • @gnasher729 - of course, Swift has a lot of Smalltalk in its heritage. Smalltalk doesn't have anything other than private variables and public methods. And private in Smalltalk means private: two objects even of the same class can't access each other's variables. – Jules Feb 27 '18 at 15:46
  • In Flutter (in Dart) (which postdates the answer by some time) "class MyHomePage extends StatefulWidget" and "class _MyHomePageState extends State" would always go in the same file, to give another example of a language/system that puts multiple classes in one file. Modern Ada is a fully OO language, and still thinks in terms of packages; a package has to go in its own file, and doesn't necessarily contain only one class. (Actually interface and implementation files, like C++, but as you said, that's beside the point.) – prosfilaes Feb 23 '22 at 16:00
5

The basic idea is that a "field" (instance-level variable) that is declared as protected is probably more visible than it has to be, and less "protected" than you might like. There is no access modifier in C/C++/Java/C# that is the equivalent of "accessible only by child classes within the same assembly", thus granting you the ability to define your own children that can access the field in your assembly, but not allowing children created in other assemblies the same access; C# has internal and protected modifiers, but combining them makes the access "internal or protected", not "internal and protected". So, a field that is protected is accessible by any child, whether you wrote that child or someone else did. Protected is thus an open door to a hacker.

Also, fields by their definition have pretty much no validation inherent in changing them. in C# you can make one readonly, which makes value types effectively constant and reference types unable to be reinitialized (but still very mutable), but that's about it. As such, even protected, your children (which you cannot trust) have access to this field and can set it to something invalid, making the object's state inconsistent (something to be avoided).

The accepted way to work with fields is to make them private and access them with a property, and/or a getter and setter method. If all consumers of the class need the value, make the getter (at least) public. If only children need it, make the getter protected.

Another approach that answers the question is to ask yourself; why does code in a child method need the ability to modify my state data directly? What does that say about that code? That's the "vertical distance" argument on its face. If there is code in a child that must directly alter parent state, then maybe that code should belong to the parent in the first place?

KeithS
  • 21,994
  • 6
  • 52
  • 79
5

It's an interesting discussion.

To be frank, good tools can mitigate many of these "vertical" issues. In fact, in my opinion the notion of the "file" is actually something that hinders software development - something the Light Table project is working on (https://chris-granger.com/2012/04/12/light-table-a-new-ide-concept/).

Take files out of the picture, and the protected scope becomes a lot more appealing. The protected concept becomes most clear in langauges like SmallTalk - where any inheritance simply extends the original concept of a class. It should do everything, and have everything, that the parent class did.

In my opinion, class hierarchies should be as shallow as possible, with most extensions coming from composition. In such models, I don't see any reasons for private variables to not be made protected instead. If the extended class should represent an extension including all behaviour of the parent class (which I believe it should, and therefore believe private methods are inherently bad), why should the extended class not represent the storage of such data as well?

To put this another way - in any instance where I feel a private variable would suit better than a protected one, inheritence is not the solution to the problem in the first place.

Solomon Ucko
  • 408
  • 3
  • 10
spronkey
  • 169
  • 2
1

I find using protected variables for JUnit tests can be useful. If you make them private then you cannot access them during testing without reflection. This can be useful if you are testing a complex process through many state changes.

You could make them private, with protected getter methods, but if the variables are only going to be used externally during a JUnit test, I'm not sure that's a better solution.

Nick
  • 119
  • 1
0

As it says there, this is only one of the reasons, that is, logically linked code should be placed inside physical linked entities (files, packages and others). On a smaller scale this is the same as the reason you do not put a class which makes DB queries inside the package with classes that displays the UI.

The main reason however that protected variables are not recommended is because they tend to break encapsulation. Variables and methods should have as limited visibility as possible; for further reference see Joshua Bloch's Effective Java, Item 13 - "Minimize the accessibility of classes and members".

However, you should not take all of this ad litteram, only as a guildline; if protected variables are very bad they would not have been placed inside the language in the first place. A reasonable place for protected fields imho is inside the base class from a testing framework (integration test classes extend it).

Random42
  • 10,370
  • 10
  • 48
  • 65
  • 1
    Don't assume that because a language allows it, it's ok to do. I'm not sure there really is a god reason to allow protected fields; we actually disallow them at my employer. – Andy Aug 28 '12 at 22:35
  • 2
    @Andy You haven't read what I said; I said that protected fields are not recommended and reasons for this. There clearly are scenarios in which protected variables are needed; you may want a constant final value only in the subclasses. – Random42 Aug 29 '12 at 09:33
  • You may want to reword some of your post then, as you seem to use variable and field interchangably and make explicit that you'd consider things like protected consts an exception. – Andy Aug 29 '12 at 11:31
  • 3
    @Andy It's pretty obvious that since I was referring to protected variables I was not talking about local or parameter variables but to fields. I also mentioned a scenario where non-constant protected variables are useful; imho it's wrong for a company to enforce the way the programmers write code. – Random42 Aug 30 '12 at 06:41
  • 1
    If it was obvious I wouldn't have been confused and downvoted you. The rule was made by our senior developers, as was our decision to run StyleCop, FxCop and require unit tests and code reviews. – Andy Aug 30 '12 at 14:39
-1

Yes, I agree that is somewhat strange given that (as I recall) the book also discusses all non-private variables as something to be avoided.

I think the issue with the protected modifier is that not only is the member exposed to subclasses it's also made visible to the entire package. I think it is this dual aspect that Uncle Bob is taking exception to here. Of course, one can't always get around having protected methods, and thus the protected variable qualification.

I think a more interesting approach is to make everything public, that'll force you to have small classes, which in turn renders the whole vertical distance metric somewhat moot.

CurtainDog
  • 334
  • 1
  • 6