Why are data classes considered a code smell?

Question

This article claims that a data class is a "code smell". The reason:

It's a normal thing when a newly created class contains only a few public fields (and maybe even a handful of getters/setters). But the true power of objects is that they can contain behavior types or operations on their data.

Why is it wrong for an object to contain only data? If the core responsibility of the class is to represent data, wouldn't adding methods that operate on the data break the Single Responsibility Principle?

This is going to depend strongly on language features. In Python, for example, there's no distinction between the "field" and its accessors, unless [you go out of your way to write Java in Python](http://www.python-course.eu/python3_properties.php). — jscs, Dec 15 '16 at 11:39
I think having some data only classes is not a code smell per se, but if most classes are like that then we are talking about the "anemic domain" antipattern https://en.wikipedia.org/wiki/Anemic_domain_model — Tulains Córdova, Dec 15 '16 at 12:03
I don't see how this question is a duplicate. The other question is about the use of data classes in OO, while this one is about the downsides of data classes - entirely different subjects. — Milos Mrdovic, May 18 '18 at 10:04
You may want to read this answer on Stack Overflow, which is much more differentiated than the top-voted answer here, which postulates the inferiority of the rich domain model and presents it like a proven fact. https://stackoverflow.com/questions/23314330/rich-vs-anemic-domain-model — Julian Sievers, Mar 28 '19 at 13:58

David Arno · Accepted Answer · score 62 · 2016-12-15T11:29:59.997

There is absolutely nothing wrong with having pure data objects. The author of the piece quite frankly doesn't know what he's talking about.

Such thinking stems from an old, failed, idea that "true OO" is the best way to program and that "true OO" is all about "rich data models" where one mixes data and functionality.

Reality has shown us that actually the opposite is true, especially in this world of multi-threaded solutions. Pure functions, combined with immutable data-objects, is a demonstrably better way to code.
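As a minimal sketch of what "pure functions combined with immutable data objects" could look like, here is a Python version. The `Account` type and the `deposit` function are hypothetical names chosen for illustration; they do not come from the answer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Pure data: no behavior, and instances cannot be mutated."""
    owner: str
    balance_cents: int


def deposit(account: Account, amount_cents: int) -> Account:
    """Pure function: returns a new Account and leaves the input untouched."""
    if amount_cents <= 0:
        raise ValueError("deposit must be positive")
    return replace(account, balance_cents=account.balance_cents + amount_cents)


before = Account(owner="alice", balance_cents=1_000)
after = deposit(before, 250)
print(before.balance_cents, after.balance_cents)  # 1000 1250
```

Because `Account` is frozen, it can be shared between threads without locking, which is the multi-threading benefit the answer alludes to.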

edited Dec 15 '16 at 11:29

answered Dec 15 '16 at 11:23

Just wanted to add that pure data objects can be invaluable for modeling relationships, validation, controlling access/changes. – Adrian Dec 15 '16 at 11:28
Although it is nice if those pure functions that only take an instance of the immutable data object as argument are implemented as methods on the data object. – RemcoGerlich Dec 15 '16 at 11:41
@RemcoGerlich, Completely disagree. All that does is needlessly couple those functions to the data. – David Arno Dec 15 '16 at 11:47
If the function takes an argument of that data type, it is already coupled to the data. – RemcoGerlich Dec 15 '16 at 11:48
@RemcoGerlich I would argue that it depends on what responsibilities those methods have. If they relate to validation, transformation (ex. toXML), or manipulation of the underlying data, you're certainly right, but beyond that other abstractions are likely needed. – Adrian Dec 15 '16 at 11:51
@Aliester: I'm trying to think of other types of pure functions that only take the data object as a parameter, but can't think of any. – RemcoGerlich Dec 15 '16 at 12:21
@RemcoGerlich I was thinking of functions that are written to a common interface or abstract class implemented by the data-object. Example could be an HTTP client that takes an "ISerializable" object and sends a POST request to an endpoint. – Adrian Dec 15 '16 at 12:27
@Aliester: yes but sending a POST request is a clear side effect, so not a pure function. – RemcoGerlich Dec 15 '16 at 12:33
@RemcoGerlich missed the "pure" part of your assertion. Will have to give this more thought. – Adrian Dec 15 '16 at 14:20
@Aliester How about a HTTP client that takes an "ISerializable" and returns a "IO unit" representing the POST – Caleth Dec 15 '16 at 14:34
Downvoting because this is incorrect, or at best a matter of opinion. In an OO language, it usually makes sense to have an object contain both the data (which can still be immutable!) and the methods that act on it. Pure functions and separate data are great in other language paradigms, but if you’re doing OO, do OO fully. – Marnen Laibow-Koser Mar 25 '19 at 23:36
Reality has shown us a lot of things. -1 for dogmatic know-it-all "the others have failed" opinion. Also, the author does not say that pure data objects are "wrong", just that they are a "code smell" and worthy of questioning. I only regret that I have one downvote to give for my country. :-) – user949300 Mar 26 '19 at 00:07
-1 Load of rubbish because A) Using a lower level of abstraction has ALWAYS been the answer when high performance is needed & B) Data classes are a code smell exactly because classes are much more than just a "bag of data" abstraction. And using them like that will only result in fragmented code tightly coupled with your data rather than the operations and behaviours you would apply to that data. – ZombieTfk Nov 13 '19 at 13:03
@RemcoGerlich I know this is an ANCIENT thread, but this is not true for structurally typed languages. Imagine something like strongly-typed duck-typing. It's pretty awesome. You should check out TypeScript. It makes JavaScript bearable. – Julian Jan 03 '21 at 02:40
@Julian: in fact I have since switched from Python (in backends) to TypeScript (in frontends), I agree completely :-) – RemcoGerlich Jan 03 '21 at 18:27
@RemcoGerlich Haha, I made the same switch. It's awesome, isn't it!?! – Julian Jan 05 '21 at 03:06
Given that OP's data object are **expressly mutable**, this is a false answer. Let alone the functional fanaticism. – user949300 Mar 29 '21 at 15:45
I think for "domain logic" objects are really good but for communication you'll want to have DTOs (data classes). I think that comment is very absolute. In my opinion it depends on the use case. – JSAN L. Mar 29 '21 at 16:53

score 17 · Answer 2 · edited Mar 29 '21 at 17:23

There is absolutely nothing wrong with having pure data objects. The author has an opinion not shared by the software developers I know.

Especially for database mapping, you generally have entity classes that contain only the fields stored in the database plus getters and setters (see Wikipedia on the Hibernate framework).

The whole idea of Java beans, used by a lot of tools and frameworks, is based on data classes called beans that contain only fields and the related getters and setters (see Wikipedia on JavaBeans).

Bottom line:
If someone claims that something is 'bad' or 'a code smell', you should always look for the reasons given. If the reasons do not convince you, ask someone else for better reasons or a different opinion. (Like you did here.)

edited Mar 29 '21 at 17:23 · Deduplicator

answered Dec 15 '16 at 11:37 · MrSmith42

The author does not say that pure data objects are "wrong". They say that pure data objects are a "code smell", which means that you should think twice about using them. – user949300 Mar 26 '19 at 00:09
@user949300, you seem confused. If the author refers to them as a code smell, then he indicates there might be something wrong with them. As they are widely recognised these days as very good practice, they clearly aren't a code smell. Thus MrSmith42 is correct: there is absolutely nothing wrong with them. – David Arno Mar 26 '19 at 08:46
@david arno Since OP's data objects are explicitly noted as mutable, they are a smell in both OOP and FP. – user949300 Mar 30 '21 at 00:28

score 6 · Answer 3 · answered Mar 27 '19 at 14:06

What you need to understand is that there are two kinds of objects:

1. Objects that have behavior. These should refrain from giving public access to most/any of their data members. I expect only very few accessor methods defined for these.

   An example would be a compiled regex: The object is created to provide a certain behavior (to match a string against a specific regex, and to report the (partial) matches), but how the compiled regex does its work is none of the user's business.

   Most classes that I write are in this category.

2. Objects that are really just data. These should just declare all of their members public (or provide the full set of accessors for them).

   An example would be a class Point2D. There is absolutely no invariant that needs to be ensured for the members of this class, and users should be able to just access the data via myPoint.x and myPoint.y.

   Personally, I don't use such classes much, but I guess there is no larger piece of code that I've written that doesn't use such a class somewhere.

Becoming proficient with object orientation includes realizing that this distinction exists, and learning to classify a class' function into one of these two categories.

If you code in C++, you can make this distinction explicit by using class for the first category of objects, and struct for the second. Of course, the two are equivalent, except that class means that all members are private by default, while struct declares all members public by default. Which is exactly the sort of information you want to communicate.
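The two categories can also be sketched in Python, which has no `class`/`struct` distinction; a `dataclass` and the leading-underscore convention play roughly the same signalling role here. This is only an illustration: `DateMatcher` is a hypothetical wrapper invented for the example, built on the standard `re` module.

```python
import re
from dataclasses import dataclass


@dataclass
class Point2D:
    """Just data: there is no invariant to protect, so the fields are public."""
    x: float
    y: float


class DateMatcher:
    """Behavior object: callers use matches(); the compiled pattern is an internal detail."""

    def __init__(self) -> None:
        # The leading underscore marks the field as private by convention.
        self._pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

    def matches(self, text: str) -> bool:
        return self._pattern.fullmatch(text) is not None


point = Point2D(1.0, 2.0)
point.x += 0.5  # direct field access is the intended interface
matcher = DateMatcher()
print(point.x, matcher.matches("2021-04-20"), matcher.matches("not a date"))
```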

Thanks for the answer. I hate it when people downvote for no reason. If you have a problem with an answer, **explain why**. — Michael Haddad, Nov 30 '19 at 17:28
@OlleHärstedt That's just a special case of objects that have behavior, the special case that has no accessors by definition, actually. — cmaster - reinstate monica, Apr 20 '21 at 11:00

score 5 · Answer 4 · answered Mar 25 '19 at 23:30

A good argument why by Martin Fowler:

"Tell-Don't-Ask is a principle that helps people remember that object-orientation is about bundling data with the functions that operate on that data. It reminds us that rather than asking an object for data and acting on that data, we should instead tell an object what to do. This encourages to move behavior into an object to go with the data."

https://martinfowler.com/bliki/TellDontAsk.html
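A small Python sketch of the contrast Fowler describes, under the assumption of a hypothetical `Thermostat` class (the names are invented for this example and are not taken from his article):

```python
class Thermostat:
    """Hypothetical example: the object owns both the data and the decision."""

    def __init__(self, target_celsius: float) -> None:
        self._target = target_celsius
        self.heating = False

    def update(self, measured_celsius: float) -> None:
        # "Tell": the caller reports a reading and the object decides what to do.
        self.heating = measured_celsius < self._target


def should_heat(target_celsius: float, measured_celsius: float) -> bool:
    # "Ask" style, for contrast: the caller pulls the data out and decides itself.
    return measured_celsius < target_celsius


thermostat = Thermostat(target_celsius=20.0)
thermostat.update(18.5)
print(thermostat.heating)  # True
```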

answered Mar 25 '19 at 23:30 · Curtis Yallop

The problem here is that Fowler artificially restricts "tell don't ask" by changing functions from asking a wider scope to just asking the object scope. They are still asking. "Tell don't ask" can in fact be taken a step further by truly telling those functions via their argument lists. And thus we arrive at data objects and separate (data wise) functions being the true implementation of "tell don't ask". So rather than being a good argument for the claim that data classes are a code smell, it in fact further proves the opposite. – David Arno Mar 26 '19 at 09:59
@DavidArno You’re forgetting about encapsulation and hiding. You tell an object to execute a member method, and the member method goes into the black box of the object and does whatever it has to to get the answer. If you ask an object from outside, you don’t have access to its private state, and so either the object exposes more state than is wise, or the asker has to jump through more hoops than should be necessary. I don’t see why you’d ever “ask” an object in an OO environment. (Other programming paradigms may call for different approaches, of course.) – Marnen Laibow-Koser Mar 26 '19 at 22:09
@MarnenLaibow-Koser, no Marnen, you may rest assured that I've in no way forgotten about encapsulation. Let's take an example. I have a public method, `Foo`. It calls a private method, `Bar`. `Bar` needs access to a piece of data, `baz`. With Fowler's take on "tell, don't ask", `baz` will be a private field inside the object that was passed in via the constructor and `Bar` will access it, ie `Bar` *asks* the object for `baz`. There's no true "tell" going on beyond the object being told. In this obsession with thinking objects we've forgotten the poor method. It's relegated to 2nd class. – David Arno Mar 27 '19 at 08:14
@MarnenLaibow-Koser, But that method need not be relegated in that way. We can arrange things so that `Bar` is *told* about `baz`. In Java and C# for example, we do this by marking the method `static` and passing `baz` in as a parameter. And suddenly we now have "tell, don't ask" operating all the way down, rather than stopping at the object boundary. But we've also done something else by marking it `static`, we've removed it from the object. We have broken free of the "*if you’re doing OO, do OO fully*" dogma and in the process we have increased encapsulation. – David Arno Mar 27 '19 at 08:14
@MarnenLaibow-Koser, Having taken that first step to removing those "do OO fully" chains from ourselves, we can then take things further. We can start questioning some really odd behaviour that some OO folk do, such as instantiating an object, passing in some data to the constructor and then calling a method, all to get a result and then discard the object. Why have the object? Make it a static method. But keep it pure of course, with immutable data of course, and - oh look - we've caught up with my answer. – David Arno Mar 27 '19 at 08:14
@MarnenLaibow-Koser, And that is why my answer is the most upvoted and the accepted one: because "*Pure functions, combined with immutable data-objects, is a demonstrably better way to code.*" is the *correct* answer. – David Arno Mar 27 '19 at 08:15
@DavidArno Of course you *can* pass `baz` in as a parameter to a static method, but to do that, you first need to *ask* the object for it. Perhaps in a programming paradigm where methods were primary (like, say, functional programming) this makes sense, but in an OO environment, it absolutely does not, because objects are primary and should contain both the data and the functions to act on it. Your claim that removing the method from the object has increased encapsulation is also *exactly backwards*, as far as I can tell, because it means that you now have `baz` appearing outside the object. – Marnen Laibow-Koser Mar 27 '19 at 11:48
@MarnenLaibow-Koser, and thus we get to the core problem: "*... but in an OO environment, it absolutely does not, because objects are primary and should contain both the data and the functions to act on it ...*". The fingers go in the ears and you start chanting "la la, la la; I'm not listening". You have closed your mind to alternatives and thus anyone suggesting those alternatives is wrong. Sad. – David Arno Mar 27 '19 at 11:52
@DavidArno ...and there’s no good reason to do that, at least if you claim to still be doing OO: it is not “demonstrably better” as you claim, it’s simply a technique from a different programming paradigm. What *is* generally demonstrably better, I believe, is picking a paradigm—OO, functional, procedural, whatever—and committing to it as fully as possible...and that’s where my “do OO fully” comment comes from. Although it’s useful to get inspiration from other paradigms, it’s generally best not to half-ass the paradigm you choose to work with. That just leads to confusing, inconsistent code. – Marnen Laibow-Koser Mar 27 '19 at 11:54
@DavidArno: No, I haven’t closed my mind to alternatives. I know there are many programming paradigms out there, and I use multiple ones myself. But you appear to be advocating working in an OO environment, but doing so in ways that ignore and contradict the basic principles of that environment, to no benefit that I can see. If you’re going to do that, why not just use a different type of environment to begin with? – Marnen Laibow-Koser Mar 27 '19 at 11:57
@MarnenLaibow-Koser, I don't claim to "be doing OO". I write code and I use good techniques to do so. Whether those techniques come from the functional paradigm, or the OO paradigm, or who-gives-a-damn paradigm is of no interest to me. Picking a paradigm and sticking to it as fully as possible is pure dogma. It is bad. It is ridiculous. It stifles you and results in inferior code. Don't do it. – David Arno Mar 27 '19 at 11:58
@DavidArno On the contrary, if you commit fully to a paradigm (any decent paradigm, not just OO), you get powerful high-level abstractions and logically consistent, maintainable code. I’m not saying this to be dogmatic, but rather pragmatic. I’ve seen and maintained too much code that was apparently produced with an attitude like yours, where the author didn’t really commit to the logical consistency of the system being used. It’s hard to understand, hard to maintain, and hard to modify. No paradigm is perfect, but generally a mixture (unless carefully considered) is harder to understand. – Marnen Laibow-Koser Mar 27 '19 at 12:05

score 5 · Answer 5 · answered Dec 16 '22 at 20:25

In Robert Martin's (Uncle Bob) book "Clean Code", he provides a great argument supporting data classes. He argues that "Data Structure" objects and "Data Transfer Objects" can be good. They have data only and no functions.

Objects: hide their data (be private) and have functions to operate on that data.

Data Structures: show their data (be public) and have no functions.

The two concepts are opposites:

Procedural code (code using data structures)

Makes it easy to add new functions without changing the existing data structures.

Makes it hard to add new data structures because all the functions must change.

OO code (code using objects)

Makes it hard to add new functions because all the existing classes must change.

Makes it easy to add new classes without changing existing functions.
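The trade-off listed above can be sketched with a pair of shapes in Python. The class and function names are illustrative assumptions, not code taken from the book.

```python
import math
from dataclasses import dataclass


# Procedural style: plain data structures plus free functions.
@dataclass(frozen=True)
class Square:
    side: float


@dataclass(frozen=True)
class Circle:
    radius: float


def area(shape) -> float:
    # Adding perimeter() means writing one new function; no data structure changes.
    # Adding a Triangle means editing every function written like this one.
    if isinstance(shape, Square):
        return shape.side ** 2
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    raise TypeError(f"unknown shape: {shape!r}")


# OO style: each class hides its data and carries its own behavior.
class SquareObject:
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2


class CircleObject:
    def __init__(self, radius: float) -> None:
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


# Adding a TriangleObject touches no existing code;
# adding perimeter() means editing every existing class.
print(area(Square(2.0)), SquareObject(2.0).area())  # 4.0 4.0
```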

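The Law of Demeter (LoD)

A method M of class C should only call methods on C itself, on objects held in C's fields, on objects M creates, and on M's parameters. It should not reach through a parameter with a chain such as parameter1.getSubItem().getSubSubItem(), because M should not know about the inner workings of its parameters' classes.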
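A hedged Python sketch of the rule; `Wallet`, `Customer`, and `charge` are hypothetical names made up for this example.

```python
class Wallet:
    def __init__(self, cents: int) -> None:
        self._cents = cents

    @property
    def cents(self) -> int:
        return self._cents

    def pay(self, amount_cents: int) -> None:
        if amount_cents > self._cents:
            raise ValueError("insufficient funds")
        self._cents -= amount_cents


class Customer:
    def __init__(self, wallet: Wallet) -> None:
        self._wallet = wallet

    def pay(self, amount_cents: int) -> None:
        # Customer talks to its own field; callers never see the Wallet.
        self._wallet.pay(amount_cents)


def charge(customer: Customer, amount_cents: int) -> None:
    # Follows the Law of Demeter: it only calls a method on its own parameter.
    # A violation would reach through it: customer._wallet._cents -= amount_cents
    customer.pay(amount_cents)


wallet = Wallet(500)
customer = Customer(wallet)
charge(customer, 200)
print(wallet.cents)  # 300
```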

Data Transfer Objects

A Data Transfer Object (DTO) is a form of data structure: a class with public variables and no functions. DTOs are very useful structures, especially when communicating with databases or parsing messages from sockets.
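For illustration, a DTO for a parsed message might look like the following in Python. `UserDto`, `parse_user`, and the JSON payload are assumptions invented for this sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserDto:
    """Data transfer object: public fields, no behavior."""
    id: int
    name: str
    email: str


def parse_user(payload: str) -> UserDto:
    """Turn a JSON message (for example, one read from a socket) into a DTO."""
    fields = json.loads(payload)
    return UserDto(id=fields["id"], name=fields["name"], email=fields["email"])


message = '{"id": 7, "name": "Ada", "email": "ada@example.com"}'
user = parse_user(message)
print(user.name)     # Ada
print(asdict(user))  # a plain dict, ready to serialize or hand to a database layer
```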

Source: Clean Code | Chapter(6) | Objects and Data Structures

Source: https://www.linkedin.com/pulse/clean-code-chapter6-objects-data-structures-mahmoud-ibrahim

Uncle Bob requires that you not mix data classes and OO classes. So if your class has any logic, it becomes an OO class, and if it also exposes its internals via getters/setters, that is bad.

An interesting case of "Data Structure" classes is static inner classes. I regularly use "Data Structure" static-inner classes which are only accessible by the containing class. They are used to construct the data structures for my class. For example HashNode, ListNode, Pair, Tuple.

I would potentially even extend this static-inner-class argument to a module. There might be some "module-private" data-structure classes that are not part of the module's public API and exist only for internal use inside the module (by the function-classes or services that do have a public API). But a developer who reads the code and is averse to data-structure classes might have trouble seeing that such a class is "module-private" (not accessible through the module API), and instead just see one plain class among many in the repo that exposes all its internals publicly (this happened to me once in a PR review). So this kind of design can be slightly controversial/problematic.

I often program MVC code-bases. We have model objects or database ORM data objects. Should these be data-structures? Or should they be OO classes and have all their internals hidden? I find this is a common and difficult situation for a lot of people. And these ORM classes commonly have both OO methods and getters/setters to access the database data. I don't have a great, definite answer to this. I don't necessarily think that all model objects should be locked down with zero getters, with a zealous fanaticism. But Tell-Don't-Ask can be a good principle to try to follow. Whatever you do, I really believe in simple, readable code. I feel like ORM classes are a special example because they are like the API for accessing the database. (Note also that the database can be thought of as a store of globals!)

What is definitely bad is when every object everywhere can reach into any other object without constraints, across a huge code-base. Particularly if there are lots of globals/singletons/globally-injected-services. This descends into a spaghetti mess. You want to reduce the scope of what-depends-on-what. If I change the structure of this class, what will break? Nothing outside the module should break if you have not changed the module public API. More importantly, when you are troubleshooting, it is hard to reason about code if an object's internals can be fiddled with all across the code-base.

score 4 · Answer 6 · answered Mar 29 '21 at 15:41

Here's what Martin Fowler has to say about data classes in his book "Refactoring":

Such classes are dumb data holders and are often being manipulated in far too much detail by other classes.

Data classes are often a sign of behavior in the wrong place, ...

Note the use of the word "often". Data classes are not always problematic. The way I understand it, such classes are a "code smell" in the sense that you should find out if there is any behavior related to them that is implemented somewhere outside the classes. If there is no such behavior, you are good. Continue using them. Otherwise, refactor them by moving the behavior to those classes.
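A minimal before/after sketch of that refactoring in Python; `OrderLine` and its fields are hypothetical and not taken from "Refactoring".

```python
from dataclasses import dataclass


# Before: a data class whose related behavior lives somewhere else.
@dataclass
class OrderLine:
    unit_price_cents: int
    quantity: int


def line_total(line: OrderLine) -> int:
    # Every module that needs a total must know how the fields combine.
    return line.unit_price_cents * line.quantity


# After: the same data, with the behavior moved next to it.
@dataclass
class OrderLineWithTotal:
    unit_price_cents: int
    quantity: int

    def total_cents(self) -> int:
        return self.unit_price_cents * self.quantity


print(line_total(OrderLine(250, 3)), OrderLineWithTotal(250, 3).total_cents())  # 750 750
```

If no function like `line_total` exists anywhere in the code-base, the first form has no misplaced behavior and can stay as it is.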

Mike Robinson · Answer 7 · 2021-03-29T22:45:46.437

In my experience, "data-only classes" are often used to ensure that the object will never accept an incorrect value, nor deliver one. The class is brimming with "trip wires," at least in development mode, which will throw exceptions if any of the values they're being assigned, or that they are asked to produce, are wrong.

And, in my humble experience, "that is a life-saver!"

The benefit of this strategy is simply that it gives you a way to detect the problem – to realize that the boat is sinking in the first place – and do so the moment that it happens. "Gotcha!! There's the culprit ... line #2 in the traceback." Without this, you would never have known that the problem existed. And you certainly wouldn't [yet ...] know where.

Conversely: "if the exception didn't go off, the bug that you're looking for isn't right here."
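One way such trip wires might look in Python, as a sketch only: `Temperature` is a hypothetical class, and the `assert` on the way out is an assumption about what "development mode" could mean here, since Python skips `assert` statements when run with `-O`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Temperature:
    kelvin: float

    def __post_init__(self) -> None:
        # Trip wire on the way in: reject impossible values at construction time.
        if self.kelvin < 0:
            raise ValueError(f"kelvin must be >= 0, got {self.kelvin}")

    @property
    def celsius(self) -> float:
        value = self.kelvin - 273.15
        # Trip wire on the way out: active in development, skipped under `python -O`.
        assert value >= -273.15, "produced a temperature below absolute zero"
        return value


room = Temperature(kelvin=300.0)
print(room.kelvin)

try:
    Temperature(kelvin=-5.0)
except ValueError as error:
    print("caught:", error)  # the traceback would point at the offending caller
```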

"Always write code that is suspicious – looking for trouble."
