42

I am just designing my application and I am not sure if I understand SOLID and OOP correctly. Classes should do 1 thing and do it well but from the other hand they should represent real objects we work with.

In my case I do a feature extraction on a dataset and then I do a machine learning analysis. I assume that I could create three classes

  1. FeatureExtractor
  2. DataSet
  3. Analyser

But the FeatureExtractor class doesn't represent anything; it does something, which makes it more of a routine than a class. It will have just one function that will be used: extract_features()

Is it correct to create classes that do not represent one thing but do one thing?

EDIT: not sure if it matters but I'm using Python

And if extract_features() would look like that: is it worth to create a special class to hold that method?

def extract_features(df):
    extr = PhrasesExtractor()
    extr.build_vocabulary(df["Text"].tolist())

    sent = SentimentAnalyser()
    sent.load()

    df = add_features(df, extr.features)
    df = mark_features(df, extr.extract_features)
    df = drop_infrequent_features(df)
    df = another_processing1(df)
    df = another_processing2(df)
    df = another_processing3(df)
    df = set_sentiment(df, sent.get_sentiment)
    return df
mkrieger1
Alicja Głowacka
  • What are you extracting features from? Maybe you need a FeatureSet with an extract method? – HorusKol Apr 15 '18 at 15:06
  • 13
    This looks perfectly fine as a function. Considering the three things you listed as *modules* is ok, and you might want to place them in different files, but that doesn't mean they need to be classes. – Bergi Apr 15 '18 at 15:37
  • 31
    Be aware that it's quite common and acceptable to use non-OO approaches in Python. – jpmc26 Apr 16 '18 at 01:34
  • In this particular case, it looks like you're building a *pipeline*. Perhaps what you actually want is a list of functions, each already curried into a single-argument form, which you then simply call in a loop. You *could* build a class to represent a `Pipeline` as a concept, but a free-standing function would probably work just as well. – Daniel Pryden Apr 16 '18 at 14:22
  • 13
    You may be interested in Domain Driven Design. The "classes should represent objects from the real world" is actually false... they should represent objects *in the domain*. The domain is often strongly linked to the real world, but depending on the application some things may or may not be considered objects, or some things that "in reality" are separate may end up either linked or identical inside the domain of the application. – Bakuriu Apr 16 '18 at 18:14
  • 1
    As you become more familiar with OOP, I think you'll find that classes very rarely correspond one-to-one with real-world entities. For example, here's an essay that argues trying to cram all the functionality associated with a real-world entity in a single class is very frequently an anti-pattern: http://programmer.97things.oreilly.com/wiki/index.php/The_Single_Responsibility_Principle – Kevin Apr 16 '18 at 21:20
  • If you cannot distinguish between a class and an instance of that class, then you don't need a class and *should not* use one if the language allows you to avoid it (which Python thankfully does). Don't get class happy. – zxq9 Apr 17 '18 at 15:46
  • 1
    "they should represent real objects we work with." not necessarily. A lot of languages have a stream class representing a stream of bytes, which is an abstract concept rather than a 'real object'. Technically a file system isn't a 'real object' either, it's just a concept, but sometimes there are classes representing a file system or a part of it. – Pharap Apr 17 '18 at 16:53
  • This function looks like it should be a method of whatever class your variable `df` is. – Brian H. Apr 18 '18 at 09:04
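The pipeline idea raised in the comments above (a list of single-argument functions called in a loop) might look roughly like the sketch below. The step functions `add_length` and `drop_short` and the list-of-dicts data are hypothetical stand-ins for the question's DataFrame steps, chosen only to keep the example self-contained.

```python
from functools import partial


# Hypothetical stand-ins for the question's processing steps; each takes
# a list of row dicts and returns a new list.
def add_length(rows):
    return [{**row, "length": len(row["Text"])} for row in rows]


def drop_short(rows, min_length):
    return [row for row in rows if row["length"] >= min_length]


def run_pipeline(rows, steps):
    """Feed the data through each single-argument step in order."""
    for step in steps:
        rows = step(rows)
    return rows


# partial() fixes the extra argument so every step takes only the data.
steps = [add_length, partial(drop_short, min_length=5)]

result = run_pipeline([{"Text": "hi"}, {"Text": "hello world"}], steps)
print(result)
```

No class is needed here: the list of steps is the pipeline, and adding or reordering a step is a one-line change.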

8 Answers

97

Classes should do 1 thing and do it well

Yes, that is generally a good approach.

but from the other hand they should represent real object we work with.

No, that is IMHO a common misunderstanding. A good beginner's introduction to OOP is often "start with objects representing things from the real world"; that much is true.

However, you should not stop with this!

Classes can (and should) be used to structure your program in various ways. Modeling objects from the real world is one aspect of this, but not the only one. Creating modules or components for a specific task is another sensible use case for classes. A "feature extractor" is probably such a module, and even if it contains only one public method extract_features(), I would be astonished if it does not also contain a lot of private methods and maybe some shared state. So having a class FeatureExtractor will introduce a natural location for these private methods.

Side note: in languages like Python which support a separate module concept one can also use a module FeatureExtractor for this, but in the context of this question, this is IMHO a negligible difference.

Moreover, a "feature extractor" can be imagined as "a person or bot which extracts features". That is an abstract "thing", maybe not a thing you will find in the real world, but the name itself is a useful abstraction, which gives everyone a notion of what the responsibility of that class is. So I disagree that this class does not "represent anything".

Doc Brown
  • I added a sample code of this function, could you please take a look at that? – Alicja Głowacka Apr 15 '18 at 12:44
  • 33
    I especially like that you mentioned the issue of internal state. I often find that to be the deciding factor for whether I make something a class or a function in Python. – David Z Apr 15 '18 at 14:54
  • @AlicjaGłowacka: you need also a location for `add_features`, `mark_features` etc., so a class `FeatureExtractor` could be a good place for this. Of course, in Python your `FeatureExtractor` could also be a Python module, which in this case probably won't make a big difference. See this older [SE post](https://softwareengineering.stackexchange.com/q/329348/9113). Note other programming languages like Java or C# don't have a comparable module concept, since it typically won't bring any benefits over the class concept. – Doc Brown Apr 15 '18 at 16:36
  • 17
    You seem to recommend “classitis”. **Do not** make a class for this, it’s a Java-style antipattern. If it’s a function, make it a function. Internal state is irrelevant: functions can have that (via closures). – Konrad Rudolph Apr 15 '18 at 19:41
  • 25
    @KonradRudolph: you seem to have missed that this is not about making *one* function. This is about a piece of code which requires *several* functions, a common name and maybe some shared state. Using a module for this might be sensible in languages with a module concept apart from classes. – Doc Brown Apr 15 '18 at 20:19
  • 8
    @DocBrown I’m gonna have to disagree. It’s a class with one public method. API wise, that’s *indistinguishable* from a function. See my answer for details. Using single-method classes can be justified if you need to create a specific *type* to represent a state. But this isn’t the case here (and even then, functions can sometimes be used). – Konrad Rudolph Apr 15 '18 at 20:20
  • 4
    @KonradRudolph: I am not just talking about the API design, and you know that, so please don't play with words. – Doc Brown Apr 15 '18 at 20:23
  • 1
    @DocBrown I’m not. Not sure what to add here. Read my answer. – Konrad Rudolph Apr 15 '18 at 20:24
  • 1
    @KonradRudolph: sorry, but your answer does not seem to give an explanation where to put the private functions if not in a class or module. – Doc Brown Apr 15 '18 at 20:29
  • 6
    @DocBrown I’m starting to see what you mean: you’re talking about the functions that are called from inside `extract_features`? I kind of just assumed that they were public functions from another place. Fair enough, I agree that if these are private then they should probably go into a module (but still: *not* a class, unless they share state), together with `extract_features`. (That said, you could of course declare them locally inside that function.) – Konrad Rudolph Apr 15 '18 at 20:32
  • 1
    @DocBrown: Are you aware that Python *has* modules? – user2357112 Apr 16 '18 at 04:40
  • 3
    @user2357112: didn't you read my comments above? However, I tried to give an answer which does not exclusively fit to Python. – Doc Brown Apr 16 '18 at 08:08
  • I find Doc Brown and Konrad Rudolph's discussion here fairly interesting. I've been a little bit divided on it myself, and I kind of wonder why local functions aren't more customary for things like this, in languages other than JavaScript and such. – Panzercrisis Apr 17 '18 at 15:49
  • @Panzercrisis: I hoped to stop this overthinking by adding the side note above. – Doc Brown Apr 18 '18 at 05:35
43

Doc Brown is spot-on: classes don’t need to represent real-world objects. They just need to be useful. Classes are fundamentally merely additional types, and what does int or string correspond to in the real world? They are abstract descriptions, not concrete, tangible things.

That said, your case is special. According to your description:

And if extract_features() would look like that: is it worth to create a special class to hold that method?

You are absolutely right: if your code is as shown, there’s no use making it into a class. There’s a famous talk that argues that such uses of classes in Python are a code smell, and that simple functions are often sufficient. Your case is a perfect example of this.

Overuse of classes is due to the fact that OOP became mainstream with Java in the 1990s. Unfortunately Java at the time lacked several modern language features (such as closures), which meant that many concepts were hard or impossible to express without the use of classes. For instance, it was impossible in Java until recently to have methods that carried state (i.e. closures). Instead, you had to write a class to carry the state, and which exposed a single method (called something like invoke).

Unfortunately this style of programming became popular far beyond Java (partly due to an influential software engineering book that’s otherwise very useful), even in languages that don’t require such workarounds.

In Python, classes are obviously a very important tool and should be used liberally. But they’re not the only tool, and there’s no reason to use them where they don’t make sense. It’s a common misconception that free functions have no place in OOP.
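As a minimal sketch of the closure point made above, the two definitions below carry the same state and are called the same way. The names `make_counter` and `Counter` are made up for illustration; the class version is the "single method that carries state" workaround, here spelled with Python's `__call__`.

```python
def make_counter():
    """Return a function that remembers how many times it was called."""
    count = 0

    def counter():
        nonlocal count  # the closure keeps 'count' alive between calls
        count += 1
        return count

    return counter


class Counter:
    """The class-based equivalent: state lives on the instance."""

    def __init__(self):
        self.count = 0

    def __call__(self):
        self.count += 1
        return self.count


next_id = make_counter()
next_id_obj = Counter()
print(next_id(), next_id())
print(next_id_obj(), next_id_obj())
```

From the caller's side the two are interchangeable, which is why the class buys nothing unless the type itself is needed.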

Konrad Rudolph
  • It would be useful to add that if the functions called in the example are in fact private functions, encapsulating them in a module or class would be entirely appropriate. Otherwise, I agree entirely. – Leliel Apr 15 '18 at 22:49
  • 11
    Also useful to remember that "OOP" does not mean "write a bunch of classes". Functions are objects in Python, so there's no need to consider this as "not OOP". Rather, it's simply *reusing* the built-in types/classes, and "reuse" is one of the holy grails in programming. Wrapping this in a class would be preventing reuse, since nothing would be compatible with this new API (unless `__call__` is defined, in which case just use a function!) – Warbo Apr 16 '18 at 00:35
  • +1 "classes don’t need to represent real-world objects" Often they cannot or should not. – Alan Larimer Apr 16 '18 at 01:55
  • 3
    Also on the subject of "classes vs. free-standing functions": https://eev.ee/blog/2013/03/03/the-controller-pattern-is-awful-and-other-oo-heresy/ – Joker_vD Apr 16 '18 at 10:52
  • @Warbo: While functions are objects according to Python terminology (since all values are objects including integers, booleans etc), this does not mean that any program written in Python is OOP even if it doesn't use classes. Python is multiparadigm which means that you don't need to use OOP. – JacquesB Apr 16 '18 at 11:01
  • @Joker_vD Yes, Eevee is always worth reading (so is Armin, but Armin is simply wrong on this particular matter). Funny enough, but understandably, Eevee also made a mistake here: you don’t need `this` for OOP. All you need is closures. – Konrad Rudolph Apr 16 '18 at 11:03
  • @JacquesB While true, a program written without Python classes *can* be OOP (it would just be terribly convoluted and against the spirit of Python). Other languages without formal classes do OOP just fine. – Konrad Rudolph Apr 16 '18 at 11:13
  • @JacquesB Yes, it's simply a matter of which description is more appropriate. I've worked with people who instinctively reject "non-OOP" solutions outright, and have found this argument (calling it "OOP" because the values are technically objects) is useful for getting past this initial barrier to discussion :) – Warbo Apr 16 '18 at 11:55
  • 1
    Isn't it the case, though that in Python "free-functions" are also objects with a type that expose a method `__call__()`? Is that really so different from an anonymous inner class instance? Syntactically, sure but from a language design, it seems like a less significant distinction than you present here. – JimmyJames Apr 16 '18 at 20:07
  • 1
    @JimmyJames Right. The whole point is that they offer the same functionality for the specific purpose but are simpler to use. – Konrad Rudolph Apr 16 '18 at 23:57
36

I am just designing my application and I am not sure if I understand SOLID and OOP correctly.

Been at this over 20 years and I'm not sure either.

Classes should do 1 thing and do it well

Hard to go wrong here.

they should represent real objects we work with.

Oh really? Let me introduce you to the single most popular and successful class of all time: String. We use it for text. And the real world object it represents is this:

Fisherman holds 10 fish suspended from a string of cord

Why no, not all programmers are obsessed with fishing. Here we are using something called a metaphor. It's OK to make models of things that don't really exist. It's the idea that must be clear. You're creating images in the minds of your readers. Those images don't have to be real. Just understood easily.

A good OOP design clusters messages (methods) around data (state) so that the reactions to those messages can vary depending on that data. If doing that models some real world thing, spiffy. If not, oh well. So long as it makes sense to the reader, it's fine.

Now sure, you could think of it like this:

festive letters suspended from a string read "LETS DO THINGS!"

but if you think this has to exist in the real world before you can make use of the metaphor, well your programming career is going to involve lots of arts and crafts.

candied_orange
  • 1
    A pretty string of pictures... – Zev Spitz Apr 17 '18 at 04:30
  • At this point I'm not sure "string" is even metaphorical. It just has a specific meaning in the programming domain, as do words like `class` and `table` and `column`... – Kyralessa May 06 '18 at 13:16
  • @Kyralessa you either teach the newbie the metaphor or you let it be magic to them. Please save me from coders that believe in magic. – candied_orange May 06 '18 at 15:39
  • @candied_orange Everyone starts with magic. Understanding comes later. When you were a kid, did you have to understand that helium is lighter than other gases in order to understand that if you let go of your balloon, it'll fly away forever? It's OK to start with magic and gradually move toward understanding. – Kyralessa Dec 04 '20 at 15:33
  • @Kyralessa don’t conflate belief with wonder. – candied_orange Dec 04 '20 at 15:46
7

Beware! Nowhere does SOLID say a class should only "do one thing". If that was the case, classes would only ever have a single method, and there wouldn't really be a difference between classes and functions.

SOLID says a class should represent a single responsibility. These are kind of like the responsibilities of people in a team: the driver, the lawyer, the pickpocket, the graphic designer, etc. Each of these people can perform multiple (related) tasks, all pertaining to a single responsibility.

The point of this is - if there is a change in the requirements, you ideally only need to modify a single class. This just makes the code easier to understand, easier to modify and reduces risk.

There is no rule that an object should represent "a real thing". This is just cargo-cult lore, since OO was initially invented for use in simulations. But your program is not a simulation (few modern OO applications are), so this rule does not apply. As long as each class has a well-defined responsibility, you should be fine.

If a class really only has a single method and the class does not have any state, you could consider making it a stand-alone function. This is certainly fine and follows the KISS and YAGNI principles - no need to make a class if you can solve it with a function. On the other hand, if you have reason to believe you might need internal state or multiple implementations, you might as well make it a class up-front. You will have to use your best judgment here.

JacquesB
  • +1 for "no need to make a class if you can solve it with a function". Sometimes somebody needs to speak the truth. – tchrist Apr 17 '18 at 11:43
5

Is it correct to create classes that do not represent one thing but do one thing?

In general that is OK.

Without a little bit more specific description what the FeatureExtractor class is supposed to do exactly it is hard to tell.

Anyway, even if the FeatureExtractor exposes only a public extract_features() function, I could think of configuring it with a Strategy class, which determines how exactly the extraction should be done.

Another example is a class with a Template function.

And there are more Behavioral Design Patterns, which are based on class models.


Since you added some code for clarification:

And if extract_features() would look like that: is it worth to create a special class to hold that method?

The line

 sent = SentimentAnalyser()

is exactly what I meant by configuring a class with a Strategy.

If you have an interface for that SentimentAnalyser class, you can pass it to the FeatureExtractor class at its point of construction, instead of directly coupling to that specific implementation in your function.
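A rough sketch of that constructor injection follows. The `SentimentSource` protocol, the toy `KeywordSentiment` strategy, and the list-of-strings input are all assumptions made for illustration; the question's real `SentimentAnalyser` and DataFrame handling are not shown in the source.

```python
from typing import Protocol


class SentimentSource(Protocol):
    """Hypothetical interface: anything with a get_sentiment method."""

    def get_sentiment(self, text: str) -> float: ...


class KeywordSentiment:
    """Toy strategy, made up for illustration only."""

    def get_sentiment(self, text: str) -> float:
        return 1.0 if "good" in text.lower() else 0.0


class FeatureExtractor:
    def __init__(self, sentiment: SentimentSource):
        # The strategy is injected rather than constructed internally.
        self._sentiment = sentiment

    def extract_features(self, texts):
        return [
            {"Text": text, "sentiment": self._sentiment.get_sentiment(text)}
            for text in texts
        ]


extractor = FeatureExtractor(KeywordSentiment())
features = extractor.extract_features(["Good stuff", "meh"])
print(features)
```

Swapping in a different analyser then means passing a different object to the constructor. The same decoupling is also possible without a class, by giving a plain `extract_features` function a `get_sentiment` parameter.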

πάντα ῥεῖ
  • 2
    I don't see a reason to add complexity (a `FeatureExtractor` class) just to introduce even more complexity (an interface for the `SentimentAnalyser` class). If decoupling is desirable, then the `extract_features` can take the `get_sentiment` function as an argument (the `load` call seems to be independent of the function and only called for its effects). Also note that Python doesn't have/encourage interfaces. – Warbo Apr 16 '18 at 00:28
  • 1
    @warbo - even if you supply the function as an argument, by making it a function, you restrict potential implementations to ones that will fit into the format of a function, but if there is a need to manage persistent state between one invocation and the next (e.g. a `CacheingFeatureExtractor` or a `TimeSeriesDependentFeatureExtractor`) then an object would be a much better fit. Just because there's no need for an object *currently* doesn't mean there never will be. – Jules Apr 16 '18 at 08:56
  • 3
    @Jules Firstly, You Ain't Gonna Need It (YAGNI); secondly, Python functions can reference persistent state (closures) if you need it (you ain't gonna); thirdly, using a function doesn't restrict anything, since any object with a `__call__` method will be compatible if you need it (you ain't gonna); fourthly, by adding a wrapper like `FeatureExtractor` you're making the code incompatible with all other code ever written (unless you provide a `__call__` method, in which case a function would clearly be simpler). – Warbo Apr 16 '18 at 11:49
0

Patterns and all the fancy language/concepts aside: what you have stumbled across is a Job or a Batch Process.

At the end of the day, even a pure OOP program needs to somehow be driven by something, to actually perform work; there must be an entry point somehow. In the MVC pattern, for example, the "C"ontroller receives click etc. events from the GUI and then orchestrates the other components. In classic command line tools, a "main" function would do the same.

Is it correct to create classes that do not represent one thing but do one thing?

Your class represents an entity that does something and orchestrates everything else. You can name it Controller, Job, Main or whatever comes to mind.

And if extract_features() would look like that: is it worth to create a special class to hold that method?

That depends on circumstances (and I'm not familiar with the usual way this is done in Python). If this is just a small one-shot command line tool, then a method instead of a class should be fine. The first version of your program can get away with a method, for sure. If, later, you find that you end up with dozens of such methods, maybe even with global variables mixed in, then it's time to refactor into classes.

AnoE
  • 3
    Note that in scenarios like this, calling standalone procedures "methods" can be confusing. Most languages, including Python, call them "functions". "Methods" are functions/procedures which are bound to a particular instance of a class, which is the opposite of your usage of the term :) – Warbo Apr 16 '18 at 00:31
  • True enough, @Warbo. Or we could call them procedure or defun or sub or ...; and they may be class methods (sic) not related to an instance. I hope the gentle reader will be able to abstract away the intended meaning. :) – AnoE Apr 16 '18 at 05:57
  • @Warbo that's good to know! Most learning material I've come across states that the terms function and method are interchangeable, and that it's merely a language-dependent preference. – Dom Apr 16 '18 at 19:23
  • @Dom In *general* a ("pure") "function" is a mapping from input values to output values; a "procedure" is a function which can also cause effects (e.g. deleting a file); both are statically dispatched (i.e. looked up in the lexical scope). A "method" is a function or (usually) procedure which is dynamically dispatched (looked up) from a value (called an "object"), which is automatically bound to an (implicit `this` or explicit `self`) argument of the method. An object's methods are mutually "openly" recursive, so replacing `foo` causes all `self.foo` calls to use this replacement. – Warbo Apr 17 '18 at 13:43
0

We can think of OOP as modelling the behaviour of a system. Note that the system doesn't have to exist in the 'real world', although real-world metaphors can sometimes be useful (e.g. "pipelines", "factories", etc.).

If our desired system is too complicated to model all at once, we can break it down into smaller pieces and model those (the "problem domain"), which may involve breaking down further, and so on until we get to pieces whose behaviour matches (more or less) that of some built-in language object like a number, a string, a list, etc.

Once we have those simple pieces, we can combine them together to describe the behaviour of larger pieces, which we can combine together into even larger pieces, and so on until we can describe all of the components of the domain that are needed for a whole system.

It is this "combining together" phase where we might write some classes. We write classes when there isn't an existing object which behaves in the way we want. For example, our domain might contain "foos", collections of foos called "bars", and collections of bars called "bazs". We might notice that foos are simple enough to model with strings, so we do that. We find that bars require their contents to obey some particular constraint which doesn't match anything Python provides, in which case we might write a new class to enforce this constraint. Perhaps bazs have no such peculiarities, so we can just represent them with a list.

Note that we could write a new class for every one of those components (foos, bars and bazs), but we don't need to if there's already something with the correct behaviour. In particular, for a class to be useful it needs to 'provide' something (data, methods, constants, subclasses, etc.), so even if we have many layers of custom classes we must eventually use some built-in feature; for example, if we wrote a new class for foos it would probably just contain a string, so why not forget the foo class and have the bar class contain those strings instead? Keep in mind that classes are also a built-in object, they're just a particularly flexible one.
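To make the foo/bar/baz example concrete, here is a minimal sketch. The "all foos must be lowercase" rule is an invented constraint, standing in for whatever peculiarity bars might have in a real domain.

```python
class Bar:
    """A collection of foos (plain strings) that must all be lowercase.

    The lowercase rule is a made-up constraint for illustration.
    """

    def __init__(self, foos=()):
        self._foos = []
        for foo in foos:
            self.add(foo)

    def add(self, foo):
        if foo != foo.lower():
            raise ValueError(f"foo must be lowercase: {foo!r}")
        self._foos.append(foo)

    def __iter__(self):
        return iter(self._foos)

    def __len__(self):
        return len(self._foos)


# Foos are just strings, and a baz needs no special behaviour,
# so a built-in list of bars is enough.
baz = [Bar(["a", "b"]), Bar(["c"])]
print([list(bar) for bar in baz])
```

Only the middle layer gets a class, because it is the only one whose behaviour no built-in object already provides.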

Once we have our domain model, we can take some particular instances of those pieces and arrange them into a "simulation" of the particular system that we want to model (e.g. "a machine learning system for ...").

Once we have this simulation, we can run it and hey presto, we have a working (simulation of a) machine learning system for ... (or whatever else we were modelling).


Now, in your particular situation you're trying to model the behaviour of a "feature extractor" component. The question is, are there any built-in objects which behave like a "feature extractor", or will you need to break it up into simpler things? It looks like feature extractors behave very much like function objects, so I think you'd be fine to use those as your model.


One thing to keep in mind when learning about these sorts of concepts is that different languages can provide different built-in features and objects (and, of course, some don't even use terminology like "objects"!). Hence solutions which make sense in one language might be less useful in another (this can even apply to different versions of the same language!).

Historically, a lot of the OOP literature (especially "design patterns") has focused on Java, which is quite different from Python. For example, Java classes are not objects, Java didn't have function objects until very recently, Java has strict type checking (which encourages interfaces and subclassing) whilst Python encourages duck-typing, Java doesn't have module objects, Java integers/floats/etc. aren't objects, meta-programming/introspection in Java requires "reflection", and so on.

I'm not trying to pick on Java (as another example, a lot of OOP theory revolves around Smalltalk, which is again very different from Python), I'm just trying to point out that we must think very carefully about the context and constraints in which solutions were developed, and whether that matches the situation we're in.

In your case, a function object seems like a good choice. If you're wondering why some "best practice" guideline doesn't mention function objects as a possible solution, it might simply be because those guidelines were written for old versions of Java!

Warbo
0

Pragmatically speaking, when I have a "miscellaneous thing that does something important and should be separate", and it doesn't have a clear home, I put it in a Utilities section and use that as my naming convention, e.g. FeatureExtractionUtility.

Forget about the number of methods in a class; a single method today may need to grow to five methods tomorrow. What matters is a clear and consistent organisational structure, such as a utilities area for miscellaneous collections of functions.

Dom