41

This question is somewhat language-agnostic, but not completely, since Object Oriented Programming (OOP) is different in, for example, Java, which doesn't have first-class functions, than it is in Python.

In other words, I feel less guilty for creating unnecessary classes in a language like Java, but I feel like there might be a better way in the less boilerplate-y languages like Python.

My program needs to do a relatively complex operation a number of times. That operation requires a lot of "bookkeeping", has to create and delete some temporary files, etc.

That's why it also needs to call a lot of other "suboperations" - putting everything into one huge method isn't very nice, modular, readable, etc.

Now these are approaches that come to my mind:

1. Make a class that has one public method only and keeps the internal state needed for the suboperations in its instance variables.

It would look something like this:

class Thing:

    def __init__(self, var1, var2):
        self.var1 = var1
        self.var2 = var2
        self.var3 = []

    def the_public_method(self, param1, param2):
        self.var4 = param1
        self.var5 = param2
        self.var6 = param1 + param2 * self.var1
        self.__suboperation1()
        self.__suboperation2()
        self.__suboperation3()


    def __suboperation1(self):
        # Do something with self.var1, self.var2, self.var6
        # Do something with the result and self.var3
        # self.var7 = something
        # ...
        self.__suboperation4()
        self.__suboperation5()
        # ...

    def suboperation2(self):
        # Uses self.var1 and self.var3

#    ...
#    etc.

The problem I see with this approach is that the state of this class makes sense only internally, and it can't do anything with its instances except call their only public method.

# Make a thing object
thing = Thing(1,2)

# Call the only method you can call
thing.the_public_method(3,4)

# You don't need thing anymore

2. Make a bunch of functions without a class and pass the various internally needed variables between them (as arguments).

The problem I see with this is that I have to pass a lot of variables between functions. Also, the functions would be closely related to each other, but they wouldn't be grouped together.

3. Like 2. but make the state variables global instead of passing them.

This would be no good at all, since I have to do the operation more than once, with different input.

Is there a fourth, better, approach? If not, which one of these approaches would be better, and why? Is there something I'm missing?

Aaron Hall
  • 5,895
  • 4
  • 25
  • 47
iCanLearn
  • 1,311
  • 3
  • 13
  • 17
  • 10
    "The problem I see with this approach is that the state of this class makes sense only internally, and can't do anything with its instances except call their only public method." You have articulated this as a problem, but not why you think it is. – Patrick Maupin Sep 12 '15 at 15:07
  • @Patrick Maupin - You're right. And I don't really know, that's the problem. It just feels like I'm using classes for a thing for which something else should be used, and there are a lot of things in Python I haven't explored yet so I thought maybe someone would suggest something more suitable. – iCanLearn Sep 12 '15 at 15:28
  • Maybe it's about being clear about what I'm trying to do. Like, I don't see anything particulary wrong with using ordinary classes in place of enums in Java, but still there are things for which enums are more natural. So this question is really about whether there's a more natural approach to what I'm trying to do. – iCanLearn Sep 12 '15 at 15:30
  • Personally, I would try it out, and if I didn't find myself creating a bunch of globals or writing too much boilerplate, I would be happy. – Patrick Maupin Sep 12 '15 at 15:37
  • 3
    possible duplicate of [What is the real responsibility of a class?](http://programmers.stackexchange.com/questions/220230/what-is-the-real-responsibility-of-a-class) – gnat Sep 12 '15 at 16:05
  • If the operation makes more sense as a single function, then make the class anyway, then create a wrapper function that creates an instance and calls the public method. – user253751 Sep 13 '15 at 02:32
  • Related: [Don't Get Class Happy](http://zxq9.com/archives/877). Your confusion is Nature telling you that you really want to write a small module of *functions*, not an actual *class*. You are trying to *do* something, not *instantiate* a thing. In their own module they will be in their own namespace, be semantically linked together, and understandable on their own without introducing the lie of long-lived state. Looked at another way, classes are merely a funky syntax over dispatching closures -- is that the solution you would have reached for in a functional language? – zxq9 Sep 13 '15 at 11:01
  • 1
    If you just disperse your existing code into methods and instance variables and change nothing about its structure then you win nothing but lose clarity. (1) is a bad idea. – usr Sep 13 '15 at 11:43
  • *Extract Method Object* for the win! – Michael Deardeuff Sep 14 '15 at 03:14
  • I tried replying but couldn't so here it is in a comment: Option 4: Use a closure. If you ever find yourself writing a class with only one public method, you probably want a closure instead. I like this video: https://www.youtube.com/watch?v=o9pEzgHorH0 – Átila Neves Sep 14 '15 at 09:53
  • You are in Python. So you can just make the object look like a function by implementing `__call__`. That will do away with the single public method. – Siyuan Ren Sep 16 '15 at 09:37
  • As an aside (and to clear up a misconception), Java *does* have first class functions. It's had them since Java 8. It's arguably a bit different from how some languages do it, but the end results are more or less the same. – Kat Sep 16 '15 at 18:18

6 Answers6

46
  1. Make a class that has one public method only and keeps the internal state needed for the suboperations in its instance variables.

The problem I see with this approach is that the state of this class makes sense only internally, and can't do anything with its instances except call their only public method.

Option 1 is a good example of encapsulation used correctly. You want the internal state to be hidden from outside code.

If that means your class only has one public method, then so be it. It'll be that much easier to maintain.

In OOP if you have a class that does exactly 1 thing, has a small public surface, and keeps all its internal state hidden, then you are (as Charlie Sheen would say) WINNING.

  1. Make a bunch of functions without a class and pass the various internally needed variables between them (as arguments).

The problem I see with this is that I have to pass a lot of variables between functions. Also, the functions would be closely related to each other, but wouldn't be grouped together.

Option 2 suffers from low cohesion. This will make maintenance more difficult.

  1. Like 2. but make the state variables global instead of passing them.

Option 3, like option 2, suffers from low cohesion, but much more severely!

History has shown that the convenience of global variables is outweighed by the brutal maintenance cost it brings. That's why you hear old farts like me ranting about encapsulation all the time.


The winning option is #1.

MetaFight
  • 11,549
  • 3
  • 44
  • 75
  • 6
    #1 is a pretty ugly API, though. If I decided to make a class for this, I'd probably make a single public function that delegates to the class and make the whole class private. `ThingDoer(var1, var2).do_it()` vs. `do_thing(var1, var2)`. – user2357112 Sep 12 '15 at 19:06
  • 1
    Sure, that might be a very sensible thing to do in Python. My answer was mainly about OO principles. – MetaFight Sep 12 '15 at 19:17
  • 7
    While option 1 is a clear winner, I would go one step further. Instead of using internal object state, use local variables and method parameters. This makes the public function reentrant, which is safer. –  Sep 12 '15 at 19:30
  • 4
    One extra thing you could do in extension of option 1: make the resulting class protected (adding an underscore in front of the name), and then define a module-level function `def do_thing(var1, var2): return _ThingDoer(var1, var2).run()`, to make the external API a bit prettier. – Sjoerd Job Postmus Sep 12 '15 at 19:43
  • 4
    I don't follow your reasoning for 1. The internal state is already hidden. Adding a class does not change that. I therefore don't see why you recommend (1). In fact it is not necessary to expose the existence of the class at all. – usr Sep 13 '15 at 11:44
  • To be honest, I was only looking at this from an OO perspective. The primary tool for encapsulation in my main language (C#) is the class. I'm not too familiar with Python anymore so I just assumed it was the same. If proper encapsulation can be achieved without a class, then you're absolutely right, the class isn't necessarily the best or only option. – MetaFight Sep 13 '15 at 11:59
  • @MetaFight I think usr's point was that if you have a function with some local variables, then calling that function twice doesn't have the potential for any conflicts between the two. E.g., (in C-ish) `int foo() { int x = 3; x++; return x; }` can be called at the same time by as many threads as you want. The local state `x` is encapsulated in the function. However, once you do (in Java-ish) `class Fooer { private int x; int foo() { x = 3; x++; return x; } }`, you can end up with unexpected results if the same Fooer's foo() method is called concurrently (e.g., both assignments are executed, – Joshua Taylor Sep 14 '15 at 00:06
  • 1
    then both increments, and then both methods return 5 instead of 4. So @usr's point is that the function with local variables actually probably provides *safer* encapsulation than an implementation with a class here. I haven't written any C#, but I'd expect that this kind of behavior exists there, too, unless all method/function "local" variables in C# methods/functions are actually instance fields, which seems unlikely. – Joshua Taylor Sep 14 '15 at 00:08
  • 3
    Any class that you instantiate and then immediately call a single method on and then never use again is a major code smell, to me. It's isomorphic to a simple function, only with obscured data flow (as the broken out pieced of the implementation communicate by assigning variables on the throwaway instance instead of passing parameters). If your internal functions take so many parameters that the calls are complex and unwieldy the code doesn't get *less* complicated when you hide that dataflow! – Ben Sep 14 '15 at 08:13
  • See my last comment. – MetaFight Sep 14 '15 at 08:22
  • Dude "winning" is such a cringeworthy word... even if intended to be used humorously I suppose – xji Sep 15 '15 at 04:13
  • What can I say? It just felt right at the time :) – MetaFight Sep 15 '15 at 11:24
24

I think #1 is actually a bad option.

Let's consider your function:

def the_public_method(self, param1, param2):
    self.var4 = param1
    self.var5 = param2 
    self.var6 = param1 + param2 * self.var1
    self.__suboperation1()
    self.__suboperation2()
    self.__suboperation3()

Which pieces of data does suboperation1 use? Does it collect data used by suboperation2? When you pass data around by storing it on self, I cannot tell how the pieces of functionality are related. When I look at self, some of the attributes are from the constructor, some from the call to the_public_method, and some randomly added in other places. In my opinion, its a mess.

What about number #2? Well, firstly, let's look at the second problem with it:

Also, the functions would be closely related to each other, but wouldn't be grouped together.

They would be in a module together, so they would totally be grouped together.

The problem I see with this is that I have to pass a lot of variables between functions.

In my opinion, this is good. This makes explicit the data dependencies in your algorithm. By storing them whether in global variables or on a self, you hide the dependencies and make them seem less bad, but they are still there.

Usually, when this situation arises, it means that you've not found the right way to decompose your problem. You are finding it awkward to split into multiple functions because you are trying to split it the wrong way.

Of course, without seeing your actual function its hard to guess at what would be a good suggestion. But you give a bit of hint of what you are dealing with here:

My program needs to do a relatively complex operation a number of times. That operation requires a lot of "bookkeeping", has to create and delete some temporary files etc.

Let me pick an example of something that sorta fits your description, an installer. An installer has to copy a bunch of files, but if you cancel it part way through, you need to rewind the whole process including putting back any files that you replaced. The algorithm for this looks something like:

def install_program():
    copied_files = []
    try:
        for filename in FILES_TO_COPY:
           temporary_file = create_temporary_file()
           copy(target_filename(filename), temporary_file)
           copied_files = [target_filename(filename), temporary_file)
           copy(source_filename(filename), target_filename(filename))
     except CancelledException:
        for source_file, temp_file in copied_files:
            copy(temp_file, source_file)
     else:
        for source_file, temp_file in copied_files:
            delete(temp_file)

Now multiply that logic out for having to do registry settings, program icons, etc, and you've got quite a mess.

I think that your #1 solution looks like:

class Installer:
    def install(self):
        try:
            self.copy_new_files()
        except CancellationError:
            self.restore_original_files()
        else:
            self.remove_temp_files()

This does make the overall algorithm clearer, but it hides the way the different pieces communicate.

Approach #2 looks something like:

def install_program():
    try:
       temp_files = copy_new_files()
    except CancellationError as error:
       restore_old_files(error.files_that_were_copied)
    else:
       remove_temp_files(temp_files)

Now its explicit how pieces of the data move between functions, but it is very awkward.

So how should this function be written?

def install_program():
    with FileTransactionLog() as file_transaction_log:
         copy_new_files(file_transaction_log)

The FileTransactionLog object is a context manager. When copy_new_files copies a file it does it through the FileTransactionLog which handles making the temporary copy and keep track of which files have been copied. In the case of exception it copies the original files back, and in the case of success it deletes the temporary copies.

This works because we've found a more natural decomposition of the task. Previously, we were intermixing the logic about how to install the application with the logic about how to recover a cancelled install. Now the transaction log handles all of the details about temporary files and bookkeeping, and the function can focus on the basic algorithm.

I suspect that your case is in the same boat. You need to extract out the bookkeeping elements into some sort of object so that your complex task can expressed more simply and elegantly.

Winston Ewert
  • 24,732
  • 12
  • 72
  • 103
10

Since the only apparent drawback of method 1 is its suboptimal usage pattern, I think the best solution is to drive encapsulation one step further: Use a class, but also provide a free standing function, that only constructs the object, calls the method and returns:

def publicFunction(var1, var2, param1, param2)
    thing = Thing(var1, var2)
    thing.theMethod(param1, param2)

With that, you have the smallest possible interface to your code, and the class that you use internally really becomes just an implementation detail of your public function. Calling code never needs to know about your internal class.

4

On the one hand, the question is somehow language agnostic; but on the other hand, the implementation depends on the language and its paradigms. In this case it is Python, which supports multiple paradigms.

Besides your solutions, there is also a possibility, to do the operations complete stateless in a more functional way, e.g.

def outer(param1, param2):
    def inner1(param1, param2, param3):
        pass
    def inner2(param1, param2):
        pass
    return inner2(inner1(param1),param2,param3)

It all comes down to

  • readability
  • consistency
  • maintainability

But -If your codebase is OOP, it violates consistency if suddenly some parts are written in a (more) functional style.

Thomas Junk
  • 9,405
  • 2
  • 22
  • 45
4

Why Design when I can be Coding?

I offer a contrarian view to the answers I've read. From that perspective all the answers and frankly even the question itself focus on coding mechanics. But this is a design issue.

Should I create a class if my function is complex and has a lot of variables?

Yes, as it makes sense for your design. It may be a class unto itself or part of some other class or it's behavior may be distributed among classes.


Object Oriented Design is about Complexity

The point of OO is to successfully build and maintain large, complex systems by encapsulating the code in terms of the system itself. "Proper" design says everything is in some class.

OO design naturally manages complexity primarily via focused classes that adhere to the single responsibility principle. These class give structure and functionality throughout the breath and depth of the system, interact and integrate these dimensions.

Given that, it is often said that functions dangling about the system - the all too ubiquitous general utility class - is a code smell suggesting an insufficient design. I tend to agree.

radarbob
  • 5,808
  • 18
  • 31
  • *OO design naturally manages complexity* OOP naturally introduces unnecessary complexity. – Miles Rout Sep 15 '15 at 21:32
  • "Unnecessary complexity" reminds me of priceless comments from co-workers such as "oh, that's too many classes." Experience tells me that the juice is worth the squeeze. At it's best I see virtually entire classes with methods 1-3 lines long and with every class doing it's part complexity of any given method is minimized: A single _short_ LOC comparing two collections and returning the duplicates - of course there is code behind that. Don't confuse "a lot of code" to complex code. In any case, nothing is free. – radarbob Sep 15 '15 at 23:57
  • 1
    Lots of "simple" code that interacts in a complicated way is WAY worse than a small amount of complex code. – Miles Rout Sep 16 '15 at 00:00
  • 1
    The small amount of complex code has *contained complexity*. There's complexity, but only there. It doesn't leak out. When you have a whole lot of individually simple pieces that work together in a really complex and difficult to understand way that's much more confusing because there are no borders or walls to the complexity. – Miles Rout Sep 16 '15 at 00:01
3

Why not compare your needs to something that exists in the standard Python library, and then see how that is implemented?

Note, if you don't need an object, you can still define functions within functions. With Python 3 there is the new nonlocal declaration to allow you to change variables in your parent function.

You might still find it useful to have some simple private classes inside your function to implement abstractions and tidying up operations.

Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
meuh
  • 151
  • 1
  • 5
  • Thanks. And do you know anything in the standard library which sounds like what I have in the question? – iCanLearn Sep 12 '15 at 15:34
  • I'd find it difficult to quote something using nested functions, indeed I don't find `nonlocal` anywhere in my currently installed python libs. Perhaps you can take heart from `textwrap.py` which has a `TextWrapper` class, but also a `def wrap(text)` function which simply creates a `TextWrapper` instance, calls the `.wrap()` method on it and returns. So use a class, but add some convenience functions. – meuh Sep 12 '15 at 15:51