12

In Python I often see functions with a lot of arguments. For example:

def translate(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p):
    # some code
    return x, y, z

I like this pattern in some cases. I think it makes a ton of sense in library type situations where the variables are optional keyword arguments (e.g. pd.DataFrame declarations).

However, I also see instances (locally developed custom functions) where all of the inputs are essentially mandatory. In these cases the function is typically called elsewhere in the program and will be formatted something along the lines of:

x, y, z = translate(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p)

I dislike several things about this:

  • Readability - the function call ends up being very long, which makes it hard to read/digest and can sometimes obstruct the readability of the script it is sitting in
  • Re-use of variable names - the local variable a in translate() is not the same entity as the variable a in the script
  • Jumbled variables - it is very easy to accidentally write translate(b, a, c, d, e, f, g, h, i, j, k, l, m, n, o, p) because most variable names don't have an inherent/obvious order. This can be avoided by specifying the keywords but this makes the function call even longer. Imagine translate(a=a, b=b, c=c, ...) with real variable names.

To resolve/avoid the above problem I started to use dictionaries to pass large numbers of variables between functions. Then I noticed that I could also use the dictionaries to return variables....

Using the above example:

def translate(dict_of_values):
    # some code
    dict_of_values['x'] = something
    dict_of_values['y'] = something_else
    dict_of_values['z'] = something_other

# and if I want to call the function I state:
some_dict['a'] = 1  # also populate values for b, c, d, ..., p
translate(some_dict)

My question is as follows:

  1. Does this coding pattern have a name?
  2. Will other programmers easily understand the format?
  3. What problems am I introducing that will bite me in the future?
  4. Is there a better alternative, assuming that I can't avoid functions that have a large number of mandatory variables?

I understand that I could be using **kwargs by defining the function as translate(**dict_of_values): and then calling translate(**some_dict), but I can't see any particular advantage to doing so. If anything, it would make the code slightly more verbose, as I'd have to add return and assignment statements to achieve the same end point.

P. Hopkinson
    A lot of people would argue that you are solving the wrong problem. Instead of the problem "how can I avoid confusing the many parameters", you should be solving "Why do I have so many parameters in the first place?" IOW, they would challenge the underlying assumption in your question #4 – Jörg W Mittag Oct 27 '21 at 12:12
  • I'm writing simulations/models that inherently require a lot of disparate parameters. – P. Hopkinson Oct 27 '21 at 12:19
  • 6
    In a statically typed language, the suggestion would to define a [parameter object](https://refactoring.guru/introduce-parameter-object). A dictionary is one kind of parameter object – Caleth Oct 27 '21 at 13:18
  • 9
    @P.Hopkinson Even if what you're doing in that function "inherently requires" all the parameters that's currently being passed to it, that doesn't necessarily mean that you can't split up the function, combine some parameters that fit together into objects, put the function in a class and pass some parameters in the constructor or something else. Although with modelling things it's quite common to use just DataFrames and Series since the benefit of explicitly defining all the parameters usually isn't significant enough to make it worth it (although it depends what those parameters are). – NotThatGuy Oct 27 '21 at 21:24
  • 3
Look at what your parameters represent and how they can be combined into a single object. Perhaps a, b and c are coordinates for something - so create a Coordinate object and combine the 3 parameters into one. In particular look for combinations of parameters which are always used together in your code - that's a sign they could be made properties of some other object. – David Waterworth Oct 28 '21 at 03:09
  • 3
Also, the dictionary pattern is somewhat common in machine learning, for example where you're building a pipeline. For example most `allennlp` modules return a dictionary containing all the various outputs computed by the module. But most modules accept specific arguments along with kwargs. This makes it easy to chain the output from one module to the input of another - `def f(x, y, **kwargs):` then `outputs = f(**inputs)` where inputs is a dictionary from the previous step. Calling f with inputs will unpack x and y from the dict and put the rest into kwargs, which the module may ignore. – David Waterworth Oct 28 '21 at 03:25
  • 1
    @DavidWaterworth so you can use the **kwargs to absorb excess dictionary elements a little bit like a sponge/bin? Interesting! I knew you could unpack dictionaries as keyword arguments and knew you could declare an arbitrary number of kwargs but hadn't joined the dots between those two ideas. – P. Hopkinson Oct 28 '21 at 09:02
  • 1
    I love how the question makes total sense (and describes a real problem) if you take it outside the context of programming. – Michael Borgwardt Oct 28 '21 at 10:02

6 Answers

27

Does this coding pattern have a name?

  1. This is a refactoring called "Introduce Parameter Object". A dictionary is used here as a "poor man's DTO" (data transfer object). Note there are other, less error-prone ways to introduce DTOs in Python, such as dataclasses, named tuples or typed dicts.

Will other programmers easily understand the format?

  1. Surely not if you call those DTOs just dict_of_values or the keys just x, y and z. But the same holds for your original function's signature when the parameters are just called a, b, c, d. My point is: it is not the fact of using a dictionary as a DTO that makes the difference between "easy" and "hard to understand", but the naming, the commenting, and the separation into easy-to-grasp units.

What problems am I introducing that will bite me in the future?

  1. When introducing DTOs, make sure their names give a clear, readable indication of what those DTOs represent. Otherwise your growing code will end up an unreadable mess. Using an untyped dictionary as a DTO has the additional problem that a typo does not produce an immediate error; such bugs will only manifest themselves later, at runtime.

Is there a better alternative, assuming that I can't avoid functions that have a large number of mandatory variables?

  1. See #1. Whether those alternatives are really "better" may depend on the specific context; there are always trade-offs involved. Some of those alternatives require more code, some don't work with older Python versions, and some require additional dependencies. You have to decide for yourself which variant gives you the best cost/benefit ratio.
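To illustrate the first point, here is a minimal sketch of a dataclass-based DTO. The class and field names are hypothetical stand-ins; the real fields would be the sixteen inputs and three outputs from the question.

```python
from dataclasses import dataclass

# Hypothetical names for illustration only
@dataclass
class TranslateInputs:
    a: float
    b: float
    c: float

@dataclass
class TranslateResult:
    x: float
    y: float
    z: float

def translate(inputs: TranslateInputs) -> TranslateResult:
    # A typo like inputs.q fails fast with AttributeError,
    # unlike a missing dictionary key discovered deep in the run.
    return TranslateResult(x=inputs.a * 2, y=inputs.b * 2, z=inputs.c * 2)
```

Unlike a plain dictionary, the fields, their types and their defaults are declared in one place, and tools can check them.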
Doc Brown
  • With regard to #3, pydantic has nice validation features. Even if the OP isn't serializing data, that alone might make it worth using. There's a decent description in your second link. – JimmyJames Oct 27 '21 at 14:47
  • 1
    Thanks! That gives me plenty of food for thought and some good hints about what tradeoffs I need to consider. – P. Hopkinson Oct 27 '21 at 15:28
  • 1
What I don't like about passing dictionaries is that the docstring can get out of sync with the parameters in the dictionary. When you explicitly define each parameter, modern IDEs remind you to update the docstring. You would lose that if you use a dictionary. How do DTOs fare in that regard? – Sia Rezaei Jul 30 '22 at 00:50
10

The main problem with using dictionaries as arguments or return values is that they are not well defined. The keys and values of a dictionary, especially in loosely-typed languages, mean something in your problem domain, but are not declared. Other programmers will need the source code for the function in order to call it. This breaks the abstraction the function provides, because consumers of the abstraction need to care about the implementation.

Your concern about getting the argument order wrong is legitimate. There are a few solutions to this problem. You mentioned one already, which is to use named arguments. When the line of code becomes too long, consider breaking the line into multiple lines. My first reaction is not to refactor the code into a dictionary, but to use named arguments on multiple lines:

translate(a = a,
          b = b,
          c = c,
          d = d,
          ...)

Still, mistakes can occur with named arguments when consecutive args are the same type. Adding type hints can help: def translate(a: int, b: int, c: str...), but that will not fix everything.

When arguments get so verbose that named params and type hints do not help, consider defining a class with attributes to encapsulate the arguments. This gives you the flexibility to define a constructor for the required parameters, and allows callers to set attributes individually for optional params.
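A minimal sketch of such a parameter class (the names are illustrative, not from the original code):

```python
class TranslateParams:
    """Bundles the arguments to translate(); names are illustrative."""
    def __init__(self, a, b, c):
        # Required parameters go through the constructor
        self.a = a
        self.b = b
        self.c = c
        # Optional parameter with a sensible default
        self.scale = 1.0

params = TranslateParams(a=1, b=2, c=3)
params.scale = 2.0   # callers override optional attributes as needed
```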

Either way, be sure to add pydoc comments to the function or parameter class to help guide callers of the function.

Greg Burghardt
  • Thanks! Some nice pointers to consider and reading your thought process has been very helpful for sanity checking my own. – P. Hopkinson Oct 27 '21 at 15:38
  • Wanted to say this. The multiline pattern is super common when using APIs; if you still see calls with too many (>6 or so) parameters, refactor: make separate functions for common use cases, pack parameters into objects and so on. – Lodinn Oct 28 '21 at 08:41
3

Nobody has mentioned Python's ability to unpack dictionaries to named function parameters. This gives you the best of both worlds. You can even combine dictionary unpacking with standard unnamed and named parameters

https://docs.python.org/3/tutorial/controlflow.html#tut-unpacking-arguments

In brief, the ** operator can be used to assign keys in a dictionary to named function parameters

def fn(a, b, c):
    print("a = {}, b = {}, c = {}".format(a, b, c)) 

d = {"a": 1, "b": 2, "c": 3}
fn(**d)
RUOK_
  • 4
    Nobody? The OP mentioned it! – JDługosz Oct 27 '21 at 22:07
  • @JDługosz - I mentioned kwargs but didn't mention this particular feature – P. Hopkinson Oct 27 '21 at 22:18
  • @P.Hopkinson I thought that was exactly what it referred to. I can't imagine using it when the parameters are not matched by name from the dictionary! What would it do, just put them in random order and hope it matches what the function expects? – JDługosz Oct 27 '21 at 22:32
I think it assembles the keyword arguments into a dict which is accessible from within the function. So: `def fn(**kwargs): print(kwargs)` followed by `fn(a=1, b=2, c=3)` would print `{'a': 1, 'b': 2, 'c': 3}`... bit niche but potentially useful. – P. Hopkinson Oct 27 '21 at 23:04
It's usually used for optional arguments: you can have a function f(x,y,**kwargs) and call f(x,y,z=1), then in your function `z = kwargs.pop('z', None)`. A common use case is when your function calls some other function and you want to let your caller pass additional arguments on to the inner function without necessarily having to enumerate all of them. – David Waterworth Oct 28 '21 at 03:16
0

Yes, generally it's reasonable. It's done often in other languages, such as JavaScript, as well.

  1. I don't know.
  2. If we're talking about Python, most programmers are used to this sort of thing. I think that if done properly it would be understandable. But there are better ways of doing it.
  3. The main problem I see is that it becomes hard to call the function, since the arguments are not defined anywhere. This too can be solved in various ways. With Python you often have this problem for normal parameters as well, since the data type can be unclear.
  4. Let's try:

A more pythonic way would be to use unnamed args or keyword arguments. This way the function calls look native and the caller has the choice:

def translate(*args, **kwargs):
    ...

translate(1, 2, 3)
translate(a=1, b=2, c=3)
# People can still use your approach like this:
params = dict(a=1, b=2, ...)
translate(**params)

You could also leave the parameters named and achieve the same result (from the caller point of view). This is better because arguments are specified (possibly the data type as well with annotations):

def translate(a: float, b: float, c: float):
    ...

translate(1, 2, 3)
translate(a=1, b=2, c=3)
# People can still use your approach like this:
params = dict(a=1, b=2, ...)
translate(**params)

Another possible solution to the argument specification problem is to use a TypedDict to specify the format of your dictionary:

from typing import TypedDict

class TranslateParams(TypedDict):
    a: float
    b: float
    c: float


def translate(params: TranslateParams):
    ...

translate({"a": 1, "b": 2, "c": 3})

Still better would be to refactor your code to avoid having all these parameters. For instance, in your example it seems you are trying to apply a translation by passing the matrix elements as individual parameters. It would be cleaner to use a data structure to hold the matrix. One solution would be to use a list, another a class:

def translate(matrix: list[float]):
    ...

translate([1, 2, 3])

class Matrix:
    def __init__(self, values: list[float]):
        ...

def translate(matrix: Matrix):
    ...

translate(Matrix([1, 2, 3]))

This applies also for other function types, you could for instance use a configuration object:

from dataclasses import dataclass

@dataclass
class TranslateOptions:
    resampling_method: str
    scale_factor: float = 1

def translate(options: TranslateOptions):
    ...

translate(TranslateOptions(resampling_method="bilinear"))

There would also be other solutions such as using the builder pattern.

0

If you are using a dictionary naively, there is no reason not to use keyword args. Your main reason for discarding it is that it makes your function call longer, but that isn't a fair comparison. While the function call may be a few characters longer, the dictionary approach makes it many statements longer. It is not possible to construct a dictionary of arguments in fewer characters than the keyword arguments syntax.

There are times where it can be useful. If you call a function in 50 places, you can have all of the arguments packed into a dictionary up front. Doesn't happen often though. It can also be useful if you need to pack some logic into a few of your arguments. You can isolate those and make it obvious.

In the few times where using a dictionary is useful, I highly recommend using the `**` unpacking notation. You mention there's no advantage. And, agreed, there are few. But there are two that I see:

  • Give the caller a choice as to whether to use a dictionary or keyword arguments. Other people may come to a different conclusion than you did.
  • Adding requirements. You can do f(a=1, b=2, **rest) to start with the rest dictionary, and then add arguments to it. This pattern can be quite useful because often we find ourselves varying one or two arguments, and the rest are common.

In particular, I find the latter argument extremely useful when using matplotlib to plot things. Often I need to set up some formatting arguments which will be used for 8 or 10 elements on the plot. I pack those up in a dictionary, and then use that dictionary on every plotting function call with the `**rest` pattern.
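A minimal sketch of that second pattern, with made-up names (`plot_line` just stands in for a real plotting call):

```python
# Shared keyword arguments, packed up once
common_style = {"color": "gray", "linewidth": 0.8}

def plot_line(label, color="black", linewidth=1.0):
    # Stand-in for a real plotting function; echoes what it received
    return (label, color, linewidth)

# Vary the first argument per call, reuse the rest via ** unpacking
first = plot_line("signal", **common_style)
second = plot_line("noise", **common_style)
```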

Cort Ammon
0

"Python or otherwise," when I see this "list of individual arguments, just waiting to be have been specified in the wrong order in some isolated piece of source-code that all of us have overlooked," I want to insist that it needs to instead be: "a structure, a class, a whatever-it-is that the language can verify."

Because, ALL of us by now have been hammered by something as accidental as this, somewhere buried among "all those repetitions within the source code, except for this one":

x, y, z = translate(a, b, d, c, e, f, g, h, i, j, k, l, m, n, o, p)
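One sketch of such a language-verifiable structure, using a NamedTuple (the names are illustrative):

```python
from typing import NamedTuple

# A structure the language (and any type checker) can verify,
# instead of a flat list of positional arguments
class Vec3(NamedTuple):
    x: float
    y: float
    z: float

def translate(point: Vec3, offset: Vec3) -> Vec3:
    return Vec3(point.x + offset.x, point.y + offset.y, point.z + offset.z)

moved = translate(Vec3(1, 2, 3), Vec3(0, 0, 1))
```

A swapped pair of fields now has to be swapped at the single point where the Vec3 is constructed, which is far easier to spot.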

Mike Robinson