
In my machine learning pipeline, I have all the arguments collected into a dictionary.

```python
args = {'save_model': True,
        'learning_rate': 0.01,
        'batch_size': 4,
        'model': 'my_model',
        'momentum': 0.9,
        'random_brighness': 0.5,
        'random_flipping': 0.5,
        ...}
```

Then I have a bunch of functions that take the entire args dictionary as an input. Each of these only uses a small subset of all the arguments. Is there anything wrong with this design?

```python
model = get_model(args)
data = get_data(args)
transformed_data = transform_data(data, args)
```

From the perspective of a function, it looks like this:

```python
def get_trainer(args):
    loss_function = args['loss_function']
    optimizer = args['optimizer']
    class_weights = args['class_weights']
    ...
```

versus:

```python
def get_trainer(loss_function, optimizer, class_weights):
    ...
```
jss367
  • Why not `def get_trainer(loss_function, optimizer, class_weights, **kwargs):` then `data = get_data(**args)`? – jonrsharpe Jan 22 '21 at 18:46
  • I'm not opposed to that - I'm just asking. Is that a better approach? – jss367 Jan 22 '21 at 18:50
  • 1
    Sounds like you have invented global variables without using globals. – Doc Brown Jan 22 '21 at 20:41
  • 1
    To build on Doc Brown's answer, you might want to read this: [Are global variables bad?](https://stackoverflow.com/a/485020/2791540). *The problem with global variables is that since every function has access to these, it becomes increasingly hard to figure out which functions actually read and write these variables.* – John Wu Jan 22 '21 at 23:40
  • 1
    Also, [Why is it good programming practice to limit scope?](https://softwareengineering.stackexchange.com/questions/307346/why-is-it-good-programming-practice-to-limit-scope) and [Why is global state so evil?](https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil) – John Wu Jan 22 '21 at 23:44

1 Answer


The main problem I have with that choice is that it removes one of the benefits of the type system: the ability to validate at compile/design time rather than at runtime. When all your methods take a dictionary, they (a) give the reader of the code no information about which parameters they need, and (b) prevent the compiler or interpreter from validating function calls.

With explicit arguments, if you later add or remove a needed parameter, you, the interpreter, and your IDE can easily find the call sites that no longer match the signature. With a generic dict, you have to remember and check every call site yourself, or wait for a runtime error.

This is equivalent, in more statically typed languages, to declaring every argument as `object` or `Any`: it saves you the effort of defining types, but pushes errors and validation to runtime, which is always costlier.
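As a minimal illustration of where the failure surfaces in each style (the function names and the deliberate `loss_fucntion` typo are hypothetical):

```python
def get_trainer_dict(args):
    return args['loss_function']   # missing/misspelled key -> KeyError deep inside

def get_trainer_explicit(loss_function, optimizer):
    return loss_function           # bad call -> TypeError at the call boundary

# Deliberately misspelled key to provoke both failure modes.
args = {'loss_fucntion': 'cross_entropy', 'optimizer': 'sgd'}

try:
    get_trainer_dict(args)         # fails only when the key is actually read
except KeyError as e:
    print('KeyError inside the function:', e)

try:
    get_trainer_explicit(**args)   # fails immediately, naming the bad argument
except TypeError as e:
    print('TypeError at the call site:', e)
```

Both versions fail at runtime in Python, but the explicit one fails at the call boundary with a message naming the unexpected keyword, and a linter or IDE can flag it before the code ever runs.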

Avner Shahar-Kashtan
  • I do the checking at runtime. Just a wrapper that falls into the debugger and lists all keys that were present that the function didn’t ask about. – gnasher729 Jan 22 '21 at 20:58
  • For example, if a function inquired about “random_brightness” and handled its absence correctly, my wrapper would fall into the debugger and tell the user about the unused “random_brighness” key. – gnasher729 Jan 23 '21 at 15:02
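A sketch of the kind of wrapper gnasher729 describes might look like this (a hypothetical reconstruction, not his actual code — here it just reports the unused keys instead of dropping into the debugger):

```python
# A dict subclass that records which keys a function actually looked at,
# so keys that were present but never read (e.g. misspelled ones) can be
# reported afterwards.
class TrackedArgs(dict):
    def __init__(self, *a, **kw):
        super().__init__(*a, **kw)
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.accessed.add(key)
        return super().get(key, default)

def unused_keys(args):
    # Keys present in the dict that no function ever asked about.
    return set(args) - args.accessed

# The misspelled 'random_brighness' key from the question alongside the
# correctly spelled one the function actually looks up.
args = TrackedArgs(random_brightness=0.5, random_brighness=0.5)

def get_transforms(args):
    return args.get('random_brightness')

get_transforms(args)
print(unused_keys(args))   # the misspelled key was never read
```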