1

I often have to write a function which may return an output of two kinds: "short" and "long". As an example, consider the subset sum problem: the input is a set of integers S and an integer C, and the output is the largest sum of a subset of S that is at most C. The short output is just the largest sum (an integer); the long output is the subset itself. I can implement both variants in two separate functions, e.g.:

>>> subset_sum_short([1,2,4,9], 8)
7
>>> subset_sum_long ([1,2,4,9], 8)
[1,2,4]

But this causes a lot of code duplication, since the implementation is very similar. Both functions use dynamic progamming to check all possible sums; the long version also keeps information for backtracking.

To avoid code-duplication, I can implement only the long version, since given the optimal subset, the user can easily compute its sum:

>>> subset_sum_long([1,2,4,9], 8)
7

But it is wasteful for users who only need the sum, since it requires keeping unneeded backtracking information. The backtracking information, in this case, can increase the run-time and the memory requirements of the function by a factor of 2 or even 3.

A third solution is to have a single function with a flag:

>>> subset_sum([1,2,4,9], 8, short=True)
7
>>> subset_sum([1,2,4,9], 8, short=False)
[1,2,4]

This avoids the duplication, but the code becomes cluttered with ifs. Moreover, in the future I would like to allow a third output format, for example, giving each item a name, and returning a list of names instead of their sizes:

>>> subset_sum({"a":1, "b":2, "c":4, "d":9}, 8)
["a","b","c"]

Again, the algorithm is the same, but implementing this in a single function will require even more ifs.

Is there a design pattern that allows to write clean, simple and efficient code, while still giving the user a simple way to choose the required output format?

Erel Segal-Halevi
  • 864
  • 1
  • 8
  • 13
  • 4
    Python allows returning multiple values. Why not just return both and let the caller ignore the part of the return value they're not interested in? – The Photon Feb 10 '22 at 16:27
  • See [this Q&A](https://stackoverflow.com/q/431866/1630971) on Stackexchange for ways for the caller to conveniently use only part of the returned value. – The Photon Feb 10 '22 at 16:30
  • @ThePhoton Because it is wasteful for users who only need the sum (it requires to keep unneeded backtracking information). – Erel Segal-Halevi Feb 10 '22 at 17:19
  • In that case you would need to use one of the methods that discards unneeded data. For example, select which data you want from the return value before saving it in to a variable in the calling code. Or if generating that extra data is really expensive, adding a switch to the arguments of the function that tells it what data to generate. These options are both discussed in the Q&A I linked. – The Photon Feb 10 '22 at 18:15
  • "But this causes a lot of code duplication, since the implementation is very similar" - two pieces of code that are *visually similar* may well be *functionally distinct* - you don't need to immediately DRY them out. If the functions have significant differences (one being faster, the other keeping track of additional parameters), the "duplication" may be justified. – jfaccioni Feb 11 '22 at 23:46

3 Answers3

2

The Strategy Pattern allows you to swap out implementations of methods together. The implementation differences are hidden behind method names common to each style. Unneeded work can be replaced with pass.

>>>subset_sum([1,2,4,9], 8, style)

Done this way you can sometimes hide the style of output and differences in how it's calculated from the rest of the algorithm. How well it works depends on the algorithm. While this can be made to work it often fails when a new style is introduced requiring that the algorithm be rewritten to use new style methods that didn't exist before.

Can it be applied to this algorithm? Well if you could have made it work by passing in a boolean or enum there is a refactoring to change that code to code that works with this pattern. It's called Replace Conditional with Polymorphism.

candied_orange
  • 102,279
  • 24
  • 197
  • 315
  • What exactly is the "style" argument - is it an enum? Or a function? – Erel Segal-Halevi Feb 10 '22 at 18:28
  • @ErelSegal-Halevi it's one of a set of objects that each provide different implementations of the same methods you need. The implementations are exchanged based on the style object passed in. The implementation can be as complex as you need or as simple as `pass`. – candied_orange Feb 10 '22 at 18:31
1

Writing similar code twice is not problematic if the duplication is functionally justifiable. That is, these two functions do different things--which is the issue you seem to be hung up on--that's okay! It's still DRY if there's no reasonable room for abstraction. The extra LOC are fully justified by the (apparently very critical) performance advantage of have two distinct functions. This is good (and pythonic) practice, for sure. However, you could maybe look for commonalities between the two functions and refactor those lines as standalone functions, and write them in cython to boot.

mck68w
  • 11
  • 2
0

Is there a design pattern that allows to write clean, simple and efficient code, while still giving the user a simple way to choose the required output format?

Sure, it's called Model View Controller. The idea is to decouple how you present output from your state. Here's a simple example:

>>>sum_of_set( subset_sum_long([1,2,4,9], 8) )

7

If you're looking at that and thinking that this is the late summing you wanted to avoid you're right, kinda, we could implement this that way. Or we could return something from subset_sum_long() that isn't simply a set. We could return a data structure that has a place to accumulate the sum and let sum_of_set() pluck it off and return it. Completely up to you.

This also solves your third output format issue. You can create a data structure that holds the needed information that allows for all three output formats. Then you just need a way to choose which output you want to express.

All of this avoids duplicating code. It also avoids passing in control flags. But it does complicate the algorithm as more and more things get added to the data structure. Where you decide to draw the line here is up to you. But not having to decide exactly how to present the results until after the data structure is created takes a lot of the pressure off since all it needs is the data. No presentation information.

candied_orange
  • 102,279
  • 24
  • 197
  • 315
  • A user who only needs the sum, still needs to run the long version. This is wasteful, since the long version has to keep extra backtracking information, which may take a lot of time and memory. It may double or even triple both the time and the memory requirements. – Erel Segal-Halevi Feb 10 '22 at 17:22