
I have a module as below:

def normalize(sub_item):
    pass

def process(sub_item):
    pass

def cleanup(sub_item):
    pass

def calculator(item):
    pass

def main(item):
    for index, sub_item in enumerate(item.sub_items):
        normalize(sub_item)
        process(sub_item)

        if index == 0:
            calculator(sub_item)

        cleanup(sub_item)

I have special logic for the first sub_item, but this logic can only be applied after the normalize step. In all cases the cleanup step will remove the resource from the system. Adding a conditional statement is easy, but it doesn't feel clean to me. Any thoughts?

radarbob
haitran
  • Possible duplicate of [How to make sure people call methods in the right order?](https://softwareengineering.stackexchange.com/questions/294455/how-to-make-sure-people-call-methods-in-the-right-order) – gnat Aug 24 '18 at 07:07
  • Conditionals are fine. What about the state design pattern? – Christophe Aug 24 '18 at 07:18
  • 2
    It all depends on what is meaning of calculator being applied on first item, but not on the rest? What makes first item special? Also, do the methods have some outside state or do they only change the items? I think the smell here is that first item is special. But nothing in the code says what that "special" is. – Euphoric Aug 24 '18 at 08:44

2 Answers


... the logic can only be applied after the normalize step and the cleanup step will remove the resource from the system.

You have rules that aren't perfectly clean, which means the code that implements them isn't going to be, either. Everything has trade-offs and there are no cut-and-dried rules about which ones are okay to make.

Putting the decision inside the loop has a readability benefit because it keeps the rules in one place: normalize, process, calculate if it's the first item, clean up. You pay for it by making a decision in every iteration of the loop. Whether you can tolerate that depends on whether the additional per-iteration overhead is going to break your performance budget. If you're doing a dozen items, it might not make a difference, but if you're doing a billion or two, it could. This is something you have to measure and evaluate based on your situation.
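
If you want numbers before deciding, a rough comparison with timeit is usually enough. This is only a sketch; the item count and the pass statements standing in for the real per-item work are made up:

import timeit

items = list(range(1_000_000))

def branchy():
    for index, sub_item in enumerate(items):
        if index == 0:
            pass  # stand-in for calculator(sub_item)
        pass      # stand-in for the rest of the per-item work

def unrolled():
    pass  # stand-in for the special treatment of items[0]
    for sub_item in items[1:]:
        pass  # stand-in for the rest of the per-item work

print(timeit.timeit(branchy, number=10))
print(timeit.timeit(unrolled, number=10))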

If the cost of repeatedly making the decision is too high, you can do what a compiler with a really good optimizer would do and unroll just the first iteration of the loop:

# Requirement ABC123 says the first item gets special treatment.
first_item = item[0]
normalize(first_item)
process(first_item)
calculator(first_item)  # Special treatment
cleanup(first_item)

for sub_item in item[1:]:
    normalize(sub_item)
    process(sub_item)
    cleanup(sub_item)

(I'm treating item as a list to keep the example simple, but you can do the same with an iterator.)
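
For example, a minimal sketch of the same unrolling over item.sub_items from the question, pulling the first element off an iterator:

sub_items = iter(item.sub_items)

# Peel off the first sub_item, if there is one, and give it the
# special treatment before looping over the rest.
try:
    first_item = next(sub_items)
except StopIteration:
    pass  # no sub_items at all
else:
    normalize(first_item)
    process(first_item)
    calculator(first_item)  # Special treatment
    cleanup(first_item)

for sub_item in sub_items:
    normalize(sub_item)
    process(sub_item)
    cleanup(sub_item)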

This spares you the performance hit of making the decision each time at the expense of having to repeat most of the process elsewhere in the code. If the real-world code is this simple and everything sits within ten lines as this does, it's not that big a deal in practical terms. Some of the risks posed by the additional ugliness can be mitigated through documentation in the code:

# WARNING:
#
# This is done separately to meet the performance limit
# set out in requirement OU812.  If the non-special part
# of the algorithm changes, make sure both versions are
# altered at the same time.

...Process first item...

# See warning above about making modifications.

for sub_item in item[1:]:
    ...
Blrfl

If you need to always do some cleanup, consider using a context manager:

from contextlib import contextmanager

@contextmanager
def normalized(sub_item):
  resource = ...  # normalize
  yield resource
  ...  # cleanup

If the cleanup is necessary even with an exception:

@contextmanager
def normalized(sub_item):
  resource = ...  # normalize
  try:
    yield resource
  finally:
    ...  # cleanup

You can change your other functions so that they require the resource from the context as an argument, preventing them from being called outside of that context. Then:

for index, sub_item in enumerate(item.sub_items):
    with normalized(sub_item) as resource:
        process(sub_item)

        if index == 0:
            calculator(sub_item, resource)

I.e., use data flow dependencies to enforce a particular ordering. This isn't perfect in Python because the scope of the resource variable will be quite wide, but it's a lot harder to make an accidental mistake.
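
As a sketch of what that threading could look like (the signatures below are assumptions for illustration, not something the question's module already has):

def process(sub_item, resource):
  ...  # can only run once normalized() has produced a resource

def calculator(sub_item, resource):
  ...  # likewise requires the resource, so it can't run before normalize

for index, sub_item in enumerate(item.sub_items):
  with normalized(sub_item) as resource:
    process(sub_item, resource)
    if index == 0:
      calculator(sub_item, resource)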

Testing for index == 0 is the cleanest way to add extra processing for the first item. There are alternatives such as explicitly using iterators, but they imply some amount of code duplication and are more difficult to read:

sub_items_iter = iter(item.sub_items)

# consume first item from iterator
for sub_item in sub_items_iter:
  with normalized(sub_item) as resource:
    process(sub_item)
    calculator(sub_item, resource)
  break

# consume remaining items
for sub_item in sub_items_iter:
  with normalized(sub_item) as resource:
    process(sub_item)

Using the for-loop to consume a single item from an iterator isn't obvious, but it's still clearer than the de-sugared variant:

try:
  sub_item = next(sub_items_iter)
  has_first = True
except StopIteration:
  has_first = False

if has_first:
  with normalized(sub_item) as resource:
    process(sub_item)
    calculator(sub_item, resource)
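
If you know the sub-items themselves are never None, passing a default to next() avoids the try/except bookkeeping:

sub_item = next(sub_items_iter, None)
if sub_item is not None:
  with normalized(sub_item) as resource:
    process(sub_item)
    calculator(sub_item, resource)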
amon