Code reviewing nitpicks for readability [actually for DRY]

Question

I was doing a code review today and noticed that the code would work but could have easily been made more readable.

The example in particular was roughly as follows:

def group(x):
    GROUP_ALPHA = ('a', 'b', 'e', 'f', 'i', 'j', 'k', 'l')
    GROUP_BETA = ('m', 'n', 'o', 'p', 'q')
    GROUP_CHARLIE = ('x', 'y', 'z', 'aa', 'ab', 'ae', 'af')
    GROUP_DELTA = ('ag', 'ah', 'ai', 'aj')
    GROUP_ECHO = ('ak', 'al', 'am', 'an', 'ao', 'ar', 'as')

    if x in GROUP_ALPHA:
        return 'ALPHA'
    elif x in GROUP_BETA:
        return 'BETA'
    elif x in GROUP_CHARLIE:
        return 'CHARLIE'
    elif x in GROUP_DELTA:
        return 'DELTA'
    elif x in GROUP_ECHO:
        return 'ECHO'

Note that every group is disjoint.

To me, this would be more readable with:

def group(x):
    GROUPS = {
        'ALPHA': ('a', 'b', 'e', 'f', 'i', 'j', 'k', 'l'),
        'BETA': ('m', 'n', 'o', 'p', 'q'),
        'CHARLIE': ('x', 'y', 'z', 'aa', 'ab', 'ae', 'af'),
        'DELTA': ('ag', 'ah', 'ai', 'aj'),
        'ECHO': ('ak', 'al', 'am', 'an', 'ao', 'ar', 'as')
    }
    for group_name, group_elements in GROUPS:
        if x in group_elements:
            return group_name

Should I mention this? The original code works and is used in a non-critical capacity. When I was researching, I noticed this answer, which remarks:

Every change costs money. Rewriting code that works and is unlikely to need to be changed could be wasted effort. Focus your attention on the sections that are more subject to change or that are most important to the project.

This would suggest that the answer is no, but it seems wasteful to have a code review and not suggest a better way to do things, even if the original way "works". Personally I would want someone to make such a suggestion to me.

Note: the above examples have been edited since the question was first written. As well, each entry ('a', 'b', etc.) is actually a human readable str related to business logic.

Are you sure it is faster? I know set is supposed to be O(1) where list is O(n), but with so few items constant factors will dwarf this. — JacquesB, Sep 25 '19 at 19:27
@JacquesB forgot to update the title/etc, speed will be the same, but there won't be the duplicated `if x in group ...` part — shayaan, Sep 25 '19 at 23:26
Aside: I suppose one additional advantage of using sets is the guaranteed uniqueness, but this is not a major issue — shayaan, Sep 25 '19 at 23:28
If you're going to use a dict, at least use it properly, so that you can lookup they letters as keys. — Alexander, Sep 26 '19 at 03:35
why do you feel that the first is less readable? The intent of that code seems to be absolutely crystal clear without needing to think about it at all - I can't really see how you can get any more readable than that. — Ben Cottrell, Sep 26 '19 at 10:06
@Alexander Not sure what "properly" means, your approach would entail a very large dictionary (to be fair I didn't originally disclose how large the groups were, edited to reflect that), and would introduce repetition in the group names — shayaan, Sep 26 '19 at 14:16
@BenCottrell The main issue I take is the repetition of `elif in group: return ...`, creating a longer, and not very DRY piece of code, this logic can be generalized fairly simply — shayaan, Sep 26 '19 at 14:17
@Shayn "your approach would entail a very large dictionary" What makes you think it'll be bigger than your str to list dict? Each string is ultimately only stored in memory once. "to be fair I didn't originally disclose how large the groups were" If the groups are big, then not only is this about the same space efficiency as your dict of lists, but it's also *much* faster than the linear search that you're doing through the groups' lists. "would introduce repetition in the group names" nah, I would construct the dict using a dict comprehension from an input similar to your `GROUPS` variable. — Alexander, Sep 26 '19 at 15:59
@Shayn That isn't the same as readability though - making something less DRY is of course a good reason to do something, but it would also be reasonable to point out that generalising something to eliminate repetition generally does make the intent less immediately obvious, therefore while it may be the case that the `for` loop could be a better solution, the "cost" of that solution is making the intent and readability slightly less clear (although not in such a significant way as to be a problem) — Ben Cottrell, Sep 26 '19 at 23:07
@BenCottrell Yep, agreed, retained the original wording in the question so it wouldn't deviate too much from the original idea (and invalidate other good answers) but as the answer I selected pointed out: it's more about DRY than readability — shayaan, Sep 27 '19 at 00:46
If you think your approach is an improvement, it's worth making the suggestion. Code reviews are meant to be a conversation and often an opportunity to learn. The author is under no obligation to carry out your suggestion, but the ensuing conversation may have value. They might learn that your approach is an improvement. You might learn more about their intent and realise your approach is _not_ an improvement. — Burhan Ali, Oct 05 '19 at 21:20

Doc Brown · Accepted Answer · 2019-09-26T10:56:30.977

The given example does IMHO not make a big difference in readability - in fact, it is quite debatable if the second variant is really more readable than the first one. However, it is clearly an improvement in following the DRY principle, and that is a very valid reason to change the code.

Let us assume the intention is to deal with all these groups always in a similar fashion, which I guess is quite likely. Let us also assume there are at least five of these groups. Now you get the new requirement to change the logic to something like

if lower(x) in GROUP_ALPHA:

then in the first example, you will have to change 5 places, make sure you change them identically and don't forget any place. In the second example, there will be only one place to change, which is definitely an improvement. Further, if you are going to write a unit test for this function, or specifically for the change to this function, the first variant will require 5 different test cases for achieving full coverage, and the second only one.

Even simpler - assume you just want to add another group - in the first example, you will have to change two places in code (which includes copying the whole "search" section), in the second example, you will only have to extend the dictionary.

Keeping code DRY is not just a matter of "taste" - it is a matter of bringing code into a state where it is less likely to cause future headaches. And code reviews are a good opportunity for this.

score 9 · Answer 2 · answered Sep 25 '19 at 19:34

One benefit of code reviews is indeed to suggest better ways to do things. Better, of course, means different things in different contexts, but can include code that is more readable, is faster, consumes less memory, follows good design principles, is less complex, etc. Suggesting a change that is more readable is a perfect example of something to note during a code review.

Every change costs money, but so does trying to decipher badly-written code 6 months from now.

Keep your target audience in mind, however, and realize that it may be more readable to you but less readable to others (especially if you work with a lot of junior engineers). Also keep your application environment / needs in mind when you suggest these changes. While your solution may be faster, faster code may not be necessary. If the dictionary is only ever going to have 2 keys in it that map to a list of 3 elements, performance will never be a problem, regardless of how you set it up. It's probably still worth bringing up, as the rest of your team can use it the next time, but realize that it is not something to dig in your heels on.

JacquesB · Answer 3 · 2019-09-26T11:21:33.393

7

The proposed change introduces a subtle bug, so I guess this is a good example of why suggestions should be vetted carefully.

If x is member of multiple groups, the first example will return the first group found in the explicit order of the if's. In the second example, the groups will be checked in an unspecified order (since dictionaries are unordered), so which of the groups it returns is undefined. This is especially insidious since it might be the expected output in unit-tests, but this could change on a different platform or with a version update.

The underlying issue is that the new code doesn't communicate the intention clearly. If the expected result indeed is non-deterministic, then I think it should be clearly communicated in documentation, since this definitely breaks the "principle of least surprise".

If on the other hand you expect an x to appear in exactly one group, then I think the data structure should be changed to communicate this clearly, e.g.

def group(x):
    GROUPS = {
      'a': 'ALPHA', 
      'c': 'ALPHA',
      'd': 'ALPHA',
      'h': 'BETA', 
      'k': 'BETA', 
      'l': 'BETA'
    }
    return GROUPS[x]

This is also much simpler code, since the assumptions are encoded in the data structure rather than in iteration loops.

That said, I agree that it is a good idea to suggest readability improvements in code reviews. But this is really something you should discuss in your team.

edited Sep 26 '19 at 11:21

answered Sep 26 '19 at 11:13

JacquesB

57,310
21
127
176

1

I see where you are coming from, but this solution does not convince me. If "x" is expected to appear in only one group, there is no "subtle bug" in the original code, both variants will behave similar. And this alternative suggestion may enforce that every group content occurs just once, but this comes for a price: one has to repeat the group name several times now, which is not really simpler to maintain than the original variant and introduces its own problems of readibility. – Doc Brown Sep 26 '19 at 11:56
Python dictionaries are now ordered, so this bug doesn't manifest itself. Fair point though, I should mention that the groups are disjoint (so no shared members). However, they each contain a few members, i.e. the actual "alpha" has ~ 10 members, this would seem like a less transparent way to store the grouping and introduces the potential for integrity issues because of the repeated group names – shayaan Sep 26 '19 at 13:25
2

We don't know enough information to call this a bug. It is worth mentioning that this is not a perfect refactoring because there is a potential change in behavior. – candied_orange Sep 26 '19 at 15:45
We do know that if x == 'c' both of these wander off the page looking for an unconditional return. – candied_orange Sep 26 '19 at 15:49

score -1 · Answer 4 · answered Sep 26 '19 at 08:06

-1

For a code review, your goal should be to check if the code is acceptable. Could it be done better? That's a dubious reason for a change. Could it be done better in your opinion? That's a more dubious reason.

Do you prefer it better another way? That's no reason at all, because if you had written the code and I reviewed it, I would prefer it better my way. So the code always ends up different from the author's preference, creating useless extra work.

answered Sep 26 '19 at 08:06

gnasher729

42,090
4
59
119

1

"could it be done better" seems a very pertinent question in code reviews, otherwise every submission would dig the codebase further into technical debt. The issue isn't "preference" like preferring one movie to another, the issues include correctness, readability, maintainability, etc which will affect the long term state of the code. – shayaan Sep 26 '19 at 14:26

Code reviewing nitpicks for readability [actually for DRY]

4 Answers4