5

Something that often comes up in code reviews is the structure of the code, particularly related to the use of control structures, where individuals can be fairly opinionated on what is "right" and what is "wrong".

I have my own opinions on the subject, but at the moment, that's just what it is.

The code snippets below are as an example, and are from a recent discussion. I prefer one of these, and my colleague prefers the other.

Example 1:

  enum {
    failed_conn = 0,
    made_connection = 1
  };
  enum {
    is_not_in_set = 0,
    is_in_set = 1
  };
  int status = Status_Failed;
  in_set = FD_ISSET(socket_desc, &fdset);
  if (in_set) {
    sock_opt_read_result = getsockopt(socket_desc, SOL_SOCKET, SO_ERROR, &result, &result_len);
    if (sock_opt_read_result >= 0) {
      connection_to_target = (result == ECONNSUCCESS) || (result == ECONNREFUSED);
      if (connection_to_target == 1) {
        status = Status_OK;
      }
    }
    return status;
  }

Example 2:

  return ((FD_ISSET(socket_desc, &fdset))                                            /* Select returned the file descriptor (socket connected or failed) */   
       && (getsockopt(socket_desc, SOL_SOCKET, SO_ERROR, &result, &result_len) >= 0) /* We can get the error code (if any) from the connect call */
       && ((result == ECONNSUCCESS) || (result == ECONNREFUSED))) ? Status_OK        /* The connect() call succeded or was refused by target */
                                                                  : Status_Failed;   /* If any of the above are not true, we failed to connect */

These are just as an example of structure. For those of you who are worried about side effects (e.g. the result variable) don't be, none of the variables in use here are used elsewhere.

What are the (if any) hard (not opinion based - I have enough of those already) arguments for coding in either of these styles, that might help cement a coding style that can be generally agreed upon, either one style, the other, or some other form?

Baldrickk
  • 714
  • 5
  • 12
  • related: [Approaches to checking multiple conditions?](http://programmers.stackexchange.com/questions/191208/approaches-to-checking-multiple-conditions) – gnat Apr 11 '16 at 18:32
  • 4
    Which style do you find easier to read? – Robert Harvey Apr 11 '16 at 18:35
  • @RobertHarvey personally, the latter is nicer for me. – Baldrickk Apr 12 '16 at 00:45
  • Then that is the correct choice. – Robert Harvey Apr 12 '16 at 00:46
  • 1
    It's the one that I wrote. Unfortunately, despite the coding standard not dictating usage of structures like this, comments keep getting raised at code review... I actually prefer @5gon12eder's 3 return solution, but that raises comments too, as they don't like multiple returns in a function (and yes, I know that it's a "misunderstanding" of what Dijkstra wrote). – Baldrickk Apr 12 '16 at 00:58
  • 1
    Ironically, the best way to simplify `if` statements is to have multiple returns. It's too bad that so many coders stubbornly cling to outdated misconceptions; the multiple returns argument goes all the way back to early C and Assembly Language... it ceased to be a legitimate concern decades ago. – Robert Harvey Apr 12 '16 at 01:29
  • 3
    As someone who's admittedly not primarily a C programmer, example 1 looks like just plain bad code to me. Example 2 looks like an unnecessarily terse mush that is probably comprehensible to an expert, but will take several minutes of thinking / research for me to decipher. I opine that @5gon12eder's solution is head and shoulders above either. It's fairly straightforward to change the returns to `result=` and check that on subsequent conditions if they really must have a single return, but I prefer the multiple return version. (Single return does score points if there is shared cleanup code.) – GrandOpener Apr 12 '16 at 03:24
  • The old joke is: "*Opinions are like , everyone's got one.*" Opinions seem to be necessary (else we would not have them), but that does not make them determinative of, well, anything, really. (the comma operator is so useful) –  Apr 13 '17 at 15:29

4 Answers4

19

I'm not very excited about either version. The second (single large expression in return statement) makes it hard to follow the control logic. The fact that the author has felt an urge to comment the code to make it clear what it does and that it doesn't fit on any reasonable line length are indicators that the expression is too complex and should be broken up.

The fist example, on the other hand, seems overly complicated to me. For example, how is

enum {
    is_not_in_set = 0,
    is_in_set = 1
};
in_set = FD_ISSET(socket_desc, &fdset);
if (in_set) {
    // …
}

any more readable than simply using the result of the “function call” immediately as the conditional?

if (FD_ISSET(socket_desc, &fdset)) {
    // …
}

Every name you introduce into the current scope adds additional cognitive load on the reader. This might be warranted if it adds additional information but I don't see how the use of the enum helps here. It is clear that a call to FD_ISSET tells me whether the file descriptor is in the set or not. I cannot see how the enum is useful here at all. And what about the other enum?

Likewise here.

connection_to_target = (result == ECONNSUCCESS) || (result == ECONNREFUSED);
if (connection_to_target == 1) {
    // …
}

There is a lot of redundancy here. Where is connection_to_target declared? What is its type? Why is it compared to 1? Why is it not const? Why is it there at all?

Again, using the expression directly

if ((result == ECONNSUCCESS) || (result == ECONNREFUSED)) {
    // …
}

would greatly improve readability and reduce complexity.

In the third case

sock_opt_read_result = getsockopt(socket_desc, SOL_SOCKET, SO_ERROR, &result, &result_len);
if (sock_opt_read_result >= 0) {
    // …
}

an argument could be made for the use of a named variable because the function call is very long. But again, it would reduce cognitive load if the variable were declared const and initialized immediately. Also, the name sock_opt_read_result is really not helpful. The name of a variable should indicated what it means, not where it came from.

Finally, I find code much more readable when it focuses on the “success path” and breaks out of it in case of failure. In programming languages that have exceptions, this style emerges naturally. It helps you reduce the depth of your nested conditionals and makes it easier to see at which point we have success or failure.

So, I would re-write the code somehow like this.

if (!FD_ISSET(socket_desc, &fdset)) {
    return Status_Failed;
}
if (getsockopt(socket_desc, SOL_SOCKET, SO_ERROR, &result, &result_len) < 0) {
    return Status_Failed;
}
if ((result != ECONNSUCCESS) && (result != ECONNREFUSED)) {
    return Status_Failed;
}
return Status_OK;

If you feel a need to explain why a certain condition indicates failure, the beginning of the respective “then” block is a good place to put a short comment.

The last conditional in the suggested code above might be controversial. I have written it like this to stay true with the suggested principle of keeping the “success path” straight. Alternatives like

return ((result == ECONNSUCCESS) || (result == ECONNREFUSED)) ? Status_OK : Status_Failed;

or

switch (result) {
  case ECONNSUCCESS:
  case ECONNREFUSED:
    return Status_OK;
  default:
    return Status_Failed;
}

might be reasonable choices as well.

I'm fairly sure that we can agree that the suggested replacement in this answer is more readable than either of the original alternatives but this answer might still not be as non-opinion-based as you were hoping. I'm afraid that there won't be the one right way to write your code. In the end, I think it is also important to respect the personality of your co-workers during code review. If it is not perfectly clear which way to structure the code is superior, let them have their version and use your preferred way to do it in the code you write. These implementation details are local to each function and don't affect how maintainable the software is at a higher level. If you change your opinion later, you can always safely refactor any function's internals without risking to break anything.

By the way, have a look at Code Review if you want other people to discuss the quality of a piece of your code.

5gon12eder
  • 6,956
  • 2
  • 23
  • 29
  • 4
    Great answer. I would add that your solution is also more debuggable and loggable. Eventually someone's going to wonder which of those conditions caused that function to fail. – Karl Bielefeldt Apr 11 '16 at 20:48
  • Almost downvoted. With a decade of experience writing C, I vastly prefer the author's code (the second block). It's perfectly clear. "If A & B & C then success else failure." Your code repeats "Status_Failed" three times. It's much harder to see when the code returns failure and when it returns success. – kevin cline Apr 12 '16 at 07:29
  • Any modern debugger will be able to break on the lines within the conjunction. – kevin cline Apr 12 '16 at 08:00
  • 2
    This solution is clear, less error prone to construct and modify, able to return more than one status, is easier to debug and supports logging. Fantastic. – Galik Apr 13 '16 at 02:05
4

This code:

  return ((FD_ISSET(socket_desc, &fdset))                                            /* Select returned the file descriptor (socket connected or failed) */   
       && (getsockopt(socket_desc, SOL_SOCKET, SO_ERROR, &result, &result_len) >= 0) /* We can get the error code (if any) from the connect call */
       && ((result == ECONNSUCCESS) || (result == ECONNREFUSED))) ? Status_OK        /* The connect() call succeded or was refused by target */
                                                                  : Status_Failed;   /* If any of the above are not true, we failed to connect */

is perfectly clear idiomatic C programming. I found it immediately understandable and obviously correct. Most expert C programmers would write in this style, but without the explanatory comments, which don't add anything that's not in the documentation of FD_ISSET and getsockopt. The reviewer's "improvements" make good programmers say "WTFF?? Why didn't he just write 'return (A && B && C) ? success : failure'?"

The use of enums and temporary variables in the first block just slows comprehension and increases the likelihood of errors. That is the sort of code you see from programmers who don't really like C and wish they were programming in COBOL instead.

kevin cline
  • 33,608
  • 3
  • 71
  • 142
  • Yeah, this is the one I wrote. Originally without the comments, added in because of code reviewers going "WTF". Thanks for the "expert" comment ☺ – Baldrickk Apr 12 '16 at 00:57
  • Looks like you are the smart one in your team. That's OK if you are senior and the reviewers are junior. But since you are asking, I expect it's the other way around. You should consider looking for new opportunities. Shops like yours are a career killer. – kevin cline Apr 12 '16 at 07:18
  • 1
    I like this because the whitespace is done well. My main concern is that few style guides make clear how to do the whitespace well. I personally love the ternary operator. I've also seen it horribly abused. – candied_orange Apr 13 '16 at 23:12
2

The priority when writing code is maintainability / readability. There are a large number of factors that can affect this.

Breaking it down by example:

Example 1:

  • Positives:
    • named variables: by naming each variable with a readable name, it is easier to understand the intent behind the usage of the variable
    • braced structures: It is clear what operations are linked to which control statements
  • Negatives:
    • "bloat": There are a lot of variables in use. The more there are, the more difficult it is for the reader to keep track. This link (yes, it is wikipedia...) provides either 7 or 4 as "cognitive limits", meaning that to try and remember many values as the code is read becomes more difficult.
      • this means that a reader who needs to check what value a variable represents, or refresh their memory on what has been done to that variable will have to break off to check back
    • end results are separated: return value is set in one location, (possibly) modified in another (you have to read and understand each if separately to confirm) and returned at the end. While not terrible, it's more to keep track of.

Example 2:

  • Positives:

    • temporary variables are temporary: There is no need to store these values for later use, as they are not going to be used later. Where the return value of a function is used as an argument, the function name is sufficient to identify the use of the value without explicitly naming it
    • concise: minimal bloat / higher informational density, there is little boiler-plate around (and amongst) the code, so what remains is more meaningful (yes, this can be taken too far, obfuscating the code. Avoid that)
    • clear meaning: The snippet's only job (at a high level) is to return a value, instead of creating more.
  • Negatives

    • reliance on understanding the boolean operators and their precedence
    • bracket matching can be a visual issue

I'm sure there are more arguments for and against. These are just those that spring to mind.

Baldrickk
  • 714
  • 5
  • 12
  • Example 2/Negative 1 can be mitigated with extra parens, Example 2/Negative 2 can be mitigated with different indentation/splitting over more lines – Izkata Apr 11 '16 at 19:25
  • The second example looks fully bracketed... or at least, I didn't see a place reliant on operator precedence. As such, there are too many brackets for comfort. – Deduplicator Apr 11 '16 at 19:40
  • @Deduplicator I may have used incorrect terminology there then. What I meant was that because the `&&` evaluates what is on the left first, the order of execution explicitly matches the version with the multiple `if` statements, and doesn't execute the following statements if not needed. – Baldrickk Apr 11 '16 at 21:20
  • You "negatives" are not negatives at all. The only operator used is "&&" and "? .. :". Anyone who doesn't understand them is not competent to work as a C programmer. – kevin cline Apr 12 '16 at 07:35
  • @kevincline maybe 'understand' is the wrong word. "find harder to read" maybe? – Baldrickk Apr 14 '16 at 08:09
2

First, C99 introduces stdbool.h and support for bool; I think you can make your code better read-/understandable by using it.

5gon12eder did already draft an appealing approach using subsequent if statements; using bool you could rephrase that like the following snippet - a pattern I apply quite often in my C++ code.

bool ok = false;

ok = (FD_ISSET(socket_desc, &fdset) == 1);

if(ok)
    ok = (getsockopt(socket_desc, SOL_SOCKET, SO_ERROR, &result, &result_len) >= 0);

if(ok)
    ok = (result == ECONNSUCCESS) || (result == ECONNREFUSED);

return (ok ? Status_OK : Status_Failed);

This would also make it easy to add error handling, different return values, multiline blocks and common cleanup code without compromising comprehensibility:

bool ok = false;
enum Status status = Status_Failed;

ok = (FD_ISSET(socket_desc, &fdset) == 1);

if(ok)
    ok = (getsockopt(socket_desc, SOL_SOCKET, SO_ERROR, &result, &result_len) >= 0);

if(!ok) {
    //Error handling for getsockopt() goes here
}
else
    ok = (result == ECONNSUCCESS) || (result == ECONNREFUSED);

if(!ok)
    status = Status_Connection_Failed;
else {
    //Multiple lines of peer sanity checks here
    if(!ok)
        status = Status_Peer_Error;
}

//Cleanup code goes here

assert(ok || status != Status_OK);
return (ok ? Status_OK : status);
Murphy
  • 821
  • 6
  • 15
  • I used to maintain code with chains of "if (ok)". At first it looked a bit idiotic, but eventually it grew on me. If you really have a series of evolving conditions, any one of which could fail, and you can't just exit for some reason, then hey, it works and is seriously readable. It can be expanded, debugged, reworked, etc. –  Apr 13 '17 at 15:34