17

This is the most popular way (it seems to me) of checking if a value is in an array:

for (int x : array)
{
    if (x == value)
        return true;
}
return false;        

However, in a book I’ve read many years ago by, probably, Wirth or Dijkstra, it was said that this style is better (when compared to a while-loop with an exit inside):

int i = 0;
while (i < array.length && array[i] != value)
    i++;
return i < array.length;

This way the additional exit condition becomes an explicit part of the loop invariant, there are no hidden conditions and exits inside the loop, everything is more obvious and more in a structured-programming way. I generally preferred this latter pattern whenever possible and used the for-loop to only iterate from a to b.

And yet I cannot say that the first version is less clear. Maybe it is even clearer and easier to understand, at least for very beginners. So I’m still asking myself the question of which one is better?

Maybe someone can give a good rationale in favor of one of the methods?

Update: This is not a question of multiple function return points, lambdas or finding an element in an array per se. It’s about how to write loops with more complex invariants than a single inequality.

Update: OK, I see the point of people who answer and comment: I mixed-in the foreach loop here, which itself is already much more clear and readable than a while-loop. I should not have done that. But this is also an interesting question, so let's leave it as it is: foreach-loop and an extra condition inside, or a while-loop with an explicit loop invariant and a post-condition after. It seems that the foreach-loop with a condition and an exit/break is winning. I will create an additional question without the foreach-loop (for a linked list).

Danila Piatov
  • 289
  • 2
  • 8
  • 2
    The code examples cited here mix several different issues. Early & multiple returns (which for me go to the size of the method (not shown)), array search (which begs for a discussion involving lambdas), foreach vs. direct indexing... This question would be more clear and easier to answer if it focused on just one of these issues at a time. – Erik Eidt Aug 10 '18 at 15:38
  • 5
    Possible duplicate of [Should methods always return from one place?](https://softwareengineering.stackexchange.com/questions/185671/should-methods-always-return-from-one-place) – gnat Aug 10 '18 at 15:45
  • 1
    I know those are examples, but there are languages that have APIs to handle exactly that use case. I.e. `collection.contains(foo)` – Berin Loritsch Aug 10 '18 at 20:17
  • Note that both of these code styles potentially leave your code open to timing attacks, if you're using it in a context where computer security is important. A better way of doing it would be something like boolean flag = false, then in your for-each loop set the flag to true if a match is found, then return the value of the flag after the loop is complete. – nick012000 Aug 11 '18 at 04:57
  • 2
    You may want to find the book and reread it now to see what it actually said. – Thorbjørn Ravn Andersen Aug 11 '18 at 06:32
  • What "explicit invariant"? The test is not the invariant. What "post-condition"? – philipxy Aug 11 '18 at 08:19
  • 1
    "Better" is a highly subjective word. That said, one can tell at a glance what the first version is doing. That the second version does exactly the same thing takes some scrutiny. – David Hammen Aug 11 '18 at 22:40

6 Answers6

56

This is easy.

Almost nothing matters more than clarity to the reader. The first variant I found incredibly simple and clear.

The second 'improved' version, I had to read several times and make sure all the edge conditions were right.

There is ZERO DOUBT which is better coding style (the first is much better).

Now - what is CLEAR to people may vary from person to person. I'm not sure there are any objective standards for that (though posting to a forum like this and getting a variety of peoples inputs can help).

In this particular case, however, I can tell you why the first algorithm is more clear: I know what the C++ iterate over a container syntax looks like and does. I've internalized it. Someone UNFAMILIAR (its new syntax) with that syntax might prefer the second variation.

But once you know and understand that new syntax, its a basic concept you can just use. With the loop iteration (second) approach, you have to carefully check that the user is CORRECTLY checking for all the edge conditions to loop over the entire array (e.g. less than in stead of less-or-equal, same index used for test and for indexing etc).

Lewis Pringle
  • 2,935
  • 1
  • 9
  • 15
  • 4
    New is relative, as it was already in the 2011 standard. Also, the second demo is obviously not C++. – Deduplicator Aug 10 '18 at 16:30
  • An alternate solution if you want to use a single exit point would be to set a flag `longerLength = true`, then `return longerLength`. – Cullub Aug 10 '18 at 17:48
  • @Deduplicator Why isn't the second demo C++? I don't see why not or am I missing something obvious? – Rakete1111 Aug 10 '18 at 18:59
  • 2
    @Rakete1111 Raw arrays don't have any properties like `length`. If it were actually declared as an array and not a pointer, they could use `sizeof`, or if it were an `std::array`, the correct member function is `size()`, there isn't a `length` property. – IllusiveBrian Aug 10 '18 at 19:14
  • @IllusiveBrian: `sizeof` would be in bytes... The most generic since C++17 is [`std::size()`](https://en.cppreference.com/w/cpp/iterator/size). – Deduplicator Aug 10 '18 at 19:51
  • The code never declared 'array', so we are left to guess. It could easily have been a custom class (not a standard library class) - in which case the example would be perfectly valid (though not perfectly good) C++. It also could be some sort of typo, but this didn't seem central to the question. – Lewis Pringle Aug 10 '18 at 20:40
  • Congrats on getting the populist badge for this answer. – David Hammen Aug 11 '18 at 22:41
  • Re *This is easy.* Well, almost. You did get one downvote (not from me). There remain some people whose coding practices harken from a previous millennium and who still think that multiple returns is absolutely anathema. – David Hammen Aug 11 '18 at 23:08
  • @DavidHammend the constraints on programming - the tools languages etc, and the problems expected to solve evolve. My answer was broad enough to be generally applicable in any age (prefer simple over rules). I've gotten lots of downvotes on things I thought were more clear cut. No biggie. Just try to be helpful and give best answer I can. And thx on congrats – Lewis Pringle Aug 11 '18 at 23:38
19

I think for simple loops, such as these, the standard first syntax is much clearer. Some people consider multiple returns confusing or a code smell, but for a piece of code this small, I do not believe this is a real issue.

It gets a bit more debatable for more complex loops. If the loop's contents cannot fit on your screen and has several returns in the loop, there is an argument to be made that the multiple exit points can make the code more difficult to maintain. For example, if you had to ensure some state maintenance method ran before exiting the function, it would be easy to miss adding it to one of the return statements and you would cause a bug. If all the end conditions can be checked in a while loop, you only have one exit point and can add this code after it.

That said, with loops especially it is good to try and put as much logic as possible into separate methods. This avoids a lot of cases where the second method would have advantages. Lean loops with clearly separated logic will matter more than which of these styles you use. Also, if most of your application's code base is using one style, you should stick with that style.

Nathanael
  • 827
  • 4
  • 11
9
int i = 0;
while (i < array.length && array[i] != value)
    i++;
return i < array.length;

[…] everything is more obvious and more in a structured-programming way.

Not quite. The variable i exists outside the while loop here and is thus part of the outer scope, while (pun intended) x of the for-loop exists only within the scope of the loop. Scope is one very important way to introduce structure to programming.

null
  • 3,546
  • 1
  • 11
  • 21
  • 1
    Please see https://en.wikipedia.org/wiki/Structured_programming – ruakh Aug 10 '18 at 19:34
  • 1
    @ruakh I'm not sure what to take away from your comment. It comes across as somewhat passive-aggressive, as if my answer opposes what's written on the wiki page. Please elaborate. – null Aug 14 '18 at 15:27
  • "Structured programming" is a term of art with a specific meaning, and the OP is objectively correct that version #2 conforms to the rules of structured programming while version #1 does not. From your answer, it seemed that you were not familiar with the term of art, and were interpreting the term literally. I'm not sure why my comment comes across as passive-aggressive; I meant it simply as informative. – ruakh Aug 14 '18 at 16:25
  • @ruakh I disagree that version #2 conforms to the rules more in every aspect and explained that in my answer. – null Aug 14 '18 at 17:25
  • You say "I disagree" as if it were a subjective thing, but it's not. Returning from inside a loop is a categorical violation of the rules of structured programming. I'm sure that many structured-programming enthusiasts are fans of minimally-scoped variables, but if you reduce a variable's scope by departing from structured programming, then you have departed from structured programming, period, and reducing the variable's scope doesn't undo that. – ruakh Aug 14 '18 at 18:27
  • @ruakh hence "_not every aspect_" and "_Not quite._" – null Aug 14 '18 at 19:05
2

I'll suggest a third option altogether:

return array.find(value);

There are many different reasons to iterate over an array: Check if a specific value exists, transform the array into another array, calculate an aggregate value, filter some values out of the array... If you use a plain for loop, it's unclear at a glance specifically how the for loop is being used. However, most modern languages have rich APIs on their array data structures that make these different intents very explicit.

Compare transforming one array into another with a for loop:

int[] doubledArray = new int[array.length];
for (int i = 0; i < array.length; i++) {
  doubledArray[i] = array[i] * 2;
}

and using a JavaScript-style map function:

array.map((value) => value * 2);

Or summing an array:

int sum = 0;
for (int i = 0; i < array.length; i++) {
  sum += array[i];
}

versus:

array.reduce(
  (sum, nextValue) => sum + nextValue,
  0
);

How long does it take you to understand what this does?

int[] newArray = new int[array.length];
int numValuesAdded = 0;

for (int i = 0; i < array.length; i++) {
  if (array[i] >= 0) {
    newArray[numValuesAdded] = array[i];
    numValuesAdded++;
  }
}

versus

array.filter((value) => (value >= 0));

In all three cases, while the for loop is certainly readable, you have to spend a few moments to figure out how the for loop is being used and checking that all of the counters and exit conditions are correct. The modern lambda-style functions make the purposes of the loops extremely explicit, and you know for certain that the API functions being called are implemented correctly.

Most modern languages, including JavaScript, Ruby, C#, and Java, use this style of functional interaction with arrays and similar collections.

In general, while I don't think using for loops is necessarily wrong, and it is a matter of personal taste, I've been finding myself strongly favoring using this style of working with arrays. This is specifically because of the increased clarity in determining what each loop is doing. If your language has similar features or tools in its standard libraries, I suggest you consider adopting this style as well!

Deduplicator
  • 8,591
  • 5
  • 31
  • 50
Kevin
  • 731
  • 1
  • 4
  • 15
  • 2
    Recommending `array.find` begs the question, as we then have to discuss the best way to implement `array.find`. Unless you're using hardware with a built-in `find` operation, we have to write a loop there. – Barmar Aug 11 '18 at 00:44
  • 2
    @Barmar I disagree. As I indicated in my answer, a lot of heavily-used languages provide functions like `find` in their standard libraries. Undoubtedly, these libraries implement `find` and its kin using for loops, but that's what a good function does: it abstracts the technical details away from the function's consumer, allowing that programmer to not need to think about those details. So even though `find` is likely implemented with a for loop, it still helps make code more readable, and since it's frequently in the standard library, using it adds no meaningful overhead or risk. – Kevin Aug 11 '18 at 00:54
  • 4
    But a software engineer has to implement these libraries. Don't the same software engineering principles apply to library authors as application programmers? The question is about writing loops in general, not the best way to search for an array element in a particular language – Barmar Aug 11 '18 at 01:02
  • 4
    To put it another way, searching for an array element is just a simple example that he used to demonstrate the different looping techniques. – Barmar Aug 11 '18 at 01:06
2

The two loops have different semantics:

  • The first loop simply answers a simple yes/no question: "Does the array contain the object I'm looking for?" It does so in the most brief manner possible.

  • The second loop answers the question: "If the array contains the object I'm looking for, what is the index of the first match?" Again, it does so in the most brief manner possible.

Since the answer to the second question does provide strictly more information than the answer to the first, you can choose to answer the second question and then derive the answer of the first question. That is what the line return i < array.length; does, anyway.

I believe that it's usually best to just use the tool that fits the purpose unless you can reuse an already existing, more flexible tool. I.e.:

  • Using the first variant of the loop is fine.
  • Changing the first variant to just set a bool variable and break is also fine. (Avoids second return statement, answer is available in a variable instead of a function return.)
  • Using std::find is fine (code reuse!).
  • However, explicitly coding a find and then reducing the answer to a bool is not.
-2

It all boils down to precisely what is meant by 'better'. For practical programmers, it generally means efficient - i.e in this case, exiting directly from the loop avoids one extra comparison, and returning a Boolean constant avoids a duplicate comparison; this saves cycles. Dijkstra is more concerned with making code that is easier to prove correct. [it has seemed to me that CS education in Europe takes 'proving code correctness' far more seriously than CS education in the US, where economic forces tend to dominate coding practice]

PMar
  • 5
  • 1
  • 3
    PMar, performance-wise both loops are pretty much equivalent — they both have two comparisons. – Danila Piatov Aug 10 '18 at 21:19
  • 1
    If one really cared about performance, one would use a faster algorithm. e.g. sort the array and do a binary search, or use a Hashtable. – user949300 Aug 11 '18 at 16:30
  • Danila - you don’t know what data structure is behind this. An iterator is always fast. Indexed access can be linear time, and length need not even exist. – gnasher729 Aug 15 '18 at 07:52