13

Here is a simplified sample. Basically, it does checks on a string from a string list. If the check passes, it will remove that string (filterStringOut(i);), and it is no longer necessary to continue any other checks. Thus continue to the next string.

void ParsingTools::filterStrings(QStringList &sl)
{
    /* Filter string list */
    QString s;
    for (int i=0; i<sl.length(); i++) {
        s = sl.at(i);

        // Improper length, remove
        if (s.length() != m_Length) {
            filterStringOut(i);
            continue; // Once removed, can move on to the next string
        }          
        // Lacks a substring, remove
        for (int j=0; j<m_Include.length(); j++) {
            if (!s.contains(m_Include.at(j))) { 
                filterStringOut(i); 
                /* break; and continue; */ 
            }
        }
        // Contains a substring, remove
        for (int j=0; j<m_Exclude.length(); j++) {
            if (s.contains(m_Exclude.at(j))) { 
                filterStringOut(i); 
                /* break; and continue; */ 
            }
        } 
    }
}

How ought one continue the outer loop from inside a nested loop?

My best guess is to use goto and place a label at the end of the outer loop. That prompted me to ask this question, given how taboo goto can be.

In the c++ IRC chat, it was suggested that I place the for loops in bool functions, that return true if a check passed. thus

if ( containsExclude(s)) continue;
if (!containsInclude(s)) continue;

or that I simply create a local boolean, set it to true break, check bool and continue if true.

Given that I am using this in a parser, I actually do need to prioritize performance in this example. Is this a situation where goto is still useful, or is it a case where I need to restructure my code?

Anon
  • 3,565
  • 3
  • 27
  • 45
  • Introducing the boolean function will make the code more readable and DRY. So why don't you just do it and try out if this has some noteable performance impact (which I doubt)? – Doc Brown May 04 '18 at 06:11
  • 3
    Possible duplicate of [Is micro-optimisation important when coding?](https://softwareengineering.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) – Doc Brown May 04 '18 at 06:11
  • @DocBrown its for a parser that gets called millions of times, so yes, it is important. As to it making the code more readable, take note that `goto` is considered a good practice for breaking out of nested loops: https://stackoverflow.com/a/9695942/2158002 -- this is a similar but not identical situation, and for all I know, there might be a c++ feature that handles this situation. – Anon May 04 '18 at 06:21
  • 3
    C++ doesn't have labelled breaks, therefore the canonical and accepted practice is to emulate them via `goto`, despite its bad reputation. Don't fear names - fear concepts. – Kilian Foth May 04 '18 at 06:24
  • 2
    @Akiva: so did you actually measure the performance difference? And just because you have heard "goto" is an acceptable way of breaking out of a nested loop does not imply the alternative of introducing a function with a clear and concise name would not be more readable. – Doc Brown May 04 '18 at 06:29
  • @DocBrown Need to be honest; I have not developed the proper skills or tools to properly benchmark my code. That is something I should definitely look into. – Anon May 04 '18 at 06:32
  • 3
    @Akiva: benchmarking this is pretty simple, you don't need special tools or skills: set up a small program which calls this function with some test data in a loop several times (maybe several million times), and measure the running time with a stopwatch. Do the same with the cleaned-up code. I bet the difference will be negligible (of course, don't forget to use compiler optimizations). – Doc Brown May 04 '18 at 06:41
  • 5
    Why are you reinventing [the wheel](http://www.cplusplus.com/reference/algorithm/remove_if/)? – Alexander May 04 '18 at 17:30
  • @Alexander Amazing. I am going to try to incorporate this with Qt. – Anon May 05 '18 at 06:17
  • @dagneiles and @DavidArno have great answers. However, if for some reason you must keep the code about the same, you could add a Boolean flag `hasBeenRemoved`. Set it in the inner tests, but in the outer, main loop, test it before starting the next test. – user949300 May 05 '18 at 18:08

7 Answers7

17

Don't nest: convert to functions instead. And have those functions return true if they perform their action and the subsequent steps can be skipped; false otherwise. That way you completely avoid the whole problem of how to break out of one level, continue within another, etc as you just chain the calls with || (this assumes C++ stops processing an expression on a true; I think it does).

So your code might end up looking like the following (I haven't written C++ in years, so it likely contains syntax errors, but should give you the general idea):

void ParsingTools::filterStrings(QStringList &sl)
{
    QString s;
    for (int i=0; i<sl.length(); i++) {
        s = sl.at(i);

        removeIfImproperLength(s, i) ||
        removeIfLacksRequiredSubstring(s, i) ||
        removeIfContainsInvalidSubstring(s, i);
    }
}

bool removeIfImproperLength(QString s, int i) {
    if (s.length() != m_Length) 
    {
        filterStringOut(i);
        return true;
    }
    return false;
}          

bool removeIfLacksSubstring(QString s, int i) {
    for (int j=0; j<m_Include.length(); j++) {
        if (!s.contains(m_Include.at(j))) { 
            filterStringOut(i);
            return true; 
        }
    }

    return false;
}

bool removeIfContainsInvalidSubstring(QString s, int i) {
    for (int j=0; j<m_Exclude.length(); j++) {
        if (s.contains(m_Exclude.at(j))) { 
            filterStringOut(i); 
            return true;
        }
    } 

    return false;
}
David Arno
  • 38,972
  • 9
  • 88
  • 121
  • 1
    "reoveIfImproperLength" Typo. :) – Neil May 04 '18 at 08:04
  • 5
    It is better if the three condition-checking functions remain side-effect free (i.e. instead of doing "remove-if", just return the boolean condition and let the caller (`ParsingTools::filterStrings`) call the `filterStringOut(i)` function, as shown in dagnelies's answer. – rwong May 04 '18 at 09:28
  • So you're using function call semantics as the basis for C++'s missing break statements. Very clever. – Ryan Reich May 15 '18 at 05:54
13

From a more bird view perspective, I would refactor the code so that it looks like this... (in pseudo code, it's too long ago I touched C++)

void filterStrings(sl)
{
    /* Filter string list */
    for (int i=0; i<sl.length(); i++) {
        QString s = sl.at(i);
        if(!isProperString(s)) {
           filterStringOut(i);
        }
     }
}

bool isProperString(s) {

        if (s.length() != m_Length)
            return false; // Improper length

        for (int j=0; j<m_Include.length(); j++) {
            if (!s.contains(m_Include.at(j))) { 
                return false; // Lacks a substring
            }
        }

        for (int j=0; j<m_Exclude.length(); j++) {
            if (s.contains(m_Exclude.at(j))) { 
                return false; // Contains a substring
            }
        }

        return true; // all tests passed, it's a proper string
}

This is IMHO cleaner because it clearly separates what constitutes a proper string, and what you do when it isn't.

You could even go one step further and use built-in filter methods like myProperStrings = allMyStrings.filter(isProperString)

dagnelies
  • 5,415
  • 3
  • 20
  • 26
10

I really like how @dagnelies starts. Short and to the point. A good use of high level abstraction. I'm only tweaking it's signature and avoiding a needless negative.

void ParsingTools::filterStrings(QStringList &sl)
{
    for (int i=0; i<sl.length(); i++) {
        QString s = sl.at(i);
        if ( isRejectString(s) ) {
            filterStringOut(i);
        }
    }
}

However, I like how @DavidArno breaks out the requirement tests as individual functions. Sure the whole thing becomes longer but every function is wonderfully tiny. Their names avoid the need for comments to explain what they are. I just don't like that they take on the extra responsibility of calling filterStringOut().

By the way, yes C++ will stop evaluation of a || chain on a true so long as you haven't overloaded the || operator. This is called short circuit evaluation. But this is a trivial micro optimization that you are free to ignore as your read the code so long as the functions are side effect free (such as the ones below).

The following should make the definition of a reject string clear without dragging you through needless details:

bool isRejectString(QString s) {
    return isDifferentLength(s, m_Length) 
        || sansRequiredSubstring(s, m_Include)
        || hasForbiddenSubstring(s, m_Exclude)
    ;
}

Relieved of the need to call filterStringOut() the requirement test functions become shorter and their names are much simpler. I've also put everything they're dependent on in their parameter list to make it easier to understand them without looking inside.

bool isDifferentLength(QString s, int length) {
    return ( s.length() != length );
}

bool sansRequiredSubstring(QString s, QStringList &include) {
    for (int j=0; j<include.length(); j++) {
        QString requiredSubstring = include.at(j);
        if ( !s.contains(requiredSubstring) ) { 
            return true; 
        }
    }
    return false;
}

bool hasForbiddenSubstring(QString s, QStringList &exclude) {
    for (int j=0; j<exclude.length(); j++) {
    QString forbiddenSubstring = exclude.at(j);
        if ( s.contains(forbiddenSubstring) ) { 
            return true; 
        }
    }
    return false;
}

I added requiredSubstring and forbiddenSubstring for the humans. Will they slow you down? Test and find out. It's easier to make readable code actually fast then to make prematurely optimized code readable or actually fast.

If the functions turn out to slow you down look into inline functions before you subject the humans to unreadable code. Again, don't assume this will give you speed. Test.

I think you'll find any of these more readable than nested for loops. Those, combined with the if's, were starting to give you a real arrow anti-pattern. I think the lesson here is that tiny functions are a good thing.

candied_orange
  • 102,279
  • 24
  • 197
  • 315
  • 1
    Although it’s a conbination of other two answers, this adds a lot of value. Make it “for humans”, test performance before you decide to “obscure” the code, and make easy to test. Great stuff! – carlossierra May 11 '18 at 11:26
  • 1
    Actually, my use of `! isProperString` rather than `isImproperString` was intentional. I tend to avoid negations in function names. Imagine you need to check if it's indeed a proper string later on, you'll need `!isImproperString` which is IMHO more prone to confusion because of the double negation. – dagnelies May 12 '18 at 08:36
  • @dagnelies better? – candied_orange May 15 '18 at 01:37
4

Just use a lambda for the predicate, and then use the power of standard algorithms and short-circuiting. No need for any convoluted or exotic control-flow:

void ParsingTools::filterStrings (QStringList& list)
{
    for (int i = list.size(); i--;) {
        const auto& s = list[i];
        auto contains = [&](const QString& x) { return s.contains(x); };
        if (s.size() != m_Length
                || !std::all_of(m_Include.begin(), m_Include.end(), contains)
                || std::any_of(m_Exclude.begin(), m_Exclude.end(), contains))
            filterStringOut(i);
    }
}
Deduplicator
  • 8,591
  • 5
  • 31
  • 50
1

There is also the option to make the content of the outer loop (the one you want to continue) a lambda, and simply use return.
It's surprisingly easy if you know lambdas; you basically start your loop interior with [&]{ and end it with }(); inside you can use return; at any time to leave it:

void ParsingTools::filterStrings(QStringList &sl)
{
    /* Filter string list */
    QString s;
    for (int i=0; i<sl.length(); i++) {

      [&]{    // start a lamdba defintion

        s = sl.at(i);

        // Improper length, remove
        if (s.length() != m_Length) {
            filterStringOut(i);
            // continue; // Once removed, can move on to the next string
            return; // happily return here, this will continue 
        }          
        // Lacks a substring, remove
        for (int j=0; j<m_Include.length(); j++) {
            if (!s.contains(m_Include.at(j))) { 
                filterStringOut(i); 
                /* break; and continue; */  return;  // happily return here, this will continue the i-loop
            }
        }
        // Contains a substring, remove
        for (int j=0; j<m_Exclude.length(); j++) {
            if (s.contains(m_Exclude.at(j))) { 
                filterStringOut(i); 
                /* break; and continue; */  return; // happily return here, this will continue the i-loop
            }
        } 

      }()   // close/end the lambda definition and call it

    }
}
Aganju
  • 1,453
  • 13
  • 15
  • 3
    (1) To actually *call* the lambda *right away*, at the end of the closing curly brace, one must do the call (with a pair of parentheses, potentially with the list of arguments or without). (2) all three places that use `continue` and `break` must be replaced with `return`. Your code appears to leave the first place (which uses `continue`) unchanged, but that must be changed as well, because the code is inside the lambda and the `continue` statement couldn't find a scope that is a loop. – rwong May 05 '18 at 06:18
  • I wrote that while waiting on red lights. Corrected. – Aganju May 05 '18 at 15:27
1

I think @dganelies has the right idea as a starting point, but I think I'd consider going a step further: write a generic function that can carry out the same pattern for (almost) any container, criteria and action:

template <class Container, class Action, class Condition>
void map_if(Container &container, Action action, Condition cond) {
    for (std::size_t i = 0; i < container.length(); i++) {
        auto s = container.at(i);
        if (cond(s))
            action(i);
    }
}

Your filterStrings would then just define the criteria, and pass the appropriate action:

void ParsingTools::filterStrings(QStringList const &sl)
{
    auto isBad = [&](QString const &s) {

        if (s.length() != m_Length)
            return true;

        for (int j = 0; j < m_Include.length(); j++) {
            if (!s.contains(m_Include.at(j))) {
                return true;
            }
        }

        for (int j = 0; j < m_Exclude.length(); j++) {
            if (s.contains(m_Exclude.at(j))) {
                return true;
            }
        }
        return false;
    };

    map_if(sl, filterStringOut, isBad);
}

Of course, there are other ways to approach that basic problem as well. For example, using the standard library, you seem to want something on the same general order as std::remove_if.

Jerry Coffin
  • 44,385
  • 5
  • 89
  • 162
1

Several answers suggest a major refactor of the code. This is probably not a bad way to go, but I wanted to provide an answer which is more in line with the question itself.

Rule #1: Profile before optimizing

Always profile the results before attempting an optimization. You may find yourself wasting a great deal of time if you don't.

That being said...

As it is, I have personally tested this sort of code on MSVC. The booleans are the way to go. Name the boolean something semantically meaningful like containsString.

    ...
    boo containsString = true; // true until proven false
    // Lacks a substring, remove
    for (int j=0; j<m_Include.length(); j++) {
        if (!s.contains(m_Include.at(j))) { 
            filterStringOut(i); 
            /* break; and continue; */ 
            containsString = false;
        }
    }
    if (!containsString)
        continue;

On MSVC (2008), in release mode (typical optimizer settings), the compiler optimized a similar loop down to exactly the same set of opcodes as the goto version. It was smart enough to see that the value of the boolean was directly tied to control flow, and elided everything away. I have not tested gcc, but I presume it can do similar types of optimization.

This has the advantage over goto of simply not raising any concern by purists who consider goto to be harmful, without sacrificing a single instruction's worth of performance.

Cort Ammon
  • 10,840
  • 3
  • 23
  • 32