1

Let's consider I have an std::string instance filled with textual data and an std::set<std::string> instance with keywords. I would like to know whether the text stored inside the std::string contains all keywords from std::set<std::string>. To achieve this I considered two possible implementations:

Returning true or false explicitly

bool isContainsAllKeywords(const std::string& textualData,
                           const std::set<std::string>& keywords) {
    for(const auto& keyword : keywords) {
        if(textualData.find(keyword) == std::string::npos)
            return false;
    }
    return true;
}

Returning the result of comparison

bool isContainsAllKeywords(const std::string& textualData,
                           const std::set<std::string>& keywords) {
    auto keywordIterator = keywords.begin();
    for(; keywordIterator != keywords.end(); ++keywordIterator) {
        if(textualData.find(*keywordIterator) == std::string::npos)
            break;
    }
    return keywordIterator == keywords.end();
}

From my examples above which one is cleaner and which one is preferred? Because this question may result opinion-based answers, I would add, that I expect backed up answers.

Anyhow, maybe both of my examples are unclean, so primarily I would like to know how to solve such problems in clean way.


Edit: my question aims to readability of loop-based comparisons where one container is being compared to another one, and not generally to: "How would you know if you've written readable and easily maintainable code?".

Akira
  • 231
  • 1
  • 3
  • 9
  • 1
    Possible duplicate of [How would you know if you've written readable and easily maintainable code?](https://softwareengineering.stackexchange.com/questions/141005/how-would-you-know-if-youve-written-readable-and-easily-maintainable-code) – gnat Nov 25 '17 at 09:01
  • 2
    id votes for the first one. you could also add a string.contains function that returns bool to make it even more clear – Ewan Nov 25 '17 at 10:35

2 Answers2

5

The note in Christophe's answer is important - your approach is possibly inefficient (especially for long search strings), and might not be correct (does the keyword "if" appear if the search string contains the word "difficult"?) Also, the fact that you have a set suggests that you're supposed (assuming this is an exercise) to use its lookup functionality. This means they probably want you to tokenize the string (split it into words), and then check for each word whether it appears in the keywords set.

Disregarding that, the best way to write this code is

bool contains(const std::string& haystack, const std::string& needle) {
  return haystack.find(needle) != std::string::npos;
}
bool containsAllKeywords(const std::string& haystack,
                         const std::set<std::string>& keywords)
{
  return std::all_of(keywords.begin(), keywords.end(),
   [&](const std::string& keyword) { return contains(haystack, keyword); });
}

The standard library has a lot of algorithms that give names to complex loop constructs. This makes your code a lot clearer. (Search for the "no raw loops" principle for more information.) The triple all_of, none_of and any_of is capable of replacing pretty much any loop with an early boolean return.

In addition, for readability I recommend always making small wrapper functions for slightly weird interactions - string doesn't allow you to ask "do you contain another string", instead you have to write s.find(t) != std::string::npos, or in words, "is the position of another string within you not equal to 'no position'".

Finally, perhaps you are under some short-sighted company policy that says that all boolean-returning functions must start with "is", but unless you are, never write a function with the name "isContainsAllKeywords", or any other name that does not expand to a meaningful phrase.

Sebastian Redl
  • 14,950
  • 7
  • 54
  • 51
  • Thank you for your suggestions which go beyond my question, they were very instructive. – Akira Nov 25 '17 at 13:50
1

Both variants are correct.

In the first one, the return within the loop shows clearly that only one failure is sufficient to get false, and that you get true only if none of the search failed. This is very self explanatory.

In the second one, when you see the return, you first have to link the condition with the iterator loop, then you have to see which conditions make you leave the loop earlier. It's not difficult to understand, but it requires a higher mental charge than the first one.

Note: you iterate many times through your string for looking for the different keywords. Have you considered different approaches such as iterating through the string and flagging keywords, or using a trie (not a tree !) ?

Christophe
  • 74,672
  • 10
  • 115
  • 187
  • Thank you for your detailed answer and for pointing out the inefficiency of my example codes. – Akira Nov 25 '17 at 13:35