2

I am creating an app where the user enters 8 characters. After he enters the string I have to see if it is an eight letter word. If not, check if contains a seven letter word etc.

I am checking against a given pool of 150k+ words. I only care about the longest possible match. Is there a better way then this one:

  • WordPool.Contains(word.substring(0,8))
  • WordPool.Contains(word.substring(0,7)),
    WordPool.Contains(word.substirng(1,7))
  • WordPool.Contains(word.substring(0,6)),
    WordPool.Contains(word.substirng(1,6)),
    WordPool.Contains(word.substirng(2,6))
  • etc...

Edit

I forgot to add that I am checking against an english dictionary.

So far I am using this:

for(i = 8; i >= 3; i--)
  for(j = 0; j <= 8 - i; j++)
      if(words.contains(word.substring(j, i))
         //do something

Edit 2 I have been using the approach defined above just with a minor change. I am using a few background agents which all search for a word of a certain length. They all then return a result and I just pick the one which gives the user the highest score.

  • While [Searching integer sequences](http://programmers.stackexchange.com/questions/218222/searching-integer-sequences) is about integers, would a modification on that work for you? –  Nov 26 '13 at 19:51
  • You also have to check all possible sub strings? What language is this? Do you have any performance guarantees about the Contains method? – Ampt Nov 26 '13 at 19:55
  • @Ampt I need to find the longest possible match. For example if they enter the word "Carpooler" I want to stop right there despite having some smaller substrings like car and pool. – Ivan Crojach Karačić Nov 26 '13 at 20:03
  • But would matching `pool` to `ajfpoolz` be allowed? – Ampt Nov 26 '13 at 20:07
  • @Ampt forgot to mention that I am checking it for the english (dictionary) – Ivan Crojach Karačić Nov 26 '13 at 20:09
  • 1
    @IvanCrojachKaračić that still doesn't answer my question. If the user types in garbage like `afjpoolz` should it match to `pool`? – Ampt Nov 26 '13 at 20:10
  • There is no garbage. It's a dictionary with all English words (or at least most of them) – Ivan Crojach Karačić Nov 26 '13 at 20:12
  • @IvanCrojachKaračić care to join us in the programmers chat real quick for some clarification? http://chat.stackexchange.com/rooms/21/the-whiteboard – Ampt Nov 26 '13 at 20:13
  • @IvanCrojachKaračić you wrote *where the user enters 8 characters*. Did you mean just that or *up to 8 characters* or *for example 8 characters*? – Wolf Dec 02 '13 at 12:21
  • What do you care about? Fast execution or low memory footprint? Both lead to different answers. – Christian Dec 03 '13 at 20:07

1 Answers1

2

Preprocessing your word pool into a trie would make the longest search easy. Just traverse the trie until you can't go any further. You'd still have to try it for each starting position, though. For example:

wordPool.longestMatch("deadbeef");
wordPool.longestMatch("eadbeef");
wordPool.longestMatch("adbeef");
wordPool.longestMatch("dbeef");
wordPool.longestMatch("beef");
wordPool.longestMatch("eef");
wordPool.longestMatch("ef");
wordPool.longestMatch("f");

You could also short-circuit if you already found a match longer than the length of the remaining subsequences. The example would find "dead" on the first line, and "beef" on the fifth line, so you could automatically skip the last three subsequences.

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479