2

This is a very simple function to find if a string is composed of unique characters. I believe that this is a O(n) time complexity as it loops once and has a single if condition. Am I correct? Is there any way it can be optimized further?

public class Program
 {
    public static void Main(string[] args)
    {
        DuplicateChars(args[0].ToLower());
    }

    public static bool DuplicateChars(string str)
    {
        bool[] charfound = new bool[26];
        int index = 0;

        for(int i = 0; i <= str.Length ; ++i)
        {
            index = (byte)(str[i]);
            //subtract by ascii value of small a 97.
            index -= 97;
            if(charfound[index])
                return true;
            else
                charfound[index] = true;
        }  

        return false;
    }
}
Pradeepl
  • 33
  • 6
  • 2
    str.ToLower is O(N) anyway – Caleth Dec 01 '16 at 12:58
  • @Caleth . Thanks. I made a change to remove ToLower() and move it to the calling function. I am only looking at the time complexity of DuplicateChars function. – Pradeepl Dec 01 '16 at 13:02
  • 3
    It will access the array`charfound` out of bounds if the string has non-alhpa chars. –  Dec 01 '16 at 13:19
  • @ThomasKilian Thanks. One of assumptions is the we are passing characters in the alphabet (ascii range of 65 to 97 ) – Pradeepl Dec 01 '16 at 13:33
  • 1
    see [What is O(…) and how do I calculate it?](http://softwareengineering.stackexchange.com/questions/132331/what-is-o-and-how-do-i-calculate-it) – gnat Dec 01 '16 at 13:34
  • @PradeepLoganathan You might want to move ToLower back to your function, because if you start with a check if the input is not longer then 26 characters and then call ToLower because otherwise you will still have an O(N) operation in your process anyhow. – Pieter B Dec 01 '16 at 13:42
  • why (byte)? index is int. No check for out of bounds. – paparazzo Dec 06 '16 at 12:18
  • Are you sure that you want `i` and `I` to be different when on a Turkish computer? Also it will crash when the string contains any characters apart from ASCII letters. – CodesInChaos Dec 06 '16 at 12:42
  • Actually, a comment of mine got me thinking... Your original solution already is O(1), since it will loop at most 27 times. My answer just highlighted that fact and used it as a shortcut ;) – hoffmale Dec 07 '16 at 10:50

3 Answers3

5

Given your sample code I take it that the following assumption is true:

  • str only contains character values from 'a' to 'z'

Given that, we can immediately see an optimization opportunity: if str.Length is greater than charfound.Length, there will be a duplicated char, so we can include a check for that at the beginning of the function.

public class Program
{
    public static void Main(string[] args)
    {
        DuplicateChars(args[0].ToLower());
    }

    public static bool DuplicateChars(string str)
    {
        bool[] charfound = new bool[26];
        int index = 0;

        if(str.Length > charfound.Length) return true;

        for(int i = 0; i <= str.Length ; ++i)
        {
            index = (byte)(str[i]);
            //subtract by ascii value of small a 97.
            index -= 97;
            if(charfound[index])
                return true;
            else
                charfound[index] = true;
        }  

        return false;
    }
}

After this change, the worst case input would be a string consisting of a permutation of "abcdefghijklmnopqrstuvwxyz", which would mean that the function is O(1) in the worst case. (Proof of this is left as an exercise for the reader.)

EDIT: As pointed out by @Pieter B in the comments, you'd probably want to move the call to ToLower() from Main to right after the line if(str.Length > charfound.Length) return true; so you don't spend O(n) time in total.

hoffmale
  • 777
  • 5
  • 10
  • Either you accept your algorithm handles 'a' and 'A' as separate letters (which may or may not be a bug), or `args[0].ToLower()` should be part of the algorithm, which makes it O(n) – Vincent Savard Dec 01 '16 at 13:42
  • .ToLower should be called after the length check. But for the rest I like this answer and the explanation. – Pieter B Dec 01 '16 at 13:43
  • @PieterB That would make more sense indeed. – Vincent Savard Dec 01 '16 at 13:43
  • 1
    @PieterB: well, it's only O(1) because of the optimization based on my assumption on the input alphabet, which could possibly be wrong, so I'd rather be careful before I make a too general statement, since his original code was O(n). (I guess the assumption was correct since my answer got accepted, but still...) – hoffmale Dec 01 '16 at 13:44
  • @Vincent: how you get a valid input sequence for this function is up to you (i.e. the call of `args[0].ToLower()` is outside of the scope of the given question). That said, you could move it back in, but that changes the contract of `DuplicateChars`, since `str`can now contain chars in the range [a-zA-Z]. – hoffmale Dec 01 '16 at 13:47
  • @hoffmale I don't believe it's out of scope, it's explicitly in OP's program. I just think your answer is slightly misleading because if OP replaces his function with yours, he'll still never achieve better than O(n) performance due to the call to `.ToLower`, and as PieterB said, it can easily be fixed by moving it under the length check. I understand your point and maybe it should be a pre-condition of the function, but nonetheless I feel like adding a disclaimer would be beneficial to everybody. – Vincent Savard Dec 01 '16 at 13:50
  • @Vincent: It's out of scope given the question as it is currently worded and a clarifying comment from the OP. (Though I agree that it would be a sensible addition.) – hoffmale Dec 01 '16 at 13:55
  • I missed OP's comments. You're definitely right, my bad. – Vincent Savard Dec 01 '16 at 14:02
  • It is still O(n) just a bounded O(n). 20 will be roughly twice as long as 10. – paparazzo Dec 06 '16 at 14:20
  • @Paparazzi: there is no such thing as "bounded O(n)", since the big-O notation is about describing the limiting behavior. Generally, f(x) is O(g(x)) if there exists a c > 0 and x0 such that f(x) is smaller or equal to c*g(x) whenever x is greater than x0. You could use c=26 and x0=0 or c=1 and x0=27 for my solution. (Also, runtimes for length 20 should not really be double the runtime for length 10 on average for arbirtrary inputs because of the birthday paradoxon) – hoffmale Dec 06 '16 at 15:19
  • Then by that argument any string comparison would be O(1). Just check if it longer than the number of characters in the total character set. – paparazzo Dec 06 '16 at 16:21
2

You can improve it if you have extra information about the size of your alphabet.

Let's assume your string can ONLY have ASCII characters. That means that you can have at most 128 unique characters in your string. Any string with more than 128 characters will have duplicate characters.

What that means is that you only have to execute DuplicateChars if the string length is smaller or equal to 128, which places a constant upper bound on n and makes the algorithm O(1).

-1

Still O(n) but a lot less code

public static bool StringHasDup(string s)
{
    return (s.Length != new HashSet<char>(s.ToCharArray()).Count);
}
paparazzo
  • 1,937
  • 1
  • 14
  • 23