30

I have the following algorithm, which finds duplicates and counts them:

public static int numDuplicatesB(int[] arr) {
    Sort.mergesort(arr);              // sort first: O(n log n)
    int numDups = 0;
    // after sorting, duplicates are adjacent, so one linear pass finds them: O(n)
    for (int i = 1; i < arr.length; i++) {
        if (arr[i] == arr[i - 1]) {
            numDups++;
        }
    }
    return numDups;
}

I am trying to find the worst-case time complexity of this. I know mergesort is O(n log(n)), and in my for loop I am iterating over the entire data set, so that would count as O(n). I am unsure what to do with these numbers, though. Should I just sum them together? If I were to do that, how would I do it?

chopper draw lion4

5 Answers

78
O(n) + O(n log(n)) = O(n log(n))

For Big O complexity, all you care about is the dominant term. n log(n) dominates n, so that's the only term you care about.
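
To see "dominates" concretely, here is a quick illustrative sketch in Java (the class name and the loop bounds are arbitrary choices of mine): it prints the ratio of n + n log(n) to n log(n) alone, which heads toward 1 as n grows, so the O(n) part contributes less and less.

public class DominantTerm {
    public static void main(String[] args) {
        for (long n = 10; n <= 100_000_000L; n *= 10) {
            double nLogN = n * (Math.log(n) / Math.log(2)); // the sort's n log n term
            double sum = n + nLogN;                         // add the linear scan
            // sum / nLogN -> 1: the linear term vanishes relative to n log n
            System.out.printf("n = %,12d   (n + n log n) / (n log n) = %.4f%n",
                    n, sum / nLogN);
        }
    }
}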

Justin Cave
  • 4
    Another way to think about this is to imagine your O(n) processing was really O(n log n), as though you did two independent sorts. Then you'd have 2 * O(n log n). But constants slough off, so you're back to O(n log n). – Jonathan Eunice Oct 08 '14 at 21:57
  • 4
    @Jonathan While that works in practice, it is very much true that O(n) is _not_ equal to O(n log(n)), so I wouldn't advise using that on a regular basis. –  Oct 08 '14 at 23:49
  • @Emrakul I had a professor use that same reasoning in my algorithms class many years ago. I agree it is less than ideal, but it can be used to justify dropping off non-dominant terms. Personally, I prefer using the T() justification: T1(n) < T2(n) so ignore T1(n). –  Oct 09 '14 at 00:14
  • 18
    @Emrakul actually I think the reasoning is theoretically sound as well as practical. O(n) is a proper subset of O(n log(n)). So if f(n) belongs to O(n) it also belongs to O(n log(n)). – emory Oct 09 '14 at 01:52
  • 17
    It should be noted that when we say `f(n) is O(g(n))` what we're really saying is that the function `f is a member of the set of functions that grows at the rate of at most g(n) over the long term`. This means all members of `O(n)` are also members of `O(n*log(n))`. The `+` in expressions like `O(f(n)) + O(g(n))` actually refers to set union (for which, if you're really pedantic, you really should use ∪). – Lie Ryan Oct 09 '14 at 10:57
  • 1
    Another way to calculate this is to ask how many times we do n. We do it log n times. If you add O(n) to that, you do it (log n) + 1 times. – 2501 Oct 09 '14 at 11:56
  • 1
    Another observation is that if f(n) is O(n), that means that there is some k0 and n0 such that for all n > n0, f(n) < k0 * n; and if g(n) is O(n lg n), there is some k1 and n1 such that for all n > n1, g(n) < k1 * (n lg n). Those statements together mean that if n2 is larger than n0, n1, and the number 2, then for all n which are greater than n2, f(n)+g(n) < (k0 * n) + (k1 * (n lg n)) < (k0+k1) * (n lg n). – supercat Oct 09 '14 at 17:42
  • 1
    @LieRyan: I think `O(f(n)) + O(g(n))` is not really a union. Rather, it's the set `{ h(n) + k(n) : h in O(f(n)), k in O(g(n)) }`. It happens in this case to equal the union, but in general it might not. – Nate Eldredge Oct 10 '14 at 00:59
  • 3
    @LieRyan Originally, it is not set union, but set sum: `A + B = { a + b | a in A, b in B }`. It happens that for sets of the form `O(g(n))` this is the same as set union, as one of the sets is always a subset of the other, and they both are invariant to sums (i.e. A + A = A). (Oops, Nate did write essentially the same). – Paŭlo Ebermann Oct 10 '14 at 12:13
  • Adding O(f) does not make sense. It is a set. You would have to write [this](http://i.imgur.com/bHUCTBy.png) which is obviously true. – Martin Thoma Oct 13 '14 at 20:10
59

Let's reason our way through it, remembering the definition of O. The one I'm going to use here is the limit at infinity.

You are correct in stating that you perform two operations with corresponding asymptotic bounds of O(n) and O(n log(n)), but combining them into a single bound is not as simple as adding the two functions. You know your function takes at least as long as the O(n) pass and at least as long as the O(n log(n)) sort. So really the complexity class for your function is the union of O(n) and O(n log(n)), but O(n log(n)) is a superset of O(n), so really it is just O(n log(n)).
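
The superset claim follows from the limit-at-infinity definition mentioned above; a sketch of the standard one-line check:

\lim_{n \to \infty} \frac{n}{n \log(n)} = \lim_{n \to \infty} \frac{1}{\log(n)} = 0
\quad\Rightarrow\quad n \in \mathcal{O}(n \log(n))
\quad\Rightarrow\quad \mathcal{O}(n) \subseteq \mathcal{O}(n \log(n))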

  • 12
    +1 this should be the answer. It describes the answer more precisely using compsci terms. –  Oct 09 '14 at 00:16
6

If you were going to set it out longhand, it would look roughly like this:

Suppose the total time is a n + b n log(n), where a and b are constants (ignoring lower-order terms).

As n goes to infinity, (a n + b n log(n)) / (n log(n)) = a / log(n) + b -> b.

So the total time is O(b n log(n)) = O(n log(n)).
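
If it helps to see that limit numerically, here is a throwaway Java sketch (the constants a = 5 and b = 2 are arbitrarily chosen for illustration) showing the ratio settling toward b:

public class LimitCheck {
    public static void main(String[] args) {
        double a = 5.0, b = 2.0; // arbitrary constants, for illustration only
        for (long n = 100; n <= 1_000_000_000L; n *= 100) {
            double total = a * n + b * n * Math.log(n);
            // total / (n log n) = a / log(n) + b, which tends to b as n grows
            System.out.printf("n = %,14d   ratio = %.4f%n",
                    n, total / (n * Math.log(n)));
        }
    }
}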

Pieter B
richardb
2

Start with the definition of O():

O(n log n) means "less than C n log n for some constant C, if n is large".

O(n) means "less than D n for some constant D, if n is large".

If you add both, the result is less than C n log n + D n < C n log n + D n log n = (C + D) n log n, which is O(n log n).

In general, if f(n) > C g(n) for large n and some C > 0, then O(f(n)) + O(g(n)) = O(f(n)). And after doing a few cases using the definition of O(), you'll know what you can and can't do.
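
Spelling out that general rule (a sketch of the standard argument, with bound constants A and B introduced for illustration): pick n large enough that all three bounds hold at once, then

u(n) \le A \, f(n), \quad v(n) \le B \, g(n), \quad g(n) < \tfrac{1}{C} f(n)
\;\Rightarrow\; u(n) + v(n) \le \left(A + \tfrac{B}{C}\right) f(n)
\;\Rightarrow\; u + v \in \mathcal{O}(f)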

gnasher729
1

The big O notation is defined as a set:

\mathcal{O}(g) = \{ f \mid \exists c > 0 \; \exists n_0 \; \forall n > n_0 : f(n) \le c \cdot g(n) \}

So \mathcal{O}(g) contains all functions that are - starting from some arbitrarily large point n_0 - always smaller than a constant multiple of g.

Now, when you have a function that is in \mathcal{O}(g) and then execute another one that increases more slowly than g, the combination certainly increases more slowly than 2g. So executing anything slower than g will not change the complexity class.

More formally:

f, h \in \mathcal{O}(g) \Rightarrow (f+h) \in \mathcal{O}(g)

You can easily prove that.
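
Here is a sketch of that easy proof, using the set definition above:

f \in \mathcal{O}(g) \;\Rightarrow\; \exists c_1, n_1 : \forall n > n_1 : f(n) \le c_1 \cdot g(n)
h \in \mathcal{O}(g) \;\Rightarrow\; \exists c_2, n_2 : \forall n > n_2 : h(n) \le c_2 \cdot g(n)
\Rightarrow\; \forall n > \max(n_1, n_2) : (f+h)(n) \le (c_1 + c_2) \cdot g(n) \;\Rightarrow\; (f+h) \in \mathcal{O}(g)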

TL;DR

It is still O(n log(n)).

Martin Thoma