7

Imagine we want to pick m numbers from n numbers so that the difference between the maximum and the minimum of the m numbers is as small as possible. For example, if

m = 4
n = 6
numbers: 10 12 10 7 5 22

The minimum difference is 5, picking 5, 7, 10, 10 from the numbers.
The first thing that pops into mind is to sort the numbers and pick the m numbers which have the minimum difference through looping on an m sized window on the n numbers.
I was wondering whether there is a way to do this in less than O(n log n) time, maybe through dynamic programming?
EDIT: The m numbers themselves don't matter; the only thing the problem asks for is the minimum difference (e.g. 5 for the provided example).

Amen
  • 135
  • 8
  • 2
    "so that the difference of the maximum and minimum **of the n numbers** be minimum" -> did you perhaps mean to say "of the m numbers" ? Or better yet just "of the numbers we picked" ? – Mike Nakis May 05 '15 at 09:33
  • How often do your m numbers change compared to how often you need to run your pick-m-from-n query? Is it acceptable to precompute some stuff every time the source numbers change so as to speed up subsequent queries? – Mike Nakis May 05 '15 at 09:57
  • I added an answer. Let me know if this is specific enough. If not I can make corrections where necessary. – Neil May 05 '15 at 12:35
  • @MikeNakis It's a programming challenge question and every instance is a totally different situation, so every time there is a new array of numbers with different sizes of n and m. So for going with sorting, you should sort every time you want to find the minimum difference. – Amen May 05 '15 at 18:07
  • Right. (I just had to ask!) – Mike Nakis May 05 '15 at 20:00
  • Given that in reality you only have MaxInt possible numbers the complexity of the problem will decrease for large n – Ewan May 06 '15 at 08:38
  • Amen, can you link to that challenge question? Since they usually constrain the input numbers, often to small ranges, it might be possible to do better than O(n log n) due to that after all. Also, I'd like to submit my program (maybe I'm even already a member there :-) – Stefan Pochmann May 06 '15 at 15:30
  • @StefanPochmann Gladly, [Right Here](http://codeforces.ru/problemset/problem/337/A?locale=en)... the numbers _are_ constrained to 4 to 1000 and it can be done using indexing hash table in O(n) but I wanted a more general solution. – Amen May 06 '15 at 16:56
  • Thanks, already a member there and got it accepted. You switched n and m from the original text, btw, so I don't know which n you mean. But I think sort+window is faster here because there are only up to 50 numbers. So it's "O(50 log 50)", and your hash table would be like "O(1000)", no? – Stefan Pochmann May 06 '15 at 18:02
  • @StefanPochmann yes that is correct. – Amen May 06 '15 at 19:25

3 Answers

7

Answer:

The element distinctness problem for lists of numbers is Θ(n log n) and you can reduce it to your problem in constant time by setting m=2 and checking whether the result is 0. So no, O(n log n) is optimal. (See the comments for a caveat, though)

More explanation:

A bit more explanation for those unfamiliar with the reduction idea: In the mentioned element distinctness problem, you get a list of numbers and you must find out whether they're distinct, i.e., whether there are no duplicates. So given some numbers, how do you solve that task? You could simply give the numbers, along with m=2, to any algorithm that solves Amen's problem. If that tells you "0", then you know there's a duplicate, and if it tells you something larger, then you know there isn't. So if there were an algorithm for Amen's problem faster than n log n, then by using it, you could also solve the distinctness problem faster than n log n. But that problem is already known to not be solvable faster than n log n, meaning Amen's problem isn't, either.
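
To make the reduction concrete, here is a small sketch. The function min_range_of_m below is just a stand-in for whatever algorithm solves the question's problem (any black-box implementation would do); the point is only the one-line wrapper that turns it into an element-distinctness test:

# Stand-in solver for the question's problem; any implementation works here.
def min_range_of_m(numbers, m):
    numbers = sorted(numbers)
    return min(numbers[i + m - 1] - numbers[i] for i in range(len(numbers) - m + 1))

# The reduction: the numbers are all distinct exactly when the best m=2 window
# has a nonzero difference, i.e. no duplicate value exists.
def all_distinct(numbers):
    return min_range_of_m(numbers, 2) != 0

print(all_distinct([10, 12, 10, 7, 5, 22]))  # False: 10 appears twice
print(all_distinct([12, 10, 7, 5, 22]))      # True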

For good measure, here's a Python implementation of the algorithm Amen already mentioned. Sort, find the best window, and output it:

m, n, numbers = 4, 6, [10, 12, 10, 7, 5, 22]

numbers.sort()
# start index of the m-sized window with the smallest max - min spread
i = min(range(n-m+1), key=lambda i: numbers[i+m-1] - numbers[i])
print(numbers[i:i+m])

You can see it in action at ideone. For finding just the difference, it gets even simpler:

numbers.sort()
print(min(high-low for low, high in zip(numbers, numbers[m-1:])))

(yeah I love to advertise Python)

Stefan Pochmann
  • 429
  • 3
  • 7
  • However, element distinctness can often be solved quite quickly using a hash table (linear expected time but n log n worst case). Still, using a hash table to solve the given problem looks quite tricky, but not impossible. – gnasher729 May 06 '15 at 14:40
  • @gnasher729 Ah, right, good point, I even use hashes like that all the time :-). Although I think the question is about worst case. But yeah, here I just focused on the decision tree model. I still think that's fair, as otherwise you can also sort faster than n log n. But I'll mention this in my answer. – Stefan Pochmann May 06 '15 at 15:14
  • Thank you for your descriptive answer and **thank you** for advertising python! :D – Amen May 06 '15 at 16:01
  • @Amen I'm glad that I added that stuff to my original rather bare minimum answer then :-) – Stefan Pochmann May 06 '15 at 16:30
5

To do this on a list that has not been sorted, you cannot hope to do better than O(n log n), which is already the best you can hope for just to sort the list in the first place, meaning that on an unsorted list O(n log n) is the best you can do.

However, it should be said that the sort is an operation which must only be performed once: after that, you can insert each successive item directly into its sorted place in order to maintain a sorted list, an operation that only requires O(n) time per insertion. Alternatively, if elements are inserted from the start in such a way as to maintain a sorted array, you won't have to perform a sort later at all.
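
For illustration, here is a minimal sketch of that incremental maintenance in Python, using the standard bisect module (the sample values are the ones from the question):

import bisect

sorted_numbers = []                       # kept sorted across insertions
for x in [10, 12, 10, 7, 5, 22]:
    bisect.insort(sorted_numbers, x)      # binary search + shift: O(n) per insertion

print(sorted_numbers)                     # [5, 7, 10, 10, 12, 22]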

So the real question here is: what is the best possible time to pick m numbers from an array of n numbers such that the set of m numbers has a minimal difference between max and min, given that the array of n numbers is sorted?

As it turns out, this can be done in O(n) time. The pseudocode is as follows:

Given sorted array of size n called A_n containing the input
Init values currentMin and currentMax
Init values bestDifference and bestDifferenceIndex

for i = 0 to n - m
   currentMin = A_n[i]
   currentMax = A_n[i + m - 1]

   if i = 0 or currentMax - currentMin < bestDifference
       bestDifference = currentMax - currentMin
       bestDifferenceIndex = i

At the end of this, using your example as input, bestDifference will be 5 and bestDifferenceIndex will be 0 (the sorted A_n is {5, 7, 10, 10, 12, 22}, meaning it grabbed {5, 7, 10, 10}).
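
For completeness, here is a direct, runnable translation of the pseudocode (in Python, to match the other answer; it assumes A_n is already sorted):

def best_window(A_n, m):
    n = len(A_n)
    bestDifference = bestDifferenceIndex = None
    for i in range(n - m + 1):
        currentMin = A_n[i]
        currentMax = A_n[i + m - 1]
        if i == 0 or currentMax - currentMin < bestDifference:
            bestDifference = currentMax - currentMin
            bestDifferenceIndex = i
    return bestDifference, bestDifferenceIndex

print(best_window([5, 7, 10, 10, 12, 22], 4))   # (5, 0)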

It is a little misleading to call this O(n log n), because that cost is not due to the algorithm itself but to the sort necessary for the algorithm to work properly. Performing the sort each and every time is guaranteed to always work, but it will certainly be slower than working with a pre-sorted array and avoiding the call to sort altogether.

Hope that helps!

Neil
  • 22,670
  • 45
  • 76
  • 3
    This answer is just a rewording of the phrase "the first thing that pops into mind is to sort the numbers and pick the n number which has the minimum difference through looping on an m sized window on the n numbers" from the original question, only using far, far, far more words than the original phrase. – Mike Nakis May 05 '15 at 14:27
  • 1
    @MikeNakis No, it's not. If you did this each and every time, it would be O(n log n) time. If you make a distinction, then it is a sort operation that takes O(n log n) time and the actual picking of the numbers that takes O(n) time. That said, if you think it is possible to do better, I encourage you to provide an answer of your own. – Neil May 05 '15 at 14:35
  • 2
    The OP knows that the above short sentence takes `O( n log n )`, so the reasonable thing to presume is that he also knows why that is so, and therefore what role sorting plays in this. So, I maintain that your answer does not introduce anything that the OP has not already demonstrated to know. I would provide an answer of my own, but I asked for a clarification in the comments, which the OP still has not answered, so if the OP does not bother, I won't either. – Mike Nakis May 05 '15 at 17:54
  • 1
    @MikeNakis While we're assuming he knows why it is O(n log n), let's also assume he knows why you cannot do better than O(n log n), but of course, that's the entire point, isn't it? I feel that "Nope, it can only be done in O(n log n)" would not have been useful as a comment, much less an answer. If you genuinely think you could offer an algorithm better than O(n log n), I would very much like to see it. – Neil May 06 '15 at 07:42
  • 1
    @MikeNakis is right, Amen already described the exact same obvious solution you presented here. All you're doing is his *"looping on an m sized window"*. – Stefan Pochmann May 06 '15 at 14:33
  • @StefanPochmann Not entirely unlike your proposed solution in python. What's your point? – Neil May 06 '15 at 14:44
  • Just trying to help clear up the confusion. "Not entirely unlike" is an understatement, btw. They're the same. Difference is just that I post it only as addendum after my actual answer to the question, and that I acknowledge that Amen already knew it. – Stefan Pochmann May 06 '15 at 14:55
0

As said in another answer, doing better than O(n log n) in the worst case is not possible. You can obviously try to do better in some or many cases.

Calculate the minimum and maximum. Then calculate a bucket size so that any optimal group must belong completely to two consecutive buckets, and distribute the values into that number of buckets.

That should work quite well with random data.
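
To make this concrete, here is one possible reading of the idea as a Python sketch. The choice of bucket width is an interpretation, not something spelled out above: a pigeonhole argument on the gaps of the sorted order shows the optimal difference is at most (max - min) / floor((n - 1)/(m - 1)), so with buckets of that width any optimal group lies within two consecutive buckets. The per-pair sorts are what keep this fast only in the typical (e.g. random-data) case:

def min_window_diff_bucketed(numbers, m):
    # Sketch of the bucketing idea; assumes 2 <= m <= len(numbers).
    n = len(numbers)
    lo, hi = min(numbers), max(numbers)
    if lo == hi:
        return 0
    # Pigeonhole bound: some m consecutive sorted values span at most `width`,
    # so an optimal group always fits inside two consecutive buckets of this width.
    blocks = (n - 1) // (m - 1)
    width = (hi - lo) / blocks
    buckets = [[] for _ in range(blocks + 1)]
    for x in numbers:
        buckets[min(int((x - lo) / width), blocks)].append(x)
    best = hi - lo
    for i in range(len(buckets)):
        pair = buckets[i] + (buckets[i + 1] if i + 1 < len(buckets) else [])
        if len(pair) < m:
            continue
        pair.sort()  # buckets stay small for random data, so these sorts are cheap
        for j in range(len(pair) - m + 1):
            best = min(best, pair[j + m - 1] - pair[j])
    return best

print(min_window_diff_bucketed([10, 12, 10, 7, 5, 22], 4))   # 5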

gnasher729
  • 42,090
  • 4
  • 59
  • 119