14

I recently asked a question on CodeReview SE where using a binary search was recommended, and the author of the answer proved that it was faster than my implementation. During my research on the topic, I came across a table that shows the complexities of a binary search:

These are the complexities of a binary search:

| Worst-case | Best-case | Average | Worst-case space complexity |
| --- | --- | --- | --- |
| O(log n) | O(1) | O(log n) | O(1) |

I think I'm misunderstanding the information this table is meant to relay. I'm perceiving it as:

The average performance of a binary search is the worst-case performance.

However, if this were true, wouldn't a binary search be avoided in many cases? I'm new to this level of optimization so my impediment could just be some underlying prerequisite that I don't have knowledge of yet.

Edit: For clarity in hindsight, the question I sought answers to is below this edit. The one above it is strictly a questioning of my reasoning during my research phase that I included to demonstrate my lack of understanding. My apologies for any confusion on the matter.


What is meant by "the complexities of a binary search", what are those complexities, and what is this table actually telling me relative to those complexities?

Hazel へいぜる
  • 1,165
  • 1
  • 8
  • 19
  • 15
    This question is strange: you don't favor X over Y just because X has some properties (in isolation). You favor X over Y because X has some properties which give an advantage **when compared to the related properties of Y**. But in your case, you did not even mention "Y", so why should one avoid X? – Doc Brown Oct 05 '21 at 18:56
  • 15
    See also: [What is O(...) and how do I calculate it?](https://softwareengineering.stackexchange.com/questions/132331/what-is-o-and-how-do-i-calculate-it) – Doc Brown Oct 05 '21 at 18:59
  • 1
    @DocBrown I'm not asking why binary searching is faster, nor if I should favor it over my currently implemented solution. I'm asking what is meant by complexity, what those complexities are in binary searching, and what the table posted conveys, related to that information. As such, the *Y* is irrelevant to my question. – Hazel へいぜる Oct 05 '21 at 19:05
  • @DocBrown for your see also, thanks for the reading materials! I likely wouldn't have come across that in my research without you posting it. – Hazel へいぜる Oct 05 '21 at 19:06
  • 6
    It's poor terminology or poor writing. They are referring to the *computational* complexity, given in Big-O notation. – user207421 Oct 06 '21 at 03:39
  • Unrelated: "Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky" -Donald Knuth – qwr Oct 06 '21 at 05:21
  • @DocBrown I interpret the question perhaps as simply suggesting that, mathematically, an average cannot be equal to a maximum, unless all values are the same (but certainly a binary search is not "avoided", i.e. not performed, in the average, or even the best, case). That would make the question a misunderstanding of time complexity (that has essentially nothing to do with binary search). – Bernhard Barker Oct 06 '21 at 08:59
  • 2
    @user207421 lack of knowledge and as a result, poor terminology. I'm self taught and just never needed to know this level of detail when it comes to optimization. As such, I had never heard about time complexity prior to the research that fueled this post. – Hazel へいぜる Oct 06 '21 at 15:36
  • @BernhardBarker that's precisely what the issue was, a misunderstanding of time complexity during my research on the topic of binary search. – Hazel へいぜる Oct 06 '21 at 15:37

3 Answers

40

The full term for this is "time complexity", and that's the term you'll want to use if you are searching for more information. This is a core concept in computer science. I'm not going to attempt to explain it in detail, but here's a high-level explanation:

The idea of time complexity is to understand how the performance of an algorithm relates to its input size(s). A time complexity of O(1) means 'constant time'; in other words, the performance of the algorithm doesn't change with the size of the input. I think in this case, the best case is when the item you are looking for happens to be the middle item (the first one examined). Best-case performance tends to be the least interesting or useful of these measures in my experience, with the exception of when even the best case is really bad.

O(log n) means that the time it takes is related to a logarithm of the input size; in this case the base is 2. So if the list has 16 elements, it will take at most about 4 checks to find an item (and closer to 3 on average). If the list has 30,000 elements, you should expect it to take at most something like 15 checks.

The fact that the worst case is in the same complexity class as the average means that you'll never need more than roughly log₂(n) checks to find something. It could be better, but it will never be worse. That's actually a good thing. Some algorithms have good average times but really bad worst-case scenarios, and that makes it hard to predict performance in practice. It's like driving on a freeway: it might be the fastest route most of the time, but if there's a bad wreck, you could be stuck for hours. If you want to make sure you get somewhere on time, you might be better off with a slower (on average) route that is more predictable.
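
To make that concrete, here is a minimal Python sketch (a rough illustration, assuming a standard iterative binary search over a sorted list) that counts how many elements get examined for every possible target. The exact counts depend a little on how you count probes and where the splits fall, but both the worst case and the average grow like log₂(n):

```python
import math

def binary_search(items, target):
    """Iterative binary search over a sorted list.
    Returns (index or -1, number of elements examined)."""
    lo, hi = 0, len(items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

for n in (16, 30_000):
    data = list(range(n))                                 # binary search requires sorted input
    counts = [binary_search(data, x)[1] for x in data]    # search for every element once
    print(f"n={n}: worst={max(counts)}, "
          f"average={sum(counts) / n:.2f}, log2(n)={math.log2(n):.2f}")
```

Running this shows the worst case tracking log₂(n) closely while the average sits a bit below it.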

Addendum: I failed to address the 'space complexity' entry in the table. This is a very similar concept to 'time complexity' (both fall under 'computational complexity'), except that it refers to the amount of memory or storage required to execute the algorithm. In this case it's constant, which essentially means it doesn't take any more memory (or other storage) to search a list of a billion items than it does to search one of 16 items. The lists themselves obviously do, but the memory allocated to the execution of the search algorithm remains the same. In other algorithms, however, it's not uncommon to trade space for time: an index on a database table, for example, requires extra storage but can greatly improve the performance of lookups.
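
As a tiny, made-up illustration of that space-for-time trade (the records, the dictionary "index", and the function names below are hypothetical, not any particular library's API):

```python
# Hypothetical example of trading space for time; the sample data is made up.
records = [("alice", 1), ("bob", 2), ("carol", 3)]

def lookup_scan(key):
    """O(1) extra space, but O(n) time: walk the whole list on every lookup."""
    for name, value in records:
        if name == key:
            return value
    return None

# Build an "index" once: O(n) extra space, then O(1) average time per lookup.
index = {name: value for name, value in records}

def lookup_indexed(key):
    return index.get(key)

print(lookup_scan("carol"), lookup_indexed("carol"))   # 3 3
```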

JimmyJames
  • 24,682
  • 2
  • 50
  • 92
  • 9
    As an example for a similar data structure with worse worst-case complexity, consider hash tables: O(1) typical lookup speed (most of the time, can find correct element immediately), but O(n) worst case (might have to search all elements). – amon Oct 05 '21 at 17:35
  • 2
    @amon Yes exactly. If your hashing algorithm is bad, it devolves into a linear search of an unsorted list. – JimmyJames Oct 05 '21 at 17:37
  • 1
    A common situation where you want to pay attention to the best-case time is sorting: given two O(n log n) sorting algorithms, the best-case situation (sorting an already-sorted list) is fairly common, and you'd prefer the algorithm with O(n) best-case performance over the one with O(n log n) best-case. – Mark Oct 06 '21 at 03:47
  • 3
    Adding to the fun, it's often also helpful to look at the prerequisites of an algorithm. Binary search is O(log n), but it requires that the input be sorted first. A linear search of an unsorted list is O(n), but it (obviously) doesn't require that the input be sorted. And, that's before real-world complications kick in, too. – minnmass Oct 06 '21 at 04:37
  • 1
    It is worth to add constant factor to your explanation – Heisenberg Oct 06 '21 at 06:06
  • 1
    @Mark I don't think that's true at all: you wouldn't usually care very much about the best case of a sorting algorithm. By contrast, you *really* want to avoid a bad (i.e. usually quadratic) worst-case runtime. Given the choice between two algorithms, one which always uses log-normal runtime, and one that varies between linear and quadratic time, you'd virtually always pick the first algorithm, despite its worse best-case characteristic. – Konrad Rudolph Oct 06 '21 at 07:45
  • @Mark "the best-case situation (sorting an already-sorted list) is fairly common" - citation needed. There are some cases where your data may follow a certain pattern (like already being almost fully sorted), in which case a certain algorithm may make sense. But if the data is random or is not known to conform to any pattern (which is the most common use case for sorting in my experience), this is, by definition, the average case (on average, that is). Also, the best case of a sorting algorithm isn't necessarily on a sorted list. Average/worst case is typically much more useful than best case. – Bernhard Barker Oct 06 '21 at 09:56
  • 1
    @Mark In fact, one of the most popular fast sorting algorithms, Quicksort, also happens to be one which can perform poorly on already-sorted data. Average case is O(n log n), but worst case is O(n^2), and nearly sorted data is *more* likely to trigger that worst case, depending on implementation. This is why some more recent sorting algorithms (often known as Introspective Sort or Introsort for short) use Quicksort at first, but switch to Mergesort or something else when they detect such a scenario. – Darrel Hoffman Oct 06 '21 at 13:16
  • @Heisenberg I had considered mentioning constant factors but decided it was too much detail for this answer. I'll mull it over some more and maybe reconsider. While simple to apply, I think it's easier to understand once you are comfortable with the big picture as it follows logically from that. – JimmyJames Oct 06 '21 at 15:03
  • 3
    I think you've made an error with the example - with 16 elements, the worst-case will examine 4 of them; the average case almost 3 (and both scale proportionally to log _n_, but with different constants of proportion). – Toby Speight Oct 06 '21 at 15:04
  • 1
    Since the table mentioned in the question contains a "Space Complexity" entry, it would be useful to take this into account in this answer. Complexity in general means Algorithmic Complexity or Computational Complexity which is about measuring "something" in relation to the input size. The _something_ really matters. Search algorithms typically measure _the number of comparisons_ (not the time), which is not necessarily the limiting factor -- think tape disk, instead of RAM. Space Complexity, then, is measuring the additional space taken: stack depth, heap size, etc... – Matthieu M. Oct 06 '21 at 15:14
  • @TobySpeight I figured I did as I didn't really do the real math on that. I wonder though if it actually helps here. Maybe I should just qualify that these are approximate? I feel like it's easy to understand how you get 4 with an `n` of 16 but how you get to 3 as you state is a bit more involved. – JimmyJames Oct 06 '21 at 15:20
  • @MatthieuM. Good catch. I somehow didn't notice that. I'll try to work that in. – JimmyJames Oct 06 '21 at 15:22
  • 1
    I guess with 16 elements, it's not too hard to list all the elements and how many comparisons reach each one - or simplify and say that half the queries in the range hit on the last test, half the remainder on the second-last, and so on. Actually, I'm assuming that every query is an exact hit, but of course every query for a value not in the list will continue right down to the end, so what you count as "the mean" depends on exactly what you're testing... – Toby Speight Oct 06 '21 at 15:30
  • @TobySpeight I appreciate the feedback and the edit. I'm just a little hesitant to complicate the answer since it seems to have met right level of detail for the OPs needs. – JimmyJames Oct 06 '21 at 15:32
  • @JimmyJames feel free to add further information. It could be useful to future readers :) – Hazel へいぜる Oct 06 '21 at 15:33
  • @JimmyJames Agreed - I meant to say that while it's possible to make such a table, I don't think it's necessary for this answer that's already very good! – Toby Speight Oct 06 '21 at 15:34
  • 5
    I guess the key point is that average and worst-case scale the same way, but average is usually some fraction of worst-case. – Toby Speight Oct 06 '21 at 15:36
  • @TobySpeight Sort of a teaser to how constant factors all become 1 in this analysis. I'll leave why that is as homework for the reader. – JimmyJames Oct 06 '21 at 15:43
  • 1
    As @TobySpeight writes, you are wrong when you say "So if the list has 16 elements, it will take on average about 4 checks to find an item." Although worst-case and average both are $O(\log(n))$ it doesn't mean that they take equal time. In this case worst-case time is something like $t=k\log(n)$ while average time is more like $t=\frac12 k\log(n).$ – md2perpe Oct 07 '21 at 16:16
  • @md2perpe I don't think that's helpful to someone trying to get a basic handle on the idea. And I actually wrote "about 4 checks" not "exactly 4 checks" so technically, not incorrect. – JimmyJames Oct 08 '21 at 16:21
  • @JimmyJames. I beg to differ. I think that it's helpful to understand that the factors can be different. – md2perpe Oct 08 '21 at 17:15
  • @md2perpe Feel free to create your own answer. – JimmyJames Oct 08 '21 at 18:53
8

The average time is smaller than the worst-case time, because the search can terminate early, but this manifests as a constant factor, and the runtime is in the same complexity class.

Using a linear search in a sorted array as an example: the search terminates when a greater or equal element has been found. The worst case requires N comparisons, the average case fewer than that. If the array is just the range [1..1000], and we are looking for a random number in that range, we will on average use 500 comparisons per lookup, i.e. N/2.

From a complexity point of view, this constant factor does not matter, and it is omitted, so the average lookup time is still O(N).

The same happens for the average lookup time in binary searches.
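
Here is a short Python sketch of that setup (a rough illustration, assuming we look up each value in the range exactly once):

```python
def linear_search_comparisons(items, target):
    """Linear search in a sorted list; stops at the first element >= target.
    Returns the number of comparisons made."""
    comparisons = 0
    for value in items:
        comparisons += 1
        if value >= target:
            break
    return comparisons

data = list(range(1, 1001))                               # the range [1..1000]
total = sum(linear_search_comparisons(data, t) for t in data)
print(total / len(data))                                  # ~500.5, i.e. about N/2
```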

Simon Richter
  • 1,568
  • 9
  • 10
6

The data in this table is not telling you how long a search takes. It's telling you how the time and storage may scale with the amount of data being searched.

Let's take a real-world example. Suppose you're Google, or Twitter, or Wikipedia, or GitHub, or some project/research that searches large lists. Someone enters a search term. You have a vast amount of data to search, and you do not want to use a search method that's great for small lists (search spaces) but totally unfeasible for huge ones. What you want to know is that, as the size of your data grows, the extra time needed to search the larger database/list stays broadly "reasonable".

Examples of scalings might be:

  • O(1) - as your data gets huge, the work needed per search stays broadly unchanged.

  • O(n) - as you double your data size, you'll have to do twice as many checks to find an item. Increase the database by a thousand times, and searches will involve on the order of a thousand times as many checks to find an item.

  • O(n²) - as you double your data size, you'll have to do four times as many checks to find an item. Increase the database by a thousand times, and searches will involve on the order of a million times as many checks to find an item.

  • O(log₂ n) - as you double your data size, you'll have to do 1 extra check to find an item. Increase the database by a thousand times, and searches will involve on the order of 10 more checks to find an item. (log₂ 1000 is about 10.)

None of this says which is faster:

So, for example, if your list has 5 items, then looking at all 5 in turn may be your fastest search strategy. If it has 500 million items, then some other search method will be better.

So you could have a search strategy that's O(n) and takes, say, 1 millisecond to search 1000 items, 1 second to search a million items, and about 17 minutes (1000 seconds) to search a billion items.

But you might also have a second search strategy that's O(log₂ n) and takes 500 milliseconds to search 1000 items. That's much slower than the first strategy at that size. But because it scales differently, it might only take 1 second to search 1 million items, and 1.5 seconds to search a billion items.

(Because log₂ of a million is roughly 2 × log₂ of a thousand, and log₂ of a billion is roughly 3 × log₂ of a thousand.)
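
Those figures can be reproduced with a short sketch; the per-check costs below are hypothetical numbers chosen to match the example above, not measurements of any real system:

```python
import math

# Hypothetical per-check costs chosen to reproduce the figures above.
linear_cost_per_item = 1e-3 / 1000             # 1 ms per 1000 items => 1 microsecond per check
log_cost_per_check = 0.5 / math.log2(1000)     # 500 ms total for 1000 items

for n in (1_000, 1_000_000, 1_000_000_000):
    t_linear = linear_cost_per_item * n            # O(n) strategy: cost grows with n
    t_log = log_cost_per_check * math.log2(n)      # O(log2 n) strategy: cost grows with log2(n)
    print(f"n={n:>13,}: O(n) strategy ~{t_linear:10.3f} s, O(log2 n) strategy ~{t_log:.2f} s")
```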

Stilez
  • 507
  • 3
  • 5
  • 1
    To expand on your last section: It's possible for an algorithm to be theoretically "faster" than another in terms of its Big-O classification, but useless in practice because the *n* required is larger than can possibly fit in a computer's memory. In which case it's called a [galactic algorithm](https://en.wikipedia.org/wiki/Galactic_algorithm). – dan04 Oct 07 '21 at 00:25
  • 1
    As an example, in actual use, the Scala core library supports immutable sets, and since they are immutable, a set of size 4 uses a class that handles sets of exactly size 4, with linear search instead of hashing or binary search. – prosfilaes Oct 07 '21 at 07:23