5

Is it possible to find the closest point (or k closest points) to any arbitrary point in a set of n points (in dimension N)?

When I write closest point, I mean smallest Euclidean distance. I'm looking for an algorithm for multiple searches, where the preprocessing time for the n points is negligible?

Is it possible to get the closest point(s) in a reasonable time (O(log(n)) for example with a method similar to binary search).

Any reference is welcome.

I already read the following questions:

But, none provide a satisfying answer.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
Fifi
  • 159
  • 4
  • 6
    If you can index the space in advance, then you can find nearby points quickly. You just need a multidimensional indexing approach, such as https://en.wikipedia.org/wiki/K-d_tree – Erik Eidt Mar 16 '21 at 15:16
  • @DocBrown The closest point to a set of points. n is the carnality of the set. I mean"multiple searches", where the preprocessing time for the n points is negligible? I update the question. – Fifi Mar 16 '21 at 17:26
  • 1
    It is not clear to me what kind of query you are trying to do. Is this a standard nearest-neighbor or k-n-nearest-neighbor problem? Perhaps an example of the problem with 2 dimensions and with a set of three points would help? – amon Mar 16 '21 at 20:28
  • 2
    If you're looking to libraries, [FLANN (Fast Library for Approximate Nearest Neighbors)](https://github.com/mariusmuja/flann) is a decade old and rock solid (used in OpenCV), although I don't know their time-complexity. If you're interesting in knowing what people are talking about the latest approaches, see [Nearest Neighbor Graph](https://en.wikipedia.org/wiki/Nearest_neighbor_graph). I'm assuming the number of dimensions is very high (hundreds or thousands) for your applications so that low-dimensional approaches don't work well. Traditional R-Tree etc works well for 2D and 3D. – rwong Mar 17 '21 at 02:02
  • @rwong could you make your comment into an answer? – Kain0_0 Mar 18 '21 at 09:35

2 Answers2

5

R-Tree, Oct-Tree, and similar spatial trees can quickly find close points for arbitrary sets of points.

  1. find the leaf containing your point (or the closest to it)
  2. Within the leaf find the closest contained point
  3. Search the tree again for any leaf intersecting the volume centred on the first point, containing the closest point just found.
  4. Excluding the already searched leaf, search the intersecting leaves for a closer point.
  5. You now have the closest point

If the data changes just use an R-Tree/Oct Tree/Spatial Tree. It will give you log(N) lookups.


But If the data does not change, or it is cost-effective to do so, you can pre-index the data.

One way is to simply create a hash table over all known data points, and a sub-sampling of "grid-points". The value in the hash points to the tree node/s its located in. The idea of grid-points is so that novel data points can be mapped to their closest grid point to start this search off. Depending on how the hash is calculated this can speed up searching the data significantly. Log(N) -> Amort O(1)

The next way is to flatten the trees out themselves into arrays of segments. Each segment being a bound on that dimension, eg: [0 <= _ < 3, 3 <= _ <= 8, 8 < _ <= 9, ...] (In fact you can optimise this further as each pair of cells shares the same number, and if <= 9 is the same as < 10 you can flatten even further by converting to the same operator and moving the number).

To create the segment arrays iterate through the points and add them into their own segments. For the moment list them if multiple share the same point.

Go through your segments and if there is more than one element pick another dimension and repeat for just those listed elements.

At worst you will have an N dimensional lookup. If you pick the same dimensions in the same order its easier to perform lookups. If you can pick a different dimension on each dereference you can reduce the overall depth at the cost of complexity.

Kain0_0
  • 15,888
  • 16
  • 37
1

Finding the closest point to a single preselected point is O(n)

You simply iterate once the list calculating the distance for each. The problem occurs if you want to find the two points in the list which have the shortest distance between them.

Or more commonly, the shortest path through the list going from node to node, where the combination of the O(n!) search time and square roots allow for the optimisation opportunities.

However its such a well known problem (the traveling salesman) there are lots of libraries out there which optimize for various scenarios.

If you are trying to solve it academically you can read up on the various approaches, but if you have a practical problem, supply more details and you may find a prewritten solution.

Ewan
  • 70,664
  • 5
  • 76
  • 161
  • 1
    Did you notice the afterwards added sentence *"I'm looking for an algorithm for multiple searches, where the preprocessing time for the n points is negligible"*? – Doc Brown Mar 17 '21 at 09:49
  • 1
    of course i did. my answer directly addresses that – Ewan Mar 17 '21 at 12:04
  • 2
    As far as I can see it does not. *"two points in the list which have the shortest distance between them."* is more specific, and I would not say your answer adresses this. TSP is a completely different problem. – Doc Brown Mar 17 '21 at 13:43
  • 1
    its the same problem. the point is the optimisation you select, your indexing strategy, depends on the details of your overall problem – Ewan Mar 17 '21 at 14:19
  • 2
    If there is zero pre-processing then yes, a linear search is the best that can be achieved. I'm uncertain where the minimal spaning tree reference (traveling salesman problem) enters the actual question above? – Kain0_0 Mar 18 '21 at 09:33