0

I am having trouble with a coding question that I found practicing for a interview:

A scatter graph of points on a page, draw two horizontal lines (these lines are parallel) across the page such that the perpendicular y distance to a line from all points is minimized. Count the total distance from each point to the line that is closest (Only count the distance from each point to the closest line, ignore the longer distance, when the distance is the same, pick either one). Describe an algorithm for finding these two lines.

My approach:

I have considered brute forcing this by checking all possibilities in intervals of 1 starting from the highest y axis ending with the lowest y axis of the points, but this is perhaps not the most optimal way. I have also considered splitting the points into two groups based on y axis, however there are some edge cases that will fail.

Does anyone know a more efficient way to do this?

Hyperion
  • 21
  • 1
  • 2
    see [Why do interview questions make poor Software Engineering.SE questions](https://softwareengineering.meta.stackexchange.com/a/6361/31260) – gnat Nov 27 '18 at 05:57
  • 1
    "all possibilities in intervals of 1"? Why 1? Aren't the coordinates supposed to be double values? – Doc Brown Nov 27 '18 at 06:32
  • @DocBrown The sought lines are horizontal. This is a 1D problem. – Nick Alexeev Nov 27 '18 at 06:48
  • 1
    @NickAlexeev: obviously, but that does not make it a discrete integer problem. – Doc Brown Nov 27 '18 at 06:49
  • @DocBrown It is a discrete problem because there is an integer number of points to consider. This can be seen as a clustering problem; each point either belongs to cluster A or B → I can enumerate all permissible clusterings, giving rise to Kain's O(n log n) solution. The non-discrete version would be if I had point densities rather than discrete points, but that would be extremely hard unless the density has an analytical form that's once differentiable. – amon Nov 27 '18 at 14:39
  • @amon: Kain0_0 did not give a solution, he only gave some hints to find one. And when I understood the problem correctly, the point coordinates are real numbers, not integer coordinates, thus non-discrete, as well the possible positions of the horizontal lines. – Doc Brown Nov 27 '18 at 16:42
  • I was looking for a solution, and so far I have considered using k means clustering, but if we can simply this problem to 1D then perhaps ckmeans.1d.dp would be a good way to solve this, [Ckmeans.1d.dp](https://cran.r-project.org/web/packages/Ckmeans.1d.dp/vignettes/Ckmeans.1d.dp.html) – Hyperion Nov 27 '18 at 17:18
  • @Hyperion Given my understanding that horizontal lines, are lines with a constant y value, this is indeed a 1-d problem. The ckmeans algorithm will indeed solve the general case of this problem for k lines. There is however an even more efficient algorithm for the 2 line case that you presented. – Kain0_0 Nov 27 '18 at 23:43
  • It happens that for k=2 clusters in 1D, there can only be N-1 permissible clusterings for N > 1 points – few enough that you can simply try them all by brute force. You don't have to reuse an existing algorithm such as kmeans. While this is an O(n²) solution, calculation of the cluster's mean via dynamic programming can reduce this to O(n log n). Then, the brute force approach is better or equivalent to any other clustering algorithm. – amon Nov 28 '18 at 12:39

1 Answers1

1

Yes there is a much more efficient way to do this. Off the cuff I can write an O(N log(N)) algorithm. As this is a test question, the point is not solving it it is the process of solving.

The problem you seem to have is a perspective issue. Try reducing the size of the problem down.

  • How would you solve this for two points?
  • How would you solve this for three points? four points? five points?
  • Given that you've already solved this problem for K points, can you solve it for K+1 points?
  • Does that K+1 point need to have certain properties?
  • Would receiving the points in a particular order make it easier to solve?
  • Would considering the points one, two, three at a time make it easier to solve?
  • How does each line behave as a point is added?
  • Can you take advantage of that behaviour?

At this point you'll probably have worked out if and how to sort the points, how they are picked, and how many points you'll consider at once. You will know how to update the solution for the next point, and you'll know that your answer is correct after having considered all points.

Kain0_0
  • 15,888
  • 16
  • 37
  • The idea of solving problems this by induction is not bad in general. However, I doubt it will work well for this specific problem (but maybe I am just to blind to see it). – Doc Brown Nov 27 '18 at 06:52
  • There is a way to construct this by induction take some time and think it through. Perhaps find a copy of [Computational Geometry - Algorithms and Applications](https://people.inf.elte.hu/fekete/algoritmusok_msc/terinfo_geom/konyvek/Computational%20Geometry%20-%20Algorithms%20and%20Applications,%203rd%20Ed.pdf) – Kain0_0 Nov 27 '18 at 07:16