2

How to store a straight line (infinite length, not a line segment) efficiently?

This is not a duplication of How to represent a geometric line programmatically?, because the linked question is about 3D, and it does not answer the main problem reflected in this question.

Definition of a straight line:

This question is asking about the data structure, not user interface. I don't care about how the equation is defined or presented (because I'm not making a graphing calculator).

Basically, a straight line is an equation (not necessarily a function of x or y) that, when drawn on a rectangular coordinate system, results in an infinitely long line, of any inclination (slope), at any intercept (assuming that the floating point range is not exceeded).

Therefore, these are all valid straight line equations:

y = x
y = -2x
y = x + 3
y = 4x + 5
x = 6
y = 7

They all stand for a straight line, as shown in https://www.desmos.com/calculator/jnotvj4k7u

I am trying to create a type (class or a class hierarchy) (in Java, if language matters) that can represent any of these straight lines.

Possible uses of a straight line:

  • Used in an image, such as imageline in GD to draw a line that cuts the image (the inverted Y-axis is a minor problem irrelevant to this question). For example, the equation y = x can cut an image diagonally.
  • Stored in a file. The structure in the file should be same as that in code.
  • Implement the equals and hashCode methods that work for identical lines.
  • Evaluate the intersection point of two straight lines.
  • Evaluate the intersection point(s) of a straight line and, say, a quadratic equation (given double a, double b, double c for ax^2 + bx + c = 0) or a circle (given Point center and double radius)

My question

I have considered three possible methods, which will be listed as an answer below: https://softwareengineering.stackexchange.com/a/324229/234322

My question is, is there a fourth solution? Or, is there a way to minimize the disadvantages of these methods?

SOFe
  • 658
  • 1
  • 7
  • 27
  • 2
    I would be less concerned about saving/wasting an extra byte here or there than I would about things like "can I get equals and hashCode to work correctly and clearly". The reason I say this is that the overhead of making ANY object (at least in Java) is rather significant when compared to saving/wasting 1 or 2 extra bytes (this overhead is quite pronounced in a 64-bit system). – Ivan Jul 07 '16 at 18:32
  • 1
    One thing you don't mention in your question is numeric precision. The "two points" method will clearly have the highest ceiling on numeric precision (assuming you are limited to standard doubles to store your numbers) because it uses 4 doubles to store the Line as opposed to 2. What's more important? Numeric precision or saving bytes? – Ivan Jul 07 '16 at 18:41
  • 1
    I seriously can't wrap my head around why this was downvoted. – RubberDuck Jul 08 '16 at 10:49
  • @Ivan I agree. In my case, I am actually only using these equations in a limited range, so I have stated in the question that I assume that the values are always precise, but this may not be the case for future readers. I'm going to post an answer that integrates different answers in this question as the final accepted answer after waiting for a few more days. – SOFe Jul 08 '16 at 11:31

4 Answers4

9

The equations can be all rewritten into form :

a*X + b*Y + c = 0

That means you can store three doubles a, b and c.

This doesn't have a problem in representing an arbitrary slope. You can also calculate slope as a/b (or b/a). Two lines are equal if there exists k where a1*k = a2, b1*k=b2, c1*k = c2.

Euphoric
  • 36,735
  • 6
  • 78
  • 110
2

Possible methods

For convenience of expression, this class will be used to represent any real points in a coordinate system in code snippets below:

public class Point{
    public double x, y;

    @Override public boolean equals(Object other){
        if(!(other instanceof Point)) return false;
        Point point = (Point) other;
        return point.x == this.x && point.y == this.y;
    }
}

Method 1: slope + (Y-)intercept

Using the slope-intercept form, a class like this (in Java) can be created:

public class Line{
    public double slope;
    public double yIntercept;
}

This is a possible constructor to create a line that passes through two given points: (all straight lines have real points, so any two distinctive points can create a straight line)

public Line(Point pt0, Point pt1){
    if(pt0.equals(pt1)) throw new IllegalArgumentException("Cannot create straight line from two identical points");
    this.slope = (pt0.y - pt1.y) / (pt0.x - pt1.x);
    this.yIntercept = pt0.y - this.slope * pt0.x;
}

As most programmers can instinctively notice, there is a possible ArithmeticException of division by zero at line 1 of the constructor. What does this mean, if pt0 and pt1 are not equal? This means that two points have the same X-coordinate but different Y-coordinate, i.e. a vertical line (in a Y-X rectangular coordinate system) should be created. This also represents the equation x = x0, where x0 is a (mathematical) constant.

Even if we specifically check it and set slope to be java.lang.Double.NaN or java.lang.Double.POSITIVE_INFINITY, what about yIntercept? It should be NaN, because there are no solutions or infinite solutions. (Y-intercept means "Y-coordinate when x = 0", which is either always true or always false for a x = x0 equation)

A possible solution is to represent a vertical line with an extra subclass (then we can use this subclass by hiding the class constructor and use a static getter Line.createFromTwoPoints(Point, Point) instead):

public class VerticalLine extends Line{
    private double xIntercept;
    public VerticalLine(double x){
        this.slope = Math.POSITIVE_INFINITY;
        this.yIntercept = Math.NaN;
        this.xIntercept = x;
    }
}

However, this method is inconvenient. It seems to single out the condition of a vertical line, which is not a special condition and should not be separated from the rest of conditions. It also creates an extra field in the VerticalLine class. If we use it in replacement of yIntercept, it doesn't sound like a good idea to call a field yIntercept while it is in fact xIntercept, or use a field called intercept sometimes as the Y-intercept but sometimes as the X-intercept. This inconvenience becomes significant when there is a big mess of getters and utility functions, sometimes overriding, sometimes not overriding.

Method 2: direction + (Y-)intercept

Same with Method 1, except that the slope is replaced by its arc-tangent. This makes the infinite slope a java.lang.Math.PI direction instead, but still the same problem with the X- and Y-intercepts. Might also work slower because evaluation of intersection points might involve more calls to trigonometric functions.

Method 3: two-point form

This is very inefficient.

Memory

This requires storage of four doubles, rather than two to three doubles above.

Performance

The memory structure of two identical lines may not be identical. This means that the equals method and the hashCode method, and also many other methods, may involve more calculations. Refer to the "Possible uses" section in the main post regarding the content of "many other methods".

SOFe
  • 658
  • 1
  • 7
  • 27
  • Small note about the two point definition. A line is uniquely identified as two points. Why would performance suffer? Because you have to check `Equals` with a tolerance delta? I'd be hesitant to claim performance impact until the different implementations were benchmarked against each other. – RubberDuck Jul 07 '16 at 11:57
  • For methods 1 and 2, should that say "+ (Y-intercept)"? – Carcigenicate Jul 07 '16 at 12:34
  • @RubberDuck for example, how would you find the intersection points between that line and a circle equation `Ax^2+By^2+Dx+Ey=F`? From my mathematical knowledge, it involves conversion of the two-point form into the point-slope form or other convenient forms for calculation. I have previously worked out an equation about that using paper and pen, and the result is really complicated. – SOFe Jul 08 '16 at 10:45
  • @PEMapModder that's a good example you should add to your answer. I think a lot of people would reach for the two point form not understanding why you're saying it'd be slower. – RubberDuck Jul 08 '16 at 10:47
  • @RubberDuck I actually put that in the "Possible uses" section, which introduces the criteria I'm using to compare these methods. I should really remind it there though. – SOFe Jul 08 '16 at 10:49
  • 2
    @RubberDuck: a line *can* be identified by two points, but it not "is" defined, as there are an infinite number of points that can all describe the same line. In point slope form or slope-intersect form, there is only a single set of values that describe the line. What two points describe is a line segment. – whatsisname Jul 08 '16 at 17:23
  • Re, "The memory structure of two identical lines may not be identical..." IMO, you're going down a deep rabbit hole if you expect to be able to test whether two lines in a _real_-valued 2D space are exactly equal. – Solomon Slow Jul 08 '16 at 20:29
  • @james large why? – SOFe Jul 09 '16 at 06:13
  • You reject the two-point form because there could be different representations of the same line. OK, but that could only happen if the program was given different inputs. If both lines were computed from the same exact inputs, then you'd always end up with the same exact points. So, I conclude that you want to compare lines computed from different inputs. OK, but you're using floating point math. Floating point rarely gives exact results; Even if the slope and intercept of two lines computed from different inputs should theoretically be equal, they often won't be exactly equal. – Solomon Slow Jul 09 '16 at 14:09
  • You can compare with an "epsilon" (i.e., compare whether the slopes and the intercepts are "close enough"), but how close is close enough? The answer to that question is tricky because the degree to which the precision of the numbers matters will be strongly dependent on the slope of the line in any kind of unique representation. In fact, that's what makes the (non-uniqe) general form or the (non-unique) 2-point form attractive in some applications: The precision requirement depends much less on the slope. – Solomon Slow Jul 09 '16 at 14:14
  • I see. Actually this is still the library section of my app, so I haven't finalized how it is gonna be used yet (probably won't even get used). Now you talking of that, it seems that identical memory is not really important. – SOFe Jul 09 '16 at 14:31
1

A couple of possibilities:

The vector form of the equation for a straight line in 2 dimensions is n.x + s = 0, where 'n' is a vector normal (or perpendicular) to the line "." is the vector dot product operation, and s is a scalar constant. This allows you to represent arbitrary lines with two integers and a double. (note that the same representation in 3 dimensions forms a plane, and so on). It might be more convenient to use doubles for the vector, however, and if you do that, using a unit vector might make sense. If your vector is a unit vector, "s" can be interpreted as the distance along the normal vector between the line and the origin.

A related representation is to note that a unit vector can be represented as an angle between the vector and one of the axes, so you could store that instead of the vector. This gives a representation of any arbitrary straight line with 2 doubles.

Depending on your application, these may or may not be efficient, but you should be able to profile them reasonably easily and decide which is best for you.

Jules
  • 17,614
  • 2
  • 33
  • 63
0

A single bit saying if the line is closer to horizontal or vertical. If closer to horizontal, slope + Y-intercept. If closer to vertical, inverse slope + X-intercept.

That's two doubles, plus a bit.

Equality testing is trivial.

Mike Dunlavey
  • 12,815
  • 2
  • 35
  • 58
  • Wouldn't there be more trouble in implementation of utility functions? – SOFe Jul 07 '16 at 13:17
  • @PEMapModder: Like what functions? I don't see how it could. – Mike Dunlavey Jul 08 '16 at 02:23
  • Say, you may need to inverse the slope every time. The only difference of this method from the `VerticalLine` method I mentioned is that it separates conditions more equally (rather than singling out an individual condition), but it is still probably unnecessarily forcing (cached) calculation every time. – SOFe Jul 08 '16 at 10:41
  • Eliminating `if` statements from code has been a life-long goal for me. I would push back _hard_ if I were your code reviewer. – Solomon Slow Jul 08 '16 at 20:37
  • @jameslarge: Life-long goal? Why? Performance? If performance is your concern, we can [*have some fun*](http://stackoverflow.com/a/927773/23771) :) – Mike Dunlavey Jul 08 '16 at 22:09
  • @PEMapModder: You're worried about caching? It's funny how people worry about caching issues on the assumption that their code is otherwise as speedy as possible. (Which usually ain't true, not by miles.) – Mike Dunlavey Jul 08 '16 at 22:13
  • @Mike Dunlavey I would prefer to make my code as simple and efficient as possible, especially when it is the very basic layer of the program's internal framework's (the part without any relationship with the user interface), and especially when it is the internal layer of the internal layer of my application; say, a class almost referenced by any class, and acts like a unit in my program. – SOFe Jul 09 '16 at 06:16
  • @Mike Dunlavey I think the main problem with if statements, especially large fragments of them (where if and else have almost equal sizes) is that it greatly increases the complexity of the code. Sometimes you may simply copy the code, sometimes you may declare a variable that acts as a reference to satisfy both cases. But whichever method, it still makes the code less readable, when there is a more straightforward approach to read it. So compared to singling out the condition of division by zero, this feels even unnatural. – SOFe Jul 09 '16 at 06:24
  • @MikeDunlavey, Years ago, I took over maintenance of about 50,000 lines of life-critical, real-time code. It calculated one way if a certain part was rotating clockwise, and a different way if anti-clockwise. Found an important bug in only one of the two branches. I refactored all the math to multiply things by 1.0 or -1.0 depending on the direction, eliminated around 100 `if` statements and maybe 1000 code lines. Every line of code in that business was added risk---literally, risk that a patient could die. I've been crusading ever since to simplify everything I touch if I'm able. – Solomon Slow Jul 09 '16 at 14:22
  • @jameslarge: I agree, simplifying is good. I look at data and source code in information-theory terms, where if changing something requires changing things in more than one place to be right, that's *redundancy*. A small amount of redundancy helps you detect errors (if there is a scheme for detecting them), but a large amount only makes errors more likely. In source code, the fewer edits you have to do to make a change, the less likely you will forget one and put in a bug. So I think we're on the same page. – Mike Dunlavey Jul 09 '16 at 23:39