
Hi, I asked this question on SO, but no one seemed bothered to answer it with actual information.

Firstly, just to say: I fully understand that computers are imprecise with floats, doubles, etc.

But what I would like to understand is why the imprecision shows up for just a select few numbers in this code:

    double o = 0.795660;
    for (float i = 0; i < 4096; i++)
    {
        double a = i+0.795660;
        double b = (i+1)+0.795660;
        double resA = (int) System.Math.Floor((a - o) / 1.0);
        double resB = (int) System.Math.Floor((b - o) / 1.0);
        if (System.Math.Abs(resA - resB) < Mathf.Epsilon)
        {
            Debug.Log(i + " :: "+ (i+1));
        }
    }

Output:

1 :: 2  => 2^1
31 :: 32 => 2^5
4095 :: 4096 => 2^12

It can't be a coincidence that these numbers fail when they are 2^n.

What is it about these particular numbers that the computer can't compute accurately, when it managed to do so for all the other numbers in the loop?

My original question was here: https://stackoverflow.com/questions/59387659/fix-for-numbers-close-to-25-and-27-imprecision-problems

I just got the vague answer of "because imprecision", but no one explained in any educational detail why specific numbers failed.

I'm not a computer science student; I'm a self-taught programmer, and though I understand computers are not good at representing floating-point numbers, I'd like to know why certain numbers fail more than others.

This was in C#; I don't know whether other languages will produce the same problem.

WDUK
  • Why was this voted to close? I am not asking for a code debug; I am asking about the science of the binary representation of floats. My code has nothing to do with that; it merely highlights what I am trying to ask. – WDUK Dec 18 '19 at 21:39
  • The question is perfectly valid to ask here, since you're asking about the mechanics of how floats work at the binary level, which is an engineering question, not a code question. I wouldn't worry if one person votes to close. – Dave Dec 18 '19 at 21:43
  • I agree the question is perfectly valid. However, it's also a duplicate! In fact there are dozens of question here that tackle this topic. @WDUK if you type 'floating point' in the search box at the top of the page you'll find a bunch of possibly helpful questions and answers. See [What causes floating point rounding errors](https://softwareengineering.stackexchange.com/questions/101163/what-causes-floating-point-rounding-errors) for example. – Charles E. Grant Dec 18 '19 at 21:57
  • @CharlesE.Grant But that link doesn't explain in any intricate detail why all numbers up to 4096 work except three; that answer speaks about numbers in general. If it were general, all my numbers would be failing here. – WDUK Dec 18 '19 at 21:59
  • That link and the others will explain the general principles of representing floating point numbers, and then you can work out the details for yourself. – Charles E. Grant Dec 18 '19 at 22:03
  • You may find this [tool](https://www.h-schmidt.net/FloatConverter/IEEE754.html) helpful. – Charles E. Grant Dec 18 '19 at 22:06
  • 'But if it was generally then all my numbers would be failing here' No, if you actually read those answers, the underlying problem is that *some* numbers can't be represented in floating point form, and must be rounded to a nearby number that can be represented. Which numbers can't be represented as floating point is not readily guessable from their printed decimal form. That's why you may find it useful to work through your numbers using the tool I linked to. – Charles E. Grant Dec 18 '19 at 22:16
  • tool is amazing – Ewan Dec 18 '19 at 22:37

3 Answers


It's just a rounding problem.

Let's first note that 0.795660 cannot be represented exactly, which means the sum i+0.795660 cannot be represented exactly either, for most values of i.

Each time you reach a new, higher power of 2, the internal exponent changes, and with it the interpretation of all of the mantissa bits, since they are now read under the new exponent.

So, the number just below a power of 2 (e.g. 31.795660) and the number just above it (e.g. 32.795660) have notably different representations; that is why such pairs are the most likely candidates to be at issue.

Sometimes the closest available bit pattern is less than the actual value, and sometimes it is more.  When it is less, the subtraction lands just under the actual integer (like 1.9999999 instead of 2), and the floor operation then, of course, gives a noticeably different value.

Floating point is very inexact stuff.  Only rational numbers can be represented, and even then only a very limited, finite set of them.

However, between 30.x and 31.x, the exponent remains the same, leaving this pair of numbers with a similar representation, whereas between 31.x and 32.x the exponent changes, which introduces an opportunity for differing conversion errors.
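
Both effects can be checked directly. Here is a small sketch in Python (whose float is the same IEEE-754 double as C#'s double; math.ulp needs Python 3.9+):

```python
import math
from decimal import Decimal

# 0.795660 = 39783/50000; the denominator is not a power of 2, so the
# stored double can only be the nearest representable value.
o = 0.795660
assert Decimal(o) != Decimal("0.795660")   # the stored value differs

# Crossing a power of 2 doubles the spacing (ulp) between adjacent
# doubles: 31.795660 lies in [16, 32), 32.795660 lies in [32, 64).
assert math.ulp(32.795660) == 2 * math.ulp(31.795660)

# Print the exact values actually stored for each side of the boundary.
print(Decimal(31.795660))
print(Decimal(32.795660))
```

So i + o is rounded with a different granularity on each side of a power of 2, and one side may land just below the integer, where floor drops it.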

Erik Eidt
  • Ah, I see, thanks for explaining it in simple terms. I'm curious how software that needs to be precise fixes these edge cases. I currently add 0.0001 to the number before I floor it, which fixes the issue, but am I likely to still get problems with that fix that I've not yet stumbled upon? – WDUK Dec 18 '19 at 22:28
  • @WDUK Software that needs precision won't use floating point, they'll use fixed point numbers or similar. – Delioth Dec 18 '19 at 22:57
  • Rounding and precision are mutually incompatible :-) Whenever you round numbers, arbitrarily small differences may be amplified to the rounding step. So either don't round at all, or use rounded numbers all the time and only perform operations that keep the scale (addition and subtraction are ok, multiplication is problematic, division is evil) – Hans-Martin Mosner Dec 18 '19 at 23:25
  • @Hans-MartinMosner What do you define as rounded numbers? Integer values only? In my case I'm using world positions. Or do you mean intentionally limiting the decimals to steps of 0.1 for world positions, so that instead of getting 0.99999 it would only ever be 0.9 before going to 1? – WDUK Dec 19 '19 at 03:25
  • @WDUK rounded numbers are always integer multiples of some base. For example, you would round currency values to the nearest cent, so your base is 0.01. Such values should be represented either as integers (which means you need to deal with the scale implicitly) or with fixed point numbers which are just integers with an explicit scale. In your case, scaling everything by a factor of 10^6 and working with integers would fix the perceived problem. – Hans-Martin Mosner Dec 19 '19 at 06:07
  • @Hans-MartinMosner Yikes, that sounds confusing. Do you know of a good article that explains this fixed-point stuff in good detail? I'm a self-learner, so I didn't do this at university and will need to find a good resource to understand it. – WDUK Dec 19 '19 at 06:12
  • @WDUK I can't recommend a specific article. The answers to your question already do an excellent job of explaining where the difference comes from in your specific example, and for more in-depth explanations you should search for "fixed point" together with other relevant terms, such as "floating point", "precision" etc. Gaining a broad understanding of this whole area is a really valuable goal. – Hans-Martin Mosner Dec 19 '19 at 11:01
  • See also https://en.wikipedia.org/wiki/Unit_in_the_last_place, often used instead of adding 0.0001, for example. – Erik Eidt Dec 19 '19 at 15:16
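
As a concrete sketch of the fixed-point idea from the comments above (in Python; the 10^6 scale and the to_fixed helper are illustrative assumptions, not anyone's actual code):

```python
# Represent positions as integers scaled by 10**6, so every value is
# exact and "floor" becomes exact integer division.
SCALE = 10**6

def to_fixed(x: str) -> int:
    # Parse from a decimal string so no float rounding ever occurs.
    whole, _, frac = x.partition(".")
    frac = (frac + "000000")[:6]          # pad/truncate to 6 decimals
    sign = -1 if whole.startswith("-") else 1
    return int(whole) * SCALE + sign * int(frac)

o = to_fixed("0.795660")

# The pairs that failed with doubles never fail here.
for i in (1, 31, 4095):
    a = i * SCALE + o          # i + 0.795660, exactly
    res = (a - o) // SCALE     # floor((a - o) / 1.0), exactly
    assert res == i            # never off by one
```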

It can't be a coincidence these numbers fail when they are 2^n.

It isn't. The only candidates for failing your test are at a power of two (for binary FP; for decimal FP they would be at a power of 10). But not all candidates will fail. And if you do your computation with floats instead of doubles, the failing candidates will be different.

You are trying to find floating point numbers such that

floor((i+o)-o) != i

Let's consider i+o. If you don't play with the FP rounding modes (I don't know whether that is possible in C#; in C and C++, see fesetround), this is rounded to the representable value closest to the exact result.

If it is rounded up, the exact result of (i+o)-o will be greater than i, and after rounding and floor we'll get i (in fact we don't even need the floor to get i).

If it is rounded down, the exact result of (i+o)-o will be smaller than i. If i is not a power of the representation base, the rounding will be the same as the one applied to i+o and we'll get i again. But if i is a power of two, the closest representable value may differ from i, because just below a power of two an additional digit of precision is available.

In base 10 with 4 significant digits:

  • the round up case:

    10.00 + 0.1251 = 10.1251 rounded to 10.13
    10.13 - 0.1251 = 10.0049 rounded to 10.00
    
  • the round down case with round-trip:

    10.00 + 0.1204 = 10.1204 rounded to 10.12
    10.12 - 0.1204 =  9.9996 rounded to 10.00
    
  • the round down case without round-trip:

    10.00 + 0.1234 = 10.1234 rounded to 10.12
    10.12 - 0.1234 =  9.9966 rounded to  9.997
    
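All three base-10 cases above can be reproduced with Python's decimal module, which lets us set the precision to 4 significant digits (the operands are the same as in the answer):

```python
from decimal import Decimal, getcontext

getcontext().prec = 4   # 4 significant digits, default round-half-even

def add_sub(x, y):
    s = Decimal(x) + Decimal(y)   # sum, rounded to 4 digits
    return s, s - Decimal(y)      # difference, also rounded to 4 digits

# round up: 10.1251 -> 10.13, then 10.0049 -> 10.00
assert add_sub("10.00", "0.1251") == (Decimal("10.13"), Decimal("10.00"))
# round down with round-trip: 10.1204 -> 10.12, then 9.9996 -> 10.00
assert add_sub("10.00", "0.1204") == (Decimal("10.12"), Decimal("10.00"))
# round down without round-trip: 10.1234 -> 10.12, then 9.9966 -> 9.997
assert add_sub("10.00", "0.1234") == (Decimal("10.12"), Decimal("9.997"))
```

Only the last case loses the round-trip, which is exactly the situation where a subsequent floor lands one below the original integer.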
AProgrammer

The number o is a double-precision number, so it is stored with 53 significant bits; for this o they all sit to the right of the binary point, and the lowest bit of the mantissa has a value of 2^-53.

If you add, for example, 4096 + o, the result is a bit larger than 4096; the highest bit of the mantissa in the result has a value of 4096 = 2^12, and the lowest bit has a value of 2^(12-52) = 2^-40. So 13 bits of the mantissa of o are rounded away.

The same rounding happens if you add 4097 + o, 4098 + o, ..., 8191 + o: the same 13 bits of the mantissa of o are rounded away, in exactly the same way. But if you add 8192 + o, then 14 bits are rounded away, and if you add 4095 + o, only 12 bits are rounded away. That's why the rounding only changes when you cross a power of 2, like 31 -> 32 or 4095 -> 4096.

Why doesn't the result always change when you cross a power of two, for example from 2047 -> 2048? The extra bit of o that gets rounded away when you go from i = 2047 to i = 2048 can be a zero, and in that case whether it's rounded away or not makes no difference.
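
This bit counting can be checked in Python (whose float is the same 64-bit double) with math.ulp, which gives the value of the lowest mantissa bit at a given magnitude; a small sketch:

```python
import math

# Spacing of doubles: 2^-41 just below 4096, 2^-40 from 4096 upward.
assert math.ulp(4095.0) == 2**-41
assert math.ulp(4096.0) == 2**-40

o = 0.795660
a = 4096.0 + o

# The fractional part of the sum is forced onto the 2^-40 grid,
# i.e. the bits of o below 2^-40 (13 of its 53 bits) were rounded away.
frac = a - 4096.0          # this subtraction is exact
assert (frac / 2**-40).is_integer()
```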

gnasher729