It's difficult to provide specific references to prove this (I certainly don't have the time to find them, myself), but when the group was considering establishing the IEEE-754 format (it was first published in 1985) it was expensive to sort numbers using floating point software routines, and floating point hardware wasn't yet ubiquitous. Floating point hardware was actually rare on PCs at the time, and not cheap when it was available, either. So one thing to keep in mind is that there was a motivation to consider a format arranged so that it could be sorted using integer sort routines.
Clearly, a number with an exponent of -1 is smaller in magnitude than a number with an exponent of +1. So it should sort as "less than." However, negative numbers are also smaller than positive numbers, regardless of the exponent. So a decision was taken to place the sign bit in a position where the sign of the floating point number would "mirror" the sign of a twos complement integer.
But what about the exponent? Well, once you've already used up the "integer" sign bit for other purposes, you are stuck. The exponent field is forced to reside in the magnitude portion of the format, and it should reside in the more significant bits, since the exponent matters more than the mantissa for deciding sort order. So the exponent was placed just below the sign bit, and now there was no choice but to make it an "unsigned" value. Done that way, with the mantissa placed in the least significant bits, the values sort just fine using integer sort routines.
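A minimal sketch in Python of that integer-sort property, assuming IEEE-754 single precision and using `struct` to reach the raw bits (the helper name `float_key` is mine). One wrinkle: the format is sign-magnitude rather than twos complement, so while non-negative values already sort correctly by their raw bit patterns, negative values sort in reverse and need their bits flipped:

```python
import struct

def float_key(x: float) -> int:
    """Map a float's single-precision bit pattern to an integer
    whose unsigned order matches the numeric order.
    Negative values (sign bit set) get all bits flipped, which
    reverses their order; positive values just get the sign bit
    set so they land above every negative."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & 0x80000000:
        return bits ^ 0xFFFFFFFF   # negative: flip everything
    return bits | 0x80000000       # non-negative: set the sign bit

values = [3.5, -2.0, 0.25, -0.5, 1e-30, 7.0]
assert sorted(values, key=float_key) == sorted(values)
```

The key point the layout buys you: within one sign, bigger exponent bits always mean bigger magnitude, so a plain integer compare gets the order right without ever interpreting the fields.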
There was also a need for special values. One of these groups is "denormals." This is simply a means by which tiny-magnitude numbers can degrade gradually rather than abruptly. Another group had to do with infinities and "not a number" (NaN) values, including custom "user defined" values of a special nature.
Well, now it's pretty obvious. They needed to add a bias of some sort, an excess, to the exponent in order to put it into unsigned form before packing it into the format. It's really easy for humans and hardware logic alike to spot all 0's and all 1's. So the all-0's case was reserved for denormals. It kind of makes sense. That left the all-1's case for the other, remaining special values.
So, starting with a possible range of -128 to 127 for an 8-bit twos complement exponent, adding an excess of +127 yields anything from -1 to 254. Note that -1 wraps to 255 in an 8-bit field, the all-1's case, and is therefore reserved for NaNs and infinities. Note also that 0 is the all-0's case and is therefore reserved for denormals. So the usable exponent range, before packing into the format, is actually limited to -126 to +127; the two codes that would have come from -128 and -127 are consumed instead for the special things: denormals, infinities, and NaNs.
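The excess arithmetic can be spelled out in a few lines of Python (the helper name `pack_exponent` and the constant name are mine, not anything from the standard), cross-checked against a real float's bit pattern:

```python
import struct

BIAS = 127  # the "excess" for single precision

def pack_exponent(e: int) -> int:
    # usable unbiased exponents for normal numbers: -126 .. +127
    assert -126 <= e <= 127
    return e + BIAS  # stored field ends up in 1 .. 254

# cross-check against a real float: 1.0 is 1.0 * 2**0,
# so its stored exponent field should be 0 + 127 = 127
bits = struct.unpack("<I", struct.pack("<f", 1.0))[0]
assert (bits >> 23) & 0xFF == pack_exponent(0)

# the two remaining codes are the reserved ones:
#   stored 0   (all 0's)  -> zero and denormals
#   stored 255 (all 1's)  -> infinities and NaNs
```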
It's actually quite slick. Yes, you lose two of the possible exponent values that you might have had, for special purposes. But that's an acceptable loss for the advantages gained. Also, it's now very easy to sort these values in order on hardware that doesn't include floating point support, where the software library execution costs would otherwise be excessive.
There's a reason why sorting is required, by the way. It's quite frequent in mathematical software that many floating point numbers need to be summed together. If these numbers happen to span a wide dynamic range, then it is important that the smaller values are summed first, not last. If you sum them first, they have a chance to accumulate into something large enough and significant enough to make a difference before the larger numbers are then also summed. However, if they were summed last, they could have no impact at all, simply because of that accident of ordering.
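A quick Python demonstration of that ordering effect, using double precision, where the ulp (spacing between adjacent representable values) at 1e16 is 2.0, so each 1.0 lands exactly on a round-to-nearest-even tie and gets discarded:

```python
big = 1e16            # double precision: the ulp at 1e16 is 2.0
small = [1.0] * 1000  # each term is exactly half an ulp of `big`

# big first: every 1.0 hits a round-to-even tie against 1e16
# (whose low mantissa bit is 0) and rounds straight back down
big_first = sum([big] + small)

# small first: the 1.0's accumulate to 1000.0 before meeting `big`
small_first = sum(small + [big])

# big_first stays exactly 1e16; small_first is exactly 1e16 + 1000.0
```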
This is especially evident in the standard deviation calculation, which is often re-written into a form requiring two sums to be generated and then subtracted from each other. It is when taking the difference of two large sums, where the least significant bits may be all you have left, that the order of summation becomes starkly important.
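A small Python sketch of that cancellation, comparing the two-sums rewrite of the variance against the straightforward two-pass form (variable names are mine; the true population variance of this sample is 2/3):

```python
xs = [1e8, 1e8 + 1, 1e8 + 2]  # true population variance: 2/3
n = len(xs)

# two-sums rewrite: sum(x^2) and sum(x)^2/n are both near 3e16,
# and their difference lives entirely in bits that rounding destroyed
naive = (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n

# two-pass form: subtract the mean first, so the sums stay small
mean = sum(xs) / n
two_pass = sum((x - mean) ** 2 for x in xs) / n
```

Here `two_pass` comes out at 2/3 as expected, while `naive` has lost essentially all of its significant bits to the cancellation.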
Is the exponent expressed in 2's complement form? I am not sure, because we add bias to the exponent after expressing it in 2's complement form. So is it correct to say that the exponent is expressed in 2's complement form?
For single precision it is stored in excess-127. That just means that 127 is added to the twos complement exponent before packing it into the format. (And you must remove that excess bias when you unpack it.) That is not, strictly speaking, twos complement. It's twos complement with a bias, rather. Which is to say that the value will be twos complement when you unpack the floating point format. But while packed it is excess-127.
Also keep in mind that regular floating point values (those that can be represented in the format without resorting to a denormal form) have an assumed "hidden bit" of 1 in the mantissa. This means that so long as the form isn't a denormal, you prefix a '1' to the left of the given mantissa. In the case of a denormal (exponent field is 0), though, this hidden bit is a '0', as the mantissa had to be shifted down by at least one position to reach the denormal form.
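That unpacking, hidden bit included, can be sketched in Python for single precision (`decode` is my own illustrative helper; infinities and NaNs, exponent field 255, are left out):

```python
import struct

def decode(x: float) -> float:
    """Rebuild a single-precision value from its raw fields,
    making the hidden bit explicit. (Exponent field 255, i.e.
    infinities and NaNs, is not handled here.)"""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = -1.0 if bits >> 31 else 1.0
    exp_field = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exp_field == 0:
        # zero or denormal: hidden bit is 0, exponent pinned at -126
        return sign * (mantissa / 2.0**23) * 2.0**-126
    # normal: hidden bit is 1, exponent field is excess-127
    return sign * (1 + mantissa / 2.0**23) * 2.0 ** (exp_field - 127)

assert decode(6.25) == 6.25              # normal number
assert decode(2.0**-140) == 2.0**-140    # denormal in single precision
```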
As a comment below this explanation (by TEMLIB) points out, the special case of zero is the case with the mantissa set to all 0's as well as the exponent (and sign). So this makes the floating point value of zero a "natural." It's exactly the same case as if a denormal had been shifted down until there's nothing left in the mantissa. (Zero is indistinguishable from a denormal that was so small that there were no 1-bits left to store in the mantissa.)
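This, too, is easy to check directly in Python, assuming IEEE-754 single precision:

```python
import struct

# +0.0 is the all-zeros bit pattern: sign 0, exponent 0, mantissa 0.
assert struct.pack("<f", 0.0) == b"\x00\x00\x00\x00"

# -0.0 differs only in the sign bit.
bits = struct.unpack("<I", struct.pack("<f", -0.0))[0]
assert bits == 0x80000000

# Halving the smallest denormal (bit pattern 1, value 2**-149)
# leaves no 1-bits to store, and it underflows to that same zero.
smallest = struct.unpack("<f", struct.pack("<I", 1))[0]
half = struct.unpack("<f", struct.pack("<f", smallest / 2))[0]
assert half == 0.0
```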