Adding two IEEE754 floating-point representations and interpreting the result.


This isn't for any class or homework. As part of my personal study, I'm trying to better understand how decimal numbers are represented in IEEE754 binary floating point. I'd like to add two numbers, $1.111$ and $2.222$, and then check the result by converting the IEEE754 representation of the sum back to decimal.

Per this online tool:

$1.111 = 00111111100011100011010100111111$
$2.222 = 01000000000011100011010100111111$

Summing these two together using signed binary addition, I get:

$0111 1111 1001 1100 0110 1010 0111 1110$

In hexadecimal, this is:

$7F9C6A7E$

And according to this other version of the tool, that corresponds to $NaN$.

What's going on here?


1 Answer


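You cannot expect to use integer binary addition on two floating-point representations and get a meaningful result. Each bit pattern packs three separate fields (sign, biased exponent, fraction), and integer addition simply adds those fields to one another. Here the exponent fields $01111111_2 = 127$ and $10000000_2 = 128$ sum to $11111111_2 = 255$, the all-ones exponent that IEEE 754 reserves for infinities and NaNs; because the fraction field of the sum is nonzero, the pattern decodes as NaN. Adding the numbers themselves means decoding each one, aligning the mantissas, adding, and rounding.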
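
To reproduce what happened in the question, here is a minimal Python sketch that uses the standard `struct` module to obtain the single-precision bit patterns, adds them as plain integers, and decodes the result. The helper names `f32_bits` and `bits_f32` are my own, and the sketch assumes a platform whose C `float` is IEEE 754 single precision (true on mainstream hardware).

```python
import math
import struct

def f32_bits(x: float) -> int:
    """Round x to IEEE 754 single precision and return its 32-bit pattern."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_f32(u: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack('>f', struct.pack('>I', u))[0]

a, b = f32_bits(1.111), f32_bits(2.222)
s = (a + b) & 0xFFFFFFFF          # plain integer addition of the bit patterns, kept to 32 bits
print(hex(a), hex(b), hex(s))     # 0x3f8e353f 0x400e353f 0x7f9c6a7e

# The biased exponent fields are added as integers too: 127 + 128 = 255 (all ones).
print((a >> 23) & 0xFF, (b >> 23) & 0xFF, (s >> 23) & 0xFF)  # 127 128 255
print(math.isnan(bits_f32(s)))    # True: exponent 255 with a nonzero fraction field
```
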

First, $1.111$ cannot be represented exactly in binary floating point. Your 00111111100011100011010100111111 is actually the IEEE-754 single precision representation of the number$$ 1.11099994182586669921875 $$which is the closest representable number to $1.111$. This breaks up as
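
The exact stored value, and the claim that it is the nearest single-precision number to $1.111$, can be checked with `decimal.Decimal` (which converts a float exactly) and `fractions.Fraction`. A sketch, assuming `struct`'s `'f'` format rounds to nearest as IEEE 754 specifies; `to_f32` is a hypothetical helper name:

```python
import struct
from decimal import Decimal
from fractions import Fraction

def to_f32(x: float) -> float:
    """Round a Python float to single precision (round to nearest)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

stored = to_f32(1.111)
print(Decimal(stored))  # 1.11099994182586669921875, the exact value of the stored number

# In [1, 2) adjacent single-precision values are 2**-23 apart.
ulp = Fraction(1, 2**23)
target = Fraction('1.111')
err = abs(Fraction(stored) - target)
print(err < abs(Fraction(stored) + ulp - target))  # True: closer than the next value up
print(err < abs(Fraction(stored) - ulp - target))  # True: closer than the next value down
```
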

    0     01111111         00011100011010100111111
    sign  biased exponent  fractional part of mantissa

and stands for the number$$ 1.00011100011010100111111_2 \times 2^{127-127} $$

The stored value for $2.222$ is exactly twice that number, so its representation has the same mantissa bits and an exponent one higher. When we add them, we must first align the mantissas according to their exponents:
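
A quick check of the "same mantissa, exponent one higher" claim, as a sketch using the standard `struct` module (`f32_bits` is my own helper name). Because the exponent field sits directly above the 23 fraction bits, incrementing it amounts to adding $2^{23}$ to the whole pattern:

```python
import struct

def f32_bits(x: float) -> int:
    """Round x to single precision and return its 32-bit pattern."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

a, b = f32_bits(1.111), f32_bits(2.222)
same_fraction = (a & 0x7FFFFF) == (b & 0x7FFFFF)        # identical 23-bit fraction fields
exp_diff = ((b >> 23) & 0xFF) - ((a >> 23) & 0xFF)      # difference of biased exponents
print(same_fraction, exp_diff)  # True 1
print(b == a + (1 << 23))       # True
```
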

         1.00011100011010100111111
      + 10.0011100011010100111111
      ----------------------------
      = 11.01010101001111110111101
        11.0101010100111111011110     <-- rounded to 1+23 bits mantissa using round-to-even

      0     10000000   10101010100111111011110
      sign  biased exp fractional mantissa
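
The column addition and rounding above can be mimicked with integer significands. This is a teaching sketch of round-to-nearest, ties-to-even for two positive normal numbers only; `add_positive_f32` and its `(significand, exponent)` convention are my own, not an API from any library:

```python
def add_positive_f32(sig_a: int, exp_a: int, sig_b: int, exp_b: int) -> tuple[int, int]:
    """Add two positive normal single-precision numbers, each given as a 24-bit
    significand (implicit leading 1 included) and an unbiased exponent, so that
    value = sig * 2**(exp - 23). Returns the rounded (sig, exp).
    Sketch only: no signs, zeros, subnormals, overflow or NaNs."""
    lo = min(exp_a, exp_b)
    # Align the binary points by shifting the operand with the larger exponent left;
    # Python ints are unbounded, so this aligned sum is exact.
    total = (sig_a << (exp_a - lo)) + (sig_b << (exp_b - lo))
    # Two significands >= 2**23 sum to >= 2**24, so at least one bit must be dropped.
    shift = total.bit_length() - 24
    q, r = divmod(total, 1 << shift)
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):    # round to nearest, ties to even
        q += 1
        if q == 1 << 24:                     # rounding carried into a new leading bit
            q >>= 1
            shift += 1
    return q, lo + shift

sig = (1 << 23) | 0b00011100011010100111111   # 1.00011100011010100111111 in binary
unrounded = (sig << 1) + sig                   # 2.222's significand aligned one place left, plus 1.111's
print(format(unrounded, 'b'))                  # 1101010101001111110111101
q, e = add_positive_f32(sig, 0, sig, 1)        # 1.111 (exponent 0) + 2.222 (exponent 1)
bits = ((e + 127) << 23) | (q & 0x7FFFFF)      # re-bias the exponent, drop the implicit 1
print(format(bits, '032b'))                    # 01000000010101010100111111011110
```

The dropped bit here is exactly half a unit in the last place, so the tie goes to the even neighbour, which is the rounded-down one.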

And the representation 01000000010101010100111111011110 corresponds to the number $$ 3.332999706268310546875 $$ Note that this is not the closest representable number to $3.333$; that would be the next one up, $$ 3.3329999446868896484375 $$ The exact sum of the two stored inputs lies precisely halfway between these two values, so the round-to-even rule rounded it down, which compounded the error inherent in the two inputs each being slightly smaller than $1.111$ and $2.222$.
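
Both decimal values, and the fact that ordinary single-precision addition produces the rounded-down pattern, can be checked with the sketch below. It assumes `struct`'s `'f'` packing rounds to nearest-even as IEEE 754 specifies; `f32_bits` and `bits_to_f32` are my own helper names.

```python
import struct
from decimal import Decimal
from fractions import Fraction

def f32_bits(x: float) -> int:
    """Round x to single precision and return its 32-bit pattern."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_to_f32(u: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack('>f', struct.pack('>I', u))[0]

computed = 0b01000000010101010100111111011110   # the rounded sum from the answer
nearest = f32_bits(3.333)                        # single-precision value closest to 3.333
print(nearest - computed)                        # 1: adjacent representable values
print(Decimal(bits_to_f32(computed)))            # 3.332999706268310546875
print(Decimal(bits_to_f32(nearest)))             # 3.3329999446868896484375

target = Fraction('3.333')
print(abs(Fraction(bits_to_f32(nearest)) - target)
      < abs(Fraction(bits_to_f32(computed)) - target))  # True

x = bits_to_f32(f32_bits(1.111))   # 1.111 rounded to single precision
y = bits_to_f32(f32_bits(2.222))   # 2.222 rounded to single precision
# x + y needs only 25 significant bits, so it is exact in double precision;
# packing it back to single precision performs one round-to-nearest-even step.
print(f32_bits(x + y) == computed)  # True
```
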
