IEEE Single Precision Floating Point Format Examples 1
Recall from the Storage of Numbers in IEEE Single-Precision Floating Point Format page that a number $x$ can be stored in 32 bits as $x = \sigma \cdot \bar{x} \cdot 2^e$, and that for the 32 bits $b_1b_2...b_{32}$ we had that:
Bits | Contents |
---|---|
Bit $1$ | The bit $b_1$ corresponds to the sign $\sigma$ of $x$ where $b_1 = \left\{\begin{matrix} 0 & \mathrm{if} \: \sigma = +1\\ 1 & \mathrm{if} \: \sigma = -1 \end{matrix}\right.$. |
Bits $2$-$9$ | Most computers do not store the exponent $e$ of a floating point binary number directly. Instead, they define $E = e + 127$ which is a positive binary number (since $-126 \leq e$). The eight bits $b_2b_3...b_8b_9$ correspond to this number $E$. |
Bits $10$-$32$ | The $23$ digits $a_1a_2...a_{22}a_{23}$ that follow the leading $1$ of the significand of $x$, $1.a_1a_2...a_{22}a_{23}$, are stored here. |
We will now look at some examples of determining the decimal value of an IEEE single-precision floating point number and of converting numbers to this form.
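Before the worked examples, the bit layout recalled above can be sketched in code. The following is a minimal Python sketch, not a definitive implementation; the function name `split_fields` is a hypothetical helper introduced only for illustration. It reads the 32 bits as one unsigned integer and isolates the three fields with shifts and masks.

```python
def split_fields(bits: str) -> tuple:
    """Split a 32-character bit string b1...b32 into (b1, E, fraction field).

    Hypothetical helper for illustration: it only separates the fields and
    does not interpret them.
    """
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 characters, each '0' or '1'")
    word = int(bits, 2)                # the whole pattern as an unsigned integer
    sign_bit = word >> 31              # bit 1
    biased_exp = (word >> 23) & 0xFF   # bits 2-9, the stored value E = e + 127
    fraction = word & 0x7FFFFF         # bits 10-32, the digits a1...a23
    return sign_bit, biased_exp, fraction


# The pattern of the number 1.0: sign bit 0, E = 127 (so e = 0), all fraction digits 0.
print(split_fields("00111111100000000000000000000000"))  # (0, 127, 0)
```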
Example 1
Consider the following floating point number presented in IEEE single precision (32 bits) as $01101011101101010000000000000000$. Determine the sign $\sigma$, exponent $e$, and significand/mantissa $\bar{x}$ and determine the value of $x = \sigma \cdot \bar{x} \cdot 2^e$.
We note that the first bit of the number given above is $b_1 = 0$, so the sign of $x$ is $\sigma = +1$.
Now the next eight bits $b_2b_3…b_9$ are $11010111$ and represent $E = e + 127$. We want to find what decimal number represents the binary number $E = (11010111)_2$. We have that:
$$E = (11010111)_2 = 2^7 + 2^6 + 2^4 + 2^2 + 2^1 + 2^0 = 128 + 64 + 16 + 4 + 2 + 1 = 215 \tag{1}$$
Thus we get that $e = E - 127 = 215 - 127 = 88$.
Lastly, recall that the twenty-three bits $b_{10}b_{11}…b_{32}$ represent the fractional part of the significand/mantissa $\bar{x}$, and that $\bar{x} = 1.b_{10}b_{11}…b_{32}$ and so:
$$\bar{x} = (1.0110101)_2 = 1 + 2^{-2} + 2^{-3} + 2^{-5} + 2^{-7} = 1.4140625 \tag{2}$$
So the decimal representation of this number is $x = \sigma \cdot \bar{x} \cdot 2^e = + (1.4140625) \cdot 2^{88}$.
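The steps of Example 1 can be checked with a short Python sketch. The function name `decode_single` is a hypothetical helper, and the sketch assumes the number is normalized, that is $1 \leq E \leq 254$ (the patterns $E = 0$ and $E = 255$ are reserved for special values in IEEE 754 and are not covered here). Exact fractions are used so that no rounding enters the check.

```python
from fractions import Fraction


def decode_single(bits: str):
    """Return (sigma, e, xbar) for a normalized single-precision bit string.

    Hypothetical helper for illustration; assumes 1 <= E <= 254.
    """
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 characters, each '0' or '1'")
    sigma = 1 if bits[0] == "0" else -1   # bit 1 gives the sign
    E = int(bits[1:9], 2)                 # bits 2-9 give E = e + 127
    if not 1 <= E <= 254:
        raise ValueError("E = 0 and E = 255 are reserved for special values")
    e = E - 127
    # xbar = 1.a1 a2 ... a23 in binary: add 2^(-k) for every digit a_k equal to 1.
    xbar = Fraction(1)
    for k, digit in enumerate(bits[9:32], start=1):
        if digit == "1":
            xbar += Fraction(1, 2**k)
    return sigma, e, xbar


sigma, e, xbar = decode_single("01101011101101010000000000000000")
print(sigma, e, float(xbar))  # 1 88 1.4140625
```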
Example 2
Consider the following floating point number presented in IEEE single precision (32 bits) as $11001100101111100010000000000000$. Determine the sign $\sigma$, exponent $e$, and significand/mantissa $\bar{x}$ and determine the value of $x = \sigma \cdot \bar{x} \cdot 2^e$.
Once again we immediately have that since $b_1 = 1$ then the sign of $x$ is $\sigma = -1$.
Now the next eight bits $b_2b_3 \ldots b_9$ are $10011001$. These bits represent $E = e + 127$. Thus we have that:
$$E = (10011001)_2 = 2^7 + 2^4 + 2^3 + 2^0 = 128 + 16 + 8 + 1 = 153 \tag{3}$$
Therefore the exponent of $x$ is $e = E - 127 = 153 - 127 = 26$.
Lastly we will calculate the mantissa using the last twenty-three bits of the given number. We have that:
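$$\bar{x} = (1.0111110001)_2 = 1 + 2^{-2} + 2^{-3} + 2^{-4} + 2^{-5} + 2^{-6} + 2^{-10} = 1.4853515625 \tag{4}$$
So the decimal representation of this number is $x = \sigma \cdot \bar{x} \cdot 2^e = - (1.4853515625) \cdot 2^{26}$.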
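As an independent cross-check of Example 2, the bit pattern can be handed to the machine's own floating point decoding. The sketch below uses Python's standard `struct` module to reinterpret the four bytes as a single-precision float; the function name `bits_to_float` is a hypothetical helper introduced for illustration.

```python
import struct


def bits_to_float(bits: str) -> float:
    """Reinterpret a 32-character bit string as an IEEE single-precision value.

    Hypothetical helper for illustration.
    """
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 characters, each '0' or '1'")
    # Pack the pattern as a big-endian unsigned 32-bit integer, then read the
    # same four bytes back as a big-endian single-precision float.
    raw = struct.pack(">I", int(bits, 2))
    (value,) = struct.unpack(">f", raw)
    return value


value = bits_to_float("11001100101111100010000000000000")
print(value)                            # -99680256.0
print(value == -1.4853515625 * 2**26)   # True
```

The comparison is exact here because both $1.4853515625$ and $2^{26}$ are representable without rounding in double precision, which is the type Python uses for `float`.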
Example 3
Consider the number $x = -\left ( 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{16} + \frac{1}{32} \right ) 2^{-48}$. Determine the floating point representation in IEEE single precision (32 bits).
We immediately see that $x$ is a negative number and so the sign is $\sigma = -1$. Therefore the first bit in our floating point representation of this number will be $b_1 = 1$.
Now we also see that the exponent is $e = -48$. IEEE single precision (32 bits) stores the number $E = e + 127$ rather than $e$ itself, and hence $E = -48 + 127 = 79$. We must now convert $79$ to a binary number. We have that:
$$79 = 64 + 8 + 4 + 2 + 1 = 2^6 + 2^3 + 2^2 + 2^1 + 2^0 = (1001111)_2 \tag{5}$$
Padding with a leading zero to fill eight bits, we get $b_2b_3 \ldots b_9 = 01001111$. Lastly we will determine the last twenty-three digits which represent the fractional part of the significand/mantissa. We note that $\bar{x} = \left ( 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{16} + \frac{1}{32} \right )$. If we convert $\bar{x}$ to binary we get that:
$$\bar{x} = 1 + 2^{-1} + 2^{-2} + 2^{-4} + 2^{-5} = (1.11011)_2 \tag{6}$$
So the digits $b_{10}b_{11} \ldots b_{32}$ are $11011$ followed by eighteen zeros. Therefore the floating point representation of $x$ is:
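$$\underbrace{1}_{b_1} \: \underbrace{01001111}_{b_2 \ldots b_9} \: \underbrace{11011000000000000000000}_{b_{10} \ldots b_{32}} \tag{7}$$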
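The conversion in Example 3 runs in the opposite direction, from $\sigma$, $e$, and the binary digits of $\bar{x}$ to the 32 bits. A minimal Python sketch of that direction follows. The function name `encode_single` is a hypothetical helper, and the sketch assumes a normalized number with $-126 \leq e \leq 127$ whose significand needs at most $23$ digits after the binary point, so no rounding is required.

```python
import struct


def encode_single(sigma: int, e: int, fraction_digits: str) -> str:
    """Build b1...b32 from sigma, e, and the digits after the binary point.

    Hypothetical helper for illustration; assumes a normalized number that
    fits exactly, so no rounding is performed.
    """
    if sigma not in (1, -1):
        raise ValueError("sigma must be +1 or -1")
    if not -126 <= e <= 127:
        raise ValueError("exponent outside the normalized range")
    if len(fraction_digits) > 23 or set(fraction_digits) - {"0", "1"}:
        raise ValueError("expected at most 23 binary digits")
    b1 = "0" if sigma == 1 else "1"                 # bit 1
    exponent_bits = format(e + 127, "08b")          # bits 2-9 hold E = e + 127
    fraction_bits = fraction_digits.ljust(23, "0")  # bits 10-32, zero padded
    return b1 + exponent_bits + fraction_bits


bits = encode_single(-1, -48, "11011")
print(bits)  # 10100111111011000000000000000000

# Cross-check against the machine's own single-precision encoding.
x = -(1 + 1/2 + 1/4 + 1/16 + 1/32) * 2.0**-48
machine_bits = format(int.from_bytes(struct.pack(">f", x), "big"), "032b")
print(bits == machine_bits)  # True
```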