Binary floating-point numbers are represented in three formats:
short, long, or extended.
- The short format is 4 bytes with a sign of 1 bit, an exponent
of 8 bits and a fraction of 23 bits.
- The long format is 8 bytes with a sign of 1 bit, an exponent of
11 bits and a fraction of 52 bits.
- The extended format is 16 bytes with a sign of 1 bit, an exponent
of 15 bits and a fraction of 112 bits.
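The field layout above can be checked mechanically. The sketch below records the three formats as (bytes, exponent bits, fraction bits), confirms that one sign bit plus the exponent and fraction fill each format exactly, and splits a raw bit pattern into its fields. It assumes the sign is the leftmost bit, followed by the exponent and then the fraction, as in IEEE 754 binary32/binary64/binary128, which these formats match. `FORMATS`, `split_fields`, and `bits_of` are hypothetical helper names. Python's `struct` module only encodes the short (`>f`) and long (`>d`) formats, so `bits_of` does not cover the extended format.

```python
import struct

# (total bytes, exponent bits, fraction bits) for each format, as listed in the text
FORMATS = {
    "short": (4, 8, 23),
    "long": (8, 11, 52),
    "extended": (16, 15, 112),
}

# One sign bit plus the exponent and fraction fill each format exactly.
for nbytes, e_bits, f_bits in FORMATS.values():
    assert 1 + e_bits + f_bits == 8 * nbytes

def split_fields(bits, fmt):
    """Split a raw bit pattern (an int) into (sign, biased exponent, fraction)."""
    _, e_bits, f_bits = FORMATS[fmt]
    fraction = bits & ((1 << f_bits) - 1)
    exponent = (bits >> f_bits) & ((1 << e_bits) - 1)
    sign = (bits >> (f_bits + e_bits)) & 1
    return sign, exponent, fraction

def bits_of(x, fmt):
    """Raw bits of a Python float stored in the short ('>f') or long ('>d') format.

    The struct module has no 16-byte float code, so 'extended' is not handled here.
    """
    code = {"short": ">f", "long": ">d"}[fmt]
    return int.from_bytes(struct.pack(code, x), "big")

print(split_fields(bits_of(-2.5, "short"), "short"))  # (1, 128, 2097152)
```

Here -2.5 is -1.25 × 2**1, so the sign bit is one, the biased exponent is 1 + 127, and the fraction holds the binary .01 of the significand.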
There are five classes of binary floating-point data, comprising both
numeric and related nonnumeric entities. Each data item consists of
a sign, an exponent, and a significand. The exponent is biased so
that all exponents are nonnegative unsigned numbers, and the minimum
biased exponent is zero; the bias is 127 for the short format, 1023
for the long format, and 16383 for the extended format. The significand
consists of an explicit fraction and an implicit unit bit to the left
of the binary point. The sign bit is zero for plus and one for minus
values.
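For a normalized number, the sign, biased exponent, and fraction combine as (-1)**sign × 1.fraction × 2**(biased exponent - bias). The sketch below evaluates this exactly with `fractions.Fraction`. It assumes the IEEE 754 bias of 2**(e-1) - 1 for an e-bit exponent, which gives 127, 1023, and 16383 for the three formats. `WIDTHS`, `bias`, and `normalized_value` are hypothetical names used only for illustration.

```python
from fractions import Fraction

# (exponent bits, fraction bits) per format, taken from the format list
WIDTHS = {"short": (8, 23), "long": (11, 52), "extended": (15, 112)}

def bias(fmt):
    """Assumed IEEE 754-style bias 2**(e-1) - 1: 127, 1023, or 16383."""
    e_bits, _ = WIDTHS[fmt]
    return (1 << (e_bits - 1)) - 1

def normalized_value(sign, biased_exp, fraction, fmt):
    """Exact value of a normalized number, whose implicit unit bit is one."""
    e_bits, f_bits = WIDTHS[fmt]
    if not 0 < biased_exp < (1 << e_bits) - 1:
        raise ValueError("biased exponent is outside the normalized range")
    significand = 1 + Fraction(fraction, 1 << f_bits)  # binary 1.fraction
    magnitude = significand * Fraction(2) ** (biased_exp - bias(fmt))
    return -magnitude if sign else magnitude  # sign bit: 0 = plus, 1 = minus

print(normalized_value(1, 128, 0x200000, "short"))  # -5/2
```

Exact rational arithmetic is used so the same function works for the extended format, whose range far exceeds what a Python float can hold.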
All finite nonzero numbers within the normalized range of a given
format are normalized and have a unique representation. There are
no unnormalized numbers, which would allow multiple representations
of the same value, and there are no unnormalized arithmetic operations.
Tiny numbers, whose magnitude is below the minimum normalized number
of a given format, are represented as denormalized numbers, so called
because they imply a leading zero bit; these values are also represented
uniquely.
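Why denormalized numbers remain unique can be seen from their value: with a biased exponent of zero and an implied unit bit of zero, the value is 0.fraction × 2**(1 - bias), which is an integer multiple of one fixed step, 2**(1 - bias - fraction bits). Each representable tiny magnitude therefore maps back to exactly one fraction field. The sketch below assumes the IEEE 754 bias of 2**(e-1) - 1; `denormal_step`, `denormal_value`, and `denormal_fraction` are hypothetical helper names.

```python
from fractions import Fraction

# (exponent bits, fraction bits) per format, taken from the format list
WIDTHS = {"short": (8, 23), "long": (11, 52), "extended": (15, 112)}

def denormal_step(fmt):
    """Spacing between adjacent denormalized numbers: 2**(1 - bias - fraction_bits)."""
    e_bits, f_bits = WIDTHS[fmt]
    bias = (1 << (e_bits - 1)) - 1  # assumed IEEE 754-style bias
    return Fraction(2) ** (1 - bias - f_bits)

def denormal_value(sign, fraction, fmt):
    """Biased exponent 0, implied unit bit 0: the value is fraction * step."""
    _, f_bits = WIDTHS[fmt]
    if not 0 < fraction < (1 << f_bits):
        raise ValueError("a denormalized number needs a nonzero fraction")
    magnitude = fraction * denormal_step(fmt)
    return -magnitude if sign else magnitude

def denormal_fraction(magnitude, fmt):
    """Inverse mapping: the single fraction field that encodes a tiny magnitude."""
    _, f_bits = WIDTHS[fmt]
    q = Fraction(magnitude) / denormal_step(fmt)
    if q.denominator != 1 or not 0 < q < (1 << f_bits):
        raise ValueError("not exactly representable as a denormalized number")
    return int(q)

step = denormal_step("short")
largest = denormal_value(0, (1 << 23) - 1, "short")
print(step == Fraction(1, 2**149))             # True: smallest short denormal
print(largest + step == Fraction(1, 2**126))   # True: next value is the smallest normal
```

The second check shows that the denormalized range continues seamlessly into the normalized range, one step below the minimum normalized number.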
The classes are:
- Zeros have a biased exponent of zero,
a zero fraction and a sign. The implied unit bit is zero.
- Denormalized numbers have a biased exponent
of zero and a nonzero fraction. The implied unit bit is zero.
The smallest denormalized numbers have approximate magnitudes
1.4 × 10**-45 (short format), 4.94 × 10**-324 (long format), and
6.5 × 10**-4966 (extended format).
- Normalized numbers have a biased exponent
greater than zero but less than all ones. The implied unit bit is
one and the fraction can have any value. The largest normalized numbers
have approximate magnitudes 3.4 × 10**38 (short format), 1.8 × 10**308
(long format), and 1.2 × 10**4932 (extended format). The smallest normalized
numbers have approximate magnitudes 1.18 × 10**-38 (short format),
2.23 × 10**-308 (long format), and 3.4 × 10**-4932 (extended format).
- An infinity is represented by a biased
exponent of all ones and a zero fraction.
- A NaN (Not-a-Number) entity is represented
by a biased exponent of all ones and a nonzero fraction. NaNs are
produced in place of a numeric result after an invalid operation when
no interruption occurs. NaNs can also be used by the program to
flag special operands, such as the contents of an uninitialized storage
area. There are two types of NaNs: signaling and quiet. A signaling
NaN (SNaN) is distinguished from the corresponding quiet NaN (QNaN)
by the leftmost fraction bit: zero for the SNaN and one for the QNaN.
A special QNaN is supplied as the default result for an invalid-operation
condition; it has a plus sign and a leftmost fraction bit of one,
with the remaining fraction bits set to zero. Normally, QNaNs
are simply propagated through computations, so that they remain visible
in the final result. An SNaN operand causes an invalid-operation exception.
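The class rules above depend only on the biased exponent (zero, all ones, or in between), on whether the fraction is zero, and for NaNs on the leftmost fraction bit. The sketch below classifies a raw bit pattern accordingly, builds the default QNaN, and recomputes the approximate extreme magnitudes from the field widths. It assumes the IEEE 754 bias of 2**(e-1) - 1 and uses `decimal` because the extended-format limits lie far outside the range of a Python float. `classify`, `default_qnan`, and `extremes` are hypothetical names.

```python
from decimal import Decimal, localcontext

# (exponent bits, fraction bits) per format, taken from the format list
WIDTHS = {"short": (8, 23), "long": (11, 52), "extended": (15, 112)}

def classify(bits, fmt):
    """Return the data class of a raw bit pattern (an int) in the given format."""
    e_bits, f_bits = WIDTHS[fmt]
    all_ones = (1 << e_bits) - 1
    exponent = (bits >> f_bits) & all_ones
    fraction = bits & ((1 << f_bits) - 1)
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent != all_ones:
        return "normalized"
    if fraction == 0:
        return "infinity"
    # Leftmost fraction bit: one for a quiet NaN, zero for a signaling NaN
    return "QNaN" if fraction >> (f_bits - 1) else "SNaN"

def default_qnan(fmt):
    """Plus sign, exponent all ones, leftmost fraction bit one, the rest zero."""
    e_bits, f_bits = WIDTHS[fmt]
    return (((1 << e_bits) - 1) << f_bits) | (1 << (f_bits - 1))

def extremes(fmt):
    """Smallest denormalized, smallest normalized, and largest normalized magnitudes."""
    e_bits, f_bits = WIDTHS[fmt]
    bias = (1 << (e_bits - 1)) - 1  # assumed IEEE 754-style bias
    with localcontext() as ctx:
        ctx.prec = 40  # ample precision for 2**-16494 .. 2**16384
        two = Decimal(2)
        smallest_denorm = two ** (1 - bias - f_bits)
        smallest_norm = two ** (1 - bias)
        # Largest biased exponent is all ones minus one, i.e. an unbiased exponent of +bias
        largest_norm = (two - two ** -f_bits) * two ** bias
        return tuple(f"{d:.2E}" for d in (smallest_denorm, smallest_norm, largest_norm))

for fmt in WIDTHS:
    print(fmt, extremes(fmt), hex(default_qnan(fmt)))
```

The computed magnitudes, rounded to two or three significant digits, can be compared against the approximate values quoted for each class.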