Fractional binary numbers
IEEE Floating Point
- IEEE Standard 754
- Sign + Exp + Frac
- Driven by numerical concerns
Precision options
- Single precision: 32 bits: 1 (sign) + 8 (exp) + 23 (frac)
- Double precision: 64 bits: 1 (sign) + 11 (exp) + 52 (frac)
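As a rough illustration of the 1+8+23 layout, the Python sketch below splits a single-precision value into its three fields. The helper name `fields32` is made up for this example; it relies only on the standard `struct` module.

```python
import struct

def fields32(x):
    """Return (sign, exp, frac) of x encoded as an IEEE single-precision value."""
    # Pack as a 32-bit float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31            # 1 bit
    exp = (bits >> 23) & 0xFF    # 8 bits
    frac = bits & 0x7FFFFF       # 23 bits
    return sign, exp, frac

print(fields32(1.0))        # (0, 127, 0)
print(fields32(-0.15625))   # (1, 124, 2097152)
```

The same idea carries over to double precision with the `">d"` and `">Q"` format codes and field widths 1+11+52.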
Normalized Values
- Exp not 000…00 or 111…11
- Exponent coded as a biased value: E = Exp - Bias, where Bias = 2^(k-1) - 1 for k exponent bits (127 for single, 1023 for double)
- Significand coded with implied leading 1: 1.xxxx…x
Denormalized Values
- Exp is 000…0
- Exponent value: E = 1-Bias
- Significand coded with implied leading 0: 0.xxx…x
Special Values
- Exp is 111…1 and Frac is 000…0: infinity (+∞ or -∞, depending on the sign bit)
- Exp is 111…1 and Frac is not 000…0: NaN (not a number)
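The three cases (normalized, denormalized, special) can be sketched as a small decoder. This is a minimal illustration for single precision; `decode32`, `BIAS`, and `FRAC_BITS` are names chosen for the example, not part of any library.

```python
import math

BIAS = 127        # 2**(8-1) - 1 for single precision
FRAC_BITS = 23

def decode32(sign, exp, frac):
    """Value encoded by the single-precision fields (sign, exp, frac)."""
    s = -1.0 if sign else 1.0
    if exp == 0xFF:                      # special values: infinity or NaN
        return s * math.inf if frac == 0 else math.nan
    if exp == 0:                         # denormalized: E = 1 - Bias, M = 0.frac
        e, m = 1 - BIAS, frac / 2**FRAC_BITS
    else:                                # normalized: E = Exp - Bias, M = 1.frac
        e, m = exp - BIAS, 1 + frac / 2**FRAC_BITS
    return s * m * 2.0**e

print(decode32(0, 127, 0))          # 1.0
print(decode32(1, 124, 0x200000))   # -0.15625
print(decode32(0, 255, 0))          # inf
```

Using E = 1 - Bias (rather than 0 - Bias) for denormalized values makes the largest denormalized number sit directly below the smallest normalized one, so the values spread evenly toward zero.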
Special Properties of the IEEE Encoding
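- FP zero has the same bit pattern as integer zero (all bits 0)
- Can (almost) use unsigned integer comparison: compare sign bits first, treat -0 as equal to +0, and handle NaN separately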
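A quick way to check these properties is to compare the raw bit patterns of a few doubles. The sketch below assumes only the standard `struct` module; `bits64` and the sample `values` list are made up for the illustration.

```python
import struct

def bits64(x):
    """Bit pattern of a double, read as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# Non-negative values in ascending numeric order (denormalized, normalized, infinity).
values = [0.0, 5e-324, 1e-300, 0.5, 1.0, 1.5, 1e300, float("inf")]

print(bits64(0.0))                            # 0
print(sorted(values, key=bits64) == values)   # True: bit order matches numeric order

# Caveats: -0.0 equals +0.0 but has a different bit pattern,
# and negative values order in reverse when read as unsigned integers.
print(bits64(-0.0) == bits64(0.0), -0.0 == 0.0)   # False True
print(bits64(-2.0) > bits64(-1.0))                # True
```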
Rounding
- Towards zero
- Round down (toward -∞)
- Round up (toward +∞)
Nearest Even (default)
- Greater than half: round up
- Less than half: round down
- Exactly half: round to the nearest even value
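The four modes can be compared side by side. The sketch below rounds decimal values to whole numbers with Python's `decimal` module, which is only an analogy: IEEE hardware applies the same rules at a binary fraction position. The `MODES` table and `round_all` helper are names invented for this example.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_FLOOR, ROUND_CEILING, ROUND_HALF_EVEN

MODES = {
    "toward zero": ROUND_DOWN,
    "round down": ROUND_FLOOR,
    "round up": ROUND_CEILING,
    "nearest even": ROUND_HALF_EVEN,
}

def round_all(text):
    """Round the decimal string `text` to an integer under each mode."""
    x = Decimal(text)
    return {name: int(x.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}

for t in ["1.40", "1.60", "1.50", "2.50", "-1.50"]:
    print(t, round_all(t))
```

Note that 1.50 and 2.50 both round to 2 under nearest even. Because ties go up half the time and down half the time, the mode avoids the statistical bias that always rounding halves upward would introduce.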
FP Multiplication
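- Exact result: sign s1 ^ s2, significand M1 × M2, exponent E1 + E2
- Fixing: if M ≥ 2, shift M right and increment E; if E is out of range, overflow; round M to fit the frac precision
- Implementation: the biggest chore is multiplying the significands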
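The exact-result and fixing steps can be sketched in Python using exact rational arithmetic for the significand. This is a simplified illustration for nonzero, finite doubles whose product stays in the normalized range; it ignores overflow, underflow, and special values. The names `decompose`, `round_half_even`, and `fp_mul` are hypothetical helpers, not a real library API.

```python
import math
from fractions import Fraction

FRAC_BITS = 52  # fraction bits in double precision

def decompose(x):
    """Split a nonzero finite double into (sign, M, E) with 1 <= M < 2."""
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1
    sign = 1 if x < 0 else 0
    return sign, Fraction(m) * 2, e - 1

def round_half_even(q):
    """Round a non-negative Fraction to an integer, ties to even."""
    n = q.numerator // q.denominator
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n

def fp_mul(x, y):
    s1, m1, e1 = decompose(x)
    s2, m2, e2 = decompose(y)
    s = s1 ^ s2                      # exact sign
    m = m1 * m2                      # exact significand, 1 <= m < 4
    e = e1 + e2                      # exact exponent
    if m >= 2:                       # fixing: renormalize to 1 <= m < 2
        m /= 2
        e += 1
    scaled = round_half_even(m * 2**FRAC_BITS)   # fixing: round M to 52 fraction bits
    result = math.ldexp(scaled, e - FRAC_BITS)
    return -result if s else result

a = 1 + 2.0**-30
print(fp_mul(a, a) == a * a)             # True
print(fp_mul(-3.0, 7.25))                # -21.75
```

For `a = 1 + 2**-30` the exact product is 1 + 2^-29 + 2^-60, which needs more than 52 fraction bits, so the rounding step drops the last term.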
FP Addition
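- Exact result (assume E1 > E2): sign and significand come from the signed align-and-add of the two significands; exponent is E1
- Fixing: if M ≥ 2, shift M right and increment E; if M < 1, shift M left k positions and decrement E by k; if E is out of range, overflow; round M to fit the frac precision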
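To see the difference between the exact sum and the value left after fixing, one can compare against exact rational arithmetic. A minimal sketch, assuming Python floats are IEEE doubles (true on common platforms); the variable names are illustrative only.

```python
from fractions import Fraction

a, b = 2.0**53, 1.0
exact = Fraction(a) + Fraction(b)   # 2**53 + 1, needs 54 significand bits
rounded = a + b                     # the double-precision result

print(exact == Fraction(rounded))   # False: the exact sum is not representable
print(rounded == a)                 # True: the tie rounds to the even significand

# When the exact sum fits in the significand, no rounding happens.
print(Fraction(0.5) + Fraction(0.25) == Fraction(0.5 + 0.25))   # True
```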
FP multiplication and addition are commutative but not associative
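A short experiment makes the loss of associativity concrete. The sketch assumes Python floats are IEEE doubles; the constants are arbitrary values chosen so that rounding or overflow hits one grouping but not the other.

```python
import math

x, big = 3.14, 1e20

# Addition: x is far below the spacing of doubles near 1e20, so it is rounded away.
print((x + big) - big)       # 0.0
print(x + (big - big))       # 3.14
print(x + big == big + x)    # True: commutativity still holds

# Multiplication: one grouping overflows to infinity, the other stays finite.
huge, tiny = 1e200, 1e-200
left = (huge * huge) * tiny
right = huge * (huge * tiny)
print(math.isinf(left), math.isfinite(right))   # True True
```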
Summary
- IEEE Floating Point has clear mathematical properties
- Represents numbers of the form M × 2^E
- One can reason about operations independent of implementation
- Not the same as real arithmetic