CMU Computer Systems: Floating Point

2022-03-21

Fractional binary numbers
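
As a rough illustration of the idea: bits to the left of the binary point carry weights 1, 2, 4, …, and bits to the right carry weights 1/2, 1/4, 1/8, …. The helper name `parse_fractional_binary` below is hypothetical, chosen only for this sketch.

```python
from fractions import Fraction

def parse_fractional_binary(text):
    """Return the exact value of a binary string such as '101.11'."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    # The i-th bit after the binary point has weight 2**-i.
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)
    return value

print(parse_fractional_binary("101.11"))    # 23/4, i.e. 5.75
print(parse_fractional_binary("0.111111"))  # 63/64, just below 1
```

Only values of the form x / 2^k come out exactly; a value such as 1/10 has no finite binary expansion.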

IEEE Floating Point

IEEE Standard 754
- Three fields: Sign (s) + Exp (encodes E) + Frac (encodes M), representing $(-1)^s \times M \times 2^E$
- Driven by numerical concerns

Precision options

Single precision: 32 bits = 1 (Sign) + 8 (Exp) + 23 (Frac)
Double precision: 64 bits = 1 (Sign) + 11 (Exp) + 52 (Frac)
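
The field widths above can be checked by pulling a value apart with shifts and masks. This is a minimal sketch using Python's standard `struct` module; the function names `float32_fields` and `float64_fields` are made up for the example.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into (sign, exp, frac)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def float64_fields(x):
    """Split a double into (sign, exp, frac)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

print(float32_fields(1.0))   # (0, 127, 0)
print(float64_fields(-2.0))  # (1, 1024, 0)
```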

Normalized Values

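Condition: Exp is neither 000…0 nor 111…1
Exponent coded as a biased value: E = Exp - Bias, where Bias = 2^(k-1) - 1 for a k-bit Exp field (127 for single precision, 1023 for double precision)
Significand coded with implied leading 1: M = 1.xxx…x, where xxx…x are the bits of Frac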
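
The rules for normalized values can be sketched as a decoder for single precision. The function name `decode_normalized` and the sample value 15213.0 are assumptions of this example; exact rational arithmetic is used so the decoded result is not itself rounded.

```python
import struct
from fractions import Fraction

BIAS = 127  # single precision: 2**(8 - 1) - 1

def decode_normalized(bits):
    """Decode a 32-bit pattern whose Exp field is neither all 0s nor all 1s."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    assert 0 < exp < 0xFF, "not a normalized encoding"
    E = exp - BIAS                   # biased exponent
    M = 1 + Fraction(frac, 2 ** 23)  # implied leading 1
    return (-1) ** sign * M * Fraction(2) ** E

bits = struct.unpack(">I", struct.pack(">f", 15213.0))[0]
print(hex(bits))                # 0x466db400
print(decode_normalized(bits))  # 15213
```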

Denormalized Values

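Condition: Exp is 000…0
Exponent value: E = 1 - Bias (not 0 - Bias)
Significand coded with implied leading 0: M = 0.xxx…x, where xxx…x are the bits of Frac
Frac = 000…0 represents zero (+0 or -0, depending on the sign bit)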
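
A matching sketch for denormalized single-precision values follows. The function name `decode_denormalized` is hypothetical. Note that E is fixed at 1 - Bias, which makes the largest denormalized value sit just below the smallest normalized one.

```python
from fractions import Fraction

BIAS = 127  # single precision

def decode_denormalized(bits):
    """Decode a 32-bit pattern whose Exp field is all 0s."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    assert exp == 0, "not a denormalized encoding"
    E = 1 - BIAS                 # fixed exponent
    M = Fraction(frac, 2 ** 23)  # implied leading 0
    return (-1) ** sign * M * Fraction(2) ** E

smallest = decode_denormalized(0x00000001)
largest = decode_denormalized(0x007FFFFF)
print(smallest == Fraction(1, 2 ** 149))  # True
print(largest < Fraction(1, 2 ** 126))    # True: below the smallest normalized value
print(decode_denormalized(0) == 0)        # True
```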

Special Values

Exp is all 1s and Frac is all 0s: infinity (+∞ or −∞, depending on the sign bit)
Exp is all 1s and Frac is nonzero: NaN (Not a Number)
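
Putting the three cases together, a single-precision bit pattern can be classified from its Exp and Frac fields alone. The names `classify_float32` and `bits_of` are hypothetical helpers for this sketch.

```python
import struct

def classify_float32(bits):
    """Name the category of a 32-bit pattern from its Exp and Frac fields."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        return "infinity" if frac == 0 else "NaN"
    if exp == 0:
        return "zero" if frac == 0 else "denormalized"
    return "normalized"

def bits_of(x):
    """Return the single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(classify_float32(bits_of(float("inf"))))  # infinity
print(classify_float32(bits_of(float("nan"))))  # NaN
print(classify_float32(0x7F800001))             # NaN
print(classify_float32(bits_of(1.0)))           # normalized
```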

Special Properties of the IEEE Encoding

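FP zero is the same as integer zero: all bits are 0
Can (almost) use unsigned integer comparison: the sign bits must be compared first, -0 must be treated as equal to +0, and NaNs need separate handling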
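
A small experiment, under the assumption that all inputs are non-negative and not NaN, shows the bit patterns ordering the same way as the values. `bits_of` is a hypothetical helper; the last line shows why the sign bit has to be handled first.

```python
import struct

def bits_of(x):
    """Return the single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [2.0, 0.0, float("inf"), 0.5, 1024.0, 1.5]
by_bits = sorted(values, key=bits_of)

print(by_bits == sorted(values))     # True: same order as numeric comparison
print(bits_of(0.0))                  # 0: FP zero has the integer-zero pattern
print(bits_of(-1.0) > bits_of(1.0))  # True: sign bit breaks a naive unsigned compare
```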

Rounding

Toward zero
Round down (toward −∞)
Round up (toward +∞)

Nearest Even (default)

- More than halfway between two representable values: round up
- Less than halfway: round down
- Exactly halfway: round so that the least significant kept digit is even
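
The three cases above can be sketched for binary fractions as follows. The function `round_half_even` and its `frac_bits` parameter are assumptions of this example; it rounds an exact rational value to a multiple of 2^-frac_bits.

```python
import math
from fractions import Fraction

def round_half_even(value, frac_bits):
    """Round a Fraction to a multiple of 2**-frac_bits, ties to even."""
    scaled = value * 2 ** frac_bits
    lower = math.floor(scaled)
    remainder = scaled - lower
    if remainder > Fraction(1, 2):
        lower += 1  # more than halfway: round up
    elif remainder == Fraction(1, 2) and lower % 2 == 1:
        lower += 1  # exactly halfway: move to the even neighbor
    return Fraction(lower, 2 ** frac_bits)

# Round to the nearest 1/4 (two bits after the binary point).
print(round_half_even(Fraction(67, 32), 2))  # 2    (10.00011 -> 10.00)
print(round_half_even(Fraction(35, 16), 2))  # 9/4  (10.00110 -> 10.01)
print(round_half_even(Fraction(23, 8), 2))   # 3    (10.11100 -> 11.00)
print(round_half_even(Fraction(21, 8), 2))   # 5/2  (10.10100 -> 10.10)
```

The last two inputs are exact ties; one rounds up and the other rounds down, so errors do not accumulate in one direction.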
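
FP Multiplication

- Exact result: sign $s_1 \oplus s_2$, significand $M_1 \times M_2$, exponent $E_1 + E_2$
- Fixing: if $M \ge 2$, shift M right and increment E; if E is out of range, overflow; round M to fit the Frac precision
- Implementation: multiplying the significands is the most expensive step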
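
The exact-result and fixing steps can be sketched on (sign, M, E) triples. This is a simplified model, not a full IEEE implementation: the function name `fp_multiply` is hypothetical, significands are exact `Fraction` values with 1 ≤ M < 2, and the final rounding and overflow checks are left out.

```python
from fractions import Fraction

def fp_multiply(s1, m1, e1, s2, m2, e2):
    """Multiply (-1)**s1 * m1 * 2**e1 by (-1)**s2 * m2 * 2**e2."""
    s = s1 ^ s2  # sign: XOR of the input signs
    m = m1 * m2  # exact product, in [1, 4)
    e = e1 + e2
    if m >= 2:   # fixing: renormalize so that 1 <= m < 2
        m /= 2
        e += 1
    return s, m, e

# 3 * 6: (1.5 * 2**1) * (1.5 * 2**2) = 1.125 * 2**4 = 18
s, m, e = fp_multiply(0, Fraction(3, 2), 1, 0, Fraction(3, 2), 2)
print(s, m, e)  # 0 9/8 4
```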
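
FP Addition

- Exact result: align the binary points (assume $E_1 > E_2$), then add the signed significands; the exponent is $E = E_1$
- Fixing: if $M \ge 2$, shift M right and increment E; if $M < 1$, shift M left k positions and decrement E by k; if E is out of range, overflow; round M to fit the Frac precision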

Multiplication and addition are commutative but not associative, because rounding and overflow depend on the order in which the operations are applied.
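
A quick check with Python floats (which are IEEE double precision on common platforms) illustrates the loss of associativity. The specific constants are chosen only for this example.

```python
import math

a, b, c = 3.14, 1e20, -1e20
print((a + b) + c)     # 0.0: 3.14 is lost when added to 1e20
print(a + (b + c))     # 3.14
print(a + b == b + a)  # True: addition is still commutative

x, y, z = 1e200, 1e200, 1e-200
left = (x * y) * z     # x * y overflows to infinity first
right = x * (y * z)    # y * z is about 1, so the result stays finite
print(math.isinf(left), math.isfinite(right))  # True True
```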

Summary

IEEE Floating Point has clear mathematical properties
Represents numbers of form $M \times 2^E$
One can reason about operations independent of implementation
Not the same as real arithmetic
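
The $M \times 2^E$ form can be observed directly with the standard library. In this sketch the helper `significand_and_exponent` is hypothetical; it rescales the result of `math.frexp` (which returns a significand in [0.5, 1)) to the 1 ≤ M < 2 convention used in these notes, and it assumes a finite, nonzero input.

```python
import math
from fractions import Fraction

def significand_and_exponent(x):
    """Return (M, E) with x == M * 2**E and 1 <= |M| < 2 (x finite, nonzero)."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1

print(significand_and_exponent(6.0))  # (1.5, 2)

exact = Fraction(0.1)  # the value actually stored for the literal 0.1
d = exact.denominator
print(d & (d - 1) == 0)          # True: the denominator is a power of two
print(exact == Fraction(1, 10))  # False: 1/10 is not of the form M * 2**E
print(0.1 + 0.2 == 0.3)          # False: not the same as real arithmetic
```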