CMU Computer Systems: Floating Point


Fractional binary numbers

IEEE Floating Point

  • IEEE Standard 754

    • Sign + Exp + Frac
  • Driven by numerical concerns

Precision options

  • Single precision: 32 bits: 1 + 8 + 23 (sign + exp + frac)
  • Double precision: 64 bits: 1 + 11 + 52 (sign + exp + frac)
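The field widths above can be checked by reinterpreting a float's bytes as an unsigned integer and masking out each field. This is a minimal sketch using Python's standard `struct` module; the helper names `float32_fields` and `float64_fields` are illustrative, not part of the lecture.

```python
import struct

def float32_fields(x):
    """Return (sign, exp, frac) of x encoded as IEEE 754 single precision."""
    # Pack as a big-endian 32-bit float, then read the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31             # 1 bit
    exp = (bits >> 23) & 0xFF     # 8 bits
    frac = bits & 0x7FFFFF        # 23 bits
    return sign, exp, frac

def float64_fields(x):
    """Return (sign, exp, frac) of x encoded as IEEE 754 double precision."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                 # 1 bit
    exp = (bits >> 52) & 0x7FF        # 11 bits
    frac = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exp, frac

print(float32_fields(1.0))    # (0, 127, 0)
print(float64_fields(-2.0))   # (1, 1024, 0)
```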

Normalized Values

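  • Exp is neither 000…00 nor 111…11
  • Exponent coded as a biased value: E = Exp − Bias, where Bias = 2^(k−1) − 1 for a k-bit Exp field (127 for single precision, 1023 for double)
  • Significand coded with an implied leading 1: M = 1.xxxx…x, where xxxx…x are the bits of frac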
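As a worked example of the normalized rules, the sketch below decodes a single-precision bit pattern by hand. It assumes the single-precision layout (8-bit exp, 23-bit frac, Bias = 127); `decode_normalized` is a hypothetical helper name.

```python
BIAS = 127        # single precision: 2**(8 - 1) - 1
FRAC_BITS = 23

def decode_normalized(bits):
    """Decode a 32-bit pattern whose exp field is neither all 0s nor all 1s."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp in (0, 0xFF):
        raise ValueError("not a normalized encoding")
    E = exp - BIAS                  # unbiased exponent
    M = 1 + frac / 2**FRAC_BITS     # implied leading 1: 1.xxxx...x
    return (-1) ** sign * M * 2.0 ** E

# 0x3E200000: sign = 0, exp = 124, frac = 0x200000
# E = 124 - 127 = -3, M = 1.01b = 1.25, value = 1.25 * 2**-3
print(decode_normalized(0x3E200000))   # 0.15625
```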

Denormalized Values

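  • Exp is 000…0
  • Exponent value: E = 1 − Bias (not 0 − Bias)
  • Significand coded with an implied leading 0: M = 0.xxx…x
  • Frac all 0 represents zero (+0 and −0 are distinct encodings); non-zero frac gives the numbers closest to 0.0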
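The same decoding can be sketched for denormalized values, again assuming single precision (Bias = 127, so E = 1 − 127 = −126). The name `decode_denormalized` is illustrative only.

```python
BIAS = 127        # single precision
FRAC_BITS = 23

def decode_denormalized(bits):
    """Decode a 32-bit pattern whose exp field is all 0s."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp != 0:
        raise ValueError("not a denormalized encoding")
    E = 1 - BIAS                # fixed exponent: -126
    M = frac / 2**FRAC_BITS     # implied leading 0: 0.xxx...x
    return (-1) ** sign * M * 2.0 ** E

smallest = decode_denormalized(0x00000001)   # frac = 1: 2**-23 * 2**-126 = 2**-149
largest = decode_denormalized(0x007FFFFF)    # frac all 1s: just below 2**-126
print(smallest, largest)
print(decode_denormalized(0x00000000), decode_denormalized(0x80000000))   # 0.0 -0.0
```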

Special Values

  • Exp all 1s and frac all 0s: infinity (+∞ or −∞, depending on the sign bit)
  • Exp all 1s and frac not all 0s: NaN (not a number)
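Putting the three cases together, a bit pattern can be classified from its exp and frac fields alone. This is a sketch for single precision; `classify_float32` and `bits_to_float32` are hypothetical helper names.

```python
import struct

def classify_float32(bits):
    """Name the encoding class of a 32-bit single-precision pattern."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        return "infinity" if frac == 0 else "nan"
    if exp == 0:
        return "zero" if frac == 0 else "denormalized"
    return "normalized"

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as a single-precision float."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

for bits in (0x7F800000, 0xFF800000, 0x7FC00000, 0x3F800000):
    print(f"{bits:#010x}", classify_float32(bits), bits_to_float32(bits))
```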

Special Properties of the IEEE Encoding

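  • FP zero is the same as integer zero: all bits are 0
  • Can (almost) use unsigned integer comparison: compare the sign bits first, treat −0 = 0 as a special case, and handle NaN separately (its bit pattern is greater than that of any other value)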
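A quick way to see the comparison property: for non-negative floats, sorting by the raw bit pattern gives the same order as sorting by value. The sketch below uses single precision, and `bits32` is an illustrative helper. It also shows why the sign bit needs separate handling.

```python
import struct

def bits32(x):
    """Return the single-precision bit pattern of x as an unsigned integer."""
    (b,) = struct.unpack(">I", struct.pack(">f", x))
    return b

# Non-negative values, including a denormalized one (1e-45) and infinity.
values = [2.0, 1e-45, float("inf"), 0.5, 0.0, 1e30, 1.5, 1.0, 1e-10]

by_value = sorted(values)
by_bits = sorted(values, key=bits32)
print(by_value == by_bits)              # True: same order either way

# The sign bit breaks plain unsigned comparison:
print(bits32(-1.0) > bits32(1.0))       # True, although -1.0 < 1.0
print(bits32(0.0), hex(bits32(-0.0)))   # 0 0x80000000, although 0.0 == -0.0
```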

Rounding

  • Towards zero
  • Round down (towards −∞)
  • Round up (towards +∞)
  • Nearest even (default)

    • More than halfway: round up
    • Less than halfway: round down
    • Exactly halfway: round to the value whose least significant bit is even (0)
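The round-to-nearest-even rule can be sketched with exact rational arithmetic, rounding a value to a fixed number of binary fraction bits. This illustrates the rule itself and is not how hardware implements it; `round_nearest_even` is a hypothetical name.

```python
from fractions import Fraction

def round_nearest_even(x, frac_bits):
    """Round Fraction x to a multiple of 2**-frac_bits, ties to even."""
    scale = 2 ** frac_bits
    scaled = x * scale
    lower = scaled.numerator // scaled.denominator   # floor of the scaled value
    remainder = scaled - lower                        # in [0, 1)
    half = Fraction(1, 2)
    # Round up when above halfway, or exactly halfway with an odd lower neighbour.
    if remainder > half or (remainder == half and lower % 2 == 1):
        lower += 1
    return Fraction(lower, scale)

# Round to the nearest 1/4 (two bits to the right of the binary point).
examples = [
    Fraction(2) + Fraction(3, 32),   # 10.00011b, below halfway  -> 10.00b = 2
    Fraction(2) + Fraction(3, 16),   # 10.00110b, above halfway  -> 10.01b = 9/4
    Fraction(2) + Fraction(7, 8),    # 10.11100b, halfway, odd   -> 11.00b = 3
    Fraction(2) + Fraction(5, 8),    # 10.10100b, halfway, even  -> 10.10b = 5/2
]
for x in examples:
    print(x, "->", round_nearest_even(x, 2))
```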
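
FP Operations

  • FP Multiplication: (−1)^s1 × M1 × 2^E1 times (−1)^s2 × M2 × 2^E2

    • Exact result: sign s = s1 XOR s2, significand M = M1 × M2, exponent E = E1 + E2
    • Fixing: if M ≥ 2, shift M right and increment E; if E is out of range, overflow; round M to fit the frac precision
    • Implementation: multiplying the significands is the most expensive step
  • FP Addition (assume E1 > E2)

    • Exact result: align the binary points by shifting M2 right by E1 − E2, then add the signed significands; the exponent is E1
    • Fixing: if M ≥ 2, shift M right and increment E; if M < 1, shift M left k positions and decrement E by k; overflow if E is out of range; round M to fit the frac precision
  • Addition and multiplication are commutative but not associative, because every intermediate result is rounded (and may overflow)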
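A short experiment makes the commutative-but-not-associative point concrete. Python floats are IEEE double precision on common platforms; the values below are chosen for illustration only.

```python
import math

# Addition: the small term is lost when it is added to the large one first.
a, b, c = 3.14, 1e20, -1e20
add_left = (a + b) + c     # 3.14 is absorbed by 1e20, so the result is 0.0
add_right = a + (b + c)    # 1e20 - 1e20 is exactly 0.0, so the result is 3.14
print(add_left, add_right)

# Multiplication: one grouping overflows to infinity, the other stays finite.
x, y, z = 1e200, 1e200, 1e-200
mul_left = (x * y) * z     # x * y overflows to inf
mul_right = x * (y * z)    # y * z is about 1, so the product stays finite
print(math.isinf(mul_left), math.isfinite(mul_right))

# Commutativity still holds for each single operation.
print(a + b == b + a, x * z == z * x)
```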

Summary

  • IEEE Floating Point has clear mathematical properties
  • Represents numbers of the form M × 2^E
  • One can reason about operations independent of implementation
  • Not the same as real arithmetic: associativity and distributivity can fail because results are rounded