CMU Computer Systems: Thread-Level Parallelism

Exploiting parallel execution

  • Use threads to deal with I/O delays
  • Multi-core/Hyperthreaded CPUs offer another opportunity
    • Spread work over threads executing in parallel
    • Happens automatically, if many independent tasks
    • Can also write code to make one big task go faster
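
As a minimal sketch of the first bullet, the snippet below overlaps several blocking waits by running them in threads. The function names (`fake_request`, `run_sequential`, `run_threaded`) are hypothetical, and `time.sleep` only stands in for a real I/O call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i, delay=0.05):
    """Pretend to wait on I/O, then return a result (hypothetical stand-in)."""
    time.sleep(delay)  # placeholder for a blocking read, network call, etc.
    return i * i


def run_sequential(n):
    """Issue the requests one after another; the delays add up."""
    return [fake_request(i) for i in range(n)]


def run_threaded(n):
    """Issue the requests from n threads so the waits overlap."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map() returns results in input order, whichever thread finishes first
        return list(pool.map(fake_request, range(n)))


if __name__ == "__main__":
    for fn in (run_sequential, run_threaded):
        start = time.perf_counter()
        fn(4)
        print(fn.__name__, round(time.perf_counter() - start, 2), "seconds")
```

Both versions return the same results; only the elapsed time should differ, because the threaded version waits on all requests at once.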

Typical Multicore Processor

  • [Figure: typical multicore processor; image not preserved]

Out-of-Order Processor Structure

  • [Figure: out-of-order processor structure; image not preserved]

Hyperthreading Implementation

  • [Figure: hyperthreading implementation; image not preserved]

Characterizing Parallel Program Performance

  • $p$ processor cores; $T_k$ is the running time using $k$ cores
  • Speedup: $S_p = T_1 / T_p$
    • $S_p$ is relative speedup if $T_1$ is the running time of the parallel version of the code running on 1 core
    • $S_p$ is absolute speedup if $T_1$ is the running time of the sequential version of the code running on 1 core
    • Absolute speedup is a much truer measure of the benefits of parallelism
  • Efficiency: $E_p = S_p / p = T_1 / (p T_p)$
    • Reported as a percentage in the range (0, 100]
    • Measures the overhead due to parallelization
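
The definitions above can be checked with a few lines of arithmetic. This is only a sketch: the helper names `speedup` and `efficiency` and all timing values are hypothetical, chosen to show why relative speedup can look better than absolute speedup.

```python
def speedup(t1, tp):
    """S_p = T_1 / T_p."""
    return t1 / tp


def efficiency(t1, tp, p):
    """E_p = S_p / p = T_1 / (p * T_p), as a fraction in (0, 1]."""
    return speedup(t1, tp) / p


# Hypothetical timings in seconds (illustrative values, not measurements).
t_seq = 10.0    # sequential version on 1 core
t_par_1 = 12.0  # parallel version on 1 core (pays threading overhead)
t_par_8 = 2.0   # parallel version on 8 cores

relative = speedup(t_par_1, t_par_8)  # baseline is the parallel code
absolute = speedup(t_seq, t_par_8)    # baseline is the sequential code
eff = efficiency(t_seq, t_par_8, 8)

print(f"relative speedup: {relative}")
print(f"absolute speedup: {absolute}")
print(f"efficiency: {eff:.1%}")
```

With these made-up numbers the relative speedup is 6 while the absolute speedup is 5, because the one-core parallel run is slower than the sequential code it is compared against.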

Amdahl's Law

  • Captures the difficulty of using parallelism to speed things up
  • Overall problem
    • $T$: Total sequential time required
    • $p$: Fraction of total that can be sped up ($0 \leq p \leq 1$)
    • $k$: Speedup factor
  • Resulting Performance
    • $T_k = pT/k + (1-p)T$
      • Portion which can be sped up runs k times faster
      • Portion which cannot be sped up stays the same
    • Least possible running time
      • $k = \infty$
      • $T_\infty = (1-p)T$
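
A short worked example of the formula may help. The helper names `amdahl_time` and `amdahl_limit` are hypothetical, and the inputs ($T = 8$, $p = 0.75$, $k = 3$) are picked only to keep the arithmetic simple.

```python
def amdahl_time(T, p, k):
    """T_k = p*T/k + (1 - p)*T: running time when a fraction p runs k times faster."""
    return p * T / k + (1 - p) * T


def amdahl_limit(T, p):
    """T_inf = (1 - p)*T: least possible running time as k grows without bound."""
    return (1 - p) * T


T, p, k = 8.0, 0.75, 3  # illustrative values only
t_k = amdahl_time(T, p, k)
t_inf = amdahl_limit(T, p)

print(f"T_k   = {t_k}  (overall speedup {T / t_k})")
print(f"T_inf = {t_inf}  (best possible speedup {T / t_inf})")
```

Here three-quarters of the work runs three times faster, yet the overall speedup is only 2, and no value of $k$ can push it past 4 because the remaining quarter never gets faster.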

Experience with Parallel Partitioning

  • Could not obtain speedup
  • Speculate: Too much data copying
    • Could not do everything within source array
    • Set up temporary space for reassembling partition
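
To make the copying cost concrete, here is one way such a partition step could look. This is an assumed reconstruction, not the code behind the experience described above: the names `split_chunk` and `parallel_partition` are hypothetical, and the comments mark each place where data is copied.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunk(chunk, pivot):
    """Split one chunk into values <= pivot and values > pivot (new lists)."""
    lows = [x for x in chunk if x <= pivot]
    highs = [x for x in chunk if x > pivot]
    return lows, highs


def parallel_partition(data, pivot, n_threads=4):
    """Partition data around pivot; return the index of the first value > pivot."""
    size = max(1, -(-len(data) // n_threads))  # ceiling division
    # Copy 1: slicing gives each worker its own chunk
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(lambda c: split_chunk(c, pivot), chunks))
    # Copy 2: reassemble the per-chunk pieces in temporary space
    lows = [x for lo, _ in parts for x in lo]
    highs = [x for _, hi in parts for x in hi]
    # Copy 3: write the reassembled partition back into the source array
    data[:] = lows + highs
    return len(lows)
```

Every element is moved several times, whereas a sequential in-place partition touches each element roughly once, which is consistent with the speculation that copying ate the gains.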

Memory Consistency

  • What are the possible values printed?
    • Depends on memory consistency model
    • Abstract model of how hardware handles concurrent accesses
  • Sequential consistency
    • Overall effect is consistent with each individual thread's program order
    • Otherwise, operations from different threads may interleave arbitrarily
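
Sequential consistency can be explored by brute force: enumerate every interleaving that keeps each thread's own order and record what gets printed. The two-thread program below is a hypothetical stand-in (the notes do not show the code the question refers to), and the helper names `interleavings` and `run` are also assumptions.

```python
def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves each thread's own order."""
    if not t1:
        yield list(t2)
        return
    if not t2:
        yield list(t1)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest


def run(schedule):
    """Execute one interleaving against shared memory; return the printed values."""
    mem = {"a": 1, "b": 100}  # hypothetical initial values
    printed = []
    for op in schedule:
        if op[0] == "write":
            mem[op[1]] = op[2]
        else:  # "print"
            printed.append(mem[op[1]])
    return tuple(printed)


# Hypothetical program: each thread writes one variable, then prints the other.
THREAD1 = [("write", "a", 2), ("print", "b")]
THREAD2 = [("write", "b", 200), ("print", "a")]

outputs = {run(s) for s in interleavings(THREAD1, THREAD2)}
print(sorted(outputs))
```

For this program no sequentially consistent execution prints both old values (1 and 100, in either order), since each thread's write precedes its own print. A weaker memory model could allow that outcome.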

Snoopy Caches

  • Tag each cache block with state
    • Invalid: Cannot use value
    • Shared: Readable copy
    • Exclusive: Writeable copy
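
The three states can be sketched as a tiny state machine for a single cache block. This is a simplified model under stated assumptions, not a full coherence protocol: the class name `SnoopyBus` is hypothetical, a read always yields a Shared copy, and write-back of data is not modeled.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"


class SnoopyBus:
    """Tracks the state of one block in each of n caches (simplified sketch)."""

    def __init__(self, n_caches):
        self.state = [INVALID] * n_caches

    def read(self, i):
        """Cache i reads the block; other caches snoop the request."""
        for j, s in enumerate(self.state):
            if j != i and s == EXCLUSIVE:
                self.state[j] = SHARED  # the writer gives up exclusivity
        if self.state[i] == INVALID:
            self.state[i] = SHARED  # reader obtains a readable copy

    def write(self, i):
        """Cache i writes the block; every other copy becomes unusable."""
        for j in range(len(self.state)):
            if j != i:
                self.state[j] = INVALID
        self.state[i] = EXCLUSIVE


bus = SnoopyBus(2)
bus.write(0)
print(bus.state)  # cache 0 holds the only writeable copy
bus.read(1)
print(bus.state)  # both caches now hold readable copies
bus.write(1)
print(bus.state)  # cache 1 takes over; cache 0 is invalidated
```

The point of the tags is that at most one cache can hold a writeable copy at a time, so a write is always visible to later readers through the invalidation it triggers.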