Why systolic architectures?
- Simple and regular design
- nonrecurring cost (design): with a simple, regular hardware architecture, Google completed the chip's design and implementation in a very short time.
- recurring cost (parts)
- Concurrency and communication
- Balancing computation with I/O

“(Semi-) systolic convolution arrays with global data communication”
broadcast inputs, move results, weights stay
broadcast inputs, move weights, results stay
fan-in results, move inputs, weights stay
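
A minimal sketch of the first variant above (broadcast inputs, move results, weights stay), to make the data movement concrete. It assumes the usual 1-D convolution y[i] = w[0]*x[i] + ... + w[k-1]*x[i+k-1]; the function names and the cycle-by-cycle model are illustrative assumptions, not taken from the source.

```python
def conv_reference(w, x):
    """Direct convolution, used only to check the simulated array."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv_broadcast_inputs_move_results(w, x):
    """Cycle-level model: cell j keeps w[j]; x[t] is broadcast to all cells
    in cycle t; partial results shift one cell to the right per cycle."""
    k = len(w)
    partial = [0] * k  # partial[j]: result latched at the output of cell j
    out = []
    for t, xt in enumerate(x):
        # Every cell sees the same broadcast input xt and adds its product
        # to the partial result coming from its left neighbor.
        partial = [(partial[j - 1] if j > 0 else 0) + w[j] * xt for j in range(k)]
        # From cycle k-1 on, the last cell emits one finished result per cycle.
        if t >= k - 1:
            out.append(partial[k - 1])
    return out


w, x = [1, 2, 3], [1, 2, 3, 4, 5]
print(conv_broadcast_inputs_move_results(w, x))  # [14, 20, 26]
print(conv_reference(w, x))                      # [14, 20, 26]
```

The broadcast is the "global data communication": the single wire carrying x[t] must reach every cell, which is why these designs are only semi-systolic.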
“(Pure-) systolic convolution arrays without global data communication”
results stay, inputs and weights move in opposite directions
results stay, inputs and weights move in the same direction but at different speeds
weights stay, inputs and results move in opposite directions
weights stay, inputs and results move in the same direction but at different speeds
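
A rough sketch of the last variant above (weights stay, inputs and results move in the same direction but at different speeds), with only neighbor-to-neighbor communication. The details are assumptions made for illustration: each cell has one extra register so that an input takes two cycles per cell while a result takes one, and the weights are stored in reverse order so that each result catches up with the inputs it needs. The function name is hypothetical.

```python
def conv_weights_stay_same_direction(w, x):
    """Cycle-level model: x values advance one cell every two cycles,
    partial results advance one cell every cycle, weights never move."""
    k, n = len(w), len(x)
    cell_w = w[::-1]     # assumption: cell j holds w[k-1-j]
    x_seen = [0] * k     # x value each cell used in the previous cycle
    hold = [0] * k       # extra per-cell register that delays x by one more cycle
    y = [0] * k          # partial result latched at the output of each cell
    out = []
    for t in range(n + k - 1):
        # Cell 0 reads the next input (zeros once the stream ends);
        # cell j reads the value its left neighbor held back.
        x_now = [x[t] if t < n else 0] + hold[:-1]
        # Results move right one cell per cycle, picking up one product each.
        y = [(y[j - 1] if j > 0 else 0) + cell_w[j] * x_now[j] for j in range(k)]
        # Advance the input registers for the next cycle.
        hold, x_seen = x_seen, x_now
        # The first complete result leaves the last cell in cycle 2*(k-1).
        if t >= 2 * (k - 1):
            out.append(y[k - 1])
    return out


print(conv_weights_stay_same_direction([1, 2, 3], [1, 2, 3, 4, 5]))  # [14, 20, 26]
```

No value is broadcast here: every cell reads only from its left neighbor, at the price of one extra register per cell and a longer latency before the first result appears.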





Apply some reformatting to the original matrix.
from: 深入理解Google TPU的脉动阵列架构 (An In-Depth Look at the Systolic Array Architecture of Google's TPU)
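
The notes do not say what the matrix reformatting looks like. One common choice for 2-D systolic arrays is to skew the operands, delaying row r by r cycles so that matching elements meet in the right cell at the right time. The sketch below assumes that this is what is meant, and it uses a results-stay array only because it is the shortest to simulate; `skew` and `systolic_matmul` are hypothetical names.

```python
def skew(rows, total_cycles):
    """Delay row r by r cycles and zero-pad every row to total_cycles entries."""
    return [[0] * r + list(row) + [0] * (total_cycles - r - len(row))
            for r, row in enumerate(rows)]


def systolic_matmul(A, B):
    """Results-stay array: A streams in from the left, B from the top,
    and PE(i, j) accumulates C[i][j] in place."""
    n, m, p = len(A), len(B), len(B[0])
    cycles = n + m + p - 2
    a_feed = skew(A, cycles)                               # row i of A feeds array row i
    b_feed = skew([list(col) for col in zip(*B)], cycles)  # column j of B feeds array column j
    a_reg = [[0] * p for _ in range(n)]  # A value currently held by each PE
    b_reg = [[0] * p for _ in range(n)]  # B value currently held by each PE
    C = [[0] * p for _ in range(n)]
    for t in range(cycles):
        # A values shift one PE to the right, B values shift one PE down.
        a_reg = [[a_feed[i][t]] + a_reg[i][:-1] for i in range(n)]
        b_reg = [[b_feed[j][t] if i == 0 else b_reg[i - 1][j] for j in range(p)]
                 for i in range(n)]
        # Every PE multiplies the two values passing through it and accumulates.
        for i in range(n):
            for j in range(p):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


print(skew([[1, 2], [3, 4]], 4))                          # [[1, 2, 0, 0], [0, 3, 4, 0]]
print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Without the skew, A[i][s] and B[s][j] would reach PE(i, j) in different cycles and the accumulated products would be mismatched.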