CPU cache
Writes become slow
Store buffers
● Idea: record each store in a per-CPU store buffer instead of stalling.
● The CPU can proceed immediately, without waiting for the other caches.
● The store completes when the invalidate acknowledgements have been received.
● At that point the data moves from the store buffer into the cache line.
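The store-buffer idea can be sketched as a small single-threaded model. This is only an illustration, not real hardware behavior: the names toy_cpu, cpu_store, cpu_load and cpu_complete_oldest, the buffer size, and the one-int-per-line cache are all assumptions made for the example. The model also assumes that loads check the store buffer first, so that a CPU sees its own pending stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LINES   8   /* hypothetical cache size, one int per "line" */
#define SB_CAPACITY 4   /* hypothetical store buffer size */

/* One pending store: the target line and the value to write. */
struct sb_entry { size_t addr; int value; };

struct toy_cpu {
    int cache[TOY_LINES];
    struct sb_entry sb[SB_CAPACITY];  /* oldest entry first */
    size_t sb_len;
};

/* Record a store in the buffer; the CPU proceeds without waiting.
 * Returns false when the buffer is full and the CPU would have to stall. */
bool cpu_store(struct toy_cpu *cpu, size_t addr, int value)
{
    assert(addr < TOY_LINES);
    if (cpu->sb_len == SB_CAPACITY)
        return false;
    cpu->sb[cpu->sb_len++] = (struct sb_entry){ addr, value };
    return true;
}

/* Loads check the store buffer newest-first, then fall back to the cache. */
int cpu_load(const struct toy_cpu *cpu, size_t addr)
{
    assert(addr < TOY_LINES);
    for (size_t i = cpu->sb_len; i > 0; i--)
        if (cpu->sb[i - 1].addr == addr)
            return cpu->sb[i - 1].value;
    return cpu->cache[addr];
}

/* Models the arrival of the invalidate acknowledgements for the oldest
 * store: its value moves from the store buffer into the cache. */
void cpu_complete_oldest(struct toy_cpu *cpu)
{
    if (cpu->sb_len == 0)
        return;
    cpu->cache[cpu->sb[0].addr] = cpu->sb[0].value;
    for (size_t i = 1; i < cpu->sb_len; i++)
        cpu->sb[i - 1] = cpu->sb[i];
    cpu->sb_len--;
}
```

After cpu_store() the value is visible to cpu_load() on the same CPU but is not yet in the cache, which is exactly the window in which other CPUs can observe stores out of order.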
Write memory barrier
● Memory barrier: smp_wmb()
● Causes the CPU to flush its store buffer before applying subsequent stores to their cache lines.
● The CPU can either simply stall until the store buffer is empty before proceeding,
● or use the store buffer to hold subsequent stores until all of the prior entries in the store buffer have been applied.
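A writer-side sketch of where such a barrier goes. smp_wmb() is a Linux kernel primitive, so this userspace example assumes a C11 release fence as a stand-in; the release fence is somewhat stronger than smp_wmb(), since it also orders earlier loads. The names data, ready and publish are hypothetical.

```c
#include <stdatomic.h>

int data;          /* plain payload (hypothetical name) */
atomic_int ready;  /* flag that announces the payload (hypothetical name) */

/* Write the payload, then the flag. The fence keeps the flag store from
 * becoming visible to other CPUs before the payload store. */
void publish(int value)
{
    data = value;
    atomic_thread_fence(memory_order_release);  /* stand-in for smp_wmb() */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}
```

Without the fence, the store to ready could leave the store buffer before the store to data, so another CPU might see the flag set while still reading the old payload.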
Invalidate queues
● Invalidate acknowledgements can be slow: a busy (overloaded) cache may take a long time to process an invalidate message.
● While the sending CPU waits for those acknowledgements, its store buffer can run out of space.
- Idea: why wait for the cache?
- On the receiving CPU: store the invalidate request in a queue, acknowledge it right away, and apply it later.
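The receiver-side behavior can be sketched as a toy model, under the same caveat that this is an illustration and not real hardware. The names toy_cache, cache_receive_invalidate, cache_apply_invalidates and cache_load, and the sizes, are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LINES       8   /* hypothetical number of cache lines */
#define IQ_CAPACITY 4   /* hypothetical invalidate queue size */

struct toy_cache {
    int    data[LINES];
    bool   valid[LINES];
    size_t iq[IQ_CAPACITY];  /* queued invalidations (line indices) */
    size_t iq_len;
};

/* Receive an invalidate request: queue it and acknowledge right away.
 * Returns true for "acknowledged"; false if the queue is full and the
 * cache would have to apply pending entries first. */
bool cache_receive_invalidate(struct toy_cache *c, size_t line)
{
    assert(line < LINES);
    if (c->iq_len == IQ_CAPACITY)
        return false;
    c->iq[c->iq_len++] = line;
    return true;
}

/* Apply every queued invalidation to the cache ("apply later"). */
void cache_apply_invalidates(struct toy_cache *c)
{
    for (size_t i = 0; i < c->iq_len; i++)
        c->valid[c->iq[i]] = false;
    c->iq_len = 0;
}

/* A load that ignores the queue: it can still hit a line whose
 * invalidation was acknowledged but not yet applied. */
bool cache_load(const struct toy_cache *c, size_t line, int *out)
{
    assert(line < LINES);
    if (!c->valid[line])
        return false;  /* miss: a fresh copy would have to be fetched */
    *out = c->data[line];
    return true;
}
```

Between cache_receive_invalidate() and cache_apply_invalidates(), cache_load() still returns the stale value even though the sender has already been told the line is invalid.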
Read memory barrier
● Read barrier: smp_rmb()
● Marks all the entries currently in the CPU’s invalidate queue and forces any subsequent load to wait until all marked entries have been applied to the CPU’s cache.
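A reader-side sketch of where such a barrier goes. As smp_rmb() exists only in the Linux kernel, this userspace example assumes a C11 acquire fence as a stand-in; the acquire fence is somewhat stronger than smp_rmb(), since it also orders later stores. The names payload, flag and try_consume are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

int payload;      /* plain data written before the flag (hypothetical name) */
atomic_int flag;  /* set by the writer after the payload (hypothetical name) */

/* Read the flag, then the payload. The fence keeps the payload load from
 * being satisfied by a stale cache line once the flag has been seen. */
bool try_consume(int *out)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return false;                           /* nothing published yet */
    atomic_thread_fence(memory_order_acquire);  /* stand-in for smp_rmb() */
    *out = payload;
    return true;
}
```

Without the fence, the load of payload could be served from a line whose invalidation is still sitting in the invalidate queue.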
Conclusion
● Memory barriers are required to ensure the correct order of cross-CPU memory updates.
● E.g. updating two memory locations, a and b.
● Two kinds of memory barrier are common: write (smp_wmb()) and read (smp_rmb()).
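The a and b example can be sketched as a two-thread userspace program in which the write and read barriers are paired. It assumes POSIX threads and C11 fences in place of the kernel's smp_wmb() and smp_rmb(); the function names writer, reader and run_once are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int a;         /* first location: the data */
static atomic_int b;  /* second location: the "a is ready" flag */

static void *writer(void *arg)
{
    (void)arg;
    a = 1;
    atomic_thread_fence(memory_order_release);  /* write barrier */
    atomic_store_explicit(&b, 1, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&b, memory_order_relaxed) == 0)
        ;                                       /* spin until b is set */
    atomic_thread_fence(memory_order_acquire);  /* read barrier */
    *seen = a;
    return NULL;
}

/* Runs one writer and one reader; returns the value of a the reader saw. */
int run_once(void)
{
    pthread_t w, r;
    int seen = -1;

    a = 0;
    atomic_store(&b, 0);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With both fences in place, a reader that has observed b == 1 is guaranteed to observe a == 1; removing either fence removes that guarantee.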