Atnaujinkite slapukų nuostatas

Primer on Memory Consistency and Cache Coherence 2nd Revised edition [Minkštas viršelis]

4.47/5 (17 ratings by Goodreads)
  • Formatas: Paperback / softback, 294 pages, aukštis x plotis: 235x191 mm, weight: 333 g
  • Serija: Synthesis Lectures on Computer Architecture
  • Išleidimo metai: 04-Feb-2020
  • Leidėjas: Morgan & Claypool Publishers
  • ISBN-10: 1681737094
  • ISBN-13: 9781681737096
Kitos knygos pagal šią temą:
  • Formatas: Paperback / softback, 294 pages, aukštis x plotis: 235x191 mm, weight: 333 g
  • Serija: Synthesis Lectures on Computer Architecture
  • Išleidimo metai: 04-Feb-2020
  • Leidėjas: Morgan & Claypool Publishers
  • ISBN-10: 1681737094
  • ISBN-13: 9781681737096
Kitos knygos pagal šią temą:

Many modern computer systems, including homogeneous and heterogeneous architectures, support shared memory in hardware.

In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that must be solved as well as a variety of solutions. We present both high-level concepts as well as specific, concrete examples from real-world systems.

This second edition reflects a decade of advancements since the first edition and includes, among other more modest changes, two new chapters: one on consistency and coherence for non-CPU accelerators (with a focus on GPUs) and one that points to formal work and tools on consistency and coherence.

Preface to the Second Edition xvii
Preface to the First Edition xix
1 Introduction to Consistency and Coherence
1(8)
1.1 Consistency (a.k.a., Memory Consistency, Memory Consistency Model, or Memory Model)
2(2)
1.2 Coherence (a.k.a., Cache Coherence)
4(2)
1.3 Consistency and Coherence for Heterogeneous Systems
6(1)
1.4 Specifying and Validating Memory Consistency Models and Cache Coherence
6(1)
1.5 A Consistency and Coherence Quiz
6(1)
1.6 What This Primer Does Not Do
7(1)
1.7 References
8(1)
2 Coherence Basics
9(8)
2.1 Baseline System Model
9(1)
2.2 The Problem: How Incoherence Could Possibly Occur
10(1)
2.3 The Cache Coherence Interface
11(1)
2.4 (Consistency-Agnostic) Coherence Invariants
12(4)
2.4.1 Maintaining the Coherence Invariants
14(1)
2.4.2 The Granularity of Coherence
14(1)
2.4.3 When is Coherence Relevant?
15(1)
2.5 References
16(1)
3 Memory Consistency Motivation and Sequential Consistency
17(22)
3.1 Problems with Shared Memory Behavior
17(3)
3.2 What is a Memory Consistency Model?
20(1)
3.3 Consistency vs. Coherence
21(1)
3.4 Basic Idea of Sequential Consistency (SC)
22(1)
3.5 A Little SC Formalism
23(3)
3.6 Naive SC Implementations
26(1)
3.7 A Basic SC Implementation with Cache Coherence
27(2)
3.8 Optimized SC Implementations with Cache Coherence
29(4)
3.9 Atomic Operations with SC
33(1)
3.10 Putting it All Together: MIPS R10000
34(1)
3.11 Further Reading Regarding SC
35(1)
3.12 References
36(3)
4 Total Store Order and the x86 Memory Model
39(16)
4.1 Motivation for TSO/x86
39(1)
4.2 Basic Idea of TSO/x86
40(2)
4.3 A Little TSO/x86 Formalism
42(5)
4.4 Implementing TSO/x86
47(3)
4.4.1 Implementing Atomic Instructions
47(1)
4.4.2 Implementing Fences
48(2)
4.5 Further Reading Regarding TSO
50(1)
4.6 Comparing SC and TSO
50(2)
4.7 References
52(3)
5 Relaxed Memory Consistency
55(36)
5.1 Motivation
55(3)
5.1.1 Opportunities to Reorder Memory Operations
56(1)
5.1.2 Opportunities to Exploit Reordering
57(1)
5.2 An Example Relaxed Consistency Model (XC)
58(7)
5.2.1 The Basic Idea of the XC Model
59(1)
5.2.2 Examples Using Fences Under XC
60(1)
5.2.3 Formalizing XC
60(2)
5.2.4 Examples Showing XC Operating Correctly
62(3)
5.3 Implementing XC
65(3)
5.3.1 Atomic Instructions with XC
66(1)
5.3.2 Fences With XC
67(1)
5.3.3 A Caveat
68(1)
5.4 Sequential Consistency for Data-Race-Free Programs
68(4)
5.5 Some Relaxed Model Concepts
72(3)
5.5.1 Release Consistency
72(1)
5.5.2 Causality and Write Atomicity
73(2)
5.6 Relaxed Memory Model Case Studies
75(6)
5.6.1 RISC-V Weak Memory Order (RVWMO)
75(3)
5.6.2 IBM Power
78(3)
5.7 Further Reading and Commercial Relaxed Memory Models
81(1)
5.7.1 Academic Literature
81(1)
5.7.2 Commercial Models
81(1)
5.8 Comparing Memory Models
82(1)
5.8.1 How Do Relaxed Memory Models Relate to Each Other and TSO and SC?
82(1)
5.8.2 How Good Are Relaxed Models?
82(1)
5.9 High-Level Language Models
83(3)
5.10 References
86(5)
6 Coherence Protocols
91(16)
6.1 The Big Picture
91(2)
6.2 Specifying Coherence Protocols
93(1)
6.3 Example of a Simple Coherence Protocol
94(2)
6.4 Overview of Coherence Protocol Design Space
96(9)
6.4.1 States
97(4)
6.4.2 Transactions
101(2)
6.4.3 Major Protocol Design Options
103(2)
6.5 References
105(2)
7 Snooping Coherence Protocols
107(44)
7.1 Introduction to Snooping
107(4)
7.2 Baseline Snooping Protocol
111(12)
7.2.1 High-Level Protocol Specification
112(1)
7.2.2 Simple Snooping System Model: Atomic Requests, Atomic Transactions
113(4)
7.2.3 Baseline Snooping System Model: Non-Atomic Requests, Atomic Transactions
117(4)
7.2.4 Running Example
121(1)
7.2.5 Protocol Simplifications
122(1)
7.3 Adding the Exclusive State
123(3)
7.3.1 Motivation
123(1)
7.3.2 Getting to the Exclusive State
123(1)
7.3.3 High-Level Specification of Protocol
124(2)
7.3.4 Detailed Specification
126(1)
7.3.5 Running Example
126(1)
7.4 Adding the Owned State
126(6)
7.4.1 Motivation
128(1)
7.4.2 High-Level Protocol Specification
129(1)
7.4.3 Detailed Protocol Specification
130(2)
7.4.4 Running Example
132(1)
7.5 Non-Atomic Bus s
132(10)
7.5.1 Motivation
132(1)
7.5.2 In-Order vs. Out-of-Order Responses
133(1)
7.5.3 Non-Atomic System Model
134(1)
7.5.4 An MSI Protocol with a SpUt-Transaction Bus
135(5)
7.5.5 An Optimized, Non-Stalling MSI Protocol with a Split-Transaction Bus
140(2)
7.6 Optimizations to the Bus Interconnection Network
142(2)
7.6.1 Separate Non-Bus Network for Data Responses
142(1)
7.6.2 Logical Bus for Coherence Requests
143(1)
7.7 Case Studies
144(4)
7.7.1 Sun Starfire E10000
144(1)
7.7.2 IBM Power5
145(3)
7.8 Discussion and the Future of Snooping
148(1)
7.9 References
148(3)
8 Directory Coherence Protocols
151(40)
8.1 Introduction to Directory Protocols
151(2)
8.2 Baseline Directory System
153(9)
8.2.1 Directory System Model
153(1)
8.2.2 High-Level Protocol Specification
153(2)
8.2.3 Avoiding Deadlock
155(3)
8.2.4 Detailed Protocol Specification
158(1)
8.2.5 Protocol Operation
159(2)
8.2.6 Protocol Simplifications
161(1)
8.3 Adding the Exclusive State
162(2)
8.3.1 High-Level Protocol Specification
162(2)
8.3.2 Detailed Protocol Specification
164(1)
8.4 Adding the Owned State
164(4)
8.4.1 High-Level Protocol Specification
164(4)
8.4.2 Detailed Protocol Specification
168(1)
8.5 Representing Directory State
168(4)
8.5.1 Coarse Directory
168(3)
8.5.2 Limited Pointer Directory
171(1)
8.6 Directory Organization
172(5)
8.6.1 Directory Cache Backed by DRAM
173(1)
8.6.2 Inclusive Directory Caches
174(2)
8.6.3 Null Directory Cache (With no Backing Store)
176(1)
8.7 Performance and Scalability Optimizations
177(6)
8.7.1 Distributed Directories
177(1)
8.7.2 Non-Stalling Directory Protocols
178(2)
8.7.3 Interconnection Networks Without Point-to-Point Ordering
180(2)
8.7.4 Silent vs. Non-Silent Evictions of Blocks in State S
182(1)
8.8 Case Studies
183(6)
8.8.1 SGI Origin 2000
183(2)
8.8.2 Coherent HyperTransport
185(2)
8.8.3 Hypertransport Assist
187(1)
8.8.4 Intel QPI
187(2)
8.9 Discussion and the Future of Directory Protocols
189(1)
8.10 References
189(2)
9 Advanced Topics in Coherence
191(20)
9.1 System Models
191(7)
9.1.1 Instruction Caches
191(1)
9.1.2 Translation Lookaside Buffers (TLBs)
192(1)
9.1.3 Virtual Caches
192(1)
9.1.4 Write-Through Caches
193(1)
9.1.5 Coherent Direct Memory Access (DMA)
194(1)
9.1.6 Multi-Level Caches and Hierarchical Coherence Protocols
195(3)
9.2 Performance Optimizations
198(2)
9.2.1 Migratory Sharing Optimization
198(1)
9.2.2 False Sharing Optimizations
199(1)
9.3 Maintaining Liveness
200(7)
9.3.1 Deadlock
200(3)
9.3.2 Livelock
203(3)
9.3.3 Starvation
206(1)
9.4 Token Coherence
207(1)
9.5 The Future of Coherence
207(1)
9.6 References
207(4)
10 Consistency and Coherence for Heterogeneous Systems
211(40)
10.1 GPU Consistency and Coherence
211(26)
10.1.1 Early GPUs: Architecture and Programming Model
212(4)
10.1.2 Big Picture: GPGPU Consistency and Coherence
216(1)
10.1.3 Temporal Coherence
217(9)
10.1.4 Release Consistency-directed Coherence
226(11)
10.2 More Heterogeneity Than Just GPUs
237(10)
10.2.1 Heterogeneous Consistency Models
237(3)
10.2.2 Heterogeneous Coherence Protocols
240(7)
10.3 Further Reading
247(1)
10.4 References
247(4)
11 Specifying and Validating Memory Consistency Models and Cache Coherence
251(22)
11.1 Specification
251(9)
11.1.1 Operational Specification
252(4)
11.1.2 Axiomatic Specification
256(4)
11.2 Exploring the Behavior of Memory Consistency Models
260(2)
11.2.1 Litmus Tests
260(1)
11.2.2 Exploration
261(1)
11.3 Validating Implementations
262(5)
11.3.1 Formal Methods
262(3)
11.3.2 Testing
265(2)
11.4 History and Further Reading
267(1)
11.5 References
267(6)
Authors' Biographies 273