Series Foreword   xv
Foreword   xvii
Preface   xix
1 Introduction   1
1.1 MPI-1 and MPI-2   1
1.2 MPI-3   2
1.3 Parallelism and MPI   3
1.3.1 Conway's Game of Life   4
1.3.2 Welcome to the New MPI   5
1.4 Passing Hints to the MPI Implementation with MPI_Info   11
1.4.1 Motivation, Description, and Rationale   12
1.4.2 An Example from Parallel I/O   12
1.5 Organization of This Book   13
2 Working with Large-Scale Systems   15
2.1 Nonblocking Collectives   16
2.1.1 Example: 2-D FFT   16
2.1.2 Example: Five-Point Stencil   19
2.1.3 Matching, Completion, and Progression   20
2.1.4 Restrictions   22
2.1.5 Collective Software Pipelining   23
2.1.6 A Nonblocking Barrier?   27
2.1.7 Nonblocking Allreduce and Krylov Methods   30
2.2 Distributed Graph Topologies   31
2.2.1 Example: The Petersen Graph   37
2.2.2 Edge Weights   37
2.2.3 Graph Topology Info Argument   39
2.2.4 Process Reordering   39
2.3 Collective Operations on Process Topologies   40
2.3.1 Neighborhood Collectives   41
2.3.2 Vector Neighborhood Collectives   44
2.3.3 Nonblocking Neighborhood Collectives   45
2.4 Advanced Communicator Creation   48
2.4.1 Nonblocking Communicator Duplication   48
2.4.2 Noncollective Communicator Creation   50
3 Introduction to Remote Memory Operations   55
3.1 Introduction   57
3.2 Contrast with Message Passing   59
3.3 Memory Windows   62
3.3.1 Hints on Choosing Window Parameters   64
3.3.2 Relationship to Other Approaches   65
3.4 Moving Data   65
3.4.1 Reasons for Using Displacement Units   69
3.4.2 Cautions in Using Displacement Units   70
3.4.3 Displacement Sizes in Fortran   71
3.5 Completing RMA Data Transfers   71
3.6 Examples of RMA Operations   73
3.6.1 Mesh Ghost Cell Communication   74
3.6.2 Combining Communication and Computation   84
3.7 Pitfalls in Accessing Memory   88
3.7.1 Atomicity of Memory Operations   89
3.7.2 Memory Coherency   90
3.7.3 Some Simple Rules for RMA   91
3.7.4 Overlapping Windows   93
3.7.5 Compiler Optimizations   93
3.8 Performance Tuning for RMA Operations   95
3.8.1 Options for MPI_Win_create   95
3.8.2 Options for MPI_Win_fence   97
4 Advanced Remote Memory Access   101
4.1 Passive Target Synchronization   101
4.2 Implementing Blocking, Independent RMA Operations   102
4.3 Allocating Memory for MPI Windows   104
4.3.1 Using MPI_Alloc_mem and MPI_Win_allocate from C   104
4.3.2 Using MPI_Alloc_mem and MPI_Win_allocate from Fortran 2008   105
4.3.3 Using MPI_ALLOC_MEM and MPI_WIN_ALLOCATE from Older Fortran   107
4.4 Another Version of NXTVAL   108
4.4.1 The Nonblocking Lock   110
4.4.2 NXTVAL with MPI_Fetch_and_op   110
4.4.3 Window Attributes   112
4.5 An RMA Mutex   115
4.6 Global Arrays   120
4.6.1 Create and Free   122
4.6.2 Put and Get   124
4.6.3 Accumulate   127
4.6.4 The Rest of Global Arrays   128
4.7 A Better Mutex   130
4.8 Managing a Distributed Data Structure   131
4.8.1 A Shared-Memory Distributed List Implementation   132
4.8.2 An MPI Implementation of a Distributed List   135
4.8.3 Inserting into a Distributed List   140
4.8.4 An MPI Implementation of a Dynamic Distributed List   143
4.8.5 Comments on More Concurrent List Implementations   145
4.9 Compiler Optimization and Passive Targets   148
4.10 MPI RMA Memory Models   149
4.11 Scalable Synchronization   152
4.11.1 Exposure and Access Epochs   152
4.11.2 The Ghost-Point Exchange Revisited   153
4.11.3 Performance Optimizations for Scalable Synchronization   155
4.12 Summary   156
5 Using Shared Memory with MPI   157
5.1 Using MPI Shared Memory   159
5.1.1 Shared On-Node Data Structures   159
5.1.2 Communication through Shared Memory   160
5.1.3 Reducing the Number of Subdomains   163
5.2 Allocating Shared Memory   163
5.3 Address Calculation   165
6 Hybrid Programming   169
6.1 Background   169
6.2 Thread Basics and Issues   170
6.2.1 Thread Safety   171
6.2.2 Performance Issues with Threads   172
6.2.3 Threads and Processes   173
6.3 MPI and Threads   173
6.4 Yet Another Version of NXTVAL   176
6.5 Nonblocking Version of MPI_Comm_accept   178
6.6 Hybrid Programming with MPI   179
6.7 MPI Message and Thread-Safe Probe   182
7 Parallel I/O   187
7.1 Introduction   187
7.2 Using MPI for Simple I/O   187
7.2.1 Using Individual File Pointers   187
7.2.2 Using Explicit Offsets   191
7.2.3 Writing to a File   194
7.3 Noncontiguous Accesses and Collective I/O   195
7.3.1 Noncontiguous Accesses   195
7.3.2 Collective I/O   199
7.4 Accessing Arrays Stored in Files   203
7.4.1 Distributed Arrays   204
7.4.2 A Word of Warning about Darray   206
7.4.3 Subarray Datatype Constructor   207
7.4.4 Local Array with Ghost Area   210
7.4.5 Irregularly Distributed Arrays   211
7.5 Nonblocking I/O and Split Collective I/O   215
7.6 Shared File Pointers   216
7.7 Passing Hints to the Implementation   219
7.8 Consistency Semantics   221
7.8.1 Simple Cases   224
7.8.2 Accessing a Common File Opened with MPI_COMM_WORLD   224
7.8.3 Accessing a Common File Opened with MPI_COMM_SELF   227
7.8.4 General Recommendation   228
7.9 File Interoperability   229
7.9.1 File Structure   229
7.9.2 File Data Representation   230
7.9.3 Use of Datatypes for Portability   231
7.9.4 User-Defined Data Representations   233
7.10 Achieving High I/O Performance with MPI   234
7.10.1 The Four "Levels" of Access   234
7.10.2 Performance Results   237
7.11 An Example Application   238
7.12 Summary   242
8 Coping with Large Data   243
8.1 MPI Support for Large Data   243
8.2 Using Derived Datatypes   243
8.3 Example   244
8.4 Limitations of This Approach   245
8.4.1 Collective Reduction Functions   245
8.4.2 Irregular Collectives   246
9 Support for Performance and Correctness Debugging   249
9.1 The Tools Interface   250
9.1.1 Control Variables   251
9.1.2 Performance Variables   257
9.2 Info, Assertions, and MPI Objects   263
9.3 Debugging and the MPIR Debugger Interface   267
9.4 Summary   269
10 Dynamic Process Management   271
10.1 Intercommunicators   271
10.2 Creating New MPI Processes   271
10.2.1 Parallel cp: A Simple System Utility   272
10.2.2 Matrix-Vector Multiplication Example   279
10.2.3 Intercommunicator Collective Operations   284
10.2.4 Intercommunicator Point-to-Point Communication   285
10.2.5 Finding the Number of Available Processes   285
10.2.6 Passing Command-Line Arguments to Spawned Programs   290
10.3 Connecting MPI Processes   291
10.3.1 Visualizing the Computation in an MPI Program   292
10.3.2 Accepting Connections from Other Programs   294
10.3.3 Comparison with Sockets   296
10.3.4 Moving Data between Groups of Processes   298
10.3.5 Name Publishing   299
10.4 Design of the MPI Dynamic Process Routines   302
10.4.1 Goals for MPI Dynamic Process Management   302
10.4.2 What MPI Did Not Standardize   303
11 Working with Modern Fortran   305
11.1 The mpi_f08 Module   305
11.2 Problems with the Fortran Interface   306
11.2.1 Choice Parameters in Fortran   307
11.2.2 Nonblocking Routines in Fortran   308
11.2.3 Array Sections   310
11.2.4 Trouble with LOGICAL   311
12 Features for Libraries   313
12.1 External Interface Functions   313
12.1.1 Decoding Datatypes   313
12.1.2 Generalized Requests   315
12.1.3 Adding New Error Codes and Classes   322
12.2 Mixed-Language Programming   324
12.3 Attribute Caching   327
12.4 Using Reduction Operations Locally   331
12.5 Error Handling   333
12.5.1 Error Handlers   333
12.5.2 Error Codes and Classes   335
12.6 Topics Not Covered in This Book   335
13 Conclusions   341
13.1 MPI Implementation Status   341
13.2 Future Versions of the MPI Standard   341
13.3 MPI at Exascale   342
A MPI Resources on the World Wide Web   343
References   345
Subject Index   353
Function and Term Index   359