OpenCL Programming Guide [Paperback]

4.00/5 (39 ratings by Goodreads)
  • Format: Paperback / softback, 648 pages, height x width x depth: 230x180x32 mm, weight: 1010 g
  • Series: OpenGL
  • Publication date: 28-Jul-2011
  • Publisher: Addison-Wesley Educational Publishers Inc
  • ISBN-10: 0321749642
  • ISBN-13: 9780321749642

Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects.

Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and the OpenCL C programming language.

Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware. Coverage includes:

  • Understanding OpenCL’s architecture, concepts, terminology, goals, and rationale
  • Programming with OpenCL C and the runtime API
  • Using buffers, sub-buffers, images, samplers, and events
  • Sharing and synchronizing data with OpenGL and Microsoft’s Direct3D
  • Simplifying development with the C++ Wrapper API
  • Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
  • Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more
  • Source code for this book is available at https://code.google.com/p/opencl-book-samples/

Reviews

“Welcome to the new world of heterogeneous parallel programming with this authoritative and accessible guide to the complete OpenCL Programming Model.”
Professor Pat Hanrahan, Stanford University

Table of Contents

Figures
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors
Part I The OpenCL 1.1 Language and API
1 An Introduction to OpenCL
  What Is OpenCL, or ... Why You Need This Book
  Our Many-Core Future: Heterogeneous Platforms
  Software in a Many-Core World
  Conceptual Foundations of OpenCL
    Platform Model
    Execution Model
    Memory Model
    Programming Models
  OpenCL and Graphics
  The Contents of OpenCL
    Platform API
    Runtime API
    Kernel Programming Language
    OpenCL Summary
  The Embedded Profile
  Learning OpenCL
2 HelloWorld: An OpenCL Example
  Building the Examples
    Prerequisites
    Mac OS X and Code::Blocks
    Microsoft Windows and Visual Studio
    Linux and Eclipse
  HelloWorld Example
    Choosing an OpenCL Platform and Creating a Context
    Choosing a Device and Creating a Command-Queue
    Creating and Building a Program Object
    Creating Kernel and Memory Objects
    Executing a Kernel
  Checking for Errors in OpenCL
3 Platforms, Contexts, and Devices
  OpenCL Platforms
  OpenCL Devices
  OpenCL Contexts
4 Programming with OpenCL C
  Writing a Data-Parallel Kernel Using OpenCL C
  Scalar Data Types
    The half Data Type
  Vector Data Types
    Vector Literals
    Vector Components
  Other Data Types
  Derived Types
  Implicit Type Conversions
    Usual Arithmetic Conversions
  Explicit Casts
  Explicit Conversions
  Reinterpreting Data as Another Type
  Vector Operators
    Arithmetic Operators
    Relational and Equality Operators
    Bitwise Operators
    Logical Operators
    Conditional Operator
    Shift Operators
    Unary Operators
    Assignment Operator
  Qualifiers
    Function Qualifiers
    Kernel Attribute Qualifiers
    Address Space Qualifiers
    Access Qualifiers
    Type Qualifiers
  Keywords
  Preprocessor Directives and Macros
    Pragma Directives
    Macros
  Restrictions
5 OpenCL C Built-in Functions
  Work-Item Functions
  Math Functions
    Floating-Point Pragmas
    Floating-Point Constants
    Relative Error as ulps
  Integer Functions
  Common Functions
  Geometric Functions
  Relational Functions
  Vector Data Load and Store Functions
  Synchronization Functions
  Async Copy and Prefetch Functions
  Atomic Functions
  Miscellaneous Vector Functions
  Image Read and Write Functions
    Reading from an Image
    Samplers
    Determining the Border Color
    Writing to an Image
    Querying Image Information
6 Programs and Kernels
  Program and Kernel Object Overview
  Program Objects
    Creating and Building Programs
    Program Build Options
    Creating Programs from Binaries
    Managing and Querying Programs
  Kernel Objects
    Creating Kernel Objects and Setting Kernel Arguments
    Thread Safety
    Managing and Querying Kernels
7 Buffers and Sub-Buffers
  Memory Objects, Buffers, and Sub-Buffers Overview
  Creating Buffers and Sub-Buffers
  Querying Buffers and Sub-Buffers
  Reading, Writing, and Copying Buffers and Sub-Buffers
  Mapping Buffers and Sub-Buffers
8 Images and Samplers
  Image and Sampler Object Overview
  Creating Image Objects
    Image Formats
    Querying for Image Support
  Creating Sampler Objects
  OpenCL C Functions for Working with Images
  Transferring Image Objects
9 Events
  Commands, Queues, and Events Overview
  Events and Command-Queues
  Event Objects
  Generating Events on the Host
  Events Impacting Execution on the Host
  Using Events for Profiling
  Events Inside Kernels
  Events from Outside OpenCL
10 Interoperability with OpenGL
  OpenCL/OpenGL Sharing Overview
  Querying for the OpenGL Sharing Extension
  Initializing an OpenCL Context for OpenGL Interoperability
  Creating OpenCL Buffers from OpenGL Buffers
  Creating OpenCL Image Objects from OpenGL Textures
  Querying Information about OpenGL Objects
  Synchronization between OpenGL and OpenCL
11 Interoperability with Direct3D
  Direct3D/OpenCL Sharing Overview
  Initializing an OpenCL Context for Direct3D Interoperability
  Creating OpenCL Memory Objects from Direct3D Buffers and Textures
  Acquiring and Releasing Direct3D Objects in OpenCL
  Processing a Direct3D Texture in OpenCL
  Processing D3D Vertex Data in OpenCL
12 C++ Wrapper API
  C++ Wrapper API Overview
  C++ Wrapper API Exceptions
  Vector Add Example Using the C++ Wrapper API
    Choosing an OpenCL Platform and Creating a Context
    Choosing a Device and Creating a Command-Queue
    Creating and Building a Program Object
    Creating Kernel and Memory Objects
    Executing the Vector Add Kernel
13 OpenCL Embedded Profile
  OpenCL Profile Overview
  64-Bit Integers
  Images
  Built-in Atomic Functions
  Mandated Minimum Single-Precision Floating-Point Capabilities
  Determining the Profile Supported by a Device in an OpenCL C Program
Part II OpenCL 1.1 Case Studies
14 Image Histogram
  Computing an Image Histogram
  Parallelizing the Image Histogram
  Additional Optimizations to the Parallel Image Histogram
  Computing Histograms with Half-Float or Float Values for Each Channel
15 Sobel Edge Detection Filter
  What Is a Sobel Edge Detection Filter?
  Implementing the Sobel Filter as an OpenCL Kernel
16 Parallelizing Dijkstra's Single-Source Shortest-Path Graph Algorithm
  Graph Data Structures
  Kernels
  Leveraging Multiple Compute Devices
17 Cloth Simulation in the Bullet Physics SDK
  An Introduction to Cloth Simulation
  Simulating the Soft Body
  Executing the Simulation on the CPU
  Changes Necessary for Basic GPU Execution
  Two-Layered Batching
  Optimizing for SIMD Computation and Local Memory
  Adding OpenGL Interoperation
18 Simulating the Ocean with Fast Fourier Transform
  An Overview of the Ocean Application
  Phillips Spectrum Generation
  An OpenCL Discrete Fourier Transform
    Determining 2D Decomposition
    Using Local Memory
    Determining the Sub-Transform Size
    Determining the Work-Group Size
    Obtaining the Twiddle Factors
    Determining How Much Local Memory Is Needed
    Avoiding Local Memory Bank Conflicts
    Using Images
  A Closer Look at the FFT Kernel
  A Closer Look at the Transpose Kernel
19 Optical Flow
  Optical Flow Problem Overview
  Sub-Pixel Accuracy with Hardware Linear Interpolation
  Application of the Texture Cache
  Using Local Memory
  Early Exit and Hardware Scheduling
  Efficient Visualization with OpenGL Interop
  Performance
20 Using OpenCL with PyOpenCL
  Introducing PyOpenCL
  Running the PyImageFilter2D Example
  PyImageFilter2D Code
  Context and Command-Queue Creation
  Loading to an Image Object
  Creating and Building a Program
  Setting Kernel Arguments and Executing a Kernel
  Reading the Results
21 Matrix Multiplication with OpenCL
  The Basic Matrix Multiplication Algorithm
  A Direct Translation into OpenCL
  Increasing the Amount of Work per Kernel
  Optimizing Memory Movement: Local Memory
  Performance Results and Optimizing the Original CPU Code
22 Sparse Matrix-Vector Multiplication
  Sparse Matrix-Vector Multiplication (SpMV) Algorithm
  Description of This Implementation
  Tiled and Packetized Sparse Matrix Representation
  Header Structure
  Tiled and Packetized Sparse Matrix Design Considerations
  Optional Team Information
  Tested Hardware Devices and Results
  Additional Areas of Optimization
A Summary of OpenCL 1.1
  The OpenCL Platform Layer
    Contexts
    Querying Platform Information and Devices
  The OpenCL Runtime
    Command-Queues
  Buffer Objects
    Create Buffer Objects
    Read, Write, and Copy Buffer Objects
    Map Buffer Objects
    Manage Buffer Objects
    Query Buffer Objects
  Program Objects
    Create Program Objects
    Build Program Executable
    Build Options
    Query Program Objects
    Unload the OpenCL Compiler
  Kernel and Event Objects
    Create Kernel Objects
    Kernel Arguments and Object Queries
    Execute Kernels
    Event Objects
    Out-of-Order Execution of Kernels and Memory Object Commands
    Profiling Operations
    Flush and Finish
  Supported Data Types
    Built-in Scalar Data Types
    Built-in Vector Data Types
    Other Built-in Data Types
    Reserved Data Types
  Vector Component Addressing
    Vector Components
    Vector Addressing Equivalencies
    Conversions and Type Casting Examples
    Operators
    Address Space Qualifiers
    Function Qualifiers
  Preprocessor Directives and Macros
  Specify Type Attributes
  Math Constants
  Work-Item Built-in Functions
  Integer Built-in Functions
  Common Built-in Functions
  Math Built-in Functions
  Geometric Built-in Functions
  Relational Built-in Functions
  Vector Data Load/Store Functions
  Atomic Functions
  Async Copies and Prefetch Functions
  Synchronization, Explicit Memory Fence
  Miscellaneous Vector Built-in Functions
  Image Read and Write Built-in Functions
  Image Objects
    Create Image Objects
    Query List of Supported Image Formats
    Copy between Image, Buffer Objects
    Map and Unmap Image Objects
    Read, Write, Copy Image Objects
    Query Image Objects
  Image Formats
  Access Qualifiers
  Sampler Objects
    Sampler Declaration Fields
  OpenCL Device Architecture Diagram
  OpenCL/OpenGL Sharing APIs
    CL Buffer Objects > GL Buffer Objects
    CL Image Objects > GL Textures
    CL Image Objects > GL Renderbuffers
    Query Information
    Share Objects
    CL Event Objects > GL Sync Objects
    CL Context > GL Context, Sharegroup
  OpenCL/Direct3D 10 Sharing APIs
Index

About the Authors

Aaftab Munshi is the spec editor for the OpenGL ES 1.1, OpenGL ES 2.0, and OpenCL specifications and coauthor of the book OpenGL ES 2.0 Programming Guide (with Dan Ginsburg and Dave Shreiner, published by Addison-Wesley, 2008). He currently works at Apple.  

Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, in particular looking at high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. Benedict has contributed extensively to OpenCL’s design and has represented AMD at the Khronos Group open standard consortium. Benedict has a Ph.D. in computer science for his work on type systems for extensible records and variants. He has been working at AMD since 2008.

Timothy G. Mattson is an old-fashioned parallel programmer, having started in the mid-eighties with the Caltech Cosmic Cube and continuing to the present. Along the way, he has worked with most classes of parallel computers (vector supercomputers, SMP, VLIW, NUMA, MPP, clusters, and many-core processors). Tim has published extensively, including the books Patterns for Parallel Programming (with Beverly Sanders and Berna Massingill, published by Addison-Wesley, 2004) and An Introduction to Concurrency in Programming Languages (with Matthew J. Sottile and Craig E. Rasmussen, published by CRC Press, 2009). Tim has a Ph.D. in chemistry for his work on molecular scattering theory. He has been working at Intel since 1993.

James Fung has been developing computer vision on the GPU as it progressed from graphics to general-purpose computation. James has a Ph.D. in electrical and computer engineering from the University of Toronto and numerous IEEE and ACM publications in the areas of parallel GPU computer vision and mediated reality. He is currently a Developer Technology Engineer at NVIDIA, where he examines computer vision and image processing on graphics hardware.

Dan Ginsburg currently works at Children’s Hospital Boston as a Principal Software Architect in the Fetal-Neonatal Neuroimaging and Development Science Center, where he uses OpenCL to accelerate neuroimaging algorithms. Previously, he worked for Still River Systems developing GPU-accelerated image registration software for the Monarch 250 proton beam radiotherapy system. Dan was also a Senior Member of Technical Staff at AMD, where he worked for over eight years in a variety of roles, including developing OpenGL drivers, creating desktop and handheld 3D demos, and leading the development of handheld GPU developer tools. Dan holds a B.S. in computer science from Worcester Polytechnic Institute and an M.B.A. from Bentley University.