Klientų aptarnavimas: +370 652 87781

Pagalba | Naujas vartotojas | Prisijungti

El. knyga: Handbook of Floating-Point Arithmetic

Florent de Dinechin, Serge Torres, Nicolas Brunie, Guillaume Melquiond, Claude-Pierre Jeannerod, Vincent Lefčvre, Mioara Joldes, Nathalie Revol, Jean-Michel Muller

Formatas: PDF+DRM
Išleidimo metai: 02-May-2018
Leidėjas: Birkhauser Verlag AG
Kalba: eng
ISBN-13: 9783319765266

Kitos knygos pagal šią temą:

Formatas - PDF+DRM
Kaina: 160,54 €*
* ši kaina yra galutinė, t.y. papildomos nuolaidos nebus taikomos
Įdėti į krepšelį
Įtraukti į pageidavimų sąrašą
Ši e-knyga skirta tik asmeniniam naudojimui. El. knygos nėra grąžinamos.

Formatas: PDF+DRM
Išleidimo metai: 02-May-2018
Leidėjas: Birkhauser Verlag AG
Kalba: eng
ISBN-13: 9783319765266

Kitos knygos pagal šią temą:

DRM apribojimai

Kopijuoti:

neleidžiama
Spausdinti:

neleidžiama
El. knygos naudojimas:

Skaitmeninių teisių valdymas (DRM)
Leidykla pateikė šią knygą šifruota forma, o tai reiškia, kad norint ją atrakinti ir perskaityti reikia įdiegti nemokamą programinę įrangą. Norint skaityti šią el. knygą, turite susikurti Adobe ID . Daugiau informacijos čia. El. knygą galima atsisiųsti į 6 įrenginius (vienas vartotojas su tuo pačiu Adobe ID).

Reikalinga programinė įranga
Norint skaityti šią el. knygą mobiliajame įrenginyje (telefone ar planšetiniame kompiuteryje), turite įdiegti šią nemokamą programėlę: PocketBook Reader (iOS / Android)

Norint skaityti šią el. knygą asmeniniame arba „Mac“ kompiuteryje, Jums reikalinga Adobe Digital Editions “ (tai nemokama programa, specialiai sukurta el. knygoms. Tai nėra tas pats, kas „Adobe Reader“, kurią tikriausiai jau turite savo kompiuteryje.)

Negalite skaityti šios el. knygos naudodami „Amazon Kindle“.

This handbook is a definitive guide to the effective use of modern floating-point arithmetic, which has considerably evolved, from the frequently inconsistent floating-point number systems of early computing to the recent IEEE 754-2008 standard. Most of computational mathematics depends on floating-point numbers, and understanding their various implementations will allow readers to develop programs specifically tailored for the standards technical features. Algorithms for floating-point arithmetic are presented throughout the book and illustrated where possible by example programs which show how these techniques appear in actual coding and design. The volume itself breaks its core topic into four parts: the basic concepts and history of floating-point arithmetic; methods of analyzing floating-point algorithms and optimizing them; implementations of IEEE 754-2008 in hardware and software; and useful extensions to the standard floating-point system, such as interval arithmetic, double- and triple-word arithmetic, operations on complex numbers, and formal verification of floating-point algorithms. This new edition updates chapters to reflect recent changes to programming languages and compilers and the new prevalence of GPUs in recent years. The revisions also add material on fused multiply-add instruction, and methods of extending the floating-point precision. As supercomputing becomes more common, more numerical engineers will need to use number representation to account for trade-offs between various parameters, such as speed, accuracy, and energy consumption. The Handbook of Floating-Point Arithmetic is designed for students and researchers in numerical analysis, programmers of numerical algorithms, compiler designers, and designers of arithmetic operators.

Recenzijos

The new edition of this book updates chapters to reflect recent changes to programming languages and compilers and the new prevalence of Graphic Processing Units in recent years. In the Appendix, the reader will find an introduction to relevant number theory tools . This book is designed for programmers of numerical applications and more generally students and researchers in numerical analysis who wish to more accurately understand a tool that they manipulate on an everyday basis. (T. C. Mohan, zbMATH 1394.65001, 2018)

List of Figures

List of Tables

xix

Preface

xxiii

Part I Introduction, Basic Definitions, and Standards

(94)

1 Introduction

(12)

1.1 Some History

(3)

1.2 Desirable Properties

(1)

1.3 Some Strange Behaviors

(7)

1.3.1 Some famous bugs

(1)

1.3.2 Difficult problems

(6)

2 Definitions and Basic Notions

(32)

2.1 Floating-Point Numbers

(7)

2.1.1 Main definitions

(2)

2.1.2 Normalized representations, normal and subnormal numbers

(2)

2.1.3 A note on underflow

(2)

2.1.4 Special floating-point data

(1)

2.2 Rounding

(3)

2.2.1 Rounding functions

(2)

2.2.2 Useful properties

(1)

2.3 Tools for Manipulating Floating-Point Errors

(12)

2.3.1 Relative error due to rounding

(4)

2.3.2 The ulp function

(5)

2.3.3 Link between errors in ulps and relative errors

(1)

2.3.4 An example: iterated products

(2)

2.4 The Fused Multiply-Add (FMA) Instruction

(1)

2.5 Exceptions

(3)

2.6 Lost and Preserved Properties of Real Arithmetic

(1)

2.7 Note on the Choice of the Radix

(3)

2.7.1 Representation errors

(2)

2.7.2 A case for radix 10

(1)

2.8 Reproducibility

(3)

3 Floating-Point Formats and Environment

(48)

3.1 The IEEE 754-2008 Standard

(31)

3.1.1 Formats

(18)

3.1.2 Attributes and rounding

(4)

3.1.3 Operations specified by the standard

(2)

3.1.4 Comparisons

(1)

3.1.5 Conversions to/from string representations

(1)

3.1.6 Default exception handling

(3)

3.1.7 Special values

(2)

3.1.8 Recommended functions

(1)

3.2 On the Possible Hidden Use of a Higher Internal Precision

(3)

3.3 Revision of the IEEE 754-2008 Standard

(1)

3.4 Floating-Point Hardware in Current Processors

(6)

3.4.1 The common hardware denominator

(1)

3.4.2 Fused multiply-add

(1)

3.4.3 Extended precision and 128-bit formats

(1)

3.4.4 Rounding and precision control

(1)

3.4.5 SIMD instructions

(1)

3.4.6 Binaryl6 (half-precision) support

(1)

3.4.7 Decimal arithmetic

(1)

3.4.8 The legacy x87 processor

(1)

3.5 Floating-Point Hardware in Recent Graphics Processing Units

(1)

3.6 IEEE Support in Programming Languages

(1)

3.7 Checking the Environment

(4)

3.7.1 MACHAR

(1)

3.7.2 Paranoia

(1)

3.7.3 UCBTest

(1)

3.7.4 TestFloat

(1)

3.7.5 Miscellaneous

(2)

Part II Cleverly Using Floating-Point Arithmetic

(136)

4 Basic Properties and Algorithms

(66)

4.1 Testing the Computational Environment

(3)

4.1.1 Computing the radix

(2)

4.1.2 Computing the precision

(1)

4.2 Exact Operations

100

(3)

4.2.1 Exact addition

100

(3)

4.2.2 Exact multiplications and divisions

103

(1)

4.3 Accurate Computation of the Sum of Two Numbers

103

(8)

4.3.1 The Fast2Sum algorithm

104

(3)

4.3.2 The 2Sum algorithm

107

(2)

4.3.3 If we do not use rounding to nearest

109

(2)

4.4 Accurate Computation of the Product of Two Numbers

111

(9)

4.4.1 The 2MultFMA Algorithm

112

(1)

4.4.2 If no FMA instruction is available: Veltkamp splitting and Dekker product

113

(7)

4.5 Computation of Residuals of Division and Square Root with an FMA

120

(3)

4.6 Another splitting technique: splitting around a power of 2

123

(1)

4.7 Newton-Raphson-Based Division with an FMA

124

(14)

4.7.1 Variants of the Newton-Raphson iteration

124

(5)

4.7.2 Using the Newton-Raphson iteration for correctly rounded division with an FMA

129

(7)

4.7.3 Possible double roundings in division algorithms

136

(2)

4.8 Newton-Raphson-Based Square Root with an FMA

138

(4)

4.8.1 The basic iterations

138

(1)

4.8.2 Using the Newton-Raphson iteration for correctly rounded square roots

138

(4)

4.9 Radix Conversion

142

(11)

4.9.1 Conditions on the formats

142

(4)

4.9.2 Conversion algorithms

146

(7)

4.10 Conversion Between Integers and Floating-Point Numbers

153

(3)

4.10.1 From 32-bit integers to floating-point numbers

153

(1)

4.10.2 From 64-bit integers to floating-point numbers

154

(1)

4.10.3 From floating-point numbers to integers

155

(1)

4.11 Multiplication by an Arbitrary-Precision Constant with an FMA

156

(4)

4.12 Evaluation of the Error of an FMA

160

(3)

5 Enhanced Floating-Point Sums, Dot Products, and Polynomial Values

163

(30)

5.1 Preliminaries

164

(11)

5.1.1 Floating-point arithmetic models

165

(1)

5.1.2 Notation for error analysis and classical error estimates

166

(3)

5.1.3 Some refined error estimates

169

(5)

5.1.4 Properties for deriving validated running error bounds

174

(1)

5.2 Computing Validated Running Error Bounds

175

(2)

5.3 Computing Sums More Accurately

177

(12)

5.3.1 Reordering the operands, and a bit more

177

(1)

5.3.2 Compensated sums

178

(6)

5.3.3 Summation algorithms that somehow imitate a fixed-point arithmetic

184

(3)

5.3.4 On the sum of three floating-point numbers

187

(2)

5.4 Compensated Dot Products

189

(1)

5.5 Compensated Polynomial Evaluation

190

(3)

6 Languages and Compilers

193

(38)

6.1 A Play with Many Actors

193

(7)

6.1.1 Floating-point evaluation in programming languages

194

(2)

6.1.2 Processors, compilers, and operating systems

196

(1)

6.1.3 Standardization processes

197

(2)

6.1.4 In the hands of the programmer

199

(1)

6.2 Floating Point in the C Language

200

(15)

6.2.1 Standard C11 headers and IEEE 754-1985 support

201

(1)

6.2.2 Types

202

(2)

6.2.3 Expression evaluation

204

(5)

6.2.4 Code transformations

209

(1)

6.2.5 Enabling unsafe optimizations

210

(1)

6.2.6 Summary: a few horror stories

211

(3)

6.2.7 The CompCert C compiler

214

(1)

6.3 Floating-Point Arithmetic in the C++ Language

215

(3)

6.3.1 Semantics

215

(1)

6.3.2 Numeric limits

215

(2)

6.3.3 Overloaded functions

217

(1)

6.4 FORTRAN Floating Point in a Nutshell

218

(4)

6.4.1 Philosophy

218

(2)

6.4.2 IEEE 754 support in FORTRAN

220

(2)

6.5 Java Floating Point in a Nutshell

222

(7)

6.5.1 Philosophy

222

(1)

6.5.2 Types and classes

222

(2)

6.5.3 Infinities, NaNs, and signed zeros

224

(1)

6.5.4 Missing features

225

(1)

6.5.5 Reproducibility

226

(1)

6.5.6 The BigDecimal package

227

(2)

6.6 Conclusion

229

(2)

Part III Implementing Floating-Point Operators

231

(204)

7 Algorithms for the Basic Operations

233

(34)

7.1 Overview of Basic Operation Implementation

233

(2)

7.2 Implementing IEEE 754-2008 Rounding

235

(5)

7.2.1 Rounding a nonzero finite value with unbounded exponent range

235

(3)

7.2.2 Overflow

238

(1)

7.2.3 Underflow and subnormal results

239

(1)

7.2.4 The inexact exception

239

(1)

7.2.5 Rounding for actual operations

240

(1)

7.3 Floating-Point Addition and Subtraction

240

(5)

7.3.1 Decimal addition

244

(1)

7.3.2 Decimal addition using binary encoding

245

(1)

7.3.3 Subnormal inputs and outputs in binary addition

245

(1)

7.4 Floating-Point Multiplication

245

(3)

7.4.1 Normal case

246

(1)

7.4.2 Handling subnormal numbers in binary multiplication

247

(1)

7.4.3 Decimal specifics

248

(1)

7.5 Floating-Point Fused Multiply-Add

248

(8)

7.5.1 Case analysis for normal inputs

249

(3)

7.5.2 Handling subnormal inputs

252

(1)

7.5.3 Handling decimal cohorts

253

(1)

7.5.4 Overview of a binary FMA implementation

254

(2)

7.6 Floating-Point Division

256

(3)

7.6.1 Overview and special cases

256

(1)

7.6.2 Computing the significand quotient

257

(1)

7.6.3 Managing subnormal numbers

258

(1)

7.6.4 The inexact exception

259

(1)

7.6.5 Decimal specifics

259

(1)

7.7 Floating-Point Square Root

259

(2)

7.7.1 Overview and special cases

259

(1)

7.7.2 Computing the significand square root

260

(1)

7.7.3 Managing subnormal numbers

260

(1)

7.7.4 The inexact exception

261

(1)

7.7.5 Decimal specifics

261

(1)

7.8 Nonhomogeneous Operators

261

(6)

7.8.1 A software algorithm around double rounding

262

(2)

7.8.2 The mixed-precision fused multiply-and-add

264

(1)

7.8.3 Motivation

265

(1)

7.8.4 Implementation issues

265

(2)

8 Hardware Implementation of Floating-Point Arithmetic

267

(54)

8.1 Introduction and Context

267

(5)

8.1.1 Processor internal formats

267

(1)

8.1.2 Hardware handling of subnormal numbers

268

(1)

8.1.3 Full-custom VLSI versus reconfigurable circuits (FPGAs)

269

(1)

8.1.4 Hardware decimal arithmetic

270

(1)

8.1.5 Pipelining

271

(1)

8.2 The Primitives and Their Cost

272

(15)

8.2.1 Integer adders

272

(6)

8.2.2 Digit-by-integer multiplication in hardware

278

(1)

8.2.3 Using nonstandard representations of numbers

278

(2)

8.2.4 Binary integer multiplication

280

(1)

8.2.5 Decimal integer multiplication

280

(2)

8.2.6 Shifters

282

(1)

8.2.7 Leading-zero counters

283

(2)

8.2.8 Tables and table-based methods for fixed-point function approximation

285

(2)

8.3 Binary Floating-Point Addition

287

(8)

8.3.1 Overview

287

(1)

8.3.2 A first dual-path architecture

288

(2)

8.3.3 Leading-zero anticipation

290

(4)

8.3.4 Probing further on floating-point adders

294

(1)

8.4 Binary Floating-Point Multiplication

295

(6)

8.4.1 Basic architecture

295

(1)

8.4.2 FPGA implementation

296

(1)

8.4.3 VLSI implementation optimized for delay

297

(3)

8.4.4 Managing subnormals

300

(1)

8.5 Binary Fused Multiply-Add

301

(3)

8.5.1 Classic architecture

301

(1)

8.5.2 To probe further

302

(2)

8.6 Division and Square Root

304

(5)

8.6.1 Digit-recurrence division

305

(3)

8.6.2 Decimal division

308

(1)

8.7 Beyond the Classical Floating-Point Unit

309

(3)

8.7.1 More fused operators

309

(1)

8.7.2 Exact accumulation and dot product

309

(2)

8.7.3 Hardware-accelerated compensated algorithms

311

(1)

8.8 Floating-Point for FPGAs

312

(8)

8.8.1 Optimization in context of standard operators

312

(2)

8.8.2 Operations with a constant operand

314

(1)

8.8.3 Computing large floating-point sums

315

(4)

8.8.4 Block floating point

319

(1)

8.8.5 Algebraic operators

319

(1)

8.8.6 Elementary and compound functions

320

(1)

8.9 Probing Further

320

(1)

9 Software Implementation of Floating-Point Arithmetic

321

(54)

9.1 Implementation Context

322

(7)

9.1.1 Standard encoding of binary floating-point data

322

(1)

9.1.2 Available integer operators

323

(3)

9.1.3 First examples

326

(2)

9.1.4 Design choices and optimizations

328

(1)

9.2 Binary Floating-Point Addition

329

(12)

9.2.1 Handling special values

330

(2)

9.2.2 Computing the sign of the result

332

(1)

9.2.3 Swapping the operands and computing the alignment shift

333

(2)

9.2.4 Getting the correctly rounded result

335

(6)

9.3 Binary Floating-Point Multiplication

341

(8)

9.3.1 Handling special values

341

(2)

9.3.2 Sign and exponent computation

343

(2)

9.3.3 Overflow detection

345

(1)

9.3.4 Getting the correctly rounded result

346

(3)

9.4 Binary Floating-Point Division

349

(13)

9.4.1 Handling special values

350

(1)

9.4.2 Sign and exponent computation

351

(3)

9.4.3 Overflow detection

354

(1)

9.4.4 Getting the correctly rounded result

355

(7)

9.5 Binary Floating-Point Square Root

362

(10)

9.5.1 Handling special values

362

(1)

9.5.2 Exponent computation

363

(2)

9.5.3 Getting the correctly rounded result

365

(7)

9.6 Custom Operators

372

(3)

10 Evaluating Floating-Point Elementary Functions

375

(60)

10.1 Introduction

375

(4)

10.1.1 Which accuracy?

375

(1)

10.1.2 The various steps of function evaluation

376

(3)

10.2 Range Reduction

379

(10)

10.2.1 Basic range reduction algorithms

379

(3)

10.2.2 Bounding the relative error of range reduction

382

(1)

10.2.3 More sophisticated range reduction algorithms

383

(3)

10.2.4 Examples

386

(3)

10.3 Polynomial Approximations

389

(7)

10.3.1 L2 case

390

(1)

10.3.2 L∞, or minimax, case

391

(3)

10.3.3 "Truncated" approximations

394

(1)

10.3.4 In practice: using the Sollya tool to compute constrained approximations and certified error bounds

394

(2)

10.4 Evaluating Polynomials

396

(1)

10.4.1 Evaluation strategies

396

(1)

10.4.2 Evaluation error

397

(1)

10.5 The Table Maker's Dilemma

397

(30)

10.5.1 When there is no need to solve the TMD

400

(1)

10.5.2 On breakpoints

400

(4)

10.5.3 Finding the hardest-to-round points

404

(23)

10.6 Some Implementation Tricks Used in the CRlibm Library

427

(8)

10.6.1 Rounding test

428

(1)

10.6.2 Accurate second step

429

(1)

10.6.3 Error analysis and the accuracy/performance tradeoff

429

(1)

10.6.4 The point with efficient code

430

(5)

Part IV Extensions

435

(118)

11 Complex Numbers

437

(16)

11.1 Introduction

437

(2)

11.2 Componentwise and Normwise Errors

439

(1)

11.3 Computing ad ± be with an FMA

440

(2)

11.4 Complex Multiplication

442

(1)

11.4.1 Complex multiplication without an FMA instruction

442

(1)

11.4.2 Complex multiplication with an FMA instruction

442

(1)

11.5 Complex Division

443

(4)

11.5.1 Error bounds for complex division

443

(1)

11.5.2 Scaling methods for avoiding over-/underflow in complex division

444

(3)

11.6 Complex Absolute Value

447

(2)

11.6.1 Error bounds for complex absolute value

447

(1)

11.6.2 Scaling for the computation of complex absolute value

447

(2)

11.7 Complex Square Root

449

(2)

11.7.1 Error bounds for complex square root

449

(1)

11.7.2 Scaling techniques for complex square root

450

(1)

11.8 An Alternative Solution: Exception Handling

451

(2)

12 Interval Arithmetic

453

(26)

12.1 Introduction to Interval Arithmetic

454

(3)

12.1.1 Definitions and the inclusion property

454

(2)

12.1.2 Loss of algebraic properties

456

(1)

12.2 The IEEE 1788-2015 Standard for Interval Arithmetic

457

(8)

12.2.1 Structuration into levels

457

(1)

12.2.2 Flavors

458

(2)

12.2.3 Decorations

460

(3)

12.2.4 Level 2: discretization issues

463

(1)

12.2.5 Exact dot product

464

(1)

12.2.6 Levels 3 and 4: implementation issues

464

(1)

12.2.7 Libraries implementing IEEE 1788-2015

464

(1)

12.3 Intervals with Floating-Point Bounds

465

(3)

12.3.1 Implementation using floating-point arithmetic

465

(1)

12.3.2 Difficulties

465

(2)

12.3.3 Optimized rounding

467

(1)

12.4 Interval Arithmetic and Roundoff Error Analysis

468

(8)

12.4.1 Influence of the computing precision

468

(3)

12.4.2 A more efficient approach: the mid-rad representation

471

(4)

12.4.3 Variants: affine arithmetic, Taylor models

475

(1)

12.5 Further Readings

476

(3)

13 Verifying Floating-Point Algorithms

479

(34)

13.1 Formalizing Floating-Point Arithmetic

479

(8)

13.1.1 Defining floating-point numbers

480

(2)

13.1.2 Simplifying the definition

482

(1)

13.1.3 Defining rounding operators

483

(3)

13.1.4 Extending the set of numbers

486

(1)

13.2 Formalisms for Verifying Algorithms

487

(6)

13.2.1 Hardware units

487

(2)

13.2.2 Floating-point algorithms

489

(1)

13.2.3 Automating proofs

490

(3)

13.3 Roundoff Errors and the Gappa Tool

493

(20)

13.3.1 Computing on bounds

494

(2)

13.3.2 Counting digits

496

(2)

13.3.3 Manipulating expressions

498

(4)

13.3.4 Handling the relative error

502

(1)

13.3.5 Example: toy implementation of sine

503

(5)

13.3.6 Example: integer division on Itanium

508

(5)

14 Extending the Precision

513

(40)

14.1 Double-Words, Triple-Words

514

(8)

14.1.1 Double-word arithmetic

515

(5)

14.1.2 Static triple-word arithmetic

520

(2)

14.2 Floating-Point Expansions

522

(12)

14.2.1 Renormalization of floating-point expansions

525

(1)

14.2.2 Addition of floating-point expansions

526

(2)

14.2.3 Multiplication of floating-point expansions

528

(3)

14.2.4 Division of floating-point expansions

531

(3)

14.3 Floating-Point Numbers with Batched Additional Exponent

534

(1)

14.4 Large Precision Based on a High-Radix Representation

535

(18)

14.4.1 Specifications

536

(1)

14.4.2 Using arbitrary-precision integer arithmetic for arbitrary-precision floating-point arithmetic

537

(1)

14.4.3 A brief introduction to arbitrary-precision integer arithmetic

538

(3)

14.4.4 GNUMPFR

541

(12)

Appendix A Number Theory Tools for Floating-Point Arithmetic

553

(8)

A.1 Continued Fractions

553

(3)

A.2 Euclidean Lattices

556

(5)

Appendix B Previous Floating-Point Standards

561

(8)

B.1 The IEEE 754-1985 Standard

561

(3)

B.1.1 Formats specified by IEEE 754-1985

561

(1)

B.1.2 Rounding modes specified by IEEE 754-1985

562

(1)

B.1.3 Operations specified by IEEE 754-1985

562

(1)

B.1.4 Exceptions specified by IEEE 754-1985

563

(1)

B.2 The IEEE 854-1987 Standard

564

(2)

B.2.1 Constraints internal to a format

564

(1)

B.2.2 Various formats and the constraints between them

565

(1)

B.2.3 Rounding

566

(1)

B.2.4 Operations

566

(1)

B.2.5 Comparisons

566

(1)

B.2.6 Exceptions

566

(1)

B.3 The Need for a Revision

566

(1)

B.4 The IEEE 754-2008 Revision

567

(2)

Bibliography

569

(52)

Index

621

Jean-Michel Muller (coordinator), CNRS, Laboratoire LIP, AriC teamNicolas Brunie, KalrayFlorent de Dinechin, INSA Lyon, Laboratoire CITI, Socrate teamClaude-Pierre Jeannerod, Inria, Laboratoire LIP, AriC teamMioara Joldes, CNRS, LAAS, MAC teamVincent Lefčvre, Inria, Laboratoire LIP, AriC teamGuillaume Melquiond, Inria, Laboratoire LRI, Toccata teamNathalie Revol, Inria, Laboratoire LIP, AriC teamSerge Torres, ENS de Lyon, Laboratoire LIP, AriC team

Daugiau apie elektronines knygas

Pastovi nuoroda: https://www.kriso.lt/db/97833197652662e.html

El. knyga: Handbook of Floating-Point Arithmetic

DRM apribojimai

Kopijuoti:

Spausdinti:

El. knygos naudojimas:

Recenzijos

Paskyra ir nustatymai

Paieška

Ieškoti duomenų bazėje

Patikslinti paiešką

Temos E-knygų temos

Pasirinkti pirkinių krepšelį