
E-book: Handbook of Floating-Point Arithmetic

4.00/5 (12 ratings by Goodreads)
  • Format: PDF+DRM
  • Publication date: 11-Nov-2009
  • Publisher: Birkhauser Boston Inc
  • Language: English
  • ISBN-13: 9780817647056

DRM restrictions

  • Copying:

    not allowed

  • Printing:

    not allowed

  • E-book usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that you need to install free software in order to unlock and read it. To read this e-book you must create an Adobe ID. More information here. The e-book can be downloaded to 6 devices (one user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you need to install the following free app: PocketBook Reader (iOS / Android)

    To read this e-book on a PC or Mac, you need Adobe Digital Editions (a free application designed specifically for e-books; it is not the same as Adobe Reader, which you probably already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

Floating-point arithmetic is by far the most widely used way of implementing real-number arithmetic on modern computers. However, making such an arithmetic reliable and portable, yet fast, is a very difficult task; as a result, floating-point arithmetic is far from being exploited to its full potential. This handbook aims to provide a complete overview of modern floating-point arithmetic, including a detailed treatment of the newly revised IEEE 754-2008 standard for floating-point arithmetic. Presented throughout are algorithms for implementing floating-point arithmetic as well as algorithms that use floating-point arithmetic. So that the techniques presented can be put directly into practice in actual coding or design, they are illustrated, whenever possible, by a corresponding program.

Handbook of Floating-Point Arithmetic is designed for programmers of numerical applications, compiler designers, programmers of floating-point algorithms, designers of arithmetic operators, and, more generally, students and researchers in numerical analysis who wish to better understand a tool used in their daily work and research.
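A small illustration of the book's subject matter (this snippet is not from the book itself): binary floating-point arithmetic holds surprises for the unwary, and algorithms such as Fast2Sum, covered in Part II of the handbook, show how its rounding errors can nevertheless be computed exactly.

```python
# In binary floating-point, 0.1 + 0.2 is not exactly 0.3, because none
# of these decimal values is representable in binary.
print(0.1 + 0.2 == 0.3)  # False

# Fast2Sum (one of the algorithms treated in the book): assuming
# round-to-nearest and |a| >= |b|, the rounding error of a + b is itself
# a floating-point number, recoverable with two extra operations.
def fast2sum(a, b):
    s = a + b    # rounded sum
    z = s - a    # the part of b that survived the rounding
    t = b - z    # the part of b that was rounded away
    return s, t  # a + b == s + t exactly

s, t = fast2sum(1.0, 2**-60)
print(s == 1.0)     # True: the tiny term vanished from the rounded sum
print(t == 2**-60)  # True: ...but is recovered exactly as the error term
```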

Reviews

From the reviews:

This handbook aims to provide a complete overview of modern floating-point arithmetic, including a detailed treatment of the newly revised IEEE 754-2008 standard for floating-point arithmetic. This book is useful to programmers, compiler designers, and students and researchers in numerical analysis. (T. C. Mohan, Zentralblatt MATH, Vol. 1197, 2010)

Preface xv
List of Figures xvii
List of Tables xxi
I Introduction, Basic Definitions, and Standards 1
Introduction 3
Some History 3
Desirable Properties 6
Some Strange Behaviors 7
Some famous bugs 7
Difficult problems 8
Definitions and Basic Notions 13
Floating-Point Numbers 13
Rounding 20
Rounding modes 20
Useful properties 22
Relative error due to rounding 23
Exceptions 25
Lost or Preserved Properties of the Arithmetic on the Real Numbers 27
Note on the Choice of the Radix 29
Representation errors 29
A case for radix 10 30
Tools for Manipulating Floating-Point Errors 32
The ulp function 32
Errors in ulps and relative errors 37
An example: iterated products 37
Unit roundoff 39
Note on Radix Conversion 40
Conditions on the formats 40
Conversion algorithms 43
The Fused Multiply-Add (FMA) Instruction 51
Interval Arithmetic 51
Intervals with floating-point bounds 52
Optimized rounding 52
Floating-Point Formats and Environment 55
The IEEE 754-1985 Standard 56
Formats specified by IEEE 754-1985 56
Little-endian, big-endian 60
Rounding modes specified by IEEE 754-1985 61
Operations specified by IEEE 754-1985 62
Exceptions specified by IEEE 754-1985 66
Special values 69
The IEEE 854-1987 Standard 70
Constraints internal to a format 70
Various formats and the constraints between them 71
Conversions between floating-point numbers and decimal strings 72
Rounding 73
Operations 73
Comparisons 74
Exceptions 74
The Need for a Revision 74
A typical problem: "double rounding" 75
Various ambiguities 77
The New IEEE 754-2008 Standard 79
Formats specified by the revised standard 80
Binary interchange format encodings 81
Decimal interchange format encodings 82
Larger formats 92
Extended and extendable precisions 92
Attributes 93
Operations specified by the standard 97
Comparisons 99
Conversions 99
Default exception handling 100
Recommended transcendental functions 103
Floating-Point Hardware in Current Processors 104
The common hardware denominator 104
Fused multiply-add 104
Extended precision 104
Rounding and precision control 105
SIMD instructions 106
Floating-point on x86 processors: SSE2 versus x87 106
Decimal arithmetic 107
Floating-Point Hardware in Recent Graphics Processing Units 108
Relations with Programming Languages 109
The Language Independent Arithmetic (LIA) standard 109
Programming languages 110
Checking the Environment 110
MACHAR 111
Paranoia 111
UCBTest 115
TestFloat 116
IeeeCC754 116
Miscellaneous 116
II Cleverly Using Floating-Point Arithmetic 117
Basic Properties and Algorithms 119
Testing the Computational Environment 119
Computing the radix 119
Computing the precision 121
Exact Operations 122
Exact addition 122
Exact multiplications and divisions 124
Accurate Computations of Sums of Two Numbers 125
The Fast2Sum algorithm 126
The 2Sum algorithm 129
If we do not use rounding to nearest 131
Computation of Products 132
Veltkamp splitting 132
Dekker's multiplication algorithm 135
Complex numbers 139
Various error bounds 140
Error bound for complex multiplication 141
Complex division 144
Complex square root 149
The Fused Multiply-Add Instruction 151
The 2MultFMA Algorithm 152
Computation of Residuals of Division and Square Root 153
Newton-Raphson-Based Division with an FMA 155
Variants of the Newton-Raphson iteration 155
Using the Newton-Raphson iteration for correctly rounded division 160
Newton-Raphson-Based Square Root with an FMA 167
The basic iterations 167
Using the Newton-Raphson iteration for correctly rounded square roots 168
Multiplication by an Arbitrary-Precision Constant 171
Checking for a given constant C if Algorithm 5.2 will always work 172
Evaluation of the Error of an FMA 175
Evaluation of Integer Powers 177
Enhanced Floating-Point Sums, Dot Products, and Polynomial Values 181
Preliminaries 182
Floating-point arithmetic models 183
Notation for error analysis and classical error estimates 184
Properties for deriving running error bounds 187
Computing Validated Running Error Bounds 188
Computing Sums More Accurately 190
Reordering the operands, and a bit more 190
Compensated sums 192
Implementing a "long accumulator" 199
On the sum of three floating-point numbers 199
Compensated Dot Products 201
Compensated Polynomial Evaluation 203
Languages and Compilers 205
A Play with Many Actors 205
Floating-point evaluation in programming languages 206
Processors, compilers, and operating systems 208
In the hands of the programmer 209
Floating Point in the C Language 209
Standard C99 headers and IEEE 754-1985 support 209
Types 210
Expression evaluation 213
Code transformations 216
Enabling unsafe optimizations 217
Summary: a few horror stories 218
Floating-Point Arithmetic in the C++ Language 220
Semantics 220
Numeric limits 221
Overloaded functions 222
FORTRAN Floating Point in a Nutshell 223
Philosophy 223
IEEE 754 support in FORTRAN 226
Java Floating Point in a Nutshell 227
Philosophy 227
Types and classes 228
Infinities, NaNs, and signed zeros 230
Missing features 231
Reproducibility 232
The BigDecimal package 233
Conclusion 234
III Implementing Floating-Point Operators 237
Algorithms for the Five Basic Operations 239
Overview of Basic Operation Implementation 239
Implementing IEEE 754-2008 Rounding 241
Rounding a nonzero finite value with unbounded exponent range 241
Overflow 243
Underflow and subnormal results 244
The inexact exception 245
Rounding for actual operations 245
Floating-Point Addition and Subtraction 246
Decimal addition 249
Decimal addition using binary encoding 250
Subnormal inputs and outputs in binary addition 251
Floating-Point Multiplication 251
Normal case 252
Handling subnormal numbers in binary multiplication 252
Decimal specifics 253
Floating-Point Fused Multiply-Add 254
Case analysis for normal inputs 254
Handling subnormal inputs 258
Handling decimal cohorts 259
Overview of a binary FMA implementation 259
Floating-Point Division 262
Overview and special cases 262
Computing the significand quotient 263
Managing subnormal numbers 264
The inexact exception 265
Decimal specifics 265
Floating-Point Square Root 265
Overview and special cases 265
Computing the significand square root 266
Managing subnormal numbers 267
The inexact exception 267
Decimal specifics 267
Hardware Implementation of Floating-Point Arithmetic 269
Introduction and Context 269
Processor internal formats 269
Hardware handling of subnormal numbers 270
Full-custom VLSI versus reconfigurable circuits 271
Hardware decimal arithmetic 272
Pipelining 273
The Primitives and Their Cost 274
Integer adders 274
Digit-by-integer multiplication in hardware 280
Using nonstandard representations of numbers 280
Binary integer multiplication 281
Decimal integer multiplication 283
Shifters 284
Leading-zero counters 284
Tables and table-based methods for fixed-point function approximation 286
Binary Floating-Point Addition 288
Overview 288
A first dual-path architecture 289
Leading-zero anticipation 291
Probing further on floating-point adders 295
Binary Floating-Point Multiplication 296
Basic architecture 296
FPGA implementation 296
VLSI implementation optimized for delay 298
Managing subnormals 301
Binary Fused Multiply-Add 302
Classic architecture 303
To probe further 305
Division 305
Digit-recurrence division 306
Decimal division 309
Conclusion: Beyond the FPU 309
Optimization in context of standard operators 310
Operation with a constant operand 311
Block floating point 313
Specific architectures for accumulation 313
Coarser-grain operators 317
Probing Further 320
Software Implementation of Floating-Point Arithmetic 321
Implementation Context 322
Standard encoding of binary floating-point data 322
Available integer operators 323
First examples 326
Design choices and optimizations 328
Binary Floating-Point Addition 329
Handling special values 330
Computing the sign of the result 332
Swapping the operands and computing the alignment shift 333
Getting the correctly rounded result 335
Binary Floating-Point Multiplication 341
Handling special values 341
Sign and exponent computation 343
Overflow detection 345
Getting the correctly rounded result 346
Binary Floating-Point Division 349
Handling special values 350
Sign and exponent computation 351
Overflow detection 354
Getting the correctly rounded result 355
Binary Floating-Point Square Root 361
Handling special values 362
Exponent computation 364
Getting the correctly rounded result 365
IV Elementary Functions 373
Evaluating Floating-Point Elementary Functions 375
Basic Range Reduction Algorithms 379
Cody and Waite's reduction algorithm 379
Payne and Hanek's algorithm 381
Bounding the Relative Error of Range Reduction 382
More Sophisticated Range Reduction Algorithms 384
An example of range reduction for the exponential function 386
An example of range reduction for the logarithm 387
Polynomial or Rational Approximations 388
L2 case 389
L∞, or minimax case 390
"Truncated" approximations 392
Evaluating Polynomials 393
Correct Rounding of Elementary Functions to binary64 394
The Table Maker's Dilemma and Ziv's onion peeling strategy 394
When the TMD is solved 395
Rounding test 396
Accurate second step 400
Error analysis and the accuracy/performance tradeoff 401
Computing Error Bounds 402
The point with efficient code 402
Example: a "double-double" polynomial evaluation 403
Solving the Table Maker's Dilemma 405
Introduction 405
The Table Maker's Dilemma 406
Brief history of the TMD 410
Organization of the chapter 411
Preliminary Remarks on the Table Maker's Dilemma 412
Statistical arguments: what can be expected in practice 412
In some domains, there is no need to find worst cases 416
Deducing the worst cases from other functions or domains 419
The Table Maker's Dilemma for Algebraic Functions 420
Algebraic and transcendental numbers and functions 420
The elementary case of quotients 422
Around Liouville's theorem 424
Generating bad rounding cases for the square root using Hensel 2-adic lifting 425
Solving the Table Maker's Dilemma for Arbitrary Functions 429
Lindemann's theorem: application to some transcendental functions 429
A theorem of Nesterenko and Waldschmidt 430
A first method: tabulated differences 432
From the TMD to the distance between a grid and a segment 434
Linear approximation: Lefevre's algorithm 436
The SLZ algorithm 443
Periodic functions on large arguments 448
Some Results 449
Worst cases for the exponential, logarithmic, trigonometric, and hyperbolic functions 449
A special case: integer powers 458
V Current Limits and Perspectives 458
Extensions 461
Formalisms for Certifying Floating-Point Algorithms 463
Formalizing Floating-Point Arithmetic 463
Defining floating-point numbers 464
Simplifying the definition 466
Defining rounding operators 467
Extending the set of numbers 470
Formalisms for Certifying Algorithms by Hand 471
Hardware units 471
Low-level algorithms 472
Advanced algorithms 473
Automating Proofs 474
Computing on bounds 475
Counting digits 477
Manipulating expressions 479
Handling the relative error 483
Using Gappa 484
Toy implementation of sine 484
Integer division on Itanium 488
Extending the Precision 493
Double-Words, Triple-Words 494
Double-word arithmetic 495
Static triple-word arithmetic 498
Quad-word arithmetic 500
Floating-Point Expansions 503
Floating-Point Numbers with Batched Additional Exponent 509
Large Precision Relying on Processor Integers 510
Using arbitrary-precision integer arithmetic for arbitrary-precision floating-point arithmetic 512
A brief introduction to arbitrary-precision integer arithmetic 513
VI Perspectives and Appendix 517
Conclusion and Perspectives 519
Appendix: Number Theory Tools for Floating-Point Arithmetic 521
Continued Fractions 521
The LLL Algorithm 524
Bibliography 529
Index 567