
Foundations of Data Science [Hardcover]

4.24/5 (29 ratings by Goodreads)
Avrim Blum, John Hopcroft (Cornell University, New York), Ravi Kannan
  • Format: Hardback, 432 pages, height x width x thickness: 259x182x27 mm, weight: 930 g, contains worked examples and exercises
  • Publication date: 23-Jan-2020
  • Publisher: Cambridge University Press
  • ISBN-10: 1108485065
  • ISBN-13: 9781108485067
"This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data"--

Reviews

'This beautifully written text is a scholarly journey through the mathematical and algorithmic foundations of data science. Rigorous but accessible, and with many exercises, it will be a valuable resource for advanced undergraduate and graduate classes.' Peter Bartlett, University of California, Berkeley

'The rise of the Internet, digital media, and social networks has brought us to the world of data, with vast sources from every corner of society. Data Science - aiming to understand and discover the essences that underlie the complex, multifaceted, and high-dimensional data - has truly become a 'universal discipline', with its multidisciplinary roots, interdisciplinary presence, and societal relevance. This timely and comprehensive book presents - by bringing together material from diverse fields of computing - a full spectrum of mathematical, statistical, and algorithmic material fundamental to data analysis, machine learning, and network modeling. Foundations of Data Science offers an effective roadmap to approach this fascinating discipline and engages more advanced readers with rigorous mathematical/algorithmic theory.' Shang-Hua Teng, University of Southern California

'A lucid account of mathematical ideas that underlie today's data analysis and machine learning methods. I learnt a lot from it, and I am sure it will become an invaluable reference for many students, researchers and faculty around the world.' Sanjeev Arora, Princeton University, New Jersey

'It provides a very broad overview of the foundations of data science that should be accessible to well-prepared students with backgrounds in computer science, linear algebra, and probability theory. These are all important topics in the theory of machine learning, and it is refreshing to see them introduced together in a textbook at this level.' Brian Borchers, MAA Reviews

'One plausible measure of [Foundations of Data Science's] impact is the book's own citation metrics. Semantic Scholar (https://www.semanticscholar.org) reports 81 citations, with 42 citations related to background or methods; [Foundations of Data Science] appears to be on course to becoming influential.' M. Mounts, Choice

More information

Covers the mathematical and algorithmic foundations of data science: machine learning, high-dimensional geometry, and the analysis of large networks.
1 Introduction  1(3)
2 High-Dimensional Space  4(25)
2.1 Introduction  4(1)
2.2 The Law of Large Numbers  4(4)
2.3 The Geometry of High Dimensions  8(1)
2.4 Properties of the Unit Ball  8(5)
2.5 Generating Points Uniformly at Random from a Ball  13(2)
2.6 Gaussians in High Dimension  15(1)
2.7 Random Projection and Johnson-Lindenstrauss Lemma  16(2)
2.8 Separating Gaussians  18(2)
2.9 Fitting a Spherical Gaussian to Data  20(1)
2.10 Bibliographic Notes  21(1)
2.11 Exercises  22(7)
3 Best-Fit Subspaces and Singular Value Decomposition (SVD)  29(33)
3.1 Introduction  29(2)
3.2 Preliminaries  31(1)
3.3 Singular Vectors  31(3)
3.4 Singular Value Decomposition (SVD)  34(2)
3.5 Best Rank-k Approximations  36(1)
3.6 Left Singular Vectors  37(2)
3.7 Power Method for Singular Value Decomposition  39(3)
3.8 Singular Vectors and Eigenvectors  42(1)
3.9 Applications of Singular Value Decomposition  42(11)
3.10 Bibliographic Notes  53(1)
3.11 Exercises  54(8)
4 Random Walks and Markov Chains  62(47)
4.1 Stationary Distribution  65(2)
4.2 Markov Chain Monte Carlo  67(4)
4.3 Areas and Volumes  71(2)
4.4 Convergence of Random Walks on Undirected Graphs  73(8)
4.5 Electrical Networks and Random Walks  81(4)
4.6 Random Walks on Undirected Graphs with Unit Edge Weights  85(7)
4.7 Random Walks in Euclidean Space  92(3)
4.8 The Web as a Markov Chain  95(3)
4.9 Bibliographic Notes  98(1)
4.10 Exercises  99(10)
5 Machine Learning  109(50)
5.1 Introduction  109(1)
5.2 The Perceptron Algorithm  110(1)
5.3 Kernel Functions and Nonlinearly Separable Data  111(2)
5.4 Generalizing to New Data  113(5)
5.5 VC-Dimension  118(8)
5.6 VC-Dimension and Machine Learning  126(1)
5.7 Other Measures of Complexity  127(1)
5.8 Deep Learning  128(6)
5.9 Gradient Descent  134(4)
5.10 Online Learning  138(7)
5.11 Boosting  145(3)
5.12 Further Current Directions  148(4)
5.13 Bibliographic Notes  152(1)
5.14 Exercises  152(7)
6 Algorithms for Massive Data Problems: Streaming, Sketching, and Sampling  159(23)
6.1 Introduction  159(1)
6.2 Frequency Moments of Data Streams  160(9)
6.3 Matrix Algorithms Using Sampling  169(8)
6.4 Sketches of Documents  177(1)
6.5 Bibliographic Notes  178(1)
6.6 Exercises  179(3)
7 Clustering  182(33)
7.1 Introduction  182(3)
7.2 k-Means Clustering  185(4)
7.3 k-Center Clustering  189(1)
7.4 Finding Low-Error Clusterings  189(1)
7.5 Spectral Clustering  190(7)
7.6 Approximation Stability  197(2)
7.7 High-Density Clusters  199(2)
7.8 Kernel Methods  201(1)
7.9 Recursive Clustering Based on Sparse Cuts  202(1)
7.10 Dense Submatrices and Communities  202(3)
7.11 Community Finding and Graph Partitioning  205(3)
7.12 Spectral Clustering Applied to Social Networks  208(2)
7.13 Bibliographic Notes  210(1)
7.14 Exercises  210(5)
8 Random Graphs  215(59)
8.1 The G(n,p) Model  215(7)
8.2 Phase Transitions  222(10)
8.3 Giant Component  232(3)
8.4 Cycles and Full Connectivity  235(4)
8.5 Phase Transitions for Increasing Properties  239(2)
8.6 Branching Processes  241(5)
8.7 CNF-SAT  246(6)
8.8 Nonuniform Models of Random Graphs  252(2)
8.9 Growth Models  254(7)
8.10 Small-World Graphs  261(5)
8.11 Bibliographic Notes  266(1)
8.12 Exercises  266(8)
9 Topic Models, Nonnegative Matrix Factorization, Hidden Markov Models, and Graphical Models  274(44)
9.1 Topic Models  274(3)
9.2 An Idealized Model  277(2)
9.3 Nonnegative Matrix Factorization  279(2)
9.4 NMF with Anchor Terms  281(1)
9.5 Hard and Soft Clustering  282(1)
9.6 The Latent Dirichlet Allocation Model for Topic Modeling  283(2)
9.7 The Dominant Admixture Model  285(2)
9.8 Formal Assumptions  287(3)
9.9 Finding the Term-Topic Matrix  290(5)
9.10 Hidden Markov Models  295(3)
9.11 Graphical Models and Belief Propagation  298(1)
9.12 Bayesian or Belief Networks  299(1)
9.13 Markov Random Fields  300(1)
9.14 Factor Graphs  301(1)
9.15 Tree Algorithms  301(2)
9.16 Message Passing in General Graphs  303(7)
9.17 Warning Propagation  310(1)
9.18 Correlation between Variables  311(4)
9.19 Bibliographic Notes  315(1)
9.20 Exercises  315(3)
10 Other Topics  318(23)
10.1 Ranking and Social Choice  318(4)
10.2 Compressed Sensing and Sparse Vectors  322(3)
10.3 Applications  325(2)
10.4 An Uncertainty Principle  327(3)
10.5 Gradient  330(2)
10.6 Linear Programming  332(2)
10.7 Integer Optimization  334(1)
10.8 Semi-Definite Programming  334(2)
10.9 Bibliographic Notes  336(1)
10.10 Exercises  337(4)
11 Wavelets  341(19)
11.1 Dilation  341(1)
11.2 The Haar Wavelet  342(3)
11.3 Wavelet Systems  345(1)
11.4 Solving the Dilation Equation  346(1)
11.5 Conditions on the Dilation Equation  347(3)
11.6 Derivation of the Wavelets from the Scaling Function  350(3)
11.7 Sufficient Conditions for the Wavelets to Be Orthogonal  353(2)
11.8 Expressing a Function in Terms of Wavelets  355(1)
11.9 Designing a Wavelet System  356(1)
11.10 Applications  357(1)
11.11 Bibliographic Notes  357(1)
11.12 Exercises  357(3)
12 Background Material  360(51)
12.1 Definitions and Notation  360(1)
12.2 Useful Relations  361(4)
12.3 Useful Inequalities  365(7)
12.4 Probability  372(8)
12.5 Bounds on Tail Probability  380(6)
12.6 Applications of the Tail Bound  386(1)
12.7 Eigenvalues and Eigenvectors  387(13)
12.8 Generating Functions  400(4)
12.9 Miscellaneous  404(3)
12.10 Exercises  407(4)
References 411(10)
Index 421
Avrim Blum is Chief Academic Officer at the Toyota Technological Institute at Chicago and formerly Professor at Carnegie Mellon University, Pennsylvania. He has over 25,000 citations for his work in algorithms and machine learning. He has received the AI Journal Classic Paper Award, the ICML/COLT 10-Year Best Paper Award, a Sloan Fellowship, an NSF NYI award, and the Herb Simon Teaching Award, and is a Fellow of the Association for Computing Machinery (ACM).

John Hopcroft is a member of the National Academy of Sciences and the National Academy of Engineering, and a foreign member of the Chinese Academy of Sciences. He received the Turing Award in 1986, was appointed to the National Science Board in 1992 by President George H. W. Bush, and was presented with the Friendship Award by Premier Li Keqiang for his work in China.

Ravi Kannan is Principal Researcher at Microsoft Research, India. He was the recipient of the Fulkerson Prize in Discrete Mathematics (1991) and the ACM Knuth Prize (2011). He is a distinguished alumnus of the Indian Institute of Technology, Bombay, and his past faculty appointments include the Massachusetts Institute of Technology, Carnegie Mellon University, Pennsylvania, Yale University, Connecticut, and the Indian Institute of Science.