I Classification  1 (66)

    3 (18)
1.1 Classification: The Big Ideas  4 (3)
1.1.1 The Error Rate and Other Summaries of Performance  4 (1)
1.1.2 More Detailed Evaluation  5 (1)
1.1.3 Overfitting and Cross-Validation  6 (1)
1.2 Classifying with Nearest Neighbors  7 (3)
1.2.1 Practical Considerations for Nearest Neighbors  8 (2)
    10 (6)
1.3.1 Cross-Validation to Choose a Model  13 (2)
    15 (1)
    16 (5)
1.4.1 Remember These Terms  16 (1)
1.4.2 Remember These Facts  16 (1)
1.4.3 Remember These Procedures  17 (1)
    17 (4)

2 SVMs and Random Forests  21 (28)
2.1 The Support Vector Machine  21 (13)
    22 (2)
    24 (1)
2.1.3 Finding a Classifier with Stochastic Gradient Descent  25 (2)
    27 (2)
2.1.5 Summary: Training with Stochastic Gradient Descent  29 (1)
2.1.6 Example: Adult Income with an SVM  30 (3)
2.1.7 Multiclass Classification with SVMs  33 (1)
2.2 Classifying with Random Forests  34 (10)
2.2.1 Building a Decision Tree  35 (3)
2.2.2 Choosing a Split with Information Gain  38 (3)
    41 (1)
2.2.4 Building and Evaluating a Decision Forest  41 (1)
2.2.5 Classifying Data Items with a Decision Forest  42 (2)
    44 (5)
2.3.1 Remember These Terms  44 (1)
2.3.2 Remember These Facts  44 (1)
2.3.3 Use These Procedures  45 (1)
    45 (4)

3 A Little Learning Theory  49 (18)
3.1 Held-Out Loss Predicts Test Loss  49 (4)
3.1.1 Sample Means and Expectations  50 (2)
3.1.2 Using Chebyshev's Inequality  52 (1)
3.1.3 A Generalization Bound  52 (1)
3.2 Test and Training Error for a Classifier from a Finite Family  53 (4)
3.2.1 Hoeffding's Inequality  54 (1)
3.2.2 Test from Training for a Finite Family of Predictors  55 (1)
3.2.3 Number of Examples Required  56 (1)
3.3 An Infinite Collection of Predictors  57 (7)
3.3.1 Predictors and Binary Functions  57 (4)
    61 (1)
3.3.3 Bounding the Generalization Error  62 (2)
    64 (5)
3.4.1 Remember These Terms  64 (1)
3.4.2 Remember These Facts  64 (1)
    65 (2)

II High Dimensional Data  67 (86)

    69 (24)
4.1 Summaries and Simple Plots  69 (8)
    70 (1)
4.1.2 Stem Plots and Scatterplot Matrices  70 (3)
    73 (1)
4.1.4 The Covariance Matrix  74 (3)
4.2 The Curse of Dimension  77 (2)
4.2.1 The Curse: Data Isn't Where You Think It Is  77 (1)
4.2.2 Minor Banes of Dimension  78 (1)
4.3 Using Mean and Covariance to Understand High Dimensional Data  79 (4)
4.3.1 Mean and Covariance Under Affine Transformations  80 (1)
4.3.2 Eigenvectors and Diagonalization  81 (1)
4.3.3 Diagonalizing Covariance by Rotating Blobs  82 (1)
4.4 The Multivariate Normal Distribution  83 (5)
4.4.1 Affine Transformations and Gaussians  84 (1)
4.4.2 Plotting a 2D Gaussian: Covariance Ellipses  85 (1)
4.4.3 Descriptive Statistics and Expectations  86 (1)
4.4.4 More from the Curse of Dimension  87 (1)
    88 (5)
4.5.1 Remember These Terms  88 (1)
4.5.2 Remember These Facts  88 (1)
4.5.3 Remember These Procedures  89 (4)

5 Principal Component Analysis  93 (24)
5.1 Representing Data on Principal Components  93 (12)
5.1.1 Approximating Blobs  93 (1)
5.1.2 Example: Transforming the Height-Weight Blob  94 (2)
5.1.3 Representing Data on Principal Components  96 (2)
5.1.4 The Error in a Low Dimensional Representation  98 (1)
5.1.5 Extracting a Few Principal Components with NIPALS  99 (2)
5.1.6 Principal Components and Missing Values  101 (2)
    103 (2)
5.2 Example: Representing Colors with Principal Components  105 (4)
5.3 Example: Representing Faces with Principal Components  109 (2)
    111 (6)
5.4.1 Remember These Terms  111 (1)
5.4.2 Remember These Facts  111 (1)
5.4.3 Remember These Procedures  111 (1)
    111 (6)

6 Low Rank Approximations  117 (22)
6.1 The Singular Value Decomposition  117 (5)
    119 (1)
6.1.2 SVD and Low Rank Approximations  120 (1)
6.1.3 Smoothing with the SVD  120 (2)
6.2 Multidimensional Scaling  122 (4)
6.2.1 Choosing Low D Points Using High D Distances  122 (1)
6.2.2 Using a Low Rank Approximation to Factor  123 (1)
6.2.3 Example: Mapping with Multidimensional Scaling  124 (2)
6.3 Example: Text Models and Latent Semantic Analysis  126 (10)
6.3.1 The Cosine Distance  127 (1)
6.3.2 Smoothing Word Counts  128 (2)
6.3.3 Example: Mapping NIPS Documents  130 (1)
6.3.4 Obtaining the Meaning of Words  130 (3)
6.3.5 Example: Mapping NIPS Words  133 (1)
    134 (2)
    136 (3)
6.4.1 Remember These Terms  136 (1)
6.4.2 Remember These Facts  136 (1)
6.4.3 Remember These Procedures  136 (1)
    136 (3)

7 Canonical Correlation Analysis  139 (14)
7.1 Canonical Correlation Analysis  139 (3)
7.2 Example: CCA of Words and Pictures  142 (2)
7.3 Example: CCA of Albedo and Shading  144 (6)
7.3.1 Are Correlations Significant?  148 (2)
    150 (5)
7.4.1 Remember These Terms  150 (1)
7.4.2 Remember These Facts  150 (1)
7.4.3 Remember These Procedures  150 (1)
    150 (3)

III Clustering  153 (50)

    155 (28)
8.1 Agglomerative and Divisive Clustering  155 (4)
8.1.1 Clustering and Distance  157 (2)
8.2 The k-Means Algorithm and Variants  159 (12)
    163 (1)
    164 (2)
8.2.3 Efficient Clustering and Hierarchical k-Means  166 (1)
    167 (1)
8.2.5 Example: Groceries in Portugal  167 (3)
8.2.6 General Comments on k-Means  170 (1)
8.3 Describing Repetition with Vector Quantization  171 (7)
8.3.1 Vector Quantization  172 (3)
8.3.2 Example: Activity from Accelerometer Data  175 (3)
    178 (5)
8.4.1 Remember These Terms  178 (1)
8.4.2 Remember These Facts  178 (1)
8.4.3 Remember These Procedures  178 (5)

9 Clustering Using Probability Models  183 (20)
9.1 Mixture Models and Clustering  183 (5)
9.1.1 A Finite Mixture of Blobs  184 (1)
9.1.2 Topics and Topic Models  185 (3)
    188 (10)
9.2.1 Example: Mixture of Normals: The E-step  189 (2)
9.2.2 Example: Mixture of Normals: The M-step  191 (1)
9.2.3 Example: Topic Model: The E-step  192 (1)
9.2.4 Example: Topic Model: The M-step  193 (1)
    193 (5)
    198 (7)
9.3.1 Remember These Terms  198 (1)
9.3.2 Remember These Facts  198 (1)
9.3.3 Remember These Procedures  198 (1)
    198 (5)

IV Regression  203 (100)

    205 (40)
    205 (3)
10.1.1 Regression to Spot Trends  206 (2)
10.2 Linear Regression and Least Squares  208 (10)
    209 (1)
    210 (2)
    212 (1)
    212 (2)
10.2.5 Transforming Variables  214 (3)
10.2.6 Can You Trust Your Regression?  217 (1)
10.3 Visualizing Regressions to Find Problems  218 (7)
10.3.1 Problem Data Points Have Significant Impact  219 (3)
10.3.2 The Hat Matrix and Leverage  222 (1)
    223 (1)
10.3.4 Standardized Residuals  224 (1)
10.4 Many Explanatory Variables  225 (11)
10.4.1 Functions of One Explanatory Variable  227 (1)
10.4.2 Regularizing Linear Regressions  227 (5)
10.4.3 Example: Weight Against Body Measurements  232 (4)
    236 (9)
10.5.1 Remember These Terms  236 (1)
10.5.2 Remember These Facts  236 (1)
10.5.3 Remember These Procedures  236 (1)
    237 (8)

11 Regression: Choosing and Managing Models  245 (30)
11.1 Model Selection: Which Model Is Best?  245 (8)
    246 (2)
11.1.2 Choosing a Model Using Penalties: AIC and BIC  248 (2)
11.1.3 Choosing a Model Using Cross-Validation  250 (1)
11.1.4 Greedy Search with Stagewise Regression  251 (1)
11.1.5 What Variables Are Important?  252 (1)
    253 (5)
11.2.1 M-Estimators and Iteratively Reweighted Least Squares  254 (3)
11.2.2 Scale for M-Estimators  257 (1)
11.3 Generalized Linear Models  258 (4)
11.3.1 Logistic Regression  258 (2)
11.3.2 Multiclass Logistic Regression  260 (1)
11.3.3 Regressing Count Data  261 (1)
    262 (1)
11.4 L1 Regularization and Sparse Models  262 (9)
11.4.1 Dropping Variables with L1 Regularization  263 (4)
    267 (3)
11.4.3 Using Sparsity Penalties with Other Models  270 (1)
    271 (4)
11.5.1 Remember These Terms  271 (1)
11.5.2 Remember These Facts  271 (1)
11.5.3 Remember These Procedures  272 (3)

    275 (28)
12.1 Greedy and Stagewise Methods for Regression  276 (8)
12.1.1 Example: Greedy Stagewise Linear Regression  276 (3)
    279 (1)
12.1.3 Greedy Stagewise Regression with Trees  279 (5)
12.2 Boosting a Classifier  284 (15)
    284 (2)
12.2.2 Recipe: Stagewise Reduction of Loss  286 (2)
12.2.3 Example: Boosting Decision Stumps  288 (1)
12.2.4 Gradient Boost with Decision Stumps  289 (1)
12.2.5 Gradient Boost with Other Predictors  290 (1)
12.2.6 Example: Is a Prescriber an Opiate Prescriber?  291 (2)
12.2.7 Pruning the Boosted Predictor with the Lasso  293 (1)
12.2.8 Gradient Boosting Software  294 (5)
    299 (6)
12.3.1 Remember These Definitions  299 (1)
12.3.2 Remember These Terms  299 (1)
12.3.3 Remember These Facts  299 (1)
12.3.4 Remember These Procedures  300 (1)
    300 (3)

V Graphical Models  303 (62)

    305 (28)
    305 (11)
13.1.1 Transition Probability Matrices  309 (2)
13.1.2 Stationary Distributions  311 (2)
13.1.3 Example: Markov Chain Models of Text  313 (3)
13.2 Hidden Markov Models and Dynamic Programming  316 (7)
13.2.1 Hidden Markov Models  316 (1)
13.2.2 Picturing Inference with a Trellis  317 (3)
13.2.3 Dynamic Programming for HMMs: Formalities  320 (1)
13.2.4 Example: Simple Communication Errors  321 (2)
    323 (6)
13.3.1 When the States Have Meaning  324 (1)
13.3.2 Learning an HMM with EM  324 (5)
    329 (4)
13.4.1 Remember These Terms  329 (1)
13.4.2 Remember These Facts  330 (1)
    330 (3)

14 Learning Sequence Models Discriminatively  333 (18)
    333 (5)
14.1.1 Inference and Graphs  334 (2)
    336 (1)
14.1.3 Learning in Graphical Models  337 (1)
14.2 Conditional Random Field Models for Sequences  338 (5)
14.2.1 MEMMs and Label Bias  339 (2)
14.2.2 Conditional Random Field Models  341 (1)
14.2.3 Learning a CRF Takes Care  342 (1)
14.3 Discriminative Learning of CRFs  343 (5)
14.3.1 Representing the Model  343 (1)
14.3.2 Example: Modelling a Sequence of Digits  344 (1)
14.3.3 Setting Up the Learning Problem  345 (1)
14.3.4 Evaluating the Gradient  346 (2)
    348 (3)
14.4.1 Remember These Terms  348 (1)
14.4.2 Remember These Procedures  348 (1)
    348 (3)

    351 (14)
15.1 Useful but Intractable Models  351 (7)
15.1.1 Denoising Binary Images with Boltzmann Machines  352 (1)
15.1.2 A Discrete Markov Random Field  353 (1)
15.1.3 Denoising and Segmenting with Discrete MRFs  354 (3)
15.1.4 MAP Inference in Discrete MRFs Can Be Hard  357 (1)
15.2 Variational Inference  358 (3)
    359 (1)
15.2.2 The Variational Free Energy  360 (1)
15.3 Example: Variational Inference for Boltzmann Machines  361 (3)
    364 (3)
15.4.1 Remember These Terms  364 (1)
15.4.2 Remember These Facts  364 (1)
    364 (1)

VI Deep Networks  365 (114)

16 Simple Neural Networks  367 (32)
16.1 Units and Classification  367 (5)
16.1.1 Building a Classifier out of Units: The Cost Function  368 (1)
16.1.2 Building a Classifier out of Units: Strategy  369 (1)
16.1.3 Building a Classifier out of Units: Training  370 (2)
16.2 Example: Classifying Credit Card Accounts  372 (5)
    377 (6)
    377 (2)
16.3.2 Jacobians and the Gradient  379 (1)
16.3.3 Setting up Multiple Layers  380 (1)
16.3.4 Gradients and Backpropagation  381 (2)
16.4 Training Multilayer Networks  383 (11)
16.4.1 Software Environments  385 (1)
16.4.2 Dropout and Redundant Units  386 (1)
16.4.3 Example: Credit Card Accounts Revisited  387 (3)
16.4.4 Advanced Tricks: Gradient Scaling  390 (4)
    394 (5)
16.5.1 Remember These Terms  394 (1)
16.5.2 Remember These Facts  394 (1)
16.5.3 Remember These Procedures  394 (1)
    395 (4)

17 Simple Image Classifiers  399 (24)
17.1 Image Classification  399 (9)
17.1.1 Pattern Detection by Convolution  401 (6)
17.1.2 Convolutional Layers upon Convolutional Layers  407 (1)
17.2 Two Practical Image Classifiers  408 (12)
17.2.1 Example: Classifying MNIST  410 (2)
17.2.2 Example: Classifying CIFAR-10  412 (6)
17.2.3 Quirks: Adversarial Examples  418 (2)
    420 (3)
17.3.1 Remember These Definitions  420 (1)
17.3.2 Remember These Terms  420 (1)
17.3.3 Remember These Facts  420 (1)
17.3.4 Remember These Procedures  420 (1)
    420 (3)

18 Classifying Images and Detecting Objects  423 (32)
18.1 Image Classification  423 (15)
18.1.1 Datasets for Classifying Images of Objects  424 (2)
18.1.2 Datasets for Classifying Images of Scenes  426 (1)
18.1.3 Augmentation and Ensembles  427 (1)
    428 (2)
    430 (2)
18.1.6 Batch Normalization  432 (1)
18.1.7 Computation Graphs  433 (1)
18.1.8 Inception Networks  434 (2)
    436 (2)
    438 (9)
18.2.1 How Object Detectors Work  438 (2)
    440 (1)
18.2.3 R-CNN, Fast R-CNN and Faster R-CNN  441 (2)
    443 (2)
18.2.5 Evaluating Detectors  445 (2)
    447 (2)
    449 (6)
18.4.1 Remember These Terms  449 (1)
18.4.2 Remember These Facts  449 (1)
    450 (5)

19 Small Codes for Big Signals  455 (24)
19.1 Better Low Dimensional Maps  455 (5)
    456 (1)
    457 (3)
19.2 Maps That Make Low-D Representations  460 (9)
19.2.1 Encoders, Decoders, and Autoencoders  461 (1)
19.2.2 Making Data Blocks Bigger  462 (3)
19.2.3 The Denoising Autoencoder  465 (4)
19.3 Generating Images from Examples  469 (6)
19.3.1 Variational Autoencoders  470 (1)
19.3.2 Adversarial Losses: Fooling a Classifier  471 (2)
19.3.3 Matching Distributions with Test Functions  473 (1)
19.3.4 Matching Distributions by Looking at Distances  474 (1)
    475 (4)
19.4.1 Remember These Terms  475 (1)
19.4.2 Remember These Facts  476 (1)
    476 (3)

Index  479 (6)
Index: Useful Facts  485 (2)
Index: Procedures  487 (2)
Index: Worked Examples  489 (2)
Index: Remember This  491