Preface | xvii

1 Introduction | 1
1.1 Rise of Big Data and Dimensionality | 1
1.1.1 Biological sciences | 2
1.1.2 Health sciences | 4
1.1.3 Computer and information sciences | 5
1.1.4 Economics and finance | 7
1.1.5 Business and program evaluation | 9
1.1.6 Earth sciences and astronomy | 9
1.2 Impact of Big Data | 9
1.3 Impact of Dimensionality | 11
1.3.1 Computation | 11
1.3.2 Noise accumulation | 12
1.3.3 Spurious correlation | 14
1.3.4 Statistical theory | 17
1.4 Aim of High-dimensional Statistical Learning | 18
1.5 What big data can do | 19
1.6 Scope of the book | 19

2 Multiple and Nonparametric Regression | 21
2.1 Introduction | 21
2.2 Multiple Linear Regression | 21
2.2.1 The Gauss-Markov theorem | 23
2.2.2 Statistical tests | 26
2.3 Weighted Least-Squares | 27
2.4 Box-Cox Transformation | 29
2.5 Model Building and Basis Expansions | 30
2.5.1 Polynomial regression | 31
2.5.2 Spline regression | 32
2.5.3 Multiple covariates | 35
2.6 Ridge Regression | 37
2.6.1 Bias-variance tradeoff | 37
2.6.2 L2 penalized least squares | 38
2.6.3 Bayesian interpretation | 38
2.6.4 Ridge regression solution path | 39
2.6.5 Kernel ridge regression | 41
2.7 Regression in Reproducing Kernel Hilbert Space | 42
2.8 Leave-one-out and Generalized Cross-validation | 47
2.9 Exercises | 49

3 Introduction to Penalized Least-Squares | 55
3.1 Classical Variable Selection Criteria | 55
3.1.1 Subset selection | 55
3.1.2 Relation with penalized regression | 56
3.1.3 Selection of regularization parameters | 57
3.2 Folded-concave Penalized Least Squares | 59
3.2.1 Orthonormal designs | 61
3.2.2 Penalty functions | 62
3.2.3 Thresholding by SCAD and MCP | 63
3.2.4 Risk properties | 64
3.2.5 Characterization of folded-concave PLS | 65
3.3 Lasso and L1 Regularization | 66
3.3.1 Nonnegative garrote | 66
3.3.2 Lasso | 68
3.3.3 Adaptive Lasso | 71
3.3.4 Elastic Net | 72
3.3.5 Dantzig selector | 74
3.3.6 SLOPE and sorted penalties | 77
3.3.7 Concentration inequalities and uniform convergence | 78
3.3.8 A brief history of model selection | 81
3.4 Bayesian Variable Selection | 81
3.4.1 Bayesian view of the PLS | 81
3.4.2 A Bayesian framework for selection | 83
3.5 Numerical Algorithms | 84
3.5.1 Quadratic programs | 84
3.5.2 Least angle regression* | 86
3.5.3 Local quadratic approximations | 89
3.5.4 Local linear algorithm | 91
3.5.5 Penalized linear unbiased selection* | 92
3.5.6 Cyclic coordinate descent algorithms | 93
3.5.7 Iterative shrinkage-thresholding algorithms | 94
3.5.8 Projected proximal gradient method | 96
3.5.9 ADMM | 96
3.5.10 Iterative local adaptive majorization and minimization | 97
3.5.11 Other methods and timeline | 98
3.6 Regularization Parameters for PLS | 99
3.6.1 Degrees of freedom | 100
3.6.2 Extension of information criteria | 102
3.6.3 Application to PLS estimators | 102
3.7 Residual Variance and Refitted Cross-validation | 103
3.7.1 Residual variance of Lasso | 103
3.7.2 Refitted cross-validation | 104
3.8 Extensions to Nonparametric Modeling | 106
3.8.1 Structured nonparametric models | 106
3.8.2 Group penalty | 107
3.9 Applications | 109
3.10 Bibliographical Notes | 114
3.11 Exercises | 115

4 Penalized Least Squares: Properties | 121
4.1 Performance Benchmarks | 121
4.1.1 Performance measures | 122
4.1.2 Impact of model uncertainty | 125
4.1.2.1 Bayes lower bounds for orthogonal design | 126
4.1.2.2 Minimax lower bounds for general design | 130
4.1.3 Performance goals, sparsity and sub-Gaussian noise | 136
4.2 Penalized L0 Selection | 139
4.3 Lasso and Dantzig Selector | 145
4.3.1 Selection consistency | 146
4.3.2 Prediction and coefficient estimation errors | 150
4.3.3 Model size and least squares after selection | 161
4.3.4 Properties of the Dantzig selector | 167
4.3.5 Regularity conditions on the design matrix | 175
4.4 Properties of Concave PLS | 183
4.4.1 Properties of penalty functions | 185
4.4.2 Local and oracle solutions | 190
4.4.3 Properties of local solutions | 195
4.4.4 Global and approximate global solutions | 200
4.5 Smaller and Sorted Penalties | 206
4.5.1 Sorted concave penalties and their local approximation | 207
4.5.2 Approximate PLS with smaller and sorted penalties | 211
4.5.3 Properties of LLA and LCA | 220
4.6 Bibliographical Notes | 224
4.7 Exercises | 225

5 Generalized Linear Models and Penalized Likelihood | 227
5.1 Generalized Linear Models | 227
5.1.1 Exponential family | 227
5.1.2 Elements of generalized linear models | 230
5.1.3 Maximum likelihood | 231
5.1.4 Computing MLE: Iteratively reweighted least squares | 232
5.1.5 Deviance and analysis of deviance | 234
5.1.6 Residuals | 236
5.2 Examples | 238
5.2.1 Bernoulli and binomial models | 238
5.2.2 Models for count responses | 241
5.2.3 Models for nonnegative continuous responses | 243
5.2.4 Normal error models | 243
5.3 Sparsest Solution in High Confidence Set | 243
5.3.1 A general setup | 244
5.3.2 Examples | 244
5.3.3 Properties | 245
5.4 Variable Selection via Penalized Likelihood | 246
5.5 Algorithms | 249
5.5.1 Local quadratic approximation | 249
5.5.2 Local linear approximation | 250
5.5.3 Coordinate descent | 251
5.5.4 Iterative local adaptive majorization and minimization | 252
5.6 Tuning Parameter Selection | 252
5.7 An Application | 254
5.8 Sampling Properties in Low-dimension | 256
5.8.1 Notation and regularity conditions | 257
5.8.2 The oracle property | 258
5.8.3 Sampling properties with diverging dimensions | 260
5.8.4 Asymptotic properties of GIC selectors | 262
5.9 Properties under Ultrahigh Dimensions | 264
5.9.1 The Lasso penalized estimator and its risk property | 264
5.9.2 Strong oracle property | 268
5.9.3 Numerical studies | 273
5.10 Risk Properties | 274
5.11 Bibliographical Notes | 278
5.12 Exercises | 280

6 Penalized M-estimators | 287
6.1 Penalized Quantile Regression | 287
6.1.1 Quantile regression | 287
6.1.2 Variable selection in quantile regression | 289
6.1.3 A fast algorithm for penalized quantile regression | 291
6.2 Penalized Composite Quantile Regression | 294
6.3 Variable Selection in Robust Regression | 297
6.3.1 Robust regression | 297
6.3.2 Variable selection in Huber regression | 299
6.4 Rank Regression and Its Variable Selection | 301
6.4.1 Rank regression | 302
6.4.2 Penalized weighted rank regression | 302
6.5 Variable Selection for Survival Data | 303
6.5.1 Partial likelihood | 305
6.5.2 Variable selection via penalized partial likelihood and its properties | 306
6.6 Theory of Folded-concave Penalized M-estimator | 308
6.6.1 Conditions on penalty and restricted strong convexity | 309
6.6.2 Statistical accuracy of penalized M-estimator with folded concave penalties | 310
6.6.3 Computational accuracy | 314
6.7 Bibliographical Notes | 317
6.8 Exercises | 319

7 High Dimensional Inference | 321
7.1 Inference in Linear Regression | 322
7.1.1 Debias of regularized regression estimators | 323
7.1.2 Choices of weights | 325
7.1.3 Inference for the noise level | 327
7.2 Inference in Generalized Linear Models | 330
7.2.1 Desparsified Lasso | 331
7.2.2 Decorrelated score estimator | 332
7.2.3 Test of linear hypotheses | 335
7.2.4 Numerical comparison | 337
7.2.5 An application | 338
7.3 Asymptotic Efficiency* | 339
7.3.1 Statistical efficiency and Fisher information | 340
7.3.2 Linear regression with random design | 345
7.3.3 Partial linear regression | 351
7.4 Gaussian Graphical Models | 355
7.4.1 Inference via penalized least squares | 356
7.4.2 Sample size in regression and graphical models | 361
7.5 General Solutions* | 368
7.5.1 Local semi-LD decomposition | 368
7.5.2 Data swap | 370
7.5.3 Gradient approximation | 374
7.6 Bibliographical Notes | 376
7.7 Exercises | 377

8 Feature Screening | 381
8.1 Correlation Screening | 381
8.1.1 Sure screening property | 382
8.1.2 Connection to multiple comparison | 384
8.1.3 Iterative SIS | 385
8.2 Generalized and Rank Correlation Screening | 386
8.3 Feature Screening for Parametric Models | 389
8.3.1 Generalized linear models | 389
8.3.2 A unified strategy for parametric feature screening | 391
8.3.3 Conditional sure independence screening | 394
8.4 Nonparametric Screening | 395
8.4.1 Additive models | 395
8.4.2 Varying coefficient models | 396
8.4.3 Heterogeneous nonparametric models | 400
8.5 Model-free Feature Screening | 401
8.5.1 Sure independent ranking screening procedure | 401
8.5.2 Feature screening via distance correlation | 403
8.5.3 Feature screening for high-dimensional categorical data | 406
8.6 Screening and Selection | 409
8.6.1 Feature screening via forward regression | 409
8.6.2 Sparse maximum likelihood estimate | 410
8.6.3 Feature screening via partial correlation | 412
8.7 Refitted Cross-Validation | 417
8.7.1 RCV algorithm | 417
8.7.2 RCV in linear models | 418
8.7.3 RCV in nonparametric regression | 420
8.8 An Illustration | 423
8.9 Bibliographical Notes | 426
8.10 Exercises | 428

9 Covariance Regularization and Graphical Models | 431
9.1 Basic Facts about Matrices | 431
9.2 Sparse Covariance Matrix Estimation | 435
9.2.1 Covariance regularization by thresholding and banding | 435
9.2.2 Asymptotic properties | 438
9.2.3 Nearest positive definite matrices | 441
9.3 Robust Covariance Inputs | 443
9.4 Sparse Precision Matrix and Graphical Models | 446
9.4.1 Gaussian graphical models | 446
9.4.2 Penalized likelihood and M-estimation | 447
9.4.3 Penalized least-squares | 448
9.4.4 CLIME and its adaptive version | 451
9.5 Latent Gaussian Graphical Models | 456
9.6 Technical Proofs | 460
9.6.1 Proof of Theorem 9.1 | 460
9.6.2 Proof of Theorem 9.3 | 461
9.6.3 Proof of Theorem 9.4 | 462
9.6.4 Proof of Theorem 9.6 | 463
9.7 Bibliographical Notes | 465
9.8 Exercises | 466

10 Covariance Learning and Factor Models | 471
10.1 Principal Component Analysis | 471
10.1.1 Introduction to PCA | 471
10.1.2 Power method | 473
10.2 Factor Models and Structured Covariance Learning | 474
10.2.1 Factor model and high-dimensional PCA | 475
10.2.2 Extracting latent factors and POET | 478
10.2.3 Methods for selecting number of factors | 480
10.3 Covariance and Precision Learning with Known Factors | 483
10.3.1 Factor model with observable factors | 483
10.3.2 Robust initial estimation of covariance matrix | 485
10.4 Augmented Factor Models and Projected PCA | 488
10.5 Asymptotic Properties | 491
10.5.1 Properties for estimating loading matrix | 491
10.5.2 Properties for estimating covariance matrices | 493
10.5.3 Properties for estimating realized latent factors | 494
10.5.4 Properties for estimating idiosyncratic components | 495
10.6 Technical Proofs | 495
10.6.1 Proof of Theorem 10.1 | 495
10.6.2 Proof of Theorem 10.2 | 500
10.6.3 Proof of Theorem 10.3 | 501
10.6.4 Proof of Theorem 10.4 | 504
10.7 Bibliographical Notes | 506
10.8 Exercises | 507

11 Applications of Factor Models and PCA | 511
11.1 Factor-adjusted Regularized Model Selection | 511
11.1.1 Importance of factor adjustments | 512
11.1.2 FarmSelect | 513
11.1.3 Application to forecasting bond risk premia | 514
11.1.4 Application to neuroblastoma data | 516
11.1.5 Asymptotic theory for FarmSelect | 518
11.2 Factor-adjusted Robust Multiple Testing | 518
11.2.1 False discovery rate control | 519
11.2.2 Multiple testing under dependence measurements | 521
11.2.3 Power of factor adjustments | 523
11.2.4 FarmTest | 524
11.2.5 Application to neuroblastoma data | 526
11.3 Factor Augmented Regression Methods | 528
11.3.1 Principal component regression | 528
11.3.2 Augmented principal component regression | 530
11.3.3 Application to forecasting bond risk premia | 531
11.4 Applications to Statistical Machine Learning | 532
11.4.1 Community detection | 533
11.4.2 Topic model | 539
11.4.3 Matrix completion | 540
11.4.4 Item ranking | 542
11.4.5 Gaussian mixture models | 545
11.5 Bibliographical Notes | 548
11.6 Exercises | 550

12 Supervised Learning | 553
12.1 Model-based Classifiers | 553
12.1.1 Linear and quadratic discriminant analysis | 553
12.1.2 Logistic regression | 557
12.2 Kernel Density Classifiers and Naive Bayes | 559
12.3 Nearest Neighbor Classifiers | 563
12.4 Classification Trees and Ensemble Classifiers | 565
12.4.1 Classification trees | 565
12.4.2 Bagging | 567
12.4.3 Random forests | 569
12.4.4 Boosting | 571
12.5 Support Vector Machines | 575
12.5.1 The standard support vector machine | 575
12.5.2 Generalizations of SVMs | 578
12.6 Sparse Classifiers via Penalized Empirical Loss | 581
12.6.1 The importance of sparsity under high-dimensionality | 581
12.6.2 Sparse support vector machines | 583
12.6.3 Sparse large margin classifiers | 584
12.7 Sparse Discriminant Analysis | 586
12.7.1 Nearest shrunken centroids classifier | 588
12.7.2 Features annealed independent rule | 589
12.7.3 Selection bias of sparse independence rules | 591
12.7.4 Regularized optimal affine discriminant | 592
12.7.5 Linear programming discriminant | 593
12.7.6 Direct sparse discriminant analysis | 594
12.7.7 Solution path equivalence between ROAD and DSDA | 596
12.8 Feature Augmentation and Sparse Additive Classifiers | 597
12.8.1 Feature augmentation | 597
12.8.2 Penalized additive logistic regression | 599
12.8.3 Semiparametric sparse discriminant analysis | 600
12.9 Bibliographical Notes | 602
12.10 Exercises | 602

13 Unsupervised Learning | 607
13.1 Cluster Analysis | 607
13.1.1 K-means clustering | 608
13.1.2 Hierarchical clustering | 609
13.1.3 Model-based clustering | 611
13.1.4 Spectral clustering | 615
13.2 Data-driven Choices of the Number of Clusters | 617
13.3 Variable Selection in Clustering | 620
13.3.1 Sparse K-means clustering | 620
13.3.2 Sparse model-based clustering | 622
13.3.3 Sparse mixture of experts model | 624
13.4 An Introduction to High Dimensional PCA | 627
13.4.1 Inconsistency of the regular PCA | 627
13.4.2 Consistency under sparse eigenvector model | 628
13.5 Sparse Principal Component Analysis | 630
13.5.1 Sparse PCA | 630
13.5.2 An iterative SVD thresholding approach | 633
13.5.3 A penalized matrix decomposition approach | 635
13.5.4 A semidefinite programming approach | 636
13.5.5 A generalized power method | 637
13.6 Bibliographical Notes | 639
13.7 Exercises | 640

14 An Introduction to Deep Learning | 643
14.1 Rise of Deep Learning | 644
14.2 Feed-forward Neural Networks | 646
14.2.1 Model setup | 646
14.2.2 Back-propagation in computational graphs | 647
14.3 Popular Models | 650
14.3.1 Convolutional neural networks | 651
14.3.2 Recurrent neural networks | 654
14.3.2.1 Vanilla RNNs | 654
14.3.2.2 GRUs | 655
14.3.2.3 LSTMs | 656
14.3.3 Modules | 657
14.4 Deep Unsupervised Learning | 659
14.4.1 Autoencoders | 659
14.4.2 Generative adversarial networks | 662
14.4.2.1 Sampling view of GANs | 662
14.4.2.2 Minimum distance view of GANs | 663
14.5 Training deep neural nets | 665
14.5.1 Stochastic gradient descent | 666
14.5.1.1 Mini-batch SGD | 666
14.5.1.2 Momentum-based SGD | 667
14.5.1.3 SGD with adaptive learning rates | 667
14.5.2 Easing numerical instability | 668
14.5.2.1 ReLU activation function | 668
14.5.2.2 Skip connections | 669
14.5.2.3 Batch normalization | 669
14.5.3 Regularization techniques | 670
14.5.3.1 Weight decay | 670
14.5.3.2 Dropout | 670
14.5.3.3 Data augmentation | 671
14.6 Example: Image Classification | 671
14.7 Additional Examples using TensorFlow and R | 673
14.8 Bibliographical Notes | 680

References | 683
Author Index | 731
Index | 743