Preface  xiii

1 Data Analysis  1
  1.1 Perspectives of Data Analysis  1
  1.2 Strategies and Stages of Data Analysis  3
    1.3.1 Heterogeneity in Data Sources  5
      1.3.1.1 Heterogeneity in Study Subject Populations  5
      1.3.1.2 Heterogeneity in Data due to Timing of Generations  5
    1.3.3 Spurious Correlation  6
  1.4 Data Sets Analyzed in This Book  7
    1.4.2 Riboflavin Production with Bacillus Subtilis  7
    1.4.4 The Boston Housing Data Set  8
2 Examining Data Distribution  9
    2.1.1 Histogram, Stem-and-Leaf, Density Plot  9
    2.1.3 Quantile-Quantile (Q-Q) Plot, Normal Plot, Probability-Probability (P-P) Plot  11
    2.2.2 Ellipse - Visualization of Covariance and Correlation  13
    2.2.3 Multivariate Normality Test  17
  2.3 More Than Two Dimensions  19
    2.3.1 Scatter Plot Matrix  19
  2.4 Visualization of Categorical Data  25
3 Regressions  29
    3.2.1 Example: Lasso on Continuous Data  31
    3.2.2 Example: Lasso on Binary Data  32
    3.2.3 Example: Lasso on Survival Data  33
    3.3.1 Example: Group Lasso on Gene Signatures  35
    3.4.1 Example: Lasso, Group Lasso, Sparse Group Lasso on Simulated Continuous Data  38
    3.4.2 Example: Lasso, Group Lasso, Sparse Group Lasso on Gene Signatures Continuous Data  41
    3.5.1 Example: Adaptive Lasso on Continuous Data  46
    3.5.2 Example: Adaptive Lasso on Binary Data  47
    3.6.1 Example: Elastic Net on Continuous Data  51
    3.6.2 Example: Elastic Net on Binary Data  52
  3.7 The Sure Screening Method  53
    3.7.1 The Sure Screening Method  54
    3.7.2 Sure Independence Screening on Model Selection  55
    3.7.3 Example: SIS on Continuous Data  56
    3.7.4 Example: SIS on Survival Data  56
  3.8 Identify Minimal Class of Models  57
    3.8.1 Analysis Using Minimal Models  58
4 Recursive Partitioning Modeling  59
  4.1 Recursive Partitioning Modeling via Trees  59
    4.1.1 Elements of Growing a Tree  59
    4.1.2 The Impurity Function  60
      4.1.2.1 Definition of Impurity Function  61
      4.1.2.2 Measure of Node Impurity - the Gini Index  61
    4.1.3 Misclassification Cost  61
    4.1.5 Example of Recursive Partitioning  63
      4.1.5.1 Recursive Partitioning with Binary Outcomes  63
      4.1.5.2 Recursive Partitioning with Continuous Outcomes  65
      4.1.5.3 Recursive Partitioning for Survival Outcomes  67
    4.2.1 Mechanism of Action of Random Forests  72
    4.2.2 Variable Importance  72
    4.2.3 Random Forests for Regression  73
    4.2.4 Example of Random Forest Data Analysis  73
      4.2.4.1 randomForest for Binary Data  73
      4.2.4.2 randomForest for Continuous Data  76
  4.3 Random Survival Forest  77
    4.3.1 Algorithm to Construct RSF  78
    4.3.2 Individual and Ensemble Estimate at Terminal Nodes  79
  4.4 XGBoost: A Tree Boosting System  81
    4.4.1 Example Using xgboost for Data Analysis  83
      4.4.1.1 xgboost for Binary Data  83
      4.4.1.2 xgboost for Continuous Data  84
    4.4.2 Example - xgboost for Cox Regression  87
  4.5 Model-based Recursive Partitioning  88
    4.5.1 The Recursive Partitioning Algorithm  89
  4.6 Recursive Partition for Longitudinal Data  91
    4.6.2 Recursive Partition for Longitudinal Data Based on Baseline Covariates  92
    4.6.4 Example of Recursive Partitioning of Longitudinal Data  93
  4.7 Analysis of Ordinal Data  95
  4.8 Examples - Analysis of Ordinal Data  96
    4.8.1 Analysis of Cleveland Clinic Heart Data (Ordinal)  96
    4.8.2 Analysis of Cleveland Clinic Heart Data (Twoing)  97
  4.9 Advantages and Disadvantages of Trees  99
5 Support Vector Machine  101
  5.1 General Theory of Classification and Regression in Hyperplane  101
      5.1.2.1 Method of Stochastic Approximation  103
      5.1.2.2 Method of Sigmoid Approximations  103
      5.1.2.3 Method of Radial Basis Functions  104
  5.2 SVM for Indicator Functions  104
    5.2.1 Optimal Hyperplane for Separable Data Sets  104
      5.2.1.1 Constructing the Optimal Hyperplane  105
    5.2.2 Optimal Hyperplane for Non-Separable Sets  106
      5.2.2.1 Generalization of the Optimal Hyperplane  106
    5.2.3 Support Vector Machine  108
      5.2.4.1 Polynomial Kernel Functions  110
      5.2.4.2 Radial Basis Kernel Functions  110
    5.2.5 Example: Analysis of Binary Classification Using SVM  110
    5.2.6 Example: Effect of Kernel Selection  112
  5.3 SVM for Continuous Data  112
    5.3.1 Minimizing the Risk with ε-insensitive Loss Functions  113
    5.3.2 Example: Regression Analysis Using SVM  115
  5.4 SVM for Survival Data Analysis  117
    5.4.1 Example: Analysis of Survival Data Using SVM  118
  5.5 Feature Elimination for SVM  119
    5.5.1 Example: Gene Selection via SVM with Feature Elimination  120
  5.6 Sparse Bayesian Learning with Relevance Vector Machine (RVM)  122
    5.6.1 Example: Regression Analysis Using RVM  125
    5.6.2 Example: Curve Fitting for SVM and RVM  125
  5.7 SV Machines for Function Estimation  127
6 Cluster Analysis  129
  6.1 Measure of Distance/Dissimilarity  129
    6.1.1 Continuous Variables  130
    6.1.2 Binary and Categorical Variables  130
    6.1.4 Other Measures of Dissimilarity  131
  6.2 Hierarchical Clustering  131
    6.2.2 Example of Hierarchical Clustering  133
    6.3.1 General Description of K-means Clustering  135
    6.3.2 Estimating the Number of Clusters  137
  6.4 The PAM Clustering Algorithm  139
    6.4.1 Example of K-means with PAM Clustering Algorithm  141
    6.5.1 Example of Bagged Clustering  142
  6.6 RandomForest for Clustering  144
    6.6.1 Example: Random Forest for Clustering  144
  6.7 Mixture Models/Model-based Cluster Analysis  145
  6.8 Stability of Clusters  147
    6.9.1 Determination of Clusters  148
    6.9.2 Example of Consensus Clustering on RNA Sequence Data  149
  6.10 The Integrative Clustering Framework  151
    6.10.1 Example: Integrative Clustering  152
7 Neural Network  155
  7.1 General Theory of Neural Network  155
  7.2 Elemental Aspects and Structure of Artificial Neural Networks  156
  7.3 Multilayer Perceptrons  157
    7.3.1 The Simple (Single Unit) Perceptron  157
    7.3.2 Training Perceptron Learning  157
  7.4 Multilayer Perceptrons (MLP)  158
    7.4.1 Architectures of MLP  158
    7.5.1 Model Parameterization  160
  7.6 A Few Pros and Cons of Neural Networks  161
8 Causal Inference and Matching  173
  8.2 Three Layer Causal Hierarchy  173
  8.3 Seven Tools of Causal Inference  174
  8.4 Statistical Framework of Causal Inferences  176
  8.6 Methodologies of Matching  178
    8.6.1 Nearest Neighbor (or greedy) Matching  178
      8.6.1.1 Example Using Nearest Neighbor Matching  178
    8.6.3 Mahalanobis Distance Matching  181
    8.8.1 Analysis of Data After Matching  188
9 Business  197
  9.1 Case Study One: Marketing Campaigns of a Portuguese Banking Institution  197
    9.1.1 Description of Data  197
      9.1.2.1 Analysis via Lasso  198
      9.1.2.2 Analysis via Elastic Net  198
      9.1.2.4 Analysis via rpart  200
      9.1.2.5 Analysis via randomForest  200
      9.1.2.6 Analysis via xgboost  202
  9.3 Case Study Two: Polish Companies Bankruptcy Data  204
    9.3.1 Description of Data  204
      9.3.2.1 Analysis of Year-1 Data (univariate analysis)  207
      9.3.2.2 Analysis of Year-3 Data (univariate analysis)  209
      9.3.2.3 Analysis of Year-5 Data (univariate analysis)  210
      9.3.2.4 Analysis of Year-1 Data (composite analysis)  212
      9.3.2.5 Analysis of Year-3 Data (composite analysis)  214
      9.3.2.6 Analysis of Year-5 Data (composite analysis)  216
10 Analysis of Response Profiles  221
  10.3 Transition of Response States  224
  10.4 Classification of Response Profiles  225
    10.4.1 Dissimilarities Between Response Profiles  225
    10.4.2 Visualizing Clusters via Multidimensional Scaling  226
    10.4.3 Response Profile Differences among Clusters  227
    10.4.4 Significant Clinical Variables for Each Cluster  228
  10.5 Modeling of Response Profiles via GEE  230
    10.5.2 Estimation of Marginal Regression Parameters  231
    10.5.4 Results of Modeling  231
Bibliography  235

Index  24