Foreword | xiii
Introduction | xv
1 The Importance of Regression in People Analytics | 1
1.1 Why is regression modeling so important in people analytics? | 2
1.2 What do we mean by 'modeling'? | 3
1.2.1 The theory of inferential modeling | 3
1.2.2 The process of inferential modeling | 5
1.3 The structure, system and organization of this book | 6
2 The Basics of the R Programming Language | 9
2.1 What is R? | 10
2.2 How to start using R | 10
2.3 Data in R | 11
2.3.1 Data types | 13
2.3.2 Homogeneous data structures | 14
2.3.3 Heterogeneous data structures | 16
2.4 Working with dataframes | 18
2.4.1 Loading and tidying data in dataframes | 18
2.4.2 Manipulating dataframes | 22
2.5 Functions, packages and libraries | 24
2.5.1 Using functions | 24
2.5.2 Help with functions | 25
2.5.3 Writing your own functions | 26
2.5.4 Installing packages | 26
2.5.5 Using packages | 27
2.5.6 The pipe operator | 28
2.6 Errors, warnings and messages | 29
2.7 Plotting and graphing | 31
2.7.1 Plotting in base R | 31
2.7.2 Specialist plotting and graphing packages | 33
2.8 Documenting your work using R Markdown | 34
2.9 Learning exercises | 37
2.9.1 Discussion questions | 37
2.9.2 Data exercises | 38
3 Statistics Foundations | 39
3.1 Elementary descriptive statistics of populations and samples | 40
3.1.1 Mean, variance and standard deviation | 40
3.1.2 Covariance and correlation | 43
3.2 Distribution of random variables | 46
3.2.1 Sampling of random variables | 46
3.2.2 Standard errors, the t-distribution and confidence intervals | 47
3.3 Hypothesis testing | 49
3.3.1 Testing for a difference in means (Welch's t-test) | 51
3.3.2 Testing for a non-zero correlation between two variables (t-test for correlation) | 54
3.3.3 Testing for a difference in frequency distribution between different categories in a data set (Chi-square test) | 56
3.4 Foundational statistics in Python | 58
3.5 Learning exercises | 62
3.5.1 Discussion questions | 62
3.5.2 Data exercises | 63
4 Linear Regression for Continuous Outcomes | 65
4.1 When to use it | 65
4.1.1 Origins and intuition of linear regression | 65
4.1.2 Use cases for linear regression | 66
4.1.3 Walkthrough example | 67
4.2 Simple linear regression | 69
4.2.1 Linear relationship between a single input and an outcome | 70
4.2.2 Minimising the error | 70
4.2.3 Determining the best fit | 73
4.2.4 Measuring the fit of the model | 74
4.3 Multiple linear regression | 76
4.3.1 Running a multiple linear regression model and interpreting its coefficients | 76
4.3.2 Coefficient confidence | 77
4.3.3 Model 'goodness-of-fit' | 78
4.3.4 Making predictions from your model | 81
4.4 Managing inputs in linear regression | 82
4.4.1 Relevance of input variables | 83
4.4.2 Sparseness ('missingness') of data | 83
4.4.3 Transforming categorical inputs to dummy variables | 84
4.5 Testing your model assumptions | 86
4.5.1 Assumption of linearity and additivity | 86
4.5.2 Assumption of constant error variance | 88
4.5.3 Assumption of normally distributed errors | 89
4.5.4 Avoiding high collinearity and multicollinearity between input variables | 90
4.6 Extending multiple linear regression | 93
4.6.1 Interactions between input variables | 93
4.6.2 Quadratic and higher-order polynomial terms | 96
4.7 Learning exercises | 97
4.7.1 Discussion questions | 97
4.7.2 Data exercises | 97
5 Binomial Logistic Regression for Binary Outcomes | 101
5.1 When to use it | 102
5.1.1 Origins and intuition of binomial logistic regression | 102
5.1.2 Use cases for binomial logistic regression | 103
5.1.3 Walkthrough example | 104
5.2 Modeling probabilistic outcomes using a logistic function | 106
5.2.1 Deriving the concept of log odds | 107
5.2.2 Modeling the log odds and interpreting the coefficients | 109
5.2.3 Odds versus probability | 110
5.3 Running a multivariate binomial logistic regression model | 112
5.3.1 Running and interpreting a multivariate binomial logistic regression model | 113
5.3.2 Understanding the fit and goodness-of-fit of a binomial logistic regression model | 116
5.3.3 Model parsimony | 120
5.4 Other considerations in binomial logistic regression | 122
5.5 Learning exercises | 124
5.5.1 Discussion questions | 124
5.5.2 Data exercises | 124
6 Multinomial Logistic Regression for Nominal Category Outcomes | 127
6.1 When to use it | 127
6.1.1 Intuition for multinomial logistic regression | 127
6.1.2 Use cases for multinomial logistic regression | 128
6.1.3 Walkthrough example | 128
6.2 Running stratified binomial models | 131
6.2.1 Modeling the choice of Product A versus other products | 131
6.2.2 Modeling other choices | 133
6.3 Running a multinomial regression model | 133
6.3.1 Defining a reference level and running the model | 134
6.3.2 Interpreting the model | 136
6.3.3 Changing the reference | 137
6.4 Model simplification, fit and goodness-of-fit for multinomial logistic regression models | 138
6.4.1 Gradual safe elimination of variables | 138
6.4.2 Model fit and goodness-of-fit | 139
6.5 Learning exercises | 140
6.5.1 Discussion questions | 140
6.5.2 Data exercises | 141
7 Proportional Odds Logistic Regression for Ordered Category Outcomes | 143
7.1 When to use it | 143
7.1.1 Intuition for proportional odds logistic regression | 143
7.1.2 Use cases for proportional odds logistic regression | 145
7.1.3 Walkthrough example | 145
7.2 Modeling ordinal outcomes under the assumption of proportional odds | 148
7.2.1 Using a latent continuous outcome variable to derive a proportional odds model | 148
7.2.2 Running a proportional odds logistic regression model | 150
7.2.3 Calculating the likelihood of an observation being in a specific ordinal category | 153
7.2.4 Model diagnostics | 154
7.3 Testing the proportional odds assumption | 155
7.3.1 Sighting the coefficients of stratified binomial models | 156
7.3.2 The Brant-Wald test | 157
7.3.3 Alternatives to proportional odds models | 158
7.4 Learning exercises | 159
7.4.1 Discussion questions | 159
7.4.2 Data exercises | 160
8 Modeling Explicit and Latent Hierarchy in Data | 163
8.1 Mixed models for explicit hierarchy in data | 164
8.1.1 Fixed and random effects | 164
8.1.2 Running a mixed model | 165
8.2 Structural equation models for latent hierarchy in data | 170
8.2.1 Running and assessing the measurement model | 173
8.2.2 Running and interpreting the structural model | 180
8.3 Learning exercises | 185
8.3.1 Discussion questions | 185
8.3.2 Data exercises | 185
9 Survival Analysis for Modeling Singular Events Over Time | 187
9.1 Tracking and illustrating survival rates over the study period | 189
9.2 Cox proportional hazard regression models | 193
9.2.1 Running a Cox proportional hazard regression model | 194
9.2.2 Checking the proportional hazard assumption | 196
9.3 Frailty models | 197
9.4 Learning exercises | 200
9.4.1 Discussion questions | 200
9.4.2 Data exercises | 201
10 Alternative Technical Approaches in R and Python | 203
10.1 'Tidier' modeling approaches in R | 204
10.1.1 The broom package | 204
10.1.2 The parsnip package | 208
10.2 Inferential statistical modeling in Python | 209
10.2.1 Ordinary Least Squares (OLS) linear regression | 209
10.2.2 Binomial logistic regression | 211
10.2.3 Multinomial logistic regression | 212
10.2.4 Structural equation models | 213
10.2.5 Survival analysis | 215
10.2.6 Other model variants | 218
11 Power Analysis to Estimate Required Sample Sizes for Modeling | 221
11.1 Errors, effect sizes and statistical power | 222
11.2 Power analysis for simple hypothesis tests | 224
11.3 Power analysis for linear regression models | 228
11.4 Power analysis for log-likelihood regression models | 229
11.5 Power analysis for hierarchical regression models | 231
11.6 Power analysis using Python | 232
12 Further Exercises for Practice | 235
12.1 Analyzing graduate salaries | 235
12.1.1 The graduates data set | 236
12.1.2 Discussion questions | 236
12.1.3 Data exercises | 236
12.2 Analyzing a recruiting process | 237
12.2.1 The recruiting data set | 238
12.2.2 Discussion questions | 238
12.2.3 Data exercises | 239
12.3 Analyzing the drivers of performance ratings | 239
12.3.1 The employee performance data set | 240
12.3.2 Discussion questions | 240
12.3.3 Data exercises | 241
12.4 Analyzing promotion differences between groups | 241
12.4.1 The promotion data set | 242
12.4.2 Discussion questions | 242
12.4.3 Data exercises | 242
12.5 Analyzing feedback on learning programs | 243
12.5.1 The learning data set | 243
12.5.2 Discussion questions | 244
12.5.3 Data exercises | 244
References | 247
Glossary | 249
Index | 253