Preface |
|
xvii | |
Acknowledgments |
|
xxiii | |
About the Author |
|
xxv | |
|
I Qualitative Methodology |
|
|
1 | (66) |
|
1 Data in Action: A Model of a Dinner Party |
|
|
3 | (22) |
|
1.1 The User Data Disruption |
|
|
4 | (3) |
|
1.1.1 Don't Leave the Users out of the Model |
|
|
4 | (1) |
|
|
5 | (1) |
|
1.1.3 The Opposite of the Misguided Analyst: The Data Guru |
|
|
6 | (1) |
|
1.2 A Model of a Dinner Party |
|
|
7 | (6) |
|
1.2.1 Why Are Social Processes Difficult to Analyze? |
|
|
9 | (1) |
|
1.2.2 A Party Is a Process |
|
|
9 | (1) |
|
1.2.3 A Party Is an Open System |
|
|
10 | (1) |
|
1.2.4 A "Great" Party Is Hard to Define |
|
|
10 | (1) |
|
1.2.5 Party Guests' Motives and Opinions Are Often Unknown |
|
|
11 | (1) |
|
1.2.6 A Party Presents a Variable Search Problem |
|
|
12 | (1) |
|
1.2.7 The Real Secret to a Great Party Is Elusive |
|
|
12 | (1) |
|
1.3 What's Unique about User Data? |
|
|
13 | (10) |
|
1.3.1 Human Behavior Is a Process, Not a Problem |
|
|
13 | (2) |
|
1.3.2 No Clear and Defined Outcomes |
|
|
15 | (3) |
|
1.3.3 Social Systems Have Rampant Problems of Incomplete Information |
|
|
18 | (1) |
|
1.3.4 Social Systems Consist of Millions of Potential Behaviors |
|
|
18 | (1) |
|
1.3.5 Social Systems Are Often Open Systems |
|
|
19 | (1) |
|
1.3.6 Inferring Causation Is Almost Impossible |
|
|
20 | (3) |
|
1.4 Why Does Causation Matter? |
|
|
23 | (1) |
|
|
24 | (1) |
|
2 Building a Theory of the Social Universe |
|
|
25 | (22) |
|
|
25 | (11) |
|
2.1.1 Won't Fancy Algorithms Solve All Our Problems? |
|
|
26 | (1) |
|
2.1.2 The Pervasive (and Generally Useless) One-Off Fact |
|
|
26 | (1) |
|
2.1.3 The Art of the Typology |
|
|
27 | (1) |
|
2.1.4 The Project Design Process: Theory Building |
|
|
28 | (1) |
|
2.1.5 Steps to a Good Theory |
|
|
29 | (1) |
|
2.1.6 Description: Questions and Goals |
|
|
30 | (1) |
|
2.1.7 Analytical: Theory and Concepts |
|
|
31 | (1) |
|
2.1.8 Qualities of a "Good" Theory |
|
|
32 | (4) |
|
2.2 Conceptualization and Measurement |
|
|
36 | (4) |
|
|
36 | (2) |
|
|
38 | (1) |
|
2.2.3 Hypothesis Generation |
|
|
39 | (1) |
|
2.3 Theories from a Web Product |
|
|
40 | (6) |
|
2.3.1 User Type Purchasing Model |
|
|
40 | (1) |
|
2.3.2 Feed Algorithm Model |
|
|
41 | (1) |
|
2.3.3 Middle School Dance Model |
|
|
42 | (4) |
|
|
46 | (1) |
|
3 The Coveted Goalpost: How to Change Human Behavior |
|
|
47 | (20) |
|
3.1 Understanding Actionable Insight |
|
|
47 | (3) |
|
3.2 It's All about Changing "Your" Behavior |
|
|
50 | (5) |
|
3.2.1 Is It True Behavior Change? |
|
|
51 | (1) |
|
3.2.2 Quitting Smoking: The Herculean Task of Behavioral Change |
|
|
52 | (1) |
|
3.2.3 Measuring Behavior Change |
|
|
53 | (2) |
|
3.3 A Theory about Human Behavioral Change |
|
|
55 | (4) |
|
|
55 | (1) |
|
|
56 | (1) |
|
3.3.3 Randomized Variable Investment Schedule |
|
|
56 | (1) |
|
3.3.4 Outsized Positive Rewards and Mitigated Losses |
|
|
57 | (1) |
|
3.3.5 Fogg Model of Change |
|
|
57 | (1) |
|
3.3.6 ABA Model of Change |
|
|
58 | (1) |
|
3.4 Change in a Web Product |
|
|
59 | (2) |
|
3.5 What Are Realistic Expectations for Behavioral Change? |
|
|
61 | (5) |
|
3.5.1 What Percentage of Users Will See a Real Change in Our Product? |
|
|
61 | (1) |
|
3.5.2 Are Certain Behaviors Easier to Change? |
|
|
62 | (2) |
|
3.5.3 Behavioral Change Worksheet |
|
|
64 | (2) |
|
|
66 | (1) |
|
II Basic Statistical Methods |
|
|
67 | (70) |
|
4 Distributions in User Analytics |
|
|
69 | (16) |
|
4.1 Why Are Metrics Important? |
|
|
69 | (13) |
|
4.1.1 Statistical Tools for Metric Development |
|
|
70 | (1) |
|
|
70 | (1) |
|
4.1.3 Exploring a Distribution |
|
|
71 | (1) |
|
4.1.4 Mean, Median, and Mode |
|
|
72 | (3) |
|
|
75 | (2) |
|
|
77 | (1) |
|
|
78 | (1) |
|
4.1.8 The Exponential Distribution |
|
|
79 | (1) |
|
4.1.9 Bivariate Distribution |
|
|
80 | (2) |
|
|
82 | (3) |
|
5 Retained? Metric Creation and Interpretation |
|
|
85 | (22) |
|
5.1 Period, Age, and Cohort |
|
|
85 | (6) |
|
|
86 | (1) |
|
|
86 | (1) |
|
|
86 | (1) |
|
|
86 | (1) |
|
5.1.5 Period versus Cohort? |
|
|
87 | (1) |
|
|
88 | (1) |
|
|
89 | (1) |
|
|
90 | (1) |
|
|
91 | (15) |
|
|
92 | (1) |
|
|
93 | (1) |
|
5.2.3 Ratio-Based Metrics |
|
|
93 | (4) |
|
|
97 | (3) |
|
|
100 | (2) |
|
|
102 | (1) |
|
|
103 | (3) |
|
|
106 | (1) |
|
6 Why Are My Users Leaving? The Ins and Outs of A/B Testing |
|
|
107 | (30) |
|
|
107 | (2) |
|
6.2 The Curious Case of Free Weekly Events |
|
|
109 | (4) |
|
6.2.1 Spurious Correlation |
|
|
110 | (2) |
|
|
112 | (1) |
|
|
113 | (4) |
|
6.3.1 Proportional Comparisons |
|
|
113 | (1) |
|
6.3.2 Linear Correlations |
|
|
114 | (2) |
|
6.3.3 Nonlinear Relationships |
|
|
116 | (1) |
|
|
117 | (2) |
|
6.5 The Nuts and Bolts of an A/B Test |
|
|
119 | (13) |
|
|
119 | (1) |
|
|
120 | (2) |
|
|
122 | (1) |
|
|
123 | (7) |
|
|
130 | (2) |
|
6.6 Pitfalls in A/B Testing |
|
|
132 | (3) |
|
|
132 | (2) |
|
|
134 | (1) |
|
6.6.3 Differing Patterns Between Groups |
|
|
134 | (1) |
|
6.6.4 Differing Patterns in Long- and Short-Run Effects |
|
|
135 | (1) |
|
|
135 | (2) |
|
|
137 | (68) |
|
7 Modeling the User Space: k-Means and PCA |
|
|
139 | (12) |
|
|
139 | (1) |
|
7.2 Clustering Techniques |
|
|
140 | (10) |
|
7.2.1 Segmenting Users, Novice Users, and Unsupervised Learning |
|
|
141 | (9) |
|
|
150 | (1) |
|
8 Predicting User Behavior: Regression, Decision Trees, and Support Vector Machines |
|
|
151 | (22) |
|
|
151 | (1) |
|
8.2 Much Ado about Prediction? |
|
|
152 | (2) |
|
8.2.1 Applications of Predictive Algorithms |
|
|
153 | (1) |
|
8.2.2 Prediction in Behavioral Contexts Is Rarely Just Prediction |
|
|
154 | (1) |
|
|
154 | (15) |
|
8.3.1 Simple Explanation: Methods |
|
|
155 | (14) |
|
8.4 Validation of Supervised Learning Models |
|
|
169 | (3) |
|
8.4.1 k-Fold Cross-Validation |
|
|
169 | (1) |
|
8.4.2 Leave-One-Out Cross-Validation |
|
|
170 | (1) |
|
8.4.3 Precision, Recall, and the F1-Score |
|
|
170 | (2) |
|
|
172 | (1) |
|
|
172 | (1) |
|
9 Forecasting Population Changes in Product: Demographic Projections |
|
|
173 | (32) |
|
9.1 Why Should We Spend Time on the Product Life Cycle? |
|
|
174 | (1) |
|
9.2 Birth, Death, and the Full Life Cycle |
|
|
174 | (3) |
|
9.3 Different Models of Retention |
|
|
177 | (6) |
|
9.3.1 The Transition Matrix |
|
|
178 | (1) |
|
9.3.2 Snowmobile Transition Example |
|
|
179 | (4) |
|
9.4 The Art of Population Prediction |
|
|
183 | (20) |
|
9.4.1 Population Projection Example |
|
|
184 | (1) |
|
9.4.2 User Death by a Thousand Cuts |
|
|
185 | (8) |
|
9.4.3 Exponential Growth Example |
|
|
193 | (10) |
|
|
203 | (2) |
|
IV Causal Inference Methods |
|
|
205 | (72) |
|
10 In Pursuit of the Experiment: Natural Experiments and Difference-in-Difference Modeling |
|
|
207 | (18) |
|
10.1 Why Causal Inference? |
|
|
208 | (1) |
|
10.2 Causal Inference versus Prediction |
|
|
208 | (3) |
|
10.3 When A/B Testing Doesn't Work |
|
|
211 | (2) |
|
10.3.1 Broader Social Phenomena |
|
|
211 | (1) |
|
|
212 | (1) |
|
|
212 | (1) |
|
10.4 Nuts and Bolts of Causal Inference from Real-World Data |
|
|
213 | (9) |
|
10.4.1 Causal Inference Terminology |
|
|
213 | (1) |
|
10.4.2 Natural Experiments |
|
|
214 | (4) |
|
10.4.3 Operationalizing Geographic Space: Difference-in-Difference Modeling |
|
|
218 | (4) |
|
|
222 | (3) |
|
11 In Pursuit of the Experiment, Continued |
|
|
225 | (18) |
|
11.1 Regression Discontinuity |
|
|
226 | (3) |
|
11.1.1 Nuts and Bolts of RD |
|
|
226 | (1) |
|
11.1.2 Potential RD Designs |
|
|
226 | (1) |
|
11.1.3 The Enemy of the Good: Nonrandom Selection at the Cut Point |
|
|
227 | (1) |
|
|
228 | (1) |
|
|
229 | (1) |
|
11.2 Estimating the Causal Effect of Gaining a Badge |
|
|
229 | (5) |
|
|
230 | (2) |
|
11.2.2 Checking for Selection in Confounding Variables |
|
|
232 | (2) |
|
11.3 Interrupted Time Series |
|
|
234 | (4) |
|
11.3.1 Simple Regression Analysis |
|
|
235 | (1) |
|
11.3.2 Time-Series Modeling |
|
|
236 | (2) |
|
11.4 Seasonality Decomposition |
|
|
238 | (3) |
|
|
241 | (2) |
|
12 Developing Heuristics in Practice |
|
|
243 | (16) |
|
12.1 Determining Causation from Real-World Data |
|
|
243 | (1) |
|
12.2 Statistical Matching |
|
|
244 | (7) |
|
12.2.1 Basics of Matching |
|
|
244 | (1) |
|
12.2.2 What Features "Cause" a User to Buy? |
|
|
245 | (1) |
|
|
245 | (6) |
|
12.3 Problems with Propensity Score Matching |
|
|
251 | (2) |
|
12.3.1 Omitted Variable Bias and Better Matching Methods |
|
|
251 | (2) |
|
|
253 | (1) |
|
12.4 Matching as a Heuristic |
|
|
253 | (1) |
|
|
254 | (3) |
|
|
257 | (1) |
|
|
258 | (1) |
|
|
259 | (18) |
|
|
259 | (1) |
|
|
260 | (1) |
|
13.3 Understanding Uplift |
|
|
261 | (1) |
|
13.4 Prediction and Uplift |
|
|
261 | (1) |
|
13.5 Difficulties with Uplift |
|
|
262 | (13) |
|
|
263 | (1) |
|
|
263 | (1) |
|
13.5.3 Two-Model Approach |
|
|
264 | (1) |
|
|
265 | (1) |
|
13.5.5 Tree-Based Methods |
|
|
266 | (1) |
|
|
267 | (8) |
|
|
275 | (2) |
|
V Basic, Predictive, and Causal Inference Methods in R |
|
|
277 | (110) |
|
|
279 | (30) |
|
|
279 | (1) |
|
14.2 R Fundamentals: A Very Basic Introduction to R and Its Setup |
|
|
280 | (5) |
|
|
280 | (1) |
|
|
281 | (1) |
|
14.2.3 Installing Packages |
|
|
282 | (1) |
|
|
282 | (2) |
|
14.2.5 Reading Data into R |
|
|
284 | (1) |
|
|
284 | (1) |
|
14.3 Sampling from Distributions in R |
|
|
285 | (5) |
|
|
290 | (1) |
|
|
291 | (2) |
|
14.6 Calculating Variance and Higher Moments |
|
|
293 | (1) |
|
14.7 Histograms and Binning |
|
|
294 | (7) |
|
14.8 Bivariate Distribution and Correlation |
|
|
301 | (1) |
|
14.8.1 Calculating Metrics |
|
|
302 | (3) |
|
14.9 Parity Progression Ratios |
|
|
305 | (2) |
|
|
307 | (2) |
|
15 A/B Testing, Predictive Modeling, and Population Projection in R |
|
|
309 | (34) |
|
|
309 | (11) |
|
15.1.1 Statistical Testing |
|
|
310 | (8) |
|
|
318 | (2) |
|
|
320 | (4) |
|
|
320 | (1) |
|
15.2.2 Principal Components Analysis |
|
|
321 | (3) |
|
|
324 | (9) |
|
|
324 | (2) |
|
15.3.2 Logistic Regression |
|
|
326 | (1) |
|
|
327 | (3) |
|
15.3.4 Support Vector Machines |
|
|
330 | (1) |
|
|
331 | (2) |
|
15.4 Population Projection |
|
|
333 | (9) |
|
15.4.1 Example 1: User Death by a Thousand Cuts |
|
|
336 | (3) |
|
15.4.2 Example 2: The Exponential Growth Example |
|
|
339 | (3) |
|
|
342 | (1) |
|
16 Regression Discontinuity, Matching, and Uplift in R |
|
|
343 | (44) |
|
16.1 Difference-in-Difference Modeling |
|
|
343 | (3) |
|
16.2 Regression Discontinuity and Time-Series Modeling |
|
|
346 | (11) |
|
16.2.1 Regression Discontinuity |
|
|
347 | (5) |
|
16.2.2 Interrupted Time Series |
|
|
352 | (4) |
|
16.2.3 Seasonality Decomposition |
|
|
356 | (1) |
|
16.3 Statistical Matching |
|
|
357 | (13) |
|
|
366 | (1) |
|
|
367 | (1) |
|
|
368 | (2) |
|
|
370 | (13) |
|
16.4.1 Two-Model Solution |
|
|
371 | (1) |
|
|
372 | (3) |
|
16.4.3 Causal Conditional Inference Forest and Uplift Forest Models |
|
|
375 | (8) |
|
|
383 | (1) |
|
|
383 | (4) |
Conclusion |
|
387 | (4) |
Bibliography |
|
391 | (6) |
Index |
|
397 | |