List of Tables | xiii | |
List of Figures | xv | |
Preface | xvii | |
1 Introduction | 1 | (31) |
| 3 | (4) |
| 7 | (3) |
| 10 | (17) |
1.3.1 Arithmetic Operations | 10 | (2) |
| 12 | (2) |
| 14 | (2) |
| 16 | (4) |
| 20 | (3) |
| 23 | (1) |
| 24 | (1) |
1.3.8 Programming and Learning Tips | 25 | (2) |
| 27 | (1) |
| 28 | (4) |
1.5.1 Bias in Self-Reported Turnout | 28 | (1) |
1.5.2 Understanding World Population Dynamics | 29 | (3) |
2 Causality | 32 | (43) |
2.1 Racial Discrimination in the Labor Market | 32 | (4) |
2.2 Subsetting the Data in R | 36 | (10) |
2.2.1 Logical Values and Operators | 37 | (2) |
2.2.2 Relational Operators | 39 | (1) |
| 40 | (3) |
2.2.4 Simple Conditional Statements | 43 | (1) |
| 44 | (2) |
2.3 Causal Effects and the Counterfactual | 46 | (2) |
2.4 Randomized Controlled Trials | 48 | (6) |
2.4.1 The Role of Randomization | 49 | (2) |
2.4.2 Social Pressure and Voter Turnout | 51 | (3) |
2.5 Observational Studies | 54 | (9) |
2.5.1 Minimum Wage and Unemployment | 54 | (3) |
| 57 | (3) |
2.5.3 Before-and-After and Difference-in-Differences Designs | 60 | (3) |
2.6 Descriptive Statistics for a Single Variable | 63 | (5) |
| 63 | (3) |
| 66 | (2) |
| 68 | (1) |
| 69 | (6) |
2.8.1 Efficacy of Small Class Size in Early Education | 69 | (2) |
2.8.2 Changing Minds on Gay Marriage | 71 | (2) |
2.8.3 Success of Leader Assassination as a Natural Experiment | 73 | (2) |
3 Measurement | 75 | (48) |
3.1 Measuring Civilian Victimization during Wartime | 75 | (3) |
3.2 Handling Missing Data in R | 78 | (2) |
3.3 Visualizing the Univariate Distribution | 80 | (8) |
| 80 | (1) |
| 81 | (4) |
| 85 | (2) |
3.3.4 Printing and Saving Graphs | 87 | (1) |
| 88 | (8) |
3.4.1 The Role of Randomization | 89 | (4) |
3.4.2 Nonresponse and Other Sources of Bias | 93 | (3) |
3.5 Measuring Political Polarization | 96 | (1) |
3.6 Summarizing Bivariate Relationships | 97 | (11) |
| 98 | (3) |
| 101 | (4) |
3.6.3 Quantile-Quantile Plot | 105 | (3) |
| 108 | (7) |
| 108 | (2) |
| 110 | (1) |
3.7.3 The k-Means Algorithm | 111 | (4) |
| 115 | (1) |
| 116 | (7) |
3.9.1 Changing Minds on Gay Marriage: Revisited | 116 | (2) |
3.9.2 Political Efficacy in China and Mexico | 118 | (2) |
3.9.3 Voting in the United Nations General Assembly | 120 | (3) |
4 Prediction | 123 | (66) |
4.1 Predicting Election Outcomes | 123 | (16) |
| 124 | (3) |
4.1.2 General Conditional Statements in R | 127 | (3) |
| 130 | (9) |
| 139 | (22) |
4.2.1 Facial Appearance and Election Outcomes | 139 | (2) |
4.2.2 Correlation and Scatter Plots | 141 | (2) |
| 143 | (5) |
4.2.4 Regression towards the Mean | 148 | (1) |
4.2.5 Merging Data Sets in R | 149 | (7) |
| 156 | (5) |
4.3 Regression and Causation | 161 | (20) |
4.3.1 Randomized Experiments | 162 | (3) |
4.3.2 Regression with Multiple Predictors | 165 | (5) |
4.3.3 Heterogenous Treatment Effects | 170 | (6) |
4.3.4 Regression Discontinuity Design | 176 | (5) |
| 181 | (1) |
| 182 | (7) |
4.5.1 Prediction Based on Betting Markets | 182 | (2) |
4.5.2 Election and Conditional Cash Transfer Program in Mexico | 184 | (3) |
4.5.3 Government Transfer and Poverty Reduction in Brazil | 187 | (2) |
5 Discovery | 189 | (53) |
| 189 | (16) |
5.1.1 The Disputed Authorship of The Federalist Papers | 189 | (5) |
5.1.2 Document-Term Matrix | 194 | (1) |
| 195 | (5) |
5.1.4 Authorship Prediction | 200 | (2) |
| 202 | (3) |
| 205 | (15) |
5.2.1 Marriage Network in Renaissance Florence | 205 | (2) |
5.2.2 Undirected Graph and Centrality Measures | 207 | (4) |
5.2.3 Twitter-Following Network | 211 | (2) |
5.2.4 Directed Graph and Centrality | 213 | (7) |
| 220 | (15) |
5.3.1 The 1854 Cholera Outbreak in London | 220 | (3) |
| 223 | (3) |
| 226 | (2) |
5.3.4 US Presidential Elections | 228 | (3) |
5.3.5 Expansion of Walmart | 231 | (2) |
| 233 | (2) |
| 235 | (1) |
| 236 | (6) |
5.5.1 Analyzing the Preambles of Constitutions | 236 | (2) |
5.5.2 International Trade Network | 238 | (1) |
5.5.3 Mapping US Presidential Election Results over Time | 239 | (3) |
6 Probability | 242 | (72) |
| 242 | (12) |
6.1.1 Frequentist versus Bayesian | 242 | (2) |
6.1.2 Definition and Axioms | 244 | (3) |
| 247 | (3) |
6.1.4 Sampling with and without Replacement | 250 | (2) |
| 252 | (2) |
6.2 Conditional Probability | 254 | (23) |
6.2.1 Conditional, Marginal, and Joint Probabilities | 254 | (7) |
| 261 | (5) |
| 266 | (2) |
6.2.4 Predicting Race Using Surname and Residence Location | 268 | (9) |
6.3 Random Variables and Probability Distributions | 277 | (23) |
| 278 | (1) |
6.3.2 Bernoulli and Uniform Distributions | 278 | (4) |
6.3.3 Binomial Distribution | 282 | (4) |
6.3.4 Normal Distribution | 286 | (6) |
6.3.5 Expectation and Variance | 292 | (4) |
6.3.6 Predicting Election Outcomes with Uncertainty | 296 | (4) |
6.4 Large Sample Theorems | 300 | (6) |
6.4.1 The Law of Large Numbers | 300 | (2) |
6.4.2 The Central Limit Theorem | 302 | (4) |
| 306 | (1) |
| 307 | (7) |
6.6.1 The Mathematics of Enigma | 307 | (2) |
6.6.2 A Probability Model for Betting Market Election Prediction | 309 | (1) |
6.6.3 Election Fraud in Russia | 310 | (4) |
7 Uncertainty | 314 | (83) |
| 314 | (28) |
7.1.1 Unbiasedness and Consistency | 315 | (7) |
| 322 | (4) |
7.1.3 Confidence Intervals | 326 | (6) |
7.1.4 Margin of Error and Sample Size Calculation in Polls | 332 | (4) |
7.1.5 Analysis of Randomized Controlled Trials | 336 | (3) |
7.1.6 Analysis Based on Student's t-Distribution | 339 | (3) |
| 342 | (28) |
7.2.1 Tea-Tasting Experiment | 342 | (4) |
7.2.2 The General Framework | 346 | (4) |
| 350 | (6) |
| 356 | (5) |
7.2.5 Pitfalls of Hypothesis Testing | 361 | (2) |
| 363 | (7) |
7.3 Linear Regression Model with Uncertainty | 370 | (19) |
7.3.1 Linear Regression as a Generative Model | 370 | (5) |
7.3.2 Unbiasedness of Estimated Coefficients | 375 | (3) |
7.3.3 Standard Errors of Estimated Coefficients | 378 | (2) |
7.3.4 Inference about Coefficients | 380 | (4) |
7.3.5 Inference about Predictions | 384 | (5) |
| 389 | (1) |
| 390 | (7) |
7.5.1 Sex Ratio and the Price of Agricultural Crops in China | 390 | (2) |
7.5.2 File Drawer and Publication Bias in Academic Research | 392 | (2) |
7.5.3 The 1932 German Election in the Weimar Republic | 394 | (3) |
8 Next | 397 | (4) |
General Index | 401 | (5) |
R Index | 406 | |