Preface  xii
Acknowledgments  xiv
About this book  xvi
About the author  xix
About the cover illustration  xx
|
1  Optimizing systems by experiment  1
1.1  Examples of engineering workflows  3
     Machine learning engineer's workflow  3
     Quantitative trader's workflow  5
     Software engineer's workflow  6
1.2  Measuring by experiment  7
     Practical problems and pitfalls  8
1.3  Why are experiments necessary?  9
|
2  A/B testing: Evaluating a modification to your system  15
2.1  Take an ad hoc measurement  16
     Simulate the trading system  17
2.2  Take a precise measurement  22
     Mitigate measurement variation with replication  22
     Analyze your measurements  31
|
3  Multi-armed bandits: Maximizing business metrics while experimenting  42
3.1  Epsilon-greedy: Account for the impact of evaluation on business metrics  43
     A/B testing as a baseline  45
     The epsilon-greedy algorithm  52
3.2  Evaluating multiple system changes simultaneously  61
3.3  Thompson sampling: A more efficient MAB algorithm  67
     Estimate the probability that an arm is the best  68
     Randomized probability matching  74
|
4  Response surface methodology: Optimizing continuous parameters  83
4.1  Optimize a single continuous parameter  84
     Design: Choose parameter values to measure  85
     Analyze I: Interpolate between measurements  98
     Analyze II: Optimize the business metric  102
     Validate the optimal parameter value  103
4.2  Optimizing two or more continuous parameters  106
     Design the two-parameter experiment  108
     Measure, analyze, and validate the 2D experiment  111
|
5  Contextual bandits: Making targeted decisions  122
5.1  Model a business metric offline to make decisions online  123
     Model the business-metric outcome of a decision  124
     Add the decision-making component  128
     Run and evaluate the greedy recommender  129
5.2  Explore actions with epsilon-greedy  131
     Missing counterfactuals degrade predictions  132
     Explore with epsilon-greedy to collect counterfactuals  134
5.3  Explore parameters with Thompson sampling  137
     Create an ensemble of prediction models  139
     Randomized probability matching  143
5.4  Validate the contextual bandit  145
|
6  Bayesian optimization: Automating experimental optimization  148
6.1  Optimizing a single compiler parameter, a visual explanation  149
     Run the initial experiment  152
     Analyze: Model the response surface  153
     Design: Select the parameter value to measure next  154
     Design: Balance exploration with exploitation  156
6.2  Model the response surface with Gaussian process regression  159
     Estimate the expected CPU time  160
     Estimate uncertainty with GPR  167
6.3  Optimize over an acquisition function  171
     Minimize the acquisition function  171
6.4  Optimize all seven compiler parameters  173
     A complete Bayesian optimization  175
|
7  Managing business metrics  178
7.1  Focus on the business  179
7.2  Define business metrics  182
     Be specific to your business  182
     Update business metrics periodically  183
     Business metric timescales  184
7.3  Trade off multiple business metrics  186
     Reduce negative side effects  186
     Evaluate with multiple metrics  187
|
8  Practical considerations  190
8.1  Violations of statistical assumptions  191
     Violation of the iid assumption  191
8.3  Control family-wise error  198
     Cherry picking increases the false-positive rate  198
     Control false positives with the Bonferroni correction  200
8.4  Be aware of common biases  202
8.5  Replicate to validate results  206
     Validate complex experiments  206
     Monitor changes with a reverse A/B test  206
     Measure quarterly changes with holdouts  207

Appendix A  Linear regression and the normal equations  209
Appendix B  One factor at a time  214
Appendix C  Gaussian process regression  217
Index  221