|
1 Models of nucleotide substitution |
|
|
1 | (34) |
|
|
1 | (3) |
|
1.2 Markov models of nucleotide substitution and distance estimation |
|
|
4 | (11) |
|
|
4 | (3) |
|
|
7 | (2) |
|
1.2.3 HKY85, F84, TN93, etc. |
|
|
9 | (4) |
|
1.2.4 The transition/transversion rate ratio |
|
|
13 | (2) |
|
1.3 Variable substitution rates across sites |
|
|
15 | (2) |
|
1.4 Maximum likelihood estimation of distance |
|
|
17 | (9) |
|
|
18 | (4) |
|
|
22 | (1) |
|
1.4.3 Likelihood ratio test of substitution models |
|
|
22 | (2) |
|
1.4.4 Profile and integrated likelihood methods |
|
|
24 | (2) |
|
1.5 Markov chains and distance estimation under general models |
|
|
26 | (6) |
|
|
26 | (1) |
|
1.5.2 Distance under the unrestricted (UNREST) model |
|
|
27 | (2) |
|
1.5.3 Distance under the general time-reversible model |
|
|
29 | (3) |
|
|
32 | (1) |
|
1.6.1 Distance estimation under different substitution models |
|
|
32 | (1) |
|
1.6.2 Limitations of pairwise comparison |
|
|
32 | (1) |
|
|
33 | (2) |
|
2 Models of amino acid and codon substitution |
|
|
35 | (35) |
|
|
35 | (1) |
|
2.2 Models of amino acid replacement |
|
|
35 | (5) |
|
|
35 | (4) |
|
|
39 | (1) |
|
2.2.3 Among-site heterogeneity |
|
|
39 | (1) |
|
2.3 Estimation of distance between two protein sequences |
|
|
40 | (2) |
|
|
40 | (1) |
|
|
41 | (1) |
|
|
41 | (1) |
|
2.4 Models of codon substitution |
|
|
42 | (5) |
|
|
42 | (2) |
|
2.4.2 Variations and extensions |
|
|
44 | (3) |
|
2.5 Estimation of dS and dN |
|
|
47 | (18) |
|
|
47 | (8) |
|
2.5.2 Maximum likelihood method |
|
|
55 | (2) |
|
2.5.3 Comparison of methods |
|
|
57 | (1) |
|
2.5.4 More distances and interpretation of the dN/dS ratio |
|
|
58 | (3) |
|
2.5.5 Estimation of dS and dN in comparative genomics |
|
|
61 | (2) |
|
2.5.6 Distances based on the physical-site definition |
|
|
63 | (2) |
|
2.5.7 Utility of the distance measures |
|
|
65 | (1) |
|
2.6 Numerical calculation of the transition probability matrix |
|
|
65 | (3) |
|
|
68 | (2) |
|
3 Phylogeny reconstruction: overview |
|
|
70 | (32) |
|
|
70 | (12) |
|
|
70 | (9) |
|
3.1.2 Species trees and gene trees |
|
|
79 | (2) |
|
3.1.3 Classification of tree reconstruction methods |
|
|
81 | (1) |
|
3.2 Exhaustive and heuristic tree search |
|
|
82 | (6) |
|
3.2.1 Exhaustive tree search |
|
|
82 | (1) |
|
3.2.2 Heuristic tree search |
|
|
82 | (2) |
|
|
84 | (2) |
|
3.2.4 Local peaks in the tree space |
|
|
86 | (2) |
|
3.2.5 Stochastic tree search |
|
|
88 | (1) |
|
3.3 Distance matrix methods |
|
|
88 | (7) |
|
3.3.1 Least-squares method |
|
|
89 | (2) |
|
3.3.2 Minimum evolution method |
|
|
91 | (1) |
|
3.3.3 Neighbour-joining method |
|
|
91 | (4) |
|
|
95 | (6) |
|
|
95 | (1) |
|
3.4.2 Counting the minimum number of changes on a tree |
|
|
95 | (1) |
|
3.4.3 Weighted parsimony and dynamic programming |
|
|
96 | (3) |
|
3.4.4 Probabilities of ancestral states |
|
|
99 | (1) |
|
3.4.5 Long-branch attraction |
|
|
99 | (1) |
|
3.4.6 Assumptions of parsimony |
|
|
100 | (1) |
|
|
101 | (1) |
|
4 Maximum likelihood methods |
|
|
102 | (51) |
|
|
102 | (1) |
|
4.2 Likelihood calculation on tree |
|
|
102 | (12) |
|
4.2.1 Data, model, tree, and likelihood |
|
|
102 | (1) |
|
4.2.2 The pruning algorithm |
|
|
103 | (4) |
|
4.2.3 Time reversibility, the root of the tree, and the molecular clock |
|
|
107 | (1) |
|
4.2.4 A numerical example: phylogeny of apes |
|
|
108 | (2) |
|
4.2.5 Amino acid, codon, and RNA models |
|
|
110 | (1) |
|
4.2.6 Missing data, sequence errors, and alignment gaps |
|
|
110 | (4) |
|
4.3 Likelihood calculation under more complex models |
|
|
114 | (11) |
|
4.3.1 Mixture models for variable rates among sites |
|
|
114 | (8) |
|
4.3.2 Mixture models for pattern heterogeneity among sites |
|
|
122 | (1) |
|
4.3.3 Partition models for combined analysis of multiple datasets |
|
|
123 | (2) |
|
4.3.4 Nonhomogeneous and nonstationary models |
|
|
125 | (1) |
|
4.4 Reconstruction of ancestral states |
|
|
125 | (8) |
|
|
125 | (2) |
|
4.4.2 Empirical and hierarchical Bayesian reconstruction |
|
|
127 | (3) |
|
4.4.3 Discrete morphological characters |
|
|
130 | (1) |
|
4.4.4 Systematic biases in ancestral reconstruction |
|
|
131 | (2) |
|
4.5 Numerical algorithms for maximum likelihood estimation |
|
|
133 | (5) |
|
4.5.1 Univariate optimization |
|
|
134 | (2) |
|
4.5.2 Multivariate optimization |
|
|
136 | (2) |
|
4.6 ML optimization in phylogenetics |
|
|
138 | (6) |
|
4.6.1 Optimization on a fixed tree |
|
|
138 | (1) |
|
4.6.2 Multiple local peaks on the likelihood surface for a fixed tree |
|
|
139 | (1) |
|
4.6.3 Search in the tree space |
|
|
140 | (3) |
|
4.6.4 Approximate likelihood method |
|
|
143 | (1) |
|
4.7 Model selection and robustness |
|
|
144 | (7) |
|
4.7.1 Likelihood ratio test applied to rbcL dataset |
|
|
144 | (2) |
|
4.7.2 Test of goodness of fit and parametric bootstrap |
|
|
146 | (1) |
|
4.7.3 Diagnostic tests to detect model violations |
|
|
147 | (1) |
|
4.7.4 Akaike information criterion (AIC and AICc) |
|
|
148 | (1) |
|
4.7.5 Bayesian information criterion |
|
|
149 | (1) |
|
4.7.6 Model adequacy and robustness |
|
|
150 | (1) |
|
|
151 | (2) |
|
5 Comparison of phylogenetic methods and tests on trees |
|
|
153 | (29) |
|
5.1 Statistical performance of tree reconstruction methods |
|
|
153 | (4) |
|
|
154 | (2) |
|
|
156 | (1) |
|
|
157 | (8) |
|
5.2.1 Contrast with conventional parameter estimation |
|
|
157 | (1) |
|
|
158 | (1) |
|
|
159 | (4) |
|
|
163 | (2) |
|
|
165 | (6) |
|
5.3.1 Equivalence with misbehaved likelihood models |
|
|
165 | (3) |
|
5.3.2 Equivalence with well-behaved likelihood models |
|
|
168 | (1) |
|
5.3.3 Assumptions and justifications |
|
|
169 | (2) |
|
5.4 Testing hypotheses concerning trees |
|
|
171 | (10) |
|
|
172 | (5) |
|
5.4.2 Interior-branch test |
|
|
177 | (1) |
|
5.4.3 K-H test and related tests |
|
|
178 | (1) |
|
5.4.4 Example: phylogeny of apes |
|
|
179 | (1) |
|
5.4.5 Indexes used in parsimony analysis |
|
|
180 | (1) |
|
|
181 | (1) |
|
|
182 | (32) |
|
|
182 | (1) |
|
6.2 The Bayesian paradigm |
|
|
183 | (14) |
|
|
183 | (1) |
|
6.2.2 The Bayes theorem in Bayesian statistics |
|
|
184 | (5) |
|
6.2.3 Classical versus Bayesian statistics |
|
|
189 | (8) |
|
|
197 | (6) |
|
6.3.1 Methods of prior specification |
|
|
197 | (1) |
|
|
198 | (1) |
|
6.3.3 Flat or uniform priors |
|
|
199 | (1) |
|
6.3.4 The Jeffreys priors |
|
|
200 | (2) |
|
6.3.5 The reference priors |
|
|
202 | (1) |
|
6.4 Methods of integration |
|
|
203 | (9) |
|
6.4.1 Laplace approximation |
|
|
203 | (1) |
|
6.4.2 Mid-point and trapezoid methods |
|
|
204 | (1) |
|
6.4.3 Gaussian quadrature |
|
|
205 | (1) |
|
6.4.4 Marginal likelihood calculation for JC69 distance estimation |
|
|
206 | (4) |
|
6.4.5 Monte Carlo integration |
|
|
210 | (1) |
|
6.4.6 Importance sampling |
|
|
210 | (2) |
|
|
212 | (2) |
|
7 Bayesian computation (MCMC) |
|
|
214 | (49) |
|
7.1 Markov chain Monte Carlo |
|
|
214 | (7) |
|
7.1.1 Metropolis algorithm |
|
|
214 | (4) |
|
7.1.2 Asymmetrical moves and proposal ratio |
|
|
218 | (1) |
|
7.1.3 The transition kernel |
|
|
219 | (1) |
|
7.1.4 Single-component Metropolis--Hastings algorithm |
|
|
220 | (1) |
|
|
221 | (1) |
|
7.2 Simple moves and their proposal ratios |
|
|
221 | (5) |
|
7.2.1 Sliding window using the uniform proposal |
|
|
222 | (1) |
|
7.2.2 Sliding window using the normal proposal |
|
|
223 | (1) |
|
|
223 | (1) |
|
7.2.4 Sliding window using the multivariate normal proposal |
|
|
224 | (1) |
|
7.2.5 Proportional scaling |
|
|
225 | (1) |
|
7.2.6 Proportional scaling with bounds |
|
|
226 | (1) |
|
7.3 Convergence, mixing, and summary of MCMC |
|
|
226 | (18) |
|
7.3.1 Convergence and tail behaviour |
|
|
226 | (4) |
|
7.3.2 Mixing efficiency, jump probability, and step length |
|
|
230 | (11) |
|
7.3.3 Validating and diagnosing MCMC algorithms |
|
|
241 | (1) |
|
7.3.4 Potential scale reduction statistic |
|
|
242 | (1) |
|
7.3.5 Summary of MCMC output |
|
|
243 | (1) |
|
7.4 Advanced Monte Carlo methods |
|
|
244 | (16) |
|
7.4.1 Parallel tempering (MC³) |
|
|
245 | (2) |
|
7.4.2 Trans-model and trans-dimensional MCMC |
|
|
247 | (9) |
|
7.4.3 Bayes factor and marginal likelihood |
|
|
256 | (4) |
|
|
260 | (3) |
|
|
263 | (45) |
|
|
263 | (3) |
|
8.1.1 Historical background |
|
|
263 | (1) |
|
8.1.2 A sketch MCMC algorithm |
|
|
264 | (1) |
|
8.1.3 The statistical nature of phylogeny estimation |
|
|
264 | (2) |
|
8.2 Models and priors in Bayesian phylogenetics |
|
|
266 | (13) |
|
8.2.1 Priors on branch lengths |
|
|
266 | (3) |
|
8.2.2 Priors on parameters in substitution models |
|
|
269 | (7) |
|
8.2.3 Priors on tree topology |
|
|
276 | (3) |
|
8.3 MCMC proposals in Bayesian phylogenetics |
|
|
279 | (16) |
|
|
279 | (2) |
|
|
281 | (3) |
|
8.3.3 NNI for unrooted trees |
|
|
284 | (3) |
|
8.3.4 SPR for unrooted trees |
|
|
287 | (2) |
|
8.3.5 TBR for unrooted trees |
|
|
289 | (2) |
|
|
291 | (1) |
|
8.3.7 NNI for rooted trees |
|
|
292 | (1) |
|
8.3.8 SPR on rooted trees |
|
|
293 | (1) |
|
|
294 | (1) |
|
8.4 Summarizing MCMC output |
|
|
295 | (1) |
|
8.5 High posterior probabilities for trees |
|
|
296 | (10) |
|
8.5.1 High posterior probabilities for trees or splits |
|
|
296 | (2) |
|
|
298 | (2) |
|
8.5.3 Fair coin paradox, fair balance paradox, and Bayesian model selection |
|
|
300 | (5) |
|
8.5.4 Conservative Bayesian phylogenetics |
|
|
305 | (1) |
|
|
306 | (2) |
|
9 Coalescent theory and species trees |
|
|
308 | (53) |
|
|
308 | (1) |
|
9.2 The coalescent model for a single species |
|
|
309 | (11) |
|
9.2.1 The backward time machine |
|
|
309 | (1) |
|
9.2.2 Fisher--Wright model and the neutral coalescent |
|
|
309 | (3) |
|
9.2.3 A sample of n genes |
|
|
312 | (3) |
|
9.2.4 Simulating the coalescent |
|
|
315 | (1) |
|
9.2.5 Estimation of θ from a sample of DNA sequences |
|
|
316 | (4) |
|
9.3 Population demographic process |
|
|
320 | (5) |
|
9.3.1 Homogeneous and nonhomogeneous Poisson processes |
|
|
321 | (1) |
|
9.3.2 Deterministic population size change |
|
|
322 | (1) |
|
9.3.3 Nonparametric population demographic models |
|
|
323 | (2) |
|
9.4 Multispecies coalescent, species trees and gene trees |
|
|
325 | (24) |
|
9.4.1 Multispecies coalescent |
|
|
325 | (6) |
|
9.4.2 Species tree--gene tree conflict |
|
|
331 | (4) |
|
9.4.3 Estimation of species trees |
|
|
335 | (8) |
|
|
343 | (6) |
|
|
349 | (10) |
|
9.5.1 Species concept and species delimitation |
|
|
349 | (2) |
|
9.5.2 Simple methods for analysing genetic data |
|
|
351 | (1) |
|
9.5.3 Bayesian species delimitation |
|
|
352 | (3) |
|
9.5.4 The impact of guide tree, prior, and migration |
|
|
355 | (3) |
|
9.5.5 Pros and cons of Bayesian species delimitation |
|
|
358 | (1) |
|
|
359 | (2) |
|
10 Molecular clock and estimation of species divergence times |
|
|
361 | (29) |
|
|
361 | (2) |
|
10.2 Tests of the molecular clock |
|
|
363 | (3) |
|
10.2.1 Relative-rate tests |
|
|
363 | (1) |
|
10.2.2 Likelihood ratio test |
|
|
364 | (1) |
|
10.2.3 Limitations of molecular clock tests |
|
|
365 | (1) |
|
10.2.4 Index of dispersion |
|
|
366 | (1) |
|
10.3 Likelihood estimation of divergence times |
|
|
366 | (9) |
|
10.3.1 Global clock model |
|
|
366 | (1) |
|
|
367 | (1) |
|
10.3.3 Heuristic rate-smoothing methods |
|
|
368 | (2) |
|
10.3.4 Uncertainties in calibrations |
|
|
370 | (2) |
|
10.3.5 Dating viral divergences |
|
|
372 | (1) |
|
10.3.6 Dating primate divergences |
|
|
373 | (2) |
|
10.4 Bayesian estimation of divergence times |
|
|
375 | (13) |
|
|
375 | (1) |
|
10.4.2 Approximate calculation of likelihood |
|
|
376 | (1) |
|
10.4.3 Prior on evolutionary rates |
|
|
377 | (1) |
|
10.4.4 Prior on divergence times and fossil calibrations |
|
|
378 | (4) |
|
10.4.5 Uncertainties in time estimates |
|
|
382 | (2) |
|
10.4.6 Dating viral divergences |
|
|
384 | (1) |
|
10.4.7 Application to primate and mammalian divergences |
|
|
385 | (3) |
|
|
388 | (1) |
|
|
389 | (1) |
|
11 Neutral and adaptive protein evolution |
|
|
390 | (28) |
|
|
390 | (1) |
|
11.2 The neutral theory and tests of neutrality |
|
|
391 | (7) |
|
11.2.1 The neutral and nearly neutral theories |
|
|
391 | (2) |
|
11.2.2 Tajima's D statistic |
|
|
393 | (1) |
|
11.2.3 Fu and Li's D, and Fay and Wu's H statistics |
|
|
394 | (1) |
|
11.2.4 McDonald--Kreitman test and estimation of selective strength |
|
|
395 | (2) |
|
11.2.5 Hudson--Kreitman--Aguadé test |
|
|
397 | (1) |
|
11.3 Lineages undergoing adaptive evolution |
|
|
398 | (2) |
|
|
398 | (1) |
|
|
399 | (1) |
|
11.4 Amino acid sites undergoing adaptive evolution |
|
|
400 | (8) |
|
|
400 | (2) |
|
11.4.2 Likelihood ratio test of positive selection under random-site models |
|
|
402 | (3) |
|
11.4.3 Identification of sites under positive selection |
|
|
405 | (1) |
|
11.4.4 Positive selection at the human MHC |
|
|
406 | (2) |
|
11.5 Adaptive evolution affecting particular sites and lineages |
|
|
408 | (3) |
|
11.5.1 Branch-site test of positive selection |
|
|
408 | (1) |
|
11.5.2 Other similar models |
|
|
409 | (1) |
|
11.5.3 Adaptive evolution in angiosperm phytochromes |
|
|
410 | (1) |
|
11.6 Assumptions, limitations, and comparisons |
|
|
411 | (3) |
|
11.6.1 Assumptions and limitations of current methods |
|
|
412 | (1) |
|
11.6.2 Comparison of methods for detecting positive selection |
|
|
413 | (1) |
|
11.7 Adaptively evolving genes |
|
|
414 | (2) |
|
|
416 | (2) |
|
12 Simulating molecular evolution |
|
|
418 | (24) |
|
|
418 | (1) |
|
12.2 Random number generator |
|
|
418 | (2) |
|
12.3 Generation of discrete random variables |
|
|
420 | (4) |
|
12.3.1 Inversion method for sampling from a general discrete distribution |
|
|
420 | (1) |
|
12.3.2 The alias method for sampling from a discrete distribution |
|
|
421 | (1) |
|
12.3.3 Discrete uniform distribution |
|
|
422 | (1) |
|
12.3.4 Binomial distribution |
|
|
423 | (1) |
|
12.3.5 The multinomial distribution |
|
|
423 | (1) |
|
12.3.6 The Poisson distribution |
|
|
423 | (1) |
|
12.3.7 The composition method for mixture distributions |
|
|
424 | (1) |
|
12.4 Generation of continuous random variables |
|
|
424 | (6) |
|
12.4.1 The inversion method |
|
|
425 | (1) |
|
12.4.2 The transformation method |
|
|
425 | (1) |
|
12.4.3 The rejection method |
|
|
425 | (3) |
|
12.4.4 Generation of a standard normal variate using the polar method |
|
|
428 | (2) |
|
12.4.5 Gamma, beta, and Dirichlet variables |
|
|
430 | (1) |
|
12.5 Simulation of Markov processes |
|
|
430 | (6) |
|
12.5.1 Simulation of the Poisson process |
|
|
430 | (1) |
|
12.5.2 Simulation of the nonhomogeneous Poisson process |
|
|
431 | (2) |
|
12.5.3 Simulation of discrete-time Markov chains |
|
|
433 | (2) |
|
12.5.4 Simulation of continuous-time Markov chains |
|
|
435 | (1) |
|
12.6 Simulating molecular evolution |
|
|
436 | (3) |
|
12.6.1 Simulation of sequences on a fixed tree |
|
|
436 | (3) |
|
12.6.2 Simulation of random trees |
|
|
439 | (1) |
|
12.7 Validation of the simulation program |
|
|
439 | (1) |
|
|
440 | (2) |
|
|
442 | (8) |
|
Appendix A Functions of random variables |
|
|
442 | (4) |
|
Appendix B The delta technique |
|
|
446 | (2) |
|
Appendix C Phylogenetic software |
|
|
448 | (2) |
References |
|
450 | (38) |
Index |
|
488 | |