E-book: Molecular Evolution: A Statistical Approach [Oxford Scholarship Online E-books]

Ziheng Yang (RA Fisher Professor of Statistical Genetics, Department of Genetics, Evolution and Environment, University College London)
  • Format: 512 pages
  • Publication date: 29-May-2014
  • Publisher: Oxford University Press
  • ISBN-13: 9780199602605
  • Oxford Scholarship Online E-books
  • Price unknown
Studies of evolution at the molecular level have experienced phenomenal growth in the last few decades, due to rapid accumulation of genetic sequence data, improved computer hardware and software, and the development of sophisticated analytical methods. The flood of genomic data has generated an acute need for powerful statistical methods and efficient computational algorithms to enable their effective analysis and interpretation.

Molecular Evolution: A Statistical Approach presents and explains modern statistical methods and computational algorithms for the comparative analysis of genetic sequence data in the fields of molecular evolution, molecular phylogenetics, statistical phylogeography, and comparative genomics. Written by an expert in the field, the book emphasizes conceptual understanding rather than mathematical proofs. The text is enlivened with numerous examples of real data analysis and numerical calculations to illustrate the theory, in addition to the working problems at the end of each chapter. The coverage of maximum likelihood and Bayesian methods is, in particular, up to date, comprehensive, and authoritative.

This advanced textbook is aimed at graduate-level students and professional researchers (both empiricists and theoreticians) in the fields of bioinformatics and computational biology, statistical genomics, evolutionary biology, molecular systematics, and population genetics. It will also be of relevance and use to a wider audience of applied statisticians, mathematicians, and computer scientists working in computational biology.
1 Models of nucleotide substitution
1.1 Introduction
1.2 Markov models of nucleotide substitution and distance estimation
1.2.1 The JC69 model
1.2.2 The K80 model
1.2.3 HKY85, F84, TN93, etc.
1.2.4 The transition/transversion rate ratio
1.3 Variable substitution rates across sites
1.4 Maximum likelihood estimation of distance
1.4.1 The JC69 model
1.4.2 The K80 model
1.4.3 Likelihood ratio test of substitution models
1.4.4 Profile and integrated likelihood methods
1.5 Markov chains and distance estimation under general models
1.5.1 Markov chains
1.5.2 Distance under the unrestricted (UNREST) model
1.5.3 Distance under the general time-reversible model
1.6 Discussions
1.6.1 Distance estimation under different substitution models
1.6.2 Limitations of pairwise comparison
1.7 Problems
2 Models of amino acid and codon substitution
2.1 Introduction
2.2 Models of amino acid replacement
2.2.1 Empirical models
2.2.2 Mechanistic models
2.2.3 Among-site heterogeneity
2.3 Estimation of distance between two protein sequences
2.3.1 The Poisson model
2.3.2 Empirical models
2.3.3 Gamma distances
2.4 Models of codon substitution
2.4.1 The basic model
2.4.2 Variations and extensions
2.5 Estimation of dS and dN
2.5.1 Counting methods
2.5.2 Maximum likelihood method
2.5.3 Comparison of methods
2.5.4 More distances and interpretation of the dN/dS ratio
2.5.5 Estimation of dS and dN in comparative genomics
2.5.6 Distances based on the physical-site definition
2.5.7 Utility of the distance measures
2.6 Numerical calculation of the transition probability matrix
2.7 Problems
3 Phylogeny reconstruction: overview
3.1 Tree concepts
3.1.1 Terminology
3.1.2 Species trees and gene trees
3.1.3 Classification of tree reconstruction methods
3.2 Exhaustive and heuristic tree search
3.2.1 Exhaustive tree search
3.2.2 Heuristic tree search
3.2.3 Branch swapping
3.2.4 Local peaks in the tree space
3.2.5 Stochastic tree search
3.3 Distance matrix methods
3.3.1 Least-squares method
3.3.2 Minimum evolution method
3.3.3 Neighbour-joining method
3.4 Maximum parsimony
3.4.1 Brief history
3.4.2 Counting the minimum number of changes on a tree
3.4.3 Weighted parsimony and dynamic programming
3.4.4 Probabilities of ancestral states
3.4.5 Long-branch attraction
3.4.6 Assumptions of parsimony
3.5 Problems
4 Maximum likelihood methods
4.1 Introduction
4.2 Likelihood calculation on tree
4.2.1 Data, model, tree, and likelihood
4.2.2 The pruning algorithm
4.2.3 Time reversibility, the root of the tree, and the molecular clock
4.2.4 A numerical example: phylogeny of apes
4.2.5 Amino acid, codon, and RNA models
4.2.6 Missing data, sequence errors, and alignment gaps
4.3 Likelihood calculation under more complex models
4.3.1 Mixture models for variable rates among sites
4.3.2 Mixture models for pattern heterogeneity among sites
4.3.3 Partition models for combined analysis of multiple datasets
4.3.4 Nonhomogeneous and nonstationary models
4.4 Reconstruction of ancestral states
4.4.1 Overview
4.4.2 Empirical and hierarchical Bayesian reconstruction
4.4.3 Discrete morphological characters
4.4.4 Systematic biases in ancestral reconstruction
4.5 Numerical algorithms for maximum likelihood estimation
4.5.1 Univariate optimization
4.5.2 Multivariate optimization
4.6 ML optimization in phylogenetics
4.6.1 Optimization on a fixed tree
4.6.2 Multiple local peaks on the likelihood surface for a fixed tree
4.6.3 Search in the tree space
4.6.4 Approximate likelihood method
4.7 Model selection and robustness
4.7.1 Likelihood ratio test applied to rbcL dataset
4.7.2 Test of goodness of fit and parametric bootstrap
4.7.3 Diagnostic tests to detect model violations
4.7.4 Akaike information criterion (AIC and AICc)
4.7.5 Bayesian information criterion
4.7.6 Model adequacy and robustness
4.8 Problems
5 Comparison of phylogenetic methods and tests on trees
5.1 Statistical performance of tree reconstruction methods
5.1.1 Criteria
5.1.2 Performance
5.2 Likelihood
5.2.1 Contrast with conventional parameter estimation
5.2.2 Consistency
5.2.3 Efficiency
5.2.4 Robustness
5.3 Parsimony
5.3.1 Equivalence with misbehaved likelihood models
5.3.2 Equivalence with well-behaved likelihood models
5.3.3 Assumptions and justifications
5.4 Testing hypotheses concerning trees
5.4.1 Bootstrap
5.4.2 Interior-branch test
5.4.3 K-H test and related tests
5.4.4 Example: phylogeny of apes
5.4.5 Indexes used in parsimony analysis
5.5 Problems
6 Bayesian theory
6.1 Overview
6.2 The Bayesian paradigm
6.2.1 The Bayes theorem
6.2.2 The Bayes theorem in Bayesian statistics
6.2.3 Classical versus Bayesian statistics
6.3 Prior
6.3.1 Methods of prior specification
6.3.2 Conjugate priors
6.3.3 Flat or uniform priors
6.3.4 The Jeffreys priors
6.3.5 The reference priors
6.4 Methods of integration
6.4.1 Laplace approximation
6.4.2 Mid-point and trapezoid methods
6.4.3 Gaussian quadrature
6.4.4 Marginal likelihood calculation for JC69 distance estimation
6.4.5 Monte Carlo integration
6.4.6 Importance sampling
6.5 Problems
7 Bayesian computation (MCMC)
7.1 Markov chain Monte Carlo
7.1.1 Metropolis algorithm
7.1.2 Asymmetrical moves and proposal ratio
7.1.3 The transition kernel
7.1.4 Single-component Metropolis–Hastings algorithm
7.1.5 Gibbs sampler
7.2 Simple moves and their proposal ratios
7.2.1 Sliding window using the uniform proposal
7.2.2 Sliding window using the normal proposal
7.2.3 Bactrian proposal
7.2.4 Sliding window using the multivariate normal proposal
7.2.5 Proportional scaling
7.2.6 Proportional scaling with bounds
7.3 Convergence, mixing, and summary of MCMC
7.3.1 Convergence and tail behaviour
7.3.2 Mixing efficiency, jump probability, and step length
7.3.3 Validating and diagnosing MCMC algorithms
7.3.4 Potential scale reduction statistic
7.3.5 Summary of MCMC output
7.4 Advanced Monte Carlo methods
7.4.1 Parallel tempering (MC3)
7.4.2 Trans-model and trans-dimensional MCMC
7.4.3 Bayes factor and marginal likelihood
7.5 Problems
8 Bayesian phylogenetics
8.1 Overview
8.1.1 Historical background
8.1.2 A sketch MCMC algorithm
8.1.3 The statistical nature of phylogeny estimation
8.2 Models and priors in Bayesian phylogenetics
8.2.1 Priors on branch lengths
8.2.2 Priors on parameters in substitution models
8.2.3 Priors on tree topology
8.3 MCMC proposals in Bayesian phylogenetics
8.3.1 Within-tree moves
8.3.2 Cross-tree moves
8.3.3 NNI for unrooted trees
8.3.4 SPR for unrooted trees
8.3.5 TBR for unrooted trees
8.3.6 Subtree swapping
8.3.7 NNI for rooted trees
8.3.8 SPR on rooted trees
8.3.9 Node slider
8.4 Summarizing MCMC output
8.5 High posterior probabilities for trees
8.5.1 High posterior probabilities for trees or splits
8.5.2 Star tree paradox
8.5.3 Fair coin paradox, fair balance paradox, and Bayesian model selection
8.5.4 Conservative Bayesian phylogenetics
8.6 Problems
9 Coalescent theory and species trees
9.1 Overview
9.2 The coalescent model for a single species
9.2.1 The backward time machine
9.2.2 Fisher-Wright model and the neutral coalescent
9.2.3 A sample of n genes
9.2.4 Simulating the coalescent
9.2.5 Estimation of θ from a sample of DNA sequences
9.3 Population demographic process
9.3.1 Homogeneous and nonhomogeneous Poisson processes
9.3.2 Deterministic population size change
9.3.3 Nonparametric population demographic models
9.4 Multispecies coalescent, species trees and gene trees
9.4.1 Multispecies coalescent
9.4.2 Species tree–gene tree conflict
9.4.3 Estimation of species trees
9.4.4 Migration
9.5 Species delimitation
9.5.1 Species concept and species delimitation
9.5.2 Simple methods for analysing genetic data
9.5.3 Bayesian species delimitation
9.5.4 The impact of guide tree, prior, and migration
9.5.5 Pros and cons of Bayesian species delimitation
9.6 Problems
10 Molecular clock and estimation of species divergence times
10.1 Overview
10.2 Tests of the molecular clock
10.2.1 Relative-rate tests
10.2.2 Likelihood ratio test
10.2.3 Limitations of molecular clock tests
10.2.4 Index of dispersion
10.3 Likelihood estimation of divergence times
10.3.1 Global clock model
10.3.2 Local clock model
10.3.3 Heuristic rate-smoothing methods
10.3.4 Uncertainties in calibrations
10.3.5 Dating viral divergences
10.3.6 Dating primate divergences
10.4 Bayesian estimation of divergence times
10.4.1 General framework
10.4.2 Approximate calculation of likelihood
10.4.3 Prior on evolutionary rates
10.4.4 Prior on divergence times and fossil calibrations
10.4.5 Uncertainties in time estimates
10.4.6 Dating viral divergences
10.4.7 Application to primate and mammalian divergences
10.5 Perspectives
10.6 Problems
11 Neutral and adaptive protein evolution
11.1 Introduction
11.2 The neutral theory and tests of neutrality
11.2.1 The neutral and nearly neutral theories
11.2.2 Tajima's D statistic
11.2.3 Fu and Li's D, and Fay and Wu's H statistics
11.2.4 McDonald–Kreitman test and estimation of selective strength
11.2.5 Hudson–Kreitman–Aguadé test
11.3 Lineages undergoing adaptive evolution
11.3.1 Heuristic methods
11.3.2 Likelihood method
11.4 Amino acid sites undergoing adaptive evolution
11.4.1 Three strategies
11.4.2 Likelihood ratio test of positive selection under random-site models
11.4.3 Identification of sites under positive selection
11.4.4 Positive selection at the human MHC
11.5 Adaptive evolution affecting particular sites and lineages
11.5.1 Branch-site test of positive selection
11.5.2 Other similar models
11.5.3 Adaptive evolution in angiosperm phytochromes
11.6 Assumptions, limitations, and comparisons
11.6.1 Assumptions and limitations of current methods
11.6.2 Comparison of methods for detecting positive selection
11.7 Adaptively evolving genes
11.8 Problems
12 Simulating molecular evolution
12.1 Introduction
12.2 Random number generator
12.3 Generation of discrete random variables
12.3.1 Inversion method for sampling from a general discrete distribution
12.3.2 The alias method for sampling from a discrete distribution
12.3.3 Discrete uniform distribution
12.3.4 Binomial distribution
12.3.5 The multinomial distribution
12.3.6 The Poisson distribution
12.3.7 The composition method for mixture distributions
12.4 Generation of continuous random variables
12.4.1 The inversion method
12.4.2 The transformation method
12.4.3 The rejection method
12.4.4 Generation of a standard normal variate using the polar method
12.4.5 Gamma, beta, and Dirichlet variables
12.5 Simulation of Markov processes
12.5.1 Simulation of the Poisson process
12.5.2 Simulation of the nonhomogeneous Poisson process
12.5.3 Simulation of discrete-time Markov chains
12.5.4 Simulation of continuous-time Markov chains
12.6 Simulating molecular evolution
12.6.1 Simulation of sequences on a fixed tree
12.6.2 Simulation of random trees
12.7 Validation of the simulation program
12.8 Problems
Appendices
Appendix A Functions of random variables
Appendix B The delta technique
Appendix C Phylogenetic software
References
Index
Ziheng Yang is currently RA Fisher Professor of Statistical Genetics at University College London. He obtained a PhD in agronomy from Beijing Agricultural University in 1992 and then held postdoctoral research positions in the UK and US. He joined UCL in 1997, first as a lecturer, then as reader and professor. He teaches statistical genetics. He has published about 150 research papers and book chapters in molecular evolution, phylogenetics, population genetics, and computational biology. His program package PAML is widely used in the molecular evolution community. He was elected a Fellow of the Royal Society in 2006.