Big Data in Omics and Imaging: Integrated Analysis and Causal Inference [Hardback]

Momiao Xiong (University of Texas School of Public Health, USA)
  • Format: Hardback, 736 pages, height x width: 235x156 mm, weight: 1378 g, 30 Tables, black and white; 40 Illustrations, black and white
  • Series: Chapman & Hall/CRC Mathematical and Computational Biology
  • Publication date: 19-Jun-2018
  • Publisher: CRC Press Inc
  • ISBN-10: 0815387105
  • ISBN-13: 9780815387107
Big Data in Omics and Imaging: Integrated Analysis and Causal Inference addresses recent developments in integrated genomic, epigenomic and imaging data analysis and causal inference in the big data era. Despite significant progress in dissecting the genetic architecture of complex diseases through genome-wide association studies (GWAS), genome-wide expression studies (GWES), and epigenome-wide association studies (EWAS), the overall contribution of the newly identified genetic variants is small and a large fraction of genetic variants is still hidden. The etiology and the causal chain of mechanisms underlying complex diseases remain elusive. It is time to bring big data, machine learning and the causal revolution to bear on a new generation of genetic analysis, shifting the current paradigm from shallow association analysis to deep causal inference, and from genetic analysis alone to integrated omics and imaging data analysis, in order to unravel the mechanisms of complex diseases.
FEATURES
  • Provides a natural extension and companion volume to Big Data in Omics and Imaging: Association Analysis, but can be read independently.
  • Introduces causal inference theory to genomic, epigenomic and imaging data analysis.
  • Develops novel statistics for genome-wide causation studies and epigenome-wide causation studies.
  • Bridges the gap between traditional association analysis and modern causation analysis.
  • Uses combinatorial optimization methods and various causal models as a general framework for inferring multilevel omic and image causal networks.
  • Presents statistical methods and computational algorithms for searching causal paths from genetic variant to disease.
  • Develops causal machine learning methods integrating causal inference and machine learning.
  • Develops statistics for testing significant differences in directed edges, paths, and graphs, and for assessing causal relationships between two networks.
The book is designed for graduate students and researchers in genomics, epigenomics, medical imaging, bioinformatics, and data science. Topics covered are: mathematical formulation of causal inference, information geometry for causal inference, topology group and Haar measure, additive noise models, distance correlation, multivariate causal inference and causal networks, dynamic causal networks, multivariate and functional structural equation models, mixed structural equation models, causal inference with confounders, integer programming, deep learning and differential equations for wearable computing, genetic analysis of function-valued traits, RNA-seq data analysis, causal networks for genetic methylation analysis, gene expression and methylation deconvolution, cell-specific causal networks, deep learning for image segmentation and image analysis, imaging and genomic data analysis, and integrated multilevel causal genomic, epigenomic and imaging data analysis.
Preface
Author
1 Genotype-Phenotype Network Analysis
1.1 Undirected Graphs for Genotype Network
1.1.1 Gaussian Graphical Model
1.1.2 Alternating Direction Method of Multipliers for Estimation of Gaussian Graphical Model
1.1.3 Coordinate Descent Algorithm and Graphical Lasso
1.1.4 Multiple Graphical Models
1.1.4.1 Edge-Based Joint Estimation of Multiple Graphical Models
1.1.4.2 Node-Based Joint Estimation of Multiple Graphical Models
1.2 Directed Graphs and Structural Equation Models for Networks
1.2.1 Directed Acyclic Graphs
1.2.2 Linear Structural Equation Models
1.2.3 Estimation Methods
1.2.3.1 Maximum Likelihood (ML) Estimation
1.2.3.2 Two-Stage Least Squares Method
1.2.3.3 Three-Stage Least Squares Method
1.3 Sparse Linear Structural Equations
1.3.1 L1-Penalized Maximum Likelihood Estimation
1.3.2 L1-Penalized Two-Stage Least Square Estimation
1.3.3 L1-Penalized Three-Stage Least Square Estimation
1.4 Functional Structural Equation Models for Genotype-Phenotype Networks
1.4.1 Functional Structural Equation Models
1.4.2 Group Lasso and ADMM for Parameter Estimation in the Functional Structural Equation Models
1.5 Causal Calculus
1.5.1 Effect Decomposition and Estimation
1.5.2 Graphical Tools for Causal Inference in Linear SEMs
1.5.2.1 Basics
1.5.2.2 Wright's Rules of Tracing and Path Analysis
1.5.2.3 Partial Correlation, Regression, and Path Analysis
1.5.2.4 Conditional Independence and D-Separation
1.5.3 Identification and Single-Door Criterion
1.5.4 Instrumental Variables
1.5.5 Total Effects and Backdoor Criterion
1.5.6 Counterfactuals and Linear SEMs
1.6 Simulations and Real Data Analysis
1.6.1 Simulations for Model Evaluation
1.6.2 Application to Real Data Examples
Appendix 1.A
Appendix 1.B
Exercises
2 Causal Analysis and Network Biology
2.1 Bayesian Networks as a General Framework for Causal Inference
2.2 Parameter Estimation and Bayesian Dirichlet Equivalent Uniform Score for Discrete Bayesian Networks
2.3 Structural Equations and Score Metrics for Continuous Causal Networks
2.3.1 Multivariate SEMs for Generating Node Score Metrics
2.3.2 Mixed SEMs for Pedigree-Based Causal Inference
2.3.2.1 Mixed SEMs
2.3.2.2 Two-Stage Estimate for the Fixed Effects in the Mixed SEMs
2.3.2.3 Three-Stage Estimate for the Fixed Effects in the Mixed SEMs
2.3.2.4 The Full Information Maximum Likelihood Method
2.3.2.5 Reduced Form Representation of the Mixed SEMs
2.4 Bayesian Networks with Discrete and Continuous Variables
2.4.1 Two-Class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks
2.4.2 Multiple Network Penalized Functional Logistic Regression Models for NGS Data
2.4.3 Multi-Class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks
2.5 Other Statistical Models for Quantifying Node Score Function
2.5.1 Nonlinear Structural Equation Models
2.5.1.1 Nonlinear Additive Noise Models for Bivariate Causal Discovery
2.5.1.2 Nonlinear Structural Equations for Causal Network Discovery
2.5.2 Mixed Linear and Nonlinear Structural Equation Models
2.5.3 Jointly Interventional and Observational Data for Causal Inference
2.5.3.1 Structural Equation Model for Interventional and Observational Data
2.5.3.2 Maximum Likelihood Estimation of Structural Equation Models from Interventional and Observational Data
2.5.3.3 Sparse Structural Equation Models with Joint Interventional and Observational Data
2.6 Integer Programming for Causal Structure Learning
2.6.1 Introduction
2.6.2 Integer Linear Programming Formulation of DAG Learning
2.6.3 Cutting Plane for Integer Linear Programming
2.6.4 Branch-and-Cut Algorithm for Integer Linear Programming
2.6.5 Sink Finding Primal Heuristic Algorithm
2.7 Simulations and Real Data Analysis
2.7.1 Simulations
2.7.2 Real Data Analysis
Software Package
Appendix 2.A Introduction to Smoothing Splines
Appendix 2.B Penalized Likelihood Function for Jointly Observational and Interventional Data
Exercises
3 Wearable Computing and Genetic Analysis of Function-Valued Traits
3.1 Classification of Wearable Biosensor Data
3.1.1 Introduction
3.1.2 Functional Data Analysis for Classification of Time Course Wearable Biosensor Data
3.1.3 Differential Equations for Extracting Features of the Dynamic Process and for Classification of Time Course Data
3.1.3.1 Differential Equations with Constant and Time-Varying Parameters for Modeling a Dynamic System
3.1.3.2 Principal Differential Analysis for Estimation of Parameters in Differential Equations
3.1.3.3 QRS Complex Example
3.1.4 Deep Learning for Physiological Time Series Data Analysis
3.1.4.1 Procedures of Convolutional Neural Networks for Time Course Data Analysis
3.1.4.2 Convolution is a Powerful Tool for Linear Filter and Signal Processing
3.1.4.3 Architecture of CNNs
3.1.4.4 Convolutional Layer
3.1.4.5 Parameter Estimation
3.2 Association Studies of Function-Valued Traits
3.2.1 Introduction
3.2.2 Functional Linear Models with Both Functional Response and Predictors for Association Analysis of Function-Valued Traits
3.2.3 Test Statistics
3.2.4 Null Distribution of Test Statistics
3.2.5 Power
3.2.6 Real Data Analysis
3.2.7 Association Analysis of Multiple Function-Valued Traits
3.3 Gene-Gene Interaction Analysis of Function-Valued Traits
3.3.1 Introduction
3.3.2 Functional Regression Models
3.3.3 Estimation of Interaction Effect Function
3.3.4 Test Statistics
3.3.5 Simulations
3.3.5.1 Type 1 Error Rates
3.3.5.2 Power
3.3.6 Real Data Analysis
Appendix 3.A Gradient Methods for Parameter Estimation in the Convolutional Neural Networks
Exercises
4 RNA-Seq Data Analysis
4.1 Normalization Methods on RNA-Seq Data Analysis
4.1.1 Gene Expression
4.1.2 RNA Sequencing Expression Profiling
4.1.3 Methods for Normalization
4.1.3.1 Total Read Count Normalization
4.1.3.2 Upper Quantile Normalization
4.1.3.3 Relative Log Expression (RLE)
4.1.3.4 Trimmed Mean of M-Values (TMM)
4.1.3.5 RPKM, FPKM, and TPM
4.1.3.6 Isoform Expression Quantification
4.1.3.7 Allele-Specific Expression Estimation from RNA-Seq Data with Diploid Genomes
4.2 Differential Expression Analysis for RNA-Seq Data
4.2.1 Distribution-Based Approach to Differential Expression Analysis
4.2.1.1 Poisson Distribution
4.2.1.2 Negative Binomial Distribution
4.2.2 Functional Expansion Approach to Differential Expression Analysis of RNA-Seq Data
4.2.2.1 Functional Principal Component Expansion of RNA-Seq Data
4.2.3 Differential Analysis of Allele Specific Expressions with RNA-Seq Data
4.2.3.1 Single-Variate FPCA for Testing ASE or Differential Expression
4.2.3.2 Allele-Specific Differential Expression by Bivariate Functional Principal Component Analysis
4.2.3.3 Real Data Application
4.3 eQTL and eQTL Epistasis Analysis with RNA-Seq Data
4.3.1 Matrix Factorization
4.3.2 Quadratically Regularized Matrix Factorization and Canonical Correlation Analysis
4.3.3 QRFCCA for eQTL and eQTL Epistasis Analysis of RNA-Seq Data
4.3.3.1 QRFCCA for eQTL Analysis
4.3.3.2 Data Structure for Interaction Analysis
4.3.3.3 Multivariate Regression
4.3.3.4 CCA for Epistasis Analysis
4.3.4 Real Data Analysis
4.3.4.1 RNA-Seq Data and NGS Data
4.3.4.2 Cis-Trans Interactions
4.4 Gene Co-Expression Network and Gene Regulatory Networks
4.4.1 Co-Expression Network Construction with RNA-Seq Data by CCA and FCCA
4.4.1.1 CCA Methods for Construction of Gene Co-Expression Networks
4.4.1.2 Bivariate CCA for Construction of Co-Expression Networks with ASE Data
4.4.2 Graphical Gaussian Models
4.4.3 Real Data Applications
4.5 Directed Graph and Gene Regulatory Networks
4.5.1 General Procedures for Inferring Genome-Wide Regulatory Networks
4.5.2 Hierarchical Bayesian Networks for Whole Genome Regulatory Networks
4.5.2.1 Summary Statistics for Representation of Groups of Gene Expressions
4.5.2.2 Low Rank Representation Induced Causal Network
4.5.3 Linear Regulatory Networks
4.5.4 Nonlinear Regulatory Networks
4.6 Dynamic Bayesian Network and Longitudinal Expression Data Analysis
4.6.1 Dynamic Structural Equation Models with Time-Varying Structures and Parameters
4.6.2 Estimation and Inference for Dynamic Structural Equation Models with Time-Varying Structures and Parameters
4.6.2.1 Maximum Likelihood (ML) Estimation
4.6.2.2 Generalized Least Square Estimation
4.6.3 Sparse Dynamic Structural Equation Models
4.6.3.1 L1-Penalized Maximum Likelihood Estimation
4.6.3.2 L1-Penalized Generalized Least Square Estimator
4.7 Single Cell RNA-Seq Data Analysis, Gene Expression Deconvolution, and Genetic Screening
4.7.1 Cell Type Identification
4.7.2 Gene Expression Deconvolution and Cell Type-Specific Expression
4.7.2.1 Gene Expression Deconvolution Formulation
4.7.2.2 Loss Functions and Regularization
4.7.2.3 Algorithms for Fitting Generalized Low Rank Models
Software Package
Appendix 4.A Variational Bayesian Theory for Parameter Estimation and RNA-Seq Normalization
Appendix 4.B Log-linear Model for Differential Expression Analysis of the RNA-Seq Data with Negative Binomial Distribution
Appendix 4.C Derivation of ADMM Algorithm
Appendix 4.D Low Rank Representation Induced Sparse Structural Equation Models
Appendix 4.E Maximum Likelihood (ML) Estimation of Parameters for Dynamic Structural Equation Models
Appendix 4.F Generalized Least Squares Estimator of the Parameters in Dynamic Structural Equation Models
Appendix 4.G Proximal Algorithm for L1-Penalized Maximum Likelihood Estimation of Dynamic Structural Equation Model
Appendix 4.H Proximal Algorithm for L1-Penalized Generalized Least Square Estimation of Parameters in the Dynamic Structural Equation Models
Appendix 4.I Multikernel Learning and Spectral Clustering for Cell Type Identification
Exercises
5 Methylation Data Analysis
5.1 DNA Methylation Analysis
5.2 Epigenome-Wide Association Studies (EWAS)
5.2.1 Single-Locus Test
5.2.2 Set-Based Methods
5.2.2.1 Logistic Regression Model
5.2.2.2 Generalized T2 Test Statistic
5.2.2.3 PCA
5.2.2.4 Sequence Kernel Association Test (SKAT)
5.2.2.5 Canonical Correlation Analysis
5.3 Epigenome-Wide Causal Studies
5.3.1 Introduction
5.3.2 Additive Functional Model for EWCS
5.3.2.1 Mathematical Formulation of EWCS
5.3.2.2 Parameter Estimation
5.3.2.3 Test for Independence
5.3.2.4 Test Statistics for Epigenome-Wide Causal Studies
5.4 Genome-Wide DNA Methylation Quantitative Trait Locus (mQTL) Analysis
5.4.1 Simple Regression Model
5.4.2 Multiple Regression Model
5.4.3 Multivariate Regression Model
5.4.4 Multivariate Multiple Regression Model
5.4.5 Functional Linear Models for mQTL Analysis with Whole Genome Sequencing (WGS) Data
5.4.6 Functional Linear Models with Both Functional Response and Predictors for mQTL Analysis with Both WGBS and WGS Data
5.5 Causal Networks for Genetic-Methylation Analysis
5.5.1 Structural Equation Models with Scalar Endogenous Variables and Functional Exogenous Variables
5.5.1.1 Models
5.5.1.2 The Two-Stage Least Squares Estimator
5.5.1.3 Sparse FSEMs
5.5.2 Functional Structural Equation Models with Functional Endogenous Variables and Scalar Exogenous Variables (FSEMs)
5.5.2.1 Models
5.5.2.2 The Two-Stage Least Squares Estimator
5.5.2.3 Sparse FSEMs
5.5.3 Functional Structural Equation Models with Both Functional Endogenous Variables and Exogenous Variables (FSEMF)
5.5.3.1 Model
5.5.3.2 Sparse FSEMF for the Estimation of Genotype-Methylation Networks with Sequencing Data
Software Package
Appendix 5.A Biased and Unbiased Estimators of the HSIC
Appendix 5.B Asymptotic Null Distribution of Block-Based HSIC
Exercises
6 Imaging and Genomics
6.1 Introduction
6.2 Image Segmentation
6.2.1 Unsupervised Learning Methods for Image Segmentation
6.2.1.1 Nonnegative Matrix Factorization
6.2.1.2 Autoencoders
6.2.1.3 Parameter Estimation of Autoencoders
6.2.1.4 Convolutional Neural Networks
6.2.2 Supervised Deep Learning Methods for Image Segmentation
6.2.2.1 Pixel-Level Image Segmentation
6.2.2.2 Deconvolution Network for Semantic Segmentation
6.3 Two- or Three-Dimensional Functional Principal Component Analysis for Image Data Reduction
6.3.1 Formulation
6.3.2 Integral Equation and Eigenfunctions
6.3.3 Computations for the Functional Principal Component Function and the Functional Principal Component Score
6.4 Association Analysis of Imaging-Genomic Data
6.4.1 Multivariate Functional Regression Models for Imaging-Genomic Data Analysis
6.4.1.1 Model
6.4.1.2 Estimation of Additive Effects
6.4.1.3 Test Statistics
6.4.2 Multivariate Functional Regression Models for Longitudinal Imaging Genetics Analysis
6.4.3 Quadratically Regularized Functional Canonical Correlation Analysis for Gene-Gene Interaction Detection in Imaging Genetic Studies
6.4.3.1 Single Image Summary Measure
6.4.3.2 Multiple Image Summary Measures
6.4.3.3 CCA and Functional CCA for Interaction Analysis
6.5 Causal Analysis of Imaging-Genomic Data
6.5.1 Sparse SEMs for Joint Causal Analysis of Structural Imaging and Genomic Data
6.5.2 Sparse Functional Structural Equation Models for Phenotype and Genotype Networks
6.5.3 Conditional Gaussian Graphical Models (CGGMs) for Structural Imaging and Genomic Data Analysis
6.6 Time Series SEMs for Integrated Causal Analysis of fMRI and Genomic Data
6.6.1 Models
6.6.2 Reduced Form Equations
6.6.3 Single Equation and Generalized Least Square Estimator
6.6.4 Sparse SEMs and Alternating Direction Method of Multipliers
6.7 Causal Machine Learning
Software Package
Appendix 6.A Factor Graphs and Mean Field Methods for Prediction of Marginal Distribution
Exercises
7 From Association Analysis to Integrated Causal Inference
7.1 Genome-Wide Causal Studies
7.1.1 Mathematical Formulation of Causal Analysis
7.1.2 Basic Causal Assumptions
7.1.3 Linear Additive SEMs with Non-Gaussian Noise
7.1.4 Information Geometry Approach
7.1.4.1 Basics of Information Geometry
7.1.4.2 Formulation of Causal Inference in Information Geometry
7.1.4.3 Generalization
7.1.4.4 Information Geometry for Causal Inference
7.1.4.5 Information Geometry-Based Causal Inference Methods
7.1.5 Causal Inference on Discrete Data
7.1.5.1 Distance Correlation
7.1.5.2 Properties of Distance Correlation and Test Statistics
7.1.5.3 Distance Correlation for Causal Inference
7.1.5.4 Additive Noise Models for Causal Inference on Discrete Data
7.2 Multivariate Causal Inference and Causal Networks
7.2.1 Markov Condition, Markov Equivalence, Faithfulness, and Minimality
7.2.2 Multilevel Causal Networks for Integrative Omics and Imaging Data Analysis
7.2.2.1 Introduction
7.2.2.2 Additive Noise Models for Multiple Causal Networks
7.2.2.3 Integer Programming as a General Framework for Joint Estimation of Multiple Causal Networks
7.3 Causal Inference with Confounders
7.3.1 Causal Sufficiency
7.3.2 Instrumental Variables
7.3.3 Confounders with Additive Noise Models
7.3.3.1 Models
7.3.3.2 Methods for Searching Common Confounder
7.3.3.3 Gaussian Process Regression
7.3.3.4 Algorithm for Confounder Identification Using Additive Noise Models for Confounder
Software Package
Appendix 7.A Approximation of Log-Likelihood Ratio for the LiNGAM
Appendix 7.B Orthogonality Conditions and Covariance
Appendix 7.C Equivalent Formulations of Orthogonality Conditions
Appendix 7.D K-L Distance in Backward Direction
Appendix 7.E Multiplicativity of Traces
Appendix 7.F Anisotropy and K-L Distance
Appendix 7.G Trace Method for Noise Linear Model
Appendix 7.H Characterization of Association
Appendix 7.I Algorithm for Sparse Trace Method
Appendix 7.J Derivation of the Distribution of the Prediction in the Bayesian Linear Models
Exercises
References
Index
Momiao Xiong is a professor of Biostatistics at the University of Texas Health Science Center in Houston where he has worked since 1997. He received his PhD in 1993 from the University of Georgia.