Unsupervised Feature Extraction Applied to Bioinformatics: A PCA Based and TD Based Approach 1st ed. 2020 [Hardcover]

  • Format: Hardback, XVIII + 321 pages, height x width: 235x155 mm, weight: 676 g, 111 illustrations (94 color, 17 black and white)
  • Series: Unsupervised and Semi-Supervised Learning
  • Publication date: 05-Sep-2019
  • Publisher: Springer Nature Switzerland AG
  • ISBN-10: 3030224554
  • ISBN-13: 9783030224554

This book proposes applications of tensor decomposition to unsupervised feature extraction and feature selection. The author posits that although supervised methods, including deep learning, have become popular, unsupervised methods have their own advantages. He argues that one such advantage is that they are easy to learn, because tensor decomposition is a conventional linear methodology. The book starts from very basic linear algebra and reaches cutting-edge methodologies applied to difficult situations in which there are many features (variables) but only a small number of samples. The author includes advanced descriptions of tensor decomposition, including Tucker decomposition computed by higher-order singular value decomposition and by higher-order orthogonal iteration, as well as tensor train decomposition. The author concludes by showing unsupervised methods and their application to a wide range of topics.
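To give a sense of what a Tucker decomposition computed by higher-order singular value decomposition looks like in practice, here is a minimal NumPy sketch. It is not taken from the book: the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) and the random test tensor are assumptions made for illustration only.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # axis `mode` becomes the last axis
    return np.moveaxis(moved @ matrix.T, -1, mode)


def hosvd(tensor, ranks=None):
    """Tucker decomposition via HOSVD; returns (core, list of factor matrices)."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's factor matrix.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        r = u.shape[1] if ranks is None else ranks[mode]
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each factor basis
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (possibly approximated) tensor from core and factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical toy data: a random 4 x 5 x 6 tensor (illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))

core_full, factors_full = hosvd(x)                  # no truncation
core_small, factors_small = hosvd(x, ranks=(2, 3, 3))  # truncated ranks
```

Without truncation the reconstruction reproduces the input up to floating-point error; with truncated ranks the core shrinks to the requested shape and the reconstruction becomes an approximation.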


  • Allows readers to analyze data sets with small samples and many features;
  • Provides a fast algorithm, based upon linear algebra, to analyze big data;
  • Includes several applications to multi-view data analyses, with a focus on bioinformatics.

Part I Mathematical Preparations
1 Introduction to Linear Algebra
1.1 Introduction
1.2 Scalars
1.2.1 Scalars
1.2.2 Dummy Scalars
1.2.3 Generating New Features by Arithmetic
1.3 Vectors
1.3.1 Vectors
1.3.2 Geometrical Interpretation of Vectors: One Dimension
1.3.3 Geometrical Interpretation of Vectors: Two Dimensions
1.3.4 Geometrical Interpretation of Vectors: Features
1.3.5 Generating New Features by Arithmetic
1.3.6 Dummy Vectors
1.4 Matrices
1.4.1 Equivalences to Geometrical Representation
1.4.2 Matrix Manipulation and Feature Generation
1.5 Tensors
1.5.1 Introduction of Tensors
1.5.2 Geometrical Representation of Tensors
1.5.3 Generating New Features
1.5.4 Tensor Algebra
Appendix
Rank
2 Matrix Factorization
2.1 Introduction
2.2 Matrix Factorization
2.2.1 Rank Factorization
2.2.2 Singular Value Decomposition
2.3 Principal Component Analysis
2.4 Equivalence Between PCA and SVD
2.5 Geometrical Representation of PCA
2.5.1 PCA Selects the Axis with the Maximal Variance
2.5.2 PCA Selects the Axis with Minimum Residuals
2.5.3 Non-equivalence Between Two PCAs
2.6 PCA as a Clustering Method
Appendix
Proof of Theorem 2.1
References
3 Tensor Decomposition
3.1 Three Principal Realizations of TD
3.2 Performance of TDs as Tools Reducing the Degrees of Freedoms
3.2.1 Tucker Decomposition
3.2.2 CP Decomposition
3.2.3 Tensor Train Decomposition
3.2.4 TDs Are Not Always Interpretable
3.3 Various Algorithms to Compute TDs
3.3.1 CP Decomposition
3.3.2 Tucker Decomposition
3.3.3 Tensor Train Decomposition
3.4 Interpretation Using TD
3.5 Summary
3.5.1 CP Decomposition
3.5.2 Tucker Decomposition
3.5.3 Tensor Train Decomposition
3.5.4 Superiority of Tucker Decomposition
Appendix
Moore-Penrose Pseudoinverse
References
Part II Feature Extractions
4 PCA Based Unsupervised FE
4.1 Introduction: Feature Extraction vs Feature Selection
4.2 Various Feature Selection Procedures
4.3 PCA Applied to More Complicated Patterns
4.4 Identification of Non-sinusoidal Periodicity by PCA Based Unsupervised FE
4.5 Null Hypothesis
4.6 Feature Selection with Considering P-Values
4.7 Stability
4.8 Summary
Reference
5 TD Based Unsupervised FE
5.1 TD as a Feature Selection Tool
5.2 Comparisons with Other TDs
5.3 Generation of a Tensor From Matrices
5.4 Reduction of Number of Dimensions of Tensors
5.5 Identification of Correlated Features Using Type I Tensor
5.6 Identification of Correlated Features Using Type II Tensor
5.7 Summary
Reference
Part III Applications to Bioinformatics
6 Applications of PCA Based Unsupervised FE to Bioinformatics
6.1 Introduction
6.2 Some Introduction to Genomic Science
6.2.1 Central Dogma
6.2.2 Regulation of Transcription
6.2.3 The Technologies to Measure the Amount of Transcript
6.2.4 Various Factors that Regulate the Amount of Transcript
6.2.5 Other Factors to Be Considered
6.3 Biomarker Identification
6.3.1 Biomarker Identification Using Circulating miRNA
6.3.2 Circulating miRNAs as Universal Disease Biomarker
6.3.3 Biomarker Identification Using Exosomal miRNAs
6.4 Integrated Analysis of mRNA and miRNA Expression
6.4.1 Understanding Soldier's Heart From the mRNA and miRNA
6.4.2 Identifications of Interactions Between miRNAs and mRNAs in Multiple Cancers
6.5 Integrated Analysis of Methylation and Gene Expression
6.5.1 Aberrant Promoter Methylation and Expression Associated with Metastasis
6.5.2 Epigenetic Therapy Target Identification Based upon Gene Expression and Methylation Profile
6.5.3 Identification of Genes Mediating Transgenerational Epigenetics Based upon Integrated Analysis of mRNA Expression and Promoter Methylation
6.6 Time Development Analysis
6.6.1 Identification of Cell Division Cycle Genes
6.6.2 Identification of Disease Driving Genes
6.7 Gene Selection for Single Cell RNA-seq
6.8 Summary
References
7 Application of TD Based Unsupervised FE to Bioinformatics
7.1 Introduction
7.2 PTSD Mediated Heart Diseases
7.3 Drug Discovery From Gene Expression
7.4 Universality of miRNA Transfection
7.5 One-Class Differential Expression Analysis for Multiomics Data Set
7.6 General Examples of Case I and II Tensors
7.6.1 Integrated Analysis of mRNA and miRNA
7.6.2 Temporally Differentially Expressed Genes
7.7 Gene Expression and Methylation in Social Insects
7.8 Drug Discovery From Gene Expression: II
7.9 Integrated Analysis of miRNA Expression and Methylation
7.10 Summary
Appendix
Universality of miRNA Transfection
Drug Discovery From Gene Expression: II
References
A Various Implementations of TD
A.1 Introduction
A.2 R
A.2.1 rTensor
A.2.2 ttTensor
A.3 Python
A.3.1 HOTTBOX
A.4 MATLAB
A.4.1 Tensor Toolbox
A.5 julia
A.5.1 TensorDecompositions.jl
A.6 TensorFlow
A.6.1 t3f
B List of Published Papers Related to the Methods
References
Glossary
Solutions
Index

Prof. Taguchi is currently a Professor at the Department of Physics, Chuo University. He received a master's degree in Statistical Physics from Tokyo Institute of Technology, Japan, in 1986, and a PhD in Non-linear Physics from Tokyo Institute of Technology, Tokyo, Japan, in 1988. He worked at Tokyo Institute of Technology and Chuo University, and has been with Chuo University (Tokyo, Japan) since 1997. His main research interests are in bioinformatics, especially multi-omics data analysis using linear algebra. Dr. Taguchi has published a book on bioinformatics and more than 100 journal papers, book chapters, and papers in conference proceedings.