Kernel-based Data Fusion for Machine Learning: Methods and Applications in Bioinformatics and Text Mining [Hardcover]

  • Format: Hardback, 214 pages, height x width: 235x155 mm, weight: 554 g, XIV, 214 p., 1 Hardback
  • Series: Studies in Computational Intelligence 345
  • Publication date: 26-Mar-2011
  • Publisher: Springer-Verlag Berlin and Heidelberg GmbH & Co. K
  • ISBN-10: 3642194052
  • ISBN-13: 9783642194054
Data fusion problems arise frequently in many different fields. This book provides a specific introduction to data fusion problems using support vector machines. The first part begins with a brief survey of additive models and Rayleigh quotient objectives in machine learning, and then introduces kernel fusion as the additive expansion of support vector machines in the dual problem. The second part presents several novel kernel fusion algorithms and some real applications in supervised and unsupervised learning. The last part of the book substantiates the value of the proposed theories and algorithms in MerKator, an open software tool that identifies disease-relevant genes by integrating heterogeneous genomic data sources across multiple species. The topics presented in this book are meant for researchers or students who use support vector machines. Several topics may also interest computational biologists who want to tackle data fusion challenges in real applications. The reader should have a good knowledge of data mining, machine learning and linear algebra.
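The blurb's core idea, kernel fusion as an additive expansion of the SVM in the dual problem, can be illustrated with a minimal sketch. It is not the book's algorithm: we simply assume each data source yields a kernel matrix K_j over the same samples, fuse them as Omega = sum_j theta_j K_j with non-negative weights theta_j that sum to 1, and evaluate the dual-form output f(x_k) = sum_i alpha_i y_i Omega(x_i, x_k) + b. The helper names, toy data, kernel choices and the alpha, y and b values are hypothetical; a real multiple kernel learning method would optimize theta and alpha rather than fixing them by hand.

```python
import math


def linear_kernel(X):
    """Gram matrix of inner products: K[i][k] = <x_i, x_k>."""
    return [[sum(a * b for a, b in zip(xi, xk)) for xk in X] for xi in X]


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel: K[i][k] = exp(-gamma * ||x_i - x_k||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xk)))
             for xk in X] for xi in X]


def fuse_kernels(kernels, theta):
    """Additive fusion: Omega = sum_j theta_j * K_j with theta_j >= 0."""
    if len(kernels) != len(theta):
        raise ValueError("need exactly one weight per kernel")
    if any(t < 0 for t in theta):
        raise ValueError("kernel weights must be non-negative")
    n = len(kernels[0])
    return [[sum(t * K[i][k] for t, K in zip(theta, kernels)) for k in range(n)]
            for i in range(n)]


def decision_values(K, alpha, y, b=0.0):
    """Dual-form SVM output on the training points: f_k = sum_i alpha_i y_i K[i][k] + b."""
    n = len(K)
    return [sum(alpha[i] * y[i] * K[i][k] for i in range(n)) + b for k in range(n)]


# Two hypothetical data sources describing the same three samples.
source_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source_b = [[0.5], [1.0], [2.0]]
kernels = [linear_kernel(source_a), rbf_kernel(source_b, gamma=1.0)]
theta = [0.5, 0.5]  # weights on the simplex (they sum to 1)

omega = fuse_kernels(kernels, theta)

# Illustrative dual variables, labels and bias (NOT the result of training).
alpha = [0.3, 0.2, 0.5]
y = [1, -1, 1]
b = 0.1

fused = decision_values(omega, alpha, y, b)
per_source = [decision_values(K, alpha, y, b) for K in kernels]
# Since theta sums to 1, the fused output equals the theta-weighted sum of per-source outputs.
expanded = [sum(t * f[k] for t, f in zip(theta, per_source)) for k in range(3)]
print([round(v, 4) for v in fused])
```

Because the weights sum to 1, the fused output equals the theta-weighted sum of the per-source outputs; this additive structure is what lets a kernel fusion method learn the weights theta alongside the dual variables.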

Reviews

From the reviews:

The book provides an introduction to data fusion problems using support vector machines (SVMs). The book is meant for researchers, scientists and engineers using SVMs, or other statistical learning methods, but it also may be used as a reference material for graduate courses in machine learning and data mining. (Florin Gorunescu, Zentralblatt MATH, Vol. 1227, 2012)

1 Introduction
1.1 General Background
1.2 Historical Background of Multi-source Learning and Data Fusion
1.2.1 Canonical Correlation and Its Probabilistic Interpretation
1.2.2 Inductive Logic Programming and the Multi-source Learning Search Space
1.2.3 Additive Models
1.2.4 Bayesian Networks for Data Fusion
1.2.5 Kernel-based Data Fusion
1.3 Topics of This Book
1.4 Chapter by Chapter Overview
References
2 Rayleigh Quotient-Type Problems in Machine Learning
2.1 Optimization of Rayleigh Quotient
2.1.1 Rayleigh Quotient and Its Optimization
2.1.2 Generalized Rayleigh Quotient
2.1.3 Trace Optimization of Generalized Rayleigh Quotient-Type Problems
2.2 Rayleigh Quotient-Type Problems in Machine Learning
2.2.1 Principal Component Analysis
2.2.2 Canonical Correlation Analysis
2.2.3 Fisher Discriminant Analysis
2.2.4 k-means Clustering
2.2.5 Spectral Clustering
2.2.6 Kernel-Laplacian Clustering
2.2.7 One Class Support Vector Machine
2.3 Summary
References
3 Ln-norm Multiple Kernel Learning and Least Squares Support Vector Machines
3.1 Background
3.2 Acronyms
3.3 The Norms of Multiple Kernel Learning
3.3.1 L∞-norm MKL
3.3.2 L2-norm MKL
3.3.3 Ln-norm MKL
3.4 One Class SVM MKL
3.5 Support Vector Machine MKL for Classification
3.5.1 The Conic Formulation
3.5.2 The Semi Infinite Programming Formulation
3.6 Least Squares Support Vector Machines MKL for Classification
3.6.1 The Conic Formulation
3.6.2 The Semi Infinite Programming Formulation
3.7 Weighted SVM MKL and Weighted LSSVM MKL
3.7.1 Weighted SVM
3.7.2 Weighted SVM MKL
3.7.3 Weighted LSSVM
3.7.4 Weighted LSSVM MKL
3.8 Summary of Algorithms
3.9 Numerical Experiments
3.9.1 Overview of the Convexity and Complexity
3.9.2 QP Formulation Is More Efficient than SOCP
3.9.3 SIP Formulation Is More Efficient than QCQP
3.10 MKL Applied to Real Applications
3.10.1 Experimental Setup and Data Sets
3.10.2 Results
3.11 Discussions
3.12 Summary
References
4 Optimized Data Fusion for Kernel k-means Clustering
4.1 Introduction
4.2 Objective of k-means Clustering
4.3 Optimizing Multiple Kernels for k-means
4.4 Bi-level Optimization of k-means on Multiple Kernels
4.4.1 The Role of Cluster Assignment
4.4.2 Optimizing the Kernel Coefficients as KFD
4.4.3 Solving KFD as LSSVM Using Multiple Kernels
4.4.4 Optimized Data Fusion for Kernel k-means Clustering (OKKC)
4.4.5 Computational Complexity
4.5 Experimental Results
4.5.1 Data Sets and Experimental Settings
4.5.2 Results
4.6 Summary
References
5 Multi-view Text Mining for Disease Gene Prioritization and Clustering
5.1 Introduction
5.2 Background: Computational Gene Prioritization
5.3 Background: Clustering by Heterogeneous Data Sources
5.4 Single View Gene Prioritization: A Fragile Model with Respect to the Uncertainty
5.5 Data Fusion for Gene Prioritization: Distribution Free Method
5.6 Multi-view Text Mining for Gene Prioritization
5.6.1 Construction of Controlled Vocabularies from Multiple Bio-ontologies
5.6.2 Vocabularies Selected from Subsets of Ontologies
5.6.3 Merging and Mapping of Controlled Vocabularies
5.6.4 Text Mining
5.6.5 Dimensionality Reduction of Gene-By-Term Data by Latent Semantic Indexing
5.6.6 Algorithms and Evaluation of Gene Prioritization Task
5.6.7 Benchmark Data Set of Disease Genes
5.7 Results of Multi-view Prioritization
5.7.1 Multi-view Performs Better than Single View
5.7.2 Effectiveness of Multi-view Demonstrated on Various Number of Views
5.7.3 Effectiveness of Multi-view Demonstrated on Disease Examples
5.8 Multi-view Text Mining for Gene Clustering
5.8.1 Algorithms and Evaluation of Gene Clustering Task
5.8.2 Benchmark Data Set of Disease Genes
5.9 Results of Multi-view Clustering
5.9.1 Multi-view Performs Better than Single View
5.9.2 Dimensionality Reduction of Gene-By-Term Profiles for Clustering
5.9.3 Multi-view Approach Is Better than Merging Vocabularies
5.9.4 Effectiveness of Multi-view Demonstrated on Various Numbers of Views
5.9.5 Effectiveness of Multi-view Demonstrated on Disease Examples
5.10 Discussions
5.11 Summary
References
6 Optimized Data Fusion for k-means Laplacian Clustering
6.1 Introduction
6.2 Acronyms
6.3 Combine Kernel and Laplacian for Clustering
6.3.1 Combine Kernel and Laplacian as Generalized Rayleigh Quotient for Clustering
6.3.2 Combine Kernel and Laplacian as Additive Models for Clustering
6.4 Clustering by Multiple Kernels and Laplacians
6.4.1 Optimize A with Given θ
6.4.2 Optimize θ with Given A
6.4.3 Algorithm: Optimized Kernel Laplacian Clustering
6.5 Data Sets and Experimental Setup
6.6 Results
6.7 Summary
References
7 Weighted Multiple Kernel Canonical Correlation
7.1 Introduction
7.2 Acronyms
7.3 Weighted Multiple Kernel Canonical Correlation
7.3.1 Linear CCA on Multiple Data Sets
7.3.2 Multiple Kernel CCA
7.3.3 Weighted Multiple Kernel CCA
7.4 Computational Issue
7.4.1 Standard Eigenvalue Problem for WMKCCA
7.4.2 Incomplete Cholesky Decomposition
7.4.3 Incremental Eigenvalue Solution for WMKCCA
7.5 Learning from Heterogeneous Data Sources by WMKCCA
7.6 Experiment
7.6.1 Classification in the Canonical Spaces
7.6.2 Efficiency of the Incremental EVD Solution
7.6.3 Visualization of Data in the Canonical Spaces
7.7 Summary
References
8 Cross-Species Candidate Gene Prioritization with MerKator
8.1 Introduction
8.2 Data Sources
8.3 Kernel Workflow
8.3.1 Approximation of Kernel Matrices Using Incomplete Cholesky Decomposition
8.3.2 Kernel Centering
8.3.3 Missing Values
8.4 Cross-Species Integration of Prioritization Scores
8.5 Software Structure and Interface
8.6 Results and Discussion
8.7 Summary
References
9 Conclusion
Index