Introduction
About the Companion Website

Part I Feature Extraction from Big Multimedia Data

1 Representation Learning on Large and Small Data
1.2 Representative Deep CNNs
1.2.1.1 ReLU Nonlinearity
1.2.1.2 Data Augmentation
1.2.2.1 MLP Convolutional Layer
1.2.2.2 Global Average Pooling
1.2.3.1 Very Small Convolutional Filters
1.2.3.2 Multi-Scale Training
1.2.4.1 Inception Modules
1.2.4.2 Dimension Reduction
1.2.5.1 Residual Learning
1.2.5.2 Identity Mapping by Shortcuts
1.2.6 Observations and Remarks
1.3 Transfer Representation Learning
1.3.1 Method Specifications
1.3.2 Experimental Results and Discussion
1.3.2.1 Results of Transfer Representation Learning for OM
1.3.2.2 Results of Transfer Representation Learning for Melanoma
1.3.2.3 Qualitative Evaluation: Visualization
1.3.3 Observations and Remarks

2 Concept-Based and Event-Based Video Search in Large Video Collections
2.2 Video Preprocessing and Machine Learning Essentials
2.2.1 Video Representation
2.2.2 Dimensionality Reduction
2.3 Methodology for Concept Detection and Concept-Based Video Search
2.3.2 Cascades for Combining Different Video Representations
2.3.2.1 Problem Definition and Search Space
2.3.3 Multi-Task Learning for Concept Detection and Concept-Based Video Search
2.3.4 Exploiting Label Relations
2.3.5.1 Dataset and Experimental Setup
2.3.5.2 Experimental Results
2.3.5.3 Computational Complexity
2.4 Methods for Event Detection and Event-Based Video Search
2.4.2 Learning from Positive Examples
2.4.3 Learning Solely from Textual Descriptors: Zero-Example Learning
2.4.4.1 Dataset and Experimental Setup
2.4.4.2 Experimental Results: Learning from Positive Examples
2.4.4.3 Experimental Results: Zero-Example Learning

3 Big Data Multimedia Mining: Feature Extraction Facing Volume, Velocity, and Variety
3.2 Scalability through Parallelization
3.2.1 Process Parallelization
3.2.2 Data Parallelization
3.3 Scalability through Feature Engineering
3.3.1 Feature Reduction through Spatial Transformations
3.3.2 Laplacian Matrix Representation
3.3.3 Parallel Latent Dirichlet Allocation and Bag of Words
3.4 Deep Learning-Based Feature Learning
3.4.1 Adaptability that Conquers both Volume and Velocity
3.4.2 Convolutional Neural Networks
3.4.3 Recurrent Neural Networks
3.4.4 Modular Approach to Scalability
3.5.2 Spectrogram Creation
3.5.3 CNN-Based Feature Extraction
3.5.4 Structure of the CNNs
3.5.5 Process Parallelization

Part II Learning Algorithms for Large-Scale Multimedia

4 Large-Scale Video Understanding with Limited Training Labels
4.2 Video Retrieval with Hashing
4.2.2 Unsupervised Multiple Feature Hashing
4.2.2.2 The Objective Function of MFH
4.2.2.3.1 Complexity Analysis
4.2.3 Submodular Video Hashing
4.2.3.3 Submodular Video Hashing
4.2.4.1 Experiment Settings
4.2.4.1.2 Visual Features
4.2.4.1.3 Algorithms for Comparison
4.2.4.2.2 Combined Dataset
4.2.4.3 Evaluation of SVH
4.3 Graph-Based Model for Video Understanding
4.3.2 Optimized Graph Learning for Video Annotation
4.3.2.2.1 Terms and Notations
4.3.2.2.2 Optimal Graph-Based SSL
4.3.2.2.3 Iterative Optimization
4.3.3 Context Association Model for Action Recognition
4.3.4 Graph-Based Event Video Summarization
4.3.4.2 Temporal Alignment
4.3.5 TGIF: A New Dataset and Benchmark on Animated GIF Description
4.3.6.1 Experimental Settings
4.3.6.1.3 Baseline Methods and Evaluation Metrics
4.4 Conclusions and Future Work

5 Multimodal Fusion of Big Multimedia Data
5.1 Multimodal Fusion in Multimedia Retrieval
5.1.1 Unsupervised Fusion in Multimedia Retrieval
5.1.1.1 Linear and Non-Linear Similarity Fusion
5.1.1.2 Cross-Modal Fusion of Similarities
5.1.1.3 Random Walks and Graph-Based Fusion
5.1.1.4 A Unifying Graph-Based Model
5.1.2 Partial Least Squares Regression
5.1.3 Experimental Comparison
5.1.3.1 Dataset Description
5.1.4 Late Fusion of Multiple Multimedia Rankings
5.1.4.2.1 Borda Count Fusion
5.1.4.2.2 Reciprocal Rank Fusion
5.1.4.2.3 Condorcet Fusion
5.2 Multimodal Fusion in Multimedia Classification
5.2.2 Problem Formulation
5.2.3 Probabilistic Fusion in Active Learning
5.2.3.3 Incorporating Informativeness in the Selection (P(S|T))
5.2.3.4 Measuring Oracle's Confidence (P(S|T))
5.2.4 Experimental Comparison
5.2.4.3.1 Expanding with Positive, Negative or Both
5.2.4.3.2 Comparing with Sample Selection Approaches
5.2.4.3.3 Comparing with Fusion Approaches
5.2.4.3.4 Parameter Sensitivity Investigation
5.2.4.3.5 Comparing with Existing Methods

6 Large-Scale Social Multimedia Analysis
6.1 Social Multimedia in Social Media Streams
6.1.2 Social Multimedia Streams
6.1.3 Analysis of the Twitter Firehose
6.1.3.1 Dataset: Overview
6.1.3.2 Linked Resource Analysis
6.1.3.3 Image Content Analysis
6.1.3.4 Geographic Analysis
6.2 Large-Scale Analysis of Social Multimedia
6.2.1 Large-Scale Processing of Social Multimedia Analysis
6.2.1.1 Batch-Processing Frameworks
6.2.1.2 Stream-Processing Frameworks
6.2.1.3 Distributed Processing Frameworks
6.2.2 Analysis of Social Multimedia
6.2.2.1 Analysis of Visual Content
6.2.2.2 Analysis of Textual Content
6.2.2.3 Analysis of Geographical Content
6.2.2.4 Analysis of User Content
6.3 Large-Scale Multimedia Opinion Mining System
6.3.2 Implementation Details
6.3.2.1 Social Media Data Crawler
6.3.2.2 Social Multimedia Analysis
6.3.2.3 Analysis of Visual Content
6.3.3 Evaluations: Analysis of Visual Content
6.3.3.1 Filtering of Synthetic Images
6.3.3.2 Near-Duplicate Detection

7 Privacy and Audiovisual Content: Protecting Users as Big Multimedia Data Grows Bigger
7.1.1 The Dark Side of Big Multimedia Data
7.1.2 Defining Multimedia Privacy
7.2 Protecting User Privacy
7.3.1 Privacy and Multimedia Big Data
7.3.2 Privacy Threats of Multimedia Data
7.3.2.3 Multimodal Threats
7.4 Privacy-Related Multimedia Analysis Research
7.4.1 Multimedia Analysis Filters
7.4.2 Multimedia Content Masking
7.5 The Larger Research Picture
7.5.1 Multimedia Security and Trust
7.6 Outlook on Multimedia Privacy Challenges
7.6.1 Research Challenges
7.6.1.1 Multimedia Analysis
7.6.2 Research Reorientation
7.6.2.1 Professional Paranoia
7.6.2.2 Privacy as a Priority
7.6.2.3 Privacy in Parallel

Part III Scalability in Multimedia Access

8 Data Storage and Management for Big Multimedia
8.1.1 Multimedia Applications and Scale
8.1.2 Big Data Management
8.1.3 System Architecture Outline
8.1.4 Metadata Storage Architecture
8.1.4.1 Lambda Architecture
8.1.5 Summary and Chapter Outline
8.2.1.1 Secondary Storage
8.2.1.2 The Five-Minute Rule
8.2.1.3 Emerging Trends for Local Storage
8.2.2 Distributed Storage
8.2.2.1 Distributed Hash Tables
8.2.2.2 The CAP Theorem and the PACELC Formulation
8.2.2.3 The Hadoop Distributed File System
8.3.1 Metadata Extraction
8.3.2.1 Map-Reduce and Hadoop
8.4.1 Distributed In-Memory Buffering
8.4.1.1 Memcached and Redis
8.4.1.3 Content Distribution Networks
8.4.2 Metadata Retrieval and NoSQL Systems
8.4.2.3 Wide Column Stores
8.5 Case Studies: Facebook
8.5.1 Data Popularity: Hot, Warm or Cold
8.6 Conclusions and Future Work

9 Perceptual Hashing for Large-Scale Multimedia Search
9.1.2 Definitions and Properties of Perceptual Hashing
9.1.3 Multimedia Search Using Perceptual Hashing
9.1.4 Applications of Perceptual Hashing
9.1.5 Evaluating Perceptual Hash Algorithms
9.2 Unsupervised Perceptual Hash Algorithms
9.2.2 Iterative Quantization
9.2.4 Kernelized Locality Sensitive Hashing
9.3 Supervised Perceptual Hash Algorithms
9.3.1 Semi-Supervised Hashing
9.3.2 Kernel-Based Supervised Hashing
9.3.3 Restricted Boltzmann Machine-Based Hashing
9.3.4 Supervised Semantic-Preserving Deep Hashing
9.4 Constructing Perceptual Hash Algorithms
9.5 Conclusion and Discussion

Part IV Applications of Large-Scale Multimedia Search

10 Image Tagging with Deep Learning: Fine-Grained Visual Analysis
10.2 Basic Deep Learning Models
10.3 Deep Image Tagging for Fine-Grained Image Recognition
10.3.1 Attention Proposal Network
10.3.2 Classification and Ranking
10.3.3 Multi-Scale Joint Representation
10.3.4 Implementation Details
10.3.5 Experiments on CUB-200-2011
10.3.6 Experiments on Stanford Dogs
10.4 Deep Image Tagging for Fine-Grained Sentiment Analysis
10.4.1 Learning Deep Sentiment Representation
10.4.2 Sentiment Analysis
10.4.3 Experiments on SentiBank

11 Visually Exploring Millions of Images Using Image Maps and Graphs
11.1 Introduction and Related Work
11.2 Algorithms for Image Sorting
11.2.1 Self-Organizing Maps
11.2.3 Evolutionary Algorithms
11.3 Improving SOMs for Image Sorting
11.3.1 Reducing SOM Sorting Complexity
11.3.2 Improving SOM Projection Quality
11.3.3 Combining SOMs and SSMs
11.4 Quality Evaluation of Image Sorting Algorithms
11.4.2 Normalized Cross-Correlation
11.4.3 A New Image Sorting Quality Evaluation Scheme
11.6 Demo System for Navigating 2D Image Maps
11.7 Graph-Based Image Browsing
11.7.1 Generating Semantic Image Features
11.7.2 Building the Image Graph
11.7.3 Visualizing and Navigating the Graph
11.7.4 Prototype for Image Graph Navigation
11.8 Conclusion and Future Work

12 Medical Decision Support Using Increasingly Large Multimodal Data Sets
12.2 Methodology for Reviewing the Literature in This Chapter
12.3 Data, Ground Truth, and Scientific Challenges
12.3.1 Data Annotation and Ground Truthing
12.3.2 Scientific Challenges and Evaluation as a Service
12.3.3 Other Medical Data Resources Available
12.4 Techniques Used for Multimodal Medical Decision Support
12.4.1 Visual and Non-Visual Features Describing the Image Content
12.4.2 General Machine Learning and Deep Learning
12.5 Application Types of Image-Based Decision Support
12.5.6 Automatic Image Annotation
12.5.7 Other Application Types
12.6 Discussion on Multimodal Medical Decision Support
12.7 Outlook or the Next Steps of Multimodal Medical Decision Support
References

Conclusions and Future Trends
Index