Foreword |
|
xv | |
Preface |
|
xviii | |
Acknowledgments |
|
xx | |
1 Introduction |
|
1 | (28) |
|
1.1 What are Data Science, Machine Learning and Artificial Intelligence? |
|
|
2 | (6) |
|
|
8 | (2) |
|
1.3 From Strategy to Use Cases |
|
|
10 | (8) |
|
|
11 | (5) |
|
|
16 | (1) |
|
1.3.3 Modeling and Analysis |
|
|
17 | (1) |
|
1.4 Use Case Implementation |
|
|
18 | (4) |
|
1.4.1 Iterative Exploration of Use Cases |
|
|
19 | (2) |
|
1.4.2 End-to-End Data Processing |
|
|
21 | (1) |
|
|
22 | (1) |
|
1.5 Real-Life Use Case Examples |
|
|
22 | (3) |
|
1.5.1 Value Chain Digitization (VCD) |
|
|
22 | (1) |
|
1.5.2 Marketing Segment Analytics |
|
|
23 | (1) |
|
1.5.3 360° View of the Customer |
|
|
23 | (1) |
|
1.5.4 NGO and Sustainability Use Cases |
|
|
24 | (1) |
|
|
25 | (2) |
|
|
27 | (2) |
2 Infrastructure |
|
29 | (40) |
|
|
|
29 | (2) |
|
|
31 | (7) |
|
2.2.1 Distributed Systems |
|
|
34 | (3) |
|
2.2.2 Hardware for AI Applications |
|
|
37 | (1) |
|
2.3 Linux Essentials for Data Professionals |
|
|
38 | (16) |
|
|
54 | (4) |
|
|
58 | (10) |
|
|
61 | (4) |
|
2.5.2 Cloud-native Solutions |
|
|
65 | (3) |
|
|
68 | (1) |
3 Data Architecture |
|
69 | (32) |
|
|
|
69 | (5) |
|
3.1.1 Maslow's Hierarchy of Needs for Data |
|
|
69 | (2) |
|
3.1.2 Data Architecture Requirements |
|
|
71 | (1) |
|
3.1.3 The Structure of a Typical Data Architecture |
|
|
71 | (1) |
|
3.1.4 ETL (Extract, Transform, Load) |
|
|
72 | (1) |
|
3.1.5 ELT (Extract, Load, Transform) |
|
|
73 | (1) |
|
|
73 | (1) |
|
3.2 Data Ingestion and Integration |
|
|
74 | (5) |
|
|
74 | (1) |
|
3.2.2 Traditional File Formats |
|
|
75 | (2) |
|
3.2.3 Modern File Formats |
|
|
77 | (2) |
|
|
79 | (1) |
|
3.3 Data Warehouses, Data Lakes, and Lakehouses |
|
|
79 | (7) |
|
|
79 | (4) |
|
3.3.2 Data Lakes and the Lakehouse |
|
|
83 | (2) |
|
3.3.3 Summary: Comparing Data Warehouses to Lakehouses |
|
|
85 | (1) |
|
3.4 Data Processing and Transformation |
|
|
86 | (8) |
|
3.4.1 Big Data & Apache Spark |
|
|
86 | (7) |
|
|
93 | (1) |
|
3.5 Workflow Orchestration |
|
|
94 | (2) |
|
3.6 A Data Architecture Use Case |
|
|
96 | (4) |
|
|
100 | (1) |
4 Data Engineering |
|
101 | (30) |
|
|
|
|
102 | (23) |
|
|
102 | (6) |
|
4.1.2 Designing Data Pipelines |
|
|
108 | (2) |
|
|
110 | (2) |
|
4.1.4 Programming Languages |
|
|
112 | (3) |
|
4.1.5 Kafka as Reference ETL Tool |
|
|
115 | (4) |
|
|
119 | (1) |
|
4.1.7 Automation of the Stages |
|
|
120 | (1) |
|
4.1.8 Six Building Blocks of the Data Pipeline |
|
|
120 | (5) |
|
4.2 Managing Analytical Models |
|
|
125 | (5) |
|
|
126 | (1) |
|
|
127 | (1) |
|
4.2.3 Model or Parameter Update |
|
|
128 | (1) |
|
|
128 | (1) |
|
4.2.5 Feedback into the Operational Processes |
|
|
129 | (1) |
|
|
130 | (1) |
5 Data Management |
|
131 | (22) |
|
|
|
|
133 | (10) |
|
|
134 | (2) |
|
|
136 | (4) |
|
|
140 | (1) |
|
5.1.4 Master Data Management |
|
|
141 | (1) |
|
|
142 | (1) |
|
|
143 | (8) |
|
5.2.1 Data Classification |
|
|
144 | (1) |
|
|
145 | (2) |
|
|
147 | (2) |
|
|
149 | (1) |
|
|
150 | (1) |
|
|
151 | (2) |
6 Mathematics |
|
153 | (26) |
|
|
|
154 | (9) |
|
6.1.1 Vectors and Matrices |
|
|
154 | (3) |
|
6.1.2 Operations between Vectors and Matrices |
|
|
157 | (3) |
|
6.1.3 Linear Transformations |
|
|
160 | (1) |
|
6.1.4 Eigenvalues, Eigenvectors, and Eigendecomposition |
|
|
161 | (1) |
|
6.1.5 Other Matrix Decompositions |
|
|
162 | (1) |
|
6.2 Calculus and Optimization |
|
|
163 | (7) |
|
|
164 | (2) |
|
6.2.2 Gradient and Hessian |
|
|
166 | (1) |
|
|
167 | (2) |
|
6.2.4 Constrained Optimization |
|
|
169 | (1) |
|
|
170 | (7) |
|
6.3.1 Discrete and Continuous Random Variables |
|
|
171 | (3) |
|
6.3.2 Expected Value, Variance, and Covariance |
|
|
174 | (2) |
|
6.3.3 Independence, Conditional Distributions, and Bayes' Theorem |
|
|
176 | (1) |
|
|
177 | (2) |
7 Statistics - Basics |
|
179 | (22) |
|
|
|
|
|
180 | (1) |
|
7.2 Simple Linear Regression |
|
|
181 | (8) |
|
7.3 Multiple Linear Regression |
|
|
189 | (2) |
|
|
191 | (7) |
|
7.5 How Good is Our Model? |
|
|
198 | (1) |
|
|
199 | (2) |
8 Machine Learning |
|
201 | (38) |
|
|
|
|
|
201 | (2) |
|
8.2 Basics: Feature Spaces |
|
|
203 | (3) |
|
8.3 Classification Models |
|
|
206 | (3) |
|
8.3.1 K-Nearest-Neighbor-Classifier |
|
|
206 | (1) |
|
8.3.2 Support Vector Machine |
|
|
207 | (1) |
|
|
208 | (1) |
|
|
209 | (6) |
|
|
210 | (1) |
|
8.4.2 Bagging: Random Forests |
|
|
211 | (4) |
|
|
215 | (1) |
|
8.5 Artificial Neural Networks and the Perceptron |
|
|
215 | (3) |
|
8.6 Learning without Labels - Finding Structure |
|
|
218 | (3) |
|
|
218 | (1) |
|
|
219 | (1) |
|
|
220 | (1) |
|
8.7 Reinforcement Learning |
|
|
221 | (2) |
|
|
223 | (1) |
|
8.9 Into the Depth - Deep Learning |
|
|
224 | (11) |
|
8.9.1 Convolutional Neural Networks |
|
|
224 | (1) |
|
8.9.2 Training Convolutional Neural Networks |
|
|
225 | (2) |
|
8.9.3 Recurrent Neural Networks |
|
|
227 | (1) |
|
8.9.4 Long Short-Term Memory |
|
|
228 | (2) |
|
8.9.5 Autoencoders and U-Nets |
|
|
230 | (1) |
|
8.9.6 Adversarial Training Approaches |
|
|
231 | (1) |
|
8.9.7 Generative Adversarial Networks |
|
|
232 | (2) |
|
8.9.8 Cycle GANs and Style GANs |
|
|
234 | (1) |
|
8.9.9 Other Architectures and Learning Strategies |
|
|
235 | (1) |
|
8.10 Validation Strategies for Machine Learning Techniques |
|
|
235 | (2) |
|
|
237 | (1) |
|
|
237 | (2) |
9 Building Great Artificial Intelligence |
|
239 | (34) |
|
|
9.1 How AI Relates to Data Science and Machine Learning |
|
|
239 | (4) |
|
9.2 A Brief History of AI |
|
|
243 | (2) |
|
9.3 Five Recommendations for Designing an AI Solution |
|
|
245 | (23) |
|
9.3.1 Recommendation No. 1: Be pragmatic |
|
|
245 | (2) |
|
9.3.2 Recommendation No. 2: Make it easier for machines to learn - create inductive biases |
|
|
247 | (5) |
|
9.3.3 Recommendation No. 3: Perform analytics |
|
|
252 | (2) |
|
9.3.4 Recommendation No. 4: Beware of the scaling trap |
|
|
254 | (9) |
|
9.3.5 Recommendation No. 5: Beware of the generality trap (there is no such thing as a free lunch) |
|
|
263 | (5) |
|
9.4 Human-level Intelligence |
|
|
268 | (2) |
|
|
270 | (3) |
10 Natural Language Processing (NLP) |
|
273 | (44) |
|
|
10.1 What is NLP and Why is it so Valuable? |
|
|
273 | (2) |
|
10.2 NLP Data Preparation Techniques |
|
|
275 | (8) |
|
|
275 | (6) |
|
10.2.2 Converting the Input Format for Machine Learning |
|
|
281 | (2) |
|
10.3 NLP Tasks and Methods |
|
|
283 | (29) |
|
10.3.1 Rule-Based (Symbolic) NLP |
|
|
284 | (3) |
|
10.3.2 Statistical Machine Learning Approaches |
|
|
287 | (8) |
|
|
295 | (6) |
|
|
301 | (11) |
|
10.4 At the Cutting Edge: Current Research Focuses for NLP |
|
|
312 | (2) |
|
|
314 | (3) |
11 Computer Vision |
|
317 | (30) |
|
|
11.1 What is Computer Vision? |
|
|
317 | (2) |
|
11.2 A Picture Paints a Thousand Words |
|
|
319 | (9) |
|
|
319 | (2) |
|
11.2.2 Image Acquisition Principle |
|
|
321 | (5) |
|
11.2.3 Digital File Formats |
|
|
326 | (1) |
|
|
327 | (1) |
|
11.3 I Spy With My Little Eye Something That Is... |
|
|
328 | (6) |
|
11.3.1 Computational Photography and Image Manipulation |
|
|
330 | (4) |
|
11.4 Computer Vision Applications & Future Directions |
|
|
334 | (7) |
|
11.4.1 Image Retrieval Systems |
|
|
334 | (3) |
|
11.4.2 Object Detection, Classification and Tracking |
|
|
337 | (1) |
|
11.4.3 Medical Computer Vision |
|
|
338 | (3) |
|
|
341 | (2) |
|
|
343 | (4) |
12 Modelling and Simulation - Create Your Own Models |
|
347 | (38) |
|
|
|
|
347 | (2) |
|
|
349 | (1) |
|
12.3 Modelling to Answer Questions |
|
|
349 | (2) |
|
12.4 Reproducibility and Model Lifecycle |
|
|
351 | (10) |
|
12.4.1 The Lifecycle of a Modelling and Simulation Question |
|
|
352 | (2) |
|
12.4.2 Parameter and Output Definition |
|
|
354 | (3) |
|
|
357 | (1) |
|
12.4.4 Verification and Validation |
|
|
357 | (4) |
|
|
361 | (10) |
|
12.5.1 Ordinary Differential Equations (ODEs) |
|
|
361 | (1) |
|
12.5.2 System Dynamics (SD) |
|
|
362 | (3) |
|
12.5.3 Discrete Event Simulation |
|
|
365 | (3) |
|
12.5.4 Agent-Based Modelling |
|
|
368 | (3) |
|
12.6 Modelling and Simulation Examples |
|
|
371 | (10) |
|
12.6.1 Dynamic Modelling of Railway Networks for Optimal Pathfinding Using Agent-based Methods and Reinforcement Learning |
|
|
371 | (2) |
|
12.6.2 Agent-Based Covid Modelling Strategies |
|
|
373 | (5) |
|
12.6.3 Deep Reinforcement Learning Approach for Optimal Replenishment Policy in a VMI Setting |
|
|
378 | (3) |
|
12.7 Summary and Lessons Learned |
|
|
381 | (1) |
|
|
381 | (4) |
13 Data Visualization |
|
385 | (26) |
|
|
|
386 | (5) |
|
|
391 | (2) |
|
13.3 Types of Data Visualizations |
|
|
393 | (7) |
|
|
394 | (1) |
|
|
394 | (1) |
|
13.3.3 Column and Bar Charts |
|
|
395 | (1) |
|
|
396 | (1) |
|
|
397 | (1) |
|
|
398 | (1) |
|
|
398 | (1) |
|
|
399 | (1) |
|
13.3.9 Other Types of Visualizations |
|
|
400 | (1) |
|
13.4 Select the Right Data Visualization |
|
|
400 | (2) |
|
|
402 | (5) |
|
13.6 Presentation of Data Visualization |
|
|
407 | (1) |
|
|
407 | (4) |
14 Data Driven Enterprises |
|
411 | (24) |
|
|
|
14.1 The Three Levels of a Data Driven Enterprise |
|
|
412 | (1) |
|
|
412 | (14) |
|
14.2.1 Corporate Strategy for Data |
|
|
413 | (2) |
|
14.2.2 The Current State Analysis |
|
|
415 | (2) |
|
14.2.3 Culture and Organization of a Successful Data Organisation |
|
|
417 | (7) |
|
14.2.4 Core Problem: The Skills Gap |
|
|
424 | (2) |
|
|
426 | (5) |
|
14.3.1 The Impact of Open Source |
|
|
426 | (1) |
|
|
426 | (1) |
|
|
427 | (1) |
|
14.3.4 Data Lake from a Business Perspective |
|
|
427 | (1) |
|
|
428 | (1) |
|
|
428 | (1) |
|
14.3.7 Revolution in Architecture: The Data Mesh |
|
|
429 | (2) |
|
|
431 | (2) |
|
14.4.1 Buy and Share Data |
|
|
431 | (1) |
|
14.4.2 Analytical Use Case Implementation |
|
|
432 | (1) |
|
14.4.3 Self-service Analytics |
|
|
433 | (1) |
|
|
433 | (2) |
15 Legal Foundation of Data Science |
|
435 | (18) |
|
|
|
435 | (1) |
|
|
436 | (1) |
|
15.3 General Data Protection Regulation |
|
|
437 | (9) |
|
15.3.1 Fundamental Rights of GDPR |
|
|
437 | (1) |
|
15.3.2 Declaration of Consent |
|
|
438 | (2) |
|
|
440 | (1) |
|
15.3.4 Anonymization and Pseudonymization |
|
|
441 | (1) |
|
15.3.5 Types of Anonymization |
|
|
442 | (2) |
|
15.3.6 Lawful and Transparent Data Processing |
|
|
444 | (1) |
|
15.3.7 Right to Data Deletion and Correction |
|
|
445 | (1) |
|
|
446 | (1) |
|
15.3.9 Privacy by Default |
|
|
446 | (1) |
|
|
446 | (1) |
|
15.5 Data Protection Officer |
|
|
447 | (1) |
|
15.5.1 International Data Export in Foreign Countries |
|
|
447 | (1) |
|
|
448 | (1) |
|
|
449 | (1) |
|
15.7 CCPA compared to GDPR |
|
|
449 | (2) |
|
|
450 | (1) |
|
15.7.2 Opt-in vs. Opt-out |
|
|
450 | (1) |
|
15.7.3 Right of Data Export |
|
|
450 | (1) |
|
15.7.4 Right Not to be Discriminated Against |
|
|
451 | (1) |
|
|
451 | (2) |
16 AI in Different Industries |
|
453 | (48) |
|
|
|
|
|
|
|
456 | (5) |
|
|
457 | (1) |
|
|
458 | (1) |
|
|
458 | (1) |
|
|
459 | (2) |
|
|
461 | (2) |
|
|
461 | (1) |
|
|
462 | (1) |
|
|
462 | (1) |
|
|
463 | (1) |
|
|
463 | (3) |
|
|
464 | (1) |
|
|
464 | (1) |
|
|
465 | (1) |
|
|
466 | (1) |
|
|
466 | (3) |
|
|
466 | (1) |
|
|
467 | (1) |
|
|
467 | (2) |
|
|
469 | (1) |
|
|
469 | (3) |
|
|
470 | (1) |
|
|
471 | (1) |
|
|
471 | (1) |
|
|
471 | (1) |
|
|
472 | (4) |
|
|
472 | (1) |
|
|
473 | (1) |
|
|
473 | (3) |
|
|
476 | (1) |
|
|
476 | (2) |
|
|
477 | (1) |
|
|
477 | (1) |
|
|
477 | (1) |
|
|
478 | (1) |
|
|
478 | (3) |
|
|
479 | (1) |
|
|
479 | (1) |
|
|
479 | (1) |
|
|
480 | (1) |
|
|
481 | (3) |
|
|
481 | (1) |
|
|
481 | (1) |
|
|
482 | (2) |
|
|
484 | (1) |
|
|
484 | (3) |
|
|
484 | (1) |
|
|
485 | (1) |
|
|
485 | (1) |
|
|
486 | (1) |
|
|
487 | (2) |
|
|
487 | (1) |
|
|
487 | (1) |
|
|
488 | (1) |
|
|
488 | (1) |
|
16.12 Telecommunications Provider |
|
|
489 | (3) |
|
|
489 | (1) |
|
|
490 | (1) |
|
|
490 | (2) |
|
|
492 | (1) |
|
|
492 | (2) |
|
|
492 | (1) |
|
|
493 | (1) |
|
|
493 | (1) |
|
|
494 | (1) |
|
16.14 Teaching and Training |
|
|
494 | (3) |
|
|
495 | (1) |
|
|
496 | (1) |
|
|
496 | (1) |
|
|
497 | (1) |
|
16.15 The Digital Society |
|
|
497 | (2) |
|
|
499 | (2) |
17 Mindset and Community |
|
501 | (18) |
|
|
|
501 | (3) |
|
17.2 Data Science Culture |
|
|
504 | (6) |
|
17.2.1 Start-up or Consulting Firm? |
|
|
504 | (1) |
|
17.2.2 Labs Instead of Corporate Policy |
|
|
505 | (1) |
|
17.2.3 Keiretsu Instead of Lone Wolf |
|
|
505 | (2) |
|
17.2.4 Agile Software Development |
|
|
507 | (1) |
|
17.2.5 Company and Work Culture |
|
|
507 | (3) |
|
|
510 | (7) |
|
17.3.1 Devaluation of Domain Expertise |
|
|
510 | (1) |
|
17.3.2 IT Will Take Care of It |
|
|
511 | (1) |
|
17.3.3 Resistance to Change |
|
|
511 | (1) |
|
17.3.4 Know-it-all Mentality |
|
|
512 | (1) |
|
|
513 | (1) |
|
|
513 | (1) |
|
|
514 | (1) |
|
17.3.8 Control over Resources |
|
|
514 | (1) |
|
17.3.9 Blind Faith in Resources |
|
|
515 | (1) |
|
17.3.10 The Swiss Army Knife |
|
|
516 | (1) |
|
|
516 | (1) |
|
|
517 | (2) |
18 Trustworthy AI |
|
519 | (20) |
|
|
18.1 Legal and Soft-Law Framework |
|
|
520 | (4) |
|
|
522 | (1) |
|
|
522 | (2) |
|
|
524 | (1) |
|
|
525 | (8) |
|
|
526 | (3) |
|
|
529 | (3) |
|
18.3.3 Mitigating Unwanted Bias in AI Systems |
|
|
532 | (1) |
|
18.4 Transparency of AI Systems |
|
|
533 | (5) |
|
18.4.1 Documenting the Data |
|
|
534 | (1) |
|
18.4.2 Documenting the Model |
|
|
535 | (1) |
|
|
536 | (2) |
|
|
538 | (1) |
|
|
538 | (1) |
19 The Authors |
|
539 | (6) |
Index |
|
545 | |