Data Science Handbook: Generate Value from Data with Data Analysis and Machine Learning [Paperback]

  • Format: Paperback / softback, 573 pages, height x width x thickness: 237x166x36 mm, weight: 862 g
  • Publication date: 30-Nov-2022
  • Publisher: Hanser Publications
  • ISBN-10: 1569908869
  • ISBN-13: 9781569908860
Data Science, Big Data, and Artificial Intelligence are currently some of the most talked-about concepts in industry, government, and society, and yet also the most misunderstood. This book will clarify these concepts and provide you with practical knowledge to apply them.

Featuring:
  • A comprehensive overview of the various fields of application of data science
  • Case studies from practice to make the described concepts tangible
  • Practical examples to help you carry out simple data analysis projects

The book approaches the topic of data science from several sides. Crucially, it will show you how to build data platforms and apply data science tools and methods. Along the way, it will help you understand - and explain to various stakeholders - how to generate value from these techniques, such as applying data science to help organizations make faster decisions, reduce costs, and open up new markets. Furthermore, it will bring fundamental concepts related to data science to life, including statistics, mathematics, and legal considerations. Finally, the book outlines practical case studies that illustrate how knowledge generated from data is changing various industries over the long term.

Covers these current topics:
  • Mathematics basics: Mathematics for Machine Learning to help you understand and utilize various ML algorithms.
  • Machine Learning: From statistical to neural and from Transformers and GPT-3 to AutoML, we introduce common frameworks for applying ML in practice.
  • Natural Language Processing: Tools and techniques for gaining insights from text data and developing language technologies.
  • Computer vision: How can we gain insights from images and videos with data science?
  • Modeling and Simulation: Model the behavior of complex systems, such as the spread of COVID-19, and perform a what-if analysis covering different scenarios.
  • ML and AI in production: How to turn experimentation into a working data science product?
  • Presenting your results: Essential presentation techniques for data scientists.

Contributors: Stefan Papp / Wolfgang Weidinger / Katherine Munro / Bernhard Ortner / Annalisa Cadonna / Georg Langs / Roxane Licandro / Mario Meir-Huber / Danko Nikolić / Zoltan Toth / Barbora Vesela / Rania Wazir / Günther Zauner
Foreword
Preface
Acknowledgments
1 Introduction
1.1 What are Data Science, Machine Learning and Artificial Intelligence?
1.2 Data Strategy
1.3 From Strategy to Use Cases
1.3.1 Data Teams
1.3.2 Data and Platforms
1.3.3 Modeling and Analysis
1.4 Use Case Implementation
1.4.1 Iterative Exploration of Use Cases
1.4.2 End-to-End Data Processing
1.4.3 Data Products
1.5 Real-Life Use Case Examples
1.5.1 Value Chain Digitization (VCD)
1.5.2 Marketing Segment Analytics
1.5.3 360° View of the Customer
1.5.4 NGO and Sustainability Use Cases
1.6 Delivering Results
1.7 In a Nutshell
2 Infrastructure
Stefan Papp
2.1 Introduction
2.2 Hardware
2.2.1 Distributed Systems
2.2.2 Hardware for AI Applications
2.3 Linux Essentials for Data Professionals
2.4 Terraform
2.5 Cloud
2.5.1 Basic Services
2.5.2 Cloud-native Solutions
2.6 In a Nutshell
3 Data Architecture
Zoltan C. Toth
3.1 Overview
3.1.1 Maslow's Hierarchy of Needs for Data
3.1.2 Data Architecture Requirements
3.1.3 The Structure of a Typical Data Architecture
3.1.4 ETL (Extract, Transform, Load)
3.1.5 ELT (Extract, Load, Transform)
3.1.6 ETLT
3.2 Data Ingestion and Integration
3.2.1 Data Sources
3.2.2 Traditional File Formats
3.2.3 Modern File Formats
3.2.4 Summary
3.3 Data Warehouses, Data Lakes, and Lakehouses
3.3.1 Data Warehouses
3.3.2 Data Lakes and the Lakehouse
3.3.3 Summary: Comparing Data Warehouses to Lakehouses
3.4 Data Processing and Transformation
3.4.1 Big Data & Apache Spark
3.4.2 Databricks
3.5 Workflow Orchestration
3.6 A Data Architecture Use Case
3.7 In a Nutshell
4 Data Engineering
Stefan Papp
Bernhard Ortner
4.1 Data Integration
4.1.1 Data Pipelines
4.1.2 Designing Data Pipelines
4.1.3 CI/CD
4.1.4 Programming Languages
4.1.5 Kafka as Reference ETL Tool
4.1.6 Design Patterns
4.1.7 Automation of the Stages
4.1.8 Six Building Blocks of the Data Pipeline
4.2 Managing Analytical Models
4.2.1 Model Delivery
4.2.2 Model Update
4.2.3 Model or Parameter Update
4.2.4 Model Scaling
4.2.5 Feedback into the Operational Processes
4.3 In a Nutshell
5 Data Management
Stefan Papp
Bernhard Ortner
5.1 Data Governance
5.1.1 Data Catalog
5.1.2 Data Discovery
5.1.3 Data Quality
5.1.4 Master Data Management
5.1.5 Data Sharing
5.2 Information Security
5.2.1 Data Classification
5.2.2 Privacy Protection
5.2.3 Encryption
5.2.4 Secrets Management
5.2.5 Defense in Depth
5.3 In a Nutshell
6 Mathematics
Annalisa Cadonna
6.1 Linear Algebra
6.1.1 Vectors and Matrices
6.1.2 Operations between Vectors and Matrices
6.1.3 Linear Transformations
6.1.4 Eigenvalues, Eigenvectors, and Eigendecomposition
6.1.5 Other Matrix Decompositions
6.2 Calculus and Optimization
6.2.1 Derivatives
6.2.2 Gradient and Hessian
6.2.3 Gradient Descent
6.2.4 Constrained Optimization
6.3 Probability Theory
6.3.1 Discrete and Continuous Random Variables
6.3.2 Expected Value, Variance, and Covariance
6.3.3 Independence, Conditional Distributions, and Bayes' Theorem
6.4 In a Nutshell
7 Statistics - Basics
Rania Wazir
Georg Langs
Annalisa Cadonna
7.1 Data
7.2 Simple Linear Regression
7.3 Multiple Linear Regression
7.4 Logistic Regression
7.5 How Good is Our Model?
7.6 In a Nutshell
8 Machine Learning
Georg Langs
Katherine Munro
Rania Wazir
8.1 Introduction
8.2 Basics: Feature Spaces
8.3 Classification Models
8.3.1 K-Nearest-Neighbor-Classifier
8.3.2 Support Vector Machine
8.3.3 Decision Tree
8.4 Ensemble Methods
8.4.1 Bias and Variance
8.4.2 Bagging: Random Forests
8.4.3 Boosting: AdaBoost
8.5 Artificial Neural Networks and the Perceptron
8.6 Learning without Labels - Finding Structure
8.6.1 Clustering
8.6.2 Manifold Learning
8.6.3 Generative Models
8.7 Reinforcement Learning
8.8 Overarching Concepts
8.9 Into the Depth - Deep Learning
8.9.1 Convolutional Neural Networks
8.9.2 Training Convolutional Neural Networks
8.9.3 Recurrent Neural Networks
8.9.4 Long Short-Term Memory
8.9.5 Autoencoders and U-Nets
8.9.6 Adversarial Training Approaches
8.9.7 Generative Adversarial Networks
8.9.8 Cycle GANs and Style GANs
8.9.9 Other Architectures and Learning Strategies
8.10 Validation Strategies for Machine Learning Techniques
8.11 Conclusion
8.12 In a Nutshell
9 Building Great Artificial Intelligence
Danko Nikolić
9.1 How AI Relates to Data Science and Machine Learning
9.2 A Brief History of AI
9.3 Five Recommendations for Designing an AI Solution
9.3.1 Recommendation No. 1: Be pragmatic
9.3.2 Recommendation No. 2: Make it easier for machines to learn - create inductive biases
9.3.3 Recommendation No. 3: Perform analytics
9.3.4 Recommendation No. 4: Beware of the scaling trap
9.3.5 Recommendation No. 5: Beware of the generality trap (there is no such thing as a free lunch)
9.4 Human-level Intelligence
9.5 In a Nutshell
10 Natural Language Processing (NLP)
Katherine Munro
10.1 What is NLP and Why is it so Valuable?
10.2 NLP Data Preparation Techniques
10.2.1 The NLP Pipeline
10.2.2 Converting the Input Format for Machine Learning
10.3 NLP Tasks and Methods
10.3.1 Rule-Based (Symbolic) NLP
10.3.2 Statistical Machine Learning Approaches
10.3.3 Neural NLP
10.3.4 Transfer Learning
10.4 At the Cutting Edge: Current Research Focuses for NLP
10.5 In a Nutshell
11 Computer Vision
Roxane Licandro
11.1 What is Computer Vision?
11.2 A Picture Paints a Thousand Words
11.2.1 The Human Eye
11.2.2 Image Acquisition Principle
11.2.3 Digital File Formats
11.2.4 Image Compression
11.3 I Spy With My Little Eye Something That Is...
11.3.1 Computational Photography and Image Manipulation
11.4 Computer Vision Applications & Future Directions
11.4.1 Image Retrieval Systems
11.4.2 Object Detection, Classification and Tracking
11.4.3 Medical Computer Vision
11.5 Making Humans See
11.6 In a Nutshell
12 Modelling and Simulation - Create your own Models
Günther Zauner
Wolfgang Weidinger
12.1 Introduction
12.2 General Aspects
12.3 Modelling to Answer Questions
12.4 Reproducibility and Model Lifecycle
12.4.1 The Lifecycle of a Modelling and Simulation Question
12.4.2 Parameter and Output Definition
12.4.3 Documentation
12.4.4 Verification and Validation
12.5 Methods
12.5.1 Ordinary Differential Equations (ODEs)
12.5.2 System Dynamics (SD)
12.5.3 Discrete Event Simulation
12.5.4 Agent-Based Modelling
12.6 Modelling and Simulation Examples
12.6.1 Dynamic Modelling of Railway Networks for Optimal Pathfinding Using Agent-based Methods and Reinforcement Learning
12.6.2 Agent-Based Covid Modelling Strategies
12.6.3 Deep Reinforcement Learning Approach for Optimal Replenishment Policy in a VMI Setting
12.7 Summary and Lessons Learned
12.8 In a Nutshell
13 Data Visualization
Barbora Vesela
13.1 History
13.2 Which Tools to Use
13.3 Types of Data Visualizations
13.3.1 Scatter Plot
13.3.2 Line Chart
13.3.3 Column and Bar Charts
13.3.4 Histogram
13.3.5 Pie Chart
13.3.6 Box Plot
13.3.7 Heat Map
13.3.8 Tree Diagram
13.3.9 Other Types of Visualizations
13.4 Select the right Data Visualization
13.5 Tips and Tricks
13.6 Presentation of Data Visualization
13.7 In a Nutshell
14 Data Driven Enterprises
Mario Meir-Huber
Stefan Papp
14.1 The three Levels of a Data Driven Enterprise
14.2 Culture
14.2.1 Corporate Strategy for Data
14.2.2 The Current State Analysis
14.2.3 Culture and Organization of a Successful Data Organisation
14.2.4 Core Problem: The Skills Gap
14.3 Technology
14.3.1 The Impact of Open Source
14.3.2 Cloud
14.3.3 Vendor Selection
14.3.4 Data Lake from a Business Perspective
14.3.5 The Role of IT
14.3.6 Data Science Labs
14.3.7 Revolution in Architecture: The Data Mesh
14.4 Business
14.4.1 Buy and Share Data
14.4.2 Analytical Use Case Implementation
14.4.3 Self-service Analytics
14.5 In a Nutshell
15 Legal foundation of Data Science
Bernhard Ortner
15.1 Introduction
15.2 Categories of Data
15.3 General Data Protection Regulation
15.3.1 Fundamental Rights of GDPR
15.3.2 Declaration of Consent
15.3.3 Risk-assessment
15.3.4 Anonymization and Pseudo-anonymization
15.3.5 Types of Anonymization
15.3.6 Lawful and Transparent Data Processing
15.3.7 Right to Data Deletion and Correction
15.3.8 Privacy by Design
15.3.9 Privacy by Default
15.4 ePrivacy-Regulation
15.5 Data Protection Officer
15.5.1 International Data Export in Foreign Countries
15.6 Security Measures
15.6.1 Data Encryption
15.7 CCPA compared to GDPR
15.7.1 Territorial Scope
15.7.2 Opt-in vs. Opt-out
15.7.3 Right of Data Export
15.7.4 Right Not to be Discriminated Against
15.8 In a Nutshell
16 AI in Different Industries
Stefan Papp
Mario Meir-Huber
Wolfgang Weidinger
Thomas Treml
Marek Danis
16.1 Automotive
16.1.1 Vision
16.1.2 Data
16.1.3 Use Cases
16.1.4 Challenges
16.2 Aviation
16.2.1 Vision
16.2.2 Data
16.2.3 Use cases
16.2.4 Challenges
16.3 Energy
16.3.1 Vision
16.3.2 Data
16.3.3 Use Cases
16.3.4 Challenges
16.4 Finance
16.4.1 Vision
16.4.2 Data
16.4.3 Use Cases
16.4.4 Challenges
16.5 Health
16.5.1 Vision
16.5.2 Data
16.5.3 Use Cases
16.5.4 Challenges
16.6 Government
16.6.1 Vision
16.6.2 Data
16.6.3 Use Cases
16.6.4 Challenges
16.7 Art
16.7.1 Vision
16.7.2 Data
16.7.3 Use cases
16.7.4 Challenges
16.8 Manufacturing
16.8.1 Vision
16.8.2 Data
16.8.3 Use Cases
16.8.4 Challenges
16.9 Oil and Gas
16.9.1 Vision
16.9.2 Data
16.9.3 Use Cases
16.9.4 Challenges
16.10 Safety at Work
16.10.1 Vision
16.10.2 Data
16.10.3 Use Cases
16.10.4 Challenges
16.11 Retail
16.11.1 Vision
16.11.2 Data
16.11.3 Use Cases
16.11.4 Challenges
16.12 Telecommunications Provider
16.12.1 Vision
16.12.2 Data
16.12.3 Use Cases
16.12.4 Challenges
16.13 Transport
16.13.1 Vision
16.13.2 Data
16.13.3 Use Cases
16.13.4 Challenges
16.14 Teaching and Training
16.14.1 Vision
16.14.2 Data
16.14.3 Use Cases
16.14.4 Challenges
16.15 The Digital Society
16.16 In a Nutshell
17 Mindset and Community
Stefan Papp
17.1 Data-Driven Mindset
17.2 Data Science Culture
17.2.1 Start-up or Consulting Firm?
17.2.2 Labs Instead of Corporate Policy
17.2.3 Keiretsu Instead of Lone Wolf
17.2.4 Agile Software Development
17.2.5 Company and Work Culture
17.3 Antipatterns
17.3.1 Devaluation of Domain Expertise
17.3.2 IT Will Take Care of It
17.3.3 Resistance to Change
17.3.4 Know-it-all Mentality
17.3.5 Doom and Gloom
17.3.6 Penny-pinching
17.3.7 Fear Culture
17.3.8 Control over Resources
17.3.9 Blind Faith in Resources
17.3.10 The Swiss Army Knife
17.3.11 Over-Engineering
17.4 In a Nutshell
18 Trustworthy AI
Rania Wazir
18.1 Legal and Soft-Law Framework
18.1.1 Standards
18.1.2 Regulations
18.2 AI Stakeholders
18.3 Fairness in AI
18.3.1 Bias
18.3.2 Fairness Metrics
18.3.3 Mitigating Unwanted Bias in AI Systems
18.4 Transparency of AI Systems
18.4.1 Documenting the Data
18.4.2 Documenting the Model
18.4.3 Explainability
18.5 Conclusion
18.6 In a Nutshell
19 The authors
Index
The team of authors consists of data experts from business and academia. The spectrum ranges from executives to data engineers who create production systems, to data scientists who generate value from data. All authors are members of the Vienna Data Science Group (VDSG), an NGO that aims to establish a platform for exchanging knowledge on the application of data science, AI and Machine Learning and raising awareness of the opportunities and potential risks of these technologies.