SAS for R Users: A Book for Data Scientists [Paperback]

  • Format: Paperback / softback, 208 pages, height x width x thickness: 226 x 151 x 10 mm, weight: 346 g
  • Publication date: 13-Aug-2019
  • Publisher: Wiley-Blackwell
  • ISBN-10: 1119256410
  • ISBN-13: 9781119256410
BRIDGES THE GAP BETWEEN SAS AND R, ALLOWING USERS TRAINED IN ONE LANGUAGE TO EASILY LEARN THE OTHER

SAS and R are widely used but very different software environments. Prized for its statistical and graphical tools, R is an open-source programming language popular with statisticians and data miners who develop statistical software and analyze data. SAS (Statistical Analysis System) is the leading corporate analytics software, thanks to its faster data handling and gentler learning curve. SAS for R Users enables entry-level data scientists to take advantage of the best aspects of both tools by providing a cross-functional framework for users who already know R but may need to work with SAS. Those with knowledge of both R and SAS are of far greater value to employers, particularly in corporate settings.

Using a clear, step-by-step approach, this book presents an analytics workflow that mirrors that of the everyday data scientist. This up-to-date guide is compatible with the latest R packages as well as SAS University Edition. Useful for anyone seeking employment in data science, this book:
  • Instructs both practitioners and students fluent in one language who are seeking to learn the other
  • Provides command-by-command translations of R to SAS and SAS to R
  • Offers examples and applications in both R and SAS
  • Presents step-by-step guidance on workflows, color illustrations, sample code, chapter quizzes, and more
  • Includes sections on advanced methods and applications

Designed for professionals, researchers, and students, SAS for R Users is a valuable resource for those with some knowledge of coding and basic statistics who wish to enter the realm of data science and business analytics.

AJAY OHRI is the founder of the analytics startup Decisionstats.com. His research interests include spreading open source analytics, analyzing social media manipulation with mechanism design, simpler interfaces to cloud computing, investigating climate change, and knowledge flows. He currently advises startups in analytics offshoring, analytics services, and analytics. He is the author of Python for R Users: A Data Science Approach (Wiley), R for Business Analytics, and R for Cloud Computing.
Table of Contents:
Preface
Scope
1 About SAS and R
1.1 About SAS
1.1.1 Installation
1.2 About R
1.2.1 The R Environment
1.2.2 Installation of R
1.3 Notable Points in SAS and R Languages
1.4 Some Important Functions with Comparative Comparisons Respectively
1.5 Summary
1.6 Quiz Questions
Quiz Answers
2 Data Input, Import and Print
2.1 Importing Data
2.1.1 Packages in R
2.2 Importing Data in SAS
2.2.1 Data Input in SAS
2.2.2 Using Proc Import to Import a Raw File
2.2.3 Creating a temporary dataset from a permanent one using "set"
2.3 Importing Data in R
2.3.1 Importing from Comma Separated Value (CSV) Files
2.3.2 Importing from Excel Files
2.3.3 Importing from SAS
2.3.4 Importing from SPSS and STATA
2.3.5 Assigning the Values Imported to a Data Object in R
2.4 Providing Data Input
2.4.1 Data Input in R
2.4.1.1 Using the c() function is the simplest way to create a list in R
2.4.1.2 Providing missing values to the vector
2.4.1.3 To Input multiple columns of data
2.4.1.4 Using loops to input
2.5 Data Input in SAS
2.6 Printing Data
2.6.1 Print in SAS
2.6.2 Print in R
2.7 Summary
2.8 Quiz Questions
Quiz Answers
3 Data Inspection and Cleaning
3.1 Introduction
3.2 Data Inspection
3.2.1 Data Inspection in SAS
3.2.2 Data Inspection in R
3.3 Missing Values
3.3.1 Missing Values in SAS
3.3.2 Missing Values in R
3.4 Data Cleaning
3.4.1 Data Cleaning in SAS
3.4.2 Data Cleaning in R
3.5 Quiz Questions
Quiz Answers
4 Handling Dates, Strings, Numbers
4.1 Working with Numeric Data
4.1.1 Handling Numbers in SAS
4.1.2 Numeric Data in R
4.2 Working with Date Data
4.2.1 Handling Dates in SAS
4.2.2 Handling Dates in R
4.3 Handling Strings Data
4.3.1 Handling Strings Data in SAS
4.3.2 Handling Strings Data in R
4.4 Quiz Questions
Quiz Answers
5 Numerical Summary and Groupby Analysis
5.1 Numerical Summary and Groupby Analysis
5.2 Numerical Summary and Groupby Analysis in SAS
5.3 Numerical Summary and Group by Analysis in R
5.3.1 Hmisc and Data.Table Packages
5.3.2 Dplyr Package
5.4 Quiz Questions
Quiz Answers
6 Frequency Distributions and Cross Tabulations
6.1 Frequency Distributions in SAS
6.2 Frequency Distributions in R
6.2.1 Frequency Tabulations in R
6.2.2 Frequency Tabulations in R with Other Variables Statistics
6.3 Quiz Questions
Quiz Answers
7 Using SQL with SAS and R
7.1 What is SQL?
7.1.1 Basic Terminology
7.1.2 CAP Theorem
7.1.3 SQL in SAS and R
7.2 SQL Select
7.2.1 SQL WHERE
7.2.2 SQL Order By
7.2.3 AND, OR, NOT in SQL
7.2.4 SQL Select Distinct
7.2.5 SQL INSERT INTO
7.2.6 SQL Delete
7.2.7 SQL Aggregate Functions
7.2.8 SQL ALIASES
7.2.9 SQL ALTER TABLE
7.2.10 SQL UPDATE
7.2.11 SQL IS NULL
7.2.12 SQL LIKE and BETWEEN
7.2.13 SQL GROUP BY
7.2.14 SQL HAVING
7.2.15 SQL CREATE TABLE and SQL CONSTRAINTS
7.2.16 SQL UNION
7.2.17 SQL JOINS
7.3 Merges
7.4 Summary
7.5 Quiz Questions
Quiz Answers
8 Functions, Loops, Arrays, Macros
8.1 Functions
8.2 Loops
8.3 Arrays
8.4 Macros
8.5 Quiz Questions
Quiz Answers
9 Data Visualization
9.1 Importance of Data Visualization
9.2 Data Visualization in SAS
9.3 Data Visualization in R
9.4 Quiz Questions
Quiz Answers
10 Data Output
10.1 Data Output in SAS
10.2 Data Output in R
10.3 Quiz Questions
Quiz Answers
11 Statistics for Data Scientists
11.1 Types of Variables
11.2 Statistical Methods for Data Analysis
11.3 Distributions
11.4 Descriptive Statistics
11.4.1 Measures of Central Tendency: It is the Measure of Location that Gives an Overall Idea of the Dataset
11.4.2 Measures of Dispersion
11.4.3 Skewness and Kurtosis
11.4.4 Central Limit Theorem
11.5 Inferential Statistics
11.5.1 Hypothesis Testing
11.5.2 Probability
11.5.3 Bayes Theorem
11.6 Algorithms in Data Science
11.6.1 Cross Validation
11.6.2 Types of Regression
11.6.3 Metrics to Evaluate Regression
11.6.4 Types of Classification
11.6.5 Metrics to Evaluate Classification
11.6.6 Types of Clustering
11.6.7 Types of Time Series Analysis
11.6.8 Types of Dimensionality Reduction
11.6.9 Types of Text Mining
11.7 Quiz Questions
Quiz Answers
Further Reading
Index