Introduction to Data Science with R
- What is Data Science?
- Significance of Data Science in today’s data-driven world, applications of Data Science, lifecycle of Data Science, and its components
- Introduction to Big Data, Hadoop, Machine Learning, and Deep Learning
- Introduction to R programming and RStudio
Hands-on Exercise
- Installation of RStudio
- Implementing simple mathematical operations and logic using R operators, loops, if statements, and switch cases
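The operators, loop, if statement, and switch case named in this exercise can be sketched in base R as follows. This is only a minimal illustration; the variable names and the helper `apply_op()` are hypothetical and not part of the course material.

```r
# Arithmetic and logical operators
a <- 7
b <- 2
quotient  <- a %/% b          # integer division
remainder <- a %% b           # modulo
is_even   <- (a %% 2 == 0)    # comparison yields a logical value

# for loop: sum of the squares of 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i^2
}

# if / else used as an expression
size <- if (total > 50) "large" else "small"

# switch on a character value; the last unnamed argument is the default
# (apply_op is a hypothetical helper for illustration)
apply_op <- function(op, x, y) {
  switch(op,
         add = x + y,
         sub = x - y,
         mul = x * y,
         stop("unknown operation: ", op))
}
product <- apply_op("mul", a, b)
```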
Data Exploration
- Introduction to data exploration
- Importing and exporting data to/from external sources
- What are exploratory data analysis and data importing?
- Data Frames, working with them, accessing individual elements, vectors, factors, operators, in-built functions, conditional and looping statements, user-defined functions, and data types
Hands-on Exercise
- Accessing individual elements of customer churn data
- Modifying and extracting results from the dataset using user-defined functions in R
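A rough sketch of this exercise in base R, assuming a small made-up data frame in place of the real customer churn data. The column names (`customerID`, `tenure`, `MonthlyCharges`, `Churn`) and the function `tenure_band()` are assumptions for illustration only.

```r
# Hypothetical stand-in for the customer churn data
churn <- data.frame(
  customerID     = c("C1", "C2", "C3", "C4"),
  tenure         = c(1, 34, 2, 45),
  MonthlyCharges = c(29.85, 56.95, 53.85, 42.30),
  Churn          = factor(c("No", "No", "Yes", "No")),
  stringsAsFactors = FALSE
)

# Accessing individual elements
second_tenure <- churn[2, "tenure"]             # one cell, by row and column
charges       <- churn$MonthlyCharges           # a whole column as a vector
churned       <- churn[churn$Churn == "Yes", ]  # rows matching a condition

# User-defined function (hypothetical): label customers by tenure in months
tenure_band <- function(months) {
  ifelse(months <= 12, "new", "established")
}

# Modify the dataset by adding the derived column
churn$band <- tenure_band(churn$tenure)
```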
Data Manipulation
- Need for data manipulation
- Introduction to the dplyr package
- Selecting one or more columns with select(), filtering records on the basis of a condition with filter(), adding new columns with mutate(), sampling, and counting
- Combining different functions with the pipe operator and implementing SQL-like operations with sqldf
Hands-on Exercise
- Implementing dplyr
- Performing various operations for manipulating data and storing it
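The dplyr verbs listed in this module can be chained with the pipe operator roughly as shown below. The data frame and its column names are hypothetical placeholders for the course dataset, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for the customer churn data
churn <- data.frame(
  customerID     = c("C1", "C2", "C3", "C4", "C5"),
  tenure         = c(1, 34, 2, 45, 8),
  MonthlyCharges = c(29.85, 56.95, 53.85, 42.30, 70.70),
  Churn          = c("No", "No", "Yes", "No", "Yes")
)

# filter() keeps rows, mutate() adds a column, select() keeps columns
long_term <- churn %>%
  filter(tenure > 5) %>%
  mutate(EstimatedTotal = tenure * MonthlyCharges) %>%
  select(customerID, tenure, EstimatedTotal)

# count() tallies rows per group; the result column is named n
churn_counts <- churn %>% count(Churn)

# slice_sample() draws a random sample of rows
sampled <- churn %>% slice_sample(n = 2)
```

The result of each pipeline is an ordinary data frame, so it can be stored in a variable or written out with a function such as `write.csv()`.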
Data Visualization
- Introduction to visualization
- Different types of graphs, the grammar of graphics, the ggplot2 package, categorical distribution with geom_bar(), numerical distribution with geom_histogram(), building frequency polygons with geom_freqpoly(), and making a scatterplot with geom_point()
- Multivariate analysis with geom_boxplot()
- Univariate analysis with a barplot, a histogram and a density plot, and multivariate distribution
- Creating bar plots for categorical variables using geom_bar(), and adding themes with the theme() layer
- Visualization with plotly, frequency plots with geom_freqpoly(), multivariate distribution with scatter plots and smooth lines, continuous distribution vs categorical distribution with box plots, and subgrouping plots
- Working with coordinates and themes to make graphs more presentable, understanding plotly and various plots, and visualization with ggvis
- Geographic visualization with ggmap() and building web applications with Shiny
Hands-on Exercise
- Creating data visualization to understand the customer churn ratio using ggplot2 charts
- Using plotly for importing and analyzing data
- Visualizing tenure, monthly charges, total charges, and other individual columns using a scatter plot
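The ggplot2 charts mentioned in this module might look roughly like the sketch below. It assumes the ggplot2 package is installed and uses a small invented data frame; the column names (`tenure`, `MonthlyCharges`, `TotalCharges`, `Churn`) are assumptions standing in for the real churn dataset.

```r
library(ggplot2)

# Hypothetical stand-in for the customer churn data
churn <- data.frame(
  tenure         = c(1, 34, 2, 45, 8, 22),
  MonthlyCharges = c(29.85, 56.95, 53.85, 42.30, 70.70, 89.10),
  TotalCharges   = c(29.85, 1889.50, 108.15, 1840.75, 151.65, 1949.40),
  Churn          = c("No", "No", "Yes", "No", "Yes", "No")
)

# Categorical distribution: number of customers per churn status
bar_plot <- ggplot(churn, aes(x = Churn)) +
  geom_bar() +
  theme_minimal()

# Numerical distribution: histogram of tenure
hist_plot <- ggplot(churn, aes(x = tenure)) +
  geom_histogram(bins = 5)

# Two numeric columns against each other, coloured by churn status
scatter_plot <- ggplot(churn, aes(x = tenure, y = TotalCharges, colour = Churn)) +
  geom_point() +
  labs(x = "Tenure (months)", y = "Total charges")

# Continuous vs categorical: monthly charges by churn status
box_plot <- ggplot(churn, aes(x = Churn, y = MonthlyCharges)) +
  geom_boxplot()
```

Each object is a plot specification; printing it (for example `print(scatter_plot)`) draws the chart.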
Introduction to Statistics
- Why do we need statistics?
- Categories of statistics, statistical terminology, types of data, measures of central tendency, and measures of spread
- Correlation and covariance, standardization and normalization, probability and its types, hypothesis testing, chi-square testing, ANOVA, normal distribution, and binomial distribution
Hands-on Exercise
- Building a statistical analysis model that uses quantification, representations, and experimental data
- Reviewing, analyzing, and drawing conclusions from the data
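Several of the statistical topics above map directly onto base R functions. The sketch below uses invented numbers purely for illustration; none of the values come from the course data.

```r
# Invented sample data
tenure  <- c(1, 34, 2, 45, 8, 22, 10, 28)
charges <- c(29.85, 56.95, 53.85, 42.30, 70.70, 89.10, 29.75, 104.80)

# Measures of central tendency and spread
center <- c(mean = mean(tenure), median = median(tenure))
spread <- c(sd = sd(tenure), iqr = IQR(tenure))

# Correlation and covariance
r  <- cor(tenure, charges)
cv <- cov(tenure, charges)

# Standardization (mean 0, sd 1) and min-max normalization (range 0 to 1)
z      <- as.numeric(scale(tenure))
minmax <- (tenure - min(tenure)) / (max(tenure) - min(tenure))

# Chi-square test of independence on a 2 x 2 table of counts
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(Contract = c("Monthly", "Yearly"),
                              Churn    = c("No", "Yes")))
chi <- chisq.test(tab)

# One-way ANOVA: do mean charges differ between three plans?
groups <- data.frame(
  charges = c(20, 22, 19, 21, 30, 32, 29, 31, 40, 42, 39, 41),
  plan    = factor(rep(c("A", "B", "C"), each = 4))
)
fit <- aov(charges ~ plan, data = groups)
```

Calling `summary(fit)` or inspecting `chi$p.value` gives the test results from which conclusions can be drawn.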