This course introduces participants to the world of statistical programming in R. Through hands-on exercises, students will progress from simple data analysis tasks to the more advanced modeling and visualization techniques that R's ecosystem of packages enables. Using R scripts, students will learn to implement each stage of the data analytics pipeline, from data ingestion and manipulation to statistical modeling, visualization, and reporting. Because every step is captured in a script, the pipeline is reproducible, so analyses can be standardized and repeated.
What You Will Learn
The first day introduces students to the R programming language. They will learn how to load data into R from different sources, clean and transform it into analytic datasets, and create basic visualizations for data exploration. On the second day, students will learn to visualize data with several R packages, including the powerful ggplot2 package, and to build interactive visualizations and dashboards. On the third day, students will learn to specify models using R's formula interface, fit a variety of statistical and machine learning models, and create automated reports.
This course will cover the following topics:
Day 1: Data Loading, Munging, and Exploration
Why use R?
Installing R and RStudio
Installing packages in R
Loading data into R
Data exploration and cleaning
Data summaries and pivoting
Cleaning data from file sources
Automated reporting in R
Day 2: Data Exploration Through Visualization
The ggplot2 package
Plotting a single variable - histograms, density plots, bar plots
Plotting two variables - scatter plots, box plots, stacked bar plots
Plotting multiple variables - shapes, colors, legends, and panels
Annotations - adding text, lines, and arrows
Cleaning up graphs
Data munging for graphs
Maps using R - shapefiles, choropleths, and other map-based graphics
Putting it together with dashboards and Shiny
Day 3: Modeling and Reporting with R
The formula interface in R
Hypothesis tests in R
Basic regression models
More advanced models
Time series models
Sampling, permutations, and the bootstrap
Machine learning models
This course requires attendees to have access to computers with R, the RStudio IDE, and a selection of R packages installed. Participants should also have Internet access to download public datasets and install additional R packages.