This course introduces participants to statistical programming in R. Through hands-on exercises, students will progress from simple data analysis tasks to the more advanced modeling and visualization techniques that R's package ecosystem enables. Using R scripts, students will learn how to implement the data analytics pipeline, from data ingestion and manipulation to statistical modeling, visualization, and reporting. Because every step of the pipeline is scripted, analyses are reproducible and can be standardized and repeated.

This three-day training kicks off with an introduction to R scripts. Using a select set of R packages, students will learn to ingest, clean, and manipulate data. Day two will explore data visualization through the powerful ggplot2 package and interfaces with various JavaScript libraries. Day three will delve into modeling by introducing both statistical and predictive machine learning models.

What You Will Learn

The first day is designed to introduce students to the R programming language. They will learn how to load data into R from different sources, clean and transform the data into analytic datasets, and perform basic visualizations for data exploration. On the second day, students will learn how to visualize data through various R packages, including the powerful ggplot2 package, and they'll learn how to create interactive visualizations and dashboards. On the third day, students will learn to specify models using R’s formula interface, run variations of statistical and machine learning models, and create automated reports.

Course Outline

This course will cover the following topics:

Day 1: Data Loading, Munging, and Exploration

  • Why use R?

  • Installing R and RStudio

  • Installing packages in R

  • Loading data into R

  • Data exploration and cleaning

  • Data munging

  • Data summaries and pivoting

  • Cleaning data from file sources

  • Automated reporting in R

Day 2: Data Exploration Through Visualization

  • The ggplot2 package

  • Plotting a single variable - histograms, density plots, bar plots

  • Plotting two variables - scatter plots, box plots, stacked bar plots

  • Plotting multiple variables - shapes, colors, legends, and panels

  • Annotations - adding text, lines, and arrows

  • Cleaning up graphs

  • Data munging for graphs

  • Maps using R - shapefiles, choropleths, and other map-based graphics

  • Interactive graphics

  • Putting it together with dashboards and Shiny

Day 3: Modeling and Reporting with R

  • The formula interface in R

  • Hypothesis tests in R

  • Basic regression models

  • More advanced models

  • Time series models

  • Sampling, permutations, and the bootstrap

  • Verifying assumptions

  • Machine learning models

  • Predictive analytics

Course Requirements

Attendees will need access to a computer with R, the RStudio IDE, and a selection of R packages installed. Participants should also have Internet access to download public datasets and install additional R packages.

