Visual Analytics with Python

 

Overview

Though visual representations of quantitative information were traditionally cast as the end phase of the data analysis pipeline, visualizations can play important roles throughout the analytic process and are critical to the work of the data scientist. For example, Anscombe’s Quartet is a group of four different datasets whose summary statistics are nearly identical. They share the same means and standard deviations, the same correlation value, and the same fitted linear regression line, yet it isn’t until they are graphed as scatterplots that each dataset truly reveals itself. Too often as data scientists, we get stuck staring at tabular data, throwing models at numbers, and trying to interpret numeric results and scores. In many of these cases, if we could simply have seen what the data looked like, we might have rapidly made inferences that led to deeper insights.
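The Anscombe’s Quartet claim can be checked directly. The following is a minimal sketch using only the Python standard library; the helper name `summarize` and the dictionary layout are illustrative choices, not part of the course materials. It computes the summary statistics for each of the four datasets so they can be compared side by side.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's four datasets; the first three share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample standard deviations, Pearson r, and the least-squares line."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "std_x": stdev(x),
        "std_y": stdev(y),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, stats in summaries.items():
    # Rounded to two decimals, the four rows are essentially indistinguishable.
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Plotting each pair as a scatterplot tells a different story: dataset I is roughly linear, II is curved, III is linear apart from a single outlier, and IV has every x value but one stacked at the same position.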

Visual analytics are particularly important for effective machine learning. When all it takes is a few lines of Python to instantiate and fit a predictive model, visual analysis can help navigate the feature selection process, build intuition around model selection, identify common pitfalls like local minima and overfitting, and support hyperparameter tuning to render more successful predictive models.

Python provides many visualization tools for inspecting small to medium-sized datasets in both a static and interactive fashion. However, because Python is such an expressive programming language, the vast number of options for data visualization can feel impenetrable, requiring voodoo-like incantations to make images appear. In this course, we will decode the voodoo, exploring some of the best visualization libraries within Python so that we can make visual exploration of data an essential part of our workflow.


What You Will Learn

In this course we will survey several visualization libraries in Python, from the standard Matplotlib, to the wrappers in Pandas, to newer visualization frameworks like Seaborn and Bokeh. We will work through a visual analytics methodology for understanding multi-dimensional datasets, and see how combining statistical and visual techniques can lead to more insightful results. Finally, we’ll combine these visual tools with the machine learning library Scikit-Learn to support more informed predictive modeling from preliminary feature analysis through model selection, evaluation, and tuning.


Course Outline

The workshop will cover the following topics:

  • Simple data visualizations using Pandas and Matplotlib
  • Other visualization grammars: Seaborn and Bokeh
  • Multidimensional data visualizations: scatter matrices, parallel coordinates, radviz plots
  • Visual diagnostics for more informed machine learning: model evaluation and tuning
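As a rough taste of the multidimensional plots named in the outline, here is a minimal sketch using the plotting helpers that ship with Pandas. The data is synthetic and the column names (`feature_a`, `feature_b`, `feature_c`, `label`) are made up for illustration; the workshop itself may use different datasets and settings.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates, radviz, scatter_matrix

# Synthetic data (an assumption for illustration): two classes, three numeric features.
rng = np.random.default_rng(42)
n = 50
df = pd.DataFrame({
    "feature_a": np.concatenate([rng.normal(0, 1, n), rng.normal(3, 1, n)]),
    "feature_b": np.concatenate([rng.normal(5, 2, n), rng.normal(1, 2, n)]),
    "feature_c": rng.normal(0, 1, 2 * n),
    "label": ["class_0"] * n + ["class_1"] * n,
})

# Scatter matrix: every pairwise scatterplot, with histograms on the diagonal.
matrix_axes = scatter_matrix(df.drop(columns="label"), figsize=(6, 6), diagonal="hist")

# Parallel coordinates and radviz both color each row by the class column.
fig, (ax_pc, ax_rv) = plt.subplots(1, 2, figsize=(10, 4))
parallel_coordinates(df, "label", ax=ax_pc)
radviz(df, "label", ax=ax_rv)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or write the figure to disk with `fig.savefig(...)`.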

Upon completion of the course, you will understand how to perform visual exploration of data using the suite of Python tools for data visualization. You will be able to use these techniques to quickly graph, chart, summarize, and analyze any type of data either from a CSV or a DataFrame. You will also see how visual diagnostics can be used to facilitate machine learning: to support feature engineering and feature selection, to diagnose common problems, evaluate models, and conduct visual steering for more successful results.


Prerequisites

You must be familiar with Python and the command line before participating in this course. You must also have all software installed and ready for your particular operating system. Ensure that you perform the following tasks and are familiar with the concepts at the following links.


Instructor: Dr. Rebecca Bilbro

Dr. Rebecca Bilbro is a data scientist at the Commerce Data Service in Washington, DC, where she uses machine learning and Python for precision policy to support and stimulate the U.S. economy. As faculty at District Data Labs, she conducts research on semantic network extraction, high-dimensional information visualization, and entity resolution. Rebecca also serves as an organizer for Data Innovation DC and on the Board of Directors for Data Community DC, one of the largest collections of data-related meetup groups in the country. Before coming to the Washington metropolitan area, Rebecca earned her doctorate from the University of Illinois, Urbana-Champaign, where her research centered on communication and visualization practices in engineering.


DATE & TIME: 
SATURDAY, October 1st 2016
9AM-5PM 

LOCATION: 
4601 FAIRFAX DRIVE
ARLINGTON, VA 22203

REGULAR PRICE: $300
EARLY BIRD PRICE: $250
(EXPIRES 9/10/2016)


Buy a course bundle and save!

Two Workshop Bundle - Save 25%

Price

Bundle Price: $450
($225 per workshop)

Description

Attend any two workshops and save 25% off the regular price!
Perfect for those looking to skill up in a couple of data science topics.

Purchase

To purchase this bundle, go to our course bundle registration page.


Three Workshop Bundle - Save 33%

Price

Bundle Price: $600
($200 per workshop)

Description

Attend any three workshops and save 33% off the regular price!
Perfect for those who need a little more exposure to data science.

Purchase

To purchase this bundle, go to our course bundle registration page.


Four Workshop Bundle - Save 42%

Price

Bundle Price: $700
($175 per workshop)

Description

Attend any four workshops and save 42% off the regular price!
Perfect for those looking to gain exposure to several topics.

Purchase

To purchase this bundle, go to our course bundle registration page.