Though visual representations of quantitative information were traditionally cast as the end phase of the data analysis pipeline, visualizations can play important roles throughout the analytic process and are critical to the work of the data scientist. Too often as data scientists, we get stuck staring at tabular data, throwing models at numbers, and trying to interpret numeric results and scores. In many of these cases, if we could simply have seen what the data looked like, we might have rapidly made inferences that led to deeper insights.

Visual analytics is particularly important for effective machine learning. While it takes only a few lines of Python to instantiate and fit a predictive model, visual analysis can help navigate the feature selection process, build intuition around model selection, identify common pitfalls like local minima and overfitting, and support hyperparameter tuning to produce more successful predictive models.
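
As a rough illustration of that contrast, the sketch below fits a model in two lines and then plots training against cross-validated accuracy to look for overfitting. The synthetic dataset, the choice of a decision tree, and the `max_depth` range are assumptions made for this example, not part of the course material.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Instantiating and fitting a model takes only two lines.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X, y)

# Score the same kind of model at several tree depths with 5-fold CV.
depths = np.arange(1, 11)
train_scores, valid_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
)

# A widening gap between the two curves is a visual hint of overfitting.
fig, ax = plt.subplots()
ax.plot(depths, train_scores.mean(axis=1), marker="o", label="training")
ax.plot(depths, valid_scores.mean(axis=1), marker="o", label="cross-validation")
ax.set_xlabel("max_depth")
ax.set_ylabel("accuracy")
ax.legend()
```

Call `fig.savefig(...)` or `plt.show()` to view the result. The numeric scores carry the same information, but the gap between the curves is usually easier to read from the plot.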

Python provides many visualization tools for inspecting small to medium-sized datasets, both statically and interactively. However, the sheer number of options for data visualization can feel impenetrable. In this course, we will explore some of Python’s best visualization libraries so that we can make visual exploration of data an essential part of our workflow.

What You Will Learn

In this course, we will explore several visualization libraries in Python, from the standard Matplotlib, to the wrappers in Pandas, to newer visualization frameworks like Seaborn and Bokeh. We will work through a visual analytics methodology for understanding multi-dimensional datasets, and we will see how combining statistical and visual techniques can lead to more insightful results. Finally, we’ll see how to combine these visual tools with the machine learning library Scikit-Learn to support more informed predictive modeling, from preliminary feature analysis through model selection, evaluation, and parameter tuning.

Course Outline

This course will cover the following topics:

  • Simple data visualizations:

    • Pandas

    • Matplotlib

  • Other visualization grammars:

    • Seaborn

    • Bokeh

  • Multidimensional data visualizations:

    • Scatter matrices

    • Parallel coordinates

    • Radviz plots

  • Visual diagnostics for more informed machine learning:

    • Visualizing model evaluation

    • Visual hyperparameter tuning
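
To give a flavor of the multidimensional plots listed above, here is a minimal sketch using the plotting helpers that ship with Pandas. The synthetic DataFrame, its column names, and the class labels are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates, radviz, scatter_matrix

# Build a small labeled dataset: three classes, four numeric features.
# The names "w", "x", "y", "z", and "label" are hypothetical.
rng = np.random.default_rng(0)
frames = []
for label, center in [("a", 0.0), ("b", 2.0), ("c", 4.0)]:
    values = rng.normal(loc=center, scale=1.0, size=(50, 4))
    frame = pd.DataFrame(values, columns=["w", "x", "y", "z"])
    frame["label"] = label
    frames.append(frame)
df = pd.concat(frames, ignore_index=True)

# Scatter matrix: every pairwise feature plot, histograms on the diagonal.
features = df.drop(columns="label")
grid = scatter_matrix(features, figsize=(6, 6), diagonal="hist")

# Parallel coordinates and Radviz: one line or point per row, colored by class.
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
parallel_coordinates(df, "label", ax=left)
radviz(df, "label", ax=right)
```

All three functions take a DataFrame directly, so the same calls should work on data loaded with `pd.read_csv`, provided the feature columns are numeric.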

Upon completion of the course, you will understand how to perform visual exploration of data using the suite of Python tools for data visualization. You will have developed the ability to quickly graph, chart, summarize, and analyze data from either a CSV file or a DataFrame. You will also understand visual diagnostics and their utility in facilitating machine learning: to support feature engineering and feature selection, to diagnose common problems, to evaluate models, and to conduct visual steering for improved results.

Course Requirements

Attendees should be familiar with Python and with the command line before participating in this course. They should also have the required software installed and operational on their computers.