Data products are software applications that derive their value from data and produce more valuable data in return. They aren’t simply apps that rely on data or one-time analyses that produce insights; they are operational and interactive. The rise of these applications has directly contributed to the ascent of the modern web as well as to the job title “data scientist”.

These applications have largely been built with Python, a programming language that is flexible enough to enable fast development on many different types of servers and that has a rich tradition in web applications. Python contributes to every stage of the data science pipeline, including real-time ingestion and the production of APIs, and it is powerful enough to perform machine learning computations. In this course, we’ll work through every stage of the data science pipeline to produce a functional data product with Python.

What You Will Learn

Python is one of the most popular programming languages for data analysis. Because of this, a basic working knowledge of the language is important for approaching more complex topics in data science and natural language processing. The purpose of this course is to introduce the development process in Python using a project-based, hands-on approach. You will learn how to structure a data product using every stage of the data science pipeline, including ingesting data from the web, wrangling data into a structured database, computing a non-negative matrix factorization with Python, and then producing a web-based report.
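
As a rough illustration of the wrangling and reporting stages described above, here is a minimal sketch. It is not the course's own code: the sample records, the `ratings` table, and the `wrangle` and `report` functions are hypothetical, and the standard-library `sqlite3` module stands in for SQLAlchemy so the example stays self-contained. Inline data replaces the web ingestion step.

```python
import json
import sqlite3

# Hypothetical sample records standing in for data ingested from the web.
RAW_RATINGS = [
    {"user_id": "ann", "item": "book-a", "rating": "5"},
    {"user_id": "ann", "item": "book-b", "rating": "3"},
    {"user_id": "bob", "item": "book-a", "rating": "4"},
    {"user_id": "bob", "item": "book-b", "rating": ""},  # missing value
]


def wrangle(records, conn):
    """Clean raw records and load them into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings "
        "(user_id TEXT, item TEXT, rating REAL)"
    )
    rows = [
        (r["user_id"], r["item"], float(r["rating"]))
        for r in records
        if r["rating"]  # skip records that have no rating
    ]
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def report(conn):
    """Summarize the table as JSON that a web page could consume."""
    cursor = conn.execute(
        "SELECT item, AVG(rating), COUNT(*) FROM ratings "
        "GROUP BY item ORDER BY item"
    )
    summary = [
        {"item": item, "mean_rating": mean, "count": count}
        for item, mean, count in cursor
    ]
    return json.dumps(summary, indent=2)


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
loaded = wrangle(RAW_RATINGS, conn)
print(report(conn))
```

Keeping each stage in its own function means the ingestion source or the storage layer can be swapped later without touching the report.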

Course Outline

The workshop will cover the following topics:

  • Basic project structure of a Python application

  • virtualenv & virtualenvwrapper

  • Managing requirements outside the stdlib

  • Creating a testing framework with nose

  • Ingesting data with requests

  • Wrangling data into SQLite Databases using SQLAlchemy

  • Building a recommender system with Python

  • Computing a matrix factorization with NumPy

  • Storing computational models using pickles

  • Reporting data with JSON

  • Data visualization with Jinja2
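
To give a flavor of the matrix factorization and model storage topics listed above, here is a minimal sketch. It assumes the multiplicative update rule, which is one common way to compute a non-negative matrix factorization and not necessarily the method taught in the course. The `nmf` function, the ratings matrix `V`, and the choice of `rank=2` are hypothetical, and zeros are treated as real ratings for simplicity.

```python
import pickle

import numpy as np


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V as W @ H.

    Uses multiplicative updates, which keep every entry of
    W and H non-negative as long as they start non-negative.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(iterations):
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical user-by-item ratings (rows: users, columns: items).
V = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H)  # Frobenius reconstruction error
print(f"reconstruction error: {error:.3f}")

# Store the fitted factors as a pickle, then load them back.
blob = pickle.dumps({"W": W, "H": H})
model = pickle.loads(blob)
predicted = model["W"] @ model["H"]  # scores a recommender could rank
```

In a recommender, each row of `predicted` can be sorted to suggest items to a user. Pickles should only be loaded from trusted sources, since unpickling can execute arbitrary code.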

After this course, you will understand how to build a data product using Python and will have built a recommender system that implements the entire data science pipeline.

Course Requirements

Attendees should be familiar with Python and with the command line before participating in this course. They should also have the required software installed and operational on their computers.