Building Data Apps with Python Workshop

 

Overview

Data products are software applications that derive their value from data by leveraging the data science pipeline, and that generate more data through their operation. They aren’t apps with data, nor are they one-time analyses that produce insights; they are operational and interactive. The rise of these applications has directly contributed to the rise of the data scientist and to the idea that data scientists are professionals “who are better at statistics than any software engineer and better at software engineering than any statistician.”

These applications have largely been built with Python. Python is flexible enough to support rapid development on many different types of servers, and it has a rich tradition in web applications. Python contributes to every stage of the data science pipeline, including real-time ingestion and the production of APIs, and it is powerful enough to perform machine learning computations. In this class we’ll produce a data product with Python, leveraging every stage of the data science pipeline to build a book recommender.


What You Will Learn

Python is one of the most popular programming languages for data analysis. Because of this, a basic working knowledge of the language is important for approaching more complex topics in data science and natural language processing. The purpose of this one-day course is to introduce the development process in Python using a project-based, hands-on approach. In particular, you will learn how to structure a data product using every stage of the data science pipeline: ingesting data from the web, wrangling data into a structured database, computing a non-negative matrix factorization with Python, and then producing a web-based report.


Course Outline

The workshop will cover the following topics:

  • Basic project structure of a Python application
  • virtualenv & virtualenvwrapper
  • Managing requirements outside the stdlib
  • Creating a testing framework with nose
  • Ingesting data with the requests library
  • Wrangling data into SQLite databases using SQLAlchemy
  • Building a recommender system with Python
  • Computing a matrix factorization with NumPy
  • Storing computational models using pickles
  • Reporting data with JSON
  • Data visualization with Jinja2

After this course you should understand how to build a data product using Python and will have built a recommender system that implements the entire data science pipeline.
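To give a flavor of the modeling, storage, and reporting steps in the outline, here is a minimal sketch of a non-negative matrix factorization in NumPy, followed by pickling the model and emitting a JSON report. It is not the Octavo implementation: the toy ratings matrix, the `nmf` function, its parameters, and the multiplicative-update method are all assumptions chosen for illustration, and unrated books are treated as plain zeros for simplicity.

```python
import json
import pickle

import numpy as np


def nmf(ratings, rank=2, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix as W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_readers, n_books = ratings.shape
    # Start from random non-negative factors.
    W = rng.random((n_readers, rank))
    H = rng.random((rank, n_books))
    for _ in range(iterations):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ ratings) / (W.T @ W @ H + eps)
        W *= (ratings @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: rows are readers, columns are books.
# A 0 stands for "not rated" and is treated as a zero rating here.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

W, H = nmf(ratings)
predicted = W @ H  # reconstructed scores for every reader/book pair

# Store the computed model as a pickle and load it back.
model_bytes = pickle.dumps({"W": W, "H": H})
restored = pickle.loads(model_bytes)

# Report the predicted scores for the first reader as JSON.
report = json.dumps({
    "reader": 0,
    "scores": [round(float(score), 2) for score in predicted[0]],
})
print(report)
```

In a real recommender you would likely mask unrated entries rather than fit them as zeros, and write the pickle to a file instead of keeping it in memory.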

The project that you will build is called “Octavo” and is a recommender system for a data science book club! You can view the code on GitHub: https://github.com/DistrictDataLabs/science-bookclub


Prerequisites

You must be familiar with Python and comfortable with the command line before participating in this course. You must also have all software installed and ready for your particular operating system. Ensure that you perform the following tasks and are familiar with the concepts at the following links.


Instructor: Laura Lorenz


Laura received her Bachelor’s from James Madison University, where she first started programming by using Python to manage autonomous computations while studying bacterial genomics. She now works as a data and software engineer, implementing and operating diverse solutions with both the data and web teams using Python tools and frameworks such as Django, Flask, pandas, and scikit-learn. She is an advocate of technology literacy, operating District Data Labs' Incubator program for entry- to mid-level data scientists and teaching introductory workshops on programming and web development.


DATE & TIME: 
SATURDAY, Dec 3RD 2016
9AM-5PM 

LOCATION: 
4601 FAIRFAX DRIVE
ARLINGTON, VA 22203

REGULAR PRICE: $300
EARLY BIRD PRICE: $250
(EXPIRES 11/12/2016)


Buy a course bundle and save!

Two Workshop Bundle - Save 25%

Price

Bundle Price: $450
($225 per workshop)

Description

Attend any two workshops and save 25% off the regular price!
Perfect for those looking to skill up in a couple of data science topics.

Purchase

To purchase this bundle, go to our course bundle registration page.


Three Workshop Bundle - Save 33%

Price

Bundle Price: $600
($200 per workshop)

Description

Attend any three workshops and save 33% off the regular price!
Perfect for those who need a little more exposure to data science.

Purchase

To purchase this bundle, go to our course bundle registration page.


Four Workshop Bundle - Save 42%

Price

Bundle Price: $700
($175 per workshop)

Description

Attend any four workshops and save 42% off the regular price!
Perfect for those looking to gain exposure to several topics.

Purchase

To purchase this bundle, go to our course bundle registration page.