## Overview

Data science in the real world often involves managing data flows for a specific purpose: modeling some hypothesis. Machine learning is the art of training models by applying a statistical method to existing data to create a parametric representation that fits the data. In other words, a machine learning algorithm uses statistical processes to learn from examples and then applies what it has learned to future inputs to predict an outcome. These models can be used in data products as engines that create more data and actionable results.

Machine learning is classically divided into two methodologies: supervised and unsupervised learning.

In supervised learning, the “correct answers” are annotated ahead of time and the algorithm tries to fit a decision space based on those answers.
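As a minimal sketch of the supervised setting, the example below fits a classifier to labeled examples and checks it on examples it has not seen. The choice of the iris data set, the k-nearest-neighbors classifier, and the 75/25 split are assumptions made for illustration, not something this course prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X holds the measurements; y holds the "correct answers" (species labels).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the labeled examples to check the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the decision space on the annotated training examples.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Fraction of held-out examples whose label is predicted correctly.
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Scoring on held-out data rather than on the training data gives a more honest estimate of how the model will behave on future inputs.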

In unsupervised learning, algorithms try to group like examples together, usually inferring similarity from a distance metric.
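A rough illustration of distance-based grouping, assuming synthetic two-dimensional data and k-means with three clusters (both are choices made for this sketch only). No labels are given to the algorithm; it assigns each point to the nearest cluster center by Euclidean distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic points drawn around three centers; the true labels are discarded.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Group the points into three clusters based on distance to cluster centers.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Number of points assigned to each cluster.
print(np.bincount(labels))

# New points are assigned to whichever learned center is closest.
new_points = np.array([[0.0, 0.0], [5.0, 5.0]])
print(kmeans.predict(new_points))
```

The number of clusters is an input here, which is a common limitation of k-means: it has to be chosen by the analyst or estimated separately.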

These learning types allow us to explore data and categorize it in a meaningful way, predicting where new data will fit into our models.

## What You Will Learn

Scikit-Learn is a powerful machine learning library implemented in Python on top of the numeric and scientific computing powerhouses NumPy, SciPy, and Matplotlib, for fast analysis of small- to medium-sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization.

The purpose of this course is to serve as an introduction to machine learning with Python. We will explore several clustering, classification, and regression algorithms and see how they can help us perform a variety of machine learning tasks. We will then apply what we have learned to generate predictions and perform segmentation on real-world data sets. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs.
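One way to picture a model as a reusable component is to serialize a fitted estimator so that another program can load it and call `predict`. The sketch below assumes Python's standard `pickle` module, the iris data set, and a logistic regression model; all three are illustrative choices rather than the course's prescribed approach.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Fit once, for example in a training script.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the fitted model to bytes; these could be written to a file.
payload = pickle.dumps(model)

# Later, a larger program restores the model and uses it for predictions.
restored = pickle.loads(payload)
same = bool((restored.predict(X) == model.predict(X)).all())
print("restored model agrees with original:", same)
```

Pickled models should only be loaded from trusted sources, and they are generally tied to the library version that produced them.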

## Course Outline

This course will cover the following topics:

- An introduction to machine learning
- Loading datasets
- Building models and model persistence
- Feature extraction from data sets
- Regression
- Classification
- Clustering
- Model selection and evaluation
- Building machine learning pipelines

After this course, you should understand the basics of machine learning and how to implement machine learning algorithms on your data sets using Python. Specifically, you should understand basic regression, classification, and clustering algorithms, and know how to fit a model and use it to predict future outcomes.
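The fit-then-predict workflow, combined with the pipeline and evaluation topics in the outline, might look roughly like the following. The synthetic regression data, the scaling step, and five-fold cross-validation are assumptions chosen to keep the sketch short.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem: 200 examples, 5 features, some noise.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Chain preprocessing and the estimator so they are fit and applied together.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])

# Evaluate with 5-fold cross-validation; regressors are scored by R^2 by default.
scores = cross_val_score(pipeline, X, y, cv=5)
print("mean R^2 across folds:", scores.mean())

# Fit on all available data, then predict outcomes for inputs.
pipeline.fit(X, y)
predictions = pipeline.predict(X[:3])
print(predictions)
```

Wrapping the scaler inside the pipeline means it is refit on each training fold, which avoids leaking information from the validation fold into preprocessing.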

## Course Requirements

Attendees should be familiar with Python and with the command line before participating in this course. They should also have the required software installed and operational on their computers.