Content Optimization with
Multi-Armed Bandits & Python



Whenever you have multiple items to choose from and are not sure which will result in the highest level of engagement or action, you have to make a choice. A/B testing can help you, but there is a better way, a quicker and less wasteful way, and it is called a multi-armed bandit. Bandits are a type of reinforcement learning, a branch of machine learning that typically flies under the radar, unless you are trying to teach a robot how to juggle while navigating a maze, or trying to teach a computer program how to learn to play (and win) Pac-Man.

Reinforcement learning deals with trial and error, searching for the best action to take. It is classified as a type of online learning, in contrast with offline (batch) learning: in online learning you start with no knowledge and learn as you go, sequentially making decisions. This makes bandits perfect for recommendation engines when you know nothing about your users, for example on the first day your app is up and running (a situation known as a cold start). Bandits balance exploring what you don’t know with exploiting what you do know, a tension commonly referred to as the exploration-exploitation dilemma.
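As a concrete sketch of that balance, here is a minimal epsilon-greedy loop (one of the algorithms covered later in the course): with probability epsilon it explores a random arm, and otherwise it exploits the arm with the best observed average reward. The arm payoff probabilities and the function name are illustrative assumptions, not part of the course code.

```python
import random

def epsilon_greedy(true_probs, epsilon=0.1, n_pulls=10000, seed=0):
    """Epsilon-greedy bandit: explore a random arm with probability
    epsilon, otherwise exploit the arm with the best observed mean."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                             # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])       # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0     # Bernoulli payoff
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]         # incremental mean
    return counts, values

# Hypothetical click-through rates for three headlines
counts, values = epsilon_greedy([0.04, 0.05, 0.08])
```

Over many pulls, the exploit branch concentrates traffic on the best arm, while the explore branch keeps refining the estimates for the others.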

Typical applications of multi-armed bandits include subject line testing for emails, button colors, page design/layout, and headline optimization. Anything you can test in the A/B fashion, you can do with bandits, but bandits will ensure you quickly converge to your best option, saving you time and money, and saving your users from viewing irrelevant content. 

What You Will Learn

You will learn different strategies for balancing exploration and exploitation in order to learn the best action to take when you initially know nothing about the payoffs of the different actions. You will learn how to implement these algorithms, tune them, and incorporate them into various apps. In short, this course will give you the tools to make optimal decisions in the face of uncertainty.

Course Outline

This course covers the following topics: 

  • Visualization: Overview of visualization in Python.
  • Multi-Armed Bandits: Bandits are a way to maximize reward given uncertain payoffs. Algorithms covered: greedy, epsilon-greedy, epsilon-decreasing, exponential, upper confidence bound, and Bayesian.
  • Data Types: Static, restless, and volatile data will be covered.
      • Static rewards exist forever, and their expected payoff never changes.
      • Restless rewards exist forever, but their expected payoff changes over time.
      • Volatile rewards exist for a certain period of time, then become unavailable.
  • Simulation: Simulate bandit systems and visualize the results.
  • Application 1: A command-line application that uses bandits.
  • Application 2: A website that uses bandits.
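To give a flavor of the simulation topic above, here is a minimal sketch of the upper confidence bound algorithm (UCB1) from the outline: each step pulls the arm with the highest mean reward plus an optimism bonus that shrinks as that arm is sampled more. The three Bernoulli payoff probabilities and the function name are made up for illustration.

```python
import math
import random

def ucb1(true_probs, n_pulls=10000, seed=1):
    """UCB1 bandit: pull the arm maximizing mean + sqrt(2 ln t / n_a),
    where n_a is how often arm a has been pulled so far."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    for t in range(1, n_pulls + 1):
        if t <= n_arms:
            arm = t - 1      # pull each arm once to initialize its estimate
        else:
            arm = max(range(n_arms),
                      key=lambda a: values[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0  # Bernoulli payoff
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]      # incremental mean
    return counts, values

# Hypothetical payoff probabilities for three options
counts, values = ucb1([0.04, 0.05, 0.08])
```

Plotting `counts` over time (e.g. with matplotlib) is exactly the kind of visualization the Simulation section of the course walks through.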


Students should be familiar with a high-level programming language (C++, Java, or Python). You should have the Python 2.7 version of Anaconda, from Continuum Analytics, installed. Useful links are below:

  1. Installing Python:
  2. Install virtualenv and virtualenvwrapper:
  3. Get a GitHub account:
  4. Python Hello World:
  5. Using the terminal:
  6. Python programming:
  7. Anaconda:

When you launch Anaconda, the Launcher program has many sample IPython notebooks on the right side. These are great tutorials for data analysis and visualization in Python.

Instructor: Kris Wright


Kris is an experienced data scientist who has worked in academia and industry. He is a PhD candidate at Old Dominion University in the Department of Modeling, Simulation, and Visualization Engineering, where his dissertation focuses on social network analysis and machine learning (prediction on graphs). He is also a full-time data scientist at Cognitiv, a deep learning company located in Bethesda, MD, where he works on computational advertising and image recognition.