In our data-driven world, managing massive data sets and information pipelines is a challenge faced by nearly every organization. Enter the data engineer. The role of a data engineer is to take disparate data sets, combine them, and store them in ways that enable downstream analytics. This requires the ability to collect and clean data, define appropriate data models, and identify optimal ways to store data at scale. This course is designed to teach you these skills using the most cutting-edge and in-demand data technologies.

In this course, attendees learn how to extract features from web data and how to clean and model it. They gain a broad understanding of modern data storage technologies and learn specific techniques for working with SQL and NoSQL data warehouses. They also learn how to use Amazon Web Services (AWS) to build highly scalable and resilient data pipelines.

What You Will Learn

This course will teach you how to collect, clean, model, and store large volumes of data. You will develop the ability to create and manage big data pipelines, and you will learn how to select the right technology for the problem at hand. You will also learn how to work with Amazon Web Services to create on-demand, serverless pipelines, which are key components of many organizations’ overall analytics strategies.

Course Outline

This course will cover the following topics:

  • Data acquisition using APIs and web scraping

  • Feature extraction and data modeling

  • Data wrangling and normalization with Python

  • Stream processing with AWS Lambda

  • Storing and querying structured data with SQL

  • NoSQL data storage with Elasticsearch

  • Creating and leveraging data lakes with Amazon S3

  • Building and using OLAP data warehouses

  • Big data querying and analytics with Amazon Athena

  • Building scalable and resilient multi-component data pipelines

  • Visualizing data pipelines with Kibana

Course Requirements

Attendees should have a basic familiarity with Python and with the command line before participating in this course. They should also have the required software installed and operational on their computers.