In our data-driven world, managing massive data sets and information pipelines is a challenge faced by nearly every organization. Enter the data engineer. The role of a data engineer is to take disparate data sets, combine them, and store them in ways that enable downstream analytics. This requires the ability to collect and clean data, to define appropriate data models, and to identify optimal ways to store data at scale. This course is designed to teach you these skills using cutting-edge, in-demand data technologies.
In this course, attendees learn how to extract features from web data and how to clean and model it. They gain a broad understanding of modern data storage technologies and learn specific techniques for working with SQL and NoSQL data warehouses. Participants will also learn how to leverage Amazon Web Services (AWS) for building highly scalable and resilient data pipelines.
What You Will Learn
This course will teach you how to collect, clean, model, and store large volumes of data. You will develop the ability to create and manage big data pipelines, and you will learn how to select the right technology for the problem at hand. You will also learn how to work with Amazon Web Services to create on-demand, serverless pipelines, which are key components of many organizations’ overall analytics strategy.
This course will cover the following topics:
Data acquisition using APIs and web scraping
Feature extraction and data modeling
Data wrangling and normalization with Python
Stream processing with AWS Lambda
Storing and querying structured data with SQL
NoSQL data storage with Elasticsearch
Creating and leveraging data lakes with Amazon S3
Building and using OLAP data warehouses
Big data querying and analytics with Amazon Athena
Building scalable and resilient multi-component data pipelines
Visualizing data pipelines with Kibana
Attendees should have a basic familiarity with Python and with the command line before participating in this course. They should also have the required software installed and operational on their computers.