Udacity Data Engineer Nanodegree

This repository is a portfolio of the projects I completed for Udacity's Data Engineer Nanodegree, for which I earned my certification in 2021. The projects cover a range of skills, including designing data models, building data warehouses and data lakes, automating data pipelines, and working with large datasets.

1. Data Modelling with PostgreSQL

This project explores the fundamentals of data modelling with PostgreSQL. We design and create a database schema, then populate it with optimized queries for a fictitious music streaming app, Sparkify.
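The schema designed for Sparkify follows a fact/dimension pattern. As a minimal, self-contained sketch of that idea, the snippet below uses sqlite3 in place of PostgreSQL; the table and column names are illustrative assumptions, not the project's exact DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT)")

# Fact table referencing the dimensions
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        song_id TEXT REFERENCES songs(song_id),
        start_time TEXT
    )
""")

cur.execute("INSERT INTO users VALUES (1, 'Ada')")
cur.execute("INSERT INTO songs VALUES ('S1', 'Horizon')")
cur.execute("INSERT INTO songplays VALUES (1, 1, 'S1', '2021-01-01T10:00:00')")

# An analytics-style query joining the fact table to its dimensions
row = cur.execute("""
    SELECT u.name, s.title
    FROM songplays sp
    JOIN users u ON sp.user_id = u.user_id
    JOIN songs s ON sp.song_id = s.song_id
""").fetchone()
print(row)  # ('Ada', 'Horizon')
```

The same shape scales to the full project: each songplay event lands in the fact table, while users, songs, artists, and time live in dimensions that analytics queries join against.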

2. ETL in Cloud Data Warehouses

In this project, we build an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for Sparkify's analytics team. The project provides hands-on experience implementing a cloud data warehouse.
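The two stages of that pipeline can be sketched as a pair of SQL statements: a COPY that loads raw S3 files into a staging table, then an INSERT that reshapes staged rows into a dimensional table. The bucket, IAM role, and table names below are illustrative assumptions, not the project's actual configuration.

```python
# Stage: bulk-load raw JSON logs from S3 into a Redshift staging table
STAGING_COPY = """
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
    FORMAT AS JSON 'auto';
"""

# Transform: reshape staged rows into a dimensional table
USER_TABLE_INSERT = """
    INSERT INTO users (user_id, first_name, last_name, level)
    SELECT DISTINCT user_id, first_name, last_name, level
    FROM staging_events
    WHERE user_id IS NOT NULL;
"""

# In the project these would be executed in order against Redshift
# (e.g. via psycopg2): stage first, then transform.
for stmt in (STAGING_COPY, USER_TABLE_INSERT):
    print(stmt.strip().splitlines()[0])
```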

3. Data Lakes with Spark

This project focuses on the construction of data lakes using Apache Spark. We build an ETL pipeline that extracts data from S3, processes it using Spark, and loads the processed data back into S3. This project highlights working with big data from different sources and in different formats.
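The core of that Spark job is: read raw JSON records, select the columns a table needs, and write the result partitioned by a key (in Spark, `df.write.partitionBy("year").parquet(...)`). As a pure-Python stand-in that shows the same idea without a Spark cluster, with made-up records and field names modelled on the Sparkify song dataset:

```python
import json
from collections import defaultdict

# Raw JSON song records, as they might arrive from S3 (made-up data)
raw_lines = [
    '{"song_id": "S1", "title": "Horizon", "year": 2019}',
    '{"song_id": "S2", "title": "Tides", "year": 2020}',
    '{"song_id": "S3", "title": "Ember", "year": 2020}',
]

partitions = defaultdict(list)
for line in raw_lines:
    record = json.loads(line)
    # "Transform": keep only the columns the songs table needs
    partitions[record["year"]].append(
        {"song_id": record["song_id"], "title": record["title"]}
    )

# "Load": Spark would write one parquet directory per partition key
for year, rows in sorted(partitions.items()):
    print(f"year={year}: {len(rows)} row(s)")
```

Spark does exactly this grouping and writing in a distributed way, which is what makes the approach viable for the big, heterogeneous datasets the project works with.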

4. Data Pipelines with Airflow

We dive into the world of automated data pipelines using Apache Airflow. By scheduling and monitoring data pipelines, we ensure high data quality for analytics and enable consistent data availability. The project also involves source data extraction from S3 to Redshift.
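An Airflow pipeline of this kind is a DAG of tasks: staging tasks fan in to a fact-table load, dimension loads follow, and quality checks run last. The task names below are illustrative of that pipeline's shape, not the project's exact task IDs; `graphlib` stands in for Airflow's own scheduler so the ordering is runnable here without Airflow installed.

```python
from graphlib import TopologicalSorter

# Task -> set of upstream tasks it depends on (illustrative names)
dag = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_songplays_fact": {"stage_events", "stage_songs"},
    "load_user_dim": {"load_songplays_fact"},
    "run_quality_checks": {"load_user_dim"},
}

# Airflow resolves these dependencies itself; here we just compute
# one valid execution order to show the pipeline's structure.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Scheduling and monitoring then layer on top of this structure: Airflow reruns the DAG on a schedule, and the final quality-check task is what enforces data quality before analysts see the tables.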

5. Data Engineering Final Capstone Project: US Migration Data ETL Pipeline with Spark

The Capstone project integrates the skills learned throughout the nanodegree. We construct an ETL pipeline to analyze US immigration data. We use Apache Spark to handle large datasets, enabling comprehensive analysis of migration patterns.

Closing Remarks

Feel free to explore the repository, clone the projects, and get hands-on experience with real-world data engineering scenarios. Feedback is always welcome.