
marystory/Data-Engineering-Capstone-Project


Data-Engineering-Capstone-Project

Project Summary

The aim of the project is to create an ETL pipeline script that builds a star schema for immigration and airport data, so that the data can be analyzed in an optimized manner.
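A star schema puts one fact table (here, individual I94 arrival records) at the center, with dimension tables (airports, states) that it references by key. The sketch below is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in for whatever engine the project actually targets. Every table and column name in it (`fact_immigration`, `dim_airport`, `dim_state`, `cicid`, and so on) is a hypothetical choice, not the project's confirmed model.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
# Table and column names are illustrative assumptions, not the project's actual model.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dim_state (
    state_code TEXT PRIMARY KEY,      -- two-letter abbreviation, e.g. 'NY'
    state_name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS dim_airport (
    airport_code TEXT PRIMARY KEY,    -- airport/port code
    airport_name TEXT,
    city TEXT,
    state_code TEXT REFERENCES dim_state(state_code)
);
CREATE TABLE IF NOT EXISTS fact_immigration (
    cicid INTEGER PRIMARY KEY,        -- record id from the I94 data (assumed column)
    arrival_year INTEGER,
    arrival_month INTEGER,
    port_code TEXT REFERENCES dim_airport(airport_code),
    dest_state_code TEXT REFERENCES dim_state(state_code),
    visa_type TEXT
);
"""


def create_star_schema(conn: sqlite3.Connection) -> list[str]:
    """Create the fact and dimension tables and return the table names, sorted."""
    conn.executescript(SCHEMA)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(create_star_schema(conn))
    conn.close()
```

On a fresh in-memory database this prints `['dim_airport', 'dim_state', 'fact_immigration']`. Because every statement uses `IF NOT EXISTS`, running the function a second time is harmless.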

I94 Immigration Data: This data comes from the US National Tourism and Trade Office. A data dictionary is included in the workspace. This is where the data comes from. A sample file in CSV format lets you inspect the data before reading in the full dataset. You do not have to use the entire dataset; use only what you need to accomplish the goal you set at the beginning of the project.
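Because only part of the I94 data is needed, one approach is to read the sample CSV and keep just the columns the model uses. The sketch below relies on the standard-library `csv` module. The column names in `WANTED_COLUMNS` are assumptions based on commonly seen I94 fields and should be checked against the data dictionary in the workspace.

```python
import csv
import io
from typing import Iterable, TextIO

# Assumed column names; verify them against the I94 data dictionary.
WANTED_COLUMNS = ("cicid", "i94yr", "i94mon", "i94port", "i94addr", "visatype")


def read_i94_sample(
    fh: TextIO, columns: Iterable[str] = WANTED_COLUMNS
) -> list[dict[str, str]]:
    """Read a CSV sample and keep only the requested columns of each row."""
    cols = tuple(columns)
    reader = csv.DictReader(fh)
    # Fail early if the header lacks a column the model depends on.
    missing = [c for c in cols if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample file is missing columns: {missing}")
    return [{c: row[c] for c in cols} for row in reader]


if __name__ == "__main__":
    # Tiny inline example standing in for the real sample file.
    sample = io.StringIO(
        "cicid,i94yr,i94mon,i94port,i94addr,visatype,gender\n"
        "1,2016,4,NYC,NY,B2,F\n"
    )
    print(read_i94_sample(sample))
```

Checking the header up front means a renamed or missing column surfaces as a clear error before any rows are processed, rather than as a `KeyError` partway through the load.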

Airport Code Table: This is a simple table of airport codes and corresponding cities. It comes from here.
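The airport table can become a dimension keyed by code. This sketch assumes the airport CSV has `iata_code`, `iso_country`, `iso_region` (for example `US-NY`), `name`, and `municipality` columns, and it keys US airports by IATA code. Joining on IATA code is itself an assumption to verify, since I94 port codes may not always match IATA codes.

```python
from typing import Iterable, Optional


def build_airport_dim(rows: Iterable[dict]) -> dict[str, dict[str, Optional[str]]]:
    """Build {iata_code: {...}} for US airports, keeping the first row seen per code."""
    dim: dict[str, dict[str, Optional[str]]] = {}
    for row in rows:
        code = (row.get("iata_code") or "").strip().upper()
        # Skip non-US airports and rows without a usable code.
        if row.get("iso_country") != "US" or not code:
            continue
        region = row.get("iso_region") or ""
        # 'US-NY' -> 'NY'; any other format leaves the state unknown (None).
        state = region.split("-", 1)[1] if region.startswith("US-") else None
        # setdefault keeps the first occurrence, which de-duplicates codes.
        dim.setdefault(code, {
            "name": row.get("name"),
            "city": row.get("municipality"),
            "state_code": state,
        })
    return dim
```

The rows can come straight from `csv.DictReader`. Deriving `state_code` from `iso_region` is what lets the airport dimension link to the state dimension.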

US State Table: This table lists valid US state names and their abbreviations. It comes from here.
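The state table is mainly useful for validation: destination state codes in the immigration records can be checked against it before loading. In this sketch the table is assumed to arrive as `(name, abbreviation)` pairs, and the destination field is assumed to be named `i94addr`. Both are assumptions, and the function names are hypothetical.

```python
from typing import Iterable


def load_state_lookup(rows: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map upper-cased state abbreviation -> state name."""
    return {abbr.strip().upper(): name.strip() for name, abbr in rows}


def split_by_state_validity(
    records: Iterable[dict], lookup: dict[str, str], key: str = "i94addr"
) -> tuple[list[dict], list[dict]]:
    """Partition records into (valid, invalid) by whether record[key] is a known state code."""
    valid: list[dict] = []
    invalid: list[dict] = []
    for rec in records:
        # Missing or None values normalize to "", which is never a valid code.
        code = (rec.get(key) or "").strip().upper()
        (valid if code in lookup else invalid).append(rec)
    return valid, invalid
```

Keeping the invalid rows instead of silently dropping them makes it easy to report how many records were rejected, which feeds into the data quality step.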

The project follows these steps:

  • Step 1: Scope the Project and Gather Data
  • Step 2: Explore and Assess the Data
  • Step 3: Define the Data Model
  • Step 4: Run ETL to Model the Data
  • Step 5: Data Quality
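The data quality step usually comes down to a few automated checks run after loading. The sketch below, again using `sqlite3` as a stand-in engine, checks a single table for emptiness, NULL keys, and duplicate keys. The function name and the choice of checks are illustrative assumptions, not the project's own test suite.

```python
import sqlite3


def run_quality_checks(conn: sqlite3.Connection, table: str, key_column: str) -> list[str]:
    """Return messages for failed checks; an empty list means every check passed."""
    # Identifiers are formatted into the SQL, so pass only trusted table/column names.
    failures: list[str] = []

    (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if row_count == 0:
        failures.append(f"{table}: table is empty")

    (null_keys,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()
    if null_keys:
        failures.append(f"{table}: NULL {key_column} in {null_keys} row(s)")

    # Count distinct non-NULL key values that occur more than once.
    (dupes,) = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key_column} FROM {table} "
        f"WHERE {key_column} IS NOT NULL GROUP BY {key_column} HAVING COUNT(*) > 1)"
    ).fetchone()
    if dupes:
        failures.append(f"{table}: {dupes} {key_column} value(s) appear more than once")

    return failures
```

A pipeline would typically call this for each fact and dimension table after the load and raise an error if any list is non-empty.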

About

Udacity Data Engineering Nanodegree project [#6]
