This repo contains homework and code for the Data Engineering Zoomcamp by DataTalks.Club.
During the course, we will replicate the following architecture
Week 1 covers the following topics:
- Course overview
- Introduction to GCP
- Docker and docker-compose
- Running Postgres locally with Docker
- Setting up infrastructure on GCP with Terraform
- Preparing the environment for the course
- Homework
Week 2 covers the following topics:
- Introduction to Prefect
- ETL with GCP & Prefect
- From Google Cloud Storage to BigQuery
- Parametrizing Flows & Deployments
- Schedules & Docker Storage with Infrastructure
- Prefect Cloud and Additional Resources
Week 3 covers the following topics:
- Data Warehouse
- BigQuery
- Partitioning and Clustering
- BigQuery Best Practices
- Internals of BigQuery
- BigQuery for Machine Learning
Week 4 covers the following topics:
- Basics of Analytics Engineering
- dbt (data build tool)
- BigQuery and dbt
- dbt models
- Testing and Documenting
- Deployment to the cloud and locally
- Visualizing the data with Google Data Studio (now Looker Studio)
  - Report link: https://lookerstudio.google.com/reporting/2cfd57b1-a163-4210-9455-0e014aaa0b4d
Technologies used:
- Docker
- Google Cloud Platform (GCP): Google Cloud Storage and Google BigQuery
- Postgres
- Terraform
- Prefect
- dbt