This project explores and analyzes data from the Tokyo 2021 Olympics using a publicly available dataset from Kaggle. The focus is on implementing ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes within the Azure Cloud environment, with the aim of strengthening my data engineering and cloud skills.
The Tokyo 2021 Olympics Data Engineering Project builds an ETL (Extract, Transform, Load) pipeline on Azure around the Kaggle dataset. Azure Data Factory first ingests the raw files into Azure Data Lake Storage Gen2, which serves as a secure, scalable data lake for unprocessed data. Azure Databricks then transforms the data with Apache Spark, applying cleaning and preprocessing steps to prepare it for analysis, and the processed output is written back to Azure Data Lake Storage Gen2 for efficient access. Finally, Azure Synapse Analytics queries the processed data to derive insights into Olympic events, athlete performance, and medal counts. The project showcases how Data Factory, Databricks, and Synapse Analytics fit together in a single pipeline; future enhancements may incorporate advanced analytics and machine learning capabilities to further enrich the analysis.
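To make the Databricks step concrete, here is a minimal PySpark sketch of the kind of cleaning job described above. The storage account name, container names, file names (Athletes.csv, Medals.csv), and column names are placeholders assumed for illustration, not details taken from this write-up, and the ADLS Gen2 credentials are assumed to be configured separately (for example via a service principal in the cluster's Spark config).

```python
# Sketch of a Databricks transformation notebook (assumed names throughout).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("TokyoOlympicsTransform").getOrCreate()

# Placeholder storage account and containers; access is assumed to be configured already.
raw_path = "abfss://raw-data@tokyoolympicsdata.dfs.core.windows.net"
out_path = "abfss://transformed-data@tokyoolympicsdata.dfs.core.windows.net"

# Read raw CSVs landed by Data Factory (file and column names are assumptions).
athletes = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv(f"{raw_path}/Athletes.csv")
)
medals = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv(f"{raw_path}/Medals.csv")
)

# Basic cleaning: drop duplicates, remove rows missing key fields, cast medal counts to int.
athletes_clean = athletes.dropDuplicates().na.drop(subset=["Name", "NOC"])
medals_clean = (
    medals.dropDuplicates()
    .withColumn("Gold", col("Gold").cast("int"))
    .withColumn("Silver", col("Silver").cast("int"))
    .withColumn("Bronze", col("Bronze").cast("int"))
)

# Write the processed tables back to the lake for downstream querying in Synapse.
athletes_clean.repartition(1).write.mode("overwrite").option("header", "true").csv(f"{out_path}/athletes")
medals_clean.repartition(1).write.mode("overwrite").option("header", "true").csv(f"{out_path}/medals")
```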
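For the Synapse step, one common pattern (an assumption here, since the overview does not specify which Synapse engine was used) is to query the transformed CSV files with a serverless SQL pool via OPENROWSET. The sketch below wraps such a query in Python using pyodbc; the server endpoint, credentials, file path, and the Gold/Silver/Bronze column names are all placeholders.

```python
# Sketch: query the transformed medals data through a Synapse serverless SQL pool.
import pyodbc

# Placeholder connection details for the serverless ("on-demand") SQL endpoint.
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace-name>-ondemand.sql.azuresynapse.net;"
    "DATABASE=master;"
    "UID=<sql-user>;PWD=<password>;Encrypt=yes;"
)

# OPENROWSET reads the CSVs written by the Databricks job directly from the lake;
# the path and the Gold column are assumed names.
query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://tokyoolympicsdata.dfs.core.windows.net/transformed-data/medals/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS medals
ORDER BY Gold DESC;
"""

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
for row in cursor.execute(query):
    print(row)  # each row is a country's medal tally, highest gold count first
conn.close()
```

Using the serverless pool keeps the queried data in the lake rather than copying it into a dedicated SQL pool, which suits an exploratory project like this one.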