This repository contains the code and analysis for the 2024 Citadel Spring Invitation Datathon, which focused on understanding the role of petroleum/oil in the United States and the world. Our team members, Ishita Jain (Harvey Mudd College), Chengyi Tang (Harvey Mudd College), Zara Cook (University of Waterloo), and Anjie Liu (University of Texas at Austin), were among 100 students invited to compete in this data science competition.
The goal of this datathon was to explore the complex relationships between oil production, consumption, and various economic and geopolitical factors. Our approach involved leveraging a range of techniques, including:
- Time Series Analysis and Machine Learning
- Mutual Information and Lag Time Optimization
- Granger Causality for Non-Linear Correlations
- Stochasticity in Natural Language Processing (NLP) Models
- Structural Causal Models (SCM)
- Web Scraping
Our findings and insights were compiled into a comprehensive 30-page report.
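The lag time optimization named in the list above can be illustrated with a small sketch. It is not the code used in the datathon: the histogram-based estimator, the function names `mutual_information` and `best_lag`, the bin count, and the synthetic series are all assumptions chosen for illustration. The idea is to score each candidate lag by the mutual information between the lagged feature and the target, then keep the lag with the highest score.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys, bins=8):
    """Histogram estimate of mutual information (in nats) between two sequences."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j)))
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def best_lag(feature, target, max_lag=10, bins=8):
    """Return (lag, scores), where lag maximizes MI(feature[t - lag], target[t])."""
    scores = {}
    for lag in range(max_lag + 1):
        shifted = feature[: len(feature) - lag]  # feature[t - lag]
        aligned = target[lag:]                   # target[t]
        scores[lag] = mutual_information(shifted, aligned, bins)
    return max(scores, key=scores.get), scores


# Synthetic example (hypothetical data): the target follows the feature by 3 steps.
rng = random.Random(0)
feature = [rng.gauss(0, 1) for _ in range(600)]
target = [0.0] * 3 + [feature[t - 3] + rng.gauss(0, 0.1) for t in range(3, 600)]

lag, scores = best_lag(feature, target, max_lag=8)
print(lag)  # the synthetic data was built with a lag of 3
```

Histogram estimates of mutual information are biased upward on short series, so scores at unrelated lags will not be exactly zero. Comparing scores across lags, rather than reading any single value in isolation, is the safer use of a sketch like this.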
- Time Series Analysis: We explored time series data using machine learning techniques and evaluated non-linear correlations using Granger Causality, achieving a score of 0.02.
- Lag Time Optimization: We optimized the lag time of features using the Mutual Information Index, with a score of 0.16, and extracted valuable analysis from the lagged data.
- Sentiment Analysis and Causal Inference: Our team collaborated effectively to conduct sentiment analysis on relevant speeches and establish causal relationships using Structural Causal Models (SCM).
- Web Scraping: We employed web scraping techniques to gather additional data sources, enhancing the depth and breadth of our analysis.
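For readers unfamiliar with the Granger causality test mentioned in the time series finding above, here is a minimal sketch of the standard linear form of the test. It is an illustration under stated assumptions, not the analysis from the report: the helper names (`solve`, `rss`, `granger_f`), the lag order of 2, and the synthetic series are all hypothetical, and any non-linear extension the team used is not reproduced here. The test compares an autoregression of the effect on its own lags against one that also includes lags of the candidate cause, and summarizes the improvement as an F statistic.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, ys):
    """Residual sum of squares of an ordinary least squares fit of ys on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))


def granger_f(cause, effect, lags=2):
    """F statistic for 'cause Granger-causes effect' with `lags` lags of each series."""
    n = len(effect)
    ys = effect[lags:]
    # Restricted model: intercept plus the effect's own lags.
    restricted = [[1.0] + [effect[t - k] for k in range(1, lags + 1)] for t in range(lags, n)]
    # Unrestricted model: additionally includes lags of the candidate cause.
    unrestricted = [
        row + [cause[t - k] for k in range(1, lags + 1)]
        for row, t in zip(restricted, range(lags, n))
    ]
    rss_r, rss_u = rss(restricted, ys), rss(unrestricted, ys)
    dof = len(ys) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic example (hypothetical data): x drives y with a one-step delay.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(400)]
y = [0.0]
for t in range(1, 400):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 0.5))

f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
print(f_xy, f_yx)
```

With two lags and a few hundred observations, the 5% critical value of the F distribution is roughly 3, so a statistic far above that suggests the lagged cause adds predictive power. Granger causality is a statement about predictability, not about causal mechanism, which is one reason to pair it with a structural approach such as SCM.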
While this repository represents the work completed during the datathon, contributions for further analysis or improvements are welcome. If you find any issues or have suggestions, please open an issue or submit a pull request.
This project is licensed under the MIT License.
We would like to express our gratitude to Citadel for organizing this invaluable learning experience and to our professors and peers for their support throughout the datathon.