Small challenge project: a PySpark ETL pipeline that processes a customer's investment portfolio and calculates the total emissions of those investments.
To execute the pipeline, run:
spark-submit --packages com.crealytics:spark-excel_2.12:3.3.1_0.18.5 src/pipeline.py ./data SamplePortfolio1 SamplePortfolio2
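
The command above passes a data directory followed by one or more portfolio names. As a rough sketch only, a script could turn such arguments into input file paths as shown below. This is not taken from src/pipeline.py: the helper names parse_args and portfolio_paths are hypothetical, and the .xlsx suffix is an assumption based on the spark-excel package being loaded.

```python
import argparse
from pathlib import Path


def parse_args(argv):
    """Parse '<data_dir> <portfolio> [<portfolio> ...]' (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Portfolio emissions ETL (illustrative)")
    parser.add_argument("data_dir", type=Path, help="directory holding the input files")
    parser.add_argument("portfolios", nargs="+", help="one or more portfolio names")
    return parser.parse_args(argv)


def portfolio_paths(data_dir, portfolios, suffix=".xlsx"):
    """Build one input path per portfolio name; the suffix is an assumption."""
    return [data_dir / f"{name}{suffix}" for name in portfolios]


# Same arguments as in the spark-submit command above.
args = parse_args(["./data", "SamplePortfolio1", "SamplePortfolio2"])
for path in portfolio_paths(args.data_dir, args.portfolios):
    print(path)
```

Taking the portfolio names as a variable-length argument list means one run can cover any number of portfolios without changing the script.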