Predict the electricity production and consumption of prosumers in Estonia.

JW-Shen/Enefit-39th-Place-Solution

Enefit - Predict Energy Behavior of Prosumers - Public 71st Place Solution (Private 39th)

Solution writeup: Public 71st Solution Writeup (Private 39th)

Overview

In this competition, competitors build ML models to predict the energy production and consumption patterns of prosumers in Estonia. Our solution mainly consists of lightweight feature engineering with conservative feature selection, target engineering, and a large model pool combined with a simple ensemble.

How to Run

1. Download Dataset

You need to download the dataset by following the instructions on the competition's Data tab,

kaggle competitions download -c predict-energy-behavior-of-prosumers

Then, you can unzip the dataset and put the raw data under ./data/raw/.
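
The unzip step can also be done from Python. The following is a minimal sketch, assuming the Kaggle CLI saved the archive as predict-energy-behavior-of-prosumers.zip in the current directory; the archive name and the helper `extract_dataset` are illustrative and not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, dest: Path) -> list[str]:
    """Extract every member of `archive` into `dest` and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Assumed archive name; adjust it if the download was saved elsewhere.
    archive = Path("predict-energy-behavior-of-prosumers.zip")
    if archive.exists():
        names = extract_dataset(archive, Path("data/raw"))
        print(f"Extracted {len(names)} files to data/raw/")
```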

2. Generate Processed Data

To support iterative model development, you can run the following command to generate reusable processed data that includes the complete feature set,

python -m data.preparation.gen_data

After the process finishes, a file base_feats.parquet will be dumped under ./data/processed/.

3. Train Models

With the Hydra-based configuration system, it is easy to modify the configuration and run iterative experiments. Each experiment is mainly controlled through the data and model configurations. Once these are set up, you can train models by running,

# Train production model with raw target 
python -m tools.main_ml +model_type="p_raw" 'data.tgt_types=[prod]' data.dp.tgt_col="target"

# Train consumption model with target minus target_lag2d
python -m tools.main_ml +model_type="c_raw" 'data.tgt_types=[cons]' data.dp.tgt_col="target_diff_lag2d" 'data.dp.tgt_aux_cols=[target_lag2d]'

# Train domestic consumption model with target divided by installed_capacity
python -m tools.main_ml +model_type="cc_dcap" 'data.tgt_types=[cons_c]' data.dp.tgt_col="target_div_cap_lag2d" 'data.dp.tgt_aux_cols=[installed_capacity_lag2d]'
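
The three commands above differ mainly in the training target: the raw target, the target minus its two-day lag, and the target divided by installed capacity. A model trained on a transformed target predicts in the transformed space, so its output has to be mapped back with the auxiliary column. The sketch below illustrates that round trip under the assumption that this is what `data.dp.tgt_aux_cols` is used for; the function names are hypothetical and do not come from the repository.

```python
def encode_diff(target: float, target_lag2d: float) -> float:
    """Training target for the 'target minus target_lag2d' setup."""
    return target - target_lag2d


def decode_diff(pred: float, target_lag2d: float) -> float:
    """Map a predicted difference back to the original target scale."""
    return pred + target_lag2d


def encode_div_cap(target: float, installed_capacity: float) -> float:
    """Training target for the 'target divided by installed_capacity' setup."""
    return target / installed_capacity


def decode_div_cap(pred: float, installed_capacity: float) -> float:
    """Map a predicted per-capacity value back to the original target scale."""
    return pred * installed_capacity


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the dataset.
    diff = encode_diff(12.5, 10.0)
    ratio = encode_div_cap(30.0, 120.0)
    print(diff, decode_diff(diff, 10.0))
    print(ratio, decode_div_cap(ratio, 120.0))
```

A real pipeline would also need to decide how to handle rows with zero or missing installed capacity, which this sketch does not cover.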

The output objects (e.g., models, log file train_eval.log, feature importance feat_imps.parquet) will be dumped under the path ./output/<%m%d-%H_%M_%S>/.
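
The directory name follows a strftime-style pattern. As a rough illustration of how such an experiment ID could be produced, assuming the pattern is applied to the time the run starts (the helper `make_exp_dir` is hypothetical, not the repository's code):

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Pattern copied from the output path shown in the README.
EXP_ID_FORMAT = "%m%d-%H_%M_%S"


def make_exp_dir(root: Path, now: Optional[datetime] = None) -> Path:
    """Return the output directory path for a run started at `now`."""
    now = now or datetime.now()
    return root / now.strftime(EXP_ID_FORMAT)


if __name__ == "__main__":
    print(make_exp_dir(Path("output")))
```

Because the pattern contains no year, runs started at the same month, day, and second in different years would map to the same directory name.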

4. Upload Models to Kaggle for Online Inference

After the models are trained, you can upload the model objects to Kaggle for online inference by following these steps,

  1. Initialize Kaggle datasets.
kaggle datasets init -p ./output/<exp_id-goes-here>/
  2. Fill dataset metadata in ./output/<exp_id-goes-here>/dataset-metadata.json.
  3. Create Kaggle dataset and upload.
kaggle datasets create -p ./output/<exp_id-goes-here>/ -r zip  # Choose compressed upload

After uploading, you can add the corresponding dataset into the inference notebook for submission.
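
Filling in the metadata file can be scripted. The sketch below assumes the template written by `kaggle datasets init` contains `title`, `id` (in the form owner/slug), and `licenses` fields; the helper name and the example owner, slug, and title are placeholders rather than values from this repository.

```python
import json
from pathlib import Path


def fill_dataset_metadata(exp_dir: Path, owner: str, slug: str, title: str) -> dict:
    """Set the title and id in dataset-metadata.json and return the result."""
    meta_path = exp_dir / "dataset-metadata.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    meta["title"] = title
    meta["id"] = f"{owner}/{slug}"
    # Keep whatever license the template already declares; otherwise assume CC0.
    meta.setdefault("licenses", [{"name": "CC0-1.0"}])
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta
```

For example, `fill_dataset_metadata(Path("output/<exp_id-goes-here>"), "your-username", "your-dataset-slug", "Your Dataset Title")` would update the file in place before running the create command.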

Experimental Results

We focus on local cross-validation that follows chronological order and check whether the results are in sync with the public LB.
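
A chronological split can be sketched as follows. This is only an illustration of the general idea, assuming each fold trains on rows strictly before the validation window and validates on rows inside it; the exact fold construction used in this solution (expanding or fixed window, gaps, and so on) is not stated here, and `chrono_split` is a hypothetical helper.

```python
from datetime import date
from typing import List, Sequence, Tuple


def chrono_split(
    dates: Sequence[date], val_start: date, val_end: date
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx) for one time-ordered fold.

    Rows dated before `val_start` go to training, rows within
    [val_start, val_end] go to validation, and later rows are dropped
    so that no future information leaks into the fold.
    """
    train_idx = [i for i, d in enumerate(dates) if d < val_start]
    val_idx = [i for i, d in enumerate(dates) if val_start <= d <= val_end]
    return train_idx, val_idx


if __name__ == "__main__":
    # Toy dates around a March-to-May 2023 validation window.
    dates = [date(2023, 2, 28), date(2023, 3, 1), date(2023, 5, 31), date(2023, 6, 1)]
    print(chrono_split(dates, date(2023, 3, 1), date(2023, 5, 31)))
```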

CV and LB scores are shown below; entries marked x are not yet available,

| Metric | CV Fold2 (202209 ~ 202211) | CV Fold1 (202210 ~ 202301) | CV Fold0 (202303 ~ 202305) | 3-Fold Avg | Public LB (202306 ~ 202308) | Private LB (202402 ~ 202404) |
| --- | --- | --- | --- | --- | --- | --- |
| MAE | 30.47 | 27.06 | 51.32 | 36.29 | x | 59.20 |
| MAE | 29.71 | 26.32 | 51.00 | 35.68 | x | 58.29 |
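
For reference, MAE and the 3-fold average can be computed as in the sketch below, assuming the average is an unweighted mean of the per-fold scores. The function names are illustrative. Averaging the rounded fold scores of the first row gives roughly 36.28 rather than the reported 36.29, which would be consistent with the average having been taken over unrounded fold scores.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fold_average(scores: Sequence[float]) -> float:
    """Unweighted mean of per-fold scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Fold scores copied from the two rows of the results table.
    print(round(fold_average([30.47, 27.06, 51.32]), 2))
    print(round(fold_average([29.71, 26.32, 51.00]), 2))
```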
