I am trying to optimize a PatchTST model for multivariate forecasting on a dataset with ~27,000 samples and 8 columns. The splits are train=70%, valid=20%, and test=10%. I implemented the tutorial notebook on PatchTST with my data and used the same scaling method as in the notebook. The result is that, no matter what I do, there is almost always a gap between training and validation loss (overfitting). So I tried the sklearn StandardScaler, and the overfitting was reduced considerably; however, the training MAE now stands at ~4.019 while the validation MAE stands at ~4.703. Comparatively speaking, using TSStandardScaler leads to an MAE of ~0.57 and ~0.67 for training and validation respectively. To close this gap, I tried dropout, decreasing or increasing the complexity of the model architecture, etc., but with no luck. So, my question is: how can I deal with this?
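For context, a minimal sketch (with made-up numbers, not my real data) of why the MAE under the two scalers is not directly comparable: the loss is computed in scaled units, so an MAE from one scaling can only be compared to another after mapping both back to original units with the scaler's std.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical targets and forecasts in original units (std ~ 7)
y_true = rng.normal(loc=50.0, scale=7.0, size=1000)
y_pred = y_true + rng.normal(scale=2.0, size=1000)  # imperfect forecasts

mu, sigma = y_true.mean(), y_true.std()

# MAE in original units vs. MAE after standard scaling
mae_original = np.abs(y_true - y_pred).mean()
mae_scaled = np.abs((y_true - mu) / sigma - (y_pred - mu) / sigma).mean()

# The scaled MAE is exactly the original MAE divided by the scaling std,
# so losses reported under different scalers live on different scales.
print(mae_original, mae_scaled, mae_scaled * sigma)
```

So the jump from ~0.57 to ~4.019 MAE between scalers does not by itself mean the model got worse; the units changed.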
Note: After getting the splits, I fitted the sklearn StandardScaler on the training data only and then transformed both the validation and test datasets. Also, the optimal learning rate was found using lr_find.
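To make the setup concrete, here is a sketch of the split-then-scale procedure described above, using a placeholder array in place of my real dataset (the manual mean/std is equivalent to sklearn's StandardScaler fit/transform):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(27000, 8))  # placeholder for the real dataset

# Chronological 70/20/10 split (no shuffling for time series)
n = len(data)
n_train, n_valid = int(0.7 * n), int(0.2 * n)
train = data[:n_train]
valid = data[n_train:n_train + n_valid]
test = data[n_train + n_valid:]

# Fit the scaler statistics on the training split only, then apply
# the same statistics to validation and test to avoid leakage
mu = train.mean(axis=0)
sigma = train.std(axis=0)
train_s, valid_s, test_s = [(x - mu) / sigma for x in (train, valid, test)]
```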
Model Architecture:
PatchTST (Input shape: 256 x 8 x 192)
============================================================================
Layer (type) Output Shape Param # Trainable
============================================================================
256 x 8 x 2
RevIN 16 True
____________________________________________________________________________
256 x 8 x 196
ReplicationPad1d
____________________________________________________________________________
256 x 8 x 48
Unfold
____________________________________________________________________________
256 x 8 x 48 x 256
Linear 2304 True
Dropout
Linear 65792 True
Linear 65792 True
Linear 65792 True
Dropout
Linear 65792 True
Dropout
Dropout
____________________________________________________________________________
256 x 256 x 48
Transpose
BatchNorm1d 512 True
____________________________________________________________________________
256 x 48 x 256
Transpose
____________________________________________________________________________
256 x 48 x 132
Linear 33924 True
GELU
Dropout
____________________________________________________________________________
256 x 48 x 256
Linear 34048 True
Dropout
____________________________________________________________________________
256 x 256 x 48
Transpose
BatchNorm1d 512 True
____________________________________________________________________________
256 x 48 x 256
Transpose
Linear 65792 True
Linear 65792 True
Linear 65792 True
Dropout
Linear 65792 True
Dropout
Dropout
____________________________________________________________________________
256 x 256 x 48
Transpose
BatchNorm1d 512 True
____________________________________________________________________________
256 x 48 x 256
Transpose
____________________________________________________________________________
256 x 48 x 132
Linear 33924 True
GELU
Dropout
____________________________________________________________________________
256 x 48 x 256
Linear 34048 True
Dropout
____________________________________________________________________________
256 x 256 x 48
Transpose
BatchNorm1d 512 True
____________________________________________________________________________
256 x 48 x 256
Transpose
____________________________________________________________________________
256 x 8 x 12288
Flatten
____________________________________________________________________________
256 x 8 x 2
Linear 24578 True
____________________________________________________________________________
Total params: 691,226
Total trainable params: 691,226
Total non-trainable params: 0
Optimizer used: <function Adam at 0x7fce117e55a0>
Loss function: <function mae at 0x7fcdeccf8c10>
Callbacks:
- TrainEvalCallback
- CastToTensor
- Recorder
- ProgressCallback
- ShowGraph
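One regularizer I have not listed above is early stopping on the validation loss (fastai ships an EarlyStoppingCallback that can be passed via cbs; the sketch below is a framework-agnostic version of the same logic, with hypothetical names):

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved
    by at least `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, valid_loss):
        """Record one epoch's validation loss; return True to stop."""
        if valid_loss < self.best - self.min_delta:
            self.best = valid_loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Usage sketch: feed the validation loss after each epoch
stopper = EarlyStopping(patience=3)
losses = [0.70, 0.67, 0.66, 0.68, 0.69, 0.71, 0.72]
for epoch, vl in enumerate(losses):
    if stopper.step(vl):
        break
```

With the losses above, training halts three epochs after the best validation loss (0.66), keeping the model from drifting further into the overfitting regime.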
Loss curves using TSStandardScaler: (image)
Loss curves using sklearn StandardScaler: (image)