Fine-Tuning of Pretrained TSTPlus Models #727

MMayr96 · 2023-03-29T12:27:09Z


Hi all,

I have some problems fine-tuning pre-trained TSTPlus models.

Problem Statement:
Given n features, i.e. sensor measurements tracked over time, I want to pretrain a general model using masked value prediction on sliding windows of multidimensional data. I then want to take this pre-trained model and fine-tune it on specific tasks, e.g. forecasting or classification. Some snippets of my current pre-training approach, which converges, are shown in the following sections.

Data Preparation:

X, y = SlidingWindowSplitter(window_len=50, stride=2, horizon=10,pad_remainder=True, get_x=[1,2], get_y=[0], padding_value=0)(df_pivot) # where df_pivot is of shape [n_timesteps x n_features]
splits = get_splits(y, valid_size=.3, stratify=False, shuffle=False)
check_data(X, y)

X      - shape: [471 samples x 2 features x 50 timesteps]  type: ndarray  dtype:float64  isnan: 0 
y      - shape: (471, 10)  type: ndarray  dtype:float64  isnan: 0
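
For readers who want to see how shapes like the ones above come about, here is a minimal NumPy sketch of sliding-window splitting. It is not the tsai implementation of SlidingWindowSplitter: the function name sliding_windows is hypothetical, padding of the remainder is ignored, and the toy input has 100 timesteps rather than the real data.

```python
import numpy as np

def sliding_windows(data, window_len, stride, horizon, x_cols, y_col):
    """Split a [n_timesteps x n_features] array into input windows and targets.

    Returns X with shape [n_samples x len(x_cols) x window_len] and
    y with shape [n_samples x horizon]. Incomplete windows are dropped
    (no padding), which differs from pad_remainder=True in the post.
    """
    n_timesteps = data.shape[0]
    starts = range(0, n_timesteps - window_len - horizon + 1, stride)
    # Each input window is transposed so that features come before timesteps.
    X = np.stack([data[s:s + window_len, x_cols].T for s in starts])
    # Each target is the `horizon` values of y_col right after the window.
    y = np.stack([data[s + window_len:s + window_len + horizon, y_col] for s in starts])
    return X, y

# Toy data: 100 timesteps, 3 features (column 0 is the target sensor).
data = np.arange(300, dtype=float).reshape(100, 3)
X, y = sliding_windows(data, window_len=50, stride=2, horizon=10, x_cols=[1, 2], y_col=0)
print(X.shape, y.shape)
```

The target here is kept in its original units, exactly as in the post, so its scale is whatever the raw sensor produces.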

Data Loaders:

tfms = [None, TSRegression()]
batch_tfms = [TSStandardize(by_var=True, use_single_batch=False)] 
dls100 = get_ts_dls(X, y, splits = splits, tfms=tfms, batch_tfms=batch_tfms, bs=64) # for supervised fine-tuning (100% labels)
udls100 = get_ts_dls(X, splits = splits, tfms=tfms, batch_tfms=batch_tfms, bs=64) # for self-supervised learning

Pre-Training:

modelname="test"
learn = ts_learner(udls100, TSTPlus, cbs=[ShowGraph(),MVP(target_dir='./models',fname=f'{modelname}_pretrain', subsequence_mask=True, future_mask=False)], train_metrics=True, metrics=mse) 
lr = float(learn.lr_find().valley) 
learn.fit_one_cycle(10, lr)

epoch	train_loss	valid_loss	time
0	0.901126	0.950730	00:03
1	0.834622	0.782307	00:02
2	0.744210	1.241000	00:02
3	0.682028	1.348739	00:02
4	0.656928	0.941007	00:02
5	0.625320	0.860051	00:02
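
As an aside on what the loss above measures, here is a rough sketch of the masked value prediction objective in plain NumPy. It is a simplification under stated assumptions: the masking probability of 0.15 is illustrative, values are masked independently rather than as subsequences, and the "model" is a stand-in that predicts zeros. The MVP callback in tsai has its own masking scheme and should be treated as the reference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: [samples x features x timesteps], as produced by the windowing step.
X = rng.normal(size=(8, 2, 50))

# True marks the positions hidden from the model (0.15 is an illustrative rate).
mask = rng.random(X.shape) < 0.15
X_masked = np.where(mask, 0.0, X)

def masked_mse(pred, target, mask):
    """Mean squared error computed only over the masked positions."""
    return float(((pred - target) ** 2)[mask].mean())

# Stand-in for a model: it sees X_masked and predicts zeros everywhere.
pred = np.zeros_like(X_masked)
loss = masked_mse(pred, X, mask)
print(loss)
```

Because the loss is computed on standardized inputs, its magnitude stays near 1 for a model that has learned nothing, which is consistent with the values in the table above.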

The pre-training converges well. The actual problems start when using this pre-trained model. Given models/test_pretrain.pth, I would like to fine-tune it on the labelled data loader (i.e. including the target sensor). As far as I understand, one can load the pre-trained model as follows. I tried to fine-tune on a forecasting / regression task, i.e. predicting y:

  learn_pre_trained = ts_learner(dls100, TSTPlus, pretrained=True, weights_path=f'models/{modelname}_pretrain.pth', metrics=mae)
  learn_pre_trained.fine_tune(n_epochs, base_lr=1e-1, freeze_epochs=freeze_epochs)

epoch	train_loss	valid_loss	mae	time
0	929116.625000	1042135.375000	1019.228699	00:02
1	884935.687500	1082593.125000	1034.478516	00:02
2	803845.750000	1278779.000000	1114.631592	00:03
3	665784.625000	1106931.125000	1009.833862	00:02
4	518952.000000	825159.125000	865.870544	00:02
5	412999.656250	432214.312500	618.384827	00:02

However, the fine-tuned model does not converge at all; both training and validation loss are through the roof. I think I may have a logical flaw somewhere in my pipeline. Any hints are appreciated!

oguiza · 2023-03-30T08:50:36Z

Maintainer

Hi @MMayr96 ,
I have a few suggestions:

1. Manually calculate the mean and std you want to use, and pass them to TSStandardScaler to ensure you are using the same values during pretraining and fine-tuning.
2. The lr looks too high to me (1e-1). You may want to use learn.lr_find() to get a better one. For transformers, I've usually found it to be somewhere between 1e-4 and 1e-3.
3. I often find that using fit_one_cycle on the pretrained model achieves better performance compared to fine_tune.
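
The first suggestion can be sketched roughly as follows, using NumPy only. This is a minimal illustration, not the tsai API: the array X, the training indices and the helper standardize are stand-ins, and how the resulting mean and std are handed to the standardization transform (for example through mean and std arguments) is an assumption to check against the installed tsai version.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the windowed inputs: [samples x features x timesteps].
X = rng.normal(loc=100.0, scale=20.0, size=(471, 2, 50))
train_idx = np.arange(0, 330)  # hypothetical training split

# Per-feature statistics, computed once on the training split only.
# keepdims=True gives shape (1, 2, 1), which broadcasts over samples and timesteps.
mean = X[train_idx].mean(axis=(0, 2), keepdims=True)
std = X[train_idx].std(axis=(0, 2), keepdims=True)

def standardize(x, mean, std, eps=1e-8):
    """Apply fixed statistics; reuse the same mean/std in both training phases."""
    return (x - mean) / (std + eps)

X_scaled = standardize(X, mean, std)
print(mean.shape, std.shape)
```

The point of fixing the statistics is that the pretrained weights then see inputs on the same scale during fine-tuning as they did during pretraining, instead of each data loader estimating its own values.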

1 reply

MMayr96 Mar 30, 2023
Author

Hi @oguiza,

thank you very much for your input! I already tried the second and third suggestions, unfortunately without success. I suspect that it has something to do with scaling, as you also pointed out!
I will try it later this week when I find time and update this discussion based on my findings!
