Some questions about data normalization #411
Replies: 9 comments
-
In addition, how should I un-normalize the predicted y (label) back to real-world values after I use the function learn.get_X_preds(x[splits[2]])? Haha, I have so many questions, and my English is also poor.
-
Hi @chuzheng88, use `batch_tfms = TSNormalize(by_sample=True, by_var=True, range=(0, 1))` like in this example:

```python
X, y, splits = get_regression_data('Covid3Month', split_data=False)
print(X.min(), X.max())  # 0.0 20341.0
tfms = [None, TSRegression()]
batch_tfms = TSNormalize(by_sample=True, by_var=True, range=(0, 1))
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms)
xb, yb = dls.train.one_batch()
print(xb.min(), xb.max())  # TSTensor([0.0], device=cuda:0) TSTensor([1.0], device=cuda:0)
```

As to the y, why do you want to normalize it? I think it'd be good to try it first without any preprocessing.
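Conceptually, `TSNormalize(by_sample=True, by_var=True, range=(0, 1))` scales each (sample, variable) series independently along the time axis. The following numpy sketch illustrates that idea only; it is an assumption about the behavior, not tsai's actual implementation:

```python
import numpy as np

def minmax_per_sample_per_var(X, low=0.0, high=1.0):
    """Illustrative sketch: min-max scale each (sample, variable) series
    independently along the time axis, mapping it into [low, high].
    X has shape (n_samples, n_vars, seq_len)."""
    mn = X.min(axis=-1, keepdims=True)
    mx = X.max(axis=-1, keepdims=True)
    return (X - mn) / (mx - mn) * (high - low) + low

X = np.array([[[0.0, 5.0, 10.0]],
              [[2.0, 4.0, 6.0]]])  # 2 samples, 1 variable, 3 steps
Xn = minmax_per_sample_per_var(X)
print(Xn.min(), Xn.max())  # 0.0 1.0
```

Note that with `by_sample=True` the statistics come from each sample itself, so every batch lands exactly in the requested range regardless of the global data scale.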
-
Thank you for your answer. In my opinion, normalizing y to range (0, 1) helps when calculating the gradients and running backpropagation, because X is normalized to range (0, 1). If X is in range (0, 1) and y is in range (-100, 10000), I think the mismatch in magnitude between X and y will cause a larger error.
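If you do decide to normalize y manually, the standard min-max approach is easy to invert for real-world predictions. This is a generic sketch (plain numpy, not a tsai API), where the helper names are hypothetical:

```python
import numpy as np

def fit_minmax(y):
    """Compute the (min, max) statistics needed to scale y to [0, 1].
    These must come from the training split only and be kept for later."""
    return float(np.min(y)), float(np.max(y))

def normalize_y(y, y_min, y_max):
    """Min-max scale y into [0, 1]."""
    return (y - y_min) / (y_max - y_min)

def denormalize_y(y_scaled, y_min, y_max):
    """Invert the scaling to recover real-world values."""
    return y_scaled * (y_max - y_min) + y_min

y = np.array([-100.0, 450.0, 10000.0])
y_min, y_max = fit_minmax(y)
y_scaled = normalize_y(y, y_min, y_max)
y_back = denormalize_y(y_scaled, y_min, y_max)
```

Applying `denormalize_y` to the model's predictions answers the earlier question about mapping predicted labels back to real-world values.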
-
I still think you should try it. With `y_range`, the raw output is rescaled as `output = output * (y_range.max() - y_range.min()) + y_range.min()`. In this way, the network only needs to predict values between 0 and 1. I'd recommend you train the model on the original y both with and without `y_range` and compare the results. If they are not good, you may want to implement manual preprocessing and postprocessing.
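The rescaling above can be sketched in plain Python. Note one assumption here: a sigmoid is applied to the raw output first so that it actually lies in (0, 1) before rescaling, which is how fastai's `SigmoidRange` behaves; the formula in the comment shows only the rescaling step:

```python
import math

def sigmoid_range(x, low, high):
    """Map a raw network output x to the interval (low, high):
    sigmoid squashes x into (0, 1), then we rescale with
    output * (high - low) + low, as in the formula above."""
    return 1.0 / (1.0 + math.exp(-x)) * (high - low) + low

# A raw output of 0.0 lands exactly in the middle of the range.
print(sigmoid_range(0.0, -100.0, 10000.0))  # 4950.0
```

Because the sigmoid is bounded, every prediction is guaranteed to fall inside the chosen range, which is the advantage over predicting unbounded values directly.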
-
Thank you very much. I will try it and compare the results between the normalized y and the true y in the real world. I will post the results if they differ significantly.
-
In addition, is tsai compatible with hyperparameter tuning tools such as Ray Tune (https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html)?
-
I haven't used Ray Tune. |
-
I have read these two articles, which were helpful for me. It suddenly occurred to me that X in my dataset consists of variable sequence lengths, such as:
-
I've modified a function that was already available to make it fit a wider need. It's called pad_sequences. You can read the documentation here. I'll move this issue to Discussions as no changes to tsai are required.
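For intuition, the padding idea can be sketched generically in numpy. This is an illustrative sketch only (right-padding with a fill value), not tsai's `pad_sequences` API; check its documentation for the real signature and options:

```python
import numpy as np

def pad_to_same_length(seqs, pad_value=0.0):
    """Right-pad a list of 1-D sequences with pad_value so they all
    share the length of the longest one, then stack them into a
    single 2-D array of shape (n_sequences, max_len)."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_value, dtype=float)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out

seqs = [[1, 2, 3], [4, 5], [6]]
X = pad_to_same_length(seqs)
print(X.shape)  # (3, 3)
```

Once padded, the sequences form a regular array that can be fed to the dataloaders like any fixed-length dataset.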
-
tsai is a very good project which can save a lot of time. I want to use the project to solve a regression problem.
I have a dataset (both X and y), and I want to use X (a time series sequence) to predict y (a floating-point value). X and y are not normalized, so I want to normalize the dataset (both X and y) before training the model and then un-normalize the predicted values back to real-world values. When I used the "TSNormalize" function to preprocess my dataset, it did not work as I expected. My code is as follows:
The code in the notebook cell prints the normalized data, but the normalized values are still greater than 1 (e.g., tensor([2.7380, 2.8582, 2.6833, 2.7741], device='cuda:0') or tensor([2.6764, 2.9775, 3.0802, 2.6598], device='cuda:0')).
I want to know how I should use the TSNormalize function.