Question: weird valid loss when re-scaling y #1013

lfbittencourt · 2023-08-25T13:10:36Z

First of all, I must say this project has been a fundamental part of my master's thesis, so thank you very much for that.

In the last couple of days, I've been trying to understand and fix an issue but had no success. I've managed to reduce my code to a minimal example, so I hope it's easy to understand:

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from skorch.regressor import NeuralNetRegressor
from torch.optim import Adam
from torchmetrics.regression import MeanAbsolutePercentageError


class Module(nn.Module):
    def __init__(self, input_dimensions, dropout_rate=0):
        super(Module, self).__init__()

        self.module = nn.Sequential(
            nn.Linear(input_dimensions, 256),
            nn.ReLU(),
            nn.Dropout(p=dropout_rate),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(p=dropout_rate),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(p=dropout_rate),
            nn.Linear(64, 1),
        )

    def forward(self, X):
        if X.dtype != torch.float32:
            X = X.to(torch.float32)

        X = self.module(X)

        return X


class NeuralNet(NeuralNetRegressor):
    def fit(self, X, y):
        # Sets the input dimensions of the module according to current X
        self.set_params(module__input_dimensions=X.shape[1])

        # Check if X is a Pandas DataFrame and convert it to a numpy array
        if hasattr(X, "to_numpy"):
            X = X.to_numpy()

        # Check if y is a Pandas Series and convert it to a numpy array
        if hasattr(y, "to_numpy"):
            y = y.to_numpy()

        if X.dtype != np.float32:
            X = X.astype(np.float32)

        if y.dtype != np.float32:
            y = y.astype(np.float32)

        # Reshape y to 2D if it is 1D
        # From https://github.com/skorch-dev/skorch/issues/701#issuecomment-700943377
        if y.ndim == 1:
            y = y.reshape(-1, 1)

        return super().fit(X, y)


df = pd.read_json("08-final-dataset.json.gz")

regressor = NeuralNet(
    module=Module,
    criterion=MeanAbsolutePercentageError,
    optimizer=Adam,
    lr=0.001,  # learning rate
    max_epochs=20,
    verbose=1,
)

X = df[["area", "rooms", "bathrooms"]]
y = df["price"]

# Rescale y to [0, 1]
y = (y - y.min(axis=0)) / (y.max(axis=0) - y.min(axis=0))

regressor.fit(X, y)

The output below is the result of running the above code with the y = (y - y.min(axis=0)) / (y.max(axis=0) - y.min(axis=0)) line commented out. As you can see, everything works fine -- valid loss is a little bit higher than train loss, but it's okay overall.

  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1        0.5889        0.3677  1.2962
      2        0.2960        0.3681  3.0432
      3        0.2953        0.3675  1.3053
      4        0.2945        0.3671  1.4555
      5        0.2937        0.3667  1.3855
      6        0.2929        0.3652  1.3111
      7        0.2920        0.3635  1.3114
      8        0.2911        0.3614  1.5104
      9        0.2903        0.3606  1.5467
     10        0.2895        0.3588  1.4119
     11        0.2887        0.3567  1.3762
     12        0.2879        0.3557  1.3895
     13        0.2872        0.3532  1.3399
     14        0.2866        0.3518  1.4539
     15        0.2860        0.3504  1.2753
     16        0.2855        0.3493  1.3677
     17        0.2851        0.3477  1.2709
     18        0.2847        0.3459  1.2958
     19        0.2843        0.3443  1.2826
     20        0.2839        0.3428  1.2601

However, when I uncomment that line, train loss is still okay, but the valid loss range is way higher:

  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1        0.8218       43.2710  1.1846
      2        0.3220       40.1128  3.0774
      3        0.3117       34.4203  1.1700
      4        0.3104       23.3642  1.1814
      5        0.3078       35.8995  1.1419
      6        0.3082       28.1432  1.1490
      7        0.3024       27.2994  1.2876
      8        0.3015       27.5079  1.2131
      9        0.3027       30.9570  1.2153
     10        0.3007       27.3501  1.2178
     11        0.3008       28.4490  1.2271
     12        0.3003       25.5512  1.2243
     13        0.3001       28.6095  1.1948
     14        0.3001       27.0454  1.2343
     15        0.2993       29.1054  1.2169
     16        0.2994       29.1186  1.2373
     17        0.3002       30.1973  1.2127
     18        0.2988       28.9823  1.2316
     19        0.3010       30.5568  1.2052
     20        0.2983       30.2289  1.1958

I've also tried the PyTorch Forecasting MAPE loss, with similar results. As a consequence, I can't use early stopping, for example, because the valid loss is simply unreliable.

So my question is: do you have any idea of any internal process that could be causing this? I've tried to look at the code, but I'm not able to find anything.

BenjaminBossan · 2023-08-25T15:03:48Z

Maintainer

First of all, I must say this project has been a fundamental part of my master's thesis, so thank you very much for that.

Happy to hear that, thanks.

I converted the issue into a discussion; I hope you don't mind.

Regarding your problem, I could reproduce it with a synthetic dataset. My first thought was that by scaling y, we change the order of magnitude of the loss and thus need to adjust the learning rate to prevent overfitting. But after experimenting a bit with this, I no longer believe that this is the problem (or at least not the whole problem).

Interestingly, when I used the default loss (MSE), there was no such weird behavior. One reason could be the normalization step in MAPE, which divides by y_true. If you have very small values in y, or indeed 0.0, your loss will explode. You should thus check whether your scaled y contains extremely small values.
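
To illustrate the division by y_true, here is a minimal NumPy sketch. It is not the torchmetrics implementation: the `mape` helper, its `eps` guard, and the four example prices are all assumptions made up for this illustration. Min-max scaling maps the smallest target to exactly 0.0, so even a tiny absolute error on that sample dominates the mean.

```python
import numpy as np


def mape(y_pred, y_true, eps=1e-6):
    # Hypothetical helper: mean absolute percentage error.
    # eps is an assumed guard so that y_true == 0 does not divide by zero.
    return np.mean(np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), eps))


# Made-up prices, for illustration only
prices = np.array([100_000.0, 250_000.0, 400_000.0, 900_000.0])

# Min-max scaling to [0, 1]: the smallest price becomes exactly 0.0
scaled = (prices - prices.min()) / (prices.max() - prices.min())

# Predictions that are 10% too high on the raw scale
raw_loss = mape(prices * 1.1, prices)

# Predictions that are off by only 0.01 on the scaled targets
scaled_loss = mape(scaled + 0.01, scaled)

print("smallest scaled target:", scaled.min())
print("MAPE on raw targets:   ", raw_loss)
print("MAPE on scaled targets:", scaled_loss)
```

The loss on the scaled targets is orders of magnitude larger, and almost all of it comes from the single sample whose target is 0.0. Whether that sample lands in the train or the validation split decides which of the two losses blows up.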

In general, I don't think that MAPE is a good criterion; I would prefer something more stable like MSE. You can still calculate MAPE as an additional score (using the EpochScoring callback), but try to avoid it as a criterion.

3 replies

lfbittencourt Aug 25, 2023
Author

Thanks! You absolutely nailed it.

I was about to ask if train loss shouldn't be affected as well, but it turned out it was just luck. Train loss starts to behave in the same weird way as soon as I shuffle the dataset.

From your experience, do you think re-scaling y is even necessary in this case (real estate valuation)?

BenjaminBossan Aug 25, 2023
Maintainer

I was about to ask if train loss shouldn't be affected as well, but it turned out it was just luck.

Ah yes, I forgot to mention that this is probably just coincidence.

From your experience, do you think re-scaling y is even necessary in this case (real estate valuation)?

Yes, even if you use a different loss function as I recommended, you will probably find that scaling helps. Maybe just experiment with different types of scaling (e.g. log+1, sqrt or other power, etc.). You should consider using TransformedTargetRegressor for that, which should work with skorch nets. That way, you can ensure that when the scores are calculated, the losses are scaled back to their original range.
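
As a minimal sketch of the TransformedTargetRegressor idea, the example below uses a log1p transform with expm1 as its inverse. LinearRegression is only a stand-in to keep the example small; a skorch net would be passed as `regressor=` instead. The synthetic data is an assumption, constructed so that the log-transformed target is exactly linear in X.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): log1p(y) is exactly linear in X
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = np.expm1(2.0 * X[:, 0] + 1.0)

model = TransformedTargetRegressor(
    regressor=LinearRegression(),  # stand-in; swap in the skorch net here
    func=np.log1p,                 # applied to y before fitting
    inverse_func=np.expm1,         # applied to predictions afterwards
)
model.fit(X, y)

# Predictions come back on the original scale of y, so any metric
# computed on them is also on the original scale
y_pred = model.predict(X)
print(np.abs(y_pred - y).max())
```

Other transforms (sqrt, a power transform, or a fitted scaler passed via `transformer=`) can be tried the same way by swapping the arguments.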

lfbittencourt Aug 25, 2023
Author

Perfect. I'm already using TransformedTargetRegressor -- just removed it here for the sake of simplicity. Thank you very much for your support.
