Cannot overfit net on a small training set #165
Thank you for the detailed explanation of your problem. From the loss value and the loss plot it seems that your learning rate is way too high; have you tried lowering it? Maybe divide it by 10 first. Also, as a sanity check, you can try with even fewer images, like 1 or 2; the training will be faster.
Hi @milesial and @sptom! I'm having this issue of failing to overfit as well, and I'm using 1e-4 as my learning rate and a training set of just a single image. I can't find anything irregular in the training code, but if more than one of us is having this issue, I wonder if there's something we're overlooking? Any insight would be appreciated!
I think your learning rate is too high. For the full Carvana dataset I used 2e-6 as the LR.
Thanks; unfortunately, I have also tried learning rates on the order of 1e-6 and got the same results. I also disabled all augmentation, normalization, and other regularization so that it's exactly the steps you used. For some reason, the model isn't overfitting to the one-image dataset and either gives nonsensical segmentations or converges to all-zero weights. Have you had this issue? Do you have any insights? Thanks!
There were a lot of small tweaks in recent commits, but nothing that should affect convergence, I think. Are you using transposed convolutions or the bilinear route (default)? Do you both work on the same dataset? Have you tried with an image from the Carvana dataset? This problem is very strange. If you feed it 100 images, does it learn anything?
I'm using the bilinear route. I don't think we're using the same dataset, but we have the same issue. I've tried feeding it 50-100 images in training and it doesn't learn properly; the loss stays at around 0.7.
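A side note on that ~0.7 figure: a binary cross-entropy loss that plateaus near ln 2 ≈ 0.693 is exactly what you get when the network's sigmoid output sits at ~0.5 everywhere, i.e. the model has learned nothing at all. A quick back-of-the-envelope check (plain Python, just to illustrate the arithmetic, not the repo's loss code):

```python
import math

def bce(p, y):
    # Binary cross-entropy for one pixel with predicted probability p
    # and ground-truth label y (0 or 1).
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A network whose sigmoid output is ~0.5 everywhere yields a per-pixel
# BCE of ln(2) ~= 0.693 regardless of the label:
print(bce(0.5, 0))  # 0.6931...
print(bce(0.5, 1))  # 0.6931...
print(math.log(2))  # 0.6931...
```

So a loss stuck around 0.7 is a strong hint that the predictions are uninformative, not merely inaccurate.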
Update: I also tried transposed convolutions and they don't work either. It's such a strange issue; I don't think I've ever encountered anything like it before. I wonder if the problem is not with the model but somehow with the training procedure. I have tried both models from this repo as well as a pretrained model from a different project, all on the same image. None of them learns.
@milesial I spent today debugging once again by rewriting the entire training pipeline from scratch and testing on incrementally more meaningful sets of data, and ended up finding the problem! It was very sneaky: net.train() is called once per epoch rather than once per batch, so after the validation step switches the network to eval mode, the rest of the epoch trains in eval mode. Thanks for your help and quick response on this problem. I hope this fixes @sptom's issue as well.
Oh wow, awesome @juliagong! That sounds really sneaky. I didn't notice that net.train() was put in the epoch loop and not in the batch loop...
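For readers skimming later, the pattern being described can be sketched with a minimal stand-in object that only tracks the training flag (the real train.py is not reproduced here, and the names below are illustrative). If net.train() runs once per epoch and a mid-epoch validation pass calls net.eval(), every batch after the first validation runs with BatchNorm/Dropout in eval mode:

```python
class Net:
    """Stand-in for a torch.nn.Module: only tracks the training mode."""
    def __init__(self):
        self.training = True
    def train(self):
        self.training = True
    def eval(self):
        self.training = False

def run_epoch(net, n_batches, eval_every):
    net.train()                      # BUG: called once per epoch only
    modes = []
    for i in range(n_batches):
        modes.append(net.training)   # mode this batch's forward pass sees
        if (i + 1) % eval_every == 0:
            net.eval()               # validation pass switches the mode...
            # ...and nothing switches it back before the next batch
    return modes

print(run_epoch(Net(), 4, eval_every=2))  # [True, True, False, False]
```

The fix is simply to call net.train() again after each validation pass, i.e. inside the batch loop, so the last two entries would also be True.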
I think the real training code is:
@phper5, for masks_pred = net(imgs), you are right.
@juliagong Thanks for your investigation! It is indeed a big mistake that has been there for a long time and needs a fix. Thanks to all of you for participating in this issue.
Yes, you are right. Sorry, I didn't look carefully.
@juliagong could you please share how you resolved the issue? I did the above and it doesn't seem to help; the loss still converges to around 0.6.
Me too; sometimes 0.7, sometimes 0.6, sometimes 0.8. So how do we fix this issue?
Invaluable finding! Thank you very much. milesial's code is clean and I think I understand it, but I couldn't figure out why it didn't work on my dataset. Today, when I changed the code as you said, everything worked.
Hi all, I modified the training loop to switch back to train mode after evaluation in 773ef21.
Thanks, @milesial, for your update.
Were you able to overfit a set of 10 different images? If so, how many epochs did it take, and with which parameters? @ProfessorHuang, what exactly did you change in the code?
Could you share your code? My loss is sometimes 1e+3, which is terrible.
How did you change the code? Could you show the details?
Hi there, I am also wondering if there have been any updates, or any willingness to share how others have overcome this issue. Very much appreciated!
The modification makes sense to me logically.
I might be wrong, but I had a similar issue, and reducing weight decay and momentum helped me overfit.
Hi, for small datasets, reducing the evaluation frequency reduced the training loss for me. This avoids the learning rate being driven down to a tiny value after only a few steps. Example:
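To see why evaluation frequency affects the learning rate at all: assuming the train script steps a ReduceLROnPlateau-style scheduler on the validation score at every evaluation, a stagnant metric means every extra evaluation is another chance for the scheduler to cut the LR. A plain-Python mimic of that scheduler logic (not the actual torch scheduler; patience and factor values here are illustrative):

```python
def simulate(num_evals, patience=2, lr=1e-5, factor=0.1):
    """Mimic a ReduceLROnPlateau scheduler fed a metric that never improves."""
    best, bad = None, 0
    for _ in range(num_evals):
        score = 0.36                  # stagnant validation Dice score
        if best is None or score > best:
            best, bad = score, 0
        else:
            bad += 1
            if bad > patience:        # plateau detected
                lr *= factor          # LR is cut at every plateau
                bad = 0
    return lr

# Evaluating 10x per epoch for 5 epochs = 50 scheduler steps:
print(simulate(50))   # LR collapses to ~1e-21
# Evaluating once per epoch for 5 epochs = 5 scheduler steps:
print(simulate(5))    # LR is still ~1e-6
```

With fewer evaluations per epoch the scheduler sees fewer "no improvement" events, so the LR stays in a usable range long enough for the model to actually train.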
@rgkannan676 Thanks a lot; this made the loss value normal on my small dataset. I'd like to know why it prevents the LR from becoming small after a few steps, and now I'm thinking about how to reduce the loss oscillating like this. I hope to receive your reply; thanks again @rgkannan676.
I have a dataset of 260 images, and 0.25 did help significantly to reduce the loss, but the Dice coefficient has remained stagnant at 0.36. Is there any way to improve the Dice score? It is unable to generalize when given a new image.
I also have this problem, and I am thinking about a way around it.
Same here. Will update in case I find a fix for a better Dice score. Please do update if any fix is found. Thanks in advance.
Hi, what about your project? Two classes or more?
Mine is binary segmentation.
Oh, that is weird; mine is multi-class segmentation.
Very interesting approach @Flyingdog-Huang. My dataset is small as well, and the target is small compared to the whole image. There are water droplets, which are difficult to distinguish at times and also make creating masks hard. Depending on the lighting conditions, the data tends to be better or worse. The model is often unable to tell whether a droplet is present, since droplets are transparent and lack robust edges most of the time. I believe that in my case the problem is the dataset itself. I am using the model for real-time segmentation in video, and the results are not very good. Attention U-Net performed a little better than U-Net, and I am checking if residual Attention U-Net can do better still. My Dice score is around 0.73 with a loss of 0.20.
Hi there,
First of all, thank you for this code!
My first run with the network was unsuccessful, so I tried to make a quick sanity check.
It is my understanding that a "healthy" neural net should easily overfit the training data when few examples are given. It should quickly learn to classify them with 100% accuracy by simply "memorising" the images.
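The sanity check described above can be reduced to its essence with a one-parameter toy model (plain Python with illustrative names; the same logic applies to the full U-Net): a healthy model plus a healthy training loop must drive the loss on a single memorised example toward zero, so if it cannot, suspect the loop rather than the data.

```python
import math

# Overfit sanity check in miniature: a single logistic "pixel classifier"
# trained on one example should drive its BCE loss to ~0.
x, y = 1.0, 1.0          # one training example with a positive label
w, b = 0.0, 0.0          # tiny model: p = sigmoid(w*x + b)
lr = 1.0

def loss(w, b):
    p = 1 / (1 + math.exp(-(w * x + b)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for _ in range(500):
    p = 1 / (1 + math.exp(-(w * x + b)))
    g = p - y             # dLoss/dlogit for BCE with a sigmoid output
    w -= lr * g * x       # plain gradient descent step
    b -= lr * g

print(loss(w, b) < 0.01)  # True: the model memorised its one example
```

A U-Net on one image should behave the same way, just with more parameters; a loss that refuses to drop below ~0.5 on a memorisable set points at a bug, not at capacity.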
So I tried to overfit the net on 38 images (35, actually, because 3 are used as a validation set) from the Pascal VOC database.
I use binary masks: black = background, white = object.
Running training for 100 epochs, I still get very poor results: the training loss fluctuates rather heavily around 0.5, as can be seen from the TensorBoard plot and the output to screen:
I'm quite certain that this behaviour is irregular and that I'm missing something. Do you have an idea what it might be?
The results on the very same training set don't make any sense: