Cannot overfit net on a small training set #165
Thank you for the detailed explanation of your problem. From the loss value and the loss plot it seems that your learning rate is way too high; have you tried lowering it? Maybe divide it by 10 first. Also, as a sanity check, you can try with even fewer images, like 1 or 2; the training will be faster.
Hi @milesial and @sptom! I'm having this issue of failing to overfit as well, and I'm using 1e-4 as my learning rate and a training set of just a single image. I can't find anything irregular in the training code, but if more than one of us is having this issue, I wonder if there's something we're overlooking? Any insight would be appreciated!
I think your learning rate is too high. For the full Carvana dataset I used 2e-6 as the LR.
Thanks; unfortunately, I have also tried learning rates on the order of 1e-6 and got the same results. I also disabled all augmentation, normalization, and other regularization so that it's exactly the steps you used. For some reason, the model isn't overfitting to the one-image dataset and either gives nonsensical segmentations or converges to all-zero weights. Have you had this issue? Do you have any insights? Thanks!
There were a lot of small tweaks in recent commits, but nothing that should affect convergence, I think. Are you using transposed convolutions or the bilinear route (default)? Do you both work on the same dataset? Have you tried with an image from the Carvana dataset? This problem is very strange. If you feed it 100 images, does it learn anything?
I'm using the bilinear route. I don't think we're using the same dataset, but we have the same issue. I've tried feeding it 50-100 images in training and it doesn't learn properly; the loss stays at around 0.7.
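A side note on that ~0.7 figure: a binary cross-entropy loss that plateaus near ln 2 ≈ 0.693 is exactly what you get when the network's sigmoid output sits at ~0.5 everywhere, i.e. the model has learned nothing at all. A quick back-of-the-envelope check (plain Python, just to illustrate the arithmetic, not the repo's loss code):

```python
import math

def bce(p, y):
    # Binary cross-entropy for one pixel with predicted probability p
    # and ground-truth label y (0 or 1).
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A network whose sigmoid output is ~0.5 everywhere yields a per-pixel
# BCE of ln(2) ~= 0.693 regardless of the label:
print(bce(0.5, 0))  # 0.6931...
print(bce(0.5, 1))  # 0.6931...
print(math.log(2))  # 0.6931...
```

So a loss stuck around 0.7 is a strong hint that the predictions are uninformative, not merely inaccurate.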
Update: I also tried transposed convolutions and they don't work either. It's such a strange issue; I don't think I've ever encountered anything like it before. I wonder if the problem is not with the model but somehow with the training procedure. I have tried both models from this repo as well as a pretrained model from a different project, all on the same image. None of them learns.
@milesial I spent today debugging once again by rewriting the entire training pipeline from scratch and testing on incrementally more meaningful sets of data, and ended up finding the problem! It was very sneaky: net.train() is called once per epoch rather than once per batch, so after the validation step switches the network to eval mode, the rest of the epoch trains in eval mode. Thanks for your help and quick response on this problem. I hope this fixes @sptom's issue as well.
Oh wow, awesome @juliagong! That sounds really sneaky. I didn't notice that net.train() was put in the epoch loop and not in the batch loop...
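For readers skimming later, the pattern being described can be sketched with a minimal stand-in object that only tracks the training flag (the real train.py is not reproduced here, and the names below are illustrative). If net.train() runs once per epoch and a mid-epoch validation pass calls net.eval(), every batch after the first validation runs with BatchNorm/Dropout in eval mode:

```python
class Net:
    """Stand-in for a torch.nn.Module: only tracks the training mode."""
    def __init__(self):
        self.training = True
    def train(self):
        self.training = True
    def eval(self):
        self.training = False

def run_epoch(net, n_batches, eval_every):
    net.train()                      # BUG: called once per epoch only
    modes = []
    for i in range(n_batches):
        modes.append(net.training)   # mode this batch's forward pass sees
        if (i + 1) % eval_every == 0:
            net.eval()               # validation pass switches the mode...
            # ...and nothing switches it back before the next batch
    return modes

print(run_epoch(Net(), 4, eval_every=2))  # [True, True, False, False]
```

The fix is simply to call net.train() again after each validation pass, i.e. inside the batch loop, so the last two entries would also be True.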
I think the real training code is:
@phper5, for masks_pred = net(imgs), you are right.
@juliagong Thanks for your investigation! It is indeed a big mistake that has been there for a long time and needs a fix. Thanks to all of you for participating in this issue.
Yes, you are right. Sorry, I didn't look carefully.
@juliagong could you please share how you resolved the issue? I did the above and it doesn't seem to help; the loss still converges to around 0.6.
Me too; sometimes 0.7, sometimes 0.6, sometimes 0.8. So how do we fix this issue?
Invaluable finding! Thank you very much. milesial's code is clean and I think I understand it, but I couldn't figure out why it didn't work on my dataset. Today, when I changed the code as you said, everything worked.
Hi all, I modified the training loop to switch back to train mode after evaluation in 773ef21.
Thanks, @milesial, for your update.
Were you able to overfit a set of 10 different images? If so, how many epochs did it take, and with which parameters? @ProfessorHuang, what exactly did you change in the code?
Could you share your code? My loss is sometimes 1e+3, which is terrible.
How did you change the code? Could you show the details?
Hi there, I am also wondering if there have been any updates, or any willingness to share how others have overcome this issue. Very much appreciated!
The modification makes sense to me logically.
I might be wrong, but I had a similar issue, and reducing weight decay and momentum helped me overfit.
Hi, for small datasets, reducing the evaluation frequency reduced the training loss for me. This avoids the learning rate being driven down to a tiny value after only a few steps. Example:
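To see why evaluation frequency affects the learning rate at all: assuming the train script steps a ReduceLROnPlateau-style scheduler on the validation score at every evaluation, a stagnant metric means every extra evaluation is another chance for the scheduler to cut the LR. A plain-Python mimic of that scheduler logic (not the actual torch scheduler; patience and factor values here are illustrative):

```python
def simulate(num_evals, patience=2, lr=1e-5, factor=0.1):
    """Mimic a ReduceLROnPlateau scheduler fed a metric that never improves."""
    best, bad = None, 0
    for _ in range(num_evals):
        score = 0.36                  # stagnant validation Dice score
        if best is None or score > best:
            best, bad = score, 0
        else:
            bad += 1
            if bad > patience:        # plateau detected
                lr *= factor          # LR is cut at every plateau
                bad = 0
    return lr

# Evaluating 10x per epoch for 5 epochs = 50 scheduler steps:
print(simulate(50))   # LR collapses to ~1e-21
# Evaluating once per epoch for 5 epochs = 5 scheduler steps:
print(simulate(5))    # LR is still ~1e-6
```

With fewer evaluations per epoch the scheduler sees fewer "no improvement" events, so the LR stays in a usable range long enough for the model to actually train.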
@rgkannan676 Thanks a lot; this made the loss value normal on my small dataset. I'd like to know why it prevents the LR from becoming small after a few steps, and now I'm thinking about how to reduce the loss oscillating like this. I hope to receive your reply; thanks again @rgkannan676.
I have a dataset of 260 images, and 0.25 did help significantly to reduce the loss, but the Dice coefficient has remained stagnant at 0.36. Is there any way to improve the Dice score? It is unable to generalize when given a new image.
I also have this problem, and I am thinking about a way around it.
Same here. Will update in case I find a fix for a better Dice score. Please do update if any fix is found. Thanks in advance.
Hi, what about your project? Two classes or more?
Mine is binary segmentation.
Oh, that is weird; mine is multi-class segmentation.
Very interesting approach @Flyingdog-Huang. My dataset is small as well, and the target is small compared to the whole image. There are water droplets, which are difficult to distinguish at times and also make creating masks hard. Depending on the lighting conditions, the data tends to be better or worse. The model is often unable to tell whether a droplet is present, since droplets are transparent and lack robust edges most of the time. I believe that in my case the problem is the dataset itself. I am using the model for real-time segmentation in video, and the results are not very good. Attention U-Net performed a little better than U-Net, and I am checking if residual Attention U-Net can do better still. My Dice score is around 0.73 with a loss of 0.20.
Hi there,
First of all, thank you for this code!
My first run with the network was unsuccessful, so I tried to make a quick sanity check.
It is my understanding that a "healthy" neural net should easily overfit the training data when few examples are given. It should quickly learn to classify them with 100% accuracy by simply "memorising" the images.
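The sanity check described above can be reduced to its essence with a one-parameter toy model (plain Python with illustrative names; the same logic applies to the full U-Net): a healthy model plus a healthy training loop must drive the loss on a single memorised example toward zero, so if it cannot, suspect the loop rather than the data.

```python
import math

# Overfit sanity check in miniature: a single logistic "pixel classifier"
# trained on one example should drive its BCE loss to ~0.
x, y = 1.0, 1.0          # one training example with a positive label
w, b = 0.0, 0.0          # tiny model: p = sigmoid(w*x + b)
lr = 1.0

def loss(w, b):
    p = 1 / (1 + math.exp(-(w * x + b)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for _ in range(500):
    p = 1 / (1 + math.exp(-(w * x + b)))
    g = p - y             # dLoss/dlogit for BCE with a sigmoid output
    w -= lr * g * x       # plain gradient descent step
    b -= lr * g

print(loss(w, b) < 0.01)  # True: the model memorised its one example
```

A U-Net on one image should behave the same way, just with more parameters; a loss that refuses to drop below ~0.5 on a memorisable set points at a bug, not at capacity.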
So I tried to overfit the net on 38 images (35, actually, because 3 are used as a validation set) from the Pascal VOC database.
I use binary masks: black = background, white = object.
Running training for 100 epochs, I still get very poor results: the training loss fluctuates rather heavily around 0.5, as can be seen from the TensorBoard plot and the output to screen:
I'm quite certain that this behaviour is irregular and that I'm missing something. Do you have an idea what it might be?
The results on the very same training set don't make any sense: