
fine tuning issue #17

Open
LinlyAC opened this issue Dec 15, 2016 · 13 comments

Comments

@LinlyAC

LinlyAC commented Dec 15, 2016

Hello everyone, these days I've run into a problem I can't solve, so I've come here for help.
My task is to fine-tune this caffe-heatmap model on my own dataset, which also has seven upper-body joints. While fine-tuning, I plotted two charts: train loss vs. iteration and test accuracy vs. iteration. The first chart looks normal (the loss decreases monotonically), but the accuracy drops rapidly to zero after a few iterations.

[chart: train loss vs. iteration]

[chart: test accuracy vs. iteration]

This confuses me: in my opinion, the accuracy should go up as the train loss goes down. Can anyone help me?
(PS: my task has the same requirements as caffe-heatmap, i.e. seven upper-body joints, so I didn't change the original code, only the train and test data.)
Thanks.

@bazilas
Collaborator

bazilas commented Dec 15, 2016

Check the heatmap predictions. If everything converges to zero, you may need to weight your foreground/background gradients so that they make equal contributions to the parameter update.

@LinlyAC
Author

LinlyAC commented Dec 15, 2016

@bazilas Thanks for your answer. I have tried your suggestion.
Regarding the heatmap predictions, I checked the log file. In it, loss_heatmap also goes down, from 1.00483 to 0:

[log excerpt at iter 1]

[log excerpt at iter 163]

Maybe I misunderstood your suggestion. I also ran the Matlab demo with this broken model, and every heatmap turns blue. In addition, all the predicted joint coordinates in each frame seem to collapse to one point.

[demo screenshots: blue heatmaps, collapsed joint predictions]

I'm not sure whether this is what you meant. If this is the mistake, how should I adjust the foreground/background gradients? Are there any tricks?

Thank you again!

@bazilas
Collaborator

bazilas commented Dec 15, 2016

You could count the number of foreground/background heatmap pixels (e.g. by thresholding) and balance the gradients accordingly.
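The counting-and-balancing idea above could be sketched like this. This is not code from the repo, just a minimal NumPy illustration; `balance_weights` and the threshold value are illustrative names and choices:

```python
import numpy as np

def balance_weights(heatmap, thresh=0.01):
    """Per-pixel loss weights giving foreground and background equal
    total contribution to the gradient. `heatmap` is a ground-truth
    heatmap with values in [0, 1]; pixels above `thresh` count as
    foreground. (Illustrative sketch, not from caffe-heatmap.)"""
    fg = heatmap > thresh
    n_fg = int(fg.sum())
    n_bg = fg.size - n_fg
    weights = np.empty_like(heatmap, dtype=np.float64)
    # Each class gets half of the total weight mass, split evenly
    # among its pixels, so a huge background can't swamp the peaks.
    weights[fg] = 0.5 / max(n_fg, 1)
    weights[~fg] = 0.5 / max(n_bg, 1)
    return weights
```

In a Euclidean-loss setup, the per-pixel squared-error gradients would then be multiplied elementwise by these weights before the parameter update.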

@skyz8421

skyz8421 commented Jan 5, 2017

I've met the same problem. I used the prototxt the author provided, and my training loss goes down to 0 very quickly, but the predictions are very bad, and the heatmaps turn out blue.
How can this problem be fixed? How should the balancing trick be applied?
@bazilas

@power0341

Hey guys @bazilas @LinlyAC @samlong-yang, I'm also stuck here. I modified a few lines of the source code in order to train a model that predicts several joint locations from single depth images. My model behaves the same way as @LinlyAC's and @samlong-yang's: during training, the loss falls to near zero from the first iteration, and the model only predicts single-valued heatmaps. In fact, I tried the Matlab version and got the same result. It would be great if someone could offer a tutorial on how to effectively train a model with heatmap regression.

@LinlyAC
Author

LinlyAC commented Feb 22, 2017

I'm grateful that so many people are paying attention to this problem. @bazilas @samlong-yang @power0341
I really haven't solved it yet, but I have a new observation that might help.
When I tried to fine-tune this model, my data was formatted like this:

[screenshot: my fine-tuning label format]

because this is the format described in readme.txt:

[screenshot: label format from readme.txt]

However, when I downloaded the example FLIC dataset, the data is formatted like this:

[screenshot: example FLIC label format]

It can clearly be seen that this format differs from the one in readme.txt.
To be honest, I don't quite understand the meaning of these decimals. Can anyone give me some help? It might help us solve the fine-tuning problem.

@distant1219

Hey @LinlyAC, I'm training this project with the data the author provided, but I've met the same problem. Did you try training on the author's dataset? Hope you reply.

@power0341

Hi @LinlyAC, if I understand correctly, there are a couple of things that matter. First, we need to normalize the joint coordinates, for example (x/w, y/h) or ((x - c_x)/w, (y - c_y)/h); see the "DeepPose" paper for details. Second, we also need to pay attention to carefully choosing the magnitude of the Gaussian so that the model really converges.
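The box-relative normalization mentioned above might be sketched as follows. This is an illustrative NumPy round-trip, not code from the repo or the DeepPose paper; the function names and the `(c_x, c_y, w, h)` box convention are assumptions:

```python
import numpy as np

def normalize_joints(joints, box):
    """Normalize (x, y) joint pixel coordinates relative to a bounding
    box, in the spirit of DeepPose. `joints` is an (N, 2) array;
    `box` is (c_x, c_y, w, h): box centre and size. (Illustrative.)"""
    c_x, c_y, w, h = box
    out = np.empty_like(joints, dtype=np.float64)
    out[:, 0] = (joints[:, 0] - c_x) / w  # x offset, scaled by box width
    out[:, 1] = (joints[:, 1] - c_y) / h  # y offset, scaled by box height
    return out

def denormalize_joints(norm, box):
    """Inverse mapping: recover pixel coordinates from normalized ones."""
    c_x, c_y, w, h = box
    out = np.empty_like(norm, dtype=np.float64)
    out[:, 0] = norm[:, 0] * w + c_x
    out[:, 1] = norm[:, 1] * h + c_y
    return out
```

Normalizing this way keeps the regression targets in a small, roughly unit range regardless of image resolution, which tends to make the loss scale sane across datasets.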

@distant1219

Hello @power0341 @LinlyAC, according to the project readme, we should set 'multfact' to 282 when using the preprocessed data from the website: that parameter multiplies the joint coordinates, which are stored as decimals, to recover the ground truth. If we use our own datasets, I think 'multfact' should be set to 1. Even so, I still can't train a proper model. The loss becomes very low at the beginning, but the predictions are wrong. What should I do? Help wanted!!
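The 'multfact' behaviour described above could be sketched like this. This reflects the thread's understanding of the parameter, not the repo's actual loader code; `to_pixels` is an illustrative helper:

```python
def to_pixels(coords, multfact):
    """Scale stored joint coordinates to pixel coordinates.

    Per the discussion: the preprocessed FLIC annotations store
    coordinates as decimals that get multiplied by multfact (282 for
    the downloaded data); labels already given in pixels would need
    multfact = 1. (Illustrative sketch, not the repo's loader.)"""
    return [(x * multfact, y * multfact) for (x, y) in coords]
```

So mismatching the two conventions, e.g. feeding pixel-valued labels through multfact = 282, or decimal labels through multfact = 1, would place every ground-truth joint far off target and could plausibly produce the degenerate heatmaps seen in this thread.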

@LinlyAC
Author

LinlyAC commented Feb 23, 2017

@distant1219 I am really grateful for your help, but I also cannot solve this problem. I am sorry.

Hi @tpfister, I am sorry to bother you, but several people, like me, have run into problems we cannot solve, so we hope you can give us some tips on fine-tuning this model.

Heartfelt thanks to you.

@kennyjchen

I am having the same problem as @LinlyAC, and I will help search for a solution. I'll post back if I figure it out.

@EEWenbinWu

The reason your heatmaps are blue is that the demo's multfact = 1.
See this:

[screenshot: multfact setting in the demo]

@kennyjchen

kennyjchen commented Mar 16, 2017

Hi everyone,

I managed to solve the problem with the blue heatmaps. It seems that Stochastic Gradient Descent (SGD) was exploding in loss for batch sizes greater than 3, or dropping drastically to zero for batch sizes less than 3. Try changing the solver type in solver.prototxt to AdaGrad or AdaDelta, as described here:

http://caffe.berkeleyvision.org/tutorial/solver.html

I'm not sure which version of Caffe introduced these solver types; I had to recompile caffe-heatmap against the newest version of Caffe. After doing this and training with batch sizes of ~25, I am now able to fine-tune.

A note on @EEWenbinWu's comment: if you follow my steps above, do not apply the scalar multiplier as shown there; it will not work.
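For reference, the solver change kennyjchen describes would look something like this in solver.prototxt. Only the `type` line is the actual fix; the other values are illustrative placeholders, not settings from the repo:

```
# solver.prototxt (fragment) -- switch the optimizer away from SGD
type: "AdaGrad"      # or "AdaDelta"; requires a Caffe version with pluggable solvers
base_lr: 0.01        # illustrative value, tune for your dataset
```

The train/test batch size itself is set in the data layer of the network prototxt, not in the solver file.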

7 participants