
Does anyone succeed on imagenet? #54

Open
HuaZheLei opened this issue Aug 11, 2018 · 5 comments

Comments

@HuaZheLei

I tried several small values of P and K and many learning rates, but I always get a loss of 0.693. Can anyone share their experience on ImageNet?

@HuaZheLei changed the title from "Does anyone succeed onimagenet?" to "Does anyone succeed on imagenet?" on Aug 11, 2018
@lucasb-eyer
Member

Hi, it's a great question, and while I have ImageNet experience, I have never tried the triplet loss there.

For the hardest dataset I had, I needed to use P=2, K=2 until the network got past that 0.693 plateau, and then gradually increase them. Are you able to converge with P=2, K=2?
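For reference, a minimal NumPy sketch (not the repo's actual TensorFlow implementation; the function and variable names here are mine) of batch-hard triplet loss with the soft-margin softplus, log(1 + exp(d_ap − d_an)), shows where 0.693 comes from: when all embeddings collapse to a single point, every anchor has d_ap = d_an, so the loss sits at log(2) ≈ 0.693.

```python
import numpy as np

def batch_hard_softplus(emb, labels):
    """Batch-hard triplet loss with soft margin over a P*K batch (sketch)."""
    # Pairwise Euclidean distance matrix between all embeddings.
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    # Hardest positive: farthest sample with the same label.
    d_ap = np.where(same, d, -np.inf).max(axis=1)
    # Hardest negative: closest sample with a different label.
    d_an = np.where(~same, d, np.inf).min(axis=1)
    # Soft-margin softplus instead of a hard margin.
    return np.log1p(np.exp(d_ap - d_an)).mean()

labels = np.array([0, 0, 1, 1])      # P=2 identities, K=2 samples each
collapsed = np.zeros((4, 8))         # all embeddings identical
print(batch_hard_softplus(collapsed, labels))  # ~0.6931 = ln(2)
```

So a loss stuck exactly at 0.693 usually means the embeddings have collapsed and no triplet provides a useful gradient, which is why starting with tiny P and K can help escape it.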

Alternatively, you can try the batch_sample version of the loss that I implemented in the sampling branch (I haven't had time to merge it yet; see #33). I have had very good experience with it on very large datasets, but haven't tried it on ImageNet either.

I would love to hear about your experiences trying these!

@ergysr

ergysr commented Aug 14, 2018

From my experience (not on ImageNet), the loss doesn't get past the margin when either sample difficulty or label noise is high. A few alternatives are the batch_sample version that Lucas suggests, the weighted triplet version, or a multi-task loss (e.g. hard triplet combined with categorical cross-entropy). I have tried the last two, and they work well with hard/noisy examples. So you have at least three options to try.
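The multi-task idea can be sketched roughly as below (a hypothetical NumPy illustration, not code from this repo; the `weight` parameter and function names are my own assumptions). The point is that even when the triplet term plateaus at ln(2), the classification gradient still moves the embeddings:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Categorical cross-entropy with a numerically stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def multitask_loss(triplet_term, logits, labels, weight=1.0):
    # Total loss = hard-triplet term + weighted classification term.
    # The cross-entropy gradient keeps training moving even while the
    # triplet term is stuck at its ln(2) collapse value.
    return triplet_term + weight * cross_entropy(logits, labels)

logits = np.array([[2.0, 0.5], [0.1, 1.9]])  # classifier head outputs
labels = np.array([0, 1])
print(multitask_loss(0.6931, logits, labels, weight=0.5))
```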

@willard-yuan

I have trained with P=2, K=2 on Tiny-ImageNet-200, and the loss doesn't converge. Using batch_sample first and then switching to batch_hard works well.

@lucasb-eyer
Member

Thanks @willard-yuan, that's useful feedback.

@mazatov

mazatov commented Mar 4, 2020

I just came across this discussion. What's the significance of the number 0.693? @HuaZheLei @ergysr

I was trying to train on Market-1501, but with fc1024_normalize. The model converged for fc1024 but got stuck at 0.693 for the normalized version. I've had it stuck at 0.693 on other datasets too. 🤷‍♂
