Other Datasets Problem #9
Comments
Dear Author, I attempted to apply this method on other datasets; however, I have observed that the mu_pdist, sigma_pdist, and logits distributions are very concentrated during training, even though the distributions of the mean and std themselves seem fine, and the final training results are relatively poor. I suspect this is because the logits have not been trained well. Could you please advise?
If you are trying to apply this method for from-scratch training (without any pre-trained weights), it will be difficult to optimize. I recently released a new probabilistic VLM project for from-scratch training: Probabilistic Language-Image Pre-Training. There is no full training code yet, but you can easily implement the new loss function. If you need to use the PCME++ loss for from-scratch training, you will need an additional deterministic loss for stable convergence, as shown in my new paper.
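For anyone reading along, here is a minimal sketch of what such a combination might look like in PyTorch. This is illustrative code, not the official implementation: the probabilistic term follows the PCME++-style closed-form distance (squared distance between means plus the total variance of both Gaussians), and the deterministic term is a standard CLIP-style InfoNCE on the means. The scale `a`, shift `b`, `temperature`, and `det_weight` are hypothetical parameters, not values from the paper.

```python
# Illustrative sketch: PCME++-style probabilistic matching loss plus a
# deterministic InfoNCE anchor for stable from-scratch training.
import torch
import torch.nn.functional as F


def csd(mu_v, mu_t, log_sigma_sq_v, log_sigma_sq_t):
    """Pairwise closed-form distance between diagonal Gaussians.

    mu_*: (B, D) means; log_sigma_sq_*: (B, D) log-variances.
    Returns a (B, B) matrix: squared L2 distance between the means
    plus the summed variances of both distributions.
    """
    mu_pdist = torch.cdist(mu_v, mu_t, p=2) ** 2                    # (B, B)
    sigma_pdist = (log_sigma_sq_v.exp().sum(-1)[:, None]
                   + log_sigma_sq_t.exp().sum(-1)[None, :])         # (B, B)
    return mu_pdist + sigma_pdist


def combined_loss(mu_v, mu_t, log_sigma_sq_v, log_sigma_sq_t,
                  a, b, temperature=0.07, det_weight=1.0):
    B = mu_v.size(0)
    labels = torch.eye(B, device=mu_v.device)

    # Probabilistic matching loss: BCE over pairwise match logits,
    # where diagonal pairs are matches and off-diagonal pairs are not.
    logits = -a * csd(mu_v, mu_t, log_sigma_sq_v, log_sigma_sq_t) + b
    prob_loss = F.binary_cross_entropy_with_logits(logits, labels)

    # Deterministic anchor: CLIP-style InfoNCE on the means only,
    # which keeps the means separated early in training.
    z_v = F.normalize(mu_v, dim=-1)
    z_t = F.normalize(mu_t, dim=-1)
    sim = z_v @ z_t.t() / temperature
    target = torch.arange(B, device=mu_v.device)
    det_loss = (F.cross_entropy(sim, target)
                + F.cross_entropy(sim.t(), target)) / 2

    return prob_loss + det_weight * det_loss
```

In practice `a` and `b` would be learnable parameters (e.g. `nn.Parameter` initialized to positive values), and `det_weight` could be annealed once the probabilistic term stabilizes.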
Thank you very much for your help! I have tried the new method you suggested, but unfortunately I am encountering an issue where the values in sigma_pdist remain abnormal and the distribution is very concentrated; this has not improved during training. I am wondering if this could be related to the data dimensions: in my dataset, both the mean and log variance are encoded with shape (Batchsize, Dim), specifically (256, 512). I would appreciate your thoughts on whether the dimensionality could be contributing to this issue.
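A quick way to quantify the concentration is to log summary statistics of the two pairwise-distance matrices during training. The helper below is a hypothetical diagnostic (not from the repository), assuming (256, 512) mean and log-variance tensors as described above:

```python
# Hypothetical diagnostic: check whether mu_pdist / sigma_pdist
# collapse into a narrow range (the "very concentrated" symptom).
import torch


@torch.no_grad()
def pdist_stats(mu_v, mu_t, log_sigma_sq_v, log_sigma_sq_t):
    mu_pdist = torch.cdist(mu_v, mu_t, p=2) ** 2
    sigma_pdist = (log_sigma_sq_v.exp().sum(-1)[:, None]
                   + log_sigma_sq_t.exp().sum(-1)[None, :])
    for name, x in [("mu_pdist", mu_pdist), ("sigma_pdist", sigma_pdist)]:
        print(f"{name}: min={x.min():.4f} max={x.max():.4f} "
              f"mean={x.mean():.4f} std={x.std():.4f}")
    # A near-zero std for sigma_pdist indicates the variances have
    # collapsed to (almost) identical values across the batch.
```

One possible dimensionality effect, assuming sigma_pdist is computed as a sum of per-dimension variances as in the sketch above: its scale grows linearly with Dim, so at Dim = 512 it can dominate mu_pdist early in training (with log-variances initialized near zero, it starts around 2 × 512 = 1024). Rescaling the logits via the `a`/`b` terms, or initializing the variance head to smaller values, may help.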