There is a large gap between the validation accuracy reported by VLMEvalKit and the model papers #94
Comments
Hi, @YongLD,
Hello, I find that for the TextVQA dataset, the LLaVA evaluation uses a prompt with reference tokens, like:
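For illustration only, a hypothetical sketch of what a TextVQA prompt with reference OCR tokens might look like (the field names and exact wording are assumptions, not LLaVA's or VLMEvalKit's actual format):

```python
# Hypothetical sketch: building a TextVQA prompt that injects reference OCR
# tokens. The names `question` and `ocr_tokens` are assumptions for
# illustration, not VLMEvalKit's actual schema.
def build_textvqa_prompt(question: str, ocr_tokens: list[str]) -> str:
    # Supplying the OCR tokens detected in the image hands the model the
    # scene text it would otherwise have to read itself, which can raise
    # accuracy substantially compared to a bare question.
    return (
        f"Reference OCR token: {', '.join(ocr_tokens)}\n"
        f"{question}\n"
        "Answer the question using a single word or phrase."
    )

print(build_textvqa_prompt("What brand is the laptop?", ["dell", "inspiron"]))
```

Whether an evaluation framework includes such reference tokens in the prompt can account for a sizable score difference on TextVQA.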
@kennymckormick Can we use an Azure OpenAI key in VLMEvalKit? How can I change the base_url for Azure?
Currently, VLMEvalKit does not support Azure OpenAI API keys (because I do not have Azure API access and cannot debug it). You can follow the Azure docs to add an Azure OpenAI wrapper to VLMEvalKit.
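A minimal sketch of such a wrapper, assuming the `openai` Python package (v1.x); the endpoint, key, API version, and deployment name below are placeholders you must fill in, and VLMEvalKit does not ship this code:

```python
# Sketch of an Azure OpenAI client, assuming openai>=1.0.
# AzureOpenAI takes the Azure endpoint in place of base_url.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com/",  # your Azure endpoint
    api_key="YOUR_AZURE_API_KEY",
    api_version="2024-02-01",  # pick the version your deployment supports
)

resp = client.chat.completions.create(
    model="YOUR_DEPLOYMENT_NAME",  # Azure routes by deployment name, not model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

Wiring a client like this into VLMEvalKit's judge/API layer is what the Azure wrapper would need to do.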
I've noticed that most of the results differ from those in the research papers. I believe the framework should be fixed so that it aligns more closely with the original results.
Hi, @geknow,
On the TextVQA dataset, the InstructBLIP 13B paper reports an accuracy of 50.7, and the Qwen-VL-Chat paper reports an accuracy of 63.75. In the official VLMEvalKit results, however, the accuracy of InstructBLIP 13B is about 30 and the accuracy of Qwen-VL-Chat is 10.5. What do you think is the problem? Also, I tested InstructBLIP 13B on TextVQA myself and got an accuracy of 16.7; what went wrong? These are all prefetch (exact-match) results, without GPT-assisted answer matching.