There is a large gap between the validation accuracy reported by VLMEvalKit and the model papers #94
Comments
Hi, @YongLD,
Hello, I find that for the TextVQA dataset, the LLaVA evaluation uses a prompt with reference tokens, like:
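For illustration only, a hypothetical sketch of what a TextVQA prompt with reference OCR tokens might look like (the field names and exact wording are assumptions, not LLaVA's or VLMEvalKit's actual format):

```python
# Hypothetical sketch: building a TextVQA prompt that injects reference OCR
# tokens. The names `question` and `ocr_tokens` are assumptions for
# illustration, not VLMEvalKit's actual schema.
def build_textvqa_prompt(question: str, ocr_tokens: list[str]) -> str:
    # Supplying the OCR tokens detected in the image hands the model the
    # scene text it would otherwise have to read itself, which can raise
    # accuracy substantially compared to a bare question.
    return (
        f"Reference OCR token: {', '.join(ocr_tokens)}\n"
        f"{question}\n"
        "Answer the question using a single word or phrase."
    )

print(build_textvqa_prompt("What brand is the laptop?", ["dell", "inspiron"]))
```

Whether an evaluation framework includes such reference tokens in the prompt can account for a sizable score difference on TextVQA.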
@kennymckormick Can we use an Azure OpenAI key in VLMEvalKit? How can I change the base_url for Azure?
Currently, VLMEvalKit does not support Azure OpenAI API keys (because I do not have Azure API access and cannot debug it). You can follow the Azure docs to add an Azure OpenAI wrapper to VLMEvalKit.
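A minimal sketch of such a wrapper, assuming the `openai` Python package (v1.x); the endpoint, key, API version, and deployment name below are placeholders you must fill in, and VLMEvalKit does not ship this code:

```python
# Sketch of an Azure OpenAI client, assuming openai>=1.0.
# AzureOpenAI takes the Azure endpoint in place of base_url.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com/",  # your Azure endpoint
    api_key="YOUR_AZURE_API_KEY",
    api_version="2024-02-01",  # pick the version your deployment supports
)

resp = client.chat.completions.create(
    model="YOUR_DEPLOYMENT_NAME",  # Azure routes by deployment name, not model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

Wiring a client like this into VLMEvalKit's judge/API layer is what the Azure wrapper would need to do.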
I've noticed that most of the results differ from those in the research papers. I believe the framework should be fixed so that it aligns more closely with the original results.
Hi, @geknow,
On the TextVQA dataset, the InstructBLIP 13B paper reports an accuracy of 50.7, and the Qwen-VL-Chat paper reports an accuracy of 63.75. In the official VLMEvalKit results, however, the accuracy of InstructBLIP 13B is about 30 and the accuracy of Qwen-VL-Chat is 10.5. What do you think is the problem? Also, I tested InstructBLIP 13B on TextVQA myself and got an accuracy of 16.7; what went wrong? These are all prefetch (exact-match) results, without GPT-assisted answer matching.