How to reproduce the reported performance on scienceqa_img for Qwen_VL?
I can't reproduce the performance with the default config.
Hi, thank you for this great project!
When I used the default configuration to reproduce the performance on scienceqa_img for Qwen_VL, I got a much worse accuracy, only 46.7%...
Here is my command:

```
python lmms_eval/__main__.py --model qwen_vl --model_args pretrained="Qwen/Qwen-VL" --tasks scienceqa_img --batch_size 1 --log_samples --log_samples_suffix qwenvl_scienceqa_img --output_path ./logs/
```
and then, I got the result:
| Tasks         | Version | Filter | n-shot | Metric      | Value |   | Stderr |
|---------------|---------|--------|--------|-------------|-------|---|--------|
| scienceqa_img | Yaml    | none   | 0      | exact_match | 0.467 | ± | 0.0111 |
Maybe the prompt does not match? I also printed some of the outputs and found that redundant information (`Explanation: ...`) is appended after the answer letter, like this:
```
Model Responding: 0%|▎ | 4/2017 [00:01<12:17, 2.73it/s]text_outputs: B
Explanation: The passage says that Ernest put a parachute with a 1
Model Responding: 0%|▎ | 5/2017 [00:02<14:32, 2.31it/s]text_outputs: B
Explanation: The passage says that Gordon put a parachute with a 1
Model Responding: 0%|▍ | 6/2017 [00:02<15:55, 2.11it/s]text_outputs: B
Model Responding: 0%|▌ | 7/2017 [00:03<12:50, 2.61it/s]text_outputs: A
Model Responding: 0%|▌ | 8/2017 [00:03<10:50, 3.09it/s]text_outputs: B
Explanation: The passage says that Sebastian put a parachute with a 1
Model Responding: 0%|▋ | 9/2017 [00:03<13:14, 2.53it/s]text_outputs: A
Model Responding: 0%|▋ | 10/2017 [00:04<11:09, 3.00it/s]text_outputs: C
```
Has anyone else run into these problems?
Thanks!
By default, we use the chat template to format the prompt for Qwen_VL. During earlier development, we made several attempts to align the ai2d result for Qwen_VL by changing the prompt. However, this is not possible for every dataset, and we do not feel it is a correct or fair way to evaluate, which is also the motivation for this project. If you want to exactly match the reported results, you may need to change how we format the context in Qwen_VL to match their eval scripts here
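As a rough illustration of why the trailing `Explanation: ...` text hurts an `exact_match` metric, here is a minimal sketch of a post-processing step that keeps only the leading answer letter. This is a hypothetical helper for discussion, not part of lmms-eval's actual filter pipeline:

```python
import re

def extract_choice(text_outputs: str) -> str:
    """Keep only the leading answer letter (A-E), dropping any trailing
    'Explanation: ...' text that would break exact_match scoring.
    Hypothetical helper, not an lmms-eval API."""
    match = re.match(r"\s*([A-E])\b", text_outputs)
    # Fall back to the stripped raw output if no leading letter is found.
    return match.group(1) if match else text_outputs.strip()

print(extract_choice("B\nExplanation: The passage says that Ernest put a parachute with a 1"))
# → B
```

A cleaner long-term fix is still to align the prompt/stop tokens so the model emits only the letter, since regex post-processing can misfire on free-form answers.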