
Reproduce the result in Tab.3 #5

Open
Sephirex-X opened this issue Oct 16, 2024 · 10 comments
@Sephirex-X

Hi, thanks for your great work! I have encountered trouble reproducing the results in Tab. 3.

I used your script 'training_analysis/llava/eval_imagewikiqa.sh' to do so. For LLaVA1.5-7B, the script is:

python model_vqa.py \
    --model-path /home/huggingface/liuhaotian/llava-v1.5-7b \
    --question-file /home/VLMClassifier/data/imagewikiqa.jsonl  \
    --image-folder /home/VLMClassifier/imagewikiqa \
    --answers-file outputs/llava-v1.5-7b.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

After running "training_analysis/llava/eval_imagewikiqa.ipynb" on "outputs/llava-v1.5-7b.jsonl", I got a mean accuracy of ~39, which is pretty close to the reported result (denoted as "LLaVA1.5-7B" on the right side of Tab. 3).

For "LLaVA1.5-7B Finetuned on ImageNet+LLaVA", I merged your projector weights named "imagenet_and_llava_mm_projector.bin" from https://drive.google.com/drive/folders/182yUoLPK9nzKThp6MlzvTSnNpDrP7oXb into "liuhaotian/llava-v1.5-7b" using the script "training_analysis/llava/process_model.ipynb". However, the result is around 42, much lower than the reported one.
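As I understand it, the merge performed by "process_model.ipynb" boils down to overwriting the base checkpoint's mm_projector entries with the finetuned ones. Here is a hypothetical pure-Python sketch of that step (strings stand in for tensors, and the key names and merge logic are my assumptions; the actual notebook may differ):

```python
# Hypothetical sketch of a projector-weight merge: overwrite the base
# model's mm_projector entries with finetuned ones. Strings stand in
# for tensors; real key names come from the actual checkpoints.

base_state = {
    "model.layers.0.self_attn.q_proj.weight": "base-llm-weight",
    "model.mm_projector.0.weight": "base-proj-w0",
    "model.mm_projector.2.weight": "base-proj-w2",
}
# In practice this would be loaded from imagenet_and_llava_mm_projector.bin
projector_state = {
    "model.mm_projector.0.weight": "finetuned-proj-w0",
    "model.mm_projector.2.weight": "finetuned-proj-w2",
}

merged = dict(base_state)
for key, value in projector_state.items():
    assert key in merged, f"unexpected projector key: {key}"
    merged[key] = value  # only the projector entries change
```

If the merge is done this way, all non-projector weights should remain bit-identical to the base checkpoint, which is worth verifying after running the notebook.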

Would you please let me know the correct way to reproduce your results?

@yuhui-zh15 (Owner)

Hi, did you enable these lines?

############## Enable This Part for ImageWikiQA ##############

@Sephirex-X (Author)

> Hi, did you enable these lines?
>
> ############## Enable This Part for ImageWikiQA ##############

I hadn't enabled it before, but after enabling it, it still didn't work; the accuracy only rose from 42.4 (disabled) to 43.7 (enabled).

@yuhui-zh15 (Owner)

        conv = conv_templates[args.conv_mode].copy()
        conv.append_message(conv.roles[0], qs)
        # conv.append_message(conv.roles[1], None)
        # prompt = conv.get_prompt()

        ############## Enable This Part for ImageWikiQA ##############
        conv.append_message(conv.roles[1], "Let's think step by step.")
        prompt = conv.get_prompt().replace("</s>", "")
        # print(prompt)

Does your code look like this?

@Sephirex-X (Author)

>         conv = conv_templates[args.conv_mode].copy()
>         conv.append_message(conv.roles[0], qs)
>         # conv.append_message(conv.roles[1], None)
>         # prompt = conv.get_prompt()
>
>         ############## Enable This Part for ImageWikiQA ##############
>         conv.append_message(conv.roles[1], "Let's think step by step.")
>         prompt = conv.get_prompt().replace("</s>", "")
>         # print(prompt)
>
> Does your code look like this?

Here is what my code looks like:

        conv = conv_templates[args.conv_mode].copy()
        conv.append_message(conv.roles[0], qs)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()

        ############## Enable This Part for ImageWikiQA ##############
        conv.append_message(conv.roles[1], "Let's think step by step.")
        prompt = conv.get_prompt().replace("</s>", "")
        print(prompt)

I'm not sure these two lines can have such a huge effect (49 vs. 43)?

        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()
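For what it's worth, the effect of leaving the assistant turn open versus seeding it can be illustrated with a minimal sketch. This is not LLaVA's actual conv_templates implementation; the separator handling below is my assumption, modeled on vicuna-style templates:

```python
# Minimal illustration (not LLaVA's actual conv_templates code) of how
# the two code paths yield different prompts under a vicuna-style template.

SEP, SEP2 = " ", "</s>"

def build_prompt(messages):
    """Join (role, message) pairs; a None message leaves the assistant
    turn open so the model continues right after 'ASSISTANT:'."""
    parts = []
    for i, (role, msg) in enumerate(messages):
        if msg is None:
            parts.append(f"{role}:")
        else:
            parts.append(f"{role}: {msg}{SEP if i % 2 == 0 else SEP2}")
    return "".join(parts)

qs = "What is shown in the image?"

# Original path: assistant turn left open.
open_prompt = build_prompt([("USER", qs), ("ASSISTANT", None)])

# ImageWikiQA path: seed the assistant turn with a CoT cue, strip "</s>".
cot_prompt = build_prompt(
    [("USER", qs), ("ASSISTANT", "Let's think step by step.")]
).replace("</s>", "")

print(open_prompt)  # ends with "ASSISTANT:"
print(cot_prompt)   # ends with "ASSISTANT: Let's think step by step."
```

The second prompt forces the model to continue a chain-of-thought answer rather than emit a bare option letter, which plausibly explains a sizable accuracy gap.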

@yuhui-zh15 (Owner)

Can you follow the code above and re-run it? LLMs are known to be sensitive to prompt format.

@Sephirex-X (Author)

> Can you follow the code above and re-run it? LLMs are known to be sensitive to prompt format.

Yeah, sure.

@Sephirex-X (Author)

Thanks, it works pretty well!

@Sephirex-X (Author)

Hi, is it possible to provide your result's JSON file? I'm not sure the answer matching is working properly. Thanks a lot!

@Sephirex-X Sephirex-X reopened this Oct 23, 2024
@yuhui-zh15 (Owner)

yuhui-zh15 commented Oct 24, 2024

Hi, what do you mean by "the matching is not working properly"?

@Sephirex-X (Author)

Sephirex-X commented Oct 25, 2024

> Hi, what do you mean by "the matching is not working properly"?

When I use "training_analysis/llava/eval_imagewikiqa.ipynb" to extract answers from "text", it sometimes produces an incorrect one. For example:

        parse_multi_choice_response(
            "assistant\nB",
            ["A", "B", "C", "D"],
            {"A": "aaaaa", "B": "bbbbb", "C": "ccccc", "D": "ddddd"},
        )

returns 'A' instead of 'B'. Therefore, the accuracy results may not be correct for other VLMs.
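A stricter matcher avoids this failure mode. Below is a hypothetical sketch (not the repository's parse_multi_choice_response) that strips a leading role tag and then accepts only a standalone option letter, rather than the first option letter that appears anywhere in the text:

```python
import re

def parse_choice(response, options):
    """Hypothetical, stricter multiple-choice parser sketch: match a
    standalone option letter (e.g. 'B', '(B)', 'B.') so that a response
    like 'assistant\\nB' maps to 'B' rather than to the first option
    whose letter happens to occur somewhere in the string."""
    # Strip a leading role tag such as "assistant\n" if present.
    text = re.sub(r"^\s*assistant\s*[:\n]?\s*", "", response,
                  flags=re.IGNORECASE)
    for opt in options:
        # Option letter must be delimited, not embedded in a word.
        if re.search(rf"(^|[\s(\[]){re.escape(opt)}([.)\]\s]|$)", text):
            return opt
    return None

print(parse_choice("assistant\nB", ["A", "B", "C", "D"]))      # B
print(parse_choice("The answer is (C).", ["A", "B", "C", "D"]))  # C
```

Matching the option's full text (e.g. "bbbbb") as a fallback would make this more robust still, but even the delimiter check above fixes the 'assistant\nB' case.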
