
Reproduce the result in Tab.3 #5

Open
Sephirex-X opened this issue Oct 16, 2024 · 10 comments
@Sephirex-X

Hi, thanks for your great work! I have encountered trouble reproducing the results in Tab. 3.

I used your script 'training_analysis/llava/eval_imagewikiqa.sh' to do so. For LLaVA1.5-7B, the script is:

python model_vqa.py \
    --model-path /home/huggingface/liuhaotian/llava-v1.5-7b \
    --question-file /home/VLMClassifier/data/imagewikiqa.jsonl  \
    --image-folder /home/VLMClassifier/imagewikiqa \
    --answers-file outputs/llava-v1.5-7b.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

After running "training_analysis/llava/eval_imagewikiqa.ipynb" on "outputs/llava-v1.5-7b.jsonl", I got a mean accuracy of ~39, which is pretty close to the reported result (denoted as "LLaVA1.5-7B" on the right side of Tab. 3).

For "LLaVA1.5-7B Finetuned on ImageNet+LLaVA", I merged your projector weights named "imagenet_and_llava_mm_projector.bin" from https://drive.google.com/drive/folders/182yUoLPK9nzKThp6MlzvTSnNpDrP7oXb into "liuhaotian/llava-v1.5-7b" using the script "training_analysis/llava/process_model.ipynb". However, the result is around 42, much lower than the reported one.
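As I understand it, the merge performed by "process_model.ipynb" boils down to overwriting the base checkpoint's mm_projector entries with the finetuned ones. Here is a hypothetical pure-Python sketch of that step (strings stand in for tensors, and the key names and merge logic are my assumptions; the actual notebook may differ):

```python
# Hypothetical sketch of a projector-weight merge: overwrite the base
# model's mm_projector entries with finetuned ones. Strings stand in
# for tensors; real key names come from the actual checkpoints.

base_state = {
    "model.layers.0.self_attn.q_proj.weight": "base-llm-weight",
    "model.mm_projector.0.weight": "base-proj-w0",
    "model.mm_projector.2.weight": "base-proj-w2",
}
# In practice this would be loaded from imagenet_and_llava_mm_projector.bin
projector_state = {
    "model.mm_projector.0.weight": "finetuned-proj-w0",
    "model.mm_projector.2.weight": "finetuned-proj-w2",
}

merged = dict(base_state)
for key, value in projector_state.items():
    assert key in merged, f"unexpected projector key: {key}"
    merged[key] = value  # only the projector entries change
```

If the merge is done this way, all non-projector weights should remain bit-identical to the base checkpoint, which is worth verifying after running the notebook.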

Would you please let me know the correct way to reproduce your results?

@yuhui-zh15 (Owner)

Hi, did you enable these lines?

############## Enable This Part for ImageWikiQA ##############

@Sephirex-X (Author)

> Hi, did you enable these lines?
>
> ############## Enable This Part for ImageWikiQA ##############

I hadn't enabled it before, but after enabling it, it still didn't work; the accuracy only rose from 42.4 (disabled) to 43.7 (enabled).

@yuhui-zh15 (Owner)

        conv = conv_templates[args.conv_mode].copy()
        conv.append_message(conv.roles[0], qs)
        # conv.append_message(conv.roles[1], None)
        # prompt = conv.get_prompt()

        ############## Enable This Part for ImageWikiQA ##############
        conv.append_message(conv.roles[1], "Let's think step by step.")
        prompt = conv.get_prompt().replace("</s>", "")
        # print(prompt)

Does your code look like this?

@Sephirex-X (Author)

>         conv = conv_templates[args.conv_mode].copy()
>         conv.append_message(conv.roles[0], qs)
>         # conv.append_message(conv.roles[1], None)
>         # prompt = conv.get_prompt()
>
>         ############## Enable This Part for ImageWikiQA ##############
>         conv.append_message(conv.roles[1], "Let's think step by step.")
>         prompt = conv.get_prompt().replace("</s>", "")
>         # print(prompt)
>
> Does your code look like this?

Here is what my code looks like:

        conv = conv_templates[args.conv_mode].copy()
        conv.append_message(conv.roles[0], qs)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()

        ############## Enable This Part for ImageWikiQA ##############
        conv.append_message(conv.roles[1], "Let's think step by step.")
        prompt = conv.get_prompt().replace("</s>", "")
        print(prompt)

I'm not sure these two lines can have such a huge effect (49 vs. 43)?

        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()
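For what it's worth, the effect of leaving the assistant turn open versus seeding it can be illustrated with a minimal sketch. This is not LLaVA's actual conv_templates implementation; the separator handling below is my assumption, modeled on vicuna-style templates:

```python
# Minimal illustration (not LLaVA's actual conv_templates code) of how
# the two code paths yield different prompts under a vicuna-style template.

SEP, SEP2 = " ", "</s>"

def build_prompt(messages):
    """Join (role, message) pairs; a None message leaves the assistant
    turn open so the model continues right after 'ASSISTANT:'."""
    parts = []
    for i, (role, msg) in enumerate(messages):
        if msg is None:
            parts.append(f"{role}:")
        else:
            parts.append(f"{role}: {msg}{SEP if i % 2 == 0 else SEP2}")
    return "".join(parts)

qs = "What is shown in the image?"

# Original path: assistant turn left open.
open_prompt = build_prompt([("USER", qs), ("ASSISTANT", None)])

# ImageWikiQA path: seed the assistant turn with a CoT cue, strip "</s>".
cot_prompt = build_prompt(
    [("USER", qs), ("ASSISTANT", "Let's think step by step.")]
).replace("</s>", "")

print(open_prompt)  # ends with "ASSISTANT:"
print(cot_prompt)   # ends with "ASSISTANT: Let's think step by step."
```

The second prompt forces the model to continue a chain-of-thought answer rather than emit a bare option letter, which plausibly explains a sizable accuracy gap.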

@yuhui-zh15 (Owner)

Can you follow the code above and re-run it? LLMs are known to be sensitive to prompt format.

@Sephirex-X (Author)

> Can you follow the code above and re-run it? LLMs are known to be sensitive to prompt format.

Yeah, sure.

@Sephirex-X (Author)

Thanks, it works pretty well!

@Sephirex-X (Author)

Hi, is it possible to provide your result's JSON file? I'm not sure the answer matching is working properly. Thanks a lot!

@Sephirex-X Sephirex-X reopened this Oct 23, 2024
@yuhui-zh15 (Owner)

yuhui-zh15 commented Oct 24, 2024

Hi, what do you mean by "the matching is not working properly"?

@Sephirex-X (Author)

Sephirex-X commented Oct 25, 2024

> Hi, what do you mean by "the matching is not working properly"?

When I use "training_analysis/llava/eval_imagewikiqa.ipynb" to extract answers from "text", it sometimes produces an incorrect one. For example:

        parse_multi_choice_response(
            "assistant\nB",
            ["A", "B", "C", "D"],
            {"A": "aaaaa", "B": "bbbbb", "C": "ccccc", "D": "ddddd"},
        )

returns 'A' instead of 'B'. Therefore, the accuracy results may not be correct for other VLMs.
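A stricter matcher avoids this failure mode. Below is a hypothetical sketch (not the repository's parse_multi_choice_response) that strips a leading role tag and then accepts only a standalone option letter, rather than the first option letter that appears anywhere in the text:

```python
import re

def parse_choice(response, options):
    """Hypothetical, stricter multiple-choice parser sketch: match a
    standalone option letter (e.g. 'B', '(B)', 'B.') so that a response
    like 'assistant\\nB' maps to 'B' rather than to the first option
    whose letter happens to occur somewhere in the string."""
    # Strip a leading role tag such as "assistant\n" if present.
    text = re.sub(r"^\s*assistant\s*[:\n]?\s*", "", response,
                  flags=re.IGNORECASE)
    for opt in options:
        # Option letter must be delimited, not embedded in a word.
        if re.search(rf"(^|[\s(\[]){re.escape(opt)}([.)\]\s]|$)", text):
            return opt
    return None

print(parse_choice("assistant\nB", ["A", "B", "C", "D"]))      # B
print(parse_choice("The answer is (C).", ["A", "B", "C", "D"]))  # C
```

Matching the option's full text (e.g. "bbbbb") as a fallback would make this more robust still, but even the delimiter check above fixes the 'assistant\nB' case.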
