Will the MMBench test-set score drop after DPO? Does this repo support DPO training without loading a separate reward model?
Thanks for your interest! We have not tested on MMBench. I believe any change in performance there may depend on the vision-LLM and the preference data used for DPO training. Also, this repo supports DPO training without needing to load a reward model (please take a look at this script).
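DPO does not need a reward model because its loss is computed directly from sequence log-probabilities under the policy being trained and under a frozen reference model. The sketch below shows the standard DPO loss for a single preference pair. It is an illustration of the general technique, not the repo's script; the function names and the beta value are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; controls deviation from the reference
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each full response. The reference
    model's log-probs stand in for a learned reward model: the implicit
    reward of a response is beta * (policy_logp - ref_logp).
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(logits)


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy favors the chosen response more than the reference does: loss drops.
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Only two forward passes per response are needed (trainable policy and frozen reference), which is why no separate reward model has to be loaded.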