LoRA for Reward Model Training #225

bugsz · 2024-07-02T01:35:12Z

Is your feature request related to a problem? Please describe.
Hi! I am trying to fine-tune a reward model the way HelpSteer2 did, but I run into an out-of-memory (OOM) error.

I then found that LoRA is supported for SFT but not for reward model training. Would it be possible to support LoRA for reward model training as well? I think it should be feasible, given that the reward model is built on top of the base model.
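To give a rough sense of how much LoRA could shrink the trainable part of a reward model, here is a minimal sketch that just counts parameters. It assumes LLaMA 2-7B dimensions (hidden size 4096, 32 layers); the rank of 8, the choice to adapt only the query and value projections, and the bias-free scalar reward head are illustrative assumptions, not settings taken from any particular library. The function names are hypothetical.

```python
def lora_trainable_params(hidden_size, num_layers, rank, adapted_per_layer=2):
    """Count LoRA parameters when adapting square (hidden x hidden) projections."""
    # Each adapted weight gets two low-rank factors:
    # A with shape (rank x hidden) and B with shape (hidden x rank).
    per_matrix = rank * (hidden_size + hidden_size)
    return per_matrix * adapted_per_layer * num_layers


def reward_head_params(hidden_size):
    """Scalar reward head: one linear map hidden -> 1, no bias (assumption)."""
    return hidden_size


HIDDEN = 4096   # LLaMA 2-7B hidden size
LAYERS = 32     # LLaMA 2-7B transformer layers
RANK = 8        # assumed LoRA rank

lora = lora_trainable_params(HIDDEN, LAYERS, RANK)
head = reward_head_params(HIDDEN)
total_trainable = lora + head

print(f"LoRA params:      {lora:,}")
print(f"Reward head:      {head:,}")
print(f"Total trainable:  {total_trainable:,}")
```

Under these assumptions only a few million parameters need gradients and optimizer state, while the frozen base weights still have to fit in memory.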

Also, I am using the vMem estimation here, which states that a full fine-tune of a LLaMA 2-7B model in float16 takes roughly 60 GB. However, when I tried it on 2×A6000 with 48 GB of vMem each, I got an OOM error. Does anyone have an accurate estimate of the memory usage for different model sizes?
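For reference, a back-of-the-envelope version of this estimate can be sketched as below. It assumes an Adam-style optimizer and counts only weights, gradients, and optimizer state; activations and framework overhead are excluded, and the 7B parameter count is rounded. The two byte-per-parameter layouts are common conventions rather than what any specific trainer is confirmed to use, and the helper name is hypothetical.

```python
def training_state_gb(num_params, bytes_per_param):
    """Memory for weights + gradients + optimizer state, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 7_000_000_000  # rounded parameter count for a 7B model

# Everything kept in fp16: weights (2) + grads (2) + Adam m (2) + Adam v (2).
PURE_FP16_BYTES = 2 + 2 + 2 + 2
# Mixed precision: fp16 weights (2) + fp16 grads (2)
# + fp32 master weights (4) + fp32 Adam m (4) + fp32 Adam v (4).
MIXED_PRECISION_BYTES = 2 + 2 + 4 + 4 + 4

pure_fp16_gb = training_state_gb(NUM_PARAMS, PURE_FP16_BYTES)
mixed_gb = training_state_gb(NUM_PARAMS, MIXED_PRECISION_BYTES)

GPU_MEMORY_GB = 48  # one A6000
NUM_GPUS = 2

# Plain data parallelism keeps a full replica on every GPU,
# so the per-GPU requirement is not divided by the GPU count.
fits_replicated = pure_fp16_gb <= GPU_MEMORY_GB
# If the training state were evenly sharded across GPUs (assumption).
fits_sharded = pure_fp16_gb / NUM_GPUS <= GPU_MEMORY_GB

print(f"pure fp16 state:       {pure_fp16_gb:.0f} GB")
print(f"mixed precision state: {mixed_gb:.0f} GB")
print(f"fits on one 48 GB GPU (replicated): {fits_replicated}")
print(f"fits when sharded over 2 GPUs:      {fits_sharded}")
```

If this accounting is roughly right, a figure near 60 GB matches the all-fp16 case, and it would exceed a single 48 GB card whenever each GPU holds a full copy of the training state. Whether that is what happens in this setup depends on the parallelism configuration, which the post does not specify.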
