Fix : Small fixes in DPO trainer args in DPO notebook #120

ash-01xor · 2024-12-18T14:41:15Z

Changes Made

Small fixes to the parameters passed to the DPO trainer. While training the SmolLM instruct model on the "trl-lib/ultrafeedback_binarized" dataset, the following arguments, which were present in the notebook,

beta=0.1,
# Maximum length of the input prompt in tokens
max_prompt_length=1024,
# Maximum combined length of prompt + response in tokens
max_length=1536

resulted in unexpected keyword argument errors.

I felt it would be better to let users set these based on their needs and the dataset used, rather than having these arguments present by default and causing errors.
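For anyone hitting the same error, here is a minimal sketch of one way to check which keyword arguments a constructor accepts before calling it. It uses only the standard library. `make_trainer` is a hypothetical stand-in with an assumed signature, not the real TRL trainer, and `split_supported_kwargs` is an illustrative helper name rather than part of any library.

```python
import inspect


def split_supported_kwargs(func, kwargs):
    """Split kwargs into those func accepts by name and those it does not."""
    params = inspect.signature(func).parameters
    # If func declares **kwargs, any keyword name is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in keyword_kinds
    }
    rejected = {name: value for name, value in kwargs.items() if name not in accepted}
    return accepted, rejected


# Hypothetical stand-in for a trainer constructor (assumed signature,
# NOT the real TRL API). It does not accept beta or the length arguments.
def make_trainer(model, args=None, train_dataset=None):
    return {"model": model, "args": args, "train_dataset": train_dataset}


candidate_kwargs = {
    "train_dataset": "dataset-placeholder",
    "beta": 0.1,
    "max_prompt_length": 1024,
    "max_length": 1536,
}

accepted, rejected = split_supported_kwargs(make_trainer, candidate_kwargs)
print("accepted:", sorted(accepted))
print("rejected:", sorted(rejected))
```

With a real trainer class, the same helper could be pointed at the class itself to see which of the notebook's arguments the installed version would reject, though where those arguments belong depends on the library version in use.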

ash-01xor · 2024-12-20T12:55:08Z

@burtenshaw can you take a look at this?

Fix errors in DPO trainer args

65ca2eb

ash-01xor changed the title ~~Fix : Small fix errors in DPO trainer args in DPO notebook~~ Fix : Small fixes in DPO trainer args in DPO notebook Dec 18, 2024
