-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replacing weight with multiplier #105
base: main
Are you sure you want to change the base?
Conversation
TODO: verify num_epochs # of steps matches what we expect. |
set to 1 epoch with 3000 total samples. |
With this change, we can no longer use non generic datasets (with or without epoch), because the multiplier requires a length associated with the dataset, which only generic datasets allow you to configure. I'll keep this on the sideline until #111 is merged in. |
@@ -1142,10 +1143,12 @@ def __init__( | |||
self._static = static | |||
|
|||
self._stop_strategy = stop_strategy | |||
relative_frequencies = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think life might be easier if we didn't try to read the multiplier
value out of each dataset class, and instead read it from a config passed to the interleave class, similar to how this is done with probabilities
in HF's interleave_datasets
: https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.interleave_datasets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we wouldn't have to worry about the interaction with len
, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we'd need to calculate the probabilities by hand in the config? I think we'd need len either way because that's how hf determines the # of steps to take in epoch mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the weights are being set by hand already, right? It just seems a bit strange to have to read the weight/multiplier property from the dataset class, especially when the dataset class doesn't use it internally.
To make running in epoch mode easier, we redefine weights to effectively be
In the interleave class, I redefine the probability of select the next sample using the dataset length and weight to reflect the above (only in epoch mode)