Hi,

I've noticed that the `limit` passed to the evaluate method doesn't randomly select from the dataset, as I would expect, but simply takes the first n samples, where n is the limit.

I understand that this works better with cached results, but it does introduce a sampling bias when calculating results. Although the guidance is to use `limit` only for testing, it is very useful for datasets with a large number of samples (e.g. MMLU).

Would it be possible to pass an additional parameter that allows for random sampling instead of the first N? Alternatively, could a HuggingFace dataset be passed directly to the evaluate method?
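For reference, here is a minimal sketch of the workaround I have in mind: pre-sampling a random, seeded subset with the HuggingFace `datasets` library and evaluating on that instead of relying on `limit`. The `evaluate_fn` call is hypothetical and stands in for whatever evaluate method the framework exposes; the dataset name and size are just illustrative.

```python
# Sketch of a workaround: build a reproducible random subset up front,
# instead of letting `limit` take the first N rows.
from datasets import load_dataset


def random_subset(dataset, n, seed=42):
    """Return a seeded random subset of `n` rows rather than the first n."""
    shuffled = dataset.shuffle(seed=seed)  # deterministic shuffle
    return shuffled.select(range(min(n, len(shuffled))))


# Example: sample 500 MMLU questions at random rather than the first 500.
mmlu = load_dataset("cais/mmlu", "all", split="test")
subset = random_subset(mmlu, 500)
# evaluate_fn(subset)  # hypothetical: pass the pre-sampled subset to evaluate
```

A `seed` (or `random_sample=True`) parameter on the evaluate method itself would make this unnecessary and keep results reproducible across runs.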