Hi,

I've noticed that the `limit` passed to the evaluate method doesn't randomly select from the dataset, as I would expect, but simply takes the first n samples, where n is the limit.

I understand that this works better with cached results, but it does introduce a sampling bias when calculating results. Although the guidance is to use `limit` only for testing, it is very useful for datasets with a large number of samples (e.g. MMLU).

Would it be possible to pass an additional parameter that allows for random sampling instead of the first N? Alternatively, could a HuggingFace dataset be passed directly to the evaluate method?
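For reference, here is a minimal sketch of the workaround I have in mind: pre-sampling a random, seeded subset with the HuggingFace `datasets` library and evaluating on that instead of relying on `limit`. The `evaluate_fn` call is hypothetical and stands in for whatever evaluate method the framework exposes; the dataset name and size are just illustrative.

```python
# Sketch of a workaround: build a reproducible random subset up front,
# instead of letting `limit` take the first N rows.
from datasets import load_dataset


def random_subset(dataset, n, seed=42):
    """Return a seeded random subset of `n` rows rather than the first n."""
    shuffled = dataset.shuffle(seed=seed)  # deterministic shuffle
    return shuffled.select(range(min(n, len(shuffled))))


# Example: sample 500 MMLU questions at random rather than the first 500.
mmlu = load_dataset("cais/mmlu", "all", split="test")
subset = random_subset(mmlu, 500)
# evaluate_fn(subset)  # hypothetical: pass the pre-sampled subset to evaluate
```

A `seed` (or `random_sample=True`) parameter on the evaluate method itself would make this unnecessary and keep results reproducible across runs.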