put all datasets in wds #52
Comments
Leaving a note here to also add an option to download (cache) the data when loading from HF. Mainly, I need to add another CLI option/parameter somewhere, since both the source URL (currently specified by --dataset_root) and the destination download path need to be specified.
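For illustration, a minimal sketch of what the extra CLI parameter could look like. Only --dataset_root comes from this thread; the --wds_cache_dir flag name and the resolve_locations helper are hypothetical, and this is not the project's actual CLI.

```python
import argparse


def build_parser():
    """Build a parser with separate source and download-destination options."""
    parser = argparse.ArgumentParser(description="Sketch: wds source vs. cache path")
    # Existing option mentioned in the thread: where the data is read from.
    parser.add_argument(
        "--dataset_root",
        required=True,
        help="Source URL or local path of the dataset",
    )
    # Hypothetical option (name is an assumption): where downloads are cached.
    parser.add_argument(
        "--wds_cache_dir",
        default=None,
        help="Local directory to cache downloaded shards; no caching if unset",
    )
    return parser


def resolve_locations(argv):
    """Return (source, cache_dir) parsed from a list of CLI arguments."""
    args = build_parser().parse_args(argv)
    return args.dataset_root, args.wds_cache_dir
```

Keeping the cache path optional means existing invocations that only pass --dataset_root would keep working unchanged.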
Another related note: we should add some form of support for classnames and templates in different languages, to avoid needing to duplicate the whole dataset for multilingual eval.
@djghosh13 it would definitely be great if you could take on this issue, if you're still interested.
+1 for wds! I tried HF datasets for images before, but somehow didn't like it that much. Maybe because Arrow wasn't very flexible / intuitive for images.
Definitely still interested! What are your thoughts on the implementation of this point?
Everything else should be straightforward once I actually get around to it.
We already have support for that, check the code. Currently, the approach taken is to put the prompts and classnames for other languages directly in this repo.
Hm, yeah. I guess the question is: do we want to use the same procedure for wds? Currently, the wds loader expects a classnames.txt and zeroshot_classification_templates.txt in the same folder/HF repo as the data.
I think it makes sense to put the same content in HF/wds as in the original source, i.e. the English classnames and prompts for most datasets. You can think of the files in this repo as an override. The reasoning is that we cannot add more things to the original source, so doing it this way (rather than adding more languages to all the wds datasets) keeps the source of truth in a single place (the clip benchmark repo) for both the original-source and wds formats.
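A rough sketch of the override idea, assuming the repo-side translations are available as a plain dict keyed by (dataset, language). The file names classnames.txt and zeroshot_classification_templates.txt come from this thread; the load_text_metadata and read_lines helpers and the dict layout are hypothetical, not the repo's actual code.

```python
from pathlib import Path


def read_lines(path):
    """Read a text file and return its non-empty lines, stripped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_text_metadata(wds_dir, overrides=None, dataset=None, language="en"):
    """Load classnames/templates from the wds folder, then apply repo overrides.

    `overrides` is a hypothetical mapping of (dataset, language) to a dict
    that may contain "classnames" and/or "templates".
    """
    wds_dir = Path(wds_dir)
    # Defaults shipped next to the data (English for most datasets).
    default = {
        "classnames": read_lines(wds_dir / "classnames.txt"),
        "templates": read_lines(wds_dir / "zeroshot_classification_templates.txt"),
    }
    entry = (overrides or {}).get((dataset, language))
    if entry is None:
        return default
    # Repo-side entries win; anything they omit falls back to the wds files.
    return {
        "classnames": entry.get("classnames", default["classnames"]),
        "templates": entry.get("templates", default["templates"]),
    }
```

With this shape, adding a new language only touches the repo-side mapping, and the wds archives stay identical to the original source.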
done
follow up of #47