Webdataset updates #75

djghosh13 · 2023-01-31T22:33:51Z

(Mostly) addresses issues #52 and #67

Support for conversion and evaluation of retrieval datasets (clip_benchmark_export_wds --retrieval)
More complete API for converting other classification/retrieval datasets, via import clip_benchmark.webdataset_builder
(Not in commit) All VTAB+ and retrieval datasets have been uploaded to HF
READMEs have been updated accordingly; the default suggestion in benchmark/README.md is now to use WebDataset
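For readers unfamiliar with the format: a WebDataset shard is an ordinary tar file in which all files sharing a key (the name before the first dot) form one sample, and the extension names the field. The sketch below writes such a shard with only the standard library. The helper name write_shard, the keys, the field extensions (webp, cls, txt) and the newline-joined captions are assumptions for illustration, not necessarily the layout that clip_benchmark.webdataset_builder produces.

```python
import io
import os
import tarfile
import tempfile


def write_shard(path, samples):
    """Write (key, {ext: bytes}) samples into one tar shard (hypothetical helper).

    WebDataset groups consecutive tar members that share the same key
    (the file name before the first dot) into a single sample.
    """
    with tarfile.open(path, "w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


# Placeholder samples: one classification-style, one retrieval-style (assumed fields).
samples = [
    ("s0000000", {"webp": b"<image bytes>", "cls": b"3"}),
    ("s0000001", {"webp": b"<image bytes>", "txt": b"a photo of a cat\na small grey cat"}),
]

with tempfile.TemporaryDirectory() as tmp:
    shard = os.path.join(tmp, "test-000000.tar")
    write_shard(shard, samples)
    with tarfile.open(shard) as tar:
        names = tar.getnames()

print(names)
# ['s0000000.webp', 's0000000.cls', 's0000001.webp', 's0000001.txt']
```

Keeping every field of a sample adjacent in the tar is what lets a streaming reader assemble samples without an index.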

Not completed:

Multilingual support, i.e., overriding default classnames & templates with other languages
Converting voc2007_multilabel

djghosh13 · 2023-01-31T22:41:34Z

Note: I've only tested the benchmark code with a single model, so I haven't run the complete experiments. There are some minor differences in numbers from what @rom1504 gave me, but no differences from what I get with the original datasets when I run them myself.

rom1504 · 2023-02-01T00:16:56Z

Nice, will check it out

mehdidc · 2023-02-01T09:50:20Z

Really cool, thanks @djghosh13! For differences in numbers, might be related to this issue #59

djghosh13 · 2023-02-01T18:18:57Z

I see, yeah, I think the differences I saw were also in the 0.001s range.

rom1504 · 2023-02-02T22:56:45Z

Are datasets getting cached locally? Where and is it tweakable ?

djghosh13 · 2023-02-02T23:17:54Z

I forgot to add this to the README. By default, no, but there is a new --wds_cache_dir parameter in the CLI, which is passed directly to WebDataset(cache_dir=...) if a path is given.
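The opt-in behaviour described above (no cache unless a path is given) can be sketched with argparse. Only the flag name --wds_cache_dir comes from this PR; the helper functions parse_args and webdataset_kwargs are hypothetical and are not taken from the clip_benchmark source.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Only this flag is from the PR; the rest of the benchmark CLI is omitted.
    parser.add_argument(
        "--wds_cache_dir",
        default=None,
        help="Optional directory in which downloaded .tar shards are cached",
    )
    return parser.parse_args(argv)


def webdataset_kwargs(args):
    """Forward cache_dir only when a path was given, so caching stays off by default."""
    kwargs = {}
    if args.wds_cache_dir is not None:
        kwargs["cache_dir"] = args.wds_cache_dir
    return kwargs


# Intended use (assumed): webdataset.WebDataset(urls, **webdataset_kwargs(args))
print(webdataset_kwargs(parse_args([])))
print(webdataset_kwargs(parse_args(["--wds_cache_dir", "/tmp/wds"])))
```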

djghosh13 · 2023-02-02T23:31:07Z

I hadn't actually tested it before, but it looks like it will save the .tar files inside the specified cache directory in a subdirectory that's named like datasets_clip-benchmark_wds_vtab-cifar10_resolve_main_test (for example)
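The subdirectory name above is consistent with taking the directory part of the shard URL's path, dropping the leading slash, and replacing path separators with underscores. The sketch below reproduces that naming under this assumption; the shard URL and the function name cache_subdir are hypothetical examples, not the library's actual code.

```python
import re
from urllib.parse import urlparse


def cache_subdir(url):
    """Flatten the directory part of a shard URL into one cache folder name (assumed rule)."""
    dirname = urlparse(url).path.rsplit("/", 1)[0]  # drop the shard file name
    return re.sub(r"[/:|;]", "_", dirname.lstrip("/"))  # separators -> underscores


# Hypothetical shard URL for the vtab/cifar10 test split.
url = "https://huggingface.co/datasets/clip-benchmark/wds_vtab-cifar10/resolve/main/test/0.tar"
print(cache_subdir(url))
# datasets_clip-benchmark_wds_vtab-cifar10_resolve_main_test
```

Flattening the path keeps every split of every dataset in its own folder directly under the cache directory.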

rom1504 · 2023-02-02T23:36:52Z

ok if it doesn't by default, it's good

rom1504 · 2023-02-02T23:37:05Z

I'll test this

rom1504 · 2023-02-03T00:09:21Z

ok so one thing here
not caused by your PR, but I really think we should put the main eval command at the beginning of the README, not hidden in the benchmark/ folder

rom1504 · 2023-02-03T00:25:02Z

#76 a minor point, but quite nice for UX

rom1504 · 2023-02-03T00:32:52Z

yeah this is much faster than the file based option

will run to the end and compare numbers, if all good will merge
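Comparing a full run against earlier numbers, given that differences in the 0.001 range were expected, amounts to a per-dataset check against a tolerance. This is a minimal sketch; the function name compare_metrics, the 1e-3 default, and the dataset names and values in the example are made up for illustration and are not results from this PR.

```python
def compare_metrics(reference, candidate, tol=1e-3):
    """Return {dataset: (reference, candidate)} for entries differing by more than tol.

    Datasets missing from the candidate results are reported with None.
    """
    mismatches = {}
    for name, ref_value in reference.items():
        new_value = candidate.get(name)
        if new_value is None or abs(new_value - ref_value) > tol:
            mismatches[name] = (ref_value, new_value)
    return mismatches


# Made-up values, for illustration only.
reference = {"dataset_a": 0.900, "dataset_b": 0.800, "dataset_c": 0.700}
candidate = {"dataset_a": 0.9005, "dataset_b": 0.790}
print(compare_metrics(reference, candidate))
# {'dataset_b': (0.8, 0.79), 'dataset_c': (0.7, None)}
```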

rom1504 · 2023-02-03T22:46:54Z

ok it does work, let's go

djghosh13 added 7 commits January 13, 2023 22:39

WIP better wds creation and loading

0d88674

Added all datasets except multilabel VOC

cccd1a1

Update README, and use vtab/ instead of vtab-

46f3545

Merge branch 'main' of github.com:LAION-AI/CLIP_benchmark

feb8962

Updated main README and docstring

215996f

Removed extra test script

2e80b89

Updated tests with new CLI argument

8ba0619

rom1504 merged commit 2de524d into LAION-AI:main Feb 3, 2023
