Evaluate multiple models/datasets/languages using the CLI directly (#56)
* Evaluate multiple models/datasets/languages using the CLI directly

* minor fix: forgot import

* output file template: make sure the `pretrained` and `dataset` parts do not contain a "/"

* add multilingual_mscoco_captions to get_dataset_default_task (was missing)

* minor

* update README

* instructions on how to run the benchmark and build the CSV. Remove run.sh, no longer needed.

* support and use multilingual openclip model collection

* Support skipping evaluations that are already done

* document skip_existing on README

* fix pretrained name for g-14

* add build_csv.py into the main CLI

* add missing args to the test

* update README

* update README

* update README
mehdidc authored Dec 26, 2022
1 parent 06f26c8 commit 79d28fa
Showing 8 changed files with 368 additions and 137 deletions.
84 changes: 59 additions & 25 deletions README.md
@@ -44,7 +44,9 @@ the results are written into a JSON file.

Here is an example for CIFAR-10 zero-shot classification using OpenCLIP's pre-trained model on LAION-400m:

`clip_benchmark eval --dataset=cifar10 --task=zeroshot_classification --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --output=result.json --batch_size=64`

The dataset is downloaded into the directory given by `--dataset_root`, which defaults to `root`.
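
For instance, to store the datasets under a custom directory instead (the path below is only an illustration):

```bash
clip_benchmark eval --dataset=cifar10 --task=zeroshot_classification \
    --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu \
    --dataset_root=/data/clip_benchmark_datasets \
    --output=result.json --batch_size=64
```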

Here is the content of `result.json` after the evaluation is done:

@@ -56,11 +58,12 @@ Here is the content of `result.json` after the evaluation is done:
}
```


### VOC2007 example

Here is another example with VOC2007, which is a multi-label classification dataset.

`clip_benchmark eval --dataset=voc2007_multilabel --task=zeroshot_classification --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --output=result.json --batch_size=64`

Here is the content of `result.json` after the evaluation is done:

@@ -77,21 +80,16 @@ First, you need to install VTAB's dedicated package.

`pip install task_adaptation==0.1`

The name of the dataset follows the template `vtab/<TASK_NAME>`.
To get the list of the 19 classification tasks used in VTAB, you can run:

`python -c 'from clip_benchmark.datasets.builder import VTAB_19TASKS;print("\n".join(VTAB_19TASKS))'`


Then, you can run it by providing the full dataset name.
Example with `eurosat`:

`clip_benchmark eval --dataset=vtab/eurosat --task=zeroshot_classification --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --output=result.json --batch_size=64`

See [clip_benchmark/datasets/builder.py#L634](clip_benchmark/datasets/builder.py#L634) for the full list of datasets in the VTAB collection.


### TensorFlow dataset example

Here is an example of how to run the benchmark on [TensorFlow datasets](https://www.tensorflow.org/datasets).
First, you need to install `tfds-nightly` and `timm`.
@@ -103,22 +101,18 @@ The name of the dataset follows the template `tfds/<DATASET_NAME>`.

Example with `cifar10`:

`clip_benchmark eval --dataset=tfds/cifar10 --task=zeroshot_classification --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --output=result.json --batch_size=64`


### COCO captions example

Here is an example for COCO captions zero-shot retrieval:

`clip_benchmark eval --dataset=mscoco_captions --task=zeroshot_retrieval --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --output=result.json --batch_size=64`

Note that for using COCO, you also need to install `pycocotools` (e.g., using `pip install pycocotools`).


### Webdataset example

Here is an example of how to run the benchmark on [webdataset](https://github.com/webdataset/webdataset).
First, you need to install `webdataset`.
@@ -162,17 +156,57 @@ The name of the dataset follows the template `wds/<DATASET_NAME>`.
Example with `cifar10`:

```
$ clip_benchmark eval --dataset wds/cifar10 --dataset_root ROOT_DIR/wds_cifar10/
$ clip_benchmark eval --dataset wds/cifar10 --dataset_root https://huggingface.co/datasets/djghosh/wds_cifar10_test/tree/main
```

All other arguments remain the same as in the other examples.
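
For instance, a full invocation might combine the webdataset source with the usual evaluation flags (a sketch reusing the model and task from the examples above):

```bash
clip_benchmark eval --dataset wds/cifar10 \
    --dataset_root https://huggingface.co/datasets/djghosh/wds_cifar10_test/tree/main \
    --task zeroshot_classification \
    --pretrained laion400m_e32 --model ViT-B-32-quickgelu \
    --output result.json --batch_size 64
```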

## Evaluate multiple models on multiple datasets

For the purpose of benchmarking, it is possible to run the CLI with multiple
pre-trained models on multiple datasets.


### Pretrained models and datasets list as arguments

For models, we can provide a list of pretrained model names in the form 'model,pretrained' (`model` and `pretrained` separated by a comma). For datasets, we can provide a list of dataset names, and for languages, a list of language codes.
Example:

```bash
clip_benchmark eval --pretrained_model ViT-B-32-quickgelu,laion400m_e32 ViT-L-14,laion400m_e32 \
--dataset cifar10 cifar100 --dataset_root "clip_benchmark_datasets/{dataset}" --language en jp \
--verbose --output "{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

Note that `--dataset_root` and `--output` can now be templates that depend on the dataset/model/language/task (for `--output`) and on the dataset name (for `--dataset_root`).

If the benchmark fails at some point, it is possible to resume it, skipping the already-evaluated runs, by passing `--skip_existing`.
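
For example, re-running the same sweep with `--skip_existing` appended only evaluates the combinations whose results are not already present (a sketch based on the example above):

```bash
clip_benchmark eval --pretrained_model ViT-B-32-quickgelu,laion400m_e32 ViT-L-14,laion400m_e32 \
    --dataset cifar10 cifar100 --dataset_root "clip_benchmark_datasets/{dataset}" --language en jp \
    --verbose --output "{dataset}_{pretrained}_{model}_{language}_{task}.json" --skip_existing
```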

### Pretrained models and datasets list as files

We can also provide paths to files containing the models (one 'model,pretrained' pair per line, with `model` and `pretrained` separated by a comma) and the datasets (one dataset per line):

```bash
clip_benchmark eval --pretrained_model benchmark/models.txt \
--dataset benchmark/datasets.txt --dataset_root "clip_benchmark_datasets/{dataset}" \
--verbose --output "{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

Examples are available in [benchmark/datasets.txt](benchmark/datasets.txt) and [benchmark/models.txt](benchmark/models.txt).
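
For illustration, such files might look as follows (these contents are hypothetical, reusing names from the examples above):

```bash
# Hypothetical benchmark/models.txt: one 'model,pretrained' pair per line.
cat > benchmark/models.txt <<'EOF'
ViT-B-32-quickgelu,laion400m_e32
ViT-L-14,laion400m_e32
EOF

# Hypothetical benchmark/datasets.txt: one dataset per line.
cat > benchmark/datasets.txt <<'EOF'
cifar10
cifar100
vtab/eurosat
EOF
```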

### Model and dataset collections

We can also provide model collection names (`openai`, `openclip_base`, `openclip_multilingual`, `openclip_full` are supported) or dataset collection names (`vtab`, `vtab+`, `retrieval`, `imagenet_robustness` are supported):

```bash
clip_benchmark eval --pretrained_model openai openclip_base --dataset vtab+ retrieval \
--dataset_root "clip_benchmark_datasets/{dataset}" --verbose \
--output "{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

See [clip_benchmark/models.py#L6](clip_benchmark/models.py#L6) and [clip_benchmark/datasets/builder.py#L634](clip_benchmark/datasets/builder.py#L634) for more information
about the collections.

## Credits

39 changes: 35 additions & 4 deletions benchmark/README.md
Original file line number Diff line number Diff line change
@@ -5,10 +5,41 @@ You can visualize the results in the [notebook](results.ipynb)

# How to reproduce the CLIP benchmark results

## VTAB+ and retrieval datasets (MS-COCO, Flickr30k, Flickr8k)

```bash
clip_benchmark eval --pretrained_model openai openclip_base --dataset vtab+ retrieval \
--dataset_root "clip_benchmark_datasets/{dataset}" \
--output "vtab_plus_and_retrieval_{dataset}_{pretrained}_{model}_{language}_{task}.json"
```
(Change `--dataset_root` accordingly)

Once the evaluation finishes, you can construct a CSV with all the results:

```bash
clip_benchmark build vtab_plus_and_retrieval*.json --output=benchmark.csv
```

## Multilingual ImageNet benchmark

To run the multilingual ImageNet benchmark, use:

```bash
clip_benchmark eval --pretrained_model openclip_multilingual openclip_base openai --dataset imagenet1k --language cn it jp en \
--dataset_root "clip_benchmark_datasets/{dataset}" \
--output "multilingual_{dataset}_{pretrained}_{model}_{language}_{task}.json"
```
(Change `--dataset_root` accordingly)

## Multilingual MS-COCO benchmark

To run the multilingual MS-COCO benchmark, use:

```bash
clip_benchmark eval --pretrained_model openclip_multilingual openclip_base openai --dataset multilingual_mscoco_captions --language es it ko pl ru tr zh en \
--dataset_root "clip_benchmark_datasets/{dataset}" \
--output "multilingual_{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

(Change `--dataset_root` accordingly)
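
As with the VTAB+ run, the per-run JSON files can then be aggregated into a single CSV (the output filename below is only an illustration):

```bash
clip_benchmark build multilingual_*.json --output=benchmark_multilingual.csv
```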
14 changes: 0 additions & 14 deletions benchmark/build_csv.py

This file was deleted.

33 changes: 0 additions & 33 deletions benchmark/run.sh

This file was deleted.

