To install the dependencies, run:
```bash
pip install -r requirements.txt
```
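A quick way to confirm the environment is ready before launching any of the commands below is a short check like the following (a hypothetical snippet, assuming the standard PyTorch dependency of BLIP-style code; it is not part of the repo):

```python
# Hypothetical post-install check: confirm PyTorch from requirements.txt imports cleanly
# and report whether CUDA is visible (needed for the distributed commands below).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```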
- Download the Flickr30k dataset from the original website, and set 'image_root' in configs/retrieval_flickr.yaml accordingly (a small config check is sketched after this list).
- To evaluate the finetuned BLIP model on Flickr30k, run:
```bash
python -m torch.distributed.run --nproc_per_node=1 train_retrieval.py \
  --config ./configs/retrieval_flickr.yaml \
  --output_dir output/retrieval_flickr \
  --evaluate
```
- To finetune the pre-trained checkpoint, first set 'pretrained' in configs/retrieval_flickr.yaml to "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth". Then run:
```bash
python -m torch.distributed.run --nproc_per_node=1 train_retrieval.py \
  --config ./configs/retrieval_flickr.yaml \
  --output_dir output/retrieval_flickr
```
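Before launching either command, it can help to confirm that the config points at real paths. The following is a minimal sketch (not part of the repo) that assumes the 'image_root' and 'pretrained' keys mentioned above:

```python
# Minimal sketch: sanity-check configs/retrieval_flickr.yaml before training or evaluation.
# Assumes the 'image_root' and 'pretrained' keys described above; adjust if your config differs.
import os
import yaml  # install with `pip install pyyaml` if it is not already in requirements.txt

with open("configs/retrieval_flickr.yaml") as f:
    cfg = yaml.safe_load(f)

print("image_root:", cfg.get("image_root"))
print("image_root exists:", os.path.isdir(cfg.get("image_root") or ""))
print("pretrained:", cfg.get("pretrained"))  # checkpoint URL or local path
```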
- Prepare training json files, where each json file contains a list. Each item in the list is a dictionary with two key-value pairs: {'image': path_of_image, 'caption': text_of_image}. A short sketch for generating such a file appears after the pre-training command below.
- In configs/pretrain.yaml, set 'train_file' to the paths of the json files.
- To pre-train the model, run:
```bash
python pretrain.py --config ./configs/pretrain.yaml --output_dir output/Pretrain
```
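The expected annotation format is simple to generate with a few lines of Python. Below is a hedged sketch with made-up file names and captions, only to illustrate the {'image': ..., 'caption': ...} structure described above:

```python
# Sketch: write a training json in the expected format, i.e. a list of
# {'image': path_of_image, 'caption': text_of_image} dictionaries.
# The file names and captions here are placeholders, not real data.
import json

samples = [
    {"image": "/path/to/images/0001.jpg", "caption": "a dog running on the beach"},
    {"image": "/path/to/images/0002.jpg", "caption": "two people riding bicycles"},
]

with open("train_example.json", "w") as f:
    json.dump(samples, f)
# Then list "train_example.json" (and any other such files) under 'train_file' in configs/pretrain.yaml.
```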
The implementation here relies solely on the BLIP code from Salesforce, along with ALBEF, Hugging Face Transformers, and timm. We thank the original authors for open-sourcing their code.