We recommend using Anaconda to manage the required packages. You will also need the NLTK Punkt tokenizer data:
import nltk
nltk.download('punkt')  # fetches the Punkt tokenizer data used for caption preprocessing
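To confirm the Punkt data is in place, a quick sanity check (the sample sentence is arbitrary):

```python
import nltk

# word_tokenize relies on the Punkt data downloaded above;
# this should print the sentence split into individual tokens.
print(nltk.word_tokenize("A man is riding a horse on the beach."))
```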
Download the dataset files and pre-trained models. We use the data splits produced by Andrej Karpathy. To train with full image encoders, download the images from their original sources here, here, and here.
wget http://lsa.pucrs.br/jonatas/seam-data/irv2_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/resnet152_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/vocab.tar.gz
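The archives can then be unpacked with tar, or, to keep everything in Python, with a short script like the sketch below (extracting into a local data/ directory is an assumption; use whatever location you later pass as $DATA_PATH):

```python
import os
import tarfile

# Unpack each downloaded archive into ./data (assumed layout; adjust as needed).
os.makedirs("data", exist_ok=True)
for archive in ["irv2_precomp.tar.gz", "resnet152_precomp.tar.gz", "vocab.tar.gz"]:
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(path="data")
```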
** The pre-trained models are not available yet.
Run train.py:
python train.py --data_name resnet152_precomp --logger_name runs/model --text_encoder gru --max_violation --lr_update 10 --learning_rate 1e-4 --resume models/txt_enc.tar --resume2 models/txt_enc_epoch_600.pth
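The checkpoints passed to --resume are ordinary PyTorch files, so they can be inspected before resuming. A minimal sketch, assuming the checkpoint was written with torch.save (the key names are assumptions and depend on what train.py actually stores):

```python
import torch

# Load the checkpoint on CPU so no GPU is required just to inspect it.
checkpoint = torch.load("models/txt_enc_epoch_600.pth", map_location="cpu")

# The exact keys (e.g. 'model', 'epoch', 'opt') depend on how train.py saves them.
print(list(checkpoint.keys()))
```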
To evaluate a trained model, run:
from vocab import Vocabulary
import evaluation
evaluation.evalrank("$RUN_PATH/model_best.pth.tar", data_path="$DATA_PATH", split="test", fold5=True)
To do cross-validation on MSCOCO, pass fold5=True with a model trained using --data_name coco.
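In the snippet above, $RUN_PATH and $DATA_PATH are shell-style placeholders; in an actual Python session, substitute concrete paths. A minimal sketch with illustrative values (the run directory matches the --logger_name used for training, and data/ is where the archives above were extracted):

```python
from vocab import Vocabulary
import evaluation

# Illustrative paths: point these at your own run directory and data root.
evaluation.evalrank("runs/model/model_best.pth.tar",
                    data_path="data",
                    split="test",
                    fold5=True)
```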
If you find this code useful, please cite the following papers:
@inproceedings{wehrmann2018fast,
  title={Fast Self-Attentive Multimodal Retrieval},
  author={Wehrmann, Jônatas and Armani, Maurício and More, Martin D. and Barros, Rodrigo C.},
  booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2018}
}
@article{faghri2017vse++,
  title={VSE++: Improved Visual-Semantic Embeddings},
  author={Faghri, Fartash and Fleet, David J and Kiros, Jamie Ryan and Fidler, Sanja},
  journal={arXiv preprint arXiv:1707.05612},
  year={2017}
}