Note: The original code for our paper "Fast Self-attentive Multimodal Retrieval" is protected. To provide a public version, we forked the code from https://github.com/fartashf/vsepp/ and adapted it by adding the self-attentive mechanism along with the main proposed methods.
We recommend using Anaconda to install the following packages.
import nltk
nltk.download()
> d punkt
Download the dataset files and pre-trained models. We use splits produced by Andrej Karpathy. To use full image encoders, download the images from their original sources here, here and here.
wget http://lsa.pucrs.br/jonatas/seam-data/irv2_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/resnet152_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/vocab.tar.gz
We refer to the path of the extracted files for `*_precomp.tar.gz` as `$DATA_PATH` and to the path of the extracted files for `models.tar.gz` (models are coming up soon) as `$RUN_PATH`. Extract `vocab.tar.gz` to the `./vocab` directory.
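The extraction step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `extract_archive` and the destination directories in the comments are illustrative assumptions, not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Hypothetical usage (the "data" directory is an assumption; it would become $DATA_PATH):
# extract_archive("irv2_precomp.tar.gz", "data")
# extract_archive("resnet152_precomp.tar.gz", "data")
# extract_archive("vocab.tar.gz", "./vocab")
```

Inspect the returned member names before relying on the layout: if an archive already contains a top-level directory, extracting it into a directory of the same name nests the files one level deeper.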
Run `train.py`:
python train.py --data_path "$DATA_PATH" --data_name irv2_precomp --logger_name runs/seam-e/irv2_precomp/
Arguments used to train pre-trained models:
Method | Arguments |
---|---|
SEAM-E | --text_encoder seam-e --att_units 300 --att_hops 30 --att_coef 0.5 --measure order --use_abs |
SEAM-C | --text_encoder seam-c --att_units 300 --att_hops 10 --att_coef 0.5 --measure order --use_abs |
SEAM-G | --text_encoder seam-g --att_units 300 --att_hops 30 --att_coef 0.5 --measure order --use_abs |
Order | --text_encoder gru |
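To avoid retyping the flags, the table above can be turned into a small launcher. This is a sketch only: the flag strings are copied from the table, while the `METHOD_ARGS` dictionary, the `build_train_command` helper and the example paths are illustrative assumptions.

```python
import shlex

# Flags copied verbatim from the table above, keyed by method name.
METHOD_ARGS = {
    "SEAM-E": "--text_encoder seam-e --att_units 300 --att_hops 30 "
              "--att_coef 0.5 --measure order --use_abs",
    "SEAM-C": "--text_encoder seam-c --att_units 300 --att_hops 10 "
              "--att_coef 0.5 --measure order --use_abs",
    "SEAM-G": "--text_encoder seam-g --att_units 300 --att_hops 30 "
              "--att_coef 0.5 --measure order --use_abs",
    "Order": "--text_encoder gru",
}


def build_train_command(method, data_path, data_name, logger_name):
    """Return the train.py invocation for a method as an argument list."""
    base = [
        "python", "train.py",
        "--data_path", data_path,
        "--data_name", data_name,
        "--logger_name", logger_name,
    ]
    return base + shlex.split(METHOD_ARGS[method])


if __name__ == "__main__":
    # "/path/to/data" is a placeholder for $DATA_PATH.
    cmd = build_train_command(
        "SEAM-E", "/path/to/data", "irv2_precomp", "runs/seam-e/irv2_precomp/"
    )
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` to start training.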
Available text encoders:
- SEAM-E (`seam-e`): Self-attention directly over word-embeddings
- SEAM-C (`seam-c`): Self-attention over two parallel convolutional layers and over the word inputs.
- SEAM-G (`seam-g`): GRU + Self-attention
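As a rough illustration of the mechanism the three encoders share, a self-attention layer with `att_units` hidden units and `att_hops` attention hops can be sketched in NumPy as follows. This assumes the common structured self-attention form, A = softmax(W2 tanh(W1 Hᵀ)), and is not a copy of the code in this repository; the function name, the random weights and the input sizes are hypothetical.

```python
import numpy as np


def self_attention(H, W1, W2):
    """Structured self-attention over a sequence of feature vectors.

    H:  (seq_len, dim)         word-level features (embeddings, conv or GRU outputs)
    W1: (att_units, dim)       first projection
    W2: (att_hops, att_units)  one row per attention hop
    Returns (M, A) with M: (att_hops, dim) and A: (att_hops, seq_len).
    """
    scores = W2 @ np.tanh(W1 @ H.T)               # (att_hops, seq_len)
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # softmax over the words of the sentence
    M = A @ H                                     # one weighted sum of word features per hop
    return M, A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, dim, att_units, att_hops = 7, 16, 300, 30
    H = rng.standard_normal((seq_len, dim))
    W1 = rng.standard_normal((att_units, dim)) * 0.1
    W2 = rng.standard_normal((att_hops, att_units)) * 0.1
    M, A = self_attention(H, W1, W2)
    print(M.shape, A.shape)  # (30, 16) (30, 7)
```

Under this reading, the encoders differ only in what `H` contains: raw word embeddings for `seam-e`, convolutional features plus the word inputs for `seam-c`, and GRU hidden states for `seam-g`.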
Note that some default arguments in this repository are different from the original one:
--learning_rate .001 --margin .05
from vocab import Vocabulary
import evaluation
evaluation.evalrank("$RUN_PATH/model_best.pth.tar", data_path="$DATA_PATH", split="test")
To do cross-validation on MSCOCO, pass `fold5=True` with a model trained using `--data_name coco`.
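For intuition, the sketch below shows how a recall@K score can be computed and averaged over five folds. It is a simplified illustration and not the code in `evaluation.py`: it assumes one caption per image, a plain dot-product similarity instead of the `order` measure, and equally sized consecutive folds. The function names are hypothetical.

```python
import numpy as np


def recall_at_k(img_embs, cap_embs, k):
    """Fraction of images whose matching caption (same row index) ranks in the top k."""
    sims = img_embs @ cap_embs.T               # (n, n) image-to-caption similarities
    true_scores = np.diag(sims)[:, None]       # score of the correct caption per image
    ranks = (sims > true_scores).sum(axis=1)   # number of captions scored above the correct one
    return float((ranks < k).mean())


def five_fold_recall(img_embs, cap_embs, k, n_folds=5):
    """Average recall@k over consecutive, equally sized folds."""
    fold_size = len(img_embs) // n_folds
    scores = []
    for i in range(n_folds):
        fold = slice(i * fold_size, (i + 1) * fold_size)
        scores.append(recall_at_k(img_embs[fold], cap_embs[fold], k))
    return sum(scores) / n_folds
```

Each fold is ranked independently, so every query competes against fewer candidates than in a single ranking over the full test set.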
[Coming up soon] Results achieved using this repository (COCO-1cv test set):
Method | Features | R@1 | R@10 | R@1 | R@10 |
---|---|---|---|---|---|
SEAM-E | resnet152_precomp | | | | |
SEAM-C | resnet152_precomp | | | | |
SEAM-G | resnet152_precomp | | | | |
If you find this code useful, please cite the following papers:
@article{wehrmann2018fast,
title={Fast Self-Attentive Multimodal Retrieval},
author={Wehrmann, Jônatas and Armani, Maurício and More, Martin D. and Barros, Rodrigo C.},
journal={IEEE Winter Conf. on Applications of Computer Vision (WACV'18)},
year={2018}
}
@article{faghri2017vse++,
title={VSE++: Improved Visual-Semantic Embeddings},
author={Faghri, Fartash and Fleet, David J and Kiros, Jamie Ryan and Fidler, Sanja},
journal={arXiv preprint arXiv:1707.05612},
year={2017}
}