Summary of the changes in PR #7:
A few bug fixes and tweaks for a stronger baseline.
This improves MRR from 0.5845 to 0.6155 and NDCG from 0.5070 to 0.5315 on val.
Changes:
- Switched off dropout during evaluation on the validation set in train.py (see the training-loop sketch after this list).
- Enabled shuffling of training batches (pass shuffle=True to the DataLoader).
- Explicitly cleared the GPU memory cache with torch.cuda.empty_cache(). The time cost on a single GPU is negligible, and it lets batch sizes of up to 32 x the number of GPUs fit in memory; there is some time gain when training with these larger batch sizes.
- Added a linear learning-rate warm-up (https://arxiv.org/abs/1706.02677), followed by multi-step decay (scheduler sketch below).
- Switched the decoder to a multi-layer LSTM with dropout (sketch below).
- Switched from dot-product attention to a richer element-wise multiplication + fc-layer attention (sketch below). (The network can still learn dot-product attention if it needs to.)
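
For reference, a minimal training-loop sketch covering the first three items. The `model`, dataset, `optimizer`, and `loss_fn` names are placeholders, not the actual objects in train.py:

```python
import torch
from torch.utils.data import DataLoader

def run_training(model, train_dataset, val_dataset, optimizer, loss_fn,
                 epochs=10, batch_size=32, device="cuda"):
    model.to(device)
    # Shuffle training batches each epoch; keep validation order fixed.
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    for epoch in range(epochs):
        model.train()  # dropout active during training
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()

        # Release cached blocks so larger per-GPU batches fit;
        # negligible overhead on a single GPU.
        torch.cuda.empty_cache()

        model.eval()  # dropout switched off for evaluation
        with torch.no_grad():
            for inputs, targets in val_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = model(inputs)  # compute val metrics (MRR / NDCG) here
```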
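
A sketch of the warm-up + multi-step-decay schedule, implemented here with LambdaLR; the step counts, milestones, and gamma are illustrative and not the values used in this PR:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def warmup_multistep_lambda(warmup_steps, milestones, gamma=0.1):
    """Return an LR multiplier: linear warm-up, then multi-step decay."""
    def lr_lambda(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps  # linear warm-up to the base LR
        factor = 1.0
        for m in milestones:
            if step >= m:
                factor *= gamma               # decay at each milestone
        return factor
    return lr_lambda

# Hypothetical usage: warm up for 500 steps, then decay at steps 5000 and 8000.
model = torch.nn.Linear(128, 128)             # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = LambdaLR(optimizer, lr_lambda=warmup_multistep_lambda(500, [5000, 8000]))

for step in range(10000):
    optimizer.step()   # after computing and backpropagating the loss
    scheduler.step()   # update the learning rate once per step
```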
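
A sketch of what the multi-layer LSTM decoder with dropout could look like; the class name, layer sizes, and dropout rate are assumptions, not the exact module in this repo:

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Illustrative decoder: stacked LSTM with inter-layer and output dropout."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 num_layers=2, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # nn.LSTM applies `dropout` between stacked layers (requires num_layers > 1).
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        x = self.embed(tokens)              # (batch, seq, embed_dim)
        x, hidden = self.lstm(x, hidden)    # (batch, seq, hidden_dim)
        logits = self.out(self.dropout(x))  # (batch, seq, vocab_size)
        return logits, hidden
```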
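
And a sketch of the element-wise multiplication + fc attention (module name and tensor shapes are assumptions). With all fc weights equal to 1 and zero bias, the score reduces to a plain dot product, which is why the network can still recover dot-product attention:

```python
import torch
import torch.nn as nn

class ElementwiseAttention(nn.Module):
    """score = fc(query * key) instead of a plain dot product."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, query, keys, values):
        # query: (batch, hidden_dim); keys, values: (batch, seq, hidden_dim)
        scores = self.fc(query.unsqueeze(1) * keys).squeeze(-1)       # (batch, seq)
        weights = torch.softmax(scores, dim=-1)                       # attention weights
        context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # (batch, hidden_dim)
        return context, weights
```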