MIRTT

This repository is the implementation of the EMNLP 2021 paper MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering.

Data Source

VQA 2.0, GQA: LXMERT

Visual7W, TDIUC: CTI

VQA 1.0: the official VQA website

Pretrain

Under ./pretrain:

bash run.bash exp_name gpuid

Some parameters can be changed in run.bash.
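
For example, a pretraining run could be launched as below; the experiment name mirtt_pretrain and GPU id 0 are placeholders, not values required by the script:

bash run.bash mirtt_pretrain 0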

MC (multiple-choice) VQA

Under ./main:

bash run.bash exp_name gpuid
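
As with pretraining, exp_name labels the run and gpuid selects the GPU. A hypothetical fine-tuning invocation (both values are illustrative):

bash run.bash mirtt_mc_visual7w 0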

FFOE (free-form open-ended) VQA

Two-stage workflow

Stage one: train a bilinear model (BAN, SAN, or MLP)

Under ./bilinear_method:

bash run.bash exp_name gpuid mod dataset model

After training, we generate an answer list for each dataset; in this way, FFOE VQA is simplified into MC VQA. A hypothetical stage-one invocation is sketched after this step.
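
The sketch below uses placeholder values: the experiment name and GPU id are arbitrary, the dataset and model are assumed to be TDIUC and BAN, and the exact strings accepted for mod, dataset, and model are defined in run.bash, so check that script before running:

bash run.bash ban_tdiuc 0 train TDIUC BAN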

Stage two: MIRTT. Under ./main
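
Assuming stage two reuses the same bash run.bash exp_name gpuid entry point as MC VQA, an illustrative invocation (placeholder values) would be:

bash run.bash mirtt_ffoe_tdiuc 0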


The repository is still being updated.
