
Visual Question Answering (VQA)

A simple implementation of a model for solving a VQA task. The model is restricted to answering 'yes/no' questions only. The implementation is not meant to be competitive; it is part of a Deep Learning project given to students as an exercise in combining their knowledge of NLP and image processing using deep learning.

List of used packages

  • PyTorch v1.7
  • torchvision
  • Hugging Face transformers
  • Pandas
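
The dependencies can be installed with pip. The pinned versions below (apart from PyTorch 1.7, listed above) are assumptions for illustration, not versions pinned by this repository:

```
pip install torch==1.7.0 torchvision transformers pandas
```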

Model description

The model is made of 3 parts: (i) an image feature extraction part based on a pretrained ResNet-18 model, (ii) a BERT model that encodes the words of the question, followed by an LSTM cell that encodes the question as a whole, and (iii) an MLP classifier with 2 hidden linear layers, ReLU activation functions, and a dropout layer in between.

The image and question representations are each passed through a linear layer (one per representation) that projects them into a space of the same dimension. The two projections are then concatenated and passed to the classifier.
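
The following is a minimal PyTorch sketch of that architecture. It is illustrative only: the class and parameter names (`VQAModel`, `proj_dim`, `hidden_dim`), the hidden sizes, the dropout rate, the `bert-base-uncased` checkpoint, and the single-logit output are assumptions, not the repository's actual code or settings:

```python
# Hypothetical sketch of the described model; names and sizes are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import BertModel

class VQAModel(nn.Module):
    def __init__(self, proj_dim=512, hidden_dim=256, dropout=0.5):
        super().__init__()
        # (i) image feature extractor: pretrained ResNet-18 with its final
        # classification layer removed (outputs a 512-d feature vector)
        cnn = resnet18(pretrained=True)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])
        # (ii) question encoder: BERT word embeddings followed by an LSTM cell
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, proj_dim,
                            batch_first=True)
        # one linear projection per modality, into a space of equal dimension
        self.img_proj = nn.Linear(512, proj_dim)
        self.txt_proj = nn.Linear(proj_dim, proj_dim)
        # (iii) MLP classifier: 2 hidden layers with ReLU, dropout in between,
        # and a single logit for the yes/no answer (an assumption; a 2-way
        # softmax head would fit the description equally well)
        self.classifier = nn.Sequential(
            nn.Linear(2 * proj_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, image, input_ids, attention_mask):
        img_feat = self.cnn(image).flatten(1)            # (B, 512)
        # index [0] = per-token hidden states, works across transformers versions
        word_emb = self.bert(input_ids=input_ids,
                             attention_mask=attention_mask)[0]
        _, (h_n, _) = self.lstm(word_emb)                # final hidden state
        txt_feat = h_n[-1]                               # (B, proj_dim)
        fused = torch.cat([self.img_proj(img_feat),
                           self.txt_proj(txt_feat)], dim=1)
        return self.classifier(fused)                    # yes/no logit
```

With the single-logit head sketched here, training would use a binary objective such as `nn.BCEWithLogitsLoss`; the LSTM's final hidden state serves as the whole-question representation that gets fused with the image features.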
