
Exploring Human-like Attention Supervision in Visual Question Answering

Here we provide the HLAT Dataset proposed in the paper Exploring Human-like Attention Supervision in Visual Question Answering, which has been accepted by AAAI-2018.

In this work, we propose a Human Attention Network (HAN) to predict the attention map for a given image-question pair.

The framework of HAN

Here are some examples generated by the HAN.

Examples of HAN

We improve the performance of attention-based VQA models by adding human-like attention supervision.

The structure of attention supervision

Our method improves the accuracy of VQA, especially on counting questions, e.g., "How many candles are on the table?" For more details, please refer to our paper.

HLAT Dataset

Here we provide attention maps generated by the HAN for both the VQA 1.0 and VQA 2.0 datasets.

They are saved in .h5 files.

The .h5 files of attention maps can be downloaded from here.

The .h5 file has the following data structure: { "pre_attmap" : attention maps for all question ids }

This means that each .h5 file contains a single dict whose key is "pre_attmap". The attention maps are stored in the same order as the question ids, so the order of question ids is used to look up the attention map for each question-image pair. The question id order follows the official VQA 1.0 and VQA 2.0 datasets.
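As a minimal sketch of how the maps can be read back, the snippet below loads a downloaded .h5 file with h5py and pairs each attention map with its question id taken from an official VQA question file. The file names are placeholders, not files shipped with this repository; substitute the paths of the files you downloaded for the split you are using.

import json
import h5py

# Placeholder paths -- replace with the downloaded attention-map file and the
# matching official VQA question file for the same split.
att_h5_path = "vqa2_trainval_attmaps.h5"
question_json_path = "v2_OpenEnded_mscoco_train2014_questions.json"

# Question ids in the order given by the official VQA annotations.
with open(question_json_path) as f:
    question_ids = [q["question_id"] for q in json.load(f)["questions"]]

with h5py.File(att_h5_path, "r") as f:
    att_maps = f["pre_attmap"]  # one attention map per question, same order
    assert len(att_maps) == len(question_ids)
    # Look up the attention map for the i-th question-image pair.
    i = 0
    qid, att_map = question_ids[i], att_maps[i]
    print(qid, att_map.shape)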

For the VQA 1.0 dataset, there are:

  • 369,861 attention maps for question-image pairs in the trainval set
  • 244,302 attention maps for question-image pairs in the testing set
  • 60,864 attention maps for question-image pairs in the test-dev set

For the VQA 2.0 dataset, there are:

  • 658,111 attention maps for question-image pairs in the trainval set
  • 447,793 attention maps for question-image pairs in the testing set
  • 107,394 attention maps for question-image pairs in the test-dev set

VQA-HAT Dataset

Here we also provide a link to the VQA-HAT Dataset, which is from the paper Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?.

Cite

We would greatly appreciate it if you cite our work:
@inproceedings{qiao2018exploring,
  title={Exploring human-like attention supervision in visual question answering},
  author={Qiao, Tingting and Dong, Jianfeng and Xu, Duanqing},
  booktitle={Thirty-Second AAAI Conference on Artificial Intelligence},
  year={2018}
}
