<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>WildQA: In-the-Wild Video Question Answering</title>
<meta name="description"
content="WildQA is a video understanding dataset of videos recorded in outside settings. This project also introduce the Video Evidence Selection task.">
<meta name="keywords"
content="WildQA, VideoQA, Video Question Answering, Video Evidence Selection, Computer Vision, Machine Learning, dataset, Natural Language Processing, Videos, YouTube, in the wild, research, COLING 2022, COLING, Deep Learning, NLP, PyTorch">
<meta name="author"
content="Santiago Castro*, Naihao Deng*, Pingxuan Huang*, Mihai Burzo and Rada Mihalcea">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta property="og:type" content="website" />
<meta property="og:site_name" content="WildQA: In-the-Wild Video Question Answering" />
<meta property="og:image" content="https://lit.eecs.umich.edu/wildqa/img/example.png" />
<meta property="og:image:height" content="630" />
<meta property="og:image:width" content="1200" />
<meta property="og:title" content="WildQA: In-the-Wild Video Question Answering" />
<meta property="og:description" content="WildQA is a video understanding dataset of videos recorded in outside settings. This project also introduce the Video Evidence Selection task." />
<meta property="og:url" content="https://lit.eecs.umich.edu/wildqa/" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@michigan_AI" />
<meta name="twitter:creator" content="@michigan_AI" />
<script async src="https://www.googletagmanager.com/gtag/js?id=G-42MFV87X10"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-42MFV87X10');
</script>
<link rel="stylesheet" type="text/css" href="main.css"/>
</head>
<body>
<div class="container">
<header>
<a href="https://arc.engin.umich.edu/"><img id="arc" src="img/arc.png" alt="Automotive Research Center logo"></a>
<a href="https://umich.edu/"><img id="um" src="img/um.png" alt="University of Michigan logo"></a>
<h1>WildQA: In-the-Wild Video Question Answering</h1>
<ul id="quick-links">
<li><a href="https://aclanthology.org/2022.coling-1.496.pdf">Paper</a></li>
<li><a href="https://github.com/MichiganNLP/In-the-wild-QA">Data + Code</a></li>
<li><a href="https://aclanthology.org/2022.coling-1.496">ACL Anthology page</a></li>
<li><a href="https://github.com/MichiganNLP/In-the-wild-QA#citation">BibTeX Citation</a></li>
</ul>
</header>
<section class="section-alt">
<div class="content">
<h2>Abstract</h2>
<p id="abstract">
Existing video understanding datasets mostly focus on human interactions, with little attention being paid to the "in the wild" settings, where the videos are recorded outdoors. We propose <b>WildQA</b>, a video understanding dataset of videos recorded in outside settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WildQA poses new challenges to the vision and language research communities.
</p>
</div>
</section>
<section>
<div class="content">
<a href="https://aclanthology.org/2022.coling-1.496.pdf">
<ol id="thumbnails">
<li><img src="img/thumbs/0.png" alt="thumbnail, page 0"/></li>
<li><img src="img/thumbs/1.png" alt="thumbnail, page 1"/></li>
<li><img src="img/thumbs/2.png" alt="thumbnail, page 2"/></li>
<li><img src="img/thumbs/3.png" alt="thumbnail, page 3"/></li>
<li><img src="img/thumbs/4.png" alt="thumbnail, page 4"/></li>
<li><img src="img/thumbs/5.png" alt="thumbnail, page 5"/></li>
<li><img src="img/thumbs/6.png" alt="thumbnail, page 6"/></li>
<li><img src="img/thumbs/7.png" alt="thumbnail, page 7"/></li>
<li><img src="img/thumbs/8.png" alt="thumbnail, page 8"/></li>
</ol>
</a>
</div>
</section>
<section>
<div class="content">
<ol id="authors">
<li>
<a href="https://santi.uy">
<div class="author-img-container">
<img src="img/authors/santi.jpeg" alt="Santiago Castro profile picture">
</div>
Santiago Castro
</a>
</li>
<li>
<a href="https://dnaihao.github.io/">
<div class="author-img-container">
<img src="img/authors/naihao.jpeg" alt="Naihao Deng profile picture">
</div>
Naihao Deng
</a>
</li>
<li>
<div>
<div class="author-img-container">
<img src="img/authors/pingxuan.jpg" alt="Pingxuan Huang profile picture">
</div>
Pingxuan Huang
</div>
</li>
<li>
<a href="https://sites.google.com/umich.edu/mburzo">
<div class="author-img-container">
<img src="img/authors/mihai.jpeg" alt="Mihai G. Burzo profile picture">
</div>
Mihai G. Burzo
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~mihalcea/">
<div class="author-img-container">
<img src="img/authors/rada.jpg" alt="Rada Mihalcea profile picture">
</div>
Rada Mihalcea
</a>
</li>
</ol>
</div>
</section>
<section>
<div class="content">
<h2>Example from our Dataset</h2>
<div class="video">
<video id="example-video" style="max-width:80%;max-height:100%" src="https://www.dropbox.com/s/r6imsp6nqrjdq2q/Norwegian-Explorer_11-clip-54.mp4?raw=1" controls> </video>
</div>
<div class="example-instance">
<p style="text-align: center;"><i><span class="Qlabel">Green</span> from the first annotation stage,
<span class="Alabel">Brown</span> from the second annotation stage</i></p>
<p class="example-question"><span class="Qlabel">Q1</span>: What kinds of bodies of water are there?</p>
<ul class="example-answer">
<li><span class="Qlabel">A1</span>: There are rivers and streams.<br/>
<button class="evidence-button" id="evid1.1.1" onclick="playEvidence(id,7.24,14.24)">Evidence 1</button>
</li>
<li><span class="Alabel">A2</span>: There is a long stream or river between the valley.<br/>
<button class="evidence-button" id="evid1.2.1" onclick="playEvidence(id,48.41,55.96)">Evidence 1</button>
<button class="evidence-button" id="evid1.2.2" onclick="playEvidence(id,58.66,75.29)">Evidence 2</button>
</li>
<li><span class="Alabel">A3</span>: The kinds of bodies of water there are streams.<br/>
<button class="evidence-button" id="evid1.3.1" onclick="playEvidence(id,51.17,63.51)">Evidence 1</button>
</li>
<li><span class="Alabel">A4</span>: There are bodies of water in this location.<br/>
<button class="evidence-button" id="evid1.4.1" onclick="playEvidence(id,7.49,11.72)">Evidence 1</button>
<button class="evidence-button" id="evid1.4.2" onclick="playEvidence(id,58.84,64.80)">Evidence 2</button>
</li>
</ul>
<hr class="example-hr">
<p class="example-question"><span class="Qlabel">Q2</span>: Where are the rivers located?</p>
<ul class="example-answer">
<li><span class="Qlabel">A1</span>: In valleys.<br/>
<button class="evidence-button" id="evid2.1.1" onclick="playEvidence(id,36.63,42.15)">Evidence 1</button>
</li>
<li><span class="Alabel">A2</span>: The rivers are located between what seems to be two different valleys based on the man stating that it was a valley. The first river you can see the higher elevation of ground where you see the bases of the trees look slanted. The second river looks further from the valley as you can see the valley looks distant from where the men are. Lastly, the second river seemed to be more plateaued than the first.<br/>
<button class="evidence-button" id="evid2.2.1" onclick="playEvidence(id,27.55,32.46)">Evidence 1</button>
<button class="evidence-button" id="evid2.2.2" onclick="playEvidence(id,15.40,27.00)">Evidence 2</button>
<button class="evidence-button" id="evid2.2.3" onclick="playEvidence(id,63.14,76.70)">Evidence 3</button>
</li>
<li><span class="Alabel">A3</span>: The rivers are located in the mountains near the bases.<br/>
<button class="evidence-button" id="evid2.3.1" onclick="playEvidence(id,35.28,42.15)">Evidence 1</button>
</li>
</ul>
<hr class="example-hr">
<p class="example-question"><span class="Qlabel">Q3</span>: What type of environment is it?</p>
<ul class="example-answer">
<li><span class="Qlabel">A1</span>: Mountainous, temperate forest. <br/>
<button class="evidence-button" id="evid3.1.1" onclick="playEvidence(id,0.0,9.45)">Evidence 1</button>
</li>
<li><span class="Alabel">A2</span>: It is a varied environment with rocky areas, greenery, and a river.<br/>
<button class="evidence-button" id="evid3.2.1" onclick="playEvidence(id,0.18,5.15)">Evidence 1</button>
<button class="evidence-button" id="evid3.2.2" onclick="playEvidence(id,59.52,67.43)">Evidence 2</button>
</li>
<li><span class="Alabel">A3</span>: It's environment type is an intermountain forest.<br/>
<button class="evidence-button" id="evid3.3.1" onclick="playEvidence(id,40.31,49.09)">Evidence 1</button>
</li>
</ul>
</div>
</div>
</section>
<section>
<p id="affiliation">
<a href="https://umich.edu/">
<img id="um-vertical" alt="University of Michigan" src="img/um-vertical.png">
</a>
</p>
</section>
<section class="section-alt">
<div class="content">
<h2>Downloads</h2>
<ul id="downloads">
<li><a href="https://aclanthology.org/2022.coling-1.496.pdf">PDF Paper</a></li>
<li><a href="https://github.com/MichiganNLP/In-the-wild-QA">Data + Code</a> (with instructions)</li>
</ul>
</div>
</section>
<footer>
<div class="content">
<h2>Acknowledgments</h2>
<p id="acknowledgments-text">
We thank the anonymous reviewers for their constructive feedback. We thank Artem Abzaliev,
<a href="https://mindojune.github.io/">Do June Min</a>, and
<a href="https://oanaignat.github.io/">Oana Ignat</a>
for proofreading and suggestions. We thank William McNamee for help with the video
collection process, and all the annotators for their hard work on data annotation. We thank
<a href="https://flaminghorizon.github.io/">Yiqun Yao</a> for the helpful
discussions during the early stage of the project. This material is based in part upon work supported by the
<a href="https://arc.engin.umich.edu/">Automotive Research Center (“ARC”)</a>.
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors
and do not necessarily reflect the views of ARC or any other related entity.
</p>
<p>
Web page inspired by the
<a href="https://lit.eecs.umich.edu/lifeqa/">LifeQA web page</a>.
</p>
</div>
</footer>
</div>
<script type="text/javascript">
// Handle to the pending polling timeout, so that clicking a new evidence button
// cancels the previous playback loop instead of letting it pause the video early.
let checkTimeoutId = null;

function playEvidence(id, start, end) {
const video = document.getElementById("example-video");
video.pause();
if (checkTimeoutId !== null) {
clearTimeout(checkTimeoutId);
}
// Gray out the clicked evidence button to mark it as visited.
document.getElementById(id).style.color = "rgb(117, 116, 116)";
function checkTime() {
if (video.currentTime >= end) {
video.pause();
} else {
/* poll every 1/10th second until the evidence end time is reached */
checkTimeoutId = setTimeout(checkTime, 100);
}
}
video.focus();
video.currentTime = start;
setTimeout(function () {
// small delay to prevent `The play() request was interrupted by a call to pause().`
video.play();
}, 150);
checkTime();
}
</script>
</body>
</html>