<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>WildQA: In-the-Wild Video Question Answering</title>
<meta name="description"
content="WildQA is a video understanding dataset of videos recorded in outside settings. This project also introduce the Video Evidence Selection task.">
<meta name="keywords"
content="WildQA, VideoQA, Video Question Answering, Video Evidence Selection, Computer Vision, Machine Learning, dataset, Natural Language Processing, Videos, YouTube, in the wild, research, COLING 2022, COLING, Deep Learning, NLP, PyTorch">
<meta name="author"
content="Santiago Castro*, Naihao Deng*, Pingxuan Huang*, Mihai Burzo and Rada Mihalcea">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta property="og:type" content="website" />
<meta property="og:site_name" content="WildQA: In-the-Wild Video Question Answering" />
<meta property="og:image" content="https://lit.eecs.umich.edu/wildqa/img/example.png" />
<meta property="og:image:height" content="630" />
<meta property="og:image:width" content="1200" />
<meta property="og:title" content="WildQA: In-the-Wild Video Question Answering" />
<meta property="og:description" content="WildQA is a video understanding dataset of videos recorded in outside settings. This project also introduce the Video Evidence Selection task." />
<meta property="og:url" content="https://lit.eecs.umich.edu/wildqa/" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@michigan_AI" />
<meta name="twitter:creator" content="@michigan_AI" />
<script async src="https://www.googletagmanager.com/gtag/js?id=G-42MFV87X10"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-42MFV87X10');
</script>
<link rel="stylesheet" type="text/css" href="main.css"/>
</head>
<body>
<div class="container">
<header>
<a href="https://arc.engin.umich.edu/"><img id="arc" src="img/arc.png" alt="Automotive Research Center logo"></a>
<a href="https://umich.edu/"><img id="um" src="img/um.png" alt="University of Michigan logo"></a>
<h1>WildQA: In-the-Wild Video Question Answering</h1>
<ul id="quick-links">
<li><a href="https://aclanthology.org/2022.coling-1.496.pdf">Paper</a></li>
<li><a href="https://github.com/MichiganNLP/In-the-wild-QA">Data + Code</a></li>
<li><a href="https://aclanthology.org/2022.coling-1.496">ACL Anthology page</a></li>
<li><a href="https://github.com/MichiganNLP/In-the-wild-QA#citation">BibTeX Citation</a></li>
</ul>
</header>
<section class="section-alt">
<div class="content">
<h2>Abstract</h2>
<p id="abstract">
Existing video understanding datasets mostly focus on human interactions, with little attention being paid to the "in the wild" settings, where the videos are recorded outdoors. We propose <b>WildQA</b>, a video understanding dataset of videos recorded in outside settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WildQA poses new challenges to the vision and language research communities.
</p>
</div>
</section>
<section>
<div class="content">
<a href="https://aclanthology.org/2022.coling-1.496.pdf">
<ol id="thumbnails">
<li><img src="img/thumbs/0.png" alt="thumbnail, page 0"/></li>
<li><img src="img/thumbs/1.png" alt="thumbnail, page 1"/></li>
<li><img src="img/thumbs/2.png" alt="thumbnail, page 2"/></li>
<li><img src="img/thumbs/3.png" alt="thumbnail, page 3"/></li>
<li><img src="img/thumbs/4.png" alt="thumbnail, page 4"/></li>
<li><img src="img/thumbs/5.png" alt="thumbnail, page 5"/></li>
<li><img src="img/thumbs/6.png" alt="thumbnail, page 6"/></li>
<li><img src="img/thumbs/7.png" alt="thumbnail, page 7"/></li>
<li><img src="img/thumbs/8.png" alt="thumbnail, page 8"/></li>
</ol>
</a>
</div>
</section>
<section>
<div class="content">
<ol id="authors">
<li>
<a href="https://santi.uy">
<div class="author-img-container">
<img src="img/authors/santi.jpeg" alt="Santiago Castro profile picture">
</div>
Santiago Castro
</a>
</li>
<li>
<a href="https://dnaihao.github.io/">
<div class="author-img-container">
<img src="img/authors/naihao.jpeg" alt="Naihao Deng profile picture">
</div>
Naihao Deng
</a>
</li>
<li>
<div>
<div class="author-img-container">
<img src="img/authors/pingxuan.jpg" alt="Pingxuan Huang profile picture">
</div>
Pingxuan Huang
</div>
</li>
<li>
<a href="https://sites.google.com/umich.edu/mburzo">
<div class="author-img-container">
<img src="img/authors/mihai.jpeg" alt="Mihai G. Burzo profile picture">
</div>
Mihai G. Burzo
</a>
</li>
<li>
<a href="https://web.eecs.umich.edu/~mihalcea/">
<div class="author-img-container">
<img src="img/authors/rada.jpg" alt="Rada Mihalcea profile picture">
</div>
Rada Mihalcea
</a>
</li>
</ol>
</div>
</section>
<section>
<div class="content">
<h2>Example from our Dataset</h2>
<div class="video">
<video id="example-video" style="max-width:80%;max-height:100%" src="https://www.dropbox.com/s/r6imsp6nqrjdq2q/Norwegian-Explorer_11-clip-54.mp4?raw=1" controls> </video>
</div>
<div class="example-instance">
<p style="text-align: center;"><i><span class="Qlabel">Green</span> from the first annotation stage,
<span class="Alabel">Brown</span> from the second annotation stage</i></p>
<p class="example-question"><span class="Qlabel">Q1</span>: What kinds of bodies of water are there?</p>
<ul class="example-answer">
<li><span class="Qlabel">A1</span>: There are rivers and streams.<br/>
<button class="evidence-button" id="evid1.1.1" onclick="playEvidence(id,7.24,14.24)">Evidence 1</button>
</li>
<li><span class="Alabel">A2</span>: There is a long stream or river between the valley.<br/>
<button class="evidence-button" id="evid1.2.1" onclick="playEvidence(id,48.41,55.96)">Evidence 1</button>
<button class="evidence-button" id="evid1.2.2" onclick="playEvidence(id,58.66,75.29)">Evidence 2</button>
</li>
<li><span class="Alabel">A3</span>: The kinds of bodies of water there are streams.<br/>
<button class="evidence-button" id="evid1.3.1" onclick="playEvidence(id,51.17,63.51)">Evidence 1</button>
</li>
<li><span class="Alabel">A4</span>: There are bodies of water in this location.<br/>
<button class="evidence-button" id="evid1.4.1" onclick="playEvidence(id,7.49,11.72)">Evidence 1</button>
<button class="evidence-button" id="evid1.4.2" onclick="playEvidence(id,58.84,64.80)">Evidence 2</button>
</li>
</ul>
<hr class="example-hr">
<p class="example-question"><span class="Qlabel">Q2</span>: Where are the rivers located?</p>
<ul class="example-answer">
<li><span class="Qlabel">A1</span>: In valleys.<br/>
<button class="evidence-button" id="evid2.1.1" onclick="playEvidence(id,36.63,42.15)">Evidence 1</button>
</li>
<li><span class="Alabel">A2</span>: The rivers are located between what seems to be two different valleys based on the man stating that it was a valley. The first river you can see the higher elevation of ground where you see the bases of the trees look slanted. The second river looks further from the valley as you can see the valley looks distant from where the men are. Lastly, the second river seemed to be more plateaued than the first.<br/>
<button class="evidence-button" id="evid2.2.1" onclick="playEvidence(id,27.55,32.46)">Evidence 1</button>
<button class="evidence-button" id="evid2.2.2" onclick="playEvidence(id,15.40,27.00)">Evidence 2</button>
<button class="evidence-button" id="evid2.2.3" onclick="playEvidence(id,63.14,76.70)">Evidence 3</button>
</li>
<li><span class="Alabel">A3</span>: The rivers are located in the mountains near the bases.<br/>
<button class="evidence-button" id="evid2.3.1" onclick="playEvidence(id,35.28,42.15)">Evidence 1</button>
</li>
</ul>
<hr class="example-hr">
<p class="example-question"><span class="Qlabel">Q3</span>: What type of environment is it?</p>
<ul class="example-answer">
<li><span class="Qlabel">A1</span>: Mountainous, temperate forest. <br/>
<button class="evidence-button" id="evid3.1.1" onclick="playEvidence(id,0.0,9.45)">Evidence 1</button>
</li>
<li><span class="Alabel">A2</span>: It is a varied environment with rocky areas, greenery, and a river.<br/>
<button class="evidence-button" id="evid3.2.1" onclick="playEvidence(id,0.18,5.15)">Evidence 1</button>
<button class="evidence-button" id="evid3.2.2" onclick="playEvidence(id,59.52,67.43)">Evidence 2</button>
</li>
<li><span class="Alabel">A3</span>: It's environment type is an intermountain forest.<br/>
<button class="evidence-button" id="evid3.3.1" onclick="playEvidence(id,40.31,49.09)">Evidence 1</button>
</li>
</ul>
</div>
</div>
</section>
<section>
<p id="affiliation">
<a href="https://umich.edu/">
<img id="um-vertical" alt="University of Michigan" src="img/um-vertical.png">
</a>
</p>
</section>
<section class="section-alt">
<div class="content">
<h2>Downloads</h2>
<ul id="downloads">
<li><a href="https://aclanthology.org/2022.coling-1.496.pdf">PDF Paper</a></li>
<li><a href="https://github.com/MichiganNLP/In-the-wild-QA">Data + Code</a> (with instructions)</li>
</ul>
</div>
</section>
<footer>
<div class="content">
<h2>Acknowledgments</h2>
<p id="acknowledgments-text">
We thank the anonymous reviewers for their constructive feedback. We thank Artem Abzaliev,
<a href="https://mindojune.github.io/">Do June Min</a>, and
<a href="https://oanaignat.github.io/">Oana Ignat</a>
for proofreading and suggestions. We thank William McNamee for help with the video
collection process, and all the annotators for their hard work on data annotation. We thank
<a href="https://flaminghorizon.github.io/">Yiqun Yao</a> for the helpful
discussions during the early stage of the project. This material is based in part upon work supported by the
<a href="https://arc.engin.umich.edu/">Automotive Research Center (“ARC”)</a>.
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors
and do not necessarily reflect the views of ARC or any other related entity.
</p>
<p>
Web page inspired by the
<a href="https://lit.eecs.umich.edu/lifeqa/">LifeQA web page</a>.
</p>
</div>
</footer>
</div>
<script type="text/javascript">
// Handle to the pending polling timeout, so that clicking a new evidence button
// cancels the previous playback loop instead of letting it pause the video early.
let checkTimeoutId = null;

function playEvidence(id, start, end) {
const video = document.getElementById("example-video");
video.pause();
if (checkTimeoutId !== null) {
clearTimeout(checkTimeoutId);
}
// Gray out the clicked evidence button to mark it as visited.
document.getElementById(id).style.color = "rgb(117, 116, 116)";
function checkTime() {
if (video.currentTime >= end) {
video.pause();
} else {
/* poll every 1/10th second until the evidence end time is reached */
checkTimeoutId = setTimeout(checkTime, 100);
}
}
video.focus();
video.currentTime = start;
setTimeout(function () {
// small delay to prevent `The play() request was interrupted by a call to pause().`
video.play();
}, 150);
checkTime();
}
</script>
</body>
</html>