Common multimodal datasets

Image Datasets

COCO
Conceptual Captions 3M
Conceptual 12M

Video & Language Datasets

| Dataset | Paper | Clips | Captions | Videos | Duration | Source | Year | Tasks | Collection method |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Charades | paper | 10K | 16K | 10,000 | 82h | daily household videos | 2016 | action recognition & captioning | AMT |
| MSR-VTT | paper | 10k | 200k | 7,180 | 40h | web-crawled videos with 257 queries | 2016 | retrieval & captioning | AMT |
| DiDeMo | paper | 27k | 41k | 10,464 | 87h | over 14,000 videos randomly selected from YFCC100M | 2017 | moment localization | crowdsourcing |
| M-VAD | paper | 49k | 56k | 92 | 84h | DVD movies | 2015 | retrieval | crowdsourcing |
| MPII-MD | paper | 69k | 68k | 94 | 41h | web movies | 2015 | captioning | crowdsourcing |
| ActivityNet | paper | 100k | 100k | 20,000 | 849h | online videos of human activities | 2017 | captioning & retrieval | AMT |
| TGIF | paper | 69k | 68k | 94 | 41h | a year’s worth of GIF posts from Tumblr | 2015 | captioning | CrowdFlower |
| YouCook2 | paper | 14k | 14k | 2,000 | 176h | online cooking videos | 2018 | retrieval & captioning | well-trained native English speakers |
| LSMDC | paper | 128k | 128k | 200 | 150h | combination of the M-VAD and MPII-MD datasets | 2017 | captioning | / |
| HowTo100M | paper | 136M | 136M | 1.221M | 134,472h | large-scale online videos | 2019 | action step localization & retrieval | ASR |
| Kinetics-700 | paper | 650K | / | 650K | / | an extension of the Kinetics-600 dataset | 2019 | action recognition | / |
| AVA-Kinetics | paper | 230K | / | 230K | / | combines the annotation style of AVA with the Kinetics dataset | 2020 | action recognition | / |
| HACS | paper | 1.5M | / | 504K | / | large-scale human action localization dataset | 2019 | action recognition & captioning | crowdsourcing |
| Tiny-VIRAT | paper | 13K | / | 13K | / | low-resolution action recognition dataset (surveillance videos) | 2020 | action recognition | / |
| Action Genome | paper | 234K | / | 234K | / | video scene graphs | 2020 | action recognition & representations encoding event partonomies | crowdsourcing |
| SoccerNet | paper | 650K | 764h | 650K | / | European football league videos | 2018 | event classification in football game videos | transformed from league website data |
| ActivityNet Entities | paper | 650K | / | 650K | / | grounds visual entities to objects in ActivityNet videos | 2018 | video understanding & action recognition | crowdsourcing |
| VidSitu | paper | 136K | / | 29K | / | events and related roles in movies | 2021 | semantic role and co-reference prediction | AMT |
| VATEX | paper | 41.3k | 826k | 41.3k | 114h38m | human behavior videos from YouTube | 2019 | action recognition & captioning | / |
| MSVD | paper | 2k | 70k | 2k | 4h55m | web videos | 2011 | video captioning | AMT |
| MovieNet | paper | 420k | 25k | 420k | / | web movies | 2020 | genre classification & cinematic style analysis & character recognition & scene analysis & story understanding | crowdsourcing |
| MovieGraphs | paper | 7.6k | 70k | 51 | 150h | scene graph representation of movies | 2018 | description retrieval & dialog retrieval & movie clip retrieval | crowdsourcing |
| QVHighlights | paper | 10.3k | 10.2k | 10.3k | / | daily or travel vlogs and news | 2021 | moment retrieval & highlight detection | AMT |
| UCF101 | paper | 13.3k | 1600m | 13.3k | / | user-uploaded videos | 2012 | action recognition | crowdsourcing |
| HMDB51 | paper | 7K | / | 7K | / | action videos from YouTube/Google | 2011 | action recognition & captioning | crowdsourcing |
| Moments-in-Time | paper | 1M | / | 1M | / | edited videos from YouTube, Flickr, Vine, Metacafe and other sources | 2017 | action & event recognition | AMT |
| AVA | paper | 57.6K | 300k | 57.6K | / | web movies with human bounding boxes | 2017 | atomic visual action recognition | crowdsourcing |
| HVU | paper | 57.2K | 9M | 57.2K | / | YouTube | 2020 | multi-label and multi-task video understanding | semi-automatic crowdsourcing strategy |
| Oops! | paper | 20K | / | 20K | / | in-the-wild videos of unintentional actions | 2019 | unintentional action recognition | AMT |
| CrossTask | paper | 4.7K | / | 4.7K | / | weakly supervised learning from instructional videos | 2019 | video classification | crowdsourcing |
| COIN | paper | 11.8K | / | 11.8K | / | comprehensive instructional video analysis | 2019 | step localization & action recognition | crowdsourcing |
| Sports-1M | paper | 1.1M | / | 1.1M | / | sports videos from YouTube | 2014 | video classification | crowdsourcing, labeled with a taxonomy |
| 20BN-SOMETHING-SOMETHING | paper | 220K | 318K | 220K | / | humans performing pre-defined basic actions with everyday objects | 2017 | action recognition | AMT |
| DALY | paper | 8.1K | / | 8.1K | / | Daily Action Localization in YouTube | 2016 | video classification | crowdsourcing |
| FineGym | paper | 8.1K | / | 8.1K | / | gymnastics videos with temporal actions and sub-actions | 2020 | video action recognition & detection & generation | crowdsourcing |
| MultiSports | paper | 3.2K | / | 3.2K | / | high-resolution videos of recent competitions | 2021 | spatio-temporal action detection | / |
| “Wildlife Action” | paper | 10.6K | / | 10.6K | / | downloaded from YouTube | 2020 | animal action recognition | YouTube’s Data API |
| “Action Recognition of Large Animals” | paper | / | / | / | / | downloaded from YouTube | 2018 | animal action recognition | YouTube’s Data API |
| “First-Person Animal Action” | paper | / | / | / | / | collected by a dog wearing a GoPro-sized camera | 2014 | first-person animal activity recognition | / |
| AnimalWeb | paper | / | / | / | / | collected by a dog wearing a GoPro-sized camera | 2014 | first-person animal activity recognition | / |
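
The Clips and Duration columns above can be combined into a rough average clip length, which makes it easier to compare datasets of very different sizes. The sketch below is illustrative only: the helper `mean_clip_seconds` is a hypothetical name, the rows are copied from the table with "k" and "M" expanded, and it assumes the listed duration is covered entirely by the listed clips, which may not hold for every dataset.

```python
# Rows copied from the table above: (dataset, number of clips, duration in hours).
ROWS = [
    ("Charades", 10_000, 82),
    ("MSR-VTT", 10_000, 40),
    ("YouCook2", 14_000, 176),
    ("HowTo100M", 136_000_000, 134_472),
]


def mean_clip_seconds(clips: int, hours: float) -> float:
    """Average clip length in seconds (hypothetical helper).

    Assumes the clips together account for the whole listed duration.
    """
    return hours * 3600 / clips


for name, clips, hours in ROWS:
    print(f"{name}: {mean_clip_seconds(clips, hours):.1f} s per clip")
```

Rows that list "/" for Clips or Duration have to be skipped, since the estimate needs both values.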

Video Datasets

| Dataset | Videos | Duration | Source | Year |
| --- | --- | --- | --- | --- |
| YouTube-8M | 6M | 350,000 | YouTube | 2018 |
| FineAction | 16,732 | - | YouTube | 24 May 2021 |
| VideoLT | 256,218 | 819,898 | YouTube | 6 May 2021 |

Dataset Collection Tools

voxel
Amazon Mechanical Turk (AMT)
Shaip