kinetics datasets #67
I have the same problem and hope someone can help me. The original dataset is very big, so it is not easy to obtain, especially in China. |
Me too. I sent the email and haven't received a reply. |
Here is a way to get the kinetics400. |
Hi @busixingxing, thanks a lot for providing the above link (it contains the train and val data). Can you also provide a link for the test dataset for Kinetics400? |
Hi, I am not the person who maintains this link, so I am not sure where to get the test data. I think most researchers just use the validation data to estimate the performance of the model. |
@busixingxing Thanks for the link, it is very helpful. |
I just counted the videos in my training set: the number is 234619, with 13734 in the validation set. I probably overlooked the DATASET.MD you are referring to when I copied the data.
The original dataset only provides YouTube links, and a video can be removed at any time by the person who uploaded it. The link I got might be a mirror of data from a Facebook researcher working on the SlowFast model. Even though my number is not exactly the same as yours, it should be close enough.
My only experiment with this dataset is that I ran training with the Facebook SlowFast model, and I can reproduce their evaluation accuracy on the validation set. Therefore, I would assume the dataset is OK to use.
Zehua Wei
ShiroKL <[email protected]> wrote on Tue, Jan 14, 2020, 3:15 AM:
… @busixingxing <https://github.com/busixingxing> Thanks for the link, it is very helpful.
I have a question regarding the number of videos in the archive. I found 234584 training videos, but the DATASET.MD file says there are 234643 videos. I was wondering if the difference is normal? Is this archive not the same one that INSTALL.MD refers to?
|
@busixingxing Thanks for your reply. I investigated a little more, and it seems I had the same number of training files as you before extracting the frames. Some extractions fail because the files are "corrupted" (for instance, no video stream, only audio), which results in 234584 training files.
Validation: if you could check a few of them and confirm they are not corrupted, it would be great to create a small archive to complete the previous one. |
Since the dataset is big, I did not have enough space to extract all the videos to frames. In SlowFast, maybe the researchers already set up a filter, so corrupted videos did not block the normal training pipeline when I trained the model. I did find some videos with 0 frames, or fewer than 100 frames, and they would stop training in the other library I used, MMAction, so I had to set up my own filter. I used mmcv to get the number of frames of each video first; if a training video has fewer than 30 frames, I replace it with another video from the same class. In the validation set I had to set the threshold to 85 frames, because the sampling method at test time seems to be different and requires more frames. I hope this message helps if you do not want to extract the frames next time. |
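The replace-short-videos filter described above could be sketched roughly as follows. This is a hypothetical reconstruction, not the author's actual script: the function name, the tuple layout, and passing in precomputed frame counts (e.g. obtained via `mmcv.VideoReader(path).frame_cnt` or any other probe) are all assumptions made so the logic stays self-contained.

```python
import random
from collections import defaultdict

def filter_short_videos(videos, min_frames, seed=0):
    """videos: list of (path, class_label, frame_count) tuples.

    Returns a list of video paths of the same length, where any clip
    shorter than min_frames is replaced by a randomly chosen,
    sufficiently long clip from the same class (or dropped if the
    class has no usable clip at all).
    """
    rng = random.Random(seed)

    # Index the usable (long-enough) videos by class.
    ok_by_class = defaultdict(list)
    for path, label, n_frames in videos:
        if n_frames >= min_frames:
            ok_by_class[label].append(path)

    kept = []
    for path, label, n_frames in videos:
        if n_frames >= min_frames:
            kept.append(path)
        elif ok_by_class[label]:
            # Too few frames: substitute another clip of the same class,
            # so the class distribution of the epoch is preserved.
            kept.append(rng.choice(ok_by_class[label]))
    return kept
```

Per the comment above, one would call this with `min_frames=30` for training and around `85` for validation, since the test-time sampling scheme appears to need more frames.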
Thanks for sharing @busixingxing |
You can send me an email, and I can share a link with you. [email protected]
Lovelyczl <[email protected]> wrote on Tue, Feb 11, 2020, 7:41 PM:
… Thanks for sharing @busixingxing <https://github.com/busixingxing>
But I still can't download it in China. My VPN connection is unreliable; I keep getting disconnected quickly, and the download can't retry automatically and resume.
Could you provide a copy of the dataset or share a way to download it from Dropbox?
Thanks a lot!
|
I really need it. Thanks a lot! |
Hi, thanks for providing the link for the data. You said there were 13734 videos in the validation set, which is much fewer than the 19761 validation videos used in the paper. Does it matter? |
The real situation is that nobody can crawl the complete dataset from YouTube anymore. Yes, a 13k validation set is a bit smaller than 19k, but it is still quite a big dataset. I used SlowFast Model Zoo's SlowFast R50 8*8 to test on those 13k videos, and with the downloaded Caffe2 model file I can get a result close to the Model Zoo accuracy. I did not have a chance to train my SlowFast fully due to limited GPUs; I tried, and it took 4 GPUs 20 days to train 100 epochs, while the default training schedule is 196 epochs. My result from the FB researcher's Caffe2 pretrained model is 74.8 top-1.
Their reported result is 77.0.
They also say that "testing Caffe2 pretrained model in PyTorch might have a small difference in performance". So I would assume the validation set downloaded from the Dropbox link is safe to use. |
Hi, thanks for replying. However, I don't think 74.8 is close to 77.0. I downloaded the dataset from the link you gave, and I found 19736 videos in the validation set. The only problem is that they have different formats: some videos are .mp4, some are .mkv, and some are .webm. Is it because you only counted .mp4 that you got 13k? Could you help me check? |
I think you are right. There are a lot of videos that are not in .mp4 format, and I did not count those in my previous response, sorry for the confusion. When I wrote the script, I only used the .mp4 files. That may be another reason my top-1 result is lower. My coworker mentioned he did another pass that unified all videos' formats after my test job. Depending on your input pipeline, maybe simply renaming .mkv to .mp4 works too. |
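A quick way to check for the .mp4-only counting issue discussed here is to tally the validation directory by container extension. This is a hedged sketch, not anyone's actual script; the function name and the assumption of a flat or nested directory of video files are mine.

```python
from collections import Counter
from pathlib import Path

def count_by_extension(val_dir):
    """Count video files per extension under val_dir (recursively).

    Returns a Counter mapping extensions like '.mp4' to file counts,
    so a gap such as 13734 .mp4 vs 19736 total clips becomes visible.
    """
    video_exts = {".mp4", ".mkv", ".webm"}
    return Counter(
        p.suffix.lower()
        for p in Path(val_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in video_exts
    )
```

If the counts from this check roughly match the numbers in the thread (13734 `.mp4` plus several thousand `.mkv`/`.webm`), that would confirm the discrepancy comes from globbing only `*.mp4`.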
Hi @busixingxing, I downloaded 'compress.tar.gz', but unpacking failed after 217 classes. I wonder if you can provide the md5 checksum. Some errors: tar: Skipping to next header |
Hi, I don't have the md5 either; I deleted the raw file after unpacking it. |
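Since no official checksum exists, two people who downloaded the same archive could still compare digests they compute themselves. A minimal sketch, assuming only the archive filename from the thread; the chunked read keeps memory use constant even for a multi-hundred-gigabyte file:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (path is illustrative):
# print(md5sum("compress.tar.gz"))
```

If two independently downloaded copies produce different digests, the one that fails `tar` extraction was likely truncated or corrupted in transit, and re-downloading (ideally with a resumable client) is the fix.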
The training set has 234619 .mp4 videos. The validation set has 19761 videos, not 13734: 13734 of them are .mp4, and the remaining videos have .webm or .mkv extensions. |
Could someone provide a new Dropbox link for the dataset (or similar)? The one in this thread has unfortunately expired. |
The URL is out of date. Would you mind sharing the Kinetics400 dataset again? |
@LiuChaoXD @lukas-larsson @daodao316 @makecent @KangSooHan |
@youngwanLEE this link is also expired: https://dl.dropboxusercontent.com/s/zda3dfkp52eklvn/compress.tar.gz |
@applesleam |
@youngwanLEE |
I saw you have prepared a copy of the Kinetics dataset, and I emailed [email protected] recently but haven't received any reply. If you have time, please check your email and contact me, thanks a lot!