Reproducing results #18

Open

zehzhang opened this issue Sep 7, 2021 · 4 comments

@zehzhang

zehzhang commented Sep 7, 2021

Hi,

Thanks for the great work.

I'm having the same issue as #5 even when I tested the models with the same val split.

I played with Swin-T and Swin-B, and both of them gave top-1 accuracy 0.4%~0.5% lower than reported. The results are still pretty good, but I just want to make sure I am not doing anything wrong.

Would you confirm that the models and the split files uploaded are the correct ones?

Also, if anyone has successfully reproduced the results, please kindly comment here about whether there is anything else I need to do besides downloading the models and configs and running the test scripts.
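
For reference, here is roughly how I run the evaluation (a sketch; the config and checkpoint paths are my assumptions based on the model zoo, so substitute the files you actually downloaded):

```python
# Sketch of my test invocation, wrapping the repo's tools/test.py.
# The config/checkpoint paths below are assumptions; replace them with
# the actual files downloaded from the model zoo.
import subprocess

config = "configs/recognition/swin/swin_tiny_patch244_window877_kinetics400_1k.py"
checkpoint = "checkpoints/swin_tiny_patch244_window877_kinetics400_1k.pth"

subprocess.run(
    ["python", "tools/test.py", config, checkpoint,
     "--eval", "top_k_accuracy"],
    check=True,
)
```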

Thanks,

@hust-nj
Member

hust-nj commented Sep 7, 2021

Hi, before training and testing, we preprocess the Kinetics-400 training and validation videos by resizing them so that the video height is 256; this may cause a small difference. You can contact us at [email protected] to discuss more details.
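
A minimal sketch of that resizing step (the directory layout and the choice of ffmpeg with libx264 are illustrative assumptions, not the exact script we used):

```python
# Resize every video so its height is 256 px while keeping the aspect ratio.
# Paths are placeholders; ffmpeg must be on PATH.
import subprocess
from pathlib import Path

SRC = Path("kinetics400/val")        # assumed location of the raw videos
DST = Path("kinetics400/val_256")    # assumed output directory
DST.mkdir(parents=True, exist_ok=True)

for video in SRC.glob("*.mp4"):
    out = DST / video.name
    # scale=-2:256 sets height to 256 and picks an even width that
    # preserves the aspect ratio (h264 requires even dimensions).
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video),
         "-vf", "scale=-2:256",
         "-c:v", "libx264", "-c:a", "copy", str(out)],
        check=True,
    )
```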

@zehzhang
Author

zehzhang commented Sep 14, 2021

> Facing the same problem. I use the annotation files in this repo and test the provided pretrained models without any extra modification. Swin-T and Swin-B achieved 78.4% and 80.1% top-1 accuracy on K400, which seems slightly worse than the reported results. @zehzhang Have you resolved this issue? If so, could you please share your solution?

Thanks for confirming the problem. I got a similar decrease with Swin-T (-0.4% top-1 acc) and Swin-B pretrained on ImageNet-21k (-0.5% top-1 acc). I'm reaching out to the other first co-author (referred to by @hust-nj) and hopefully will figure out what is going on soon. I will keep this thread updated.

@hust-nj
Member

hust-nj commented Sep 19, 2021

After a careful comparison, we found that the performance gap is due to a slight difference in the data. Our Kinetics-400 data at 256 resolution was obtained (with broken videos removed) from the non-local networks repo, which has also been used in many other lines of work.

More details and the data download link can be found at https://github.com/youngwanLEE/VoV3D/blob/main/DATA.md#kinetics-400 and facebookresearch/video-nonlocal-net#67.
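
If you want to check whether your copy of the data matches the annotation files, a quick sanity check is to compare the annotation list against the videos on disk (a sketch; the annotation file name and its "<filename> <label>" format are assumptions based on the mmaction2-style lists in this repo):

```python
# Cross-check the val annotation list against the downloaded videos to
# spot missing or removed (broken) clips. Paths are placeholders.
from pathlib import Path

ann_file = Path("data/kinetics400/kinetics400_val_list_videos.txt")
video_dir = Path("kinetics400/val_256")

# Each annotation line is assumed to be "<filename> <label>".
listed = {line.split()[0] for line in ann_file.read_text().splitlines() if line.strip()}
on_disk = {p.name for p in video_dir.glob("*.mp4")}

print(f"{len(listed)} videos in annotation, {len(on_disk)} on disk")
print(f"missing from disk: {len(listed - on_disk)}")
print(f"not in annotation: {len(on_disk - listed)}")
```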

@dragen1860

@zehzhang Hi, thanks for raising this issue. Did you try training Video Swin from scratch, without ImageNet-21k pretraining, and did the accuracy drop severely? Thank you.
