Reproducing results #18

Open

zehzhang opened this issue Sep 7, 2021 · 4 comments

@zehzhang

zehzhang commented Sep 7, 2021

Hi,

Thanks for the great work.

I'm having the same issue as #5 even when I tested the models with the same val split.

I played with Swin-T and Swin-B, and both of them gave top-1 accuracy 0.4%~0.5% lower than reported. The results are still pretty good, but I just want to make sure I am not doing anything wrong.

Would you confirm that the models and the split files uploaded are the correct ones?

Also, if anyone has successfully reproduced the results, please kindly comment here about whether there is anything else I need to do besides downloading the models and configs and running the test scripts.
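
For reference, here is roughly how I run the evaluation (a sketch; the config and checkpoint paths are my assumptions based on the model zoo, so substitute the files you actually downloaded):

```python
# Sketch of my test invocation, wrapping the repo's tools/test.py.
# The config/checkpoint paths below are assumptions; replace them with
# the actual files downloaded from the model zoo.
import subprocess

config = "configs/recognition/swin/swin_tiny_patch244_window877_kinetics400_1k.py"
checkpoint = "checkpoints/swin_tiny_patch244_window877_kinetics400_1k.pth"

subprocess.run(
    ["python", "tools/test.py", config, checkpoint,
     "--eval", "top_k_accuracy"],
    check=True,
)
```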

Thanks,

@hust-nj
Member

hust-nj commented Sep 7, 2021

Hi, before training and testing, we preprocess the Kinetics-400 training and validation videos by resizing them so that the video height is 256; this may cause a small difference. You can contact us at [email protected] to discuss more details.
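
A minimal sketch of that resizing step (the directory layout and the choice of ffmpeg with libx264 are illustrative assumptions, not the exact script we used):

```python
# Resize every video so its height is 256 px while keeping the aspect ratio.
# Paths are placeholders; ffmpeg must be on PATH.
import subprocess
from pathlib import Path

SRC = Path("kinetics400/val")        # assumed location of the raw videos
DST = Path("kinetics400/val_256")    # assumed output directory
DST.mkdir(parents=True, exist_ok=True)

for video in SRC.glob("*.mp4"):
    out = DST / video.name
    # scale=-2:256 sets height to 256 and picks an even width that
    # preserves the aspect ratio (h264 requires even dimensions).
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video),
         "-vf", "scale=-2:256",
         "-c:v", "libx264", "-c:a", "copy", str(out)],
        check=True,
    )
```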

@zehzhang
Author

zehzhang commented Sep 14, 2021

> Facing the same problem. I use the annotation files in this repo and test the provided pretrained models without any extra modification. Swin-T and Swin-B achieved 78.4% and 80.1% top-1 accuracy on K400, which seems slightly worse than the reported results. @zehzhang Have you resolved this issue? If so, could you please share your solution?

Thanks for confirming the problem. I got a similar decrease with Swin-T (-0.4% top-1 acc) and Swin-B pretrained on ImageNet-21k (-0.5% top-1 acc). I'm reaching out to the other first co-author (referred to by @hust-nj) and hopefully will figure out what is going on soon. I will keep this thread updated.

@hust-nj
Member

hust-nj commented Sep 19, 2021

After a careful comparison, we found that the performance gap is due to a slight difference in the data. Our Kinetics-400 data at 256 resolution was obtained (with broken videos removed) from the non-local networks repo, which has also been used in many other lines of work.

More details and the data download link can be found at https://github.com/youngwanLEE/VoV3D/blob/main/DATA.md#kinetics-400 and facebookresearch/video-nonlocal-net#67.
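
If you want to check whether your copy of the data matches the annotation files, a quick sanity check is to compare the annotation list against the videos on disk (a sketch; the annotation file name and its "<filename> <label>" format are assumptions based on the mmaction2-style lists in this repo):

```python
# Cross-check the val annotation list against the downloaded videos to
# spot missing or removed (broken) clips. Paths are placeholders.
from pathlib import Path

ann_file = Path("data/kinetics400/kinetics400_val_list_videos.txt")
video_dir = Path("kinetics400/val_256")

# Each annotation line is assumed to be "<filename> <label>".
listed = {line.split()[0] for line in ann_file.read_text().splitlines() if line.strip()}
on_disk = {p.name for p in video_dir.glob("*.mp4")}

print(f"{len(listed)} videos in annotation, {len(on_disk)} on disk")
print(f"missing from disk: {len(listed - on_disk)}")
print(f"not in annotation: {len(on_disk - listed)}")
```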

@dragen1860

@zehzhang Hi, thanks for raising this issue. Did you try training Video Swin from scratch, without ImageNet-21k pretraining, and did the accuracy drop severely? Thank you.
