Cannot reproduce top-1 acc 77.0% on Kinetics #197
Comments
Hi, thanks for playing with PySlowFast. I am not sure about your detailed setting (especially your dataset size and pre-processing), but one thing that seems to be wrong is the batch size you use. Could you try to make sure you have a batch size of 8 per GPU? If you only have 4 GPUs, you could change the LR following the linear scaling rule.
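As a concrete illustration of the linear scaling rule, here is a minimal sketch; the reference values (global batch 64 at base LR 0.1, as in configs/Kinetics/SLOWFAST_8x8_R50.yaml) and the helper function are illustrative assumptions, not part of the codebase.

```python
# Minimal sketch of the linear scaling rule (reference recipe assumed: batch 64, LR 0.1).
REF_BATCH = 64   # global batch size of the assumed reference recipe
REF_LR = 0.1     # base learning rate of the assumed reference recipe

def scaled_lr(num_gpus, batch_per_gpu=8, ref_batch=REF_BATCH, ref_lr=REF_LR):
    """Scale the base learning rate linearly with the global batch size."""
    global_batch = num_gpus * batch_per_gpu
    return ref_lr * global_batch / ref_batch

print(scaled_lr(8))  # 0.1  -> the reference 8-GPU setup
print(scaled_lr(4))  # 0.05 -> 4 GPUs at 8 clips per GPU (global batch 32)
```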
@takatosp1
The running mean and std are only calculated on each device (GPU), so the running mean across 8 samples would be different from the running mean across 16 samples. I think the more equivalent version would be a batch size of 32 with half of the original LR.
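To make the per-device statistics point concrete, here is a minimal PyTorch sketch, assuming plain (non-synchronized) BatchNorm; the channel count and tensor shape are arbitrary placeholders.

```python
import torch
import torch.nn as nn

# Sketch: BatchNorm running statistics come from the per-GPU mini-batch only.
# With 4 GPUs and a global batch of 32, each BN layer still sees 8 clips per step,
# matching the per-GPU statistics of the reference 8-GPU / batch-64 recipe.
bn = nn.BatchNorm3d(num_features=64)           # channel count is arbitrary here
per_gpu_clips = torch.randn(8, 64, 8, 56, 56)  # (N, C, T, H, W) on one device
_ = bn(per_gpu_clips)                          # running_mean / running_var updated from 8 clips
print(bn.running_mean.shape)                   # torch.Size([64])
```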
@takatosp1 thank you very much for your guidance!
@gooners1886 were you able to train it on 4 GPUs?
Also, when I train I3D from scratch, I get "split": "test_final", "top1_acc": "72.82", "top5_acc": "90.65", which is about 0.5 lower than mentioned in the model zoo. @takatosp1 Can you please reveal what you achieved with this code while benchmarking? When I test with the provided Caffe2 weights I get
For your reference, my dataset is the same as the dataset of the Non-local paper, as mentioned by Xiaolong here. I got a copy of it from him last year. It contains 234619 training videos and 19761 validation videos.
However, I get a warning, [WARNING: meters.py: 302]: clip count tensor([30, 30, 30, ..., 30, 30, 30]) ~= num clips 30, when I run the testing script on the validation data.
pyav=='8.0.1'
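For context on the "num clips 30" in that warning, a likely breakdown is the product of the test-time views; the sketch below uses config key names from PySlowFast's Kinetics configs, and the exact values should be treated as assumptions.

```python
# Hypothetical illustration of where "num clips 30" comes from at test time:
# each video is evaluated over several temporal views and spatial crops.
NUM_ENSEMBLE_VIEWS = 10  # TEST.NUM_ENSEMBLE_VIEWS (assumed value)
NUM_SPATIAL_CROPS = 3    # TEST.NUM_SPATIAL_CROPS (assumed value)

num_clips = NUM_ENSEMBLE_VIEWS * NUM_SPATIAL_CROPS
print(num_clips)  # 30 -- the per-video clip count that meters.py checks against
```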
@gurkirt Which pyav version did you use? pyav==8.0.2
Thanks @gurkirt for the kind clarification. @chunfuchen feel free to follow what @gurkirt described and you should be able to reproduce the result.
@takatosp1, is it expected to get 72.8 instead of 73.4 with the current setup? I know this is a small gap; I just want to make sure that I am not making any errors here.
@gurkirt is it possible I could get a copy of Kinetics-400 from you? Thanks.
You can find it here: facebookresearch/video-nonlocal-net#67
@takatosp1 I have followed @gurkirt's instructions to download the data. Thanks.
@chunfuchen same situation. Is there any way to get the intact Kinetics dataset?
Have you reproduced the result of SLOWFAST_8x8_R50? Can you share your config when training with 4 GPUs?
I am trying to reproduce SLOWFAST_8x8_R50 from scratch. Can you share the configuration to train on a 4-GPU machine? Thank you.
I use the config configs/Kinetics/SLOWFAST_8x8_R50.yaml. The data I use is the copy shared by Xiaolong Wang for the Non-local paper, which contains 234643 training videos and 19761 val videos. I do not have 16 nodes, so I trained the model on 2x8 V100 cards with a mini-batch of 8 on each card, and the base learning rate is scaled to 0.2. The top-1 accuracy is 75.8, 1.2 lower than the official result. For SLOWFAST_8x8_R101_101_101.yaml, the reproduced top-1 acc is 77.2, 0.7 lower than the official result. For SLOW_8x8_R50.yaml, the reproduced top-1 acc is 74.0, 0.8 lower than the official result. Does anyone else suffer from this problem? Is this caused by incomplete data?
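As a quick sanity check on that learning rate, a sketch of the arithmetic, assuming the reference recipe of a global batch of 64 at base LR 0.1:

```python
# 2 nodes x 8 GPUs x 8 clips/GPU = global batch 128; linear scaling from 0.1 @ 64 gives 0.2.
ref_batch, ref_lr = 64, 0.1
global_batch = 2 * 8 * 8
print(ref_lr * global_batch / ref_batch)  # 0.2
```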
I tried to train a model from scratch on Kinetics-400 using the same data as the Non-local Network.
The config file is configs/Kinetics/SLOWFAST_8x8_R50.yaml.
TRAIN.BATCH_SIZE is set to 64.
The GPUs are 4x P40.
But I got a top-1 acc of 74.13% on the validation set:
[INFO: logging.py: 67]: json_stats: {"split": "test_final", "top1_acc": "74.13", "top5_acc": "91.08"}
Is there something I need to modify to reproduce the 77% reported in the model zoo?