
Cannot reproduce top-1 acc 77.0% on Kinetics #197

Open
gooners1886 opened this issue May 12, 2020 · 17 comments
Labels: question (Further information is requested)

Comments

gooners1886 commented May 12, 2020

I tried to train a model from scratch on Kinetics-400 using the same data as the Non-local Network. The config file is configs/Kinetics/SLOWFAST_8x8_R50.yaml, TRAIN.BATCH_SIZE is set to 64, and the hardware is 4x P40 GPUs. But I only got 74.13% top-1 accuracy on the validation set:

[INFO: logging.py: 67]: json_stats: {"split": "test_final", "top1_acc": "74.13", "top5_acc": "91.08"}

Is there something I need to modify to reproduce the 77% reported in the model zoo?

@haooooooqi haooooooqi added the question Further information is requested label May 12, 2020
haooooooqi (Contributor)

Hi, thanks for playing with PySlowFast!

I am not sure about your detailed setup (especially your dataset size and pre-processing), but one thing that does look wrong is the batch size. Could you make sure you have a batch size of 8 per GPU? If you only have 4 GPUs, you can change the LR following the linear scaling rule.
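
For anyone following along, here is a minimal sketch of the linear scaling rule mentioned above. The reference values (SOLVER.BASE_LR = 0.1 at TRAIN.BATCH_SIZE = 64) are an assumption based on the default SLOWFAST_8x8_R50 config, not something confirmed in this thread; substitute your own config values.

```python
# Linear scaling rule: the learning rate scales in proportion to the
# total mini-batch size. REFERENCE_BASE_LR = 0.1 at batch size 64 is an
# assumed default from the SLOWFAST_8x8_R50 config.
REFERENCE_BATCH_SIZE = 64
REFERENCE_BASE_LR = 0.1

def scaled_lr(total_batch_size: int) -> float:
    """Scale the base LR linearly with the total mini-batch size."""
    return REFERENCE_BASE_LR * total_batch_size / REFERENCE_BATCH_SIZE

print(scaled_lr(32))   # 0.05 -> e.g. 4 GPUs x 8 clips per GPU
print(scaled_lr(128))  # 0.2  -> e.g. 16 GPUs x 8 clips per GPU
```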

gooners1886 (Author)

@takatosp1

  1. My dataset is the same as the Non-local Network's: 234584 samples in the training set and 19760 samples in the validation set. I did no pre-processing because the Non-local Network videos already have the shorter side resized to 256. Is this config right?

  2. There is a difference between your config and mine in the number of GPUs:
     - your setting: 8 GPUs, batch size 8 per GPU, TRAIN.BATCH_SIZE set to 64
     - my setting: 4 GPUs, batch size 16 per GPU, TRAIN.BATCH_SIZE set to 64
     Since TRAIN.BATCH_SIZE is 64 in both cases, I think they should be equivalent. Are the two settings above equal, or do I need to change the LR?

haooooooqi (Contributor) commented May 12, 2020

The running mean and std are computed on each device (GPU) separately, so the running mean over 8 samples differs from the running mean over 16 samples. I think the more equivalent version would be a batch size of 32 with half of the original LR.
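
To illustrate the point about per-device statistics, here is a toy sketch (plain PyTorch, not PySlowFast internals): batch-norm layers compute statistics over the local per-GPU batch, so splitting the same 64 samples as 8-per-GPU versus 16-per-GPU yields different normalization statistics even though the global TRAIN.BATCH_SIZE is identical.

```python
import torch

# Toy illustration: statistics over a local batch of 8 differ from
# statistics over a local batch of 16, even when both are drawn from the
# same global batch. This is why per-GPU batch size matters for BatchNorm.
x = torch.randn(16, 8)            # a local batch of 16 samples, 8 features
stats_over_8 = x[:8].mean(dim=0)  # what a GPU holding 8 samples would see
stats_over_16 = x.mean(dim=0)     # what a GPU holding 16 samples would see
print(torch.allclose(stats_over_8, stats_over_16))  # False in general
```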

gooners1886 (Author)

@takatosp1 thank you very much for your guidance!
Another question: when I train Kinetics-400 with 4 GPUs, batch size 32, and half the LR, should I also modify SOLVER.MAX_EPOCH, WARMUP_EPOCHS, and WARMUP_START_LR?

gurkirt commented May 30, 2020

@gooners1886 were you able to train it on 4 GPUs?

gurkirt commented May 30, 2020

Also, when I trained I3D from scratch, I got "split": "test_final", "top1_acc": "72.82", "top5_acc": "90.65", which is about 0.5 lower than mentioned in the model zoo. @takatosp1 can you please reveal what you achieved with this code while benchmarking?

When I test with the provided Caffe2 weights I get:
I3D_R50 {"split": "test_final", "top1_acc": "73.04", "top5_acc": "90.34"}
SLOWFAST {"split": "test_final", "top1_acc": "76.44", "top5_acc": "92.22"}

For your reference, my dataset is the same as the dataset of the Non-local paper, as mentioned by Xiaolong here. I got a copy of it from him last year. It contains 234619 training videos and 19761 validation videos. However, when I run the testing script on the validation data I get a warning: [WARNING: meters.py: 302]: clip count tensor([30, 30, 30, ..., 30, 30, 30]) ~= num clips 30.

pyav==8.0.1
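
For context on where the 30 clips in that warning come from: if I read the default Kinetics test configs correctly, PySlowFast evaluates with TEST.NUM_ENSEMBLE_VIEWS = 10 temporal clips times TEST.NUM_SPATIAL_CROPS = 3 crops per video, i.e. 30 views, and averages the predictions. A minimal sketch of that aggregation (the actual meter internals may differ):

```python
import torch

def video_top1(view_logits: torch.Tensor) -> int:
    """view_logits: (num_views, num_classes) logits for one video,
    e.g. 10 temporal clips x 3 spatial crops = 30 views."""
    probs = torch.softmax(view_logits, dim=1)  # per-view class probabilities
    return int(probs.mean(dim=0).argmax())     # average over views, then argmax
```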

chunfuchen commented Jun 16, 2020

@gurkirt which SLOWFAST model did you test? I tested the SLOWFAST_8x8_R50.pkl model but I only get {"split": "test_final", "top1_acc": "74.55", "top5_acc": "91.36"}, which is about 2.5% worse. I have 19742 validation videos.

pyav==8.0.2
ffmpeg==4.2.3

haooooooqi (Contributor)

Thanks @gurkirt for the kind clarification. @chunfuchen feel free to follow what @gurkirt described and you should be able to reproduce the result.

gurkirt commented Jun 17, 2020

@takatosp1, is it expected to get 72.8 instead of 73.4 with the current setup? I know this is a small gap; I just want to make sure that I am not making any errors here.

gurkirt commented Jun 17, 2020

> @gurkirt which SLOWFAST model did you test? I tested the SLOWFAST_8x8_R50.pkl model

@chunfuchen I trained I3D_8x8_R50.cfg from scratch and got 72.8.

chunfuchen

@gurkirt is it possible I could get a copy of Kinetics-400 from you? Thanks.

gurkirt commented Jun 17, 2020

You can find it here: facebookresearch/video-nonlocal-net#67

chunfuchen

@takatosp1 I have followed @gurkirt's pointer to download the data.
I tested a model (SLOWFAST_8x8_R50) that has 77% top-1 accuracy on the model zoo page, but I only get 76.44%, which is 0.6% lower. (I did not retrain it; I just tested the model provided on GitHub.)
I know the model was trained and tested under Caffe2; do you expect a 0.6% gap when switching to PyTorch?

Thanks.

youngwanLEE

@chunfuchen same situation here.

Is there any way to get the intact Kinetics dataset?

bqhuyy commented Aug 9, 2020

> @takatosp1 thank you very much for your guidance!
> Another question: when I train Kinetics-400 with 4 GPUs, batch size 32, and half the LR, should I also modify SOLVER.MAX_EPOCH, WARMUP_EPOCHS, and WARMUP_START_LR?

Have you reproduced the result of SLOWFAST_8x8_R50? Can you share your config for training with 4 GPUs?

bqhuyy commented Aug 9, 2020

> The running mean and std are computed on each device (GPU) separately, so the running mean over 8 samples differs from the running mean over 16 samples. I think the more equivalent version would be a batch size of 32 with half of the original LR.

I am trying to reproduce SLOWFAST_8x8_R50 from scratch. Can you share the configuration for training on a 4-GPU machine? Thank you.

BoPang1996

I use the config configs/Kinetics/SLOWFAST_8x8_R50.yaml. The data I use is the copy shared by Xiaolong Wang for the Non-local paper, which contains 234643 training videos and 19761 validation videos. I do not have 16 nodes, so I trained the model on 2x8 V100 cards with a mini-batch of 8 on each card. The base learning rate is scaled to 0.2. The top-1 accuracy is 75.8, 1.2 lower than the official result.

For SLOWFAST_8x8_R101_101_101.yaml, the reproduced top-1 acc is 77.2, 0.7 lower than the official result.

For SLOW_8x8_R50.yaml, the reproduced top-1 acc is 74.0, 0.8 lower than the official result.

Does anyone else suffer from this problem? Is it caused by incomplete data?
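
Given that the training-set counts reported in this thread disagree (234584, 234619, 234643), a quick sanity check is to count the entries in your annotation CSVs. A minimal sketch, assuming the usual PySlowFast layout of one "path_to_video label" entry per line; the file names train.csv / val.csv are illustrative:

```python
def count_entries(csv_path: str) -> int:
    """Count non-empty lines, i.e. videos listed in the annotation file."""
    with open(csv_path) as f:
        return sum(1 for line in f if line.strip())

# Compare against the counts reported above (~234.6k train, ~19.76k val).
print("train:", count_entries("train.csv"))
print("val:", count_entries("val.csv"))
```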
