All training runs are 20 epochs unless noted otherwise.
MODEL ARCH: RESNET18
All training is done with an SGD optimiser with momentum:

import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
20 epochs take 28 minutes on a Colab GPU.
-
We train for 20 epochs with the 1 cycle policy: learning rate from 0.001 up to 0.01 and back. The network overfits at 94% train accuracy and gets a test accuracy of 90.67%. Test accuracy keeps increasing through all 20 epochs. NO ANNIHILATION
-
We train for 20 epochs with the 1 cycle policy: learning rate from 0.01 up to 0.1 and back. The network overfits at 96% train accuracy; test accuracy is 91.31%. Test accuracy keeps increasing through all 20 epochs. NO ANNIHILATION
-
We train for 20 epochs total with the 1 cycle policy: learning rate from 0.01 up to 0.8 and back over the first 16 epochs, then 4 epochs of learning rate annihilation, i.e. the LR is dropped to 0.001 and training continues (see the sketch after this list). The network overfits at 94% train accuracy; test accuracy is 90.6%. Accuracy converges by 20 epochs. WITH ANNIHILATION
-
We train for 24 epochs with the 1 cycle policy: learning rate from 0.01 up to 0.1 and back. Then we train for 2 more epochs at LR 0.001. The network overfits at 98% train accuracy; test accuracy is 93.4%. Accuracy converges by 26 epochs. WITH ANNIHILATION
-
We train for 28 epochs with the 1 cycle policy: learning rate from 0.01 up to 0.1 and back. Then we train for 4 more epochs at LR 0.001. The network overfits at 98% train accuracy; test accuracy is 93.2%. Accuracy converges by 32 epochs. WITH ANNIHILATION, 4 EPOCHS
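Below is a minimal sketch of how the 24 + 2 run can be wired up with PyTorch's built-in OneCycleLR, assuming `net` is the ResNet18 and `train_loader` the training loader from the setup above; the `run_epoch` helper is our own, the rest is the standard PyTorch API.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR

device = "cuda" if torch.cuda.is_available() else "cpu"
net = net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# (OneCycleLR overrides the constructor lr: training starts at max_lr / div_factor)

cycle_epochs, annihilation_epochs = 24, 2

# 1 cycle phase: LR ramps 0.01 -> 0.1 and anneals back to 0.01 over 24 epochs.
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,
    div_factor=10,          # start at max_lr / 10 = 0.01
    final_div_factor=1.0,   # anneal back down to 0.01, not below
    cycle_momentum=False,   # momentum held fixed at 0.9, per the notes
    epochs=cycle_epochs,
    steps_per_epoch=len(train_loader),
)

def run_epoch(schedule=None):
    net.train()
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(net(inputs), targets)
        loss.backward()
        optimizer.step()
        if schedule is not None:
            schedule.step()   # OneCycleLR steps once per batch

for _ in range(cycle_epochs):
    run_epoch(scheduler)

# Annihilation phase: pin the LR at 0.001 and train a few more epochs.
for group in optimizer.param_groups:
    group["lr"] = 0.001
for _ in range(annihilation_epochs):
    run_epoch()
```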
Epochs are a better measure than time!!
- Haven't tried using AdamW
- Haven't tried decreasing momentum while increasing the LR and vice versa (see the sketch below).
- Haven't checked batch sizes against the claims.
- Used only ResNet18.
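The momentum point above maps directly onto an OneCycleLR option. A minimal sketch of that untried variant, reusing `optimizer` and `train_loader` from the run above (the momentum values shown are the PyTorch defaults):

```python
from torch.optim.lr_scheduler import OneCycleLR

# Untried variant: cycle momentum inversely to the LR, as the 1 cycle paper
# suggests. Momentum falls toward base_momentum while the LR rises, then
# climbs back to max_momentum while the LR anneals.
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,
    epochs=20,
    steps_per_epoch=len(train_loader),
    cycle_momentum=True,   # the default; enables the inverse momentum schedule
    base_momentum=0.85,
    max_momentum=0.95,
)
```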