## Facts about training

All runs train for 20 epochs unless noted otherwise.

Model architecture: ResNet18

All training uses an SGD optimiser with momentum:

```python
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```

20 epochs take about 28 minutes on a Colab GPU.
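
For reference, a minimal setup sketch consistent with the facts above. The dataset and the exact ResNet18 variant aren't stated in these notes, so a stock 10-class torchvision ResNet18 is assumed here:

```python
import torch.optim as optim
import torchvision

# Assumed setup: the notes don't name the dataset or the ResNet18 variant,
# so a plain torchvision ResNet18 with 10 output classes stands in.
net = torchvision.models.resnet18(num_classes=10).cuda()
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```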

  1. We train for 20 epochs with the one-cycle policy, learning rate 0.001 up to 0.01 and back. The network overfits at 94% training accuracy; test accuracy is 90.67%. Accuracy keeps increasing right up to epoch 20. NO ANNIHILATION.

  2. We train for 20 epochs with the one-cycle policy, learning rate 0.01 up to 0.1 and back. The network overfits at 96% training accuracy; test accuracy is 91.31%. Accuracy keeps increasing right up to epoch 20. NO ANNIHILATION.

  3. We train for 20 epochs total: a one-cycle policy with learning rate 0.01 up to 0.8 and back over the first 16 epochs, then 4 epochs of learning-rate annihilation, i.e. the LR is dropped to 0.001 and training continues. The network overfits at 94% training accuracy; test accuracy is 90.6%. Accuracy converges by epoch 20. WITH ANNIHILATION.

  4. We train for 24 epochs with the one-cycle policy, learning rate 0.01 up to 0.1 and back, then train for 2 more epochs at LR 0.001 (see the sketch after this list). The network overfits at 98% training accuracy; test accuracy is 93.4%. Accuracy converges by epoch 26. WITH ANNIHILATION.

  5. We train for 28 epochs with the one-cycle policy, learning rate 0.01 up to 0.1 and back, then train for 4 more epochs at LR 0.001. The network overfits at 98% training accuracy; test accuracy is 93.2%. Accuracy converges by epoch 32. WITH ANNIHILATION, 4 EPOCHS.
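
A minimal sketch of how one of these runs could be wired up with PyTorch's built-in `OneCycleLR` (the notes don't say whether the original runs used `OneCycleLR` or a hand-rolled schedule; `net` and `train_loader` are assumed to already exist, and the numbers mirror experiment 4):

```python
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR

cycle_epochs, annihilation_epochs = 24, 2      # experiment 4: 24-epoch cycle + 2 epochs at 0.001
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,                                # peak of the cycle
    epochs=cycle_epochs,
    steps_per_epoch=len(train_loader),
    pct_start=0.5,                             # ramp up for half the cycle, then back down
    div_factor=10,                             # start at max_lr / 10 = 0.01
    final_div_factor=1,                        # end the cycle back at roughly 0.01
    cycle_momentum=False,                      # momentum stays fixed at 0.9, as in these runs
)

for epoch in range(cycle_epochs + annihilation_epochs):
    if epoch == cycle_epochs:
        # Annihilation phase: drop the LR to 0.001 and keep it there.
        for group in optimizer.param_groups:
            group["lr"] = 0.001
    for inputs, targets in train_loader:
        inputs, targets = inputs.cuda(), targets.cuda()
        optimizer.zero_grad()
        loss = F.cross_entropy(net(inputs), targets)
        loss.backward()
        optimizer.step()
        if epoch < cycle_epochs:
            scheduler.step()                   # one-cycle LR update once per batch
```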

Epochs are a better measure than wall-clock time!!

## Thoughts

  1. Haven't tried using AdamW.
  2. Haven't tried decreasing momentum while increasing the LR, and vice versa (see the sketch after this list).
  3. Haven't checked the batch-size claims.
  4. Used only ResNet18.
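
On thought 2: PyTorch's `OneCycleLR` already supports cycling momentum inversely to the learning rate. A sketch of what such a run might look like (again assuming `net` and `train_loader`; this is not something these notes actually ran):

```python
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.95, weight_decay=5e-4)
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,
    epochs=20,
    steps_per_epoch=len(train_loader),
    cycle_momentum=True,      # decrease momentum as the LR rises, increase it as the LR falls
    base_momentum=0.85,       # momentum at the LR peak
    max_momentum=0.95,        # momentum at the start and end of the cycle
)
```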