## Facts about training

All runs train for 20 epochs unless noted otherwise.

Model architecture: ResNet18

All training uses an SGD optimiser with momentum:

```python
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```

20 epochs take about 28 minutes on a Colab GPU.
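
For reference, a minimal setup sketch consistent with the facts above. The dataset and the exact ResNet18 variant aren't stated in these notes, so a stock 10-class torchvision ResNet18 is assumed here:

```python
import torch.optim as optim
import torchvision

# Assumed setup: the notes don't name the dataset or the ResNet18 variant,
# so a plain torchvision ResNet18 with 10 output classes stands in.
net = torchvision.models.resnet18(num_classes=10).cuda()
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```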

  1. We train for 20 epochs with the one-cycle policy, learning rate 0.001 up to 0.01 and back. The network overfits at 94% training accuracy; test accuracy is 90.67%. Accuracy keeps increasing right up to epoch 20. NO ANNIHILATION.

  2. We train for 20 epochs with the one-cycle policy, learning rate 0.01 up to 0.1 and back. The network overfits at 96% training accuracy; test accuracy is 91.31%. Accuracy keeps increasing right up to epoch 20. NO ANNIHILATION.

  3. We train for 20 epochs total: a one-cycle policy with learning rate 0.01 up to 0.8 and back over the first 16 epochs, then 4 epochs of learning-rate annihilation, i.e. the LR is dropped to 0.001 and training continues. The network overfits at 94% training accuracy; test accuracy is 90.6%. Accuracy converges by epoch 20. WITH ANNIHILATION.

  4. We train for 24 epochs with the one-cycle policy, learning rate 0.01 up to 0.1 and back, then train for 2 more epochs at LR 0.001 (see the sketch after this list). The network overfits at 98% training accuracy; test accuracy is 93.4%. Accuracy converges by epoch 26. WITH ANNIHILATION.

  5. We train for 28 epochs with the one-cycle policy, learning rate 0.01 up to 0.1 and back, then train for 4 more epochs at LR 0.001. The network overfits at 98% training accuracy; test accuracy is 93.2%. Accuracy converges by epoch 32. WITH ANNIHILATION, 4 EPOCHS.
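
A minimal sketch of how one of these runs could be wired up with PyTorch's built-in `OneCycleLR` (the notes don't say whether the original runs used `OneCycleLR` or a hand-rolled schedule; `net` and `train_loader` are assumed to already exist, and the numbers mirror experiment 4):

```python
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR

cycle_epochs, annihilation_epochs = 24, 2      # experiment 4: 24-epoch cycle + 2 epochs at 0.001
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,                                # peak of the cycle
    epochs=cycle_epochs,
    steps_per_epoch=len(train_loader),
    pct_start=0.5,                             # ramp up for half the cycle, then back down
    div_factor=10,                             # start at max_lr / 10 = 0.01
    final_div_factor=1,                        # end the cycle back at roughly 0.01
    cycle_momentum=False,                      # momentum stays fixed at 0.9, as in these runs
)

for epoch in range(cycle_epochs + annihilation_epochs):
    if epoch == cycle_epochs:
        # Annihilation phase: drop the LR to 0.001 and keep it there.
        for group in optimizer.param_groups:
            group["lr"] = 0.001
    for inputs, targets in train_loader:
        inputs, targets = inputs.cuda(), targets.cuda()
        optimizer.zero_grad()
        loss = F.cross_entropy(net(inputs), targets)
        loss.backward()
        optimizer.step()
        if epoch < cycle_epochs:
            scheduler.step()                   # one-cycle LR update once per batch
```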

Epochs are a better measure than wall-clock time!!

## Thoughts

  1. Haven't tried using AdamW.
  2. Haven't tried decreasing momentum while increasing the LR, and vice versa (see the sketch after this list).
  3. Haven't checked the batch-size claims.
  4. Used only ResNet18.
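
On thought 2: PyTorch's `OneCycleLR` already supports cycling momentum inversely to the learning rate. A sketch of what such a run might look like (again assuming `net` and `train_loader`; this is not something these notes actually ran):

```python
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.95, weight_decay=5e-4)
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,
    epochs=20,
    steps_per_epoch=len(train_loader),
    cycle_momentum=True,      # decrease momentum as the LR rises, increase it as the LR falls
    base_momentum=0.85,       # momentum at the LR peak
    max_momentum=0.95,        # momentum at the start and end of the cycle
)
```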