Divergent loss with SimCLR #1633

Open
antho214 opened this issue Aug 15, 2024 · 5 comments
@antho214

Sometimes, when training with the SimCLR method, the loss diverges (see attached screenshot). I wonder if anyone has experienced this kind of issue when training with SimCLR. It has happened to me on several occasions with ResNet-18 and ResNet-50 models.
I don't think this is an issue with the code, but if anyone has seen this kind of problem before, I would be grateful for your input.

[Screenshot 2024-08-15 at 07 45 03: divergent training loss curve]

Here's some information about the training hyper-parameters:

  • multi-GPU training with a batch_size of 256 per GPU
  • effective batch_size: 1024
  • criterion: NTXentLoss with a temperature of 0.1 and gather_distributed=True (see the sketch below)
  • optimizer: LARS with a base learning rate of 0.3 and otherwise default parameters
  • scheduler: CosineWarmupScheduler with 10k warmup steps
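
For reference, a minimal sketch of the criterion setup (the LARS and scheduler wiring are omitted since their exact implementations aren't shown here):

```python
from lightly.loss import NTXentLoss

# Criterion as listed above; gather_distributed=True gathers negatives across
# all GPUs so the loss sees the full effective batch of 1024.
criterion = NTXentLoss(temperature=0.1, gather_distributed=True)
```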
@guarin
Contributor

guarin commented Aug 16, 2024

This looks interesting, I haven't encountered this before. What type of data are you using? And do you use sync batchnorm?

@antho214
Author

I am using microscopy data (large images, typically 2000x2000 pixels) from which I randomly crop 224x224-pixel images. Using grid-like sampling, I can generate >750k crops for training.

I did set sync_batchnorm=True in the trainer.

Something else I realised is that the loss becomes constant at 7.624 (mean, min, and max; I tracked these values as well). This value roughly corresponds to the loss I get from two random vectors of size 1024x128 in the NTXentLoss.
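
For what it's worth, 7.624 is essentially ln(2·1024 − 1), the NT-Xent value when all logits are (near-)equal. A minimal sketch to check this, assuming lightly's NTXentLoss and the batch/embedding sizes above:

```python
import math

import torch
from lightly.loss import NTXentLoss

criterion = NTXentLoss(temperature=0.1)

# Random, uncorrelated embeddings (batch size 1024, dim 128) land roughly at chance level.
z0, z1 = torch.randn(1024, 128), torch.randn(1024, 128)
print(criterion(z0, z1).item())

# Fully collapsed embeddings give exactly the chance-level loss ln(2N - 1),
# because every sample is scored uniformly against 2N - 1 candidates.
z = torch.randn(1, 128).repeat(1024, 1)
print(criterion(z, z).item())      # ≈ 7.624
print(math.log(2 * 1024 - 1))      # ln(2047) ≈ 7.624
```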

@IgorSusmelj
Contributor

IgorSusmelj commented Aug 21, 2024

I could imagine that you are facing some numerical instabilities:

  • Do you use fp16 or bf16? (If you use fp16, try switching to bf16, as it tends to be more stable.)
  • You could plot histograms of the weights in TensorBoard to see if some values are getting very large; if so, you could try additional gradient clipping, weight clipping, or a stronger weight decay (see the sketch at the end of this comment).
  • (More of a shot in the dark:) maybe also look at some of the augmented images. Perhaps the representations all collapse to the same values because the samples look too similar to each other. You could play with the augmentations to fix this.

We have used SimCLR on all kinds of data, including medical and microscopy images, and haven't had issues.
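
A minimal sketch of what the precision and gradient-clipping suggestions could look like with a PyTorch Lightning Trainer (argument names follow recent Lightning versions; the values are placeholders, not tuned recommendations):

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    accelerator="gpu",
    devices=4,                       # 4 GPUs x 256 samples = effective batch size 1024
    strategy="ddp",
    sync_batchnorm=True,
    precision="bf16-mixed",          # prefer bf16 over fp16 if mixed precision is used
    gradient_clip_val=1.0,           # clip the gradient norm to dampen loss spikes
    gradient_clip_algorithm="norm",
)
```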

@antho214
Author

Thank you for taking the time to answer.

  • I do not use 16-bit precision when training. I am using the Trainer's default, which is 32-true according to the documentation.
  • I will investigate whether the weights are getting large using TensorBoard (a logging sketch is at the end of this comment).
  • I am tracking the augmentations over time, and they do look different from each other.

Again, thank you for the feedback. I will post an update if I find a fix/solution.
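
A rough sketch of how the weight histograms can be logged (assuming a TensorBoard SummaryWriter; in a LightningModule with the TensorBoardLogger, the writer is available as self.logger.experiment):

```python
import torch
from torch.utils.tensorboard import SummaryWriter


def log_weight_histograms(model: torch.nn.Module, writer: SummaryWriter, step: int) -> None:
    """Log one histogram per parameter so exploding weights show up in TensorBoard."""
    for name, param in model.named_parameters():
        writer.add_histogram(name, param.detach().cpu(), global_step=step)


# Example usage from a Lightning hook such as on_train_epoch_end:
#   log_weight_histograms(self.backbone, self.logger.experiment, self.current_epoch)
```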

@antho214
Author

I have tracked some of the weights during training. One thing I notice is that the weights of the batchnorm layers are relatively large (see screenshot). The weights of the convolution layers, on the other hand, all seem to behave normally: they are typically centered at 0 and roughly normally distributed, with values in approximately [-1, 1].

[Screenshot 2024-08-28 at 10 16 15: batchnorm weight histograms]
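
A small sketch of how the batchnorm scale parameters can be inspected directly (using a plain torchvision ResNet-50 as a stand-in for the actual trained backbone):

```python
import torch
from torchvision.models import resnet50

model = resnet50()  # stand-in; in practice load the trained backbone checkpoint
for name, module in model.named_modules():
    if isinstance(module, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d)):
        gamma = module.weight.detach()
        print(f"{name}: max |gamma| = {gamma.abs().max().item():.3f}")
```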
