Hello,
I am not sure if this is the right place for a training question about one of these models, but I'll give it a try anyway. I am starting from the deeplabv3_resnet50 pre-trained segmentation model and fine-tuning it for my application. Unfortunately, no matter how big a GPU I use, I always get a "CUDA out of memory" error. My latest attempt was on AWS, using a 24 GB GPU instance. Are there any suggestions for training a vision segmentation model? Thanks in advance.
Vars:
Dataloader batch size: 5
epochs: 5
classes: 1
Error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 860.00 MiB (GPU 0; 21.99 GiB total capacity; 21.38 GiB already allocated; 21.42 GiB reserved in total by PyTorch)
import torch
import segmentation_models_pytorch as smp
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT)

num_classes = 2  # object + background
model.classifier[4] = torch.nn.Conv2d(256, num_classes, kernel_size=(1, 1), stride=(1, 1))
model.aux_classifier[4] = torch.nn.Conv2d(256, num_classes, kernel_size=(1, 1), stride=(1, 1))

epochs = 5
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = model.to(device)
model.train()

losses = []
criterion = smp.losses.DiceLoss(smp.losses.BINARY_MODE, from_logits=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for i in range(epochs):
    for batch_idx, (images, masks) in enumerate(trainloader):
        # error is thrown when images and masks attempt to load to GPU
        images = images.to(device)
        masks = masks.to(device)

        outputs = model(images)["out"]
        loss = criterion(outputs, masks)
        losses.append(loss)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
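For reference, a reduced-memory variant of the loop above might look like the sketch below. It is only an illustration, assuming the model, criterion, optimizer, device, epochs, losses, and trainloader defined above: it appends loss.item() instead of the loss tensor (so the autograd graph is not retained across batches) and uses torch.cuda.amp mixed precision to cut activation memory.

import torch

scaler = torch.cuda.amp.GradScaler()  # enables mixed-precision training

for epoch in range(epochs):
    for images, masks in trainloader:
        images = images.to(device)
        masks = masks.to(device)

        optimizer.zero_grad(set_to_none=True)

        # autocast runs the forward pass in float16 where safe, reducing activation memory
        with torch.cuda.amp.autocast():
            outputs = model(images)["out"]
            loss = criterion(outputs, masks)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        # append a plain float, not the tensor, so the graph is freed each step
        losses.append(loss.item())

Lowering the DataLoader batch size or the input crop size are the other obvious knobs if memory is still tight.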
Note: I am using AWS training jobs on a g5.2xlarge instance. Container stats are below: