RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #65

jayshent · 2024-07-25T18:23:23Z

Hi all, I tried to execute the train.py file but encountered the issue below.

I would appreciate any input. Thanks in advance.

$ python3 train.py --batch-size 32 --cfg cfg/yolov3.cfg --data data/coco.data --weights ''
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
Namespace(adam=False, batch_size=32, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', device='', epochs=300, evolve=False, freeze_layers=False, img_size=[320, 640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA device0 _CudaDeviceProperties(name='NVIDIA A100 80GB PCIe MIG 3g.40gb', total_memory=40448MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
Model Summary: 222 layers, 6.19491e+07 parameters, 6.19491e+07 gradients
Optimizer groups: 75 .bias, 75 Conv2d.weight, 72 other
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/train2014.npy (117264 found, 0 missing, 0 empty, 4514 duplicat
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/val2014.npy (4954 found, 0 missing, 0 empty, 197 duplicate, fo
Image sizes 320 - 640 train, 640 test
Using 8 dataloader workers
Starting training for 300 epochs...

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size

0%| | 0/3665 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 435, in <module>
    train(hyp)  # train normally
  File "train.py", line 283, in train
    loss, loss_items = compute_loss(pred, targets, model)
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 356, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 441, in build_targets
    a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)


jayshent · 2024-07-25T18:25:48Z

@cwq159 Could you give me some pointers on how to debug this? Thanks!

Shenyiyu1 · 2024-08-04T11:59:02Z

Were you able to solve the problem?

jayshent · 2024-08-07T05:14:59Z

> Were you able to solve the problem?

Unfortunately not. Did you run into the same error?
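
For anyone hitting the same traceback: the message says the tensor being indexed is on the CPU while the index tensor is on a different device. In the failing line `at[j]`, that would mean `at` is on the CPU and the mask `j` is on the GPU. This is an inference from the error text, not something confirmed against the repository's code. Below is a minimal sketch of one possible workaround under that assumption. The helper `index_on_same_device`, the sizes `na` and `nt`, and the stand-in tensors are hypothetical and only borrow names from the traceback.

```python
import torch


def index_on_same_device(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Move `values` to the mask's device, then apply the boolean mask."""
    return values.to(mask.device)[mask]


# Use the GPU when one is available; the sketch also runs on CPU-only machines.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

na, nt = 3, 4  # hypothetical number of anchors and targets

# Stand-in for `at`: anchor indices, deliberately created on the CPU (assumption).
at = torch.arange(na).view(na, 1).repeat(1, nt)

# Stand-in for `j`: a boolean mask that lives on the training device (assumption).
j = at.to(device) % 2 == 0

# Indexing a CPU tensor with a CUDA mask is what the error message describes;
# aligning the devices first avoids it.
selected = index_on_same_device(at, j)
```

An alternative under the same assumption is to create the index tensor on the right device from the start, for example by passing `device=targets.device` to `torch.arange`, so no transfer is needed at indexing time.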
