RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #65

Open
jayshent opened this issue Jul 25, 2024 · 3 comments

@jayshent

Hi all, I tried to run train.py but hit the error below.

I'd appreciate any input. Thanks in advance.

$ python3 train.py --batch-size 32 --cfg cfg/yolov3.cfg --data data/coco.data --weights ''
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
Namespace(adam=False, batch_size=32, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', device='', epochs=300, evolve=False, freeze_layers=False, img_size=[320, 640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA device0 _CudaDeviceProperties(name='NVIDIA A100 80GB PCIe MIG 3g.40gb', total_memory=40448MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
Model Summary: 222 layers, 6.19491e+07 parameters, 6.19491e+07 gradients
Optimizer groups: 75 .bias, 75 Conv2d.weight, 72 other
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/train2014.npy (117264 found, 0 missing, 0 empty, 4514 duplicat
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/val2014.npy (4954 found, 0 missing, 0 empty, 197 duplicate, fo
Image sizes 320 - 640 train, 640 test
Using 8 dataloader workers
Starting training for 300 epochs...

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size

0%| | 0/3665 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 435, in <module>
    train(hyp)  # train normally
  File "train.py", line 283, in train
    loss, loss_items = compute_loss(pred, targets, model)
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 356, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 441, in build_targets
    a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
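
Note for readers hitting the same traceback: the message says the indexed tensor is on the CPU, so one plausible reading is that `at` in `build_targets` is created on the CPU while the mask `j` lives on the GPU. The sketch below is not the repository's code. It only illustrates that reading with made-up shapes, and `filter_by_mask` is a hypothetical helper that moves the index tensors onto the same device as the targets before indexing. Whether this matches the actual cause in `utils/utils.py` is an assumption and would need to be checked against that file.

```python
import torch


def filter_by_mask(at: torch.Tensor, t: torch.Tensor, j: torch.Tensor):
    """Hypothetical helper: index `at` and `t` with boolean mask `j`
    after moving `at` and `j` onto the device that holds `t`."""
    device = t.device
    at = at.to(device)  # no-op if already on that device
    j = j.to(device)
    return at[j], t[j]


# Use the GPU when present; the sketch also runs on a CPU-only machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

na, nt = 3, 4  # made-up anchor and target counts, for illustration only
targets = torch.rand(nt, 6, device=device)

# torch.arange without a device argument allocates on the CPU,
# which is the assumed source of the mismatch.
at = torch.arange(na).view(na, 1).repeat(1, nt)

# Fixed boolean mask of shape (na, nt), placed on the same device as targets.
j = (torch.arange(na * nt).view(na, nt) % 2 == 0).to(device)

a, t_sel = filter_by_mask(at, targets.repeat(na, 1, 1), j)
print(a.tolist(), tuple(t_sel.shape), a.device == t_sel.device)
```

If the assumption holds, the equivalent change in the real code would be to create `at` on the targets' device (or call `.to(targets.device)` on it) before the line shown in the traceback.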

@jayshent

@cwq159 Could you point me in a direction for debugging this? Thanks!

@Shenyiyu1

Can you solve the problem?

@jayshent

jayshent commented Aug 7, 2024

Can you solve the problem?

Unfortunately not. Did you run into the same error?
