
terminate called without an active exception Aborted (core dumped) #30

Open
MichaelCong opened this issue Jul 23, 2019 · 2 comments

@MichaelCong

python train.py ExtremeNet
loading all datasets...
using 4 threads
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=12.73s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=12.93s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=10.87s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=15.55s)
creating index...
index created!
system config...
{'batch_size': 24,
'cache_dir': './cache',
'chunk_sizes': [4, 5, 5, 5, 5],
'config_dir': './config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7f87c7ffa480>,
'dataset': 'MSCOCOExtreme',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 250000,
'nnet_rng': <mtrand.RandomState object at 0x7f87c7ffa4c8>,
'opt_algo': 'adam',
'prefetch_size': 10,
'pretrain': './cache/CornerNet_500000.pkl',
'result_dir': './results',
'sampling_function': 'kp_detection',
'snapshot': 50000,
'snapshot_name': 'ExtremeNet',
'stepsize': 200000,
'test_split': 'testdev',
'train_split': 'train',
'val_iter': 100,
'val_split': 'val',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
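The system config above asks for a `batch_size` of 24 split as `chunk_sizes` [4, 5, 5, 5, 5], i.e. five per-GPU slices. Assuming, as in CornerNet-derived code, that each `chunk_sizes` entry is the slice sent to one GPU, the entries must sum to `batch_size` and there must be at least as many visible GPUs as entries. A mismatch with the GPUs actually visible to the process is one plausible source of the `invalid device ordinal` error further down. `check_chunk_sizes` is a hypothetical helper, not part of ExtremeNet. In practice you would pass `torch.cuda.device_count()` as `num_gpus`.

```python
from typing import List


def check_chunk_sizes(batch_size: int, chunk_sizes: List[int], num_gpus: int) -> List[str]:
    """Return a list of problems with a batch split (empty if it looks consistent).

    Hypothetical helper: assumes chunk_sizes holds one per-GPU slice size,
    as in CornerNet/ExtremeNet-style configs.
    """
    problems = []
    # The per-GPU slices should add back up to the full batch.
    if sum(chunk_sizes) != batch_size:
        problems.append(
            f"sum(chunk_sizes) = {sum(chunk_sizes)} but batch_size = {batch_size}"
        )
    # One slice per GPU, so more slices than visible GPUs cannot be scattered.
    if len(chunk_sizes) > num_gpus:
        problems.append(
            f"{len(chunk_sizes)} chunks requested but only {num_gpus} GPU(s) visible"
        )
    # Empty or negative slices make no sense for a scatter.
    if any(size <= 0 for size in chunk_sizes):
        problems.append("every chunk size must be a positive integer")
    return problems


# Values copied from the system config in the log; num_gpus=2 is an assumed example.
print(check_chunk_sizes(24, [4, 5, 5, 5, 5], num_gpus=2))
```

With those values the helper reports a single problem, that five chunks were requested but only two GPUs are visible. On a five-GPU machine it returns an empty list.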
db config...
{'ae_threshold': 0.5,
'aggr_weight': 0.1,
'border': 128,
'categories': 80,
'center_thresh': 0.1,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'scores_thresh': 0.1,
'special_crop': False,
'suppres_ghost': True,
'test_scales': [1],
'top_k': 40,
'weight_exp': 8}
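The `rand_scales` array in the db config matches what `np.arange(rand_scale_min, rand_scale_max, rand_scale_step)` produces from the three values listed above it. Whether the data loader builds it exactly this way is an assumption. `np.arange` uses a half-open interval, so `rand_scale_max` (1.4) is not included as a scale. With a float step, NumPy's documentation also warns that the endpoint behavior can be inconsistent.

```python
import numpy as np

# db config values copied from the log above.
rand_scale_min, rand_scale_max, rand_scale_step = 0.6, 1.4, 0.1

# np.arange excludes the stop value, so rand_scale_max itself is not a scale.
rand_scales = np.arange(rand_scale_min, rand_scale_max, rand_scale_step)
print(rand_scales)
```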
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
start prefetching data...
shuffling indices...
shuffling indices...
building model...
module_file: models.ExtremeNet
start prefetching data...
shuffling indices...
total parameters: 198531504
loading from pretrained model
loading from ./cache/CornerNet_500000.pkl
setting learning rate to: 0.00025
training start...
0%| | 0/250000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 225, in <module>
train(training_dbs, None, args.start_iter, args.debug)
File "train.py", line 159, in train
training_loss = nnet.train(**training)
File "/home/rencong/ExtremeNet/nnet/py_factory.py", line 81, in train
loss = self.network(xs, ys)
File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/rencong/ExtremeNet/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/rencong/ExtremeNet/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 148, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: invalid device ordinal (exchangeDevice at /opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/ATen/cuda/detail/CUDAGuardImpl.h:28)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f8821feb69d in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0x4f223c (0x7f881f16d23c in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: + 0x5fc38e (0x7f87fbb9638e in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: + 0x739e55 (0x7f87fbcd3e55 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: at::TypeDefault::copy(at::Tensor const&, bool, c10::optional<c10::Device>) const + 0x74 (0x7f87fbe4f204 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::native::to(at::Tensor const&, at::TensorOptions const&, bool, bool) + 0xc6d (0x7f87fbc327fd in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: at::TypeDefault::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x2c (0x7f87fbe0bcbc in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #7: torch::autograd::VariableType::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x19c (0x7f87fe532e1c in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #8: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef<long>, c10::optional<std::vector<long, std::allocator<long> > > const&, long, c10::optional<std::vector<c10::optional<at::cuda::CUDAStream>, std::allocator<c10::optional<at::cuda::CUDAStream> > > > const&) + 0x7a8 (0x7f881f183da8 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: + 0x5124de (0x7f881f18d4de in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #10: + 0xfd760 (0x7f881ed78760 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #21: THPFunction_apply(_object*, _object*) + 0x6ad (0x7f881ef7482d in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

terminate called without an active exception
Aborted (core dumped)
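The failing call is the scatter in `models/py_utils/scatter_gather.py`, which splits each input tensor along the batch dimension according to `chunk_sizes` and copies each slice to a GPU. CUDA's `invalid device ordinal` error means the process asked for a device index that does not correspond to a valid device. If that is what happened here, one hedged fix is to regenerate `chunk_sizes` for the number of GPUs actually available. `split_batch` is a hypothetical helper, not part of ExtremeNet. It keeps the sum equal to `batch_size` and follows the pattern of the logged config by giving the smaller slice to the first GPU.

```python
from typing import List


def split_batch(batch_size: int, num_gpus: int) -> List[int]:
    """Split batch_size into num_gpus positive chunk sizes that sum to batch_size.

    Hypothetical helper. Any remainder goes to the later GPUs, so the first
    (master) GPU gets the smaller slice, mirroring [4, 5, 5, 5, 5] in the log.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    if batch_size < num_gpus:
        raise ValueError("batch_size must be at least num_gpus")
    base, remainder = divmod(batch_size, num_gpus)
    # (num_gpus - remainder) slices of size base, then remainder slices of base + 1.
    return [base] * (num_gpus - remainder) + [base + 1] * remainder


print(split_batch(24, 5))  # [4, 5, 5, 5, 5], the split in the logged config
print(split_batch(24, 2))  # [12, 12]
```

The output would replace the `chunk_sizes` value in the config, with `torch.cuda.device_count()` (or the number of devices listed in `CUDA_VISIBLE_DEVICES`) supplying `num_gpus`.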

@bageheyalu

Did you solve this problem?

@ZHR1997

ZHR1997 commented Apr 17, 2020

Did you solve this problem?


3 participants