terminate called without an active exception Aborted (core dumped) #30

MichaelCong · 2019-07-23T06:33:36Z

python train.py ExtremeNet
loading all datasets...
using 4 threads
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=12.73s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=12.93s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=10.87s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=15.55s)
creating index...
index created!
system config...
{'batch_size': 24,
'cache_dir': './cache',
'chunk_sizes': [4, 5, 5, 5, 5],
'config_dir': './config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7f87c7ffa480>,
'dataset': 'MSCOCOExtreme',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 250000,
'nnet_rng': <mtrand.RandomState object at 0x7f87c7ffa4c8>,
'opt_algo': 'adam',
'prefetch_size': 10,
'pretrain': './cache/CornerNet_500000.pkl',
'result_dir': './results',
'sampling_function': 'kp_detection',
'snapshot': 50000,
'snapshot_name': 'ExtremeNet',
'stepsize': 200000,
'test_split': 'testdev',
'train_split': 'train',
'val_iter': 100,
'val_split': 'val',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'aggr_weight': 0.1,
'border': 128,
'categories': 80,
'center_thresh': 0.1,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'scores_thresh': 0.1,
'special_crop': False,
'suppres_ghost': True,
'test_scales': [1],
'top_k': 40,
'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
start prefetching data...
shuffling indices...
shuffling indices...
building model...
module_file: models.ExtremeNet
start prefetching data...
shuffling indices...
total parameters: 198531504
loading from pretrained model
loading from ./cache/CornerNet_500000.pkl
setting learning rate to: 0.00025
training start...
0%| | 0/250000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 225, in <module>
train(training_dbs, None, args.start_iter, args.debug)
File "train.py", line 159, in train
training_loss = nnet.train(**training)
File "/home/rencong/ExtremeNet/nnet/py_factory.py", line 81, in train
loss = self.network(xs, ys)
File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/rencong/ExtremeNet/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/rencong/ExtremeNet/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(map(scatter_map, obj)))
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(map(scatter_map, obj))))
File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 148, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: invalid device ordinal (exchangeDevice at /opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/ATen/cuda/detail/CUDAGuardImpl.h:28)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f8821feb69d in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x4f223c (0x7f881f16d23c in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0x5fc38e (0x7f87fbb9638e in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x739e55 (0x7f87fbcd3e55 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: at::TypeDefault::copy(at::Tensor const&, bool, c10::optional<c10::Device>) const + 0x74 (0x7f87fbe4f204 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::native::to(at::Tensor const&, at::TensorOptions const&, bool, bool) + 0xc6d (0x7f87fbc327fd in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: at::TypeDefault::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x2c (0x7f87fbe0bcbc in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #7: torch::autograd::VariableType::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x19c (0x7f87fe532e1c in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #8: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef<long>, c10::optional<std::vector<long, std::allocator<long> > > const&, long, c10::optional<std::vector<c10::optional<at::cuda::CUDAStream>, std::allocator<c10::optional<at::cuda::CUDAStream> > > > const&) + 0x7a8 (0x7f881f183da8 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x5124de (0x7f881f18d4de in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0xfd760 (0x7f881ed78760 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #21: THPFunction_apply(_object*, _object*) + 0x6ad (0x7f881ef7482d in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

terminate called without an active exception
Aborted (core dumped)

bageheyalu · 2019-09-20T09:15:23Z

Did you solve this problem?

ZHR1997 · 2020-04-17T09:07:26Z

Did you solve this problem?
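
A possible reading of the log above, offered as an assumption rather than a confirmed fix: the config prints 'batch_size': 24 with 'chunk_sizes': [4, 5, 5, 5, 5], which asks the scatter step to split each batch across five GPUs, and `CUDA error: invalid device ordinal` is what CUDA reports when a requested device index does not exist. If the machine exposes fewer than five GPUs, that would explain the failure. The sketch below checks a chunk list against a device count and proposes a replacement split; `check_chunk_sizes` and `fit_chunk_sizes` are hypothetical helper names, not part of ExtremeNet.

```python
def check_chunk_sizes(batch_size, chunk_sizes, num_devices):
    """Return a list of human-readable problems (empty if the split is usable)."""
    problems = []
    if len(chunk_sizes) > num_devices:
        problems.append(
            "chunk_sizes has %d entries but only %d device(s) are visible"
            % (len(chunk_sizes), num_devices)
        )
    if sum(chunk_sizes) != batch_size:
        problems.append(
            "chunk_sizes sums to %d but batch_size is %d"
            % (sum(chunk_sizes), batch_size)
        )
    return problems


def fit_chunk_sizes(batch_size, num_devices):
    """Split batch_size into num_devices near-equal chunks.

    Any remainder goes to the later devices, so the first device gets the
    smallest chunk (the same shape as the [4, 5, 5, 5, 5] split in the log).
    """
    if num_devices < 1:
        raise ValueError("need at least one device")
    base, extra = divmod(batch_size, num_devices)
    return [base + (1 if i >= num_devices - extra else 0)
            for i in range(num_devices)]


if __name__ == "__main__":
    # Values copied from the "system config" section of the log.
    batch_size = 24
    chunk_sizes = [4, 5, 5, 5, 5]
    # Assumption for illustration: a single visible GPU. On a real machine
    # this number could come from torch.cuda.device_count().
    num_devices = 1
    for problem in check_chunk_sizes(batch_size, chunk_sizes, num_devices):
        print("problem:", problem)
    print("suggested chunk_sizes:", fit_chunk_sizes(batch_size, num_devices))
```

If the check reports a mismatch, editing `batch_size` and `chunk_sizes` in the training config so that there is one chunk per visible GPU would be the thing to try; a smaller `batch_size` may also be needed so each chunk fits in GPU memory.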
