training error: OutOfRangeError: End of sequence #60

Open
muxizju opened this issue Sep 27, 2018 · 4 comments

Comments

@muxizju

muxizju commented Sep 27, 2018

I used this code to train on my own dataset, but it raised this error at sess.run(). The full log is printed below; I changed some args such as net_input_height and batch_p. My TensorFlow version is 1.7. I don't know what's wrong here.

Instructions for updating:
Use the retry module or similar alternatives.
2018-09-27 11:12:06,474 [INFO] train: Training using the following parameters:
2018-09-27 11:12:06,474 [INFO] train: batch_k: 4
2018-09-27 11:12:06,474 [INFO] train: batch_p: 8
2018-09-27 11:12:06,474 [INFO] train: checkpoint_frequency: 1000
2018-09-27 11:12:06,474 [INFO] train: crop_augment: False
2018-09-27 11:12:06,474 [INFO] train: decay_start_iteration: 100000
2018-09-27 11:12:06,474 [INFO] train: detailed_logs: False
2018-09-27 11:12:06,474 [INFO] train: embedding_dim: 128
2018-09-27 11:12:06,475 [INFO] train: experiment_root: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/
2018-09-27 11:12:06,475 [INFO] train: flip_augment: False
2018-09-27 11:12:06,475 [INFO] train: head_name: fc1024
2018-09-27 11:12:06,475 [INFO] train: image_root: F:/projector/GestureClassification/data/img/20180919/triplet_data/img/
2018-09-27 11:12:06,475 [INFO] train: initial_checkpoint: None
2018-09-27 11:12:06,475 [INFO] train: learning_rate: 0.0003
2018-09-27 11:12:06,475 [INFO] train: loading_threads: 4
2018-09-27 11:12:06,475 [INFO] train: loss: batch_hard
2018-09-27 11:12:06,476 [INFO] train: margin: soft
2018-09-27 11:12:06,476 [INFO] train: metric: euclidean
2018-09-27 11:12:06,476 [INFO] train: model_name: resnet_v1_50
2018-09-27 11:12:06,476 [INFO] train: net_input_height: 64
2018-09-27 11:12:06,476 [INFO] train: net_input_width: 64
2018-09-27 11:12:06,476 [INFO] train: pre_crop_height: 64
2018-09-27 11:12:06,476 [INFO] train: pre_crop_width: 64
2018-09-27 11:12:06,476 [INFO] train: resume: False
2018-09-27 11:12:06,476 [INFO] train: train_iterations: 250000
2018-09-27 11:12:06,476 [INFO] train: train_set: F:/projector/GestureClassification/data/img/20180919/triplet_data/gesture_train.csv
2018-09-27 11:12:07,403 [INFO] tensorflow: Scale of 0 disables regularizer.
2018-09-27 11:12:07,403 [INFO] tensorflow: Scale of 0 disables regularizer.
2018-09-27 11:12:08,569 [WARNING] tensorflow: From F:\projector\GestureClassification\TripletBasedGestureRecognition\triplet-reid\nets\resnet_v1.py:219: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-09-27 11:12:08,569 [WARNING] tensorflow: From F:\projector\GestureClassification\TripletBasedGestureRecognition\triplet-reid\nets\resnet_v1.py:219: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\ops\gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-09-27 11:12:11.533610: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-27 11:12:11.936193: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1060 5GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:01:00.0
totalMemory: 5.00GiB freeMemory: 4.12GiB
2018-09-27 11:12:11.936710: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-09-27 11:12:14.388590: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-27 11:12:14.388811: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2018-09-27 11:12:14.388948: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2018-09-27 11:12:14.415769: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3871 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 5GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-09-27 11:12:16.275624: I T:\src\github\tensorflow\tensorflow\core\kernels\cuda_solvers.cc:159] Creating CudaSolver handles for stream 000001A50E54E080
2018-09-27 11:12:20,572 [INFO] tensorflow: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/checkpoint-0 is not in all_model_checkpoint_paths. Manually adding it.
2018-09-27 11:12:20,572 [INFO] tensorflow: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/checkpoint-0 is not in all_model_checkpoint_paths. Manually adding it.
2018-09-27 11:12:23,207 [INFO] train: Starting training from iteration 0.

Traceback (most recent call last):
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
return fn(*args)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1312, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1420, in _call_tf_sessionrun
status, run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 439, in <module>
main()
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 393, in main
prec_at_k, endpoints['emb'], losses, fids])
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
run_metadata_ptr)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1140, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
run_metadata)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'IteratorGetNext', defined at:
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 439, in <module>
main()
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 280, in main
images, fids, pids = dataset.make_one_shot_iterator().get_next()
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 366, in get_next
name=name)), self._output_types,
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 1484, in iterator_get_next
output_shapes=output_shapes, name=name)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\ops.py", line 3290, in create_op
op_def=op_def)
File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

OutOfRangeError (see above for traceback): End of sequence
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Process finished with exit code 1

@muxizju
Author

muxizju commented Sep 27, 2018

I just found the reason. I have only 7 classes (persons) in my dataset, but I set batch_p to 8.

# Constrain the dataset size to a multiple of the batch-size, so that
# we don't get overlap at the end of each epoch.
dataset = dataset.take((len(unique_pids) // args.batch_p) * args.batch_p)

With 7 classes and batch_p = 8, this step evaluates to take(0), so the dataset is empty and iteration ends on the very first step, which raises the error above.

It's a silly mistake, but I suggest adding an if-else check to flag this condition.
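
A check along these lines could catch the condition before training starts. This is only a minimal sketch, not the repository's code: the helper name `usable_pid_count` is hypothetical, and it simply mirrors the arithmetic of the `take` call quoted above.

```python
def usable_pid_count(num_unique_pids, batch_p):
    """Mirror the `take` arithmetic and fail early if it would yield 0.

    Hypothetical helper for illustration; not part of the repository.
    """
    if batch_p <= 0:
        raise ValueError("batch_p must be positive, got {}".format(batch_p))

    # Same expression as in the dataset.take(...) call.
    usable = (num_unique_pids // batch_p) * batch_p

    if usable == 0:
        raise ValueError(
            "batch_p ({}) exceeds the number of unique PIDs ({}); "
            "the dataset would be empty.".format(batch_p, num_unique_pids))
    return usable
```

In the training script this might be called as `usable_pid_count(len(unique_pids), args.batch_p)` just before the `take`, so the failure shows up as a readable message rather than an OutOfRangeError at sess.run().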

@lucasb-eyer
Member

Thanks for updating with the reason. Indeed, we could add code to catch this mistake; I'd happily accept a PR doing so!

@duyanfang123

You did a good job.

@mazatov

mazatov commented Mar 12, 2020

@muxizju Just came across this, as I also have few classes. My question: what happens to the rest of the classes if, say, I have 7 classes and batch_p is 4? Do the 3 remaining classes get carried over into future batches, or are they just ignored?
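
For reference, the arithmetic in this case is (7 // 4) * 4 = 4, so the `take` keeps 4 PIDs per pass and leaves 3 out. Whether those 3 are always the same ones depends on whether the PIDs are shuffled before the `take`, which the snippet quoted in this thread does not show. The sketch below is a pure-Python simulation under the assumption of a shuffle, then take, then repeat order with a fresh shuffle on every pass; `simulate_epochs` is a hypothetical name and none of this is the repository's code.

```python
import random


def simulate_epochs(pids, batch_p, num_epochs, seed=0):
    """Return, for each pass over the data, the list of P-sized PID batches.

    Assumes shuffle -> take(multiple of batch_p) -> repeat. That ordering is
    an assumption about the pipeline, not something shown in this thread.
    """
    rng = random.Random(seed)
    usable = (len(pids) // batch_p) * batch_p
    epochs = []
    for _ in range(num_epochs):
        order = list(pids)
        rng.shuffle(order)      # fresh shuffle on every pass
        kept = order[:usable]   # plays the role of dataset.take(usable)
        epochs.append(
            [kept[i:i + batch_p] for i in range(0, usable, batch_p)])
    return epochs


# 7 classes, batch_p = 4: one batch of 4 PIDs per pass, 3 PIDs left out.
epochs = simulate_epochs(list(range(7)), batch_p=4, num_epochs=50)
seen = {pid for epoch in epochs for batch in epoch for pid in batch}
```

Under that assumption a different set of 3 classes is left out on each pass, so over many passes every class is very likely to appear. If the shuffle instead came after the `take`, the same 3 classes would be dropped every time.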

