IndexError: wait set index too big when calling service early in node lifetime #1133

Closed
dhood opened this issue Jun 12, 2023 · 7 comments

@dhood
Member

dhood commented Jun 12, 2023

Bug report

While spinning in the code below, I received a traceback ending with IndexError: wait set index too big.
I'm using a multithreaded executor with multiple nodes in the same file, which was launched in a separate process by ros2 launch during a test.

        self.service_client_node.stop_servo_future = (
            self.service_client_node.stop_servo_cli.call_async(
                self.service_client_node.stop_servo_req
            )
        )
        rclpy.spin_until_future_complete(
            self.service_client_node, self.service_client_node.stop_servo_future
        )
        return self.service_client_node.stop_servo_future.result()

Full traceback:

Exception in thread Thread-1 (spin):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dhood/src/test/install/test_control/lib/python3.10/site-packages/test_control/test_robotgui.py", line 1388, in spin
    self.executor.spin()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 279, in spin
    self.spin_once()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 764, in spin_once
    self._spin_once_impl(timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 761, in _spin_once_impl
    future.result()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/task.py", line 94, in result
    raise self.exception()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/task.py", line 239, in __call__
    self._handler.send(None)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 418, in handler
    await call_coroutine(entity, arg)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 372, in _execute_service
    response = await await_or_execute(srv.callback, request, srv.srv_type.Response())
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 107, in await_or_execute
    return callback(*args)
  File "/home/dhood/src/test/install/test_control/lib/python3.10/site-packages/test_control/test_robotgui.py", line 661, in move_to_testing_pose_service_callback
    self.moveRobotToPoint(
  File "/home/dhood/src/test/install/test_control/lib/python3.10/site-packages/test_control/test_robotgui.py", line 916, in moveRobotToPoint
    self.scanningModeStart()
  File "/home/dhood/src/test/install/test_control/lib/python3.10/site-packages/test_control/test_robotgui.py", line 863, in scanningModeStart
    self.stop_servo_request()
  File "/home/dhood/src/test/install/test_control/lib/python3.10/site-packages/test_control/test_robotgui.py", line 1249, in stop_servo_request
    rclpy.spin_until_future_complete(
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/__init__.py", line 248, in spin_until_future_complete
    executor.spin_until_future_complete(future, timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 288, in spin_until_future_complete
    self.spin_once_until_future_complete(future, timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 716, in spin_once_until_future_complete
    self.spin_once(timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 705, in spin_once
    handler, entity, node = self.wait_for_ready_callbacks(timeout_sec=timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 691, in wait_for_ready_callbacks
    return next(self._cb_iter)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 610, in _wait_for_ready_callbacks
    if wt in waitables and wt.is_ready(wait_set):
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/qos_event.py", line 90, in is_ready
    if wait_set.is_ready('event', self._event_index):
IndexError: wait set index too big

I see there's been rework of the qos_event.py file between Humble and Iron. I'm on Humble; any chance that this is something the team has experienced themselves, and that this has been addressed on Iron?

Required Info:

  • Operating System:
    Ubuntu 22.04
  • Installation type:
    Binaries, humble
  • Version or commit hash:
    rclpy: 3.3.8-2jammy.20230426.045804
  • DDS implementation:
    Cyclone

Steps to reproduce issue

I believe it to be a matter of chance; not reliably reproducible.

It happened for me when a node sent a service call immediately after its bringup, so maybe discovery triggered it. Perhaps logging services were getting connected?

Expected behavior

Waitset has the appropriate size; perhaps it's ok for rclpy to catch this issue and try to wait again rather than raising?

Actual behavior

My process was terminated because of the exception.

@clalancette
Contributor

I see there's been rework of the qos_event.py file between Humble and Iron. I'm on Humble; any chance that this is something the team has experienced themselves, and that this has been addressed on Iron?

The rework of qos_event to event_handler for Iron was mostly a cosmetic change, so we could add other events that weren't directly related to QoS. So that particular reworking is unlikely to have changed anything here.

I don't have direct experience with the rest of what is going on here. It does seem somewhat suspicious to me that the QoSEventHandler class is storing an event_index; it seems like something that could disappear from the wait set later on.

@dhood
Member Author

dhood commented Jun 14, 2023

OK, it helps at least to know that it's not a known issue affecting Humble only. Sorry that I don't have a reproducible example; I'll keep an eye out in case there is more info I can collect! But I had also been thinking that the waitset might just be generally subject to a race condition, reproducible example or not.

@dhood
Member Author

dhood commented Oct 13, 2023

Good news! This seems to have been caused by a bug on our end, where a node was being spun in two threads. So I'll close this, thanks!

dhood closed this as completed Oct 13, 2023

@ashwanthkumar1007

Hi @dhood, I'm facing the same issue, and I'm running an additional thread as well. Can you please share how you fixed the issue? It would be very helpful.

@dhood
Member Author

dhood commented Apr 23, 2024

Ideally you redesign your system so that nodes get passed up to an executor and spun in a single place, like this https://github.com/ros2/examples/blob/master/rclpy/executors/examples_rclpy_executors/composed.py#L33
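
A minimal sketch of that "spun in a single place" layout, loosely following the linked composed.py example. The helper name `spin_in_one_place` is hypothetical; it only assumes the executor offers `add_node()`, `spin()` and `shutdown()`, and that each node offers `destroy_node()`, as rclpy executors and nodes do.

```python
def spin_in_one_place(executor, nodes):
    """Hand every node to one executor and spin only there.

    Hypothetical helper: `executor` is expected to behave like an rclpy
    executor (add_node/spin/shutdown) and each node like an rclpy Node
    (destroy_node). Nothing else in the process should spin these nodes.
    """
    for node in nodes:
        executor.add_node(node)
    try:
        # The single spin loop for all of the nodes above.
        executor.spin()
    finally:
        # Clean up in the same order as the linked example:
        # executor first, then the nodes.
        executor.shutdown()
        for node in nodes:
            node.destroy_node()
```

With rclpy this might be called as `spin_in_one_place(SingleThreadedExecutor(), [node_a, node_b])` between `rclpy.init()` and `rclpy.shutdown()`, where `node_a` and `node_b` are placeholders for your own nodes. The point is that callbacks no longer call `rclpy.spin_until_future_complete` on a node that is already being spun.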

For us, we were waiting for an action to complete, so I added a function WAIT_until_future_complete that is similar to spin_until_future_complete but does not spin the node; it just sleeps in short intervals until the future is done, assuming something else is responsible for spinning the node.
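
A rough sketch of what such a wait-without-spinning helper could look like. This is a reconstruction, not the code actually used; the name `wait_until_future_complete`, the `timeout_sec` and `poll_period_sec` parameters, and the boolean return value are all assumptions. It only relies on the future having a `done()` method, which rclpy futures provide.

```python
import time


def wait_until_future_complete(future, timeout_sec=None, poll_period_sec=0.01):
    """Block until `future` is done, without spinning any node.

    Assumes some other thread's executor is spinning the node that owns
    the future. Returns True if the future completed, False on timeout.
    """
    deadline = None if timeout_sec is None else time.monotonic() + timeout_sec
    while not future.done():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        # Sleep briefly instead of spinning; the executor elsewhere
        # is expected to complete the future.
        time.sleep(poll_period_sec)
    return True
```

A caller would then do something like `if wait_until_future_complete(future, timeout_sec=5.0): result = future.result()`. Note that if this runs inside a callback, the executor probably needs a spare thread and callback groups that allow the response to be processed in the meantime; otherwise the wait may never finish.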

Or if you want a quick hack to avoid a redesign, you could wrap your spin_until_future_complete call in a try/except that ignores the IndexError and spins again.
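
The quick hack could look roughly like the sketch below. The helper name `spin_until_done_ignoring_index_error`, the `spin_fn` callable and the `max_retries` cap are hypothetical; the cap is an added safety assumption so that a persistent error is not swallowed forever. This hides the symptom and does not fix the underlying double spin.

```python
def spin_until_done_ignoring_index_error(spin_fn, future, max_retries=10):
    """Call `spin_fn()` until `future` is done, retrying on IndexError.

    Hypothetical workaround helper: `spin_fn` is any zero-argument
    callable that spins for a while. IndexError (such as
    "wait set index too big") is ignored up to `max_retries` times.
    """
    retries = 0
    while not future.done():
        try:
            spin_fn()
        except IndexError:
            retries += 1
            if retries > max_retries:
                # Give up rather than loop forever on a persistent error.
                raise
    return future.result()
```

With rclpy, `spin_fn` might be `lambda: rclpy.spin_until_future_complete(node, future, timeout_sec=0.1)`, so each attempt returns after a short timeout and the loop re-checks the future.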

@astroseger

It seems I have a similar issue in Jazzy when I spin a node with rclpy.spin() and use rclpy.spin_until_future_complete in another thread.

However, I don't encounter this issue if I use MultiThreadedExecutor to spin a node and still use rclpy.spin_until_future_complete in another thread.

@dhood, is it still recommended to use one MultiThreadedExecutor to spin everything in the code, or is it acceptable in Jazzy to use rclpy.spin_until_future_complete in another thread while spinning other nodes with MultiThreadedExecutor?

@astroseger

I've found that SingleThreadedExecutor also works for my use case. When I spin a node with SingleThreadedExecutor and then call rclpy.spin_until_future_complete in another thread, I don’t encounter any errors. SingleThreadedExecutor is preferable for my situation, as MultiThreadedExecutor has performance issues (see #1223).
