You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In my efforts to make navigation have some heartbeats I found a particular issue that plagues both ROS1 and ROS2 that would only show up when:
Have long running bringup processes ( greater than heartbeat timeout rate, 4 s by default)
Have remote servers that take some time to connect / discover (for ROS2) that can take more than the heartbeat timeout rate
The default state on bring up is WaitingForSister, which does not have a heartbeat timeout override (https://github.com/ros/bond_core/blob/kinetic-devel/bondcpp/src/BondSM_sm.cpp#L63-L147). Because of this, when the heartbeat timeout is triggered, it will call Default() and throw an exception (and probably crash). This is even though the Connection timeout hasn't yet been triggered because by default its set to 10 seconds, larger than the heartbeat timeout. That means in the case that it takes greater than the heartbeat timeout duration to connect, your liable to crash even though there's valid time left to connect via that timeout.
The 2 solutions without modifying bond are:
Increase your heartbeat timeout to always be larger than your connection timeout, which is a terrible idea. If you want to know that your servers are active but takes a while for everything to connect, you don't want to have to set your heart beat timeout any longer than necessary.
Guarantee that your server can always get up and running in less than heartbeat timeout period. Which given ROS2 and DDS discovery issues and use of remote servers, isn't necessary reliable to 100% of the time work when valid.
I think we need to fix bond in ROS1/ROS2 to deal with this bring up issue better. I think the waiting for sister state needs a valid heartbeat timeout override that won't throw an exception and won't do anything until after connection timeout or the first active status message is received.
The text was updated successfully, but these errors were encountered:
It seems to me that there shouldn't be a heartbeat during the bring up because of the reason that you stated. It was introducing flakiness in the test_bond tests.
For the ROS 2 implementation, I moved the beginning of the heartbeat for once the connection is established, which I believe was the correct intended behavior in #87.
Do you think this is a valid approach? That changeset is still unreleased, so we can modify again before releasing into Humble/Rolling.
That sounds reasonable to me, though I'm pretty sure that Bond is already in Humble/Rolling since Nav2 has binaries out and depends on this. So there's already been a release
Hi,
In my efforts to make navigation have some heartbeats I found a particular issue that plagues both ROS1 and ROS2 that would only show up when:
The default state on bring up is WaitingForSister, which does not have a heartbeat timeout override (https://github.com/ros/bond_core/blob/kinetic-devel/bondcpp/src/BondSM_sm.cpp#L63-L147). Because of this, when the heartbeat timeout is triggered, it will call
Default()
and throw an exception (and probably crash). This is even though the Connection timeout hasn't yet been triggered because by default its set to 10 seconds, larger than the heartbeat timeout. That means in the case that it takes greater than the heartbeat timeout duration to connect, your liable to crash even though there's valid time left to connect via that timeout.The 2 solutions without modifying bond are:
I think we need to fix bond in ROS1/ROS2 to deal with this bring up issue better. I think the waiting for sister state needs a valid heartbeat timeout override that won't throw an exception and won't do anything until after connection timeout or the first active status message is received.
The text was updated successfully, but these errors were encountered: