Fix JTC segfault #164
  // Update the active goal
  RealtimeGoalHandlePtr rt_goal = std::make_shared<RealtimeGoalHandle>(goal_handle);
  rt_goal->preallocated_feedback_->joint_names = joint_names_;
  rt_goal->execute();
  rt_active_goal_.writeFromNonRT(rt_goal);

  // Setup goal status checking timer
  goal_handle_timer_ = node_->create_wall_timer(
    action_monitor_period_.to_chrono<std::chrono::seconds>(),
-   std::bind(&RealtimeGoalHandle::runNonRealtime, rt_active_goal_));
+   std::bind(&RealtimeGoalHandle::runNonRealtime, rt_goal));
}
I think this needs further work, probably in a follow-up PR.
In both the old and new code, I worry that we might end up in a condition where we're overwriting the goal_handle_timer_ while the previous rt_active_goal_ has pending operations, and thus dropping those operations. I think we should call goal_handle_timer_::execute_callback() or rt_active_goal_.readFromNonRT()->runNonRealtime() before creating the new timer.
Similarly, I think we could probably end up in a race condition where something is resetting the rt_active_goal_ right after this function writes the new rt_goal, and we could end up losing the goal handle. This would involve a little more work to solve, for example only resetting the rt_active_goal_ if new_data_available_ == false. That variable is private, but something along those lines.
Would appreciate your thoughts so we can consider appropriate follow-up PRs. This PR leaves the behaviour unchanged from before, and I haven't yet run into either of those cases, so I think we're ok to leave it to a future task.
This write-up above is a perfect start for the description of a follow-up issue ;)
The CI failure looks unrelated: ros-tooling/setup-ros v0.1.3 was just released a few days ago, maybe something changed? Edit: Indeed, looks like a bump in Edit 2: PR opened, see #165
looks good to me, just 2 notes about creating follow-up issues
Thanks a bunch for this fix, a big step toward a green pipeline! :D
Purpose
Addresses #132. Fixes a segfault that could occur in regular use of the joint_trajectory_controller, as exposed by the unit tests.
Summary
There was a race condition in the JTC, where rt_active_goal_ could be reset() in one thread (joint_trajectory_controller.cpp:568) but then dereferenced elsewhere. This would cause a segfault.
For example, in JointTrajectoryController::update(), we check that rt_active_goal_ is non-null, and then dereference it a couple of lines later. But if we get unlucky with timing, another thread can reset the shared_ptr between the non-null check and the dereference. We don't currently have the appropriate thread safety mechanisms in place.
By taking a copy of the rt_active_goal_ shared_ptr before checking and using it, we ensure the local copy will never become invalid while we're holding it. The RealtimeBuffer is required since we need to read and write the shared_ptr concurrently from multiple threads.
Testing done
✔️ colcon test --packages-select joint_trajectory_controller --retest-until-fail 100
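For readers who want to see the copy-before-check idea from the Summary in isolation, here is a minimal sketch. It is not the controller code: FakeGoal, GoalSlot and update_with_local_copy are hypothetical names, and a mutex-guarded holder stands in for the RealtimeBuffer purely to keep the example dependency-free. The `between` callback simulates another thread resetting the active goal right after the non-null check.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for RealtimeGoalHandle (not the real type).
struct FakeGoal
{
  int feedback_count = 0;
  void setFeedback() { ++feedback_count; }
};

// Hypothetical stand-in for the buffer that holds the active goal.
// Assumption: a plain mutex is used here only so the sketch compiles
// without ROS; the PR itself relies on a RealtimeBuffer.
class GoalSlot
{
public:
  // Replace the stored goal (pass nullptr to reset it).
  void write(std::shared_ptr<FakeGoal> goal)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    goal_ = std::move(goal);
  }

  // Return a copy of the shared_ptr; the copy shares ownership.
  std::shared_ptr<FakeGoal> read() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return goal_;
  }

private:
  mutable std::mutex mutex_;
  std::shared_ptr<FakeGoal> goal_;
};

// Take a local copy first, then check and use only that copy.
// `between` runs after the null check to simulate a concurrent reset.
template<typename Interleaved>
bool update_with_local_copy(GoalSlot & slot, Interleaved between)
{
  const std::shared_ptr<FakeGoal> active_goal = slot.read();
  if (!active_goal) {
    return false;  // no active goal, nothing to do
  }
  between();                   // the slot may be reset here...
  active_goal->setFeedback();  // ...but the local copy is still valid
  return true;
}
```

Because the local copy shares ownership, the goal object stays alive until the function returns, even if the slot is reset in the meantime. Checking and dereferencing the shared slot directly would not give that guarantee.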