Fix JTC segfault #164

matthew-reynolds · 2021-04-05T01:48:10Z

Purpose

Addresses #132. Fixes a segfault that could occur in regular use of the joint_trajectory_controller, as exposed by the unit tests.

Summary

There was a race condition in the JTC: rt_active_goal_ could be reset() in one thread (joint_trajectory_controller.cpp:568) and then dereferenced in another, causing a segfault.

For example, in JointTrajectoryController::update(), we check that rt_active_goal_ is non-null and then dereference it a couple of lines later. If we get unlucky with timing, another thread can reset the shared_ptr between the non-null check and the dereference. We don't currently have the appropriate thread-safety mechanisms in place.

By taking a copy of the rt_active_goal_ shared_ptr before checking and using it, we ensure the local copy never becomes invalid while we're holding it. The RealtimeBuffer is required because we read and write the shared_ptr concurrently from multiple threads.
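The copy-before-check pattern described above can be sketched in isolation. This is a minimal illustration, not the controller code: `GoalHandle`, `GoalSlot`, and `active_goal_name` are hypothetical names, and the mutex-guarded `GoalSlot` is only a stand-in for the RealtimeBuffer (a blocking mutex is used here for brevity and would likely not be suitable in a real-time update loop).

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the goal handle type; not the real RealtimeGoalHandle.
struct GoalHandle
{
  std::string name;
};
using GoalHandlePtr = std::shared_ptr<GoalHandle>;

// Hypothetical mutex-guarded slot, standing in for the RealtimeBuffer.
// It makes reads and writes of the shared_ptr itself safe across threads.
class GoalSlot
{
public:
  void write(GoalHandlePtr goal)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    goal_ = std::move(goal);
  }

  // Returns a copy, which shares ownership of the pointed-to goal.
  GoalHandlePtr read() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return goal_;
  }

private:
  mutable std::mutex mutex_;
  GoalHandlePtr goal_;
};

// Takes one local copy, then checks and dereferences only that copy.
// A concurrent write(nullptr) cannot invalidate the local copy.
std::string active_goal_name(const GoalSlot & slot)
{
  const GoalHandlePtr active_goal = slot.read();
  if (!active_goal) {
    return "";
  }
  return active_goal->name;
}
```

The key point is that the null check and the dereference both act on `active_goal`, the local copy, rather than going back to the shared slot a second time.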

Testing done

✔️ colcon test --packages-select joint_trajectory_controller --retest-until-fail 100

matthew-reynolds · 2021-04-05T01:56:42Z

joint_trajectory_controller/src/joint_trajectory_controller.cpp

+  // Update the active goal
+  RealtimeGoalHandlePtr rt_goal = std::make_shared<RealtimeGoalHandle>(goal_handle);
+  rt_goal->preallocated_feedback_->joint_names = joint_names_;
+  rt_goal->execute();
+  rt_active_goal_.writeFromNonRT(rt_goal);
+
  // Setup goal status checking timer
  goal_handle_timer_ = node_->create_wall_timer(
    action_monitor_period_.to_chrono<std::chrono::seconds>(),
-    std::bind(&RealtimeGoalHandle::runNonRealtime, rt_active_goal_));
+    std::bind(&RealtimeGoalHandle::runNonRealtime, rt_goal));
 }


I think this needs further work, probably in a follow-up PR.

In both the old and new code, I worry that we might overwrite goal_handle_timer_ while the previous rt_active_goal_ still has pending operations, thus dropping those operations. I think we should call goal_handle_timer_->execute_callback() or (*rt_active_goal_.readFromNonRT())->runNonRealtime() before creating the new timer.

Similarly, I think we could hit a race condition where something resets rt_active_goal_ right after this function writes the new rt_goal, so we lose the goal handle. This would take a little more work to solve, for example by only resetting rt_active_goal_ if new_data_available_ == false. That variable is private, but something along those lines.
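One possible shape for such a guarded reset, as a rough sketch only: clear the slot only if it still holds the goal the caller observed. This is a different mechanism from checking new_data_available_, and everything here (`GoalHandle`, `GoalSlot`, `reset_if_same`) is hypothetical and not part of realtime_tools or the controller.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical goal type, used only for this illustration.
struct GoalHandle
{
  int id;
};
using GoalHandlePtr = std::shared_ptr<GoalHandle>;

// Hypothetical mutex-guarded slot holding the active goal.
class GoalSlot
{
public:
  void write(GoalHandlePtr goal)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    goal_ = std::move(goal);
  }

  GoalHandlePtr read() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return goal_;
  }

  // Clears the slot only if it still holds `expected`.
  // Returns true if the slot was cleared, false if a newer goal was kept.
  bool reset_if_same(const GoalHandlePtr & expected)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (goal_ != expected) {
      return false;
    }
    goal_.reset();
    return true;
  }

private:
  mutable std::mutex mutex_;
  GoalHandlePtr goal_;
};
```

With this shape, a thread finishing an old goal would call `reset_if_same(old_goal)` and leave a freshly written goal untouched.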

Would appreciate your thoughts so we can consider appropriate follow-up PRs. This PR leaves the behaviour unchanged from before, and I haven't yet run into either of those cases, so I think we're ok to leave it to a future task.

This write-up above is a perfect start for the description of a follow-up issue ;)

@destogl beat me to it! See #166

matthew-reynolds · 2021-04-05T02:02:34Z

The CI failure looks unrelated: colcon: error: Mixin 'coverage-gcc' is not available for 'test'.

ros-tooling/setup-ros v0.1.3 was just released a few days ago, maybe something changed?

Edit: Indeed, looks like a bump in colcon-mixin changed a warning to an error. See ros-tooling/action-ros-ci#525. I will open a PR to update our CI to use action-ros-ci v0.2.0

Edit 2: PR opened, see #165

bmagyar

looks good to me, just 2 notes about creating follow-up issues

bmagyar · 2021-04-08T09:08:01Z

Thanks a bunch for this fix, a big step toward a green pipeline! :D

matthew-reynolds added 2 commits April 4, 2021 21:31

Use a copy of the rt_active_goal to avoid segfault

93d2ce5

Use RealtimeBuffer for thread-safety

01d68f1

matthew-reynolds commented Apr 5, 2021

matthew-reynolds mentioned this pull request Apr 5, 2021

Bump action-ros-ci to v0.2 #165

Closed

2 tasks

destogl mentioned this pull request Apr 7, 2021

[JointTrajectoryController] Avoid potential race conditions #166

Open

bmagyar reviewed Apr 7, 2021

bmagyar merged commit 0a6fe52 into ros-controls:master Apr 8, 2021

matthew-reynolds deleted the fix_jtc_segfault branch April 8, 2021 13:39

matthew-reynolds mentioned this pull request Apr 8, 2021

[JTC] Add link to TODOs to provide better trackability #169

Merged
