Fix actions feedback race #2612

mauropasse · 2024-08-27T11:29:21Z

Fixes issue mentioned in #2451.

When executing the ActionClient, action feedback and status took precedence over action results. If the client spins more slowly than the server publishes feedback, it never processes the action result, because it is always busy processing feedback.
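A toy model of the starvation described above may help reviewers. It is a sketch only: `ToyClient`, `spin_once` and `result_seen_after` are hypothetical names, not rclcpp_action code, and it assumes that each spin handles exactly one ready entity and that feedback keeps arriving at least as fast as the client spins.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for an action client; not the rclcpp_action API.
struct ToyClient
{
  std::deque<int> feedback_queue;  // pending feedback messages
  bool result_pending = false;     // a result response is waiting to be taken
  bool result_processed = false;
  std::size_t feedback_processed = 0;

  // Assumption: one spin iteration handles exactly one ready entity.
  void spin_once(bool feedback_first)
  {
    const bool feedback_ready = !feedback_queue.empty();
    if (feedback_ready && (feedback_first || !result_pending)) {
      feedback_queue.pop_front();
      ++feedback_processed;
    } else if (result_pending) {
      result_pending = false;
      result_processed = true;
    }
  }
};

// The result is pending from the start, and the server publishes
// `feedback_per_spin` feedback messages before every client spin.
inline bool result_seen_after(
  std::size_t spins, std::size_t feedback_per_spin, bool feedback_first)
{
  ToyClient client;
  client.result_pending = true;
  for (std::size_t i = 0; i < spins; ++i) {
    for (std::size_t j = 0; j < feedback_per_spin; ++j) {
      client.feedback_queue.push_back(0);
    }
    client.spin_once(feedback_first);
  }
  return client.result_processed;
}
```

With `feedback_first` set to true the result is never handled, however many spins run; with it set to false the result is handled on the first spin.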

This PR also adds a unit test to verify that the fix works. The test exercises many other action interactions as well.

Signed-off-by: Mauro Passerino <[email protected]>

- See ros2#2451 Signed-off-by: Mauro Passerino <[email protected]>

ahcorde · 2024-08-27T11:32:58Z

rclcpp_action/test/test_actions.cpp

+        test_info = std::make_shared<TestInfo>();
+        rclcpp::init(0, nullptr);
+        auto p = GetParam();
+        std::cout << "Test permutation: "


#include <iostream>

done in 13a27f6

ahcorde · 2024-08-27T11:33:07Z

rclcpp_action/test/test_actions.cpp

+                  << (p.use_client_ipc ? "IPC Client }" : "Non-IPC Client }") << std::endl;
+
+        executor = test_info->create_executor(p.use_events_executor);
+        executor_thread = std::thread([&]() {


include <thread>

done in 13a27f6

ahcorde · 2024-08-27T11:33:33Z

rclcpp_action/test/test_actions.cpp

+        rclcpp::shutdown();
+    }
+
+    rclcpp::Executor::UniquePtr executor;


include <memory>

done in 13a27f6

ahcorde · 2024-08-27T11:33:48Z

rclcpp_action/test/test_actions.cpp

+    executor->add_node(server_node);
+    executor->add_node(client_node);
+
+    bool server_available = action_client->wait_for_action_server(std::chrono::seconds(1));


include <chrono>

ahcorde · 2024-08-27T11:36:21Z

rclcpp_action/test/test_actions.hpp

+// See the License for the specific language governing permissions and
+// limitations under the License.#include <gtest/gtest.h>
+
+#pragma once


please use include guards

done in 13a27f6
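For reference, a minimal sketch of the include-guard layout requested above, in place of `#pragma once`. The macro name `TEST_ACTIONS_HPP_` and the placeholder function are assumptions for illustration; the guard actually used in 13a27f6 is not shown in this thread.

```cpp
// Hypothetical guard name; the macro chosen in the PR is not shown here.
#ifndef TEST_ACTIONS_HPP_
#define TEST_ACTIONS_HPP_

#include <vector>

// Placeholder declaration standing in for the header's real contents.
inline bool is_non_empty(const std::vector<int> & sequence)
{
  return !sequence.empty();
}

#endif  // TEST_ACTIONS_HPP_
```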

ahcorde · 2024-08-27T11:36:39Z

rclcpp_action/test/test_actions.hpp

+  rclcpp_action::GoalResponse
+  handle_goal(
+    const rclcpp_action::GoalUUID & uuid,
+    std::shared_ptr<const Fibonacci::Goal> goal)


include <memory>

ahcorde · 2024-08-27T11:37:06Z

rclcpp_action/test/test_actions.hpp

+  }
+
+  bool result_is_correct(
+    std::vector<int> result_sequence,


include <vector>

ahcorde · 2024-08-27T11:37:18Z

rclcpp_action/test/test_actions.hpp

+      return false;
+    }
+
+    for (size_t i = 0; i < result_sequence.size(); i++) {


Suggested change

-    for (size_t i = 0; i < result_sequence.size(); i++) {
+    for (size_t i = 0; i < result_sequence.size(); ++i) {

ahcorde · 2024-08-27T11:37:42Z

rclcpp_action/test/test_actions.hpp

+
+private:
+  GoalHandleSharedPtr server_goal_handle_;
+  std::atomic<bool> result_cb_called{false};


include <atomic>

ahcorde · 2024-08-27T11:40:04Z

rclcpp_action/test/test_actions.cpp

+
+    auto result_future = action_client->async_get_result(goal_handle);
+    auto result_response_wait = result_future.wait_for(std::chrono::seconds(5));
+    ASSERT_TRUE(result_response_wait == std::future_status::ready) << "Cancel result response not on time";


include <future>. Linters will probably fail here as well: this line has more than 100 characters.
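A small sketch of what the two points above could look like together. It assumes a `std::promise` as a stand-in for the future returned by `async_get_result()`, and `wait_for_ready_value` is a hypothetical helper; the gtest assertion is shown only in a comment so that the snippet stays self-contained.

```cpp
#include <chrono>
#include <future>

// Stand-in for the action result future; the real test waits on the
// future returned by async_get_result().
inline std::future_status wait_for_ready_value()
{
  std::promise<int> promise;
  std::future<int> result_future = promise.get_future();
  promise.set_value(42);

  // Breaking the statement after '=' keeps each line under 100 characters.
  const auto result_response_wait =
    result_future.wait_for(std::chrono::seconds(5));

  // In the gtest body, the assertion could then be wrapped as:
  //   ASSERT_TRUE(result_response_wait == std::future_status::ready)
  //     << "Cancel result response not on time";
  return result_response_wait;
}
```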

fujitatomoya · 2024-08-27T17:01:49Z

this PR includes #2471 with latest code base, so i will close #2471.

alsora · 2024-08-28T01:14:12Z

rclcpp_action/src/client.cpp

@@ -353,16 +353,6 @@ ClientBase::is_ready(const rcl_wait_set_t & wait_set)

  pimpl_->next_ready_event = std::numeric_limits<size_t>::max();

-  if (is_feedback_ready) {


Can you add a comment here explaining why the order matters (i.e. what you wrote in the PR description)?

I second this comment. Control service data should take priority, since it changes the action client state.

done in e66d6e5

fujitatomoya · 2024-08-28T04:16:00Z

rclcpp_action/src/client.cpp

@@ -353,16 +353,6 @@ ClientBase::is_ready(const rcl_wait_set_t & wait_set)

  pimpl_->next_ready_event = std::numeric_limits<size_t>::max();

-  if (is_feedback_ready) {


I second this comment. Control service data should take priority, since it changes the action client state.

fujitatomoya · 2024-08-28T04:24:52Z

rclcpp_action/test/test_actions.hpp

+    this->server_goal_handle_->abort(result);
+  }
+
+  // Server: Handle goal callback


Do we need this comment? Other places do not have this kind of comment.

fujitatomoya · 2024-08-28T05:49:26Z

rclcpp_action/CMakeLists.txt

@@ -79,6 +79,20 @@ if(BUILD_TESTING)

  add_subdirectory(test/benchmark)

+  ament_add_gtest(test_ros2_actions test/test_actions.cpp TIMEOUT 180)


The ros2 prefix sounds redundant. How about test_actions?

Suggested change

-  ament_add_gtest(test_ros2_actions test/test_actions.cpp TIMEOUT 180)
+  ament_add_gtest(test_actions test/test_actions.cpp TIMEOUT 180)

fujitatomoya · 2024-08-28T05:49:59Z

rclcpp_action/test/test_actions.cpp

+int main(int argc, char **argv)
+{
+    ::testing::InitGoogleTest(&argc, argv);
+    return RUN_ALL_TESTS();
+}


Does this main function need to stay here? I do not see any other tests with a main function like this.

jmachowinski · 2024-08-29T10:42:10Z

Hard no-go from my side. This introduces a new bug: you would no longer receive all feedback messages before the result.
See rclcpp_action.TestClientAgainstServer.async_send_goal_with_feedback_callback_wait_for_result for more details.

fujitatomoya · 2024-08-30T00:43:59Z

Even though https://design.ros2.org/articles/actions.html#clientserver-interaction-examples does not specify that feedback messages need to be delivered before the result, all the examples suggest that the result reaches the client after the feedback messages. Besides, a feedback message arriving after the result response would be strange behavior; I am not sure how an application is supposed to process such a message in its callback.

I also think this action design requirement is already broken in the current implementation. Because rmw uses different channels for feedback and result, the messages are not queued in order, so there is always a possibility that feedback messages arrive after the result response. (Changing the order in which data is taken can mitigate this a bit, but it is not a perfect solution.)

IMO, if it improves the user experience, changing the order would be acceptable. Maybe the order needs to be reconsidered carefully? (goal response -> cancel response -> feedback -> status -> result response)

Any thoughts? @mauropasse @jmachowinski @ahcorde @alsora

mauropasse · 2024-08-30T11:27:17Z

My thoughts are that ROS 2 actions' feedback and status messages are useful during the action's execution, as they allow the user to understand how the process is progressing.

However, the responses (goal/cancel/result) are the final pieces of information when an action has completed, and these should be considered the most important. Feedback and status messages received by the client after the action has finished could (should?) be ignored if the user already knows the action's outcome.

This is why I think responses should be prioritized over feedback. Moreover, a response is sent only once, whereas there can be a large number of feedback messages.

I also want to point out that this issue affects only the single-threaded executor. It does not impact the events executor, as events are processed in the order they are generated.

Also I'm unsure about judging the correctness for the new proposed priority of execution, based on the results of previous tests that fail now?

fujitatomoya · 2024-08-30T15:31:23Z

@mauropasse thanks for your comment!

Feedback and status messages received by the client after the action has finished could (should?) be ignored if the user already knows the action's outcome.

I am not yet sure whether this is intended by design. To be honest, I thought it was okay for feedback and status messages to arrive after the result (or even after the cancel response), and that either the ActionClient or the application could ignore them. Let's wait for more feedback on this.

Also I'm unsure about judging the correctness for the new proposed priority of execution, based on the results of previous tests that fail now?

I believe that @jmachowinski just wanted to confirm this behavior, just like me. I think it is totally fine to change the test once the behavior is changed.

jmachowinski · 2024-08-31T10:22:04Z

I checked the initial bug report, and must say, the test itself is highly flawed.

What it comes down to is that you have a provider running at a higher frequency than the consumer can process.

From my point of view, the bug report makes wrong assumptions about how spin_some works / should work. To be fair, the documentation of the function is misleading, as you need really deep knowledge of the executor internals to know what 'Collect all work' really means. The obvious fix to the problem is to use spin_all.
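To make the spin_some / spin_all distinction concrete, here is a toy model. It is not the rclcpp executor: `ToyExecutor` and `result_handled_in_one_call` are hypothetical, and the model assumes that spin_some collects ready work once per call while spin_all keeps collecting until nothing is ready. The item budget stands in for the max-duration argument of the real call.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

using Work = std::function<void()>;

// Hypothetical executor model; not the rclcpp::Executor API.
struct ToyExecutor
{
  std::deque<Work> ready;

  // Collect once: run only the items that were ready when the call started.
  std::size_t spin_some()
  {
    const std::size_t collected = ready.size();
    for (std::size_t i = 0; i < collected; ++i) {
      run_front();
    }
    return collected;
  }

  // Collect repeatedly: continue until nothing is ready or the budget is spent.
  std::size_t spin_all(std::size_t max_items)
  {
    std::size_t executed = 0;
    while (!ready.empty() && executed < max_items) {
      run_front();
      ++executed;
    }
    return executed;
  }

private:
  void run_front()
  {
    Work work = std::move(ready.front());
    ready.pop_front();
    work();
  }
};

// One feedback item is ready; while it is handled, a result becomes ready.
inline bool result_handled_in_one_call(bool use_spin_all)
{
  ToyExecutor executor;
  bool result_handled = false;
  executor.ready.push_back([&]() {
    executor.ready.push_back([&]() {result_handled = true;});
  });
  if (use_spin_all) {
    executor.spin_all(10);
  } else {
    executor.spin_some();
  }
  return result_handled;
}
```

In this model a single spin_some call leaves the newly arrived result unhandled, while a single spin_all call also picks it up.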

As to the action code in general, as I stated before I think the design is highly flawed. But I don't see a 'simple' fix for the issue, like the one proposed here.

As to the importance of receiving the last feedback before the goal result, I agree with @mauropasse that in a (our) real-world application one can normally ignore feedback, and it's not important at all. The problem with this change, though, is that it will break the tutorial
https://docs.ros.org/en/jazzy/Tutorials/Intermediate/Writing-an-Action-Server-Client/Cpp.html and possibly break user code that relies on this behavior.

mauropasse · 2024-09-18T07:09:33Z

In my last commits I addressed the comments on this PR.
I ended up lowering only the Feedback priority:

Original priority: Feedback > Status > Goal Response > Result Response > Cancel Response
New priority: Status > Goal Response > Result Response > Cancel Response > Feedback

As a result:

- The modifications that test_client.cpp needs in order to pass are minimal.
- The demo ros2/demos/action_tutorials/action_tutorials_cpp/src/fibonacci_* behaves the same as before.
- There is still a risk of breaking user code.
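The two orderings above can be sketched as a small selection function. This is an illustration only: `Entity`, `Ready` and `next_entity` are hypothetical names, and the code does not reproduce ClientBase::is_ready; it only shows which entity each ordering would pick when several are ready at once.

```cpp
// Hypothetical labels for the five client entities listed above.
enum class Entity { None, Feedback, Status, GoalResponse, ResultResponse, CancelResponse };

// Which entities currently have data waiting.
struct Ready
{
  bool feedback;
  bool status;
  bool goal_response;
  bool result_response;
  bool cancel_response;
};

inline bool is_set(const Ready & ready, Entity entity)
{
  switch (entity) {
    case Entity::Feedback: return ready.feedback;
    case Entity::Status: return ready.status;
    case Entity::GoalResponse: return ready.goal_response;
    case Entity::ResultResponse: return ready.result_response;
    case Entity::CancelResponse: return ready.cancel_response;
    default: return false;
  }
}

// Returns the first ready entity according to the chosen priority order.
inline Entity next_entity(const Ready & ready, bool new_priority)
{
  static constexpr Entity original[] = {
    Entity::Feedback, Entity::Status, Entity::GoalResponse,
    Entity::ResultResponse, Entity::CancelResponse};
  static constexpr Entity updated[] = {
    Entity::Status, Entity::GoalResponse, Entity::ResultResponse,
    Entity::CancelResponse, Entity::Feedback};

  const auto & order = new_priority ? updated : original;
  for (Entity entity : order) {
    if (is_set(ready, entity)) {
      return entity;
    }
  }
  return Entity::None;
}
```

When both feedback and a result response are ready, the original order picks the feedback and the new order picks the result response; when only feedback is ready, both orders still deliver it.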

Mauro Passerino added 2 commits August 27, 2024 12:19

Add test for actions

07dc81a

Signed-off-by: Mauro Passerino <[email protected]>

Fix actions feedback race

eb57f71

- See ros2#2451 Signed-off-by: Mauro Passerino <[email protected]>

mauropasse requested review from ivanpauno, hidmic and wjwwood as code owners August 27, 2024 11:29

mauropasse mentioned this pull request Aug 27, 2024

Humble backport and new fixes irobot-ros/rclcpp#154

Merged

ahcorde requested changes Aug 27, 2024

fujitatomoya mentioned this pull request Aug 27, 2024

rclcpp_action: take and execute service entities in priority. #2471

Closed

alsora reviewed Aug 28, 2024

fujitatomoya reviewed Aug 28, 2024

Mauro Passerino added 2 commits September 18, 2024 07:38

Address PR comments and fix test

13a27f6

Signed-off-by: Mauro Passerino <[email protected]>

add comment

e66d6e5

Signed-off-by: Mauro Passerino <[email protected]>

wjwwood assigned mauropasse Oct 3, 2024
