Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rcl_shutdown is not thread-safe when used in both signal_handler and Context.__exit__. #1352

Open
YuanYuYuan opened this issue Sep 2, 2024 · 4 comments

Comments

@YuanYuYuan
Copy link

YuanYuYuan commented Sep 2, 2024

Bug report

We found this sporadic failure with rclpy (rolling) due to the race condition while calling rcl_shutdown.
In this issue, the conflict happens if rmw_shutdown is slow so that rcutils_atomic_store on Thread 1 is set after the check rcl_context_is_valid on Thread 2. Therefore rcl_shutdown would be called twice and cause an error.

Analysis

Required Info:

@Barry-Xu-2018
Copy link
Contributor

After checking code, there is a issue.

Thread 1
Before calling rcl_shutdown(), g_contexts_mutex is locked.

namespace rclpy
{
void shutdown_contexts()
{
// graceful shutdown all contexts
std::lock_guard<std::mutex> guard{g_contexts_mutex};
for (auto * c : g_contexts) {
rcl_ret_t ret = rcl_shutdown(c);
(void)ret;
}
g_contexts.clear();
}

Thread 2

Before calling rcl_shutdown(), g_contexts_mutex isn't used.

void
Context::shutdown()
{
{
std::lock_guard<std::mutex> guard{g_contexts_mutex};
auto iter = std::find(g_contexts.begin(), g_contexts.end(), rcl_context_.get());
if (iter != g_contexts.end()) {
g_contexts.erase(iter);
}
}
rcl_ret_t ret = rcl_shutdown(rcl_context_.get());
if (RCL_RET_OK != ret) {
throw RCLError("failed to shutdown");
}
}

If thread 2 executes rcl_shutdown first, thread 1 can still call rcl_shutdown again.
So rcl_shutdown should be safeguarded by g_contexts_mutex in Context::shutdown().

I create a fix #1353.
Could you test in your environment to see if the issue no longer occurs with this PR ?

@YuanYuYuan
Copy link
Author

Hi @Barry-Xu-2018! Thanks for your attempt. But it seemed not to work...

@Barry-Xu-2018
Copy link
Contributor

My fixing only prevented simultaneous calls to rcl_shutdown().
Another problem is that repeated calls to shutdown() cause an exception.

I have updated fixing. Please try again.

Currently, rclpy is designed to throw an exception if rcl_shutdown() is called multiple times on the same context (there's a specific test case for this). So, the error is expected behavior. However, in your situation, it shouldn't throw an error, but rather ignore the second shutdown call.
I will discuss with other members on how to fix this problem.

@YuanYuYuan
Copy link
Author

Hi @Barry-Xu-2018, I have confirmed your latest fix resolves the issue (no more duplicated rcl_shutdown). Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants