Switch to one participant per context model #145
Conversation
There's a problem in
That's needed, so a reader/writer can be matched with a
Yeah, I agree. I was thinking of moving all that logic there, but I didn't have the time to do it.
What can happen is that a reader is shown when listing subscriptions, but not shown when listing the subscriptions that a node created. This could probably be avoided if the information is somehow communicated through reader/writer userdata (see ros2/design#250 (comment)).
@eboasson this was fast! I will do a review ASAP.
I've left some minimal comments and questions; otherwise LGTM with green CI.
I'm not familiar with the Cyclone DDS API, so I might be missing some details.
rmw_cyclonedds_cpp/CMakeLists.txt
Outdated
@@ -64,13 +69,18 @@ target_include_directories(rmw_cyclonedds_cpp PUBLIC
  $<INSTALL_INTERFACE:include>
)

target_link_libraries(rmw_cyclonedds_cpp CycloneDDS::ddsc)
target_link_libraries(rmw_cyclonedds_cpp
  ${rmw_dds_common_LIBRARIES__rosidl_typesupport_introspection_cpp}
This line shouldn't be necessary after ros2/rmw_dds_common#12.
@@ -0,0 +1,53 @@
// Copyright 2019 Open Source Robotics Foundation, Inc.
This file shouldn't be necessary after ros2/rmw_dds_common#12.
}
/* I suppose I could attach lambda functions one way or another, which would
   definitely be more elegant, but this avoids having to deal with the C++
   freakishness that is involved and works, too. */
I think this is fine, using lambdas would probably be more confusing.
std::lock_guard<std::mutex> lock(gcdds.domains_lock);
if (!check_create_domain_locked(did, localhost_only)) {
  return nullptr;
if (!check_create_domain(domain_id, options->localhost_only)) {
In rmw_fastrtps, a Participant is created when the first Node is created within a Context.
This is also a valid option, and shouldn't create cross-vendor problems.
I have no idea when the context gets initialized relative to when the first node is created. If the two always go hand in hand, then it is not worth changing. But if applications often create the context much earlier (say, before checking their command-line arguments), it would be better to do it the same way as in rmw_fastrtps. I'm happy to go with this for now.
rmw_cyclonedds_cpp/src/rmw_node.cpp
Outdated
if (ppant_qos == nullptr) {
  return RMW_RET_BAD_ALLOC;
}
std::string user_data = std::string("securitycontext=") + std::string(
securitycontext was renamed to enclave.
rmw_cyclonedds_cpp/src/rmw_node.cpp
Outdated
// - Whatever the graph cache implementation does, it shouldn't involve much more than local state
//   updates and triggering a guard condition, and so that should be safe.
// - There might be a significant cost to calling the graph cache function to update its internal
//   state, and having that occur on the thread receiving data from the network may not be wise
Yeah, the last point is the main reason I've used a separate thread. rmw_dds_common can probably be improved a lot (and so can the way ROS discovery data is shared).
context->implementation_identifier,
eclipse_cyclonedds_identifier,
return RMW_RET_INCORRECT_RMW_IMPLEMENTATION);
delete context->impl;
Is this deleting everything created in rmw_init correctly? I didn't check thoroughly.
Good catch! The RMW publisher, subscription, and guard condition objects for interfacing with the graph cache weren't deleted (the underlying DDS objects did get deleted as a consequence of deleting the participant).
I've added deletion of them to the rmw_context_impl_t destructor and adjusted the initialisation code a bit, so I think it should all work fine now. I've tried it, also with some fault injection, but I might be getting some of the C++-ey bits wrong. @rotu perhaps you could have a look as well?
Signed-off-by: Erik Boasson <[email protected]>
There is no benefit to having a pair per node. Signed-off-by: Erik Boasson <[email protected]>
Force-pushed from c8108b4 to 81b6c61.
@ivanpauno I've rebased it to get all the updates.
Minor comment, LGTM!
@eboasson Either this PR introduced a race condition, or this change randomly triggered a race condition elsewhere. I wouldn't rule out the second option, considering ros2/build_farmer#248. Let me know if you can dig into the problem.
I should be able to at least do some investigation on macOS.
@ivanpauno instead of hanging, that test fails for me:
There's no way the QoS matching suddenly stopped working, but it could be a publisher process that keeps running for too long; that also bears some similarity to a hanging test. I also get, at the very end:
and that is weird, too, and might be a clue to the other symptoms. (@dennis-adlink has kindly put some effort into reproducing it as well so I could do some other urgent things, but last I heard he didn't encounter any failure.) These details need to be understood first ... Also, the CI run now fails because I added a function to Cyclone (
The gcdds destructor always runs upon termination of the process, even when the process hasn't deleted the RMW objects first, and so possibly before the rmw_context_impl_t destructor runs. As a consequence, the discovery thread may never be stopped and can run into an error from dds_waitset_wait because the waitset has been deleted. Deleting the domain in the destructor of rmw_context_impl_t avoids this problem. Signed-off-by: Erik Boasson <[email protected]>
The message was caused by a global destructor shutting down Cyclone DDS while the discovery thread was still running, which in turn was a consequence of the process not deleting the RMW objects before terminating. I've pushed a fix for that (which simply means it ends abruptly, but that is probably the intended behaviour). Other than that, the error I'm getting:
is caused by the test script being unable to handle the warnings for incompatible QoS pairs that ROS 2 now prints. When using Fast-RTPS the test succeeds because Fast-RTPS doesn't support the feature yet (eProsima/Fast-DDS#852). Each test takes ridiculously long, but I have not seen any hanging tests. So I assume your hunch that the hanging tests might be caused by something in other packages is correct. I think we ought to do another CI run with the fixes I pushed since the previous one and see what comes out.
We're also using
I will give it a try locally and then run CI, using both this change and
Locally, it works fine. Edit: Originally, I triggered some of the jobs with only
@eboasson Failures are unrelated, so I will merge this after the update in . Thanks for working on this!
If you mean ros2/ros2#905, then yes, this corrects my mistake. If you mean something else, please elaborate.
@ivanpauno @eboasson I think this commit is causing build warnings on Linux:
Sorry, I checked the test failures but I didn't see the warnings.
I think this created a build mess by introducing a dependency on FastRTPS.
No, it doesn't. Please continue the discussion in ros2/rmw_dds_common#16.
Since ros2#145, the CI build of rmw_cyclonedds_cpp has been failing on Windows due to inadvertently injecting fastrtps into the build process. fastrtps fails to build (eProsima/Fast-DDS#1173) causing the CI to fail. There doesn't seem to be a better way to suppress this in action-ros-ci ros-tooling/action-ros-ci#177 Fixes ros2#164
@ivanpauno this seems to work and it would be great if you could find the time to look it over. I have only tested it locally using rcl/test and test_communication.
rmw_dds_common. I think there'll be at least one round of cleaning up to come 😄, and I'm also not quite sure I've done the right thing with the unique_ptrs (not that much of a C++ man ... @rotu will hopefully be able to advise there).