
Intra-Process Communications for all language clients #251

Open
emersonknapp opened this issue Aug 23, 2019 · 15 comments
Labels: backlog, enhancement

Comments

@emersonknapp
Contributor

Description

This issue is a call for a design for zero-copy intra-process communication that is available to all ROS2 language clients.

The current implementation of this feature exists in rclcpp and is therefore not usable from Python (or less-supported languages such as C, Java, Rust, etc.).

Acceptance Criteria

To close this issue, we want a design document that proposes the architecture for:

  • Intra-process optimized communications
  • Zero copy
  • Availability to all ROS2 language clients (therefore existing at or below the level of rcl)

As a follow-up, I will attempt to collect existing thoughts from #239 and add them as comments below.

Note

I do not consider myself an expert on this personally. However, I'm very interested in collaborating toward a top-down view of what this part of the ROS2 core should look like, and in figuring out how the community can pull together toward a solution.

@dirk-thomas added the enhancement label on Aug 23, 2019
@ivanpauno
Member

ivanpauno commented Aug 26, 2019

I think we can take some ideas from Connext's "zero copy transfer over shared memory".

That's actually interprocess communication over shared memory, but something similar can be replicated using a buffer instead of a piece of shared memory.

The basic idea is that you ask the publisher for a new message instead of allocating a unique_ptr:

auto msg = publisher->get_new_message();
if (msg != nullptr) {
  msg->data = "asd";
  publisher->publish(msg);
}

Currently, message lifetime can be extended beyond the scope of the callback (in C++). That would not be possible if we go ahead with something like this (or at least, it would be really hard to implement that feature).


The implementation could live in rcl or rmw, I'm not sure what would be better.
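
For illustration, here is a rough sketch of what a loan-based API could look like at the rcl level. The two rcl_* loan functions below are hypothetical, not existing rcl API:

#include <rcl/publisher.h>
#include <rosidl_runtime_c/string_functions.h>
#include <std_msgs/msg/string.h>

// Hypothetical C-level loan API: the publisher owns preallocated storage,
// "borrowing" hands out a slot, and publishing returns ownership.
extern "C" rcl_ret_t rcl_borrow_message(const rcl_publisher_t * publisher, void ** ros_message);
extern "C" rcl_ret_t rcl_publish_borrowed_message(const rcl_publisher_t * publisher, void * ros_message);

// Usage from a client library: fill the borrowed slot in place, no allocation.
void publish_without_allocating(const rcl_publisher_t * publisher)
{
  void * raw_msg = NULL;
  if (rcl_borrow_message(publisher, &raw_msg) == RCL_RET_OK) {
    std_msgs__msg__String * msg = (std_msgs__msg__String *)raw_msg;
    rosidl_runtime_c__String__assign(&msg->data, "asd");
    rcl_publish_borrowed_message(publisher, raw_msg);  // slot goes back to the publisher
  }
}

Because this would live at the rcl level, any language client binding against rcl could use it.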

@allenh1

allenh1 commented Aug 26, 2019

@ivanpauno I don't think publisher->get_new_message() should ever return nullptr. I'd prefer a more asynchronous way to fetch a message, or potentially blocking on that call instead. I'm not very fond of the blocking-call idea, but maybe an asynchronous trigger could be set up?

Maybe it could be set up so that we can std::invoke a callback in the publish() function? This isn't great though, since it would need to be done in rcl, which means wasting cycles checking whether there are std::bind'ed callbacks on non-shared-memory platforms.

I'm not seeing a way to make this happen in anything above rmw, except of course when there are multiple nodes inside the same process.

Sorry for the rambling; I'm very interested in this idea.
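
For concreteness, a minimal sketch of what the callback-in-publish() idea could look like (purely illustrative, not an existing API):

#include <functional>
#include <utility>

// Illustrative publisher with an optional completion hook. The branch in
// publish() is exactly the wasted-cycles concern above, on platforms where
// no callback is ever registered.
template<typename MessageT>
class Publisher
{
public:
  void set_on_published(std::function<void ()> cb) {on_published_ = std::move(cb);}

  void publish(const MessageT & msg)
  {
    deliver(msg);  // hand the message to the transport (not shown)
    if (on_published_) {
      std::invoke(on_published_);
    }
  }

private:
  void deliver(const MessageT &) {}
  std::function<void ()> on_published_;
};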

@fujitatomoya
Collaborator

Just sharing my thoughts:

The implementation could live in rcl or rmw, I'm not sure what would be better.

I believe that it is better implemented in rmw, not rcl:

  • It sounds like rmw's responsibility to take care of transport.
  • It provides a consistent/compatible API to the frontend, concealed by rmw.
  • It allows taking advantage of, and comparing, each rmw implementation.

@emersonknapp
Contributor Author

Collecting some relevant parts of the previous discussion here for easier review, and to feed the design:

Re: location of implementation @gbiggs wrote

This is a tangential comment, but I wonder if we could achieve the same zero-copies-when-same-process result by reducing the number of copies required for going into and out of the rmw layer to zero, and using a DDS implementation that also supports zero copies (ignoring that there may not be any and that the standard API may not support this, both of which are solvable issues). One of the reasons for using DDS is to push all the communication issues down into an expert-vendor-supplied library, after all.

Re: location of implementation @raghaprasad wrote

How about moving the intra_process_management into an rmw?
This rmw could handle only intra-process communication and delegate inter-process communication to any of the chosen DDS rmw implementations.

Support for zero copies is an important objective, but it's not the only one. It has been observed that creating DDS participants is pretty resource-heavy in terms of net memory required (at least for FastRTPS & OpenSplice), and the discovery process is CPU-intensive (due to multicast).
This new rmw could drastically simplify the discovery process and most certainly reduce the memory footprint by needing only one participant per process to support inter-process communication.

Re: smart-ptr messages @gbiggs wrote

But it is possible to design the rmw and rcl APIs and implementations such that they manage their raw pointers properly and provide a smart_ptr interface-compatible object in rclcpp. I'm not saying it would be easy, but this is how the STL is designed to be used, and it would be the most powerful solution.
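
As an illustration of that pattern, rclcpp could hand out a std::unique_ptr whose custom deleter returns the raw pointer to the lower layer instead of freeing it. The two rcl_* functions below are hypothetical:

#include <memory>
#include <rcl/publisher.h>

// Hypothetical C-level loan/return pair; rcl keeps managing the raw pointer.
extern "C" void * rcl_take_loaned_buffer(rcl_publisher_t * pub);
extern "C" void rcl_return_loaned_buffer(rcl_publisher_t * pub, void * msg);

// rclcpp-side wrapper: the user sees a smart-pointer-compatible object whose
// deleter gives the buffer back rather than calling delete.
template<typename MessageT>
auto borrow_message(rcl_publisher_t * pub)
{
  auto deleter = [pub](MessageT * msg) {rcl_return_loaned_buffer(pub, msg);};
  return std::unique_ptr<MessageT, decltype(deleter)>(
    static_cast<MessageT *>(rcl_take_loaned_buffer(pub)), deleter);
}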

Re: implementation @ivanpauno wrote

I would like to see something mimicking Connext's "Zero Copy Transfer Over Shared Memory" semantics (by default Connext uses shared memory, but it doesn't use zero-copy transfer, which has its own specific semantics). Basically, instead of creating a unique pointer and then publishing it:

auto msg = std::make_unique<MSG_TYPE>();
/* Fill the message here */
publisher->publish(std::move(msg));

You ask the publisher for a piece of memory, fill it, and then publish:

auto msg = publisher->new_message();
/* Fill the message here */
publisher->publish(std::move(msg)); // I'm using move semantics because the message will be undefined after calling publish. But how we wrap the msg for this is an implementation detail.

For DDS vendors that have implemented zero-copy transport, this could just wrap it.
For others, we could have a default implementation that's used in those cases. That implementation would not use shared memory (which allows INTERprocess zero-copy transport), but just a preallocated buffer in each publisher (which allows INTRAprocess zero-copy transport). This implementation is a good start for later doing something like this (if we want to do it).

I also think this idea will look idiomatic in other languages (for example, in Python), and performance should be quite similar.
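
To make the fallback concrete, here is a minimal sketch (illustrative only) of a per-publisher preallocated pool that could back new_message()/publish() for intra-process zero copy:

#include <array>
#include <cstddef>
#include <mutex>

// Fixed pool of preallocated messages owned by one publisher. new_message()
// maps to acquire(); the slot is released once all intra-process
// subscriptions have consumed it.
template<typename MessageT, std::size_t Depth = 8>
class MessagePool
{
public:
  // Hand out a free slot, or nullptr if the pool is exhausted.
  MessageT * acquire()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    for (std::size_t i = 0; i < Depth; ++i) {
      if (!in_use_[i]) {
        in_use_[i] = true;
        return &slots_[i];
      }
    }
    return nullptr;
  }

  // Return a slot to the pool.
  void release(MessageT * msg)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    in_use_[msg - slots_.data()] = false;
  }

private:
  std::array<MessageT, Depth> slots_{};
  std::array<bool, Depth> in_use_{};
  std::mutex mutex_;
};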

@emersonknapp
Contributor Author

A question: do we want intra-process communication to always be optimized in ROS2, regardless of the choice of RMW?

If we want it always available, what about this idea?

  • an independent full implementation of the RMW API - rmw_intraprocess
  • instantiate both
    • rmw_intraprocess for use by nodes within the same process
    • the cross-process RMW implementation chosen via environment variable
  • have the rcl or rmw layer route API calls to whichever of the two co-existing RMWs is appropriate, based on whether the communication stays within the process (see the sketch below)

Or, as another possible outcome, should we just expect intra-process communication to be the job of the chosen RMW implementation, and push development to add this to our RMW implementation of choice, e.g. FastRTPS or CycloneDDS?
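
A rough sketch of how the routing could look inside the publish path; every name here is hypothetical, not an existing rcl/rmw API:

#include <rcl/types.h>

// Hypothetical publisher handle holding both co-existing RMW instances.
struct routed_publisher_t;
bool has_intra_process_subscriptions(const routed_publisher_t * pub);
bool has_inter_process_subscriptions(const routed_publisher_t * pub);
rcl_ret_t intra_rmw_publish(routed_publisher_t * pub, const void * msg);
rcl_ret_t inter_rmw_publish(routed_publisher_t * pub, const void * msg);

// Route each publish to the appropriate RMW based on subscription locality.
rcl_ret_t routed_publish(routed_publisher_t * publisher, const void * msg)
{
  if (has_intra_process_subscriptions(publisher)) {
    rcl_ret_t ret = intra_rmw_publish(publisher, msg);  // rmw_intraprocess
    if (ret != RCL_RET_OK) {
      return ret;
    }
  }
  if (has_inter_process_subscriptions(publisher)) {
    return inter_rmw_publish(publisher, msg);  // RMW chosen via RMW_IMPLEMENTATION
  }
  return RCL_RET_OK;
}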

@dirk-thomas
Member

How about moving the intra_process_management into an rmw?
This rmw could handle only intra-process communication and delegate inter-process communication to any of the chosen DDS rmw implementations.

Support for zero copies is an important objective, but it's not the only one. It has been observed that creating DDS participants is pretty resource-heavy in terms of net memory required (at least for FastRTPS & OpenSplice), and the discovery process is CPU-intensive (due to multicast).
This new rmw could drastically simplify the discovery process and most certainly reduce the memory footprint by needing only one participant per process to support inter-process communication.

The overhead described here is addressed by the proposal in #250 and isn't related to intra-process communication. Even with intra-process communication, every node / participant has to perform discovery and incurs that overhead.

@ivanpauno
Member

@ivanpauno I don't think publisher->get_new_message() should ever return nullptr. I'd prefer a more asynchronous way to fetch a message, or potentially blocking on that call instead. I'm not very fond of the blocking-call idea, but maybe an asynchronous trigger could be set up?

I guess it's possible to never return nullptr (probably with blocking behavior); I just added it because I'm not super sure what the implementation would look like.

I believe that it is better implemented in rmw, not rcl:

  • It sounds like rmw's responsibility to take care of transport.
  • It provides a consistent/compatible API to the frontend, concealed by rmw.
  • It allows taking advantage of, and comparing, each rmw implementation.

I agree, especially with the first and last points.
Each time I think about the intra-process communication problem, I become more convinced that it should be addressed by the underlying middleware (FastRTPS, Connext, OpenSplice, etc.), and that we should only wrap their zero-copy transfer APIs. Of course, that's probably out of our scope, and we have to provide a solution on top of the middleware. But that has the cost of re-implementing a lot of things (supporting a lot of different QoS features, etc.).

Or, this is a possible outcome, should we just expect that intraprocess communications should be the job of the choice of RMW implementation, and just push development to add this to our RMW impl of choice, e.g. FastRTPS or CycloneDDS or wherever?

👍

@qootec

qootec commented Sep 17, 2019

I initially posted this as a topic on answers.ros.org (see https://answers.ros.org/question/333180/ros2-micro-ros-intra-process/) but was advised by the moderator to move it to Discourse... I think the core of my concern touches your discussion.

(My context: ROS2 inside a machine controller)

Looking at your proposals for intra-process communication, I fail to see whether you also take into account the multi-priority requirements that such (often embedded) environments typically have.

I currently see fragmented solution elements or approaches:

  • From Micro-ROS: Multiple executors could be hosted in the same process/node, each having their own queue for messages (or in fact their handlers) of the corresponding priority (based on their handlers' callback group priority).

  • From ROS2: ROS2 does not create its own queuing mechanism, but instead relies on the queues already available in the DDS middleware.

  • From ROS2 (close to this topic): use_intra_process_comms() … if true, messages will go through a special intra-process communication code path, potentially bypassing DDS. How will they then be queued / priority-managed?

  • (RTI) DDS has a Transport_Priority_QoS defined per DataWriter, which would then have to be kept in sync with the callback group priority?

Is there any documented vision on how your intra-process-communication would co-exist with multi-priority queuing/handling?

Johan

@gavanderhoorn
Contributor

I initially posted this as an topic on answers.ros.org (see https://answers.ros.org/question/333180/ros2-micro-ros-intra-process/) but was advised by the moderator to move it to discourse...

I did, but this is not the embedded category on ROS Discourse.

@atyshka

atyshka commented Aug 12, 2020

Any updates on this roughly a year later?

@ivanpauno
Member

Any updates on this roughly a year later?

Not that I know of.
The problem isn't trivial, and AFAIK there is nobody assigned to work on it.

@twaddellberkeley

Hi @ivanpauno, is there any ongoing work on this problem? If not, do you need help? I'd love to dive into it.

Cheers

@ivanpauno
Member

AFAIK, nobody is working on this right now.
I'm not sure if there's a plan to work on the topic soon.

@emersonknapp
Contributor Author

I'm not sure, but does the Cyclone+iceoryx combo do this automatically for C++ nodes in the same process?

@ivanpauno
Member

I'm not sure, but does the Cyclone+iceoryx combo do this automatically for C++ nodes in the same process?

Not zero-copy; zero-copy requires a different API.
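
For reference, rclcpp's loan-based API (borrow_loaned_message(), usable with zero-copy-capable RMWs) looks roughly like this:

#include <utility>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/float64.hpp"

// Loan-based publish path: the middleware allocates the sample, so for
// plain-old-data messages the publish can be zero copy end to end.
void publish_zero_copy(rclcpp::Publisher<std_msgs::msg::Float64> & publisher)
{
  auto loaned_msg = publisher.borrow_loaned_message();
  loaned_msg.get().data = 42.0;
  publisher.publish(std::move(loaned_msg));  // ownership returns to the middleware
}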
