Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CommsClient: Binding from multiple nodes doesn't work (even on different ports). #638

Open
peci1 opened this issue Sep 30, 2020 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@peci1
Copy link
Collaborator

peci1 commented Sep 30, 2020

I've found no mention in documentation that there can be only a single node using CommsClient::Bind() with a given address, no matter the port number. But that is the reality. Further in this issue is my argumentation why this should be considered a bug (either in CommsClient implementation or in documentation).

This is the code executed in CommsClient's OnBind() method:

this->commsModelOnMessageService = nh.advertiseService(
"/" + address, &CommsClient::OnMessageRos, this);

It clearly shows that it creates a service server whose name doesn't depend on port number. If your robot's name is X1, then the service server is /X1 no matter what port you bind to. The code checks that the server is only started once per CommsClient instance, and messages from all ports are gathered by this CommsClient. It then redistributes the messages to the bound clients. But if I start another node with another instance of CommsClient, the service is already advertised and thus the new instance cannot listen for new messages, even if it would listen on different ports than the first instance.

I've spent hours digging into this bug and have come to several thoughts:

  • /$ADDRESS is the worst name one could imagine... it makes sense to me to run robots in namespaces according to their name, and this effectively makes it impossible to remap just the CommsClient's service, because all topics on our robots start with the same prefix. What about /subt/comms/$ADDRESS, /comms/$ADDRESS or similar?
  • Why is it a service at all?

if (_req.dst_address() == subt::communication_broker::kBroadcast)
{
for (const std::string &dest : this->robotNames)
{
ros::service::call(dest, req, res);
}
}
else
{
ros::service::call(_req.dst_address(), req, res);
}

The SubtRosRelay code calls the services, but doesn't care about their result (or even success). Moreover, if some client's processing of the message takes longer, it will block the whole message relay. If it were a standard ROS topic, it would not block and it would even allow multiple binds to the same address:port pair (which is not normally supported with UNIX ports, but could be nice).

  • From the previous point, if multiple binds of the same address:port pair are not allowed, then the user should be allowed to choose which CommsClient will receive the beacon packets. Because so far, the bind for beacons is done automatically in the CommsClient's constructor, so the first one that gets created wins. Not really practical. And I think this would be a very good reason to support multiple binding.
  • (slightly unrelated) The documentation (and comms tutorial) do not explicitly mention the port on which the base station communicates (the default port). It could help getting a clearer understanding of the reporting API.

I'm aware of the code freeze so I do not expect a solution for this issue for cave. However, I strongly suggest to take it into consideration for the final stage. I wanted to have a look into the "mutlirobot coordination" tutorial which is mentioned in the end of the comms tutorial, because that is IMO the place where multi-node binding will be needed (if you don't want to mix coordination and artifact reporting in a single node). But the tutorial has not yet be released.

As a workaround I'm writing a "demux" of the datagram services that will "relay" the service calls to a topic, and a variant of CommsClient that would subscribe to the topic instead of the service. That should allow us to finally run multiple nodes using CommsClient (or our variant of it). If anybody wants, I can share the workaround.

@zbynekwinkler
Copy link

I'd be definitely interested in a ROS node wrapping CommsClient with topics to subscribe to and publish to. We have been suggesting to move to this since #35.

@peci1
Copy link
Collaborator Author

peci1 commented Feb 1, 2021

Solution proposed in #772.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants