Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for running multiple CommsClient nodes #772

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

peci1
Copy link
Collaborator

@peci1 peci1 commented Feb 1, 2021

This PR adds proper support for having multiple CommsClients in different ROS nodes (or even multiple clients in one ROS node).

Solves: #638, #604 (yes, multicast!) and #270.

It is a rather large PR. I must admit much larger than I imagined when I started to work on it (given the fact that our half-sufficient workaround was just one file; but that workaround still did not work when nodes got restarted). Yet I still think this is the (more or less) minimal set of changes to provide proper support for multiple clients at least on the ROS side. There can still be only a single CommsClient with _useIgnition=true. But that's not a big problem because that client is a part of the bridge (or sim) container, so it's easy to make sure there is only one. On the other hand, solution containers using _useIgnition=false will now get much bigger freedom in spawning commsclients.

I even added support for calling Unbind(), although it seems to me that it's not that useful in SubT challenge (but hey, what if!).

What this PR brings

  1. Creating multiple instances of subt::communication_client::CommsClient (even in one ROS node, e.g. several nodelets can have their own commsclients). Each of these clients has its own separate list of bound endpoints which get unregistered when this client goes out of scope.
  2. Binding one endpoint multiple times even from the same client instance (e.g. to add multiple independent processing callbacks).
  3. Unbinding endpoints by calling CommsClient::Unbind(). This will stop calling the callback bound in the corresponding Bind() call.
  4. Restarting ROS nodes with CommsClients. This was not possible before because each endpoint could only be bound once and there was no unbind option. This is a big step towards reliability and robustness of the solution containers - i.e. the important commsclient nodes can finally get respawn="true" and they will still work even after a crash.
  5. The execution of the bound callbacks of one client does not block/delay the execution of callbacks of other clients (because they are dispatched via a ROS topic instead of service).
  6. Multicast addresses work as they should.

What this PR breaks

This PR of course breaks a lot of APIs/ABIs. I tried to keep the changes for downstream users to a minimum, and a proof that I was successful in that is that there's literally no "downstream" code in osrf/subt that would need any changes (BaseStation/Controller plugins, tests, subt_example_node). Here's a view of the changes from a few different perspectives.

Users of subt::communication_client::CommsClient(..., _useIgnition=false)

  1. The only change is the return value of Bind() call which now returns a list of registered endpoint IDs instead of plain bool. I added extra backwards compatibility feature by overloading operator! for the result type, so that people can still easily test if (!client->Bind(...)) .... I couldn't figure out how to support also a positive test like if(client->Bind(...)).... People who use just the CommsClient and did not check the return value of Bind() should be completely unaffected by this PR (except gaining all of the bonuses it brings).
  2. The constructor of CommsClient has two more optional parameters.
  • _listenBeacons specifies whether this client should automatically bind to the beacon port and fill the Neighbors() array
  • _rosNh is an optional ROS nodehandle to use for the created/subscribed ROS topics/services (e.g. in case users want to remap some of the topics)

Direct users of the ROS API:

  1. Definitions of the service calls in subt_ros/srv have changed. They're now using client/endpoint IDs instead of plain names for unregistration, unbind and as return values. The Bind.srv also requires a related client ID instead of address. It should be pretty easy to maintain a mapping between client/endpoint IDs and their corresponding names.
  2. Incoming messages from the SubRosRelay no longer come via a service call to /robot, but are instead published on topic /robot/comms. This is the key part that allows multiple CommsClients to coexist.

Users of subt::communication_client::CommsClient(..., _useIgnition=true)

  1. As with _useIgnition=false, there is nothing changed but the return type of Bind().
  2. This client instance does not, however, support multiple instances running at the same time (although creation of additional clients doesn't fail; they will just not receive any messages).

Direct users of the Ignition Transport API:

  1. Definitions of the registration/binding services changed. See Broker::OnAddrRegistration, Broker::OnAddrUnregistration and Broker::OnEndpointRegistration for their new signatures. All UInt32 fields stand for some ID (clientId or endpointId).

Validation

To be more sure this PR actually does what it says, I added lots of unit/integration tests. They can probably still be extended, but I think they capture the basic functionality well. As another measure to verify validity of the changes, I ran a subt_comms_test with two subt_example robots, and everything went fine. I was even able to restart each of the "solution" launch files without restarting the simulator and the test still went ok (but it needed subt_comms_test to be restarted because it lost track of the connections).

As a last point, I want to run a full systems test with our solution containers and a full 6-robot team. I will exchange our custom multi-client workaround for the changes in this PR and observe the behavior of the system. Our system normally runs two commsclients on the teambase - on for communicating with other robots, and one for communication with the basestation. So this test should practically verify that a multi-client setup works. I will also try to kill some of the communication nodes to see whether restarting them is possible and they are fully operational afterwards.

This allows multiple clients from the same ROS node
(or even from different ROS nodes), adds support for Unbind()
command and allows ROS nodes with CommsClient to be restarted
and still able to bind to the comms broker.
@osrf-jenkins
Copy link

Build finished. 36 tests run, 0 skipped, 1 failed.

@osrf-jenkins
Copy link

Build finished. 36 tests run, 0 skipped, 1 failed.

@peci1
Copy link
Collaborator Author

peci1 commented Feb 3, 2021

I successfully ran a full 3-robot test with both simulator and our solution code switched to use this PR

  • The robots successfully communicated with each other and with teambase.
  • UGVs were each running two commslients (each in a separate ROS node), and TEAMBASE was running 3 commsclients (each in separate ROS node).
  • Both broadcast and unicast was used for the communication.
  • Hundreds of endpoints were bound during the test without an issue.
    • Many fast concurrent binds discovered a small concurrency issue which I fixed.
  • Communication with base station and artifact reports went through smoothly.
  • I even tried restarting the nodes with commsclients "on-the-fly" and they happily reconnected and started sending/receiving messages after the restart.
  • There were no error or warning messages from the comms broker about duplicate registration or binds.

@peci1
Copy link
Collaborator Author

peci1 commented Mar 3, 2021

Updated with master (no conflict).

@nkoenig nkoenig requested a review from caguero March 22, 2021 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants