-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nodelet unloading broken #50
Comments
Thanks for the detailed report. Regarding one: I see how it can be confusing. the message "Unloading nodelet <nodelet_name> from manager <nodelet_manager_name>" means that a request to unload the nodelet has been sent. It doesn't mean that the nodelet is actually unloading or fully unloaded. nodelet_core/nodelet/src/loader.cpp Line 338 in 6dcff94
Do you think that we should change the verbosity level of this message should be changed to be more transparent to the user? |
As far as I could see, the manager is torn down before all nodelets because roslaunch has 10 process killers running in parallel. Once it gets to the manager and the manager gets the SIGINT signal, it just stops without properly clearing the remaining nodelets. My guess would be that nodelet.cpp should install a signal handler as well for the "manager" case and initiate a shutdown for all nodes from this side. Or maybe I am reading this wrong. As for the verbosity: maybe it should be the same as the "Unloading nodelet from manager" so it is paired (start & end). |
@mikaelarguedas re point 1: is it the intended behaviour that when quitting the nodelet manager, all remaining running nodelets are not being destructed? If not, I'd like to open a new issue for it in order to get #51 merged, as this is a slightly different issue. To test this, prepare the following:
Now, perform these tests:
|
* add test whether bond breaking on unload works (tests #50) * disable callback for broken bond when we are breaking it This avoids the nodelet::LoaderROS::unload() method to be called twice for the same nodelet, causing an error output. * use AsyncSpinner for nodelet load in order for the shutdown procedure to work During shutdown, the bonds still need to communicate their status in order for the nodelet to properly/cleanly/quickly unload. This requires the node to spin. * add test whether LoaderROS::unload() is called twice (tests #50)
closing this in favor of #55 |
I notice an issue with nodelets not being properly unloaded. Consider a launch file that starts
n
nodelets (with n > 9, I recommend n = 20) in the same nodelet manager:and once these 20 nodelets are up and running, pressing Ctrl-C. One would expect that the shutdown process is quite linear, with all 20 nodelets and the manager being shut down quickly. However it seems the first 9 nodelets are shut down very quickly, and the other n-9 nodelets are shut down at a rate of one per second. Give this a try (either on Indigo, Jade or Kinetic).
Upon delving into the code, I found out what is happening.
First, the perception of 9 nodelets quitting quickly is wrong. As roslaunch uses 10 process killers in parallel, up to the 10 nodelets will be announced quickly immediately after hitting Ctrl-C (giving the wrong impression) but what actually happens is that the first n-9 nodelets take 1 second each to shutdown (in a serial manner). As soon as n-9 nodelets are taken down (or rather their loader nodes) the 10th process killer slot is filled with the nodelet manager. And sending SIGINT to the nodelet manager seems to just tear down all the remaining nodelets immediately. I am not sure this is the intended behaviour, or is it?
Second, the main issue here is that the entire nodelet unloading process seems to be broken due to bond status messages not being processed during unloading. When a nodelet loader process is signaled to quit, it will call the
unload
service of the nodelet manager. This in turn will try to break the bond which involves sending a bond status message "active = False" to the nodelet loader process and waiting for it to also send "active = False". However, if you look at nodelet.cpp you will see that no spinner is active.ros::spinOnce
was last called before starting the unload procedure, and as such no further messages (incl. bond status messages) will be received or published anymore. Hence the 1 second timeout for breaking the bond is applying.I tried to fix this quickly by introducing an AsyncSpinner. This improved the required shutdown time (by roughly n-9 seconds), however it triggered a new problem: for every nodelet I now get an error like
/nodelet_manager: Failed to find nodelet with name '/n01' to unload.
This, third, seems to be caused byLoaderROS::unload()
to be called twice: first by the service call, and then when the bond is broken via the broken callback set up inloader.cpp:serviceLoad()
.The text was updated successfully, but these errors were encountered: