-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
send all pending responses before shutting down #228
Conversation
The async implementation doesn't wait for responses to be sent to clients when shutting down. These responses are lost if the application exists fast enough before the write task gets to send them. That means that the response for a request that triggers shutdown of the server might be lost without the client knowing that the server was actually stopped. Signed-off-by: Alexandru Matei <[email protected]>
) | ||
} | ||
} | ||
|
||
struct ServerWriter { | ||
rx: MessageReceiver, | ||
_server_shutdown: shutdown::Waiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need to call wait
in exit
?
Sorry I forget the details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for that, the _server_shutdown waiter acts as a barrier for the shutdown.wait_all_exit() call found here:
ttrpc-rust/src/asynchronous/server.rs
Lines 243 to 254 in 6cf4d5b
pub async fn disconnect(&mut self) { | |
self.shutdown.shutdown(); | |
self.shutdown | |
.wait_all_exit() | |
.await | |
.map_err(|e| { | |
trace!("wait connection exit error: {}", e); | |
}) | |
.ok(); | |
trace!("wait connection exit."); | |
} |
When the ServerWriter goes out of scope here, after sending all replies, the _server_shutdown waiter is dropped. When that happens the server shutdown notifier will return from the wait_all_exit() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks.
The async implementation doesn't wait for responses to be sent to clients when shutting down. These responses are lost if the application exists fast enough before the write task gets to send them. That means that the response for a request that triggers shutdown of the server might be lost without the client knowing that the server was actually stopped.
I observed this behavior in kata-containers. The kata shim sends
DestroySandboxRequest
request to kata-agent, which runs inside the VM, to tell it to clean up and exit. The handler code for this request sets an event to tell the main thread that the process can be stopped. After that it returns an empty response to the shim. The main thread calls ttrpc_server_obj.shutdown() and the process exits. The ttrpc shutdown code doesn't wait for the reply to be sent. As such, kata shim might receive an error in the middle of the rpc call toDestroySandbox
without the call actually failing at all.