Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

send all pending responses before shutting down #228

Merged
merged 1 commit into from
May 11, 2024

Conversation

alex-matei
Copy link
Contributor

@alex-matei alex-matei commented May 7, 2024

The async implementation doesn't wait for responses to be sent to clients when shutting down. These responses are lost if the application exists fast enough before the write task gets to send them. That means that the response for a request that triggers shutdown of the server might be lost without the client knowing that the server was actually stopped.

I observed this behavior in kata-containers. The kata shim sends DestroySandboxRequest request to kata-agent, which runs inside the VM, to tell it to clean up and exit. The handler code for this request sets an event to tell the main thread that the process can be stopped. After that it returns an empty response to the shim. The main thread calls ttrpc_server_obj.shutdown() and the process exits. The ttrpc shutdown code doesn't wait for the reply to be sent. As such, kata shim might receive an error in the middle of the rpc call to DestroySandbox without the call actually failing at all.

The async implementation doesn't wait for responses to be sent to clients
when shutting down. These responses are lost if the application exists fast
enough before the write task gets to send them. That means that the
response for a request that triggers shutdown of the server might be lost
without the client knowing that the server was actually stopped.

Signed-off-by: Alexandru Matei <[email protected]>
)
}
}

struct ServerWriter {
rx: MessageReceiver,
_server_shutdown: shutdown::Waiter
Copy link
Collaborator

@wllenyj wllenyj May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to call wait in exit?
Sorry I forget the details.

Copy link
Contributor Author

@alex-matei alex-matei May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for that, the _server_shutdown waiter acts as a barrier for the shutdown.wait_all_exit() call found here:

pub async fn disconnect(&mut self) {
self.shutdown.shutdown();
self.shutdown
.wait_all_exit()
.await
.map_err(|e| {
trace!("wait connection exit error: {}", e);
})
.ok();
trace!("wait connection exit.");
}

When the ServerWriter goes out of scope here, after sending all replies, the _server_shutdown waiter is dropped. When that happens the server shutdown notifier will return from the wait_all_exit() call.

Copy link
Collaborator

@lifupan lifupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks.

@lifupan lifupan merged commit 152ac12 into containerd:master May 11, 2024
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants