Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Graceful shutdown of webservers #33377

Closed
SimonSimCity opened this issue Oct 31, 2024 · 8 comments · Fixed by #34130
Closed

[Feature]: Graceful shutdown of webservers #33377

SimonSimCity opened this issue Oct 31, 2024 · 8 comments · Fixed by #34130
Assignees
Labels

Comments

@SimonSimCity
Copy link

SimonSimCity commented Oct 31, 2024

🚀 Feature Request

I want a graceful shutdown for webservers (the process should be sent a SIGINT signal).

Example

No response

Motivation

For my e2e testing, which are full system tests, I want to spin up the frontend, backend and a database. For the database I want to use a command like docker run --rm -a stdout -a stderr --name ecorize-e2e-db -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 postgres.

The current approach of killing the process has the downside that the clean-up action cannot be executed (https://docs.docker.com/reference/cli/docker/container/run/#rm).

I'm currently using concurrently to realize this. The downside there is that it's harder for me to set up different environment variables per process.

@Skn0tt
Copy link
Member

Skn0tt commented Oct 31, 2024

Greetings from one Simon to another. This is a wonderful idea, thanks for opening the feature request! Implementing in #33379.

@Skn0tt Skn0tt self-assigned this Oct 31, 2024
@Skn0tt Skn0tt added the v1.49 label Oct 31, 2024
@mxschmitt mxschmitt added v1.50 and removed v1.49 labels Nov 1, 2024
@mxschmitt
Copy link
Member

Short investigation: Looks like as of today we launch it in a new process group and send SIGKILL to the whole process Group. This can be reflected on that screenshot:

Image

We don't know yet why Docker doesn't shut the containers down with that signal. We'd love to make it work for Docker including docker, docker-compose and docker compose. We might need to add an opt-in flag in order to not break existing scenarios. Maybe also a new manually set default for create-playwright.

cc @Skn0tt

@SimonSimCity
Copy link
Author

We don't know yet why Docker doesn't shut the containers down with that signal.

I don't know whether I understand correctly what you mean here. SIGKILL is a command which does not allow a process to do any work (there seem to be minor exceptions - https://unix.stackexchange.com/questions/5642/what-if-kill-9-does-not-work) - at least no non-kernel code.

The --rm flag on my docker command is one example of such a cleanup process which cannot run once the process or one of its parent received SIGKILL.

If you don't want to kill the process immediately but allow it to shut down gracefully, you might take a look at other kill-related signals first: https://unix.stackexchange.com/a/251267/19157

Docker for example when stopping a container first issues SIGTERM and waits for the process to exit. If it doesn't, it by default sends SIGKILL after 10 sec: https://docs.docker.com/reference/cli/docker/container/stop/

@Skn0tt
Copy link
Member

Skn0tt commented Nov 4, 2024

Some more context from the team call for my own notes:

docker compose cleans up properly after itself when kill9'd. But when Playwright SIGKILLs it, the containers aren't cleaned up. Let's find out what's the difference there - maybe docker compose behaves differently when it's in a TTY vs child process?

Sending a SIGINT before SIGKILL is fine, but we need to make the timeout configurable.

@Skn0tt
Copy link
Member

Skn0tt commented Nov 6, 2024

I've looked into how Docker behaves when the process is killed. Once the docker run --rm process is stopped, there two kinds of cleanup that should have happened:

  1. the spawned container shouldn't be running anymore. this is standard docker run behaviour.
  2. anonymous volumes created by the container should be removed. this should only happen because of --rm.

In my testing, i'm seeing 1. happen regardless of wether the client process is shutdown forcefully or with grace. It seems this is implemented inside the Docker daemon, so it works even though the client process isn't around after SIGKILL.

The 2. works on graceful shutdown, but not when it's forced. I'm assuming that volume removal is implemented in the client.


My takeaway from this is that we should give the webServer processes time to shut down gracefully, so that cleanup can happen even for processes without a daemon.

There's no easy way for us to detect if a process respects SIGINT though. My open PR sends a SIGINT, and if the process doesn't exit within a timeout that defaults to 500ms, it sends the usual SIGKILL.

500ms is short, but should be enough for most local dev servers I believe. For dev servers that don't respect SIGINT, it means that the shutdown is only 500ms delayed, not more.

@Skn0tt
Copy link
Member

Skn0tt commented Nov 6, 2024

docker compose cleans up properly after itself when kill9'd. But when Playwright SIGKILLs it, the containers aren't cleaned up. Let's find out what's the difference there - maybe docker compose behaves differently when it's in a TTY vs child process?

docker compose is another special case, because it spawns a docker-compose child process under the hood. This is an implementation detail of docker compose, but it has an interesting effect:

When you kill -9 <pid> the parent process, the child process detects that it orphaned and initiates a graceful cleanup. But Playwright kills the entire process group (kill -9 -<pid>). So the child process also gets killed, and there's nothing left to perform the cleanup.

@Skn0tt
Copy link
Member

Skn0tt commented Nov 6, 2024

To illustrate the above, try the following:

In examples/todomvc, add a webserver with docker compose up and place the following in docker-compose.yml:

services:
  db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: test

Now run npx playwright test integration:19 --project chromium and check docker ps. You'll see a dangling database container. Clean it up manually with docker compose down.
Now remove the - in processLauncher to kill only the top process, not the entire process group:

process.kill(-spawnedProcess.pid, 'SIGKILL');

Repeat the test, and you'll see that the container got cleaned up correctly.

@SimonSimCity
Copy link
Author

I found another resource confirming that SIGKILL is handled by the kennel, and therefore the application cannot run any code anymore (except for finishing a kernel code segment) https://unix.stackexchange.com/a/485657/21160

To the kill -9 <PID> I read that children will be sent SIGHUP (https://stackoverflow.com/a/21869303/517914). In the previous link, there's also a section saying that orphaned children are automatically adopted by PID 1.

Postgres will stop when receiving SIGHUP. https://www.postgresql.org/docs/7.0/signals.htm

I don't know by heart which process is a child of a docker compose command, but if it's the PID 1 inside the container, it would be interesting to know what happens when running a command which ignores those signals - like busybox/sh: https://stackoverflow.com/a/60537655/517914

That said, it's probably best to call a SIGTERM, on the application playwright is spawning, wait for the program to exit, and call SIGKILL including child processes if it doesn't do it in time.

SIGINT doesn't seem to be a good choice here, as it's meant for the program to interrupt the current procedure and wait for further instructions, which might not be what you want. Interactive shells might remain open. https://unix.stackexchange.com/a/251267/19157
Bash seems to stop scripts and loops on SIGINT, but seems to ignore it otherwise: https://www.gnu.org/software/bash/manual/html_node/Signals.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants