Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up running CI processes when killed #1207

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion ci/build-helpers.sh
Original file line number Diff line number Diff line change
Expand Up @@ -195,7 +195,13 @@ function short_timeout () { general_timeout "$TIMEOUT" "$@"; }
function long_timeout () { general_timeout "$LONG_TIMEOUT" "$@"; }
function general_timeout () {
local ret=0 short_cmd
$timeout_exec -s9 "$@" || ret=$?
# Beware: backgrounding $timeout_exec means that it cannot read stdin!
$timeout_exec -s9 "$@" &
# `timeout` makes a new process group. If we're killed now, also kill it.
# shellcheck disable=SC2064 # expanding $! now is intentional
trap "kill $!; trap_handler" INT TERM
wait || ret=$?
trap trap_handler INT TERM # reset traps once `timeout` has exited
# 124 if command times out; 137 if command is killed (including by timeout itself)
if [ $ret -eq 124 ] || [ $ret -eq 137 ]; then
# BASH_{SOURCE,LINENO}[0] is where we're being called from, which is inside
Expand All @@ -212,3 +218,16 @@ function general_timeout () {
fi
return $ret
}

# Set this function to be invoked when killed using `trap trap_handler INT TERM`.
# When killed, make sure we exit cleanly by killing all subshells and processes
# in our process group. Long-running commands need special handling as commands
# are invoked by bash in their own process group; see the general_timeout
# function for details.
function trap_handler () {
# Reset trap handler so we don't recurse forever when killing ourselves below.
trap - INT TERM
# Kill everything in our process group (i.e. subshells that bash spawned).
# This does *not* include subprocesses run by bash, only subshells.
kill -- -$$
}
2 changes: 1 addition & 1 deletion ci/build-loop.sh
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ echo "$HASHES" | tail -n "+$((BUILD_SEQ + 1))" | cat -n | while read -r ahead bt
# API request quota too quickly.
succeeded) ;;
esac
) < /dev/null # Stop commands from slurping hashes, just in case.
) & wait # Stop commands from slurping hashes; make "trap" work.
done

case "$BUILD_TYPE" in
Expand Down
17 changes: 13 additions & 4 deletions ci/continuous-builder.sh
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,10 @@

. build-helpers.sh

# Make sure we exit cleanly and don't leave leftover processes running when
# killed. Bash doesn't do this automatically, unfortunately.
trap trap_handler INT TERM

if [ "$1" != --skip-setup ]; then
if [ -r ~/.continuous-builder ]; then
# Tell ShellCheck not to check the sourced file here. Assume the .env files are fine.
Expand Down Expand Up @@ -107,9 +111,12 @@ if [ -n "$HASHES" ]; then
# Run the build
. build-loop.sh

# Something inside this subshell is reading stdin, which causes some hashes
# from above to be ignored. It shouldn't, so redirect from /dev/null.
) < /dev/null
# Something inside this subshell is reading stdin, which used to cause
# some hashes from above to be ignored. Let "&" redirect from /dev/null.
# Additionally, use "& wait" so that the trap set above can run properly.
# If we just ran the subshell normally, bash would wait for it to exit
# before invoking the trap handler, which would take too long.
) & wait

# Outside subshell; we're back in the parent directory.
# Delete builds older than 2 days, then keep deleting until we've got at
Expand Down Expand Up @@ -161,7 +168,9 @@ run_duration=$(($(date +%s) - run_start_time))
# shellcheck disable=SC2031 # TIMEOUT was modified in a subshell; that's fine.
timeout=$(get_config_value timeout "${TIMEOUT:-120}")
if [ "$run_duration" -lt "$timeout" ]; then
sleep $((timeout - run_duration)) || :
# Wrap this in short_timeout so that sleep gets killed properly when the outer
# script is killed.
TIMEOUT=$timeout short_timeout sleep $((timeout - run_duration))
fi

# Re-exec ourselves. This lets us update pick up updates to this script, e.g.
Expand Down