Wanted: reaping old tasks #45

Open
rektide opened this issue Aug 17, 2021 · 3 comments
Comments

@rektide

rektide commented Aug 17, 2021

I admit it, we have some containers that have slow memory leaks.

We'd love to have some good proven solutions for reaping old tasks. Something that will kill tasks over a week old, say. Slowly so as not to disrupt general service availability.

Was hoping to find something here, did not.

@cristim
Contributor

cristim commented Aug 17, 2021

Thanks, that's actually a good idea for a new tool. I'll try to implement it once I'm done with my current work, and I'll let you know when I have something ready for you.

@cristim
Contributor

cristim commented Aug 17, 2021

Have you considered running the application on Spot instances? They're more likely to be interrupted, which reduces task uptime. Or just bounce the EC2 instances from time to time using something like chaos-lambda.

@nathanpeck
Owner

nathanpeck commented Aug 17, 2021

Hey @rektide. I don't have a specific link for this, but I do have a solution for you:

ECS task definitions allow you to specify multiple containers, and these containers can be marked as "essential". If an "essential" container exits, the entire task is stopped and replaced. My suggestion is to run a tiny busybox container alongside your application container, mark that busybox container as essential, and configure its command to just be a sleep for however long you want the task to stay up. When the sleep ends, the busybox sidecar will stop, and because it is marked as essential, the entire task will be stopped and replaced by the ECS service.

This will effectively put a kill timer on your tasks and force them to restart on a schedule.
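As a rough sketch of that sidecar setup, the Python below builds the `containerDefinitions` list for a task definition and prints it as JSON. The container names, the `example/app:latest` image, the one-week lifetime, and the `memoryReservation` value are placeholders chosen for illustration, not anything prescribed by ECS or by this thread.

```python
import json

WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800


def container_definitions(app_image: str, max_lifetime_seconds: int = WEEK_SECONDS) -> list:
    """Return containerDefinitions with an essential 'kill timer' sidecar."""
    return [
        {
            # The real application container (name and image are placeholders).
            "name": "app",
            "image": app_image,
            "essential": True,
        },
        {
            # Sidecar that only sleeps. Because it is essential, its exit
            # stops the whole task, and the ECS service starts a replacement.
            "name": "kill-timer",
            "image": "busybox",
            "essential": True,
            "command": ["sleep", str(max_lifetime_seconds)],
            "memoryReservation": 16,  # MiB, placeholder value
        },
    ]


if __name__ == "__main__":
    print(json.dumps(container_definitions("example/app:latest"), indent=2))
```

The printed JSON can be pasted into the `containerDefinitions` field of a task definition and adjusted from there.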

Alternatively, if you want something that is a bit less brute-force and has more logic about when to restart the tasks, I'd suggest building a small Lambda function. The Lambda function can be configured to run on a schedule during your off-peak hours. It can use the AWS SDK to list the tasks for the cluster and issue a StopTask API call for any that are older than a certain threshold.
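A minimal sketch of such a Lambda function in Python with boto3 might look like the following. The cluster name `my-cluster`, the seven-day threshold, and the `max_stops` cap (added here so each run stops only a few tasks, in the spirit of reaping slowly) are assumptions for illustration. Note that it considers every running task in the cluster, not just those belonging to one service.

```python
from datetime import datetime, timedelta, timezone


def find_old_tasks(tasks, max_age, now=None):
    """Return ARNs of tasks whose startedAt is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [
        t["taskArn"]
        for t in tasks
        # Tasks that have not started yet have no startedAt; skip them.
        if "startedAt" in t and now - t["startedAt"] > max_age
    ]


def reap_old_tasks(ecs, cluster, max_age, max_stops=1):
    """Stop at most max_stops running tasks older than max_age."""
    stopped = []
    paginator = ecs.get_paginator("list_tasks")
    for page in paginator.paginate(cluster=cluster, desiredStatus="RUNNING"):
        arns = page["taskArns"]
        if not arns:
            continue
        described = ecs.describe_tasks(cluster=cluster, tasks=arns)
        for arn in find_old_tasks(described["tasks"], max_age):
            if len(stopped) >= max_stops:
                return stopped
            ecs.stop_task(cluster=cluster, task=arn, reason="Reaped: task too old")
            stopped.append(arn)
    return stopped


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without it

    ecs = boto3.client("ecs")
    # "my-cluster" and the 7-day threshold are placeholders.
    stopped = reap_old_tasks(ecs, cluster="my-cluster", max_age=timedelta(days=7))
    return {"stopped": stopped}
```

Passing the client in as an argument keeps the age logic testable with a stub, and a small `max_stops` combined with a frequent schedule spreads the restarts out over time.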

Edit: One more caveat/suggestion from Jon Wood on Twitter is to add a bit of jitter: timeout plus a random number of seconds. That way your tasks don't all die at once and cause an outage before they can be replaced.
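One way to get that jitter is to compute it inside the sidecar at container start, so each task draws its own value. The helper below is a sketch that builds such a command for the sleep sidecar. It assumes the sidecar image's `sh` supports `$RANDOM`, which POSIX does not guarantee, so check this against the image you actually use. The function name and the numbers are illustrative.

```python
def jittered_sleep_command(base_seconds: int, max_jitter_seconds: int) -> list:
    """Build a container command that sleeps base + random jitter seconds."""
    if base_seconds < 0 or max_jitter_seconds <= 0:
        raise ValueError("base must be >= 0 and max jitter must be > 0")
    # The arithmetic runs in the container's shell when the task starts,
    # so tasks launched from the same task definition get different values.
    # Assumption: the shell provides $RANDOM (typically 0..32767), which
    # also means jitter windows larger than that range are not fully used.
    script = f"sleep $(( {base_seconds} + RANDOM % {max_jitter_seconds} ))"
    return ["sh", "-c", script]


if __name__ == "__main__":
    # One week plus up to an hour of jitter.
    print(jittered_sleep_command(7 * 24 * 60 * 60, 3600))
```

Computing the jitter when the task definition is registered would not help, because every task launched from that definition would share the same value.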
