Delete unused containers #173
Samtools 1.3 is used in the SplAdder pipeline, which hasn't been integrated into the general-purpose RNA-seq pipeline yet. The other tools are all part of the UNC pipeline, which we no longer support as a production pipeline. The duplicates are embarrassing though, please torch 'em.
How do we recreate those images if we lose the binaries?
Well, we'd check out a revision of this repo from before we deleted the Dockerfiles, and we'd build from there. That being said, I think there's a pretty good argument for deleting them:
I don't think the question (what if quay went away?) posted over on PR #174 is terribly important. I mean, yes, quay disappearing could happen, but if quay did go away, and all our pushed images disappeared, would we go back in time and rebuild all containers ever used in deprecated pipelines? I wager not. If we would do that, then we should have an internal backup of all containers that we've ever built on ToT, because there's no guarantee that we could do a fresh build of the containers that download resources from the web. I don't think this is a terribly reasonable contingency to plan for. Not that I think quay is immune from failure—companies are fundamentally ephemeral—but it's like asking what would happen if GitHub shut down. Do we mirror all GitHub repositories that we've ever built artifacts from in order to tolerate the disappearance of GitHub? Etc., etc.
As an aside, contingent on space, this:
Would not be unreasonable to do.
Re quay.io sinking being an unrealistic contingency: agreed. But there is still the possibility of phat-phingering and rogue/broken scripts causing accidental deletion of images. Backup should be the contingency plan for that.

Re backup of images: let's do that. I propose backing up each top-level image and each image it's based on as an object in S3. IOW, the entire image tree.

I also don't want to drag around unused ballast; it's not sustainable. Checking out a specific commit is acceptable if you can find the specific commit, but that can be laborious without tags to help you. We do tag the images with the commit of the project repo; that should be included in the backup. And we need a policy governing which tool versions are kept and which are deleted.
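A minimal sketch of what the proposed S3 backup could look like. The bucket name and image tags below are hypothetical, and the `docker save` / `aws s3 cp` commands are only echoed here rather than executed:

```shell
set -eu
# Sketch: serialize each top-level image (and the images it is based on)
# with `docker save` and store each as an object in S3.
# Bucket name and image tags are illustrative assumptions, not real values.
bucket="s3://cgl-docker-image-backup"
images="quay.io/ucsc_cgl/samtools:1.3--7e8f1a2
quay.io/ucsc_cgl/ubuntu-base:14.04"

plan=""
for img in $images; do
    # Turn the image reference into a safe S3 object key.
    key=$(printf '%s' "$img" | tr '/:' '__')
    plan="$plan
docker save $img | gzip | aws s3 cp - $bucket/$key.tar.gz"
done
printf '%s\n' "$plan"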
See #175 for backup plan. What do you think about the following policy:
So, I am not very good at making mistakes, but this seems like one of those policies where someone might still make a mistake. Since we are using revision control, it would be impossible to recover from a mistake, thus I propose that we strengthen the protocol further:
Sorry for the snark, but the proposal was a bit much. I'm all for protocol, but this is way overboard.
Concretely, I don't think it makes sense to do them as separate PRs, but I think it would make sense to remove each tool in a separate commit. This way, if we discover that we need to pull back the Dockerfile for a tool we removed, we just revert that single commit, which would be pretty clean. If we really wanted, we could have a standardized commit message (to make the log easier to grep for said commit), but it should be sufficient to ensure that the commit message has the name of the tool we're removing in it.
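The one-commit-per-tool scheme described above can be demonstrated end to end; the repo and tool name here are throwaway examples created in a temp directory:

```shell
set -eu
# Demo: deleting a tool's Dockerfile in its own commit makes `git revert`
# a clean, single-step undo. All paths and names below are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

mkdir -p samtools
echo "FROM ubuntu:14.04" > samtools/Dockerfile
git add samtools/Dockerfile
g commit -qm "Add samtools"

# Remove the tool in its own commit, with the tool name in the message.
git rm -q samtools/Dockerfile
g commit -qm "Remove samtools Dockerfile"

# Tool turns out to be needed after all -- revert just that one commit:
g revert --no-edit HEAD > /dev/null
test -f samtools/Dockerfile && echo "Dockerfile restored"
```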
The "it's too hard to find the commit that removed a tool" argument is overstated. Since we have all of the Dockerfiles nested in subdirectories, it should take <30sec with
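The search in question is presumably a `git log` one-liner; `--diff-filter=D` with a pathspec lists only the commits that deleted that path, even after the path is gone. Demo repo and paths are made up for illustration:

```shell
set -eu
# Demo: find the commit that removed a tool's Dockerfile.
repo=$(mktemp -d)
cd "$repo"
git init -q
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

mkdir -p bwa
echo "FROM ubuntu:14.04" > bwa/Dockerfile
git add . && g commit -qm "Add bwa"
git rm -q bwa/Dockerfile && g commit -qm "Remove bwa Dockerfile"

# Only commits that *deleted* the path are listed:
git log --diff-filter=D --format='%h %s' -- bwa/Dockerfile
```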
The downstream refactor of toil-scripts into toil-lib and a bunch of constituent projects should make it easy to track what Docker images we need. Specifically, we're moving all tools into

Quick aside before getting to my actual next comment: we should start tagging this repository, and the tags should line up with (future)
Define contributors. Is that all 10 people who've committed code to this repo? If so, man, I'd hate to have to wrangle 10 people to get a minor change approved. What I'd suggest instead is that we make sure that all Dockerfiles in this repo have up-to-date and comprehensive

I'd be fine if we allowed anyone who's contributed to
Bear with me as I have good intentions. I want to ensure that we can reliably revert deletions and that we ask the right set of people, those who might be adversely affected by a deletion.
Yes, that is better.
This is a misquote of what I actually said.
It implies that you know where to look, and that we never rename anything, restructure the source tree, break up the repository, etc. It also assumes that the image was built from the revision of the file at the commit preceding the commit that deletes the tool. I prefer tagging the image with the commit, as we currently do.
I take that back. For PRs with deletions, I propose pinging the people who we think might have an interest in the tools being deleted. This is not a definition, but I hope we can just use common sense. I'm afraid that only asking the people in MAINTAINERS is not enough.
I think we're in agreement that the simple way to do this is to remove each tool in a separate commit, right?
We're not deleting the image in quay, we're just deleting the Dockerfile for a tool in ToT. If someone needs to update it at a later point in time, we revert the deletion. As you pointed out, we currently tag all images with the revision that they were built from, so I really don't understand why we're debating whether it is tractable to find a commit where a tool existed. Unless I've missed something, no one is proposing changing the tagging scheme.
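For concreteness, the commit-suffixed tagging scheme discussed above might look like the following sketch. The registry path and tool version are hypothetical, and the build/push commands are only echoed:

```shell
set -eu
# Sketch: tag each image with the short hash of the repo revision it was
# built from, so the image always points back at its Dockerfile's commit.
# A throwaway repo stands in for the real project repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -qm "init" --allow-empty

commit=$(git rev-parse --short HEAD)
tag="quay.io/ucsc_cgl/samtools:1.3--$commit"   # hypothetical registry path
echo "docker build -t $tag samtools/"
echo "docker push $tag"
```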
Isn't that what we're literally debating here? Whether it is hard or not to find the last commit where a Dockerfile for a tool existed?
If we tag the correct MAINTAINERS and keep the maintainers up to date, the maintainers should CC in the affected people...
I got worried when you brought up
Team members using a tool in an actively maintained pipeline should also be considered. I personally think a MAINTAINERS file is pointless because GH makes that information readily apparent.
IDK, we're literally talking about 30 seconds with

Anyways, if we're worried about not being able to build old Docker images in the future, we should probably fix all of the places where we're depending on URLs that are not guaranteed to be stable.
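One mitigation for unstable upstream URLs (my suggestion, not something the thread settled on) is to record a checksum when a resource is first fetched and have every rebuild verify it, so a silently changed or vanished download fails loudly instead of producing a subtly different image. The tarball below is a local stand-in for a real download:

```shell
set -eu
# Sketch: verify a downloaded build resource against a pinned checksum.
# The file here simulates what curl/wget would have fetched from upstream.
tmp=$(mktemp -d)
cd "$tmp"
printf 'pretend this came from an upstream URL' > samtools-1.3.tar.bz2

# In practice the expected hash is recorded once, at first build, and
# committed alongside the Dockerfile.
expected=$(sha256sum samtools-1.3.tar.bz2 | cut -d' ' -f1)

# In a Dockerfile RUN step this check would follow the download:
echo "$expected  samtools-1.3.tar.bz2" | sha256sum -c -
```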
I was referring to the MAINTAINER directive in the Dockerfile. I agree that this information is also available through GitHub. My point is more that the person who is maintaining the image should be responsible for knowing where it is used, and should be able to authoritatively comment on whether it is OK to delete the container or not. If they can't be responsible for that, then that's a management problem.
Yeah, I need to break this out into separate commits anyways, so this isn't ready for a merge.
Delete unused containers (resolves #173)
This was bothering me.
@jvivian I grepped through toil-scripts and didn't see any of these, so I assume they're good to remove: