Formalize container best practice (esp. for complex tools) #37
Comments
I actually see both approaches living in parallel. I think we should advertise building these containers upfront as best practice, but build them ourselves if they are not available. Somewhere on my ToDo list is to extend https://github.com/BioContainers/mulled and create a tiny website to assemble Conda packages and create mixed-mulled containers. The names should be normalised and hashed in a unique way: the aim is to get the same container back from a randomly assembled requirements.txt file with the same packages. We could also think about integrating this into the Travis testing from the IUC, so that the IUC creates these containers on PR merge. I think there is a benefit in generating them outside of Galaxy, for the reasons you mentioned, but also because I want to generate more, like Singularity images. With this everyone can profit, and we in turn get more care and funding for BioConda.
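The normalise-and-hash idea above can be sketched in a few lines. This is an illustrative sketch only, not the actual mulled naming scheme; the function name and `mulled-v1-` prefix are assumptions for the example:

```python
import hashlib

def mulled_name(requirements):
    """Derive a deterministic container name from a set of package specs,
    so the same requirements always map to the same image name regardless
    of the order they were listed in.

    Illustrative sketch only -- not the real mulled algorithm.
    """
    # Normalise: strip whitespace, lowercase, sort so listing order is irrelevant.
    normalised = sorted(spec.strip().lower() for spec in requirements)
    digest = hashlib.sha1("\n".join(normalised).encode("utf-8")).hexdigest()
    return "mulled-v1-" + digest[:12]

# Identical package sets in different orders yield the same name:
a = mulled_name(["samtools=1.3.1", "bcftools=1.3"])
b = mulled_name(["bcftools=1.3", "samtools=1.3.1"])
assert a == b
```

The point of the sorting step is exactly the reproducibility goal described above: two people with the same `requirements.txt` contents, in any order, resolve to the same container.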
Do we register and push a container to an external repository when we build one? I much prefer option (1) for reproducibility. I can see (2) being important for development, but I wouldn't like to see production Galaxy instances using this approach.
I think we'll need both. We can't be sure that a dependency (in a container) really works (almost) everywhere until we have tested it in a bare-bones container, so ideally the IUC tool tests would build and run (and maybe also push on merge) the container. planemo could have a … For production instances we should probably not default to building locally. Also, if you build locally you will not know upfront whether the built container will work, so what would you do if the container doesn't work? Rebuild until it does? That seems wasteful and is already a minus point for conda_auto_install.
Thanks all - I don't agree with every nuance, but in large part I agree with most of this. I appreciate yinz taking the time to respond. My goal for the next few days of development was to establish that having an existing container can be stated as best practice. I'll take that and work on it. Hopefully we will have a process in place by the GCC. I will, however, say in defense of (2): as long as it is cached, it is no worse for reproducibility than allowing each site to install the binary dependencies locally once, as we do now and have always done. I get that (1) is much better than what we've traditionally done, so we should do it. In response to …
I agree, I think this is a good idea for certain scenarios. I was just mentioning this as an example of the extra work that would be involved in managing the container lifecycle.
tl;dr - Should it be best practice to (1) register combinations of requirements for complex tools and publish all needed combinations to a container registry, or (2) should Galaxy just build complex containers as it needs them for such tools?
I think there is probably broad consensus that the "mulled" approach to building containers should be part of a best practice for using containers with Galaxy. From an operations perspective this produces tiny containers that are very easy and quick to deploy and manage; from a reproducibility and support perspective it allows the same (best-practice Conda) binaries to work on bare metal or inside a container; and from a developer perspective it will ideally become much more transparent than a `Dockerfile`-based approach.

The follow-up recommendation is less clear in my opinion. We currently have thousands of containers for individual requirements that can be used with tools that work with BioConda and only have a single `requirement` tag. For tools that contain multiple `requirement` tags - which I contend are not a corner case but a very mainstream and typical use case - we could recommend two different things as a best practice.

Put another way - should Galaxy (1) fetch the containers it needs or (2) build them?
Pros of (1) are:
Pros of (2) are:
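The fetch-vs-build decision above could be sketched as a simple resolution fallback: try the registry first (option 1), and only build locally when that is explicitly allowed (option 2), publishing the result so later lookups take the reproducible path. Everything here is hypothetical for illustration -- the function names, the `InMemoryRegistry` stand-in, and the naming scheme are not Galaxy's actual API:

```python
import hashlib

def mulled_name(requirements):
    # Order-independent, normalised name (illustrative, not the real scheme).
    normalised = sorted(r.strip().lower() for r in requirements)
    return "mulled-v1-" + hashlib.sha1("\n".join(normalised).encode()).hexdigest()[:12]

class InMemoryRegistry:
    """Stand-in for a real container registry, for illustration only."""
    def __init__(self):
        self._images = {}
    def has(self, name):
        return name in self._images
    def pull(self, name):
        return self._images[name]
    def push(self, name, image):
        self._images[name] = image

def resolve_container(requirements, registry, allow_build=False):
    """(1) Fetch a published combination if available; (2) optionally
    build on demand, then publish so later lookups take path (1)."""
    name = mulled_name(requirements)
    if registry.has(name):
        return registry.pull(name)              # option (1): reproducible fetch
    if not allow_build:
        raise LookupError("no published container for %s" % name)
    image = "built:%s" % name                   # placeholder for a real build step
    registry.push(name, image)                  # cache/publish the built image
    return image
```

With `allow_build=False` a production instance fails loudly instead of silently building an untested image, which matches the preference for option (1) voiced in the comments above.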
Ping @bgruening, @mvdbeek, @jxtx.