-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GitHub Action to build and upload Conda package whenever a new release is made #67
base: main
Are you sure you want to change the base?
Conversation
…lease is published
uses: conda-incubator/setup-miniconda@v3 | ||
with: | ||
auto-update-conda: true | ||
python-version: 3.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use python 3.11 versus something newer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The newest version is 3.12 (3.13 is still prerelease) so we are pretty "bleeding edge" here. The Python version doesn't really matter all that much for this anyway, I simply included it so that it's explicit in case it breaks in the future.
Hugging Face uses 3.8 in their v2 usage. If you go to the setup-miniconda
repo their examples include a variety of versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake/ I was thinking this was 2.11, not 3.11. Ignore my comment/
with: | ||
name: 'PHG2_VERSION' | ||
value: '${{ env.VERSION }}' | ||
token: ${{secrets.PHGV2CD}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize "secrets.<>" is used in existing code. Where are the secrets values stored (ie what file is accessed?). Do we have these created for maizegenetics.net, or is this per project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind - I see from your slack posting this was generated for maize-genetics
PHG2_VERSION_MD5: ${{ vars.PHG2_VERSION_MD5 }} | ||
|
||
jobs: | ||
build-upload-conda: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you update the README.md installation section with information on how to install this? Brandon is writing the detailed documentation, but individually we are updating the simple usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure can do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add more to the Quick Start in future - I wasn't sure how much we wanted included since stuff is still in flux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my concern: If they pull the apps into the base conda environment, then the programs are seen via ProcessBuilder(). I think this is what you originally tested. I did not understand if this worked when our Clikt commands precede the ProcessBuilder() commands with a conda environment setting. Do all environments inherit what is in the base environment? if yes, then this was ok.
But I think we agreed we don't want to encourage adding to the base environment.
If users create a new conda environment and load into that this will not work with the ProcessBuilder() commands unless the environment is one that we know. So either the user should always create a conda environment named phgv2-conda , or they will have to pass a new paramter with the conda environment name to every class that has a ProcessBuilder call.
let me know if I'm missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see here for a response - I wanted to make sure everybody was on the same page and understood the discussion from this morning.
…g_v2 into build-upload-conda-package
@lynnjo @zrm22 @aberthel @btmonier @pjbradbury @tcasstevens For everybody's benefit and to clear up any confusion, I want to respond to Lynn's question from here in a more visible way so that we are all on the same page. If you look in the source, the ProcessBuilder commands follow this form: conda run -n phgv2-conda agc The above runs the given program (in this case From a system with a brand new Conda installation, issue the following commands: conda create --name my_env --channel conda-forge --channel maize-genetics phg2
conda activate my_env
phg2 setup-environment
conda run -n phgv2-conda anchorwave --help You should be greeted with the familiar AnchorWave help info. Furthermore, we can confirm the behavior I describe via: conda run -n phgv2-conda conda list Which will run Conda within the
What I was trying to describe in Programmer Meeting is that if the dependencies are a part of the package installation itself, we wouldn't have to rely on a bespoke One might argue that users of the plain tarball application will no longer have access to the supplied dependencies, and to that I say: it's not a problem! For example, AnchorWave doesn't install minimap2 for you despite depending on it. We ought to provide a singular "blessed" installation method (Conda) while also providing a generic release (the tarballs) for advanced users. If somebody has little technical or bioinformatics knowledge, we direct them to installing the package via Conda, which will manage the dependencies for them and allow for a straightforward upgrade path. Advanced users can install the tarball and add its executable to their Additionally, there are engineering concerns that I didn't have the time to explain at the meeting. Let's say a bug is discovered in one of our dependencies included in the We should be using our tools/dependencies how they were designed to be used, doing it any other way will inevitably lead to bugs and breakage, which in a small lab is particularly harmful due to everybody's limits on time. Much like the unit and integration tests, thoughtful design and engineering will ultimately save us time and lead to better science. |
|
||
``` | ||
conda update phg2 | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on these changes, the documentation indicates the user can only execute phg from the conda setup (the original ./phg commands have been removed).
Did we decide Conda is an option, or did we decide conda is the only option for installing phg?
If the latter, the instructions in the PHGv2- Building and Loading have an inconsistency. Is our program named "phg" or "phg2" ? (we need to make a collective decision)
It looks like we'll have 2 environments - one that contains the phg(2) executable, and one that is created by phg(2) to run agc, tiledbvcf, etc. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I'm not opposed to loading phg only via conda. I just want to ensure our documentation has consistent examples and that we're all on the same page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we decide Conda is an option, or did we decide conda is the only option for installing phg?
I don't think we decided anything, this is a question for @zrm22 or the group, same as the naming question.
It looks like we'll have 2 environments - one that contains the phg(2) executable, and one that is created by phg(2) to run agc, tiledbvcf, etc. Is that correct?
There is only a single environment, phgv2-conda
. The other environment containing the actual program (the Conda package) isn't really relevant as I wrote in my long comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now I would prefer explaining both the conda install and tarball installation instructions.
I agree that long term if the user installs through conda we should only have one environment(no more need for SetupEnvironment) with all the dependencies, but I think it gets tricky as every time we make a call to anchorwave, tileDB, bgzip or bcftools we need to wrap the command that ProcessBuilder executes within a conda run command where we need to specify the environment name.
The way we have this setup its consistent as we give the environment a name. If the user needs to create their environment and then just add the PHG conda env to it, we no longer have this control.
To fix either we need to have Processbuilder first figure out what envs are there and try to determine which one is the PHG one(what happens if you have more than 1?) or we have the user submit the name as a param(this would need to basically be in every command). We may be able to have the user create an Environment variable that gets picked up by Clikt automatically, but it is hard to say.
@@ -1,4 +1,7 @@ | |||
# PHG version 2 | |||
> [!TIP] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this tip under the badges.
@@ -27,56 +54,3 @@ The redesign leverages the powerful TileDB-VCF database, which is widely used in | |||
Composite Reference Haplotypes | |||
|
|||
More information on terminology can be found [here](docs/terminology.md). | |||
|
|||
# Example usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These need to be brought back in. We are making an effort so the user does not need to jump through 3-4 pages just to get basic information to run the software. Obviously this is not all the documentation, but having the 10-15 commands that need to be run at the main page of the Repo is a good idea for what we are trying to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to have the example usage section brought back into the README. We are trying to show how simple it is to run through the pipeline and in my opinion the best way to do that is to show the full basic pipeline at the homepage of the repo.
Description
This branch adds an additional GitHub Action that is triggered whenever a release is made, which in our case is done by the other Action in this repository.
It uses a repository secret,
CONDA_TOKEN
, generated for the maize-genetics Anaconda group. Its scope is limited to API read and write access.The existing action for building a release is modified to update two new repository variables,
PHG2_VERSION
andPHG2_VERSION_MD5
. These are used in the Conda build process to pull the most recent release (that which triggers this action).This will allow us to keep the Conda package consistent with the newest GitHub release without the need to manually build and upload each time. It is particularly helpful if there is a new version that fixes some bug - this way you don't have to download and extract a tarball to update, simply doing
conda update phg2
instead to retrieve the latest version.Type of change
What type of changes does your code introduce? Put an
x
in boxes that apply.CHANGE
(fix or feature that would cause existing functionality to not work as expected)FEATURE
(non-breaking change which adds functionality)BUGFIX
(non-breaking change which fixes an issue)ENHANCEMENT
(non-breaking change which improves existing functionality)NONE
(if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)Checklist:
Changelog entry
Please add a one-line changelog entry below. This will be copied to the changelog file during the release process.