GitHub Action to build and upload Conda package whenever a new release is made #67

matthewwiese · 2023-12-08T16:51:26Z

Description

This branch adds an additional GitHub Action that is triggered whenever a release is made, which in our case is done by the other Action in this repository.

It uses a repository secret, CONDA_TOKEN, generated for the maize-genetics Anaconda group. Its scope is limited to API read and write access.
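For reference, here is a minimal Bash sketch of how the upload step can consume that token. It assumes anaconda-client is installed on the runner and that CONDA_TOKEN is exposed to the step as an environment variable. The function name upload_package and the DRY_RUN switch are hypothetical, and the real steps in .github/workflows/build_upload_conda.yml may differ.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): upload a built Conda package to the
# maize-genetics channel using the CONDA_TOKEN secret.
# Assumes anaconda-client ("anaconda" CLI) is on PATH.
upload_package() {
  local pkg="$1"
  # Fail early if the secret was not passed into the step's environment.
  if [ -z "${CONDA_TOKEN:-}" ]; then
    echo "CONDA_TOKEN is not set" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print a masked placeholder instead of the real token.
    echo "anaconda --token *** upload --user maize-genetics $pkg"
    return 0
  fi
  # --token is a global option of the anaconda CLI; --user selects the
  # account/organization that receives the upload.
  anaconda --token "$CONDA_TOKEN" upload --user maize-genetics "$pkg"
}
```

Keeping the token scoped to API read and write access, as described above, limits the damage if it ever leaks from a log.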

The existing action for building a release is modified to update two new repository variables, PHG2_VERSION and PHG2_VERSION_MD5. These are used in the Conda build process to pull the most recent release (the one that triggers this action).
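A minimal sketch of how the two values could be derived from a release tarball before they are stored. It assumes GNU md5sum, which is available on Ubuntu runners; the helper name release_vars and the tarball path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): emit PHG2_VERSION and PHG2_VERSION_MD5
# as KEY=VALUE lines for a given version string and release tarball.
release_vars() {
  local version="$1" tarball="$2" md5
  [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
  # md5sum prints "<hash>  <file>"; keep only the hash field.
  md5=$(md5sum "$tarball" | cut -d ' ' -f 1)
  printf 'PHG2_VERSION=%s\n' "$version"
  printf 'PHG2_VERSION_MD5=%s\n' "$md5"
}
```

Appending this output to "$GITHUB_ENV" only makes the values visible to later steps of the same job. Persisting them as repository variables, so that a separate workflow can read them through the vars context, requires an authenticated call such as the token-backed action used in run_deploy_on_merge.yml.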

This will allow us to keep the Conda package consistent with the newest GitHub release without having to build and upload it manually each time. It is particularly helpful when a new version fixes a bug: instead of downloading and extracting a tarball to update, you can simply run conda update phg2 to retrieve the latest version.

Type of change

What type of changes does your code introduce? Put an x in boxes that apply.

CHANGE (fix or feature that would cause existing functionality to not work as expected)
FEATURE (non-breaking change which adds functionality)
BUGFIX (non-breaking change which fixes an issue)
ENHANCEMENT (non-breaking change which improves existing functionality)
NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Checklist:

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated relevant documentation

Changelog entry

Please add a one-line changelog entry below. This will be copied to the changelog file during the release process.

GitHub Action to build and upload Conda package whenever a new release is made

lynnjo · 2023-12-11T15:24:41Z

.github/workflows/build_upload_conda.yml

+        uses: conda-incubator/setup-miniconda@v3
+        with:
+          auto-update-conda: true
+          python-version: 3.11


Is there a reason to use Python 3.11 versus something newer?

The newest version is 3.12 (3.13 is still a prerelease), so we are pretty "bleeding edge" here. The Python version doesn't really matter all that much for this anyway; I simply included it so that it's explicit in case something breaks in the future.

Hugging Face uses 3.8 in their v2 usage. If you go to the setup-miniconda repo, their examples include a variety of versions.

My mistake, I was thinking this was 2.11, not 3.11. Ignore my comment.

lynnjo · 2023-12-11T15:27:21Z

.github/workflows/run_deploy_on_merge.yml

+        with:
+          name: 'PHG2_VERSION'
+          value: '${{ env.VERSION }}'
+          token: ${{secrets.PHGV2CD}}


I realize "secrets.<>" is used in existing code. Where are the secret values stored (i.e., what file is accessed)? Do we have these created for maizegenetics.net, or is this per project?

Never mind, I see from your Slack posting that this was generated for maize-genetics.

lynnjo · 2023-12-11T15:32:12Z

.github/workflows/build_upload_conda.yml

+  PHG2_VERSION_MD5: ${{ vars.PHG2_VERSION_MD5 }}
+
+jobs:
+  build-upload-conda:


Will you update the README.md installation section with information on how to install this? Brandon is writing the detailed documentation, but individually we are updating the simple usage.

Sure can do!

Let me know what you think.

We can add more to the Quick Start in future - I wasn't sure how much we wanted included since stuff is still in flux.

Here is my concern: if they pull the apps into the base Conda environment, then the programs are seen via ProcessBuilder(). I think this is what you originally tested. I did not understand whether this worked when our Clikt commands precede the ProcessBuilder() commands with a Conda environment setting. Do all environments inherit what is in the base environment? If yes, then this was OK.

But I think we agreed we don't want to encourage adding to the base environment.

If users create a new Conda environment and install into that, this will not work with the ProcessBuilder() commands unless it is an environment we know about. So either the user should always create a Conda environment named phgv2-conda, or they will have to pass a new parameter with the Conda environment name to every class that has a ProcessBuilder call.

Let me know if I'm missing something here.

Please see here for a response; I wanted to make sure everybody was on the same page and understood the discussion from this morning.

matthewwiese · 2023-12-11T20:58:13Z

@lynnjo @zrm22 @aberthel @btmonier @pjbradbury @tcasstevens

For everybody's benefit and to clear up any confusion, I want to respond to Lynn's question from here in a more visible way so that we are all on the same page.

If you look in the source, the ProcessBuilder commands follow this form:

conda run -n phgv2-conda agc

The above runs the given program (in this case agc) within the Conda environment specified by the -n parameter, in this case the phgv2-conda environment that is created by phg2 setup-environment. This allows you to run any software installed in a given environment without having to be in the environment itself. Please refer to the conda run docs here.
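To make the wrapping concrete, here is a small Bash sketch that builds the same argument list as the "conda run -n phgv2-conda agc" form, printed one element per line so that argument boundaries survive (much like the list of arguments handed to ProcessBuilder). conda_wrap is a hypothetical helper for illustration, not part of PHG2.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the argv for running TOOL inside
# the named Conda environment, one argument per line.
# Usage: conda_wrap ENV TOOL [ARGS...]
conda_wrap() {
  local env="$1"; shift
  # printf reuses its format for every remaining argument.
  printf '%s\n' conda run -n "$env" "$@"
}
```

In Bash 4 or newer, the result can be executed with mapfile -t cmd < <(conda_wrap phgv2-conda anchorwave --help) followed by "${cmd[@]}". Arguments that themselves contain newlines would break this simple encoding, which is acceptable for a sketch.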

From a system with a brand new Conda installation, issue the following commands:

conda create --name my_env --channel conda-forge --channel maize-genetics phg2
conda activate my_env
phg2 setup-environment
conda run -n phgv2-conda anchorwave --help

You should be greeted with the familiar AnchorWave help info. Furthermore, we can confirm the behavior I describe via:

conda run -n phgv2-conda conda list

This runs conda within the phgv2-conda environment and lists all packages installed in that environment, all without having to be in the environment itself. Depending on your OS, the first line of output should look something like the following:

# packages in environment at /home/matt/miniconda3/envs/phgv2-conda:

What I was trying to describe at the Programmer Meeting is that if the dependencies are part of the package installation itself, we wouldn't have to rely on a bespoke phgv2-conda environment. ProcessBuilder would find the software installed alongside PHG2, whether in the base environment or in a user's own custom environment.
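A rough Bash sketch of the behavior described above: prefer a tool that is already on PATH (as it would be if installed alongside phg2 in the active environment) and fall back to conda run only otherwise. resolve_tool is hypothetical, and this is not how PHG2 currently resolves its tools.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): resolve TOOL via PATH first, else fall
# back to running it inside a Conda environment (default phgv2-conda).
# Usage: resolve_tool TOOL [ENV]
resolve_tool() {
  local tool="$1" env="${2:-phgv2-conda}" path
  # command -v succeeds and prints the location if TOOL is on PATH.
  if path=$(command -v "$tool"); then
    printf '%s\n' "$path"
  else
    printf '%s\n' "conda run -n $env $tool"
  fi
}
```

With this ordering, a user who installed everything into one environment never pays the conda run indirection, while the named environment still works as a fallback.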

One might argue that users of the plain tarball application will no longer have access to the supplied dependencies, and to that I say: it's not a problem! For example, AnchorWave doesn't install minimap2 for you despite depending on it. We ought to provide a single "blessed" installation method (Conda) while also providing a generic release (the tarballs) for advanced users. If somebody has little technical or bioinformatics knowledge, we direct them to install the package via Conda, which will manage the dependencies for them and allow for a straightforward upgrade path. Advanced users can install the tarball and add its executable to their PATH if they wish; such users can be expected to have the requisite knowledge to download or build from source the other dependencies that we require.

Additionally, there are engineering concerns that I didn't have time to explain at the meeting. Let's say a bug is discovered in one of the dependencies included in the phgv2-conda environment that requires updating to a newer version: existing users will have to manually delete and recreate the phgv2-conda environment, or enter it and update the problematic package themselves.
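For illustration, the two manual remedies might look like the following Bash sketch. refresh_phg_env and DRY_RUN are hypothetical, and the package name is whichever dependency needs fixing.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper) of the two manual fixes:
#   refresh_phg_env PKG  -> update one package inside phgv2-conda
#   refresh_phg_env      -> delete phgv2-conda and let phg2 recreate it
# DRY_RUN=1 prints the commands instead of running them.
refresh_phg_env() {
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  if [ -n "${1:-}" ]; then
    $run conda update -y -n phgv2-conda "$1"
  else
    $run conda env remove -n phgv2-conda && $run phg2 setup-environment
  fi
}
```

Either way, every existing user has to do this by hand, which is the maintenance cost the comment is pointing at.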

We should be using our tools and dependencies as they were designed to be used; doing it any other way will inevitably lead to bugs and breakage, which in a small lab is particularly harmful given everybody's limited time. Much like the unit and integration tests, thoughtful design and engineering will ultimately save us time and lead to better science.

lynnjo · 2023-12-12T16:42:18Z

README.md

+
+```
+conda update phg2
+```


Based on these changes, the documentation indicates the user can only execute phg from the conda setup (the original ./phg commands have been removed).

Did we decide Conda is an option, or did we decide conda is the only option for installing phg?

If the latter, the instructions in "PHGv2 - Building and Loading" have an inconsistency. Is our program named "phg" or "phg2"? (We need to make a collective decision.)

It looks like we'll have 2 environments - one that contains the phg(2) executable, and one that is created by phg(2) to run agc, tiledbvcf, etc. Is that correct?

BTW, I'm not opposed to loading phg only via conda. I just want to ensure our documentation has consistent examples and that we're all on the same page.

> Did we decide Conda is an option, or did we decide conda is the only option for installing phg?

I don't think we decided anything, this is a question for @zrm22 or the group, same as the naming question.

> It looks like we'll have 2 environments - one that contains the phg(2) executable, and one that is created by phg(2) to run agc, tiledbvcf, etc. Is that correct?

There is only a single environment, phgv2-conda. The other environment containing the actual program (the Conda package) isn't really relevant, as I explained in my long comment above.

I think for now I would prefer explaining both the conda install and tarball installation instructions.

I agree that, long term, if the user installs through Conda we should have only one environment (no more need for SetupEnvironment) with all the dependencies. But I think it gets tricky: every time we make a call to AnchorWave, TileDB, bgzip, or bcftools, we need to wrap the command that ProcessBuilder executes in a conda run command, which requires specifying the environment name.

The way we have this set up, it's consistent, because we give the environment a name. If users need to create their own environment and then just add the PHG Conda env to it, we no longer have this control.

To fix this, either ProcessBuilder first needs to figure out which environments exist and try to determine which one is the PHG one (what happens if there is more than one?), or the user submits the name as a parameter (which would basically need to be in every command). We may be able to have the user set an environment variable that gets picked up by Clikt automatically, but it is hard to say.
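One way to combine the last two ideas, sketched in Bash under stated assumptions: read the environment name from a hypothetical PHG_CONDA_ENV variable (defaulting to phgv2-conda) and confirm that it appears in the output of conda env list before wrapping any calls. The parsing assumes the usual text layout of conda env list, with the environment name in the first column; pick_phg_env is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `conda env list` output on stdin
# and print the environment PHG should use. PHG_CONDA_ENV (hypothetical)
# overrides the default name; the chosen name must exist.
pick_phg_env() {
  local want="${PHG_CONDA_ENV:-phgv2-conda}" name
  while read -r name _; do
    # Skip blank and comment lines, and path-only (unnamed) environments.
    case "$name" in
      ''|'#'*|/*) continue ;;
    esac
    if [ "$name" = "$want" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  echo "Conda environment '$want' not found" >&2
  return 1
}
```

Typical use would be conda env list | pick_phg_env. On the Kotlin side, Clikt options accept an envvar argument, which is one way the same variable could be read without adding an explicit parameter to every command.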

zrm22 · 2023-12-13T15:14:39Z

README.md

@@ -1,4 +1,7 @@
 # PHG version 2
+> [!TIP]


Put this tip under the badges.

zrm22 · 2023-12-13T15:17:03Z

README.md

@@ -27,56 +54,3 @@ The redesign leverages the powerful TileDB-VCF database, which is widely used in
    Composite Reference Haplotypes 

 More information on terminology can be found [here](docs/terminology.md).
-
-# Example usage


These need to be brought back in. We are making an effort so that the user does not need to jump through 3-4 pages just to get basic information to run the software. Obviously this is not all the documentation, but having the 10-15 commands that need to be run on the main page of the repo is a good idea for what we are trying to do here.

zrm22

We need to have the example usage section brought back into the README. We are trying to show how simple it is to run through the pipeline and in my opinion the best way to do that is to show the full basic pipeline at the homepage of the repo.

matthewwiese added 11 commits December 8, 2023 11:42

Add Action to test setting repo variables

3d27932

To get Action to appear in UI, trigger on PR open

eb77b89

Tweaks

50af3d9

Even after existing in the UI it won't allow a manual run :(

070922f

Yaml syntax

874fb2c

Correct means of modifying GitHub Actions env mid-run; see: https://stackoverflow.com/a/57969570

a25953c

Remove test_set_variable.yml action

8af9804

Add Conda build and upload action

1ccf2de

Add conda-forge for openjdk

b569af1

Remove pull request trigger

c20e680

Update PHG2_VERSION and PHG2_VERSION_MD5 repo variables when a new release is published

f188bf9

matthewwiese changed the title ~~Do not merge this! Testing Actions for automatic Conda build + upload~~ GitHub Action to build and upload Conda package whenever a new release is made Dec 8, 2023

matthewwiese requested review from tcasstevens, lynnjo, btmonier, aberthel, pjbradbury and zrm22 December 8, 2023 18:04

zrm22 reviewed Dec 8, 2023

conda/meta.yaml (outdated, resolved)

Include dynamic PHG2_RELEASE variable

c6c94b8

btmonier approved these changes Dec 11, 2023

zrm22 approved these changes Dec 11, 2023

lynnjo reviewed Dec 11, 2023

matthewwiese added 5 commits December 11, 2023 11:22

Add note on installing via Conda plus some reorganization

3435668

Merge branch 'build-upload-conda-package' of github:maize-genetics/phg_v2 into build-upload-conda-package

67fe6da

Add clarity regarding the base environment

62944f5

Include conda-forge channel so that openjdk is found

d77f238

Neglected to include conda-forge here

f1674cb

tcasstevens approved these changes Dec 12, 2023

lynnjo reviewed Dec 12, 2023

lynnjo approved these changes Dec 13, 2023

pjbradbury approved these changes Dec 13, 2023

zrm22 reviewed Dec 13, 2023

zrm22 reviewed Dec 13, 2023

zrm22 requested changes Dec 13, 2023
