Initial testing framework #1218

islas · 2024-07-23T02:11:11Z

This PR introduces testing capabilities through a series of compartmentalized commits:

Setting up helper scripts to handle environment loading
Compilation test script able to exercise the make build
A git submodule to the hpc-workflows testing framework
A test definition config using hpc-workflows
CI/CD capabilities with security safeguards in place

The CI/CD capabilities rely on specific GitHub Actions runner configurations and assume the runners are on Derecho, but they are designed to rely as little as possible on machine-specific and CI/CD-specific tooling. Thus, one could port these features to other machines or CI/CD solutions and achieve similar results.

To allow test scripts to run outside of a testing framework, environment setup should not depend solely on running within a dedicated test framework. This has the added benefit of separating environment and dependency resolution from running the tests. These environment scripts allow a particular environment to be selected, with the default being the FQDN of the current host. From there, arguments are routed using standard POSIX sh to the respective script. In the case of Derecho (and applicable to any system using Lmod), all subsequent arguments are treated as modules to load into the current session. The hostenv.sh script relies on one "argument", $AS_HOST, being passed in as a variable to facilitate selection. The helpers.sh script provides convenience features for substring checking in sh, delayed environment variable expansion via eval, and quick banner creation. The derecho.sh script is included as the first supported environment.
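The selection logic described above can be sketched roughly as follows. This is a hypothetical illustration, not the contents of hostenv.sh or helpers.sh: the function names `select_host`, `contains`, and `env_script_for` are assumptions, and only `$AS_HOST`, the FQDN default, and derecho.sh come from the description.

```sh
#!/bin/sh
# Hypothetical sketch only; this is not the contents of hostenv.sh.

# Choose the environment name: $AS_HOST if set, otherwise the host's FQDN.
# `hostname -f` is widely available but not POSIX, hence the fallback.
select_host() {
  if [ -n "${AS_HOST:-}" ]; then
    printf '%s\n' "$AS_HOST"
  else
    hostname -f 2>/dev/null || hostname
  fi
}

# Substring check in plain sh: succeeds if $1 contains $2.
contains() {
  case "$1" in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Route to an environment script, or print nothing when none is configured
# so the caller can fall back to the current environment as-is.
env_script_for() {
  if contains "$1" derecho; then
    printf '%s\n' "derecho.sh"
  fi
}
```

With something like this, `env_script_for "$(select_host)"` would name the script to load, and any remaining arguments would be handed to it as module names on a system using Lmod.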

This script will facilitate the first tests. There are only three requirements of any given test script with the planned testing framework. If a different testing framework is used in the future, these requirements of the test scripts can and should be re-evaluated. The test script should:

1. Take the intended host / configuration environment as the first argument
2. Take the working directory to immediately change to as the second argument
3. Output some key phrase at the end of the test to denote success; anything else (non-zero exit code, no phrase but return zero) is a failure

This particular compilation test script satisfies the above while also providing enough flexibility to select core, target, parallel jobs, and other command-line options into the make build. Additionally, for convenience, environment variables can be passed in as command-line options to the test script to modularize certain inputs.
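As a rough illustration of those three requirements, a test script could be shaped like the sketch below. The function names `run_test` and `passed` and the key phrase "TEST PASSED" are assumptions made for this example; the actual compilation script and the phrase expected by hpc-workflows may differ.

```sh
#!/bin/sh
# Hypothetical skeleton; names and the key phrase are assumptions.

run_test() {
  [ "$#" -ge 2 ] || { echo "usage: run_test <host> <workdir> [cmd...]" >&2; return 2; }
  host_env=$1     # 1. intended host / configuration environment
  work_dir=$2     # 2. working directory to change to immediately
  shift 2

  cd "$work_dir" || return 1
  echo "Running for environment '$host_env' in $(pwd)"

  # Run whatever command remains on the command line (e.g. a make build).
  if [ "$#" -gt 0 ]; then
    "$@" || return 1
  fi

  # 3. Key phrase that denotes success; its absence is a failure.
  echo "TEST PASSED"
}

# Judge a run as requirement 3 describes: zero exit status AND key phrase
# on the last line of the captured output.
passed() {
  status=$1
  log=$2
  [ "$status" -eq 0 ] && printf '%s\n' "$log" | tail -n 1 | grep -q '^TEST PASSED$'
}
```

Because `run_test` changes directory, calling it inside a command substitution or subshell keeps the caller's working directory untouched.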

islas · 2024-07-23T02:25:26Z

General Instructions

Follow-up details relating to setting up self-hosted runners.

Repository Security Settings

First, minimal security barriers:

Restrict which actions and reusable workflows may be executed in this repository. Do this by going to Settings->Actions->General->Actions permissions within a repository (this may differ for an organization if setting up organization runners, but that is beyond the scope of this small guide). You may select “Allow <owner>, and select non-<owner>, actions and reusable workflows” and then check “Allow actions created by GitHub” to allow GitHub-provided actions while still restricting foreign non-vetted code.
Once again under Settings->Actions->General, go to “Fork pull request workflows from outside collaborators” and select one of the radio buttons:
a. Require approval for first-time contributors OR
b. Require approval for all outside collaborators
The latter is more secure, but you may find it overly burdensome to approve runs for well-known outside contributors. Choose carefully based on your project’s visibility and expected development. If unsure, use (b), as being more secure is rarely bad.
If not already set, on the same Settings->Actions->General page, under “Workflow permissions”, change the default GITHUB_TOKEN permissions to read-only. Workflows should use the least privilege necessary to complete their tasks, and this ensures that permissions start as low as possible.

Now you are ready to set up a self-hosted runner. You should still use good PR review techniques to check that no malicious code is present in a PR BEFORE kicking off the workflow. If there are changes in .github/workflows/ or wherever you keep your tests, you should pay extra attention to these changes before allowing a workflow to run. Also consider using label triggers as an extra layer of security so that workflows do not start running automatically. This runs counter to automation, but is more secure and could save on compute resources, especially if many pushes happen in a PR.

Creating self-hosted runners

Creating a self-hosted runner is now generally straightforward.

Now that security is handled, go to Settings->Actions->Runners->New self-hosted runner
You will be presented with a page of instructions. Select “Linux” as the runner image and x64 as the architecture. The following instructions in the web page consist of creating a directory, downloading the runner image tarball, checking the checksum, and extracting.
If you are comfortable with these instructions, either copy and paste them into your terminal or modify them as you see fit. If you are not, STOP. DO NOT PROCEED. We have not done any configuration yet, so it is best to stop and ask someone who knows about self-hosted runners how to proceed. This is important for the security of the system and should not be taken lightly.
Once you have your runner extracted the next instructions direct you to run the ./config.sh script with the URL to your repository and an authentication token. There are other options that may be passed into the configuration script. Please refer to the runner documentation.
Any necessary missing information will be gathered via prompts. I encourage the use of labels like “<repo>”, “<runner id ##>”, and “derecho” to help identify runners if more than one will be set up.
Once configuration is done, you may run ./run.sh
Note: You may want to run this in a tmux or screen session so that you can detach and the runner keeps running when you disconnect from the computer. Additionally or alternatively, you may want a cron job that regularly checks whether the runner is up. System reboots and maintenance take down runners, which then need to be started again.
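One possible shape for such a check, suitable for calling from a cron job, is sketched below. The function name `start_runner_if_down` and its arguments are assumptions for illustration; it relies only on the standard `screen -ls` and `screen -dmS <name> <command>` invocations and on the runner's own ./run.sh.

```sh
#!/bin/sh
# Hypothetical keep-alive helper; the function name is an assumption.

# Start ./run.sh in a detached screen session named $2 from directory $1,
# unless a screen session with that name is already listed.
start_runner_if_down() {
  runner_dir=$1
  session=$2
  # `screen -ls` lists sessions as "<pid>.<name>" followed by whitespace.
  if screen -ls | grep -q "[.]${session}[[:space:]]"; then
    echo "runner session '$session' already up"
  else
    ( cd "$runner_dir" && screen -dmS "$session" ./run.sh )
  fi
}
```

A crontab entry could then call a small script wrapping this function every few minutes; the interval and script location are left to the person setting up the runner.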

Self-hosted runners are removed from GitHub if they are not connected for a period of time (14 days at the time of writing)! https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/removing-self-hosted-runners

Runner communication with GitHub:
https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#communication-requirements

Additional notes

I use /glade/work/$USER/github/runners/<repo>/derecho/<runner id> as a structure for setting up runners. This leads to a generally organized setup.

For runner ids I use <repo>##, increasing monotonically from 01, e.g. wrf01, wrf02.

Labels I add to runners: <repo>, <runner id>, <machine> (derecho in this case)

I place logs in /glade/work/$USER/github/runners/<repo>/derecho/logs/

I name the runners <machine>-<runner id>, which is useful when there are multiple runners across different machines.

I use screen to create detached sessions of the runners, and name the sockets <runner id>
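The naming conventions in these notes could be captured in a few small helpers, sketched here. The function names are hypothetical and are not the helper scripts mentioned below; only the path layout, the <repo>## id scheme, and the <machine>-<runner id> naming come from the notes.

```sh
#!/bin/sh
# Hypothetical helpers mirroring the conventions above.

# Runner id: <repo>## with a two-digit counter. Pass the counter without a
# leading zero (1, 2, ...), since printf would read "08" as invalid octal.
runner_id() {
  printf '%s%02d\n' "$1" "$2"
}

# Runner name: <machine>-<runner id>
runner_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Runner directory: /glade/work/$USER/github/runners/<repo>/<machine>/<runner id>
runner_dir() {
  printf '/glade/work/%s/github/runners/%s/%s/%s\n' "$USER" "$1" "$2" "$3"
}
```

For example, `runner_name derecho "$(runner_id wrf 1)"` yields `derecho-wrf01`.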

For quick setup I have helper scripts available upon request, but I encourage doing the first setup by hand to understand what is happening.

A Google Doc of this guide with pictures can be found here: https://docs.google.com/document/d/1CJq7NA_bh4ogB37t5Q1m9RO_2XVqPjjdmGQIH4xGWBk/edit?usp=sharing

mgduda · 2024-07-26T23:05:36Z

Would it be possible to place the hpc-workflows submodule within the .github directory to avoid adding something to the top-level MPAS-Model directory that most users shouldn't worry themselves with?

islas · 2024-07-26T23:36:47Z

That'd be doable, though I'd opt for placing it under .ci if that's the case. I think that will result in minor changes to only the .gitmodules file and the actions workflows under .github to reference the new location.

mgduda · 2024-07-29T17:54:44Z

@islas I think the .ci directory is a good idea -- let's go with that.

Following the documentation of the hpc-workflows testing framework and the testing structure found in .ci/, a JSON file for a GNU compilation test was added. This test will compile the atmosphere core using GNU and single precision. If this test is run using the derecho configuration, it will attempt to load the appropriate modules. For non-derecho environments, per the testing structure under .ci/, if no configuration exists in .ci/hostenv.sh then the current environment will be used verbatim.

…c-workflows This reusable workflow balances quick setup with GitHub Actions-specific features. It assumes that the tests can be controlled via a label being set in a PR. To coordinate PR vs primary branch testing, a suffix is generated using either the PR number or the branch name. This suffix is then used to relocate log files to an archival location in an organized fashion. GitHub artifacts are still used for failed test capture, but logs will also be moved to the archive location for quicker access if one has access to where these tests execute. To allow for the parallelized testing available from hpc-workflows, the workflow can make duplicate directories of the repository that can each run their own test instance without clobbering files. Once tests are run, results are gathered, relocated to the archival location, reported and printed to the screen, summarized into the actions summary page, and then packaged into an artifact if a failure occurred. Finally, the test label is removed if the named tests and label match.
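The suffix step could look something like the following sketch. The function name `make_suffix`, the "PR" prefix, and the character sanitizing are assumptions for illustration; the actual workflow may build its suffix differently.

```sh
#!/bin/sh
# Hypothetical sketch of the suffix generation; details are assumptions.

# Build a suffix from the PR number ($1) when one is given, otherwise from
# the branch name ($2).
make_suffix() {
  pr_number=$1
  branch=$2
  if [ -n "$pr_number" ]; then
    printf 'PR%s\n' "$pr_number"
  else
    # Replace characters that are awkward in directory names (e.g. "/").
    printf '%s\n' "$branch" | sed 's/[^A-Za-z0-9._-]/_/g'
  fi
}
```

Logs for a run could then be moved under an archive directory ending in that suffix, which keeps PR runs and primary-branch runs from overwriting each other.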

This pipeline is triggered if any pushes occur on master or develop OR if a PR is labeled with an appropriate tag as specified by the tests within this workflow. Additionally, a specific label to trigger all tests can be used that will be removed from the PR when all tests finish, regardless of exit status. The pipeline makes extensive use of the reusable test_workflow.yml to instantiate tests on runners. This pipeline currently only includes the definition for one test to be run on a github runner with tags that satisfy "derecho". Likewise, other hard-coded values appearing in here assume a particular runner setup and environment.

islas · 2024-07-29T19:25:01Z

We should be able to set up runners and test this all out inside this PR before it goes in as well.

During the review of this testing infrastructure into WRF, changes were requested and some minor improvements were made. These include:

* Update submodule for fixes in filename handling, job naming, and err output
* Naming CI/CD jobs with trigger event identifier
* Parallelize copy of duplicate directories
* Add notes about public repo permissions on actions
* Reword env scripts

islas added 2 commits July 22, 2024 11:50

mgduda self-requested a review July 26, 2024 23:02

mgduda added feature Framework labels Jul 26, 2024

islas added 4 commits July 29, 2024 12:21

Add a framework to easily facilitate testing

f9baf98

islas force-pushed the initial-testing-framework branch from 9a2ff49 to 1b31b47 Compare July 29, 2024 19:23
