Initial testing framework #1218

Open: wants to merge 7 commits into base: develop

islas
Contributor

@islas islas commented Jul 23, 2024

This PR introduces testing capabilities through a series of compartmentalized commits:

  1. Setting up helper scripts to handle environment loading
  2. Compilation test script able to exercise the make build
  3. A git submodule to the hpc-workflows testing framework
  4. A test definition config using hpc-workflows
  5. CI/CD capabilities with security safeguards in place

The CI/CD capabilities rely on specific GitHub Actions runner configurations and assume the tests run on Derecho, but they are designed to rely as little as possible on machine-specific and CI/CD-specific tooling. Thus, one could port these features to other machines or CI/CD solutions and achieve similar results.

islas added 2 commits July 22, 2024 11:50
In order to run test scripts outside of a testing framework, the handling of
environment setup should not be solely dependent on running within a dedicated
test framework. This has the added benefit of compartmentalizing the duties of
environment and dependency solving from running the tests.

These environment scripts allow for the selection of a particular environment,
with the default being the FQDN of the current host. From there, arguments are
routed using standard POSIX sh to the respective script. In the case of Derecho
(and applicable to any system using Lmod), all subsequent arguments are treated as
modules to load into the current session.

The hostenv.sh script relies on one "argument" $AS_HOST being passed in via
variable setting to facilitate selection.
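
A minimal sketch of the selection logic described above, not the PR's actual hostenv.sh. It assumes a hypothetical layout with one `<name>.sh` script per environment in a single directory, and an assumed matching rule (the script's base name must occur in the host name). The function name `select_env` is illustrative only.

```shell
#!/bin/sh
# Hypothetical sketch: choose an environment script for the current host.
# AS_HOST overrides the default, which is the FQDN of the current host.
select_env() {
  env_dir=$1   # directory holding <name>.sh environment scripts (assumed layout)
  host=${AS_HOST:-$(hostname -f 2>/dev/null || hostname)}
  # Pick the first script whose base name occurs in the host name
  # (assumed matching rule, e.g. "derecho" in "derecho6.hsn.de.hpc.ucar.edu").
  for candidate in "$env_dir"/*.sh; do
    [ -f "$candidate" ] || continue
    name=$(basename "$candidate" .sh)
    case $host in
      *"$name"*) printf '%s\n' "$candidate"; return 0 ;;
    esac
  done
  return 1   # no match: the caller keeps the current environment as-is
}
```

A caller could source the printed path when `select_env` succeeds and otherwise continue with the environment it already has.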

The helpers.sh script provides convenience features for substring checking in sh,
delayed environment variable expansion via eval, and quick banner creation.
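
For illustration, the three kinds of helpers could look roughly like this in POSIX sh. The function names (`contains`, `expand_later`, `banner`) and their signatures are assumptions, not the names used in helpers.sh.

```shell
#!/bin/sh
# Hypothetical helper names; the real helpers.sh may differ.

# contains STRING SUBSTRING: succeed if SUBSTRING occurs in STRING.
contains() {
  case $1 in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

# expand_later TEXT: expand variable references in TEXT when called, not when
# TEXT was written. Only use on trusted input, since eval runs arbitrary code.
expand_later() {
  eval "printf '%s\n' \"$1\""
}

# banner WIDTH MESSAGE: print MESSAGE between two rules of WIDTH '=' characters.
banner() {
  rule=$(printf "%${1}s" '' | tr ' ' '=')
  printf '%s\n%s\n%s\n' "$rule" "$2" "$rule"
}
```

For example, `expand_later 'build in $PWD'` prints the directory that is current at call time, which is the point of delaying the expansion.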

The derecho.sh script is included as the first supported environment.

This script will facilitate the first tests. There are only three requirements
of any given test script with the planned testing framework. If a different
testing framework is used in the future, these requirements of the test scripts
can and should be re-evaluated.

The test script should:
1. Take the intended host / configuration environment as the first argument
2. Take the working directory to immediately change to as the second argument
3. Output some key phrase at the end of the test to denote success; anything else
   (a non-zero exit code, or a zero exit code without the phrase) is a failure
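
The pass/fail rule in item 3 can be sketched from the caller's side as follows. This is not how hpc-workflows implements it; the key phrase `TEST PASSED` and the function name `run_and_check` are placeholders chosen for the example.

```shell
#!/bin/sh
# Hypothetical checker: run a test script and apply the rules listed above.
# PASS_PHRASE is an assumed key phrase, not one defined by the PR.
PASS_PHRASE="TEST PASSED"

run_and_check() {
  script=$1; host=$2; workdir=$3
  # Capture output and exit status without aborting the caller on failure.
  output=$(sh "$script" "$host" "$workdir" 2>&1) && status=0 || status=$?
  last_line=$(printf '%s\n' "$output" | tail -n 1)
  # Success needs both a zero exit code and the key phrase as the last line.
  if [ "$status" -eq 0 ] && [ "$last_line" = "$PASS_PHRASE" ]; then
    echo "pass"
  else
    echo "fail"
    return 1
  fi
}
```

Requiring both conditions catches a script that prints the phrase but exits non-zero, as well as one that exits zero without ever reaching the end of the test.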

This particular compilation test script satisfies the above while also providing
enough flexibility to pass the core, target, number of parallel jobs, and other
command-line options to the make build. Additionally, for convenience, environment
variables can be passed in as command-line options to the test script to modularize
certain inputs.
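
As a rough sketch of that flexibility, the options that follow the host and working directory could be parsed with `getopts`. The option letters (`-c`, `-t`, `-j`, `-e`) and the defaults are hypothetical, not taken from the PR's script, and the sketch prints the make command instead of running it.

```shell
#!/bin/sh
# Hypothetical option names; the PR's script may use others.
# Prints the make command rather than running it, so the sketch has no side effects.
build_make_command() {
  core=atmosphere; target=gnu; jobs=1   # assumed defaults
  OPTIND=1
  while getopts c:t:j:e: opt; do
    case $opt in
      c) core=$OPTARG ;;
      t) target=$OPTARG ;;
      j) jobs=$OPTARG ;;
      e) export "$OPTARG" ;;   # VAR=value handed to the build environment
      *) return 2 ;;
    esac
  done
  printf 'make %s CORE=%s -j %s\n' "$target" "$core" "$jobs"
}
```

In a full test script this would run after the first two positional arguments have been consumed, for example with `shift 2`.
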
@islas
Contributor Author

islas commented Jul 23, 2024

General Instructions

Follow-up details relating to setting up self-hosted runners.

Repository Security Settings

First, set up minimal security barriers:

  • Restrict which actions and reusable workflows may be executed in this repository. Do this by going to Settings->Actions->General->Action Permissions within a repository (this may differ for an organization if setting up organization runners, but that is beyond the scope of this small guide). You may select “Allow <owner>, and select non-<owner>, actions and reusable workflows” and then check “Allow actions created by GitHub” to allow GitHub-provided actions while still restricting foreign, non-vetted code.

  • Once again under Settings->Actions->General, go to “Fork pull request workflows from outside collaborators” and select one of the radio buttons:
    a. Require approval for first-time contributors OR
    b. Require approval for all outside collaborators
    The latter is more secure, but you may find it overly burdensome to approve runs for well-known outside contributors. Choose carefully based on your project’s visibility and expected development. If unsure, use (b); being more secure is rarely bad.

  • If not already set, on the same Settings->Actions->General page, under Workflow permissions, change the default GITHUB_TOKEN permissions to read only. Workflows should use the least privilege necessary to complete their tasks, and this ensures that they start as low as possible.

Now you are ready to set up a self-hosted runner. You should still use good PR review techniques to check that no malicious code is present in a PR BEFORE kicking off the workflow. If there are changes in .github/workflows/ or wherever you keep your tests, you should pay extra attention to these changes before allowing a workflow to run. Also consider using label triggers as an extra layer of security to not have workflows automatically start running. This goes counter to the automation process, but is more secure and could potentially save on compute resources especially if many pushes happen in a PR.
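
As a small review aid for the check described above, changed paths can be filtered for the locations that deserve extra attention. This is a sketch only: the `.ci/` prefix is an assumption about where tests are kept, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical review aid: read changed paths on stdin and print those that
# touch workflow or test definitions. Prints nothing if there are none.
sensitive_changes() {
  grep -E '^(\.github/workflows/|\.ci/)' || true
}
```

One way to feed it is `git diff --name-only origin/develop...HEAD | sensitive_changes`, run on a local checkout of the PR branch before approving the workflow run.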

Creating self-hosted runners

Creating a self-hosted runner is now generally straightforward.

  1. Now that security is handled, go to Settings->Actions->Runners->New self-hosted runner
    You will be presented with a page of instructions. Select “Linux” as the runner image and x64 as the architecture. The instructions on that page consist of creating a directory, downloading the runner tarball, verifying its checksum, and extracting it.
    If you are comfortable with these instructions, either copy and paste them into your terminal or modify them as you see fit. If you are not, STOP. DO NOT PROCEED. We have not done any configuration yet, so it is best to stop and ask someone who knows about self-hosted runners how to proceed. This is important for the security of the system and should not be taken lightly.

  2. Once you have your runner extracted, the next instructions direct you to run the ./config.sh script with the URL of your repository and an authentication token. There are other options that may be passed into the configuration script; please refer to the runner documentation.
    Any necessary missing information will be gathered via prompts. I encourage the use of labels like “<repo>”, “<runner id ##>”, and “derecho” to help identify runners if more than one will be set up.

  3. Once configuration is done, you may run ./run.sh
    Note: You may want to run this in a tmux or screen session so that you can detach and the runner keeps running after you disconnect from the computer. Additionally or alternatively, you may want a cron job that regularly checks whether the runner is up. System reboots and maintenance take down runners, which will then need to be started again.
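
The note in step 3 could be turned into a small keep-alive script along these lines. This is a sketch assuming GNU screen is used; the session name, the runner directory, and the function names are placeholders for your own setup.

```shell
#!/bin/sh
# Hypothetical keep-alive, suitable for a cron entry. SESSION and RUNNER_DIR
# are placeholders, not values prescribed by this guide.
SESSION=${SESSION:-wrf01}
RUNNER_DIR=${RUNNER_DIR:-$HOME/actions-runner}

# session_listed LISTING NAME: succeed if the `screen -ls` output LISTING
# contains a session called NAME (screen lists sessions as "<pid>.<name>").
session_listed() {
  printf '%s\n' "$1" | grep -q "[0-9][0-9]*\.$2[[:space:]]"
}

ensure_runner() {
  if session_listed "$(screen -ls 2>/dev/null)" "$SESSION"; then
    echo "runner session $SESSION already up"
  else
    # -d -m starts screen detached; -S names the session.
    (cd "$RUNNER_DIR" && screen -dmS "$SESSION" ./run.sh)
    echo "started runner session $SESSION"
  fi
}
```

Calling `ensure_runner` from a periodic cron job restarts the runner after a reboot or maintenance window without starting a second copy while one is still attached to its session.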

Self-hosted runners are removed from GitHub if they are not connected for a period of time (14 days at the time of writing): https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/removing-self-hosted-runners

Runner communication with GitHub:
https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#communication-requirements

Additional notes

I use /glade/work/$USER/github/runners/<repo>/derecho/<runner id> as a structure for setting up runners. This leads to a generally organized setup.

For runner ids I use <repo>##, increasing monotonically from 01, e.g. wrf01, wrf02

Labels I add to runners: <repo>, <runner id>, <machine> (derecho in this case)

I place logs in /glade/work/$USER/github/runners/<repo>/derecho/logs/

I name the runners <machine>-<runner id>. This is useful when having multiple runners across different machines.

I use screen to create detached sessions of the runners, and name the sockets <runner id>
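
The conventions above can be captured in a few one-line helpers, for example when scripting the setup of several runners. The function names are hypothetical; only the naming patterns come from the notes above.

```shell
#!/bin/sh
# Sketch of the naming scheme described above. Function names are hypothetical.

# runner_id REPO INDEX            -> e.g. wrf01
runner_id()     { printf '%s%02d\n' "$1" "$2"; }
# runner_name MACHINE ID          -> e.g. derecho-wrf01
runner_name()   { printf '%s-%s\n' "$1" "$2"; }
# runner_labels REPO ID MACHINE   -> comma-separated label list
runner_labels() { printf '%s,%s,%s\n' "$1" "$2" "$3"; }
# runner_dir USER REPO MACHINE ID -> directory holding one runner
runner_dir()    { printf '/glade/work/%s/github/runners/%s/%s/%s\n' "$1" "$2" "$3" "$4"; }
```

The results could then be passed to the runner's `./config.sh` through its `--name` and `--labels` options, alongside the `--url` and `--token` values shown on the GitHub setup page.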

For quick setup I have helper scripts available upon request, but I encourage first time setup to be done by hand to understand what is happening.

A Google Doc of this guide with pictures can be found here: https://docs.google.com/document/d/1CJq7NA_bh4ogB37t5Q1m9RO_2XVqPjjdmGQIH4xGWBk/edit?usp=sharing

@mgduda
Contributor

mgduda commented Jul 26, 2024

Would it be possible to place the hpc-workflows submodule within the .github directory to avoid adding something to the top-level MPAS-Model directory that most users shouldn't worry themselves with?

@islas
Contributor Author

islas commented Jul 26, 2024

That'd be doable, though I'd opt for placing it under .ci if that's the case. I think that will result in minor changes to only the .gitmodules file and the actions workflows under .github to reference the new location.

@mgduda
Contributor

mgduda commented Jul 29, 2024

@islas I think the .ci directory is a good idea -- let's go with that.

islas added 4 commits July 29, 2024 12:21
Following the documentation of the hpc-workflows testing framework and the
testing structure found in .ci/, a JSON file for a GNU compilation test was added.
This test will compile the atmosphere core using gnu and single precision. If
this test is run using the derecho configuration, it will attempt to load the
appropriate modules. For non-derecho environments, per the testing structure
under .ci/, if no configuration exists in .ci/hostenv.sh then the current
environment will be used verbatim.
…c-workflows

This reusable workflow balances quick setup with GitHub Actions-specific features.
It assumes that the tests can be controlled via a label being set in a PR.

To coordinate PR vs primary branch testing, a suffix is generated using either
the PR number or the branch name. This suffix is then used to relocate log files
to an archival location in an organized fashion. GitHub artifacts are still used
to capture failed tests, but logs will also be moved to the archive location for
quicker access by anyone who can reach the machine where these tests execute.
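
A minimal sketch of the suffix and archive step, written as plain sh rather than as the workflow's actual steps. The `PR` prefix, the replacement of `/` in branch names, and both function names are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: derive a log suffix from the triggering event.
# Arguments: event name, PR number (may be empty), branch name.
log_suffix() {
  case $1 in
    pull_request*) printf 'PR%s\n' "$2" ;;
    *)             printf '%s\n' "$3" | tr '/' '_' ;;   # keep the suffix a single path component
  esac
}

# archive_logs SRC_DIR ARCHIVE_ROOT SUFFIX: move *.log into ARCHIVE_ROOT/SUFFIX.
archive_logs() {
  dest=$2/$3
  mkdir -p "$dest" || return 1
  for f in "$1"/*.log; do
    [ -f "$f" ] && mv "$f" "$dest"/
  done
  return 0
}
```

Inside a workflow, the event and branch names could come from the `GITHUB_EVENT_NAME` and `GITHUB_REF_NAME` environment variables, with the PR number supplied from the event context.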

To allow for the parallelized testing that hpc-workflows offers, the workflow can
make duplicate directories of the repository so that each can run its own test
instance without clobbering files.
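
The duplication step could be sketched as below, with the copies made in parallel and the caller blocked until all of them finish. The `_<n>` naming of the copies and the function name are assumptions, not the workflow's actual scheme.

```shell
#!/bin/sh
# Hypothetical helper: make COUNT copies of SRC named SRC_1 ... SRC_COUNT,
# copying in parallel and waiting for every copy to finish.
duplicate_repo() {
  src=$1; count=$2; i=1; pids=
  while [ "$i" -le "$count" ]; do
    cp -R "$src" "${src}_$i" &
    pids="$pids $!"
    i=$((i + 1))
  done
  rc=0
  for pid in $pids; do
    wait "$pid" || rc=1   # remember any failed copy, but still wait for the rest
  done
  return "$rc"
}
```

Waiting on each process ID individually, rather than using a bare `wait`, lets the function report a failure if any single copy did not complete.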

Once tests are run, results are gathered, relocated to the archival location,
reported and printed to the screen, summarized into the actions summary page,
and then packaged into an artifact if a failure occurred.
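
The summary step might look like the following sketch. The input format (one `<test name> <pass|fail>` pair per line) and the function name are invented for the example and are not the output format of hpc-workflows.

```shell
#!/bin/sh
# Hypothetical reporting step. Reads "<test name> <pass|fail>" lines on stdin,
# appends a Markdown table to the file named by $1, and fails if any test failed.
summarize() {
  summary_file=$1; failures=0
  printf '| Test | Result |\n| --- | --- |\n' >> "$summary_file"
  while read -r name result; do
    printf '| %s | %s |\n' "$name" "$result" >> "$summary_file"
    [ "$result" = "pass" ] || failures=$((failures + 1))
  done
  [ "$failures" -eq 0 ]
}
```

In a GitHub Actions job the target file would be the one named by `GITHUB_STEP_SUMMARY`, and a non-zero return could gate the step that packages logs into an artifact.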

Finally, the test label is removed if the named tests and label match.

This pipeline is triggered if any pushes occur on master or develop OR if a PR
is labeled with an appropriate tag as specified by the tests within this
workflow. Additionally, a specific label to trigger all tests can be used that
will be removed from the PR when all tests finish, regardless of exit status.

The pipeline makes extensive use of the reusable test_workflow.yml to
instantiate tests on runners.

This pipeline currently only includes the definition for one test to be run on
a GitHub runner with labels that satisfy "derecho". Likewise, other hard-coded
values appearing in here assume a particular runner setup and environment.
@islas islas force-pushed the initial-testing-framework branch from 9a2ff49 to 1b31b47 Compare July 29, 2024 19:23
@islas
Contributor Author

islas commented Jul 29, 2024

We should be able to set up runners and test this all out inside this PR before this goes in as well

During the review of this testing infrastructure for inclusion in WRF, changes were
requested and some minor improvements were made. These include:
* Update submodule for fixes in filename handling, job naming, and err output
* Naming CI/CD jobs with trigger event identifier
* Parallelize copy of duplicate directories
* Add notes about public repo permissions on actions
* Reword env scripts