Would you like to have a single script to quickly provision High Performance Computing (HPC) clusters with access to several ready-to-use HPC applications (WRF, GROMACS, OpenFOAM, and many more) so you can focus on solving hard scientific and engineering challenges? Then this blog may be relevant to you.
To achieve this goal, we rely on a cluster provisioning script described in a previous blog post (see here), which is based on the Command Line Interfaces (CLIs) of both Azure and Azure CycleCloud. We then extended this script to automatically set up EESSI (European Environment for Scientific Software Installations)---see a previous blog post on EESSI here. To make things concrete, we describe here how we put all of this together using the WRF (Weather Research & Forecasting) model as a use case scenario.
- We provide a script that creates a SLURM cluster via Azure CycleCloud, ready for (MPI) job submission, using only the Command Line Interface (CLI);
- We leverage EESSI to access ready-to-use applications, which are mounted into the cluster nodes;
- The setup of EESSI is done via two relevant CycleCloud concepts: projects and cluster templates---so here you will learn a bit about those, and you can reuse the projects and cluster template even when CycleCloud was provisioned in other ways, such as via the marketplace in the Azure Portal;
- We use WRF as an example, so the provisioned cluster contains both the WRF application and the Conus 2.5km and Conus 12km benchmark data, automatically available for job submission;
- This tutorial shows how you could do this for different applications; it is not intended to describe optimized ways to run WRF in a production system. For that, there is plenty of material out there, including previous blog posts (see here and here).
Here is the git repository that contains the script:
- git folder: contains the automation script, cluster templates, and CycleCloud projects for EESSI support
- cyclecloud_cli.sh: the script itself, automating the CycleCloud+SLURM installation using the Azure/CycleCloud CLIs
- setvars.sh: helper script to set up the variables that customize the deployment
- Deployment relies only on PRIVATE IP addresses;
- Private and public SSH keys are available;
- We use Ubuntu for all resources: CycleCloud VM, scheduler, and cluster nodes.
- Azure CLI must be set up on the machine that triggers the script call (i.e., az login should work with the subscription used for deployment). The jq command must also be available.
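As a quick sanity check before running anything, something like the following should succeed (a minimal sketch; the jq installation line assumes an apt-based system):

az account show > /dev/null || az login
command -v jq > /dev/null || sudo apt-get install -y jq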
Download the git repository where the script is hosted:
git clone https://github.com/marconetto/azadventures.git
cd azadventures/chapter12
Customize the variables in setvars.sh, including resource group, storage account, keyvault, among others, and source the file:
source setvars.sh
CCPASSWORD and CCPUBKEY are set up outside setvars.sh. When running the automation script, you will be asked for their values in case you have not already defined them using:
export CCPASSWORD=<mygreatpassword>
export CCPUBKEY=$(cat ~/.ssh/id_rsa.pub)
If the variables below are set up, the script will automatically check for you when the cluster is ready for job submission. Otherwise, you can check the cluster creation yourself using Azure Bastion---the automation script will show you the IP address of the CycleCloud VM.
export VPNRG=myvpnrg
export VPNVNET=myvpnvnet
Provision the resources (resource group, vnet, keyvault, CycleCloud VM, etc.):
./cyclecloud_cli.sh <clustername>
Once you are on the cluster scheduler via SSH or Azure Bastion, just run:
sbatch run_wrf_hb_2_5km.sh
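To follow the job after submission, the usual SLURM commands apply; for instance (the job ID in the output file name will vary per submission):

squeue
tail -f slurm-<jobid>.out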
The benchmark data is in the azureuser home directory, together with a couple of SLURM batch script examples that you can work with depending on the SKU, network, and data you want to use.
Here is an example of an sbatch script available to run using, for instance, an HB-series SKU with InfiniBand network and the Conus 2.5km benchmark data:
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=120

# initialize the EESSI environment and load WRF; also load an OpenMPI module
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load WRF/4.4.1-foss-2022b-dmpar
module load mpi/openmpi-4.1.5

# create a unique execution directory for this run
execdir="run_$((RANDOM % 90000 + 10000))"
mkdir -p $execdir
cd $execdir || exit
echo "Execution directory: $execdir"

# symlink the WRF run-time files and the benchmark input data into it
wrfrundir=$(which wrf.exe | sed 's/\/main\/wrf.exe/\/run\//')
ln -s "$wrfrundir"/* .
ln -sf /shared/home/azureuser/v4.4_bench_conus2.5km/* .

# run over the InfiniBand interface
export UCX_NET_DEVICES=mlx5_ib0:1
time mpirun wrf.exe
Here we used WRF 4.4.1 from the production EESSI repository. Click here for details and for up-to-date information on the applications being onboarded to EESSI. Once you source the EESSI bash script, you have access to many other apps/libraries, including GROMACS and OpenFOAM, among others. For instance, if you wanted to use OpenFOAM, the module load command would be: module load OpenFOAM.
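To explore what else a given EESSI version provides, source the init script and query the module system; for instance (available module names and versions change over time):

source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module avail            # list all available modules
module avail GROMACS    # search for a specific application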
Now let's move behind the scenes, in case you want to learn how this was done or want to modify/expand the current automation.
EESSI provides quick access to various applications. Alternatively, you could modify the steps below to have applications built from source code, or use frameworks such as SPACK or EasyBuild (see references for details).
When we provision a CycleCloud cluster, we can choose which job scheduler manages the cluster resources, with options including SLURM, PBS, and LSF. Such clusters have a pre-defined list of job queues. If we want to provision a cluster with customizations---such as pre-downloading an application, changing job queues and resource types, or adding start-up tasks---we can explore what CycleCloud calls cluster templates, projects, and cloud-init.
Cluster templates define cluster configurations. You can specify the VM types of cluster nodes, storage options, deployment region, network ports to access a scheduler node, cluster partitions/queues, and so on. All of these can be parameterized, so a single template can serve multiple use cases.
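As an illustration of that parameterization, a template can declare a parameter and reference its value with a $ prefix from a node definition. Here is a minimal sketch in the same INI format; the section names and default value are illustrative, not copied from the shipped SLURM template:

[cluster SlurmEESSI]
    [[node scheduler]]
    MachineType = $SchedulerMachineType

[parameters Settings]
  [[parameters Compute]]
    [[[parameter SchedulerMachineType]]]
    Label = Scheduler VM Type
    ParameterType = Cloud.MachineType
    DefaultValue = Standard_HB120rs_v3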
Here is an example of a cluster template for a SLURM cluster: LINK
These cluster templates follow the INI format:
[cluster]
[[node, nodearray]]
[[[volume]]]
[[[network-interface]]]
[[[cluster-init]]]
[[[input-endpoint]]]
[[[configuration]]]
[environment]
[noderef]
[parameters]
[[parameters]]
[[[parameter]]]
As mentioned above, a cluster template defines the configuration of the overall cluster. Inside the template, you can attach node-level configurations, which come from what are called CycleCloud projects. These projects contain specs. When a node starts, CycleCloud configures it by processing and running a sequence of specs, which can be Python, shell, or PowerShell scripts. Specs are executed once nodes are ready (different from cloud-init, which is executed before any CycleCloud processes run on the node).
Projects are used in cluster templates with the following syntax:
[[[cluster-init <project>:<spec>:<project version>]]]
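For example, the EESSI project created later in this post is referenced from the template as:

[[[cluster-init cc_eessi:default:1.0.0]]]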
Here is a simplified view of a CycleCloud project:
myproject/
├── project.ini
├── templates
├── specs
│ ├── default
│ └── cluster-init
│ ├── scripts
│ ├── files
│ └── tests
- templates directory: holds cluster templates
- specs: the specifications defining your project
- scripts: scripts executed in lexicographical order on the node
- files: raw data files that will be put on the node
- tests: tests executed when a cluster is started in testing mode
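Because scripts run in lexicographical order, prefixing them with numbers is the usual convention. As a minimal, hypothetical spec script (file name and log path are just for illustration):

#!/usr/bin/env bash
# cc_eessi/specs/default/cluster-init/scripts/10_hello.sh
# runs on each node using this spec, once CycleCloud has readied the node
echo "cluster-init ran on $(hostname) at $(date)" >> /tmp/cluster-init-demo.log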
Here is the URL describing how to create a project and additional functionality of cluster projects: LINK
CycleCloud also supports cloud-init. These configurations are executed at the first boot of a VM, before any other CycleCloud-specific configuration occurs on it (such as the installation of HPC schedulers). Cloud-init can be used to configure networking, yum/apt mirrors, and so on.
Further details can be found here: LINK. Here is an example of cloud-init usage in a cluster template:
[node scheduler]
CloudInit = '''#!/bin/bash
echo "cloud-init works" > /tmp/cloud-init.txt
'''
We can make WRF available through EESSI (European Environment for Scientific Software Installations, pronounced "easy"). Certain steps have to be executed on the cluster nodes to make WRF available for execution. We will make use of a cluster template and CycleCloud project files to get there.
All of the steps below have been added to the CycleCloud CLI automation script.
There are several ways of doing so; let's see one that explores CycleCloud projects (we could alternatively use cloud-init). Here we assume you are on an existing CycleCloud VM.
# grab the locker name (stripping the parenthesized suffix from the listing)
LOCKER=$(cyclecloud locker list | sed 's/ (.*)//')
# initialize a new CycleCloud project called cc_eessi
echo $LOCKER | cyclecloud project init cc_eessi
Create a file cc_eessi/specs/default/cluster-init/scripts/00_setup_eessi.sh with this content:
#!/usr/bin/env bash
# instructions from: https://www.eessi.io/docs/getting_access/native_installation

# install the CernVM-FS client
sudo apt-get install -y lsb-release
wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb
rm -f cvmfs-release-latest_all.deb
sudo apt-get update
sudo apt-get install -y cvmfs

# install the EESSI CernVM-FS configuration
wget https://github.com/EESSI/filesystem-layer/releases/download/latest/cvmfs-config-eessi_latest_all.deb
sudo dpkg -i cvmfs-config-eessi_latest_all.deb

# minimal client configuration: standalone client with a 10GB local cache
sudo bash -c "echo 'CVMFS_CLIENT_PROFILE=\"single\"' > /etc/cvmfs/default.local"
sudo bash -c "echo 'CVMFS_QUOTA_LIMIT=10000' >> /etc/cvmfs/default.local"
sudo cvmfs_config setup
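Once this spec has run on a node, a quick sanity check (on the node itself) is to probe the EESSI repository with the standard cvmfs_config probe subcommand:

cvmfs_config probe software.eessi.io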
Upload the project (in case you want to test it on an existing CycleCloud environment):
cd cc_eessi/
cyclecloud project upload $LOCKER
cd ..
Let's create a second project so that the WRF benchmark data is downloaded once the scheduler is provisioned.
echo $LOCKER | cyclecloud project init cc_wrfconus
Create a file cc_wrfconus/specs/default/cluster-init/scripts/00_get_conus.sh with this content:
#!/usr/bin/env bash
# find the admin user name from the jetpack configuration
ADMINUSER=$(grep name /opt/cycle/jetpack/config/auth.json | awk -F'"' '{print $4}')

# download and extract the Conus 12km and 2.5km benchmarks into the admin user's home
runuser -l "$ADMINUSER" -c 'curl -O https://www2.mmm.ucar.edu/wrf/users/benchmark/v3911/bench_12km.tar.bz2'
runuser -l "$ADMINUSER" -c 'tar jxvf bench_12km.tar.bz2'
runuser -l "$ADMINUSER" -c 'curl -O https://www2.mmm.ucar.edu/wrf/users/benchmark/v3911/bench_2.5km.tar.bz2'
runuser -l "$ADMINUSER" -c 'tar jxvf bench_2.5km.tar.bz2'
Upload this second project (again, in case you want to test it on an existing CycleCloud environment):
cd cc_wrfconus/
cyclecloud project upload $LOCKER
cd ..
Now we need a way to use these CycleCloud projects, and we will do this by customizing a CycleCloud cluster template.
In your $HOME directory inside the CycleCloud VM:
EXISTING_TEMPLATE=$(sudo find /opt/cycle_server -iname "*slurm_template*txt")
NEW_TEMPLATE=newslurm.txt
sudo cp $EXISTING_TEMPLATE $NEW_TEMPLATE
sudo chown azureuser:azureuser $NEW_TEMPLATE
You can also get the template from git:
cyclecloud project fetch https://github.com/Azure/cyclecloud-slurm/releases/3.0.5 cc-slurm
NEW_TEMPLATE=cc-slurm/templates/slurm.txt
Or:
wget https://raw.githubusercontent.com/Azure/cyclecloud-slurm/3.0.5/templates/slurm.txt
If you fetched the template from git, copy it to the user home directory:
cp $NEW_TEMPLATE $HOME/
If you diff these NEW_TEMPLATE files, the content should be exactly the same, assuming you got the right release ID from your current CycleCloud installation.
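For instance, assuming the local copy and the git-fetched copy are both present, the following should produce no output:

diff newslurm.txt cc-slurm/templates/slurm.txt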
We modified $NEW_TEMPLATE in three places.
We first changed the cluster template name from Slurm to SlurmEESSI:
⋮
[cluster SlurmEESSI]
IconUrl = static/cloud/cluster/ui/ClusterIcon/slurm.png
FormLayout = selectionpanel
⋮
Second, we made sure EESSI could be used in all nodes, including the scheduler:
⋮
[[node defaults]]
UsePublicNetwork = $UsePublicNetwork
Credentials = $Credentials
SubnetId = $SubnetId
Region = $Region
KeyPairLocation = ~/.ssh/cyclecloud.pem
Azure.Identities = $ManagedIdentity
[[[cluster-init cc_eessi:default:1.0.0]]]
⋮
Third, we added the WRF benchmark data project to be executed on the scheduler node:
⋮
[[node scheduler]]
MachineType = $SchedulerMachineType
ImageName = $SchedulerImageName
IsReturnProxy = $ReturnProxy
⋮
[[[cluster-init cyclecloud/slurm:scheduler:3.0.5]]]
[[[cluster-init cc_wrfconus:default:1.0.0]]]
⋮
Upload the cluster template:
cyclecloud import_template -f $NEW_TEMPLATE
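With the template imported, you can also create and start a cluster from it directly via the CycleCloud CLI. A minimal sketch, where the cluster name and parameter file are illustrative (a parameter file can be exported from an existing cluster with cyclecloud export_parameters):

cyclecloud create_cluster SlurmEESSI mycluster -p params.json
cyclecloud start_cluster mycluster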
With this, you are ready to play with the new template and projects on an existing CycleCloud VM. In our case, we uploaded these files to git so they can be consumed by the automation script or by other CycleCloud VMs created in different ways.
- azure cyclecloud: https://learn.microsoft.com/en-us/azure/cyclecloud/overview
- cyclecloud cluster templates (link 1): https://learn.microsoft.com/en-us/training/modules/customize-clusters-azure-cyclecloud/2-describe-templates
- cyclecloud cluster templates (link 2): https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/cluster-templates?view=cyclecloud-8
- cyclecloud projects (link 1): https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/projects?view=cyclecloud-8
- cyclecloud projects (link 2): https://learn.microsoft.com/en-us/training/modules/customize-clusters-azure-cyclecloud/5-customize-software-installations
- cyclecloud core concepts: https://learn.microsoft.com/en-us/azure/cyclecloud/concepts/core?view=cyclecloud-8
- SLURM cluster template: https://github.com/Azure/cyclecloud-slurm/blob/master/templates/slurm.txt
- cyclecloud cloud-init: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/cloud-init?view=cyclecloud-8
- EESSI Website: https://www.eessi.io/
- EESSI Getting Access: https://www.eessi.io/docs/getting_access/native_installation/
- EESSI+WRF on Azure: https://easybuild.io/eum22/013_eum22_WRF_Azure_EESSI.pdf
- SPACK Website: https://spack.io/
- EasyBuild: https://easybuild.io/