Getting started
This page will help you to get to the point of running some basic test cases using ACCESS-OM2. It is divided into several sections:
- Quick start for Gadi users
- Downloading source code
- Building the models
- Running a test case
- Checking model output
If you have an account on the Gadi supercomputer at NCI you can use pre-built executables and standard configurations to get up and running very quickly.
There are over a dozen supported configurations in these 6 repositories (also included in access-om2/control):
- https://github.com/COSIMA/1deg_jra55_ryf.git
- https://github.com/COSIMA/1deg_jra55_iaf.git
- https://github.com/COSIMA/025deg_jra55_ryf.git
- https://github.com/COSIMA/025deg_jra55_iaf.git
- https://github.com/COSIMA/01deg_jra55_ryf.git
- https://github.com/COSIMA/01deg_jra55_iaf.git
These include:
- 3 nominal horizontal resolutions: 1° (`1deg`), 0.25° (`025deg`) and 0.1° (`01deg`)
- 2 forcing types: JRA55-do data that repeats 1 May 1990 - 30 April 1991 (`ryf`, repeat-year forcing) or covers 1 Jan 1958 - 31 Dec 2018 (`iaf`, interannual forcing)
- a `master` branch (for physics-only simulations) and a `master+bgc` branch (that also includes ocean and sea ice biogeochemistry) in each repository
The following steps will run the 1 degree JRA55 RYF configuration `1deg_jra55_ryf`. Analogous steps can be used for any of the others, but the SU cost, storage and run time of the models increase dramatically at higher resolution (roughly 0.14 kSU/model year and 100 model years/day at 1°; 7 kSU/yr and 12 yrs/day at 0.25°; 120 kSU/yr and 2 yrs/day at 0.1°; worse with high-frequency output and BGC), so ensure that you have the computational and storage resources to support your run!
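As a rough worked example of those numbers, a 60-year run at 1° costs about 0.14 kSU/yr × 60 yr ≈ 8 kSU and takes well under a day of wall time, whereas the same 60 years at 0.1° would cost around 7,200 kSU and take roughly a month.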
- Get an account on NCI and then get membership of an NCI compute project (e.g. CLEX compute projects: `w40`, `w42`, `v45` or `w97`) to charge the model SU usage to, and also projects `qv56`, `ik11` and `hh5` to access the input files and payu - apply at https://my.nci.org.au/mancini/project-search if you need to.
- Decide on a name for your experiment, ideally one that's unique. We'll use `1deg_jra55_ryf_demo` as an example here, but you should replace this with your own experiment name.
- Log in to `gadi.nci.org.au` and download the experiment configuration to your home directory (since this is backed up):

mkdir -p ~/payu
cd ~/payu
git clone https://github.com/COSIMA/1deg_jra55_ryf.git 1deg_jra55_ryf_demo
cd 1deg_jra55_ryf_demo

Optional: if using a branch other than `master`, check it out now. E.g. for biogeochemistry, do

git checkout master+bgc

- Check out a git branch for your experiment (same as your experiment name):

git checkout -b 1deg_jra55_ryf_demo

- Model output will be saved on `/scratch` and will be deleted if unread for 100 days, so it's important to copy your outputs to `/g/data` (this isn't backed up, but at least your files won't be auto-deleted). To do this automatically (but see warnings below):
  - Change `SYNCDIR` in `sync_data.sh` to a path in `/g/data` you have write access to. To save confusion this should end with your experiment name (`/1deg_jra55_ryf_demo` in our case). Double check that this directory does not already exist, otherwise you could lose data!
  - Uncomment the line `# postscript: sync_data.sh` at the end of `config.yaml`.
- Uncomment and modify the `project` and `shortpath` lines in `config.yaml` to reflect your compute project (optional if not different from the default).
- Make whatever other configuration changes you'd like, e.g.
  - run length (`restart_period` in `accessom2.nml`), timestep (`ice_ocean_timestep` in `accessom2.nml`), walltime limit (`walltime` in `config.yaml`)
  - ocean diagnostic outputs (`cd ocean`, edit `diag_table_source.yaml`, run `./make_diag_table.py`; for more info see https://github.com/COSIMA/make_diag_table)
  - sea ice diagnostic outputs in `ice/cice_in.nml` (`f_*` fields; 'd'=daily mean, 'm'=monthly mean, 'x'=no output)
  - ocean parameters in `ocean/input.nml`
  - sea ice parameters in `ice/*.nml`
- Edit `metadata.yaml` to describe your experiment. Not strictly needed, but a good idea.
- git commit your changes with an informative message:

git commit -am "initial setup for experiment to test... bla bla"

- Load the conda environment containing `payu`:

module use /g/data/hh5/public/modules
module load conda/analysis3

- Run the model:

payu run
This uses the payu workflow management tool to prepare the run and submit it as a job to the PBS job queue. See the Gadi User Guide to learn more about PBS job management.
Check the status of the job (state 'Q'=waiting in queue, 'R'=running, 'E'=exiting, 'H'=held) with `uqstat -c`. While it's running, you can get an idea of where it's up to with

grep cur_exp-datetime work/atmosphere/log/matmxx.pe00000.log

To kill the run early, do `qdel N`, where N is the job number (first column given by `uqstat`). If you kill the job (or it crashes), a `work` directory will be left behind after the job has disappeared from `uqstat` and you'll need to do `payu sweep` before you can run again.
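For example, the recovery sequence after killing a queued or running job is just the commands above (the job number 1234567 is illustrative; take it from the first column of `uqstat`):

qdel 1234567   # stop the PBS job
payu sweep     # clear the leftover work directory
payu run       # resubmit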
When your run has finished successfully, its output will be in `archive/output000`, and a new job will start to sync this data to `SYNCDIR` with `sync_data.sh`. When complete, this will also generate a `run_summary*.csv` file (best viewed in a spreadsheet program) containing a log of the runs in your experiment.
To do another run, just type `payu run` again. To do (say) 10 runs, do `payu run -n 10` and they'll automatically be submitted one after the other. The outputs from each run will be in numbered subdirectories in `archive` and also in `SYNCDIR`.
Each run creates a `restart` directory in `archive` which is used as the initial condition for the next run. These restarts can accumulate and consume disk space, but only the most recent one is needed (unless you plan to restart a new experiment from an intermediate state). `tidy_restarts.py` can be used to clean them up - see `./tidy_restarts.py -h` for usage information.
WARNING: restarts are stored on `/scratch` (so will be deleted after 100 days) and are not automatically copied to `SYNCDIR`. If you plan to continue your run in the future you'll need to store the necessary restarts somewhere safe. To copy restarts to `SYNCDIR`, first wait for all jobs to finish, tidy them with `tidy_restarts.py` (if desired - see above), then do

./sync_data.sh -r -u

The most recent ocean restarts will be uncollated. If you'd like them to be collated, wait for the model run to finish, check for uncollated restarts with `ls archive/restart*/ocean/*.nc.0000` and do `payu collate -d archive/restartN/ocean` for any N with uncollated restarts. When the collation job has finished (check with `uqstat` and `ls archive/restart*/ocean/*.nc.0000`) you can then sync the restarts with just

./sync_data.sh -r
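Put together, the restart-syncing sequence looks like this (the restart number 005 is illustrative; use whatever `ls archive/` shows):

ls archive/restart*/ocean/*.nc.0000        # any matches are uncollated ocean restarts
payu collate -d archive/restart005/ocean   # collate that restart (005 is illustrative)
uqstat                                     # wait until the collation job has finished
./sync_data.sh -r                          # then sync the collated restarts to SYNCDIR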
ANOTHER WARNING: `sync_data.sh` won't automatically copy uncollated files to `SYNCDIR`, so may omit the most recent outputs if they haven't finished collating. To avoid auto-deletion from `/scratch`, do `sync_data.sh` after all jobs have finished. Also, sometimes collation of MOM output fails. To detect this, wait for all jobs to finish, then check for uncollated outputs with `ls archive/output*/ocean/*.nc.0000`. If you find any, collate them with `payu collate -i N` where N is the output directory number. When the collation jobs have finished (check with `uqstat` and `ls archive/output*/ocean/*.nc.0000`), copy the collated outputs to `SYNCDIR` with `./sync_data.sh`.
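If several outputs are uncollated, a small loop along these lines saves retyping (a sketch; it assumes the standard `archive/outputNNN` directory naming and that `payu collate -i` accepts the output number without leading zeros):

for d in archive/output*; do
  if ls "$d"/ocean/*.nc.0000 >/dev/null 2>&1; then   # .nc.0000 files mean uncollated output
    n=$((10#${d#archive/output}))                    # strip the prefix and leading zeros
    payu collate -i "$n"
  fi
done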
The output data will be in NetCDF format (`.nc` files). The quickest way to look at it is ncview, e.g.
module load ncview
ncview archive/output123/ocean/whatever.nc &
To do any serious plotting or analysis, you should use the COSIMA Cookbook. You'll probably need to make your own database to access your output data copied to `SYNCDIR`.
As the previous section showed, on Gadi only the experiment configuration needs to be downloaded to use ACCESS-OM2. This is because, by default, the configurations reference pre-existing executables and inputs. However, if you wish to dig into the details of the model, modify input or code, or set it up on a different machine, then it is necessary to download and familiarise yourself with the entire ACCESS-OM2 repository.
First, to download the entire ACCESS-OM2 repository, including all of the configurations and source code:
git clone --recursive https://github.com/COSIMA/access-om2.git
Then, to simplify future instructions we give this directory a short name with:
cd access-om2
export ACCESS_OM2_DIR=$(pwd)
If you already have an existing download and would like to update to the latest version, see the tutorial on updating an experiment.
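As a rough sketch (the updating tutorial is the authoritative reference), bringing an existing clone up to date usually amounts to pulling the parent repository and refreshing its submodules:

cd $ACCESS_OM2_DIR
git pull                                  # update the parent access-om2 repository
git submodule update --init --recursive   # move the submodules to the recorded commits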
The project is arranged into a collection of repositories that are linked together using git submodules. The access-om2 repository is the parent, with other repositories embedded within it.
cd $ACCESS_OM2_DIR
tree -L 2 -d
.
├── control
│ ├── 01deg_jra55_iaf
│ ├── 01deg_jra55_ryf
│ ├── 025deg_jra55_iaf
│ ├── 025deg_jra55_ryf
│ ├── 1deg_jra55_iaf
│ └── 1deg_jra55_ryf
├── src
│ ├── cice5
│ ├── libaccessom2
│ │ ├── datetime-fortran
│ │ ├── json-fortran
│ │ ├── oasis3-mct
│ │ ...
│ └── mom
├── test
│ └── checksums
└── tools
├── contrib
└── esmgrids
The `control` directory contains a number of submodules, one for each standard configuration. These repositories can be downloaded independently and used to run experiments on Gadi. The `src` directory contains a submodule for each of the dynamical models. The `libaccessom2` repository contains the file-based atmosphere YATM, the OASIS coupler and other code needed to bolt everything together.

Working with git submodules can be tricky; fortunately there is a lot of good documentation out there, for example here and here.
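A couple of everyday commands cover most of what you need (standard git, nothing ACCESS-OM2-specific):

git submodule status              # show the commit each submodule is checked out at
git submodule foreach git status  # run a command in every submodule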
If this repository layout is particularly disagreeable to you for any reason, please feel free to add a comment (as others have done) on a related issue, such as this one.
The easiest way to build the models is simply
cd $ACCESS_OM2_DIR
./install.sh
which will build executables for all models at all resolutions. The executables will be placed in the `$ACCESS_OM2_DIR/bin/` directory.
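To check that the build succeeded you can list the new executables; the exact file names include git hashes, so the patterns below are illustrative:

ls -l $ACCESS_OM2_DIR/bin/
# expect files like yatm_*.exe, fms_ACCESS-OM_*_libaccessom2_*.x and cice_auscom_*_libaccessom2_*.exe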
Each of the model configurations is run by payu from within its respective directory in `$ACCESS_OM2_DIR/control/`. The `config.yaml` within each of these subdirectories gives the PBS specification for the job, including executable names.

You will need to edit `config.yaml` to set the `project:` and `shortpath:` settings appropriately. If you have built your own executables or wish to use your own input then it will also be necessary to modify the `exe:` and `input:` settings.
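For illustration only (the project code, paths and hashes below are made up; use your own), the relevant entries in `config.yaml` look something like:

project: x00             # your NCI compute project (illustrative)
shortpath: /scratch/x00  # matching laboratory location (illustrative)
# each model's entry has its own exe: and input: settings, e.g. (paths illustrative):
#   exe: /path/to/your/bin/fms_ACCESS-OM_xxxxxxx_libaccessom2_xxxxxxx.x
#   input: /g/data/ik11/inputs/access-om2/...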
For example, to run the 1 degree JRA-55 RYF experiment:
export EXP_DIR=$ACCESS_OM2_DIR/control/1deg_jra55_ryf/
cd $EXP_DIR
module use /g/data/hh5/public/modules
module load conda/analysis3
payu run
or to do N runs:
payu run -n N
On NCI, status of submitted runs can be checked with `qstat -u ${USER}`.
If the run is successful, output is stored in `$EXP_DIR/archive`. The payu documentation provides an explanation of where run output is stored.

If the run is unsuccessful, you can find error output in `$EXP_DIR/access-om2.err` and any output from the run will be in `$EXP_DIR/work`.
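A quick way to triage a failed run (file names as above):

tail -n 50 $EXP_DIR/access-om2.err   # last lines of the error log
ls $EXP_DIR/work                     # partial output and per-model logs from the failed run
payu sweep                           # clear the work directory before resubmitting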
Please see this issue: https://github.com/COSIMA/access-om2/issues/10 for further information that may need to be integrated into this page.
This is a work in progress. Instructions here are (very) incomplete!
General gadi transition info is here: https://opus.nci.org.au/display/Help/Preparing+for+Gadi
The changes most relevant to ACCESS-OM2 are:
- `/short` will not exist on gadi. The replacement `/scratch` is time-limited, so is not suitable for storing model inputs or executables. We have therefore moved inputs from `/short/public/access-om2/` to `/g/data/ik11/inputs/access-om2/`. To access this you will need to be a member of project `ik11` (apply via mancini). You'll also need to be a member of `ua8` for RYF or `qv56` for IAF.
- The openMPI and NetCDF library versions we used on raijin will not be available on gadi. We've upgraded them in new executables but your old raijin executables will not run on gadi.
- The intel compilers we used on raijin will not be available on gadi.
- There are 48 CPUs per node on the normal queue on gadi (compared to 16 on raijin), so if using the normal queue you will need this at the end of `config.yaml`:
platform:
nodesize: 48
If you want to start a brand-new experiment, we recommend you use the latest executables and configurations, which will fix the above issues (and more).
We haven't (yet) released a version of ACCESS-OM2 suitable for gadi, but we're working on it. There are test configurations available on the `gadi-transition` and `ak-dev` branches of the JRA55-do-forced IAF and RYF configurations at all three resolutions in the individual config repos. `gadi-transition` is close to the old raijin configurations so is more suitable for continuation of existing runs. `ak-dev` includes many improvements and bug fixes (see the incomplete summary in the "merge ak-dev branch" pull request, e.g. https://github.com/COSIMA/025deg_jra55_iaf/pull/4).
Not all configurations have been tested, and those that have been run have not had their output carefully checked.
The IAF configs use JRA55-do from `qv56`, which is a slightly newer version (1.3.1) than that on `ua8` used in previous runs and RYF (with small differences in near-surface temperature and humidity only). You could revert to the `ua8` version for continuation runs by undoing these changes to `atmosphere/forcing.json`:
https://github.com/COSIMA/025deg_jra55_iaf/commit/d19b5c86e0600f1b8d52cc35cb7dba206d53f15b#diff-cd3ba4f3a1c8dd37efc78631f27566b3
and undoing the atmosphere input changes in `config.yaml`:
https://github.com/COSIMA/025deg_jra55_iaf/commit/d19b5c86e0600f1b8d52cc35cb7dba206d53f15b#diff-259fe82e12a866f01123927480c7851b
and also replacing `/g/data1/` with `/g/data/` in these files.
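The path substitution can be done with a one-liner like this (a sketch; adjust the file list if your configuration keeps these paths elsewhere):

sed -i 's|/g/data1/|/g/data/|g' atmosphere/forcing.json config.yaml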
To try out a config, do this on gadi:
git clone https://github.com/COSIMA/1deg_jra55_iaf.git my-test-run
cd my-test-run
git checkout gadi-transition
git checkout -b my-test-run
then edit `accessom2.nml`, `sync_output_to_gdata.sh`, `config.yaml` and run with
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
payu setup
git commit -am "my test run"
payu sweep
payu run
with "my-test-run" being whatever name you want. You can replace 1deg_jra55_iaf
above with any of these:
1deg_jra55_ryf
025deg_jra55_iaf
025deg_jra55_ryf
01deg_jra55_iaf
01deg_jra55_ryf
If you really want to live on the bleeding edge, replace `gadi-transition` with `ak-dev` above.
This is where it gets tricky. Your executables from raijin will not run on gadi due to library changes. If you want to continue a run as close as possible to your raijin experiment you'll need to recompile the versions you used so they work on gadi. Note that due to compiler changes this will not give bit-for-bit reproducibility of your previous raijin runs.
It's a bit involved:
- Determine the versions (i.e. git hashes) of the executables used for your raijin run. Look in the `exe` fields in `config.yaml`. These contain the git hash (at the end for yatm, and before `_libaccessom2` for mom and cice); in the examples below the hashes are `b6caeab`, `50dc61e` and `47650cc` (yours will probably differ):

exe: /short/public/access-om2/bin/yatm_b6caeab.exe
exe: /short/public/access-om2/bin/fms_ACCESS-OM_50dc61e_libaccessom2_b6caeab.x
exe: /short/public/access-om2/bin/cice_auscom_360x300_24p_47650cc_libaccessom2_b6caeab.exe

Before embarking on the steps below, first check whether the changes between your versions and those on `gadi-transition` are significant enough to warrant recompiling the code. If not, just use the `gadi-transition` branch as above.
- Download and compile access-om2 on gadi:
git clone --recursive https://github.com/COSIMA/access-om2.git
cd access-om2
git checkout gadi-transition
./install.sh
- Check out the versions of the code you used on raijin, using the hashes you found in step 1 above (the hashes and branch names here are examples; note that git branch names can't contain spaces):
cd src/libaccessom2
git checkout b6caeab
git checkout -b recompiling-b6caeab-for-gadi
cd ../mom
git checkout 50dc61e
git checkout -b recompiling-50dc61e-for-gadi
cd ../cice5
git checkout 47650cc
git checkout -b recompiling-47650cc-for-gadi
- Make the necessary compiler and library changes in libaccessom2, mom and cice:
- https://github.com/COSIMA/libaccessom2/compare/gadi-transition
- https://github.com/mom-ocean/MOM5/compare/gadi-transition
- https://github.com/COSIMA/cice5/compare/gadi-transition
- Commit these changes:

cd src; for d in *; do cd $d; git commit -am "update compiler and libraries for gadi"; cd -; done

- Cross fingers and run ./install.sh again. If it works you'll have new executables in `bin`. Their names will include hashes that differ from your raijin run but match those of the commits at step 5.
- Put the executables somewhere permanent that is visible to the gadi compute nodes (e.g. your home directory).
- Update `config.yaml` to point to these executables and use `/g/data/ik11/inputs/` rather than `/short/public/`, e.g. https://github.com/COSIMA/1deg_jra55_ryf/compare/gadi-transition#diff-259fe82e12a866f01123927480c7851b You'll also need to set PBS flags in `config.yaml` and `sync_output_to_gdata.sh`, and also define `min_thickness = 1.0` in the `&ocean_topog_nml` group in `ocean/input.nml` if it wasn't already defined there (more info here).

Don't update `atmosphere/forcing.json` (there are small differences between the JRA55-do v1.3 in `/g/data/ua8` and v1.3.1 in `/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3`).
- Load a version of payu that works on gadi and do `payu setup`:
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
payu setup
- If this works, do `payu sweep` and then try a (short!) run: `payu run`
- payu v1.0.6 and later works on raijin and gadi. On raijin the default location for laboratories is `/short/$PROJECT/$USER`, and on gadi it is `/scratch/$PROJECT/$USER`. payu v1.0.6 is available via the `conda/analysis3-unstable` environment. More info: http://climate-cms.wikis.unsw.edu.au/Payuongadi
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
- you will need to be a member of projects `hh5`, `ik11`, `qv56` and `ua8`. Apply via mancini.
- you might find useful information at https://accessdev.nci.org.au/trac/wiki/gadi
- to build from the `gadi-transition` sources, do
git clone --recursive https://github.com/COSIMA/access-om2.git
cd access-om2
git checkout gadi-transition
for d in src/*; do cd $d; git checkout gadi-transition; cd ../..; done
./install.sh
your executables will be in `./bin`.