Implement global-workflow on AWS #2549

Closed
wants to merge 66 commits into from
Changes from 64 commits

Commits (66)
b3b3b17
add noaacloud module for AWS
weihuang-jedi Apr 8, 2024
1c5231a
Merge branch 'develop' of ssh://github.com/NOAA-EPIC/global-workflow-…
weihuang-jedi Apr 8, 2024
149d776
to compile on AWS
weihuang-jedi Apr 12, 2024
2fb6e6d
check back on hera to make sure it did not break anything
weihuang-jedi Apr 15, 2024
30b615a
add mods to save aws changes
weihuang-jedi Apr 15, 2024
5ec3a78
mv .aws.tar to mods4aws
weihuang-jedi Apr 15, 2024
43a6b33
update gdas modulefile
weihuang-jedi Apr 16, 2024
4512075
Merge remote-tracking branch 'origin' into wei-epic-aws
weihuang-jedi Apr 16, 2024
d84fded
add changes of gsi_enkf
weihuang-jedi Apr 18, 2024
534cbb3
add two vers for noaacloud
weihuang-jedi Apr 18, 2024
e9d41a0
prepare to run case
weihuang-jedi Apr 18, 2024
b5eb58c
re-add these tar files
weihuang-jedi Apr 18, 2024
7e2d1c4
update gsi_utils module file
weihuang-jedi Apr 18, 2024
c39eb7e
update version and modules for AWS
weihuang-jedi Apr 19, 2024
47bffad
add hosts info for noaacloud
weihuang-jedi Apr 21, 2024
956a3e5
compiled OK for ca-sfc-emc account
weihuang-jedi Apr 21, 2024
2768686
add host and env for CSP
weihuang-jedi Apr 23, 2024
6404639
Merge remote-tracking branch 'origin' into wei-epic-aws
weihuang-jedi Apr 26, 2024
cab4490
compile gdas
weihuang-jedi Apr 26, 2024
e6903e1
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi Apr 26, 2024
41eb077
Merge branch 'NOAA-EMC:develop' into wei-epic-aws
weihuang-jedi Apr 27, 2024
4e457a1
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi Apr 27, 2024
5e588f0
run through at ca-epic
weihuang-jedi Apr 27, 2024
db17544
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi Apr 27, 2024
2cc1c56
sync to ca-sfc-emc
weihuang-jedi Apr 27, 2024
bee6fa9
Merge remote-tracking branch 'origin' into wei-epic-aws
weihuang-jedi Apr 28, 2024
c310b5e
testing on ca-sfc-emc account
weihuang-jedi Apr 30, 2024
d388ad4
Merge branch 'NOAA-EMC:develop' into wei-epic-aws
weihuang-jedi Apr 30, 2024
c18cc33
clean up some temporary files
weihuang-jedi Apr 30, 2024
90006a3
Merge branch 'wei-epic-aws' of ssh://github.com/NOAA-EPIC/global-work…
weihuang-jedi May 1, 2024
b10aeea
Merge branch 'NOAA-EMC:develop' into wei-epic-aws
weihuang-jedi May 1, 2024
bd8ebd8
clean up few unused files
weihuang-jedi May 1, 2024
7a7ff55
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi May 1, 2024
ef6bc09
run in ca-epic
weihuang-jedi May 2, 2024
b4d8d6e
update global-workflow forked at epic, still waiting on gsi and ufs
weihuang-jedi May 2, 2024
0321ab5
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi May 2, 2024
a8487dc
load gw module
weihuang-jedi May 3, 2024
29c3253
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi May 3, 2024
343a3ac
update submodule hash
weihuang-jedi May 7, 2024
fa6f4b4
merge develop to wei-epic-aws
weihuang-jedi May 9, 2024
ec03c7d
trying compile on GCP
weihuang-jedi May 12, 2024
0b53426
save AWS change
weihuang-jedi May 21, 2024
4ce15bc
more test on AWS
weihuang-jedi May 23, 2024
529e258
merge develop into wei-epic-aws
weihuang-jedi May 28, 2024
c690fee
merge develop into wei-epic-aws
weihuang-jedi May 28, 2024
75af7b7
change google CPU numbers
weihuang-jedi May 28, 2024
22ab9a5
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi May 28, 2024
f9c4fc7
tidying up AWS changes
weihuang-jedi May 29, 2024
f2667d8
handle ICs differently on AWS
weihuang-jedi May 29, 2024
b8d3ef1
handle ICs differently on AWS
weihuang-jedi May 29, 2024
d0b8b50
tidying up
weihuang-jedi May 30, 2024
f6f3722
continuing tidy up
weihuang-jedi May 30, 2024
99ee72d
Merge branch 'wei-epic-aws' of ssh://github.com/NOAA-EPIC/global-work…
weihuang-jedi May 30, 2024
613b0fc
switch ufs-weather-model back
weihuang-jedi Jun 12, 2024
31e3b0c
add noaacloud to compile gdas
weihuang-jedi Jun 12, 2024
9c8e01a
before final sync with EMC repo
weihuang-jedi Jun 12, 2024
c120f37
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi Jun 12, 2024
1079952
sync with develop
weihuang-jedi Jun 12, 2024
42f9990
make it also run on hera
weihuang-jedi Jun 13, 2024
eec9c5a
Merge branch 'wei-epic-aws' of ssh://github.com/NOAA-EPIC/global-work…
weihuang-jedi Jun 13, 2024
a2802aa
avoid change wxflow
weihuang-jedi Jun 13, 2024
357ed59
Merge branch 'wei-epic-aws' of github.com:NOAA-EPIC/global-workflow-c…
weihuang-jedi Jun 13, 2024
d6b0f71
remove echo
weihuang-jedi Jun 13, 2024
a88ca29
sync and clean up
weihuang-jedi Jun 13, 2024
80eced9
sync on hera and test
weihuang-jedi Jun 13, 2024
9e26c32
save before rebase
weihuang-jedi Jun 17, 2024
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,6 +1,6 @@
[submodule "sorc/ufs_model.fd"]
path = sorc/ufs_model.fd
-	url = https://github.com/ufs-community/ufs-weather-model
+	url = https://github.com/ufs-community/ufs-weather-model.git
Contributor
please revert.

ignore = dirty
[submodule "sorc/wxflow"]
path = sorc/wxflow
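The only functional change in this hunk is the trailing ".git" on the submodule URL, which resolves to the same repository (hence the reviewer's request to revert). A quick way to inspect the URL a checkout will actually use — a sketch for illustration, not part of the PR:

    # Show the submodule URL recorded in .gitmodules
    git config -f .gitmodules --get submodule.sorc/ufs_model.fd.url
    # -> https://github.com/ufs-community/ufs-weather-model.git
    # Propagate the .gitmodules URL into .git/config for an existing clone
    git submodule sync sorc/ufs_model.fd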
6 changes: 3 additions & 3 deletions env/AWSPW.env
@@ -14,8 +14,8 @@ fi

step=$1

export launcher="mpiexec.hydra"
export mpmd_opt=""
export launcher="srun --mpi=pmi2 -l"
export mpmd_opt="--distribution=block:block --hint=nomultithread --cpus-per-task=1"

# Configure MPI environment
export OMP_STACKSIZE=2048000
@@ -36,7 +36,7 @@ if [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then
(( nnodes = (${!nprocs}+${!ppn}-1)/${!ppn} ))
(( ntasks = nnodes*${!ppn} ))
# With ESMF threading, the model wants to use the full node
-export APRUN_UFS="${launcher} -n ${ntasks}"
+export APRUN_UFS="${launcher} -n ${ntasks} ${mpmd_opt}"
unset nprocs ppn nnodes ntasks

elif [[ "${step}" = "post" ]]; then
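This hunk swaps Intel MPI's mpiexec.hydra for Slurm's srun on AWS and threads the new mpmd_opt flags into the UFS run command. A worked sketch of how APRUN_UFS then expands, using hypothetical sizes (100 ranks requested, 36 ranks per node — these values are not from the PR):

    nprocs=100; ppn=36                        # hypothetical rank count and ranks per node
    (( nnodes = (nprocs + ppn - 1) / ppn ))   # ceiling division -> 3 nodes
    (( ntasks = nnodes * ppn ))               # round up to full nodes -> 108 tasks
    # With ESMF threading the model claims whole nodes, so APRUN_UFS becomes:
    # srun --mpi=pmi2 -l -n 108 --distribution=block:block --hint=nomultithread --cpus-per-task=1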
298 changes: 298 additions & 0 deletions env/AZUREPW.env
@@ -0,0 +1,298 @@
#! /usr/bin/env bash
Contributor
Please remove the Azure and Google additions from this PR, as this PR is titled "Implement global-workflow on AWS".


if [[ $# -ne 1 ]]; then

echo "Must specify an input argument to set runtime environment variables!"
echo "argument can be any one of the following:"
echo "atmanlrun atmensanlrun aeroanlrun snowanl"
echo "anal sfcanl fcst post metp"
echo "eobs eupd ecen efcs epos"
echo "postsnd awips gempak"
exit 1

fi

step=$1

export launcher="srun --mpi=pmi2 -l"
export mpmd_opt="--distribution=block:block --hint=nomultithread --cpus-per-task=1"

#export POSTAMBLE_CMD='report-mem'

# Configure MPI environment
#export I_MPI_ADJUST_ALLREDUCE=5
#export MPI_BUFS_PER_PROC=2048
#export MPI_BUFS_PER_HOST=2048
#export MPI_GROUP_MAX=256
#export MPI_MEMMAP_OFF=1
#export MP_STDOUTMODE="ORDERED"
export OMP_STACKSIZE=2048000
export NTHSTACK=1024000000
#export LD_BIND_NOW=1

ulimit -s unlimited
ulimit -a

if [[ "${step}" = "prep" ]] || [[ "${step}" = "prepbufr" ]]; then

nth_max=$((npe_node_max / npe_node_prep))

export POE="NO"
export BACK="NO"
export sys_tp="HERA"
export launcher_PREP="srun"

elif [[ "${step}" = "prepsnowobs" ]]; then

export APRUN_CALCFIMS="${launcher} -n 1"

elif [[ "${step}" = "waveinit" ]] || [[ "${step}" = "waveprep" ]] || [[ "${step}" = "wavepostsbs" ]] || [[ "${step}" = "wavepostbndpnt" ]] || [[ "${step}" = "wavepostbndpntbll" ]] || [[ "${step}" = "wavepostpnt" ]]; then

export CFP_MP="YES"
if [[ "${step}" = "waveprep" ]]; then export MP_PULSE=0 ; fi
export wavempexec=${launcher}
export wave_mpmd=${mpmd_opt}

elif [[ "${step}" = "atmanlrun" ]]; then

nth_max=$((npe_node_max / npe_node_atmanlrun))

export NTHREADS_ATMANL=${nth_atmanlrun:-${nth_max}}
[[ ${NTHREADS_ATMANL} -gt ${nth_max} ]] && export NTHREADS_ATMANL=${nth_max}
export APRUN_ATMANL="${launcher} -n ${npe_atmanlrun} --cpus-per-task=${NTHREADS_ATMANL}"

elif [[ "${step}" = "atmensanlrun" ]]; then

nth_max=$((npe_node_max / npe_node_atmensanlrun))

export NTHREADS_ATMENSANL=${nth_atmensanlrun:-${nth_max}}
[[ ${NTHREADS_ATMENSANL} -gt ${nth_max} ]] && export NTHREADS_ATMENSANL=${nth_max}
export APRUN_ATMENSANL="${launcher} -n ${npe_atmensanlrun} --cpus-per-task=${NTHREADS_ATMENSANL}"

elif [[ "${step}" = "aeroanlrun" ]]; then

export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"

nth_max=$((npe_node_max / npe_node_aeroanlrun))

export NTHREADS_AEROANL=${nth_aeroanlrun:-${nth_max}}
[[ ${NTHREADS_AEROANL} -gt ${nth_max} ]] && export NTHREADS_AEROANL=${nth_max}
export APRUN_AEROANL="${launcher} -n ${npe_aeroanlrun} --cpus-per-task=${NTHREADS_AEROANL}"

elif [[ "${step}" = "snowanl" ]]; then

nth_max=$((npe_node_max / npe_node_snowanl))

export NTHREADS_SNOWANL=${nth_snowanl:-${nth_max}}
[[ ${NTHREADS_SNOWANL} -gt ${nth_max} ]] && export NTHREADS_SNOWANL=${nth_max}
export APRUN_SNOWANL="${launcher} -n ${npe_snowanl} --cpus-per-task=${NTHREADS_SNOWANL}"

export APRUN_APPLY_INCR="${launcher} -n 6"

elif [[ "${step}" = "ocnanalbmat" ]]; then

export APRUNCFP="${launcher} -n \$ncmd --multi-prog"

export APRUN_OCNANAL="${launcher} -n ${npe_ocnanalbmat}"

elif [[ "${step}" = "ocnanalrun" ]]; then

export APRUNCFP="${launcher} -n \$ncmd --multi-prog"

export APRUN_OCNANAL="${launcher} -n ${npe_ocnanalrun}"

elif [[ "${step}" = "ocnanalchkpt" ]]; then

export APRUNCFP="${launcher} -n \$ncmd --multi-prog"

export APRUN_OCNANAL="${launcher} -n ${npe_ocnanalchkpt}"

elif [[ "${step}" = "ocnanalecen" ]]; then

nth_max=$((npe_node_max / npe_node_ocnanalecen))

export NTHREADS_OCNANALECEN=${nth_ocnanalecen:-${nth_max}}
[[ ${NTHREADS_OCNANALECEN} -gt ${nth_max} ]] && export NTHREADS_OCNANALECEN=${nth_max}
export APRUN_OCNANALECEN="${launcher} -n ${npe_ocnanalecen} --cpus-per-task=${NTHREADS_OCNANALECEN}"

elif [[ "${step}" = "anal" ]] || [[ "${step}" = "analcalc" ]]; then

export MKL_NUM_THREADS=4
export MKL_CBWR=AUTO

export CFP_MP=${CFP_MP:-"YES"}
export USE_CFP=${USE_CFP:-"YES"}
export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"

nth_max=$((npe_node_max / npe_node_anal))

export NTHREADS_GSI=${nth_anal:-${nth_max}}
[[ ${NTHREADS_GSI} -gt ${nth_max} ]] && export NTHREADS_GSI=${nth_max}
export APRUN_GSI="${launcher} -n ${npe_gsi:-${npe_anal}} --cpus-per-task=${NTHREADS_GSI}"

export NTHREADS_CALCINC=${nth_calcinc:-1}
[[ ${NTHREADS_CALCINC} -gt ${nth_max} ]] && export NTHREADS_CALCINC=${nth_max}
export APRUN_CALCINC="${launcher} \$ncmd --cpus-per-task=${NTHREADS_CALCINC}"

export NTHREADS_CYCLE=${nth_cycle:-12}
[[ ${NTHREADS_CYCLE} -gt ${npe_node_max} ]] && export NTHREADS_CYCLE=${npe_node_max}
npe_cycle=${ntiles:-6}
export APRUN_CYCLE="${launcher} -n ${npe_cycle} --cpus-per-task=${NTHREADS_CYCLE}"

export NTHREADS_GAUSFCANL=1
npe_gausfcanl=${npe_gausfcanl:-1}
export APRUN_GAUSFCANL="${launcher} -n ${npe_gausfcanl} --cpus-per-task=${NTHREADS_GAUSFCANL}"

elif [[ "${step}" = "sfcanl" ]]; then

nth_max=$((npe_node_max / npe_node_sfcanl))

export NTHREADS_CYCLE=${nth_sfcanl:-14}
[[ ${NTHREADS_CYCLE} -gt ${npe_node_max} ]] && export NTHREADS_CYCLE=${npe_node_max}
npe_sfcanl=${ntiles:-6}
export APRUN_CYCLE="${launcher} -n ${npe_sfcanl} --cpus-per-task=${NTHREADS_CYCLE}"

elif [[ "${step}" = "eobs" ]]; then

export MKL_NUM_THREADS=4
export MKL_CBWR=AUTO

nth_max=$((npe_node_max / npe_node_eobs))

export NTHREADS_GSI=${nth_eobs:-${nth_max}}
[[ ${NTHREADS_GSI} -gt ${nth_max} ]] && export NTHREADS_GSI=${nth_max}
export APRUN_GSI="${launcher} -n ${npe_gsi:-${npe_eobs}} --cpus-per-task=${NTHREADS_GSI}"

export CFP_MP=${CFP_MP:-"YES"}
export USE_CFP=${USE_CFP:-"YES"}
export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"

elif [[ "${step}" = "eupd" ]]; then

nth_max=$((npe_node_max / npe_node_eupd))

export NTHREADS_ENKF=${nth_eupd:-${nth_max}}
[[ ${NTHREADS_ENKF} -gt ${nth_max} ]] && export NTHREADS_ENKF=${nth_max}
export APRUN_ENKF="${launcher} -n ${npe_enkf:-${npe_eupd}} --cpus-per-task=${NTHREADS_ENKF}"

export CFP_MP=${CFP_MP:-"YES"}
export USE_CFP=${USE_CFP:-"YES"}
export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"

elif [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then

if [[ "${CDUMP}" =~ "gfs" ]]; then
nprocs="npe_${step}_gfs"
ppn="npe_node_${step}_gfs" || ppn="npe_node_${step}"
else
nprocs="npe_${step}"
ppn="npe_node_${step}"
fi
(( nnodes = (${!nprocs}+${!ppn}-1)/${!ppn} ))
(( ntasks = nnodes*${!ppn} ))
# With ESMF threading, the model wants to use the full node
export APRUN_UFS="${launcher} -n ${ntasks}"
unset nprocs ppn nnodes ntasks

elif [[ "${step}" = "upp" ]]; then

nth_max=$((npe_node_max / npe_node_upp))

export NTHREADS_UPP=${nth_upp:-1}
[[ ${NTHREADS_UPP} -gt ${nth_max} ]] && export NTHREADS_UPP=${nth_max}
export APRUN_UPP="${launcher} -n ${npe_upp} --cpus-per-task=${NTHREADS_UPP}"

elif [[ "${step}" = "atmos_products" ]]; then

export USE_CFP="YES" # Use MPMD for downstream product generation on Hera

elif [[ "${step}" = "oceanice_products" ]]; then

nth_max=$((npe_node_max / npe_node_oceanice_products))

export NTHREADS_OCNICEPOST=${nth_oceanice_products:-1}
export APRUN_OCNICEPOST="${launcher} -n 1 --cpus-per-task=${NTHREADS_OCNICEPOST}"

elif [[ "${step}" = "ecen" ]]; then

nth_max=$((npe_node_max / npe_node_ecen))

export NTHREADS_ECEN=${nth_ecen:-${nth_max}}
[[ ${NTHREADS_ECEN} -gt ${nth_max} ]] && export NTHREADS_ECEN=${nth_max}
export APRUN_ECEN="${launcher} -n ${npe_ecen} --cpus-per-task=${NTHREADS_ECEN}"

export NTHREADS_CHGRES=${nth_chgres:-12}
[[ ${NTHREADS_CHGRES} -gt ${npe_node_max} ]] && export NTHREADS_CHGRES=${npe_node_max}
export APRUN_CHGRES="time"

export NTHREADS_CALCINC=${nth_calcinc:-1}
[[ ${NTHREADS_CALCINC} -gt ${nth_max} ]] && export NTHREADS_CALCINC=${nth_max}
export APRUN_CALCINC="${launcher} -n ${npe_ecen} --cpus-per-task=${NTHREADS_CALCINC}"

elif [[ "${step}" = "esfc" ]]; then

nth_max=$((npe_node_max / npe_node_esfc))

export NTHREADS_ESFC=${nth_esfc:-${nth_max}}
[[ ${NTHREADS_ESFC} -gt ${nth_max} ]] && export NTHREADS_ESFC=${nth_max}
export APRUN_ESFC="${launcher} -n ${npe_esfc} --cpus-per-task=${NTHREADS_ESFC}"

export NTHREADS_CYCLE=${nth_cycle:-14}
[[ ${NTHREADS_CYCLE} -gt ${npe_node_max} ]] && export NTHREADS_CYCLE=${npe_node_max}
export APRUN_CYCLE="${launcher} -n ${npe_esfc} --cpus-per-task=${NTHREADS_CYCLE}"

elif [[ "${step}" = "epos" ]]; then

nth_max=$((npe_node_max / npe_node_epos))

export NTHREADS_EPOS=${nth_epos:-${nth_max}}
[[ ${NTHREADS_EPOS} -gt ${nth_max} ]] && export NTHREADS_EPOS=${nth_max}
export APRUN_EPOS="${launcher} -n ${npe_epos} --cpus-per-task=${NTHREADS_EPOS}"

elif [[ "${step}" = "postsnd" ]]; then

export CFP_MP="YES"

nth_max=$((npe_node_max / npe_node_postsnd))

export NTHREADS_POSTSND=${nth_postsnd:-1}
[[ ${NTHREADS_POSTSND} -gt ${nth_max} ]] && export NTHREADS_POSTSND=${nth_max}
export APRUN_POSTSND="${launcher} -n ${npe_postsnd} --cpus-per-task=${NTHREADS_POSTSND}"

export NTHREADS_POSTSNDCFP=${nth_postsndcfp:-1}
[[ ${NTHREADS_POSTSNDCFP} -gt ${nth_max} ]] && export NTHREADS_POSTSNDCFP=${nth_max}
export APRUN_POSTSNDCFP="${launcher} -n ${npe_postsndcfp} ${mpmd_opt}"

elif [[ "${step}" = "awips" ]]; then

nth_max=$((npe_node_max / npe_node_awips))

export NTHREADS_AWIPS=${nth_awips:-2}
[[ ${NTHREADS_AWIPS} -gt ${nth_max} ]] && export NTHREADS_AWIPS=${nth_max}
export APRUN_AWIPSCFP="${launcher} -n ${npe_awips} ${mpmd_opt}"

elif [[ "${step}" = "gempak" ]]; then

export CFP_MP="YES"

if [[ ${CDUMP} == "gfs" ]]; then
npe_gempak=${npe_gempak_gfs}
npe_node_gempak=${npe_node_gempak_gfs}
fi

nth_max=$((npe_node_max / npe_node_gempak))

export NTHREADS_GEMPAK=${nth_gempak:-1}
[[ ${NTHREADS_GEMPAK} -gt ${nth_max} ]] && export NTHREADS_GEMPAK=${nth_max}
export APRUN="${launcher} -n ${npe_gempak} ${mpmd_opt}"


elif [[ "${step}" = "fit2obs" ]]; then

nth_max=$((npe_node_max / npe_node_fit2obs))

export NTHREADS_FIT2OBS=${nth_fit2obs:-1}
[[ ${NTHREADS_FIT2OBS} -gt ${nth_max} ]] && export NTHREADS_FIT2OBS=${nth_max}
export MPIRUN="${launcher} -n ${npe_fit2obs} --cpus-per-task=${NTHREADS_FIT2OBS}"

fi
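The new AZUREPW.env applies one pattern to nearly every step: derive the per-rank thread budget from the node size, default the step's thread count to that budget, and clamp it so a rank never oversubscribes a node. A minimal sketch of the pattern using the eupd step, with hypothetical numbers (36 cores per node, 4 eupd ranks per node — the variable names mirror the file, the values do not come from the PR):

    npe_node_max=36; npe_node_eupd=4; npe_eupd=80    # hypothetical machine/step sizes
    nth_max=$((npe_node_max / npe_node_eupd))        # 9 threads available per rank
    NTHREADS_ENKF=${nth_eupd:-${nth_max}}            # default to the full budget
    [[ ${NTHREADS_ENKF} -gt ${nth_max} ]] && NTHREADS_ENKF=${nth_max}   # clamp, never oversubscribe
    echo "srun --mpi=pmi2 -l -n ${npe_eupd} --cpus-per-task=${NTHREADS_ENKF}"
    # -> srun --mpi=pmi2 -l -n 80 --cpus-per-task=9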