Missing mpmd.out for NOAA AWS CSP #2003

Closed
env/AWSPW.env (2 additions & 1 deletion)

@@ -16,7 +16,7 @@ step=$1

export npe_node_max=36
export launcher="mpiexec.hydra"
-export mpmd_opt=""
+export mpmd_opt="--multi-prog --output=mpmd.%j.%t.out"

Contributor:

Are these the right options for mpiexec.hydra? They appear to be Slurm options.

Contributor Author:

@rajdpanda Can you please address @aerorahul's questions?

That's correct! They are options for srun but not for mpiexec.hydra. Forgot to mention this to Henry. It can be taken out.

Contributor Author:

@rajdpanda So, the only change required is to ush/fv3gfs_downstream_nems.sh?


# Configure MPI environment
export OMP_STACKSIZE=2048000
@@ -48,6 +48,7 @@ elif [[ "${step}" = "post" ]]; then
[[ ${NTHREADS_NP} -gt ${nth_max} ]] && export NTHREADS_NP=${nth_max}
export APRUN_NP="${launcher} -n ${npe_post}"

+export USE_CFP="YES"
export NTHREADS_DWN=${nth_dwn:-1}
[[ ${NTHREADS_DWN} -gt ${nth_max} ]] && export NTHREADS_DWN=${nth_max}
export APRUN_DWN="${launcher} -n ${npe_dwn}"
ush/fv3gfs_downstream_nems.sh (10 additions & 3 deletions)

@@ -135,9 +135,16 @@ for (( nset=1 ; nset <= downset ; nset++ )); do
fi
err_chk

-# We are in a loop over downset, save output from mpmd into nset specific output
-cat mpmd.out # so we capture output into the main logfile
-mv mpmd.out "mpmd_${nset}.out"
+# We are in a loop over downset, save output from mpmd into nset
+# specific output; this if-block is necessary for the NOAA CSP AWS
+# platform; AWS uses `mpiexec` (PBS) as its parallel executable

Contributor:

WCOSS2 also uses PBS and is handled in the `elif [[ "${launcher:-}" =~ ^mpiexec.* ]]; then  # mpiexec` block.
As seen, the output of each thread is piped to `mpmd.${nm}.out`.
How is AWS executing MPMD and where does its output go?
The options above in AWSPW.env (`mpmd_opt="--multi-prog --output=mpmd.%j.%t.out"`) appear to be Slurm options.
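
For context, a minimal sketch of the two MPMD launch styles under discussion; the program names and command file are illustrative, not taken from this PR:

```shell
# Slurm: srun reads an MPMD command file; --multi-prog and --output are srun
# options, with %j expanding to the job ID and %t to the task rank, which is
# how per-task mpmd.*.out files would be produced.
srun --multi-prog --output=mpmd.%j.%t.out cmdfile

# mpiexec.hydra: MPMD is expressed as colon-separated command groups instead,
# so the srun options above are not recognized by this launcher.
mpiexec.hydra -n 1 ./prog_a : -n 1 ./prog_b
```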

+# launcher rather than `srun`, which supports the MPMD style; PBS
+# can also handle similar situations, but that is more complicated
+# to implement and this check is sufficient for now.
+if [[ -f "mpmd.out" ]]; then
+    cat mpmd.out  # so we capture output into the main logfile
+    mv mpmd.out "mpmd_${nset}.out"
+fi

# Concatenate grib files from each processor into a single one
# and clean-up as you go