Missing mpmd.out for NOAA AWS CSP #2003

HenryRWinterbottom
Contributor

Description

This PR addresses issue #2002. The env/AWSPW.env file has been updated to include the appropriate mpmd.out directives. In addition, ush/fv3gfs_downstream_nems.sh has been updated to support PBS. The PBS implementation should be considered a workaround for now and can be extended or modified later if necessary.

Resolves #2002

Type of change

  • Bug fix (fixes something broken)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

This was tested on NOAA CSP AWS. The CI will test for the RDHPCS platforms.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

mv mpmd.out "mpmd_${nset}.out"
# We are in a loop over downset, save output from mpmd into nset
# specific output; this if-block is necessary for the NOAA CSP AWS
# platform; AWS uses `mpiexec` (PBS) as its parallel executable
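
A minimal sketch of the idea behind this block, under stated assumptions: when a launcher leaves a single mpmd.out, it is renamed to the nset-specific name; when it leaves per-task files instead, they are concatenated into that name. The helper name `save_mpmd_output` and the per-task pattern `mpmd.*.out` are illustrative assumptions, not code from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): save MPMD output under an
# nset-specific name, whichever form the launcher produced.
save_mpmd_output() {
  local nset="$1"
  local target="mpmd_${nset}.out"
  local f
  if [[ -f mpmd.out ]]; then
    # Single combined output file: just rename it.
    mv mpmd.out "${target}"
  else
    # Assumed per-task files (mpmd.<n>.out): concatenate in glob order.
    # Note that glob order is lexical, so mpmd.10.out sorts before mpmd.2.out.
    : > "${target}"
    for f in mpmd.*.out; do
      [[ -f "${f}" ]] || continue
      cat "${f}" >> "${target}"
      rm -f "${f}"
    done
  fi
}

# Demonstration in a scratch directory with two fake per-task files.
cd "$(mktemp -d)" || exit 1
echo "rank 0" > mpmd.0.out
echo "rank 1" > mpmd.1.out
save_mpmd_output 1
cat mpmd_1.out
```

The target name uses an underscore (mpmd_1.out), so it never matches the `mpmd.*.out` glob and cannot be appended to itself.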
Contributor

WCOSS2 also uses PBS and is handled in the elif [[ "${launcher:-}" =~ ^mpiexec.* ]]; then # mpiexec block.
As seen, the output of each thread is piped to mpmd.${nm}.out
How is AWS executing MPMD and where does its output go?
The options above in AWSPW.env (mpmd_opt="--multi-prog --output=mpmd.%j.%t.out") appear to be SLURM options.
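
For readers unfamiliar with the per-task redirection mentioned here, the following is a rough sketch of how a command file might be rewritten so that each command writes to its own mpmd.${nm}.out. The function name `wrap_mpmd_cmds` and the file names are hypothetical; the actual mpiexec block in ush/fv3gfs_downstream_nems.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repository's code): append a per-task
# redirect to every non-empty line of a command file.
wrap_mpmd_cmds() {
  local cmdfile="$1" outfile="$2"
  local nm=0 line
  : > "${outfile}"
  while IFS= read -r line; do
    [[ -z "${line}" ]] && continue
    echo "${line} > mpmd.${nm}.out" >> "${outfile}"
    nm=$((nm + 1))
  done < "${cmdfile}"
}

# Demonstration with two placeholder commands.
workdir="$(mktemp -d)"
printf '%s\n' "echo task_a" "echo task_b" > "${workdir}/cmdfile"
wrap_mpmd_cmds "${workdir}/cmdfile" "${workdir}/mpmd_cmdfile"
cat "${workdir}/mpmd_cmdfile"
```

With this layout no single mpmd.out is ever created, which is why a later `mv mpmd.out ...` would need a guard or a different collection step on such platforms.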

@@ -16,7 +16,7 @@ step=$1

 export npe_node_max=36
 export launcher="mpiexec.hydra"
-export mpmd_opt=""
+export mpmd_opt="--multi-prog --output=mpmd.%j.%t.out"
Contributor

Are these the right options for mpiexec.hydra? They appear to be Slurm options.

Contributor Author

@rajdpanda Can you please address @aerorahul's questions?

That's correct! They are options for srun, not for mpiexec.hydra. I forgot to mention this to Henry. They can be taken out.
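
One way to keep srun-only flags away from other launchers is to derive the options from the launcher string. This is only a sketch: `select_mpmd_opt` is a hypothetical helper, and in the repository each platform env file sets `launcher` and `mpmd_opt` directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): choose MPMD options by launcher.
select_mpmd_opt() {
  local launcher="${1:-}"
  if [[ "${launcher}" =~ ^srun.* ]]; then
    # srun accepts --multi-prog and the %j (job id) / %t (task id) patterns.
    echo "--multi-prog --output=mpmd.%j.%t.out"
  else
    # mpiexec-style launchers such as mpiexec.hydra get no srun-only flags.
    echo ""
  fi
}

mpmd_opt="$(select_mpmd_opt "mpiexec.hydra")"
echo "mpmd_opt='${mpmd_opt}'"
```

For the AWS case discussed here the result is an empty `mpmd_opt`, matching the original value in env/AWSPW.env.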

Contributor Author

@rajdpanda So, the only change required is to ush/fv3gfs_downstream_nems.sh?

HenryRWinterbottom deleted the bug/awspw_missing_mpmd_directive branch on November 20, 2023 17:09
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

AWS mpmd failure
4 participants