Out of the box pe layout for Derecho --compset 1850_DATM%GSWP3v1_CLM51%BGC-CROP --res ne30pg3_ne30pg3_mg17 is 50% slower than on cheyenne #2306

Open
olyson opened this issue Jan 4, 2024 · 3 comments
Labels: enhancement (new capability or improved behavior of existing capability), performance (idea or PR to improve performance, e.g. throughput, memory)

Comments

Contributor

olyson commented Jan 4, 2024

Maybe we can work on speeding up this configuration, which is being used for standalone dead veg simulations, etc.

pe layout on cheyenne (I was getting around 172 yrs/day):

Comp  NTASKS  NTHRDS  ROOTPE  PSTRIDE
CPL :   1800/      1;     36        1
ATM :     36/      1;      0        1
LND :   1800/      1;     36        1
ICE :   1800/      1;     36        1
OCN :   1800/      1;     36        1
ROF :   1800/      1;     36        1
GLC :   1800/      1;     36        1
WAV :   1800/      1;     36        1
ESP :      1/      1;      0        1
ESMF_AWARE_THREADING is False
ROOTPE is with respect to 36.0 tasks per node

pe layout on derecho (I'm getting around 86 yrs/day):
Comp  NTASKS  NTHRDS  ROOTPE  PSTRIDE
CPL :    640/      1;    128        1
ATM :    128/      1;      0        1
LND :    640/      1;    128        1
ICE :    640/      1;    128        1
OCN :    640/      1;    128        1
ROF :    640/      1;    128        1
GLC :    640/      1;    128        1
WAV :    640/      1;    128        1
ESP :      1/      1;      0        1
ESMF_AWARE_THREADING is False
ROOTPE is with respect to 128.0 tasks per node
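To compare the two layouts on equal footing, the sketch below totals the PEs and nodes each one occupies. It is only an illustration: the helper names `total_pes` and `nodes_used` are made up for this example, and it assumes PSTRIDE=1 and NTHRDS=1 for every component, as in both layouts above. Components that share LND's settings (CPL, ICE, OCN, ROF, GLC, WAV) are left out because they sit on the same PEs as LND.

```python
import math

def total_pes(components):
    """Highest PE index used plus one, assuming PSTRIDE=1 and NTHRDS=1."""
    return max(rootpe + ntasks for ntasks, rootpe in components.values())

def nodes_used(components, tasks_per_node):
    """Whole nodes needed to hold every PE in the layout."""
    return math.ceil(total_pes(components) / tasks_per_node)

# (NTASKS, ROOTPE) per component, copied from the layouts in this issue.
cheyenne = {"ATM": (36, 0), "LND": (1800, 36), "ESP": (1, 0)}
derecho = {"ATM": (128, 0), "LND": (640, 128), "ESP": (1, 0)}

print(total_pes(cheyenne), nodes_used(cheyenne, 36))   # 1836 PEs on 51 nodes
print(total_pes(derecho), nodes_used(derecho, 128))    # 768 PEs on 6 nodes
print(86 / 172)                                        # throughput ratio: 0.5
```

By this count the default Derecho layout uses 768 PEs against 1836 on Cheyenne (about 42%), with 640 LND tasks against 1800, so part of the throughput gap may simply reflect the smaller task count rather than slower hardware.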

@olyson added the enhancement (new capability or improved behavior of existing capability) and next (this should get some attention in the next week or two; normally each Thursday SE meeting) labels on Jan 4, 2024
Collaborator

ekluzek commented Jan 4, 2024

This is part of #2244

@olyson could you try increasing to 14 nodes (so the 640 tasks go to 1792 == 14x128)? I'd like to make that the starting point to work from. I'll play around with it after you do that and report back.

Do you want this optimized for speed (throughput) or cost? Or a reasonable combination of the two?

Since this is the new CAM SE workhorse grid I take it this is an important PE layout to optimize....

Contributor Author

olyson commented Jan 4, 2024

I would think we want a reasonable combination of the two since it is the workhorse resolution.
Note, the priority for this is not quite as high now, as we decided today that we are going to switch to 1deg for our dead veg simulations for ease/speed of postprocessing. But I think in general we'll be doing ne30 simulations more regularly.

5 nodes (default):
Model Cost: 214.59 pe-hrs/simulated_year
Model Throughput: 85.89 simulated_years/day

14 nodes:
Model Cost: 329.17 pe-hrs/simulated_year
Model Throughput: 139.99 simulated_years/day
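The trade-off between these two runs can be worked out from the four reported numbers. The sketch below is a rough illustration, not part of the timing output: the function `implied_pes` is a hypothetical helper, and it assumes cost is computed as total PEs x 24 hours / throughput, which is an assumption about the timing summary that happens to reproduce the figures quoted here.

```python
# Reported timing figures from the two runs above.
base = {"cost": 214.59, "throughput": 85.89}    # "5 nodes (default)"
big = {"cost": 329.17, "throughput": 139.99}    # "14 nodes"

def implied_pes(run):
    # Assumption: cost [pe-hrs/yr] = PEs * 24 [hrs/day] / throughput [yrs/day].
    return run["cost"] * run["throughput"] / 24.0

speedup = big["throughput"] / base["throughput"]     # about 1.63
cost_ratio = big["cost"] / base["cost"]              # about 1.53
pe_ratio = implied_pes(big) / implied_pes(base)      # about 2.5
efficiency = speedup / pe_ratio                      # about 0.65

print(round(implied_pes(base)), round(implied_pes(big)))
print(f"speedup {speedup:.2f}x, cost {cost_ratio:.2f}x, efficiency {efficiency:.2f}")
```

Under that assumption the implied PE counts are 768 and 1920, i.e. 6 and 15 Derecho nodes at 128 tasks per node, which suggests the node counts quoted above cover the LND tasks only, with one further node for ATM. Going to the larger layout buys roughly 1.63x the throughput for roughly 1.53x the cost per simulated year.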

@ekluzek removed the next label (this should get some attention in the next week or two; normally each Thursday SE meeting) on Jan 18, 2024
Collaborator

ekluzek commented Jan 18, 2024

@olyson says this is good for now, so we'll postpone further work.

@samsrabin added the performance label (idea or PR to improve performance, e.g. throughput, memory) on Feb 8, 2024