Optimization options #825
I built the SRW v2.1 (and HPC-stack) on an unsupported platform (Borah, the Boise State cluster) using intel/2021/2.0.2883 and mpi/2021/2.0.2883. Borah has 48 processors per node. I notice that if I run with, say, 6:ppn=48, the run time is MUCH slower than with 6:ppn=24 and exclusive node use. I have run an 8-day forecast on a 553x355 domain in 08:15 wall time, but I am trying to get it under 7 hours using 6 nodes for an operational ensemble.
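For context, the two layouts being compared would look roughly like the sketch below in a Slurm batch script. This is a hedged illustration only: the reply below mentions Slurm, but the executable name and the toggling between the two headers are my assumptions, not details from this report.

```bash
#!/bin/bash
# Illustrative Slurm header only; nothing here is Borah-specific.

# Layout A: 6 nodes x 48 MPI ranks per node (every core runs an MPI rank)
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=48

# Layout B: 6 nodes x 24 MPI ranks per node with exclusive node use
# (swap the '##'/'#' markers to switch layouts; Slurm only reads lines
# that begin exactly with '#SBATCH')
##SBATCH --ntasks-per-node=24
##SBATCH --exclusive

# Launch the forecast; 'ufs_model' is the usual SRW forecast executable name,
# but verify against your build.
srun ./ufs_model
```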
Replies: 1 comment 3 replies
@hertneky It's interesting that the runtime was so much slower when running 6:48. When run this way, OMP_NUM_THREADS_RUN_FCST needs to be set to 1 (it's 2 by default) so each PE runs on one core instead of 2. Did you try this? In my experience, running with more MPI instances and fewer threads tends to be faster. You might try setting that in your config file and rerunning setup.py.

Note that this will use ~2x the memory on the node, so you might want to see if the 6:24 case was already bumping up against that limit. If you are running slurm for your job manager, you can check this with `sacct -j <job number> -o "JobName,MaxRSS"`, where `MaxRSS` is the maximum memory used by a single PE. Multipl…

For KMP_AFFINITY_RUN_FCST, "scatter" (the default) is probably your best bet, but you could also try "compact" as a second option.

For compiler options, did you create a cmake configuration file in /sorc/ufs-weather-model/cmake? If so, you could try compiling using …
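A minimal sketch of where those threading knobs live and how the change might be applied, assuming the SRW v2.1 layout (ush/config_defaults.yaml, ush/generate_FV3LAM_wflow.py) and that the variables sit under a task_run_fcst: section; those placement details are my assumptions, so verify against your clone:

```bash
# Confirm the current defaults for the two variables discussed above
# (OMP_NUM_THREADS_RUN_FCST defaults to 2, KMP_AFFINITY_RUN_FCST to "scatter").
cd ufs-srweather-app/ush
grep -nE "OMP_NUM_THREADS_RUN_FCST|KMP_AFFINITY_RUN_FCST" config_defaults.yaml

# Then add overrides to your experiment config, e.g. (section name assumed):
#   task_run_fcst:
#     OMP_NUM_THREADS_RUN_FCST: 1
#     KMP_AFFINITY_RUN_FCST: scatter
#
# and regenerate the experiment (the reply above refers to rerunning setup.py):
./generate_FV3LAM_wflow.py
```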
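And a hedged example of the memory check suggested above, using standard Slurm accounting fields (the job ID and the numbers in the comment are placeholders):

```bash
# Query per-task memory high-water marks after a run; JobName, NTasks, NNodes,
# Elapsed, and MaxRSS are standard sacct format fields.
sacct -j 1234567 -o "JobName,NTasks,NNodes,Elapsed,MaxRSS"

# Rough per-node estimate: MaxRSS is the largest resident set of any single
# task (PE), so with 24 PEs per node a MaxRSS of ~4 GB suggests up to about
# 24 * 4 GB = 96 GB on the node, and going to 48 PEs per node would roughly
# double that -- worth comparing against the node's installed RAM before
# switching layouts.
```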