The UPP job (offline post) failed at Hera #2227
Comments
@WenMeng-NOAA So, I don't think the issue is in the memory and not in the number of threads assignment in |
That is correct. We had to add the |
@aerorahul and @KateFriedman-NOAA From my testing, removing the option "--cpus-per-task=${NTHREADS_UPP}" can solve the off-line post failure. |
I think this needs RDHPCS input as the line was added based on their suggestion and should have no impact on the run success/failure. |
For the UPP gfs standalone tests on Hera, I usually don't specify the memory size, but I make sure the tasks are not all placed on one node.
Following the compute resource configuration for UPP in global-workflow, the offline post is run as: It then fails with out-of-memory errors as:
You may look into my runtime log upp.gfs.oe54315683 at /scratch1/NCEPDEV/stmp2/Wen.Meng/gw_test on hera. |
Can you open a ticket w/ Hera RDHPCS and ask them why would adding |
@aerorahul Will do it. |
@WenMeng-NOAA This also might help Since we are putting |
@aerorahul I tested with "srun -l --export=ALL -n 120 --cpus-per-task=2". The job completes successfully. Do you have any suggestions on specifying memory or tuning 'NTHREADS_UPP' in env/HERA.env for the overall gfs resource configuration in global-workflow? |
@aerorahul There is no need to run UPP with threads. OMP_NUM_THREADS=1 would be good. |
@aerorahul @KateFriedman-NOAA For my off-line post testing in global-workflow, I can make some local changes in order to complete the job. I am wondering whether you see a similar issue in high-resolution, end-to-end GFS runs. The off-line post is used to post-process analysis data from the model. |
@aerorahul I am confused about setting of NTHREADS_UPP at Are you meaning the option "--cpus-per-task=??" is for threads? If yes, could this option be removed from off-line post configuration? |
I think we resolved this by providing more memory. |
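One way to reason about the interaction between `--cpus-per-task` and memory discussed above is to work out how many tasks share a node. The sketch below is illustrative arithmetic only: the `layout` function is hypothetical, the 40 cores and 96 GB per node are assumed values that should be checked against the actual partition, and it assumes tasks are packed onto nodes by core count.

```shell
#!/usr/bin/env bash
# Assumed node specification, for illustration only.
cores_per_node=40     # assumption: cores on one compute node
mem_per_node_gb=96    # assumption: memory on one compute node, in GB

# layout NTASKS CPUS_PER_TASK
# Prints: tasks per node, nodes needed, approximate memory per task in MB.
layout() {
  local ntasks=$1 cpus_per_task=$2
  local tasks_per_node=$(( cores_per_node / cpus_per_task ))
  local nodes=$(( (ntasks + tasks_per_node - 1) / tasks_per_node ))   # ceiling division
  local mem_per_task_mb=$(( mem_per_node_gb * 1024 / tasks_per_node ))
  echo "${tasks_per_node} ${nodes} ${mem_per_task_mb}"
}

# 120 tasks, as in the srun test quoted above, with 1 and 2 CPUs per task.
for cpt in 1 2; do
  echo "cpus-per-task=${cpt}: $(layout 120 "${cpt}")"
done
```

Under these assumptions, doubling the CPUs reserved per task halves the number of tasks on each node, so each task has roughly twice the memory available.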
What is wrong?
The standalone JGLOBAL_ATMOS_UPP job failed on Hera when run with model history files at C768 resolution.
The runtime log indicates an out-of-memory issue:
The C768 case uses the following compute resource configuration:
What should have happened?
This job runs the off-line post to generate the GFS master, flux, and GOES files.
What machines are impacted?
Hera
Steps to reproduce
Check out the global-workflow develop branch, then run jobs/rocoto/upp.sh with model history files from GFS V17 HR2.
Additional information
None.
Do you have a proposed solution?
Update env/HERA.env file:
Change
export APRUN_UPP="${launcher} -n ${npe_upp} --cpus-per-task=${NTHREADS_UPP}"
into
export APRUN_UPP="${launcher} -n ${npe_upp}"
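To make the effect of the proposed change concrete, the sketch below expands both forms of `APRUN_UPP`. The values are assumptions for illustration: `launcher` and `npe_upp` are taken from the srun test quoted in the comments rather than from env/HERA.env, and `NTHREADS_UPP=1` follows the suggestion to run UPP without threads.

```shell
#!/usr/bin/env bash
# Assumed values, for illustration only; the real ones are set by the workflow.
launcher="srun -l --export=ALL"   # assumption: from the srun test in the comments
npe_upp=120                       # assumption: task count from the same test
NTHREADS_UPP=1                    # assumption: UPP run without threads

# Current form in env/HERA.env: always passes --cpus-per-task.
aprun_current="${launcher} -n ${npe_upp} --cpus-per-task=${NTHREADS_UPP}"

# Proposed form: leaves the CPUs per task to the scheduler default.
aprun_proposed="${launcher} -n ${npe_upp}"

echo "current:  ${aprun_current}"
echo "proposed: ${aprun_proposed}"
```

Printing the expanded command in the job log before launching is a cheap way to confirm which options the run actually received.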