UFS-dev PR#92 #108

Merged
merged 4 commits into from
Jan 16, 2024

Conversation

grantfirl
Collaborator

Identical to ufs-community#1844

Contains changes from #107 until it is merged.

SamuelTrahanNOAA and others added 2 commits August 22, 2023 10:57
…ng PR#1863) (ufs-community#1844)

* Changes to logging and initialization of the CLM Lake Model.
* merge ccpp-physics NCAR#91 (UFS-SRW v3.0.0 SciDoc updates)

1. Use ice thickness hice(i) to find the level in the lake where ice is
   zero.
2. Do not allow lake temperature to be below freezing point if there is
   no ice.
3. If there is no snow or ice, do not allow surface lake temperature to
   be below freezing point.
   These changes fixed the problem with large errors in the energy budget
   at the beginning of the cold-start run with lakes.
4. Added flag to turn on debug print statements in the CLM lake model.

* explicitly turn off frac_ice for flake

* t_grnd(i) should be t_grnd(c)
-------------------------------------------------------------------
Co-authored-by: Samuel Trahan
Co-authored-by: Grant Firl
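
The three temperature rules in the commit message above can be sketched as follows. This is only an illustration in Python, not the Fortran in the CLM Lake Model: the function names, the `z_top` layer-top depths, the `snow_depth` argument, and the freezing-point constant are all assumptions made for the example, and "no ice" is interpreted here as "lake layers that start at or below the ice thickness".

```python
TFRZ = 273.15  # assumed freshwater freezing point in kelvin


def first_ice_free_level(z_top, hice):
    """Return the index of the first layer whose top is at or below the ice.

    z_top: depths (m) of the top of each lake layer, increasing downward.
    hice:  ice thickness (m) measured from the lake surface.
    """
    for k, top in enumerate(z_top):
        if top >= hice:
            return k
    return len(z_top)  # ice reaches below the last layer top


def clamp_lake_temperatures(t_lake, z_top, hice, snow_depth, t_grnd):
    """Apply the freezing-point limits described in the commit message."""
    k0 = first_ice_free_level(z_top, hice)
    # Layers with no ice may not be colder than the freezing point.
    t_new = [t if k < k0 else max(t, TFRZ) for k, t in enumerate(t_lake)]
    # With neither snow nor ice, the surface temperature is limited too.
    if hice <= 0.0 and snow_depth <= 0.0:
        t_grnd = max(t_grnd, TFRZ)
    return t_new, t_grnd
```

A cold-start state that has sub-freezing water below the ice, which is the kind of inconsistency the commit says produced large energy-budget errors, would be adjusted only in the ice-free layers.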
@grantfirl
Collaborator Author

Expected BL changes:
COMPILE | rrfs | intel | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_RAP_sfcdiff,FV3_HRRR,FV3_RRFS_v1beta,FV3_RRFS_v1nssl -D32BIT=ON | | fv3 |
RUN | hrrr_control | - wcoss2 | baseline |
RUN | hrrr_control_qr | - wcoss2 | |
RUN | hrrr_control_decomp | - wcoss2 | |
RUN | hrrr_control_2threads | - wcoss2 | |
RUN | hrrr_control_restart | - wcoss2 | | hrrr_control
RUN | hrrr_control_restart_qr | - wcoss2 | | hrrr_control_qr

COMPILE | atm_debug_dyn32 | intel | -DAPP=ATM -DDEBUG=ON -D32BIT=ON -DCCPP_SUITES=FV3_HRRR,FV3_GFS_v16,FV3_GFS_v16_csawmg,FV3_GFS_v16_ras,FV3_GFS_v17_p8,FV3_GFS_v15_thompson_mynn_lam3km,FV3_RAP,FV3_RAP_unified_ugwp,FV3_RAP_cires_ugwp,FV3_RAP_flake,FV3_RAP_clm_lake,FV3_RAP_noah,FV3_RAP_sfcdiff,FV3_RAP_noah_sfcdiff_cires_ugwp,FV3_RRFS_v1beta | | fv3 |
RUN | hrrr_control_debug | | baseline |
RUN | rap_clm_lake_debug | | baseline |

COMPILE | rrfs_dyn32_phy32 | intel | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_HRRR -D32BIT=ON -DCCPP_32BIT=ON | | fv3 |
RUN | hrrr_control_dyn32_phy32 | | baseline |
RUN | hrrr_control_qr_dyn32_phy32 | | baseline |
RUN | hrrr_control_2threads_dyn32_phy32 | | |
RUN | hrrr_control_decomp_dyn32_phy32 | | |
RUN | hrrr_control_restart_dyn32_phy32 | | | hrrr_control_dyn32_phy32
RUN | hrrr_control_restart_qr_dyn32_phy32 | | | hrrr_control_qr_dyn32_phy32

COMPILE | rrfs_dyn32_phy32_debug | intel | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_HRRR -D32BIT=ON -DCCPP_32BIT=ON -DDEBUG=ON | | fv3 |
RUN | hrrr_control_debug_dyn32_phy32 | | baseline |

COMPILE | rrfs | gnu | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_RAP_sfcdiff,FV3_HRRR,FV3_RRFS_v1beta -D32BIT=ON | + hera cheyenne | fv3 |
RUN | hrrr_control | + hera cheyenne | baseline |
RUN | hrrr_control_qr | + hera cheyenne | |
RUN | hrrr_control_2threads | + hera cheyenne | |
RUN | hrrr_control_decomp | + hera cheyenne | |
RUN | hrrr_control_restart | + hera cheyenne | | hrrr_control
RUN | hrrr_control_restart_qr | + hera cheyenne | | hrrr_control_qr

COMPILE | atm_dyn32_debug | gnu | -DAPP=ATM -D32BIT=ON -DDEBUG=ON | + hera cheyenne | fv3 |
RUN | hrrr_control_debug | + hera cheyenne | baseline |
RUN | rap_clm_lake_debug | + hera cheyenne | baseline |

COMPILE | rrfs_dyn32_phy32 | gnu | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_HRRR -D32BIT=ON -DCCPP_32BIT=ON | + hera cheyenne | fv3 |
RUN | hrrr_control_dyn32_phy32 | + hera cheyenne | baseline |
RUN | hrrr_control_qr_dyn32_phy32 | + hera cheyenne | baseline |
RUN | hrrr_control_2threads_dyn32_phy32 | + hera cheyenne | |
RUN | hrrr_control_decomp_dyn32_phy32 | + hera cheyenne | |
RUN | hrrr_control_restart_dyn32_phy32 | + hera cheyenne | | hrrr_control_dyn32_phy32
RUN | hrrr_control_restart_qr_dyn32_phy32 | + hera cheyenne | | hrrr_control_qr_dyn32_phy32

COMPILE | atm_dyn32_phy32_debug | gnu | -DAPP=ATM -D32BIT=ON -DCCPP_32BIT=ON -DDEBUG=ON | + hera cheyenne | fv3 |
RUN | hrrr_control_debug_dyn32_phy32 | + hera cheyenne | baseline |
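
For cross-checking a list like the one above against later test reports, the entries can be turned into full test names. The sketch below assumes each `COMPILE` line carries the compiler in its third field and that every following `RUN` line belongs to that compile; the `name_compiler` naming is inferred from the failure notifications later in this thread, and the function name is hypothetical.

```python
def expected_tests(rt_conf_text):
    """Return 'test_compiler' names for every RUN line in rt.conf-style text."""
    compiler = None
    names = []
    for line in rt_conf_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if fields[0] == "COMPILE" and len(fields) >= 3:
            # Remember the compiler for the RUN lines that follow.
            compiler = fields[2]
        elif fields[0] == "RUN" and len(fields) >= 2 and compiler:
            names.append(f"{fields[1]}_{compiler}")
    return names
```

Blank lines and any line that does not start with `COMPILE` or `RUN` are ignored, so the listing can be pasted in as-is.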

@grantfirl grantfirl mentioned this pull request Nov 3, 2023
@mkavulich
Collaborator

@grantfirl Can you remove the old tests under /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model/run_gjf when you get a chance? I just want to make sure we don't run into disk space issues while flying through these tests.

@grantfirl
Collaborator Author

grantfirl commented Jan 11, 2024 via email

@grantfirl
Collaborator Author

@mkavulich This one is ready to start testing to verify failed tests.

@grantfirl
Collaborator Author

@mkavulich Don't we need to do hera-intel-RT first to verify failed tests?

@mkavulich
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: BL
[BL] Repo location: /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1584956840/20240111214517/ufs-weather-model
Please make changes and add the following label back: hera-intel-BL

@grantfirl
Collaborator Author

@mkavulich It looks like the baselines were created successfully, and we can use those, but we still need to run RTs against the old baseline.

@mkavulich
Collaborator

Yeah @grantfirl sorry about that, I figured it didn't matter the order in which we did this so long as there were no unexpected failures.

I'm still trying to remember/discover the exact process that's happening in the "new baseline" tests. My initial thought is that it creates a new baseline and then runs again to compare against that same baseline; does that sound correct? It's not actually clear from the logs that that's what's happening, but I haven't had a chance to delve deeply in there yet (they are exceedingly hard to parse).

@mkavulich
Collaborator

I'm also not sure why the BL tests are showing a failure. It seems to be some check that is running within the auto-RT on the repository side (rather than the copy staged on disk) that I don't understand. I will keep investigating.

@grantfirl
Collaborator Author

> Yeah @grantfirl sorry about that, I figured it didn't matter the order in which we did this so long as there were no unexpected failures.
>
> I'm still trying to remember/discover the exact process that's happening in the "new baseline" tests. My initial thought is that it creates a new baseline and then runs again to compare against that same baseline; does that sound correct? It's not actually clear from the logs that that's what's happening, but I haven't had a chance to delve deeply in there yet (they are exceedingly hard to parse).

I guess that the order of RT/BL isn't too important as long as there aren't any unexpected failures in RT, which is why it has typically gone first. You don't want to create new baselines until you are sure that the code is performing as expected.

I also found the scripts super confusing and Dustin and I had a similar discussion about whether RTs are run to check against new baselines (basically just testing reproducibility). I think that we came to the conclusion that it was doing another RT after the baseline was recreated, but I don't think that we were ever 100% certain of this. Since the UFS/EMC/EPIC code managers are ok with only the RT/BL tags, we were assuming that this was good enough for us too.

@mkavulich
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: RT
[RT] Repo location: /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1584956840/20240112164518/ufs-weather-model
[RT] Error: Test 068 hrrr_control_intel FAIL Tries: 2
[RT] Error: Test 069 hrrr_control_qr_intel FAIL Tries: 2
[RT] Error: Test 070 hrrr_control_decomp_intel FAIL Tries: 2
[RT] Error: Test 071 hrrr_control_2threads_intel FAIL Tries: 2
[RT] Error: Test 104 hrrr_control_debug_intel FAIL Tries: 2
[RT] Error: Test 115 rap_clm_lake_debug_intel FAIL Tries: 2
[RT] Error: Test 120 hrrr_control_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 121 hrrr_control_qr_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 123 hrrr_control_2threads_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 124 hrrr_control_decomp_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 134 hrrr_control_debug_dyn32_phy32_intel FAIL Tries: 2
[RT] Error: Test 196 hrrr_control_gnu FAIL Tries: 2
[RT] Error: Test 197 hrrr_control_qr_gnu FAIL Tries: 2
[RT] Error: Test 198 hrrr_control_2threads_gnu FAIL Tries: 2
[RT] Error: Test 199 hrrr_control_decomp_gnu FAIL Tries: 2
[RT] Error: Test 213 hrrr_control_debug_gnu FAIL Tries: 2
[RT] Error: Test 225 rap_clm_lake_debug_gnu FAIL Tries: 2
[RT] Error: Test 228 hrrr_control_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 229 hrrr_control_qr_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 231 hrrr_control_2threads_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 232 hrrr_control_decomp_dyn32_phy32_gnu FAIL Tries: 2
[RT] Error: Test 242 hrrr_control_debug_dyn32_phy32_gnu FAIL Tries: 2
[RT] Log file shows failures.
[RT] Please obtain logs from /scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1584956840/20240112164518/ufs-weather-model
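
Checking a notification like this against the expected list by eye is error-prone, so a small script can help. The sketch below assumes the `[RT] Error: Test NNN name FAIL Tries: N` line format shown above; the function names are hypothetical, and the expected names would have to be supplied in the same `test_compiler` form.

```python
import re

# Matches lines such as: [RT] Error: Test 068 hrrr_control_intel FAIL Tries: 2
FAIL_RE = re.compile(r"\[RT\] Error: Test (\d+) (\S+) FAIL Tries: (\d+)")


def failed_tests(notification_text):
    """Return the names of all failed tests, in the order reported."""
    return [m.group(2) for m in FAIL_RE.finditer(notification_text)]


def unexpected_failures(notification_text, expected):
    """Return failed tests that are not in the expected-change list."""
    expected = set(expected)
    return [name for name in failed_tests(notification_text)
            if name not in expected]
```

An empty result from `unexpected_failures` means every reported failure was anticipated. It does not show that every expected test actually changed, since some expected tests (the restart tests here) do not appear in the failure list.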

@mkavulich
Collaborator

@grantfirl It looks like the rap_clm_lake_debug tests are unexpected failures, can you confirm that's the case? Otherwise everything else looks expected.

@grantfirl
Collaborator Author

> @grantfirl It looks like the rap_clm_lake_debug tests are unexpected failures, can you confirm that's the case? Otherwise everything else looks expected.

The rap_clm_lake_debug failures are expected; the test is listed in the expected BL changes above. This is ready to merge when you approve NCAR/ccpp-physics#1034, NCAR/fv3atm#104, and this one.

Collaborator

@mkavulich mkavulich left a comment

@grantfirl Sorry, forgot to approve this one 😬

How did you want to go about updating the test logs and baseline location? I could go ahead and do it once tests are successful since I think I have write permissions on your fork, but I don't want to commit things to your branch if you're working on it.

@grantfirl
Collaborator Author

> @grantfirl Sorry, forgot to approve this one 😬
>
> How did you want to go about updating the test logs and baseline location? I could go ahead and do it once tests are successful since I think I have write permissions on your fork, but I don't want to commit things to your branch if you're working on it.

I can do that. I'll go ahead and start the merge process so that we can move on to the next one.

@grantfirl
Collaborator Author

/scratch1/BMC/gmtb/CCPP_regression_testing/NCAR_ufs-weather-model//run//1584956840/20240112164518/ufs-weather-model

@mkavulich I still can't mv the new baselines into the baselines directory on Hera due to lack of permissions. I've done the rest. If you could mv the new baselines and give it a name of main-20240116, we'll be ready to continue with the next one once I update the branches.

@grantfirl grantfirl merged commit c96f7fe into NCAR:main Jan 16, 2024