Add end of run restart functionality to MOM6 #133

Merged
merged 8 commits into from
May 21, 2024

Conversation

dpsarmie

This PR allows the user to create restart files at the end of a run in MOM using the write_restart_at_endofrun configuration option in CMEPS. This configuration option will control end of run restarts for MOM6, CICE, and CMEPS. This PR closes NOAA-EMC/CMEPS#118 and closes ufs-community/ufs-weather-model#2236. A similar PR was made in the CICE repository (CICE PR#77) that will completely resolve these issues.

The code was tested on Hera using the regression test datm_cdeps_control_gefs, with different combinations of restart settings to check for the expected functionality.
If the option is not set or is set to False, a restart file will not be created at the end of the run. Setting the option to true will create the file.
The end-of-run restarts for CMEPS, CICE, and MOM will all be controlled with this single configuration option.
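
As a rough illustration of the behavior described above, the decision at each coupling step can be sketched as follows. This is Python for readability, not the Fortran in the MOM6 cap; the function and argument names are hypothetical stand-ins for the regular restart alarm, the stop alarm, and the configuration option.

```python
def should_write_restart(restart_alarm_ringing: bool,
                         stop_alarm_ringing: bool,
                         write_restart_at_endofrun: bool = False) -> bool:
    """Decide whether a restart file is written at this coupling step.

    All three arguments are illustrative names, not identifiers from MOM6.
    """
    # Regular restarts at the restart interval are unaffected by the option.
    if restart_alarm_ringing:
        return True
    # At the end of the run, a restart is written only if the option is on.
    return stop_alarm_ringing and write_restart_at_endofrun
```

With the option left at its default, a run that stops between restart intervals writes nothing at the final step; with the option on, the stop alarm alone is enough.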

dpsarmie and others added 3 commits April 23, 2024 18:20
This commit adds end of run restart functionality to MOM6. The
restart files are written when the stop_run alarm has been triggered.
Remove debug check that wrote to log.
This commit allows the user to set write_restart_at_endofrun in
ufs.configure.*.IN to allow CMEPS and MOM6 to create a restart at the
end of the run even if it falls outside of the restart interval.
@dpsarmie dpsarmie changed the title Add end of run restart functionality Add end of run restart functionality to MOM6 Apr 24, 2024
dpsarmie and others added 2 commits April 25, 2024 18:11
The call to read the write_restart_at_endofrun config option has been
placed in the InitializeAdvertise subroutine to avoid the check
being called at every timestep. Default for the EOR restart has been set
to False.
@dpsarmie dpsarmie marked this pull request as ready for review April 26, 2024 15:33
@jiandewang
Collaborator

jiandewang commented May 6, 2024

@DeniseWorthen can you do a preliminary code review before I reach out to the NCAR side?

@jiandewang jiandewang self-requested a review May 6, 2024 17:21
Fix issue where wrong alarm was being turned off.
Moving call to read config file for endofrun restart attribute to the InitializeP0 subroutine.
@jiandewang
Collaborator

@dpsarmie I tried the latest UWM with MOM6 replaced by your branch, but I am not getting what I expected. I used cpld_control_c48_intel as a template and changed restart_n from 12 to 18. The run length is 24hr and the IC is 20210306. With write_restart_at_endofrun=T it should give us 20210400 and 20240406 ocean restart files, but I only see the first one being written out. My run dir is on HERA: /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-T

Can you take a look? Is there anything I missed, or is there a specific CMEPS branch that I should use?
My UWM: /scratch1/NCEPDEV/climate/Jiande.Wang/working/scratch/MOM6-eor/ufs-weather-model. Note that here I replaced MOM6 with your branch.

@DeniseWorthen
Collaborator

DeniseWorthen commented May 8, 2024

@jiandewang I believe the test you want to do is to leave restart_n=12 but make the run length 27. You should get restarts at hours 12, 24, and 27. Without this 'end-of-run' setting, you would only get restarts at hours 12 and 24, even though you run all the way to 27.
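
The expectation above can be written down as a small sketch. It assumes restart_n and the run length are both whole hours counted from the start of the run; the function name is hypothetical and this is not code from CMEPS or MOM6.

```python
from typing import List


def expected_restart_hours(restart_n: int,
                           run_length: int,
                           write_restart_at_endofrun: bool = False) -> List[int]:
    """List the forecast hours at which restart files are expected."""
    # Regular restarts fall on every multiple of restart_n within the run.
    hours = list(range(restart_n, run_length + 1, restart_n))
    # The end-of-run restart adds the final hour if it is not already covered.
    if write_restart_at_endofrun and (not hours or hours[-1] != run_length):
        hours.append(run_length)
    return hours


print(expected_restart_hours(12, 27))        # [12, 24]
print(expected_restart_hours(12, 27, True))  # [12, 24, 27]
```

A run length that is an exact multiple of restart_n gives the same list with or without the option, which is why a run length such as 27 makes a more useful test.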

@jiandewang
Collaborator

@DeniseWorthen in order to extend to a 27hr run, do I need to change stop_n to 27 (along with model_configure nhours_fcst=27)?

@DeniseWorthen
Collaborator

@jiandewang stop_n should be set equal to fhmax

@jiandewang
Collaborator

jiandewang commented May 8, 2024

Just made a quick test; see /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW
restart_n = 12
stop_n = 30
nhours_fcst: 30

still don't see final restart file
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW[215]ls -l RESTART/MOM
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 8 13:52 RESTART/20210322.180000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 8 13:56 RESTART/20210323.060000.MOM.res.nc

@DeniseWorthen
Collaborator

@dpsarmie Could you please check Jiande's run and see if you can spot the issue?

@dpsarmie
Author

dpsarmie commented May 8, 2024

@jiandewang cpld_control_c48 uses parm/ufs.configure.s2s_esmf.IN but that file does not have the configuration option active.

Go ahead and add write_restart_at_endofrun = .true. and rerun the test. That should (hopefully) solve the problem.

@dpsarmie
Author

dpsarmie commented May 8, 2024

@DeniseWorthen We can talk about whether or not we should add that option to the other ufs.configure files. Currently, only the HAFS and DATM ufs.configure files have the option in the configuration files.

@jiandewang
Collaborator

@dpsarmie but if you see /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW/ufs.configure, I have that in line 124

@dpsarmie
Author

dpsarmie commented May 8, 2024

> @dpsarmie but if you see /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW/ufs.configure, I have that in line 124

Ok I see that. I'll try to run the c48 case and see if I can replicate the issue.

@DeniseWorthen
Collaborator

@dpsarmie When we add this to UWM, we will want to add a configure variable to all the ufs.configure files, but in the RT system, this could be false by default. The G-W will then be able to set it true if they need to.

@dpsarmie
Author

dpsarmie commented May 9, 2024

> @dpsarmie When we add this to UWM, we will want to add a configure variable to all the ufs.configure files, but in the RT system, this could be false by default. The G-W will then be able to set it true if they need to.

Sounds good.


@jiandewang , I'm seeing that this is False in your mediator.log:
(med_phases_restart_alarm_init) write_restart_at_endofrun : F (Line 537)

I haven't found what could be causing the mediator flag to be set incorrectly, but I'll keep looking tomorrow.
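
For anyone repeating this check, a minimal sketch of pulling the flag out of mediator.log is below. The only thing taken from this thread is the log line format quoted above; the function name is hypothetical, and the pattern assumes the flag is printed as a single T or F.

```python
import re
from typing import Optional

# Matches the line quoted above, e.g.
# "(med_phases_restart_alarm_init) write_restart_at_endofrun : F"
_FLAG_LINE = re.compile(
    r"\(med_phases_restart_alarm_init\)\s+write_restart_at_endofrun\s*:\s*([TF])"
)


def endofrun_flag_from_log(log_text: str) -> Optional[bool]:
    """Return the flag value reported in the log, or None if it is absent."""
    match = _FLAG_LINE.search(log_text)
    if match is None:
        return None
    return match.group(1) == "T"
```

Reading the file with `open("mediator.log").read()` and passing the text in should be enough; a result of None would suggest that the mediator never reported the setting in that log.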

@DeniseWorthen
Collaborator

I think the issue might be that Jiande used "true" vs ".true."

@dpsarmie
Author

dpsarmie commented May 9, 2024

I have a "true" case queued up on Hera right now. I figured that the parser would handle either case correctly but I'll wait and see what this test shows.

@DeniseWorthen
Collaborator

@dpsarmie
Author

dpsarmie commented May 9, 2024

> actually I tried .true. without success

Ok. Denise is right though, I did a test run with "true" and it wasn't parsed correctly. I'll keep looking through your logs and see if there's any other issues.

@DeniseWorthen
Collaborator

yes, just looking through some of the other logicals and it doesn't seem to matter. weird.

@jiandewang
Collaborator

just re-submitted my test case with .true.
this time mediator.log shows
(med_phases_restart_alarm_init) write_restart_at_endofrun : T

let's wait a couple of minutes to see what happens

@jiandewang
Collaborator

using .true. gives me what I am expecting.
now let me try run length=24, restart_n=18 to see what will happen
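
Since the spelling of the value turned out to matter in these runs, a pre-run check along the following lines might save a test cycle. This is only a sketch: it assumes the option appears in ufs.configure as a `name = value` line, and it treats only `.true.` and `.false.` as safe, because the thread only shows that `.true.` worked and `true` did not. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumed layout: "write_restart_at_endofrun = <value>" on its own line.
_ASSIGNMENT = re.compile(
    r"^\s*write_restart_at_endofrun\s*=\s*(\S+)", re.MULTILINE
)


def check_endofrun_setting(configure_text: str) -> Optional[str]:
    """Return a warning string, or None if the setting looks safe."""
    match = _ASSIGNMENT.search(configure_text)
    if match is None:
        return "write_restart_at_endofrun is not set in this file"
    value = match.group(1)
    # Conservative assumption: accept only the spellings with both dots.
    if value not in (".true.", ".false."):
        return f"write_restart_at_endofrun = {value}; use .true. or .false."
    return None
```

Whether other spellings are parsed correctly was not settled here, so the check errs on the side of warning.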

@DeniseWorthen
Collaborator

@jiandewang Please check that you also get mediator cpl.r files at the same times that you get MOM6 restarts. In the end, when also including the CICE changes, we need all three components to have the capability to write at restart_n and at the end.

@jiandewang
Collaborator

@DeniseWorthen yes we got that file
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW1/RESTART[119]ls -l MOM
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 10:03 20210322.180000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 10:07 20210323.060000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 10:10 20210323.120000.MOM.res.nc
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW1/RESTART[120]ls -l ufs.cpld.cpl*
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 10:03 ufs.cpld.cpl.r.2021-03-22-64800.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 10:08 ufs.cpld.cpl.r.2021-03-23-21600.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 10:10 ufs.cpld.cpl.r.2021-03-23-43200.nc

for my test case, run length=30, restart_n=12

@jiandewang
Collaborator

my second try: run length=60, restart_n=18, works as expected:
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW3[185]ll RESTART/MOM
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 13:58 RESTART/20210323.000000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 14:05 RESTART/20210323.180000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 14:12 RESTART/20210324.120000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 14:14 RESTART/20210324.180000.MOM.res.nc

/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW3[186]ll RESTART/ufs*
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 13:59 RESTART/ufs.cpld.cpl.r.2021-03-23-00000.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 14:05 RESTART/ufs.cpld.cpl.r.2021-03-23-64800.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 14:12 RESTART/ufs.cpld.cpl.r.2021-03-24-43200.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 14:14 RESTART/ufs.cpld.cpl.r.2021-03-24-64800.nc
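
The two listings above can be cross-checked with a short sketch that builds the expected file names for an end-of-run-enabled run. The naming patterns are read off the listings themselves (the mediator suffix being seconds of the day), and the start time of 2021-03-22 06:00 is inferred from the file names rather than stated in this thread; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def expected_restart_files(start: datetime,
                           restart_n: int,
                           run_length: int) -> List[Tuple[str, str]]:
    """Return (MOM6 file, mediator file) pairs, end-of-run restart included."""
    hours = list(range(restart_n, run_length + 1, restart_n))
    if not hours or hours[-1] != run_length:
        hours.append(run_length)  # the extra end-of-run restart
    files = []
    for hour in hours:
        t = start + timedelta(hours=hour)
        seconds_of_day = t.hour * 3600 + t.minute * 60 + t.second
        mom = t.strftime("%Y%m%d.%H%M%S") + ".MOM.res.nc"
        med = f"ufs.cpld.cpl.r.{t:%Y-%m-%d}-{seconds_of_day:05d}.nc"
        files.append((mom, med))
    return files


# Assumed start time; run length=60, restart_n=18 as in the test above.
for mom, med in expected_restart_files(datetime(2021, 3, 22, 6), 18, 60):
    print(mom, med)
```

Under that assumed start time, the names produced for run length=60, restart_n=18 match the four MOM6 and four mediator files listed above.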

I am going to ask NCAR to test on their side to make sure it won't break their system

@jiandewang
Collaborator

@dpsarmie MOM6 dev/emc just had an update. Can you sync your branch? I will reach out to NCAR after you sync your branch. Thanks

@dpsarmie
Author

> @dpsarmie MOM6 dev/emc just had an update. Can you sync your branch? I will reach out to NCAR after you sync your branch. Thanks

@jiandewang , I've updated the branch. Thanks again for testing and the help.

@jiandewang
Collaborator

We now have the green light from NCAR. I will prepare a UWM PR for it to get merged to dev/emc

@jiandewang
Collaborator

combined with UWM (ufs-community/ufs-weather-model#2205)

@FernandoAndrade-NOAA

Testing for #2205 has completed successfully, please continue with merging this PR.

4 participants