CMEPS PR for CDEPS Inline implementation #420

Merged: 35 commits into ESCOMP:main on Jan 31, 2024

uturuncoglu
Collaborator

@uturuncoglu uturuncoglu commented Dec 6, 2023

Description of changes

This PR aims to bring CDEPS inline capability to CMEPS to fill unmapped regions for the regional modeling configurations. More information can be found in the following presentation: https://docs.google.com/presentation/d/1Tk76zlsRiT7_KMJiZsEJHNvlHciZJJBBhsJurRMxVy4/edit#slide=id.g2613ed2f8f8_0_0

This PR is linked to ESCOMP/CDEPS#259, which is needed to complete the implementation.

Specific notes

Contributors other than yourself, if any: @danrosen25 @binli2337 @BinLiu-NOAA

CMEPS Issues Fixed (include github issue #): No

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial) No

Any User Interface Changes (namelist or namelist defaults changes)? No.

The user needs to add a new phase (med_phases_cdeps_run) to the run sequence to activate the feature, and then needs to provide a stream.config configuration file.

Additional changes were also made to allow filling the exchange fields entirely with data in the first coupling time step. I introduced a set of new mediator namelist options to make this configurable. For example, I am setting the following in nems.configure for ocean and wave (in the mediator section):

  ocn_use_data_first_import = .true.
  wav_use_data_first_import = .true.

By default the values are .false., and the options are only usable when the CDEPS Inline feature is activated. I also modified the wave and ocean prep phases to get the data and pass it to the components.
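
To illustrate the constraint described above, here is a minimal Python sketch (not part of CMEPS) that checks whether any first-import flag is enabled while the inline phase is absent from the run sequence. The helper names `parse_flags` and `check_first_import` are hypothetical, the `key = value` line format is assumed from the snippet above, and the run sequence strings are illustrative only.

```python
import re

# Hypothetical helper: match "<comp>_use_data_first_import = .true./.false." lines.
FLAG_RE = re.compile(
    r"^\s*(\w+_use_data_first_import)\s*=\s*\.(true|false)\.\s*$", re.IGNORECASE
)

def parse_flags(config_text):
    """Collect the first-import flags found in a block of 'key = value' lines."""
    flags = {}
    for line in config_text.splitlines():
        match = FLAG_RE.match(line)
        if match:
            flags[match.group(1)] = match.group(2).lower() == "true"
    return flags

def check_first_import(config_text, run_sequence_text):
    """Return flags set to .true. although med_phases_cdeps_run is not in the run sequence."""
    inline_active = "med_phases_cdeps_run" in run_sequence_text
    enabled = [name for name, value in parse_flags(config_text).items() if value]
    return [] if inline_active else sorted(enabled)

example_config = """
  ocn_use_data_first_import = .true.
  wav_use_data_first_import = .true.
"""

# Run sequence strings below are assumptions made for this example.
print(check_first_import(example_config, "MED med_phases_cdeps_run"))
print(check_first_import(example_config, "MED some_other_phase"))
```

An empty list means the flags are consistent with the run sequence; a non-empty list names the flags that would have no inline data to draw from.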

Testing performed

Please describe the tests along with the target model and machine(s)
If possible, please also add hashes that were used in the testing

This was initially tested under the UFS Weather Model and did not change the answers for the hafs_regional_atm_ocn_wav and cpld_control_p8 cases. More tests will be performed with both the UFS Weather Model and CESM.

* add flag to track lake freezing for clm lake
@jedwards4b
Collaborator

jedwards4b commented Dec 21, 2023

@uturuncoglu I think a merge of main into your branch should fix the github testing.

@uturuncoglu
Collaborator Author

@jedwards4b Okay, I'll do it. I think I am referencing their CDEPS for this, but once it is ready to merge I'll sync with master. The NOAA fork has some data modes that are not pushed to ESCOMP. Also, I have no information about the process of creating input files (incl. meshes) for those data modes. We could push them, but they don't have any entry in the CIME interface. Do you think we could still push them to ESCOMP?

@jedwards4b
Collaborator

Do you suppose maybe they pushed to a NOAA fork and not to the main repo because they wanted to avoid our rigorous review and testing process???
I don't want stuff in our repo that we have no ability to test.

@uturuncoglu
Collaborator Author

@jedwards4b Not sure. We need to coordinate with them. Maybe I could create a branch (forked from ESCOMP) that will only have changes related to the CMEPS+CDEPS inline capability. This would allow us to get that feature into CESM without the extra data modes, which seems like a reasonable solution.

@jedwards4b
Collaborator

Yes, I like that idea. Thanks

@uturuncoglu
Collaborator Author

@jedwards4b I would like to test this with CESM, but I am not sure it can be done on Derecho. Please let me know which tests need to be run and with which baseline. Then I could perform some initial tests.

BTW, I am seeing the following error in the CI scripts regression tests, but I am not sure why:

Finished MODEL_BUILD for test SMS.f19_g16_rx1.A.ubuntu-latest_gnu in 8.613172 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /home/runner/cesm/scratch/scripts_regression_test.20231229_171411/SMS.f19_g16_rx1.A.ubuntu-latest_gnu.fake_testing_only_20231229_171411
    Errors were:
        Building test for SMS in directory /home/runner/cesm/scratch/scripts_regression_test.20231229_171411/SMS.f19_g16_rx1.A.ubuntu-latest_gnu.fake_testing_only_20231229_171411
        Error: Nonnegative width required in format string at (1)
        
        Error: ‘src_mask_val’ at (1) is not a member of the ‘shr_stream_streamtype’ structure
        
        Error: ‘src_mask_val’ at (1) is not a member of the ‘shr_stream_streamtype’ structure
        
        Error: Nonnegative width required in format string at (1)
        
        ERROR: BUILD FAIL: buildexe failed, cat /home/runner/cesm/scratch/scripts_regression_test.20231229_171411/SMS.f19_g16_rx1.A.ubuntu-latest_gnu.fake_testing_only_20231229_171411/bld/cesm.bldlog.231229-171504

Waiting for tests to finish
FAIL SMS.f19_g16_rx1.A.ubuntu-latest_gnu (phase MODEL_BUILD)
    Case dir: /home/runner/cesm/scratch/scripts_regression_test.20231229_171411/SMS.f19_g16_rx1.A.ubuntu-latest_gnu.fake_testing_only_20231229_171411
test-scheduler took 62.69175124168396 seconds
    ERRPUT: 

I have some changes on the CDEPS side related to src_mask_val and dst_mask_val that make them configurable through the config file. I am not sure which version of CDEPS is used here, but there might be some issue related to it. So maybe we need to bring in the CDEPS changes first and then the CMEPS ones. Anyway, let me know what you think.

@jedwards4b
Collaborator

@uturuncoglu I don't see any PR with changes to dshr_stream_mod.F90 in CDEPS and there is no variable src_mask_val in the derived type.

@jedwards4b
Collaborator

Yes, I think that you need to bring in cdeps changes first.
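
One way to confirm which side is behind is to check whether the CDEPS source used by the build declares the component at all. The following is a rough Python sketch of such a check, not an existing CIME or CDEPS tool: `declares_component` is a hypothetical helper, the regular expressions only cover simple `type ... end type` blocks, and the two Fortran snippets are made-up stand-ins rather than the real contents of dshr_stream_mod.F90.

```python
import re

def declares_component(fortran_source, type_name, component):
    """Return True if `component` appears in a declaration inside the named derived type.

    Simplified on purpose: handles 'type [, attrs ::] name ... end type' only.
    """
    block = re.search(
        r"^\s*type\s*(?:,[^\n:]*)?(?:::)?\s*" + re.escape(type_name) + r"\b"
        r"(.*?)^\s*end\s*type\b",
        fortran_source,
        re.IGNORECASE | re.DOTALL | re.MULTILINE,
    )
    if block is None:
        return False
    # Look for the component name after '::' on a non-comment part of a line.
    decl = re.search(
        r"::[^\n!]*\b" + re.escape(component) + r"\b", block.group(1), re.IGNORECASE
    )
    return decl is not None

# Made-up samples for illustration; the real type has different components.
older_source = """
  type shr_stream_streamType
     integer :: nFiles = 0
  end type shr_stream_streamType
"""
newer_source = """
  type, public :: shr_stream_streamType
     integer :: nFiles = 0
     integer :: src_mask_val = 0
  end type shr_stream_streamType
"""

print(declares_component(older_source, "shr_stream_streamtype", "src_mask_val"))
print(declares_component(newer_source, "shr_stream_streamtype", "src_mask_val"))
```

In practice the source text would be read from the dshr_stream_mod.F90 file of whichever CDEPS checkout the workflow pulls in.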

@uturuncoglu uturuncoglu marked this pull request as ready for review January 5, 2024 21:09
@uturuncoglu uturuncoglu changed the title DRAFT: CDEPS Inline implementation CDEPS Inline implementation Jan 5, 2024
@uturuncoglu
Collaborator Author

@jedwards4b I think this is ready for review. I still need to test it with CESM and add a fix for the field exchange file used by the UFS Weather Model, but the UFS RTs are fine. I'll also test the ATM+OCN+WAV configuration with MOM6, and that could require some minimal changes. This is also linked with ESCOMP/CDEPS#259.

@uturuncoglu
Collaborator Author

@jedwards4b Please also note that I did not modify the CMEPS core build system to activate CDEPS inline. Also, we need to make some changes in the CIME interface to support the new options.

@DeniseWorthen
Collaborator

@uturuncoglu I tested your latest commit on this branch for all UWM non-standalone ATM tests and it passed. I only needed to merge in the latest emc/develop to get your changes for the land component.

Just to note---I had tested the total/select change previously, but only for a single test. It turns out that it failed in several tests, but not the one I used.

@uturuncoglu
Collaborator Author

@DeniseWorthen Okay, that is great. It seems that we don't have any issue on the UFS side. I will also update the branch with the land changes. I still need to test with CESM. I'll do it today and let you know. Thanks for your help.

@uturuncoglu
Collaborator Author

@jedwards4b The scripts regression tests are still failing because of the newly added namelist options on the CDEPS side. I checked the action, and it seems it is getting CDEPS from the CESM repository; since CESM is not using the latest version, it is failing. We might need to update the workflow or CESM to make it work again. I have also started testing the fix and will compare with the baseline. I'll keep you updated.

@uturuncoglu
Collaborator Author

@DeniseWorthen @jedwards4b JFYI, commit ea995f6 basically brings recent changes made for the land component coupling under UFS.

@uturuncoglu
Collaborator Author

@jedwards4b I am creating the baseline with the following command:

qcmd -l walltime=4:00:00 -- ./create_test --xml-testlist ../../components/cmeps/cime_config/testdefs/testlist_drv.xml --xml-machine cheyenne --xml-category nuopc --machine derecho --xml-compiler intel --generate my_baseline --baseline-root /glade/derecho/scratch/turuncu/baselines --xml-category aux_cmeps

and checking against it with the following:

qcmd -l walltime=4:00:00 -- ./create_test --xml-testlist ../../components/cmeps/cime_config/testdefs/testlist_drv.xml --xml-machine derecho --xml-category nuopc --xml-compiler intel --compare my_baseline  --baseline-root /glade/derecho/scratch/turuncu/baselines --xml-category aux_cmeps

The issue is that the second command tries to check the baseline under the name SMS_Ld2.ww3a.ADWAV.derecho_intel and fails, since I have SMS_Vnuopc_Ld2.ww3a.ADWAV.derecho_intel in the baseline directory. I am not sure why, but CIME drops the _Vnuopc part and fails to find the baseline even if I pass --xml-category nuopc to the second command. Do you have any idea?
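
A small Python sketch of how the mismatch could be diagnosed: it compares a requested test name against the names in a baseline directory while ignoring the `Vnuopc` modifier. This is not how CIME resolves baselines; `strip_modifier` and `find_baseline` are hypothetical helpers, and the name layout (modifiers joined by `_` before the first `.`) is assumed from the two names quoted above.

```python
def strip_modifier(test_name, modifier="Vnuopc"):
    """Drop every `<modifier>` token from the test-type part of a test name."""
    test_type, sep, rest = test_name.partition(".")
    parts = [part for part in test_type.split("_") if part != modifier]
    return "_".join(parts) + sep + rest

def find_baseline(requested, available):
    """Return the available baseline name matching `requested`, ignoring the modifier."""
    wanted = strip_modifier(requested)
    for name in available:
        if strip_modifier(name) == wanted:
            return name
    return None

# In practice `available` would come from listing the baseline directory.
available = ["SMS_Vnuopc_Ld2.ww3a.ADWAV.derecho_intel"]
print(find_baseline("SMS_Ld2.ww3a.ADWAV.derecho_intel", available))
```

A match found only after stripping the modifier points to a naming difference between the generate and compare steps rather than a missing baseline.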

@uturuncoglu
Collaborator Author

@jedwards4b I might have found the issue. I am trying to create the baselines again. Maybe I created the baseline with a different version of CIME. Anyway, testing now.

@uturuncoglu
Collaborator Author

@jedwards4b Okay. The naming issue seems to be solved, but I am still getting errors like the following:

    FAIL ERS_Ld5.T62_g37.DTEST.derecho_intel MEMCOMP [Errno 2] No such file or directory: '/glade/derecho/scratch/turuncu/baselines_head/my_baseline/ERS_Ld5.T62_g37.DTEST.derecho_intel/cpl-mem.log'
    FAIL ERS_Ld5.T62_g37.DTEST.derecho_intel TPUTCOMP [Errno 2] No such file or directory: '/glade/derecho/scratch/turuncu/baselines_head/my_baseline/ERS_Ld5.T62_g37.DTEST.derecho_intel/cpl-tput.log'

The directory is there, but cpl-mem.log and cpl-tput.log are missing for some cases. So there might be a bug on the CIME side; I am not sure. Even with those failures, it checks the baseline, and those steps are passing now. I think the fix solved the issue. Anyway, let me know what you think.

I have also fixed the issue that occurs when add_gusts is turned on. ERS_Ld5.ne30pg3_t232.BLT1850_v0c.derecho_intel.allactive-defaultio.C.20240128_165135_o5mlnj was failing but passes now.

@DeniseWorthen @BinLiu-NOAA @binli2337 Are you fine with the current state of CMEPS? I think we were fine on the UFS side (we need to check the last add_gusts fix on the UFS end too), but you might still need something else there. Anyway, I just want to double check. If everything is fine, we could merge this PR.

@jedwards4b You might want to review this again, since there are some changes such as the fix for the issue on the CESM side. The script that shows the results is /glade/derecho/scratch/turuncu/cs.status.20240128_165135_o5mlnj, and the last baseline that I used is in /glade/derecho/scratch/turuncu/baselines_head/my_baseline.
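
To see which baseline cases lack these files, a short script along the following lines could be used. It is only a sketch: `missing_baseline_logs` is a hypothetical helper, it assumes one sub-directory per test under the baseline root (as in the paths above), and the demo runs on a temporary directory tree instead of the real baseline location.

```python
import os
import tempfile

EXPECTED_LOGS = ("cpl-mem.log", "cpl-tput.log")

def missing_baseline_logs(baseline_root):
    """Map each case directory under baseline_root to the expected log files it lacks."""
    missing = {}
    for case in sorted(os.listdir(baseline_root)):
        case_dir = os.path.join(baseline_root, case)
        if not os.path.isdir(case_dir):
            continue
        absent = [
            name for name in EXPECTED_LOGS
            if not os.path.isfile(os.path.join(case_dir, name))
        ]
        if absent:
            missing[case] = absent
    return missing

# Self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "ERS_Ld5.T62_g37.DTEST.derecho_intel"))
    complete = os.path.join(root, "SMS_Vnuopc_Ld2.ww3a.ADWAV.derecho_intel")
    os.makedirs(complete)
    for name in EXPECTED_LOGS:
        open(os.path.join(complete, name), "w").close()
    demo_result = missing_baseline_logs(root)

print(demo_result)
```

Pointing the function at the actual baseline root would list every case whose MEMCOMP or TPUTCOMP comparison cannot find its reference file.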

@uturuncoglu
Collaborator Author

@jedwards4b To be sure, I also ran the cmeps prealpha tests again. I have some failures like the following:

Traceback (most recent call last):
  File "/var/spool/pbs/mom_priv/jobs/2956721.desched1.SC", line 69, in <module>
    _main_func(__doc__)
  File "/var/spool/pbs/mom_priv/jobs/2956721.desched1.SC", line 64, in _main_func
    success = case.case_test(testname=testname, reset=reset, skip_pnl=skip_pnl)
  File "/glade/derecho/scratch/turuncu/CESM/cime/CIME/case/case_test.py", line 81, in case_test
    success = test.run(skip_pnl=skip_pnl)
  File "/glade/derecho/scratch/turuncu/CESM/cime/CIME/SystemTests/system_tests_common.py", line 281, in run
    self.run_phase()
  File "/glade/derecho/scratch/turuncu/CESM/cime/CIME/SystemTests/eri.py", line 186, in run_phase
    _helper(dout_sr1, refdate_2, refsec_2, rundir2)
  File "/glade/derecho/scratch/turuncu/CESM/cime/CIME/SystemTests/eri.py", line 34, in _helper
    os.symlink(item, dst)
FileNotFoundError: [Errno 2] No such file or directory: '/glade/derecho/scratch/turuncu/archive/ERI.T62_g16.C1850ECO.derecho_intel.pop-ecosys.20240128_194145_10ls18.ref1/rest/0001-01-04-00000/ERI.T62_g16.C1850ECO.derecho_intel.pop-ecosys.20240128_194145_10ls18.ref1.pop.r.0001-01-04-00000.nc' -> '/glade/derecho/scratch/turuncu/ERI.T62_g16.C1850ECO.derecho_intel.pop-ecosys.20240128_194145_10ls18.ref2/run/ERI.T62_g16.C1850ECO.derecho_intel.pop-ecosys.20240128_194145_10ls18.ref1.pop.r.0001-01-04-00000.nc'

In this case /glade/derecho/scratch/turuncu/ERI.T62_g16.C1850ECO.derecho_intel.pop-ecosys.20240128_194145_10ls18.ref2/ exists, but its run directory does not. I also had the following in other tests:

RUN: /glade/derecho/scratch/turuncu/CESM/components/cice/bld/generate_cice_decomp.pl -ccsmroot /glade/derecho/scratch/turuncu/CESM -res gx3v7 -nx 100 -ny 116 -nproc 128 -thrds 1 -output all
FROM: /glade/derecho/scratch/turuncu/ERI.T62_g37.G.derecho_intel.pop-cice.20240128_194145_10ls18.ref1
  output: 100 116 5 4 5 sectrobin square-ice

  errput: /bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'

  2024-01-28 19:47:54 ocn
Create namelist for component pop

and

==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /cesm/inputdata/lnd/clm2/surfdata_esmf/ctsm5.2.0 ...
No such directory ‘cesm/inputdata/lnd/clm2/surfdata_esmf/ctsm5.2.0’.

You can see the 3 failed tests using ./cs.status.20240128_194145_10ls18 | grep FAIL under /glade/derecho/scratch/turuncu/, but the others are passing without any issue. I don't think the failures are related to this PR, but please confirm.
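
As an alternative to grepping, the status output can be grouped by test so that each failing test appears once with its failing phases. The sketch below assumes lines shaped like `FAIL <test> <phase> ...`, as in the MEMCOMP and TPUTCOMP lines quoted earlier in this thread; `failed_phases` is a hypothetical helper, and the sample text (including the PASS line) is illustrative rather than real output.

```python
from collections import defaultdict

def failed_phases(status_text):
    """Group failing phases by test name from lines shaped like 'FAIL <test> <phase> ...'."""
    failures = defaultdict(list)
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "FAIL":
            failures[fields[1]].append(fields[2])
    return dict(failures)

# Illustrative sample modeled on the FAIL lines quoted earlier in this thread.
sample = """
    PASS ERS_Ld5.T62_g37.DTEST.derecho_intel RUN
    FAIL ERS_Ld5.T62_g37.DTEST.derecho_intel MEMCOMP [Errno 2] No such file or directory
    FAIL ERS_Ld5.T62_g37.DTEST.derecho_intel TPUTCOMP [Errno 2] No such file or directory
"""

for test, phases in failed_phases(sample).items():
    print(test, phases)
```

Grouping this way makes it easier to separate tests that fail only in bookkeeping phases from tests that fail to build or run.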

mediator/med.F90 Outdated Show resolved Hide resolved
@uturuncoglu
Collaborator Author

@DeniseWorthen @BinLiu-NOAA I just wonder if you need any additional changes in this PR. Are all tests fine with this version? @jedwards4b Please let me know if you need more testing. If everybody is fine with it, then I think we are ready to merge, and after that I could create another CMEPS PR to update the NOAA-EMC fork.

@DeniseWorthen
Collaborator

@uturuncoglu I wasn't sure how your testing at ESCOMP was going. If everything looks good on that end, I think it's ready to merge. Then yes, please create a PR back to NOAA-EMC, and add that PR to your inline-cdeps PR for UWM.

@uturuncoglu
Collaborator Author

@DeniseWorthen It seems that the CESM tests work fine. @jedwards4b Could you confirm? Then I could go ahead and merge this PR.

Collaborator

@jedwards4b jedwards4b left a comment

Looks good Ufuk - just a few changes requested.

mediator/esmFldsExchange_hafs_mod.F90 Show resolved Hide resolved
mediator/med.F90 Show resolved Hide resolved
mediator/med.F90 Outdated Show resolved Hide resolved
mediator/med_map_mod.F90 Outdated Show resolved Hide resolved
mediator/med_map_mod.F90 Show resolved Hide resolved
mediator/med_methods_mod.F90 Show resolved Hide resolved
@uturuncoglu
Collaborator Author

@jedwards4b Thanks. Do you want me to merge it, or should we wait for the test to finish? It will probably fail unless you update the CDEPS version on the CESM side.

@uturuncoglu
Collaborator Author

@jedwards4b JFYI, the srt test failed again with the cprnc issue.

@uturuncoglu
Collaborator Author

@jedwards4b It seems that there is an issue with checking out https://github.com/ESMCI/cprnc. The srt test has a problem accessing the submodule under https://github.com/ESMCI/cime/tree/master/CIME/non_py

@jedwards4b
Collaborator

right, working on it now

@jedwards4b jedwards4b merged commit 7e0908c into ESCOMP:main Jan 31, 2024
2 checks passed
5 participants