Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CMEPS #110

Merged
merged 552 commits into from
Feb 5, 2024
Merged

Update CMEPS #110

merged 552 commits into from
Feb 5, 2024

Conversation

uturuncoglu
Copy link
Collaborator

Description of changes

This PR aims to update CMPES with the upstream (ESCOMP) to bring recent developments such as CDEPS inline capability to UFS Weather Model.

Specific notes

Contributors other than yourself, if any: @DeniseWorthen @danrosen25 @binli2337 @BinLiu-NOAA

CMEPS Issues Fixed (include github issue #): N/A

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial) No

Any User Interface Changes (namelist or namelist defaults changes)?

The user need to add new phase (med_phases_cdeps_run) to the run sequence to activate the feature. Then, needs to provide stream.config configuration file.

Additional changes also made to allow filling exchange fields with all data in the first coupling time step. I introduce set of new mediator namelist option to make it configurable. So, for example I am setting following in the nems.configure for ocean and wave (in mediator section),

  ocn_use_data_first_import = .true.
  wav_use_data_first_import = .true.

By default the values are .false. and it is only usable when CDEPS Inline feature activated. I also modify the wave and ocean prep phases to get the data and pass it to the components.

Testing performed

Please describe the tests along with the target model and machine(s)
If possible, please also added hashes that were used in the testing
Both CESM (Derecho, Intel) and UFS Weather Model (Orion Intel and WCOSS2) head of develop tested on multiple machines.

jedwards4b and others added 30 commits January 11, 2023 13:10
update esmFldsExchange_nems for ungridded wave fields; fix incorrect units in mediator files
        modified:   cesm/nuopc_cap_share/shr_expr_parser_mod.F90
        modified:   cesm/nuopc_cap_share/shr_megan_mod.F90
        modified:   cesm/nuopc_cap_share/shr_megan_mod.F90
        modified:   cesm/nuopc_cap_share/shr_megan_mod.F90
Updates to the MEGAN specifier string parser
enable asyncio using pio
### Description of changes 
Allows IO tasks to be independent of compute tasks in cesm

### Specific notes
(testing in progress)
Contributors other than yourself, if any:
Depends on share (ESCOMP/CESM_share#37) and cime (ESMCI/cime#4340).  

CMEPS Issues Fixed (include github issue #):

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial) 

Any User Interface Changes (namelist or namelist defaults changes)?

### Testing performed

Testing performed if application target is CESM:
- [ ] (recommended) CIME_DRIVER=nuopc scripts_regression_tests.py
   - machines:
   - details (e.g. failed tests):
- [ ] (recommended) CESM testlist_drv.xml
   - machines and compilers:
   - details (e.g. failed tests):
- [X] (optional) CESM prealpha test
   - machines and compilers cheyenne intel 
   - details (e.g. failed tests): results consistant with cesm2_3_alpha10d
- [ ] (other) please described in detail
   - machines and compilers
   - details (e.g. failed tests):

Testing performed if application target is UFS-coupled:
- [ ] (recommended) UFS-coupled testing
   - description:
   - details (e.g. failed tests):

Testing performed if application target is UFS-HAFS:
- [X] (recommended) UFS-HAFS testing
   - description:
   - details (e.g. failed tests):

### Hashes used for testing:

- [ ] CESM:
  - repository to check out: https://github.com/ESCOMP/CESM.git
  - branch/hash:
- [ ] UFS-coupled, then umbrella repostiory to check out and associated hash:
  - repository to check out:
  - branch/hash:
- [ ] UFS-HAFS, then umbrella repostiory to check out and associated hash:
  - repository to check out:
  - branch/hash:
        new file:   cesm/nuopc_cap_share/shr_lightning_coupling_mod.F90
        modified:   cime_config/namelist_definition_drv_flds.xml
        modified:   mediator/esmFldsExchange_cesm_mod.F90
        modified:   mediator/fd_cesm.yaml
        modified:   cesm/nuopc_cap_share/shr_lightning_coupling_mod.F90
        modified:   cime_config/namelist_definition_drv_flds.xml
        modified:   cime_config/namelist_definition_drv_flds.xml
        modified:   mediator/esmFldsExchange_cesm_mod.F90
        modified:   mediator/fd_cesm.yaml
make xgrid default, handle main task for multidriver cases in esm_tim…
xgrid was causing restart problems; revert this until we can solve those
problems
Revert default aoflux_grid to ogrid
fieldNameList is not always allocated. We could wrap the deallocate in a
conditional, but since allocatable arrays are automatically deallocated
upon leaving a subroutine, this deallocate statement is unnecessary.
Update nvhpc compiler for GPU settings
Remove PGI compiler

	modified:   cime_config/config_component.xml
so that they could assign multiple values to the config_machines.xml
file

	modified:   cime_config/config_component.xml
@uturuncoglu
Copy link
Collaborator Author

uturuncoglu commented Feb 5, 2024

@BinLiu-NOAA I think there is a masking issue in the fixed case. If I look at the So_t on atmosphere grid I am only seeing data over active coupling region and it seems that CDEPS inline does not complete the So_t. This is correct in the original run. If you look at your stream.config file, you are providing So_t as a stream and as it expected original implementation (first run without any fix) completes the field and correct. The later run does not have it (just fills large values) and I think it is wrong.

@uturuncoglu
Copy link
Collaborator Author

@BinLiu-NOAA @DeniseWorthen I'll look at the original run and see what is going on wrong in there. I'll update you when I have something new.

@jkbk2004
Copy link

jkbk2004 commented Feb 5, 2024

All test are done at ufs-community/ufs-weather-model#2028. @DeniseWorthen can you merge this pr?

@DeniseWorthen
Copy link
Collaborator

@jkbk2004 Testing over the weekend uncovered a couple of issues related to this PR and the UWM. Since these features are intended for hafs v2 implentation, there is discussion about how to deal w/ this. @BinLiu-NOAA Would you please chime in here as to your preference?

@jkbk2004
Copy link

jkbk2004 commented Feb 5, 2024

@jkbk2004 Testing over the weekend uncovered a couple of issues related to this PR and the UWM. Since these features are intended for hafs v2 implentation, there is discussion about how to deal w/ this. @BinLiu-NOAA Would you please chime in here as to your preference?

What was issue?

@DeniseWorthen
Copy link
Collaborator

There are fixes needed for the CMEPS implementation of the inline-CDEPS feature.

@BinLiu-NOAA
Copy link

All test are done at ufs-community/ufs-weather-model#2028. @DeniseWorthen can you merge this pr?

@jkbk2004, as you may notice from the above conversations, we (@uturuncoglu, @DeniseWorthen, @binli2337 and myself) found and fixed two bugs from this CMEPS side. The fixes improve the treatment of So_u and So_v when CMEPS-inline-CDEPS does not have those fields in the input stream file, also set the polemethod = ESMF_POLEMETHOD_NONE for the hafs.mom6 coupling mode to ensure the regional MOM6 ocean coupling variables being properly remapped to the FV3ATM grid (without strange extrapolation).

Unfortunately, the ufs-weather-model side regression tests are already done for all supported platforms without these additional CMEPS fixes (which were figured out, tested and verified not until this morning). Meanwhile fortunately, the PR has not yet been merged.

In this situation, I am not sure what is the typical approach to move forward: letting the PR go in first as is and creating a follow-up PR to fix the issues, or updating the PR with the CMEPS bug fixes and rerunning all the RTs. Appreciate it if you and other code/repo admins can provide some suggestions/recommendations here.

Thanks!

@jkbk2004
Copy link

jkbk2004 commented Feb 5, 2024

Can we move on merging and applying another new bug fix pr? CDEPS pr NOAA-EMC/CDEPS#60 (comment) already got merged.

@binli2337
Copy link
Collaborator

@jkbk2004 The CMEPS fix is not related to the CDEPS pr.

@BinLiu-NOAA
Copy link

@jkbk2004 and @DeniseWorthen, I personally prefer the second approach (updating the PR with the CMEPS bug fixes and rerunning all the RTs). But I understand it's more up to you (as ufs-weather-model repo Admin and CMEPS code/reop Admin) to choose/decide which approach to proceed in this special scenario. Thanks!

@DeniseWorthen
Copy link
Collaborator

@BinLiu-NOAA If we merge the current PR, then I assume we're going to just need to turn right around and create a second PR to make the fixes? Those fixes will change the baselines for the same two tests (we need to confirm this is in fact the case). So we either re-create the two baselines now and run against those new ones, or we make a second hotfix PR. At that point, we'll have to create an entire new baseline because the tests will have been added and the fixes will change those baselines.

@DeniseWorthen
Copy link
Collaborator

Remember also that Hera is on maintenance tomorrow.

@jkbk2004
Copy link

jkbk2004 commented Feb 5, 2024

I prefer to move on with merging this pr and applying a new hotfix pr. Then, we can combine with other independent PRs to avoid commit schedule jamming situation.

@BinLiu-NOAA
Copy link

BinLiu-NOAA commented Feb 5, 2024

I prefer to move on with merging this pr and applying a new hotfix pr. Then, we can combine with other independent PRs to avoid commit schedule jamming situation.

Thanks, @jkbk2004! The idea of potential combining the hot fix with other independent PR(s) sounds good to me. So, if @DeniseWorthen also agrees, it sounds reasonable to me to go with this approach. Thanks!

@DeniseWorthen
Copy link
Collaborator

I think it will be faster to re-create the two needed baselines and run against them now for this PR, given that Hera will be down tomorrow. Unless the PRs are combined and ready to run today, that means we won't be able to commit to UWM until Wednesday. @BinLiu-NOAA Does that work for the hafs v2 schedule?

…ling mode

to ensure the regional MOM6 ocean coupling variables being properly remapped to
the FV3ATM grid (without strange extrapolation). (From @binli2337)
@BinLiu-NOAA
Copy link

@jkbk2004 do we have another ufs-weather-model level PR to combined with today for this potential CMEPS hot fix PR (if choosing to merging first and hotfix after)?

@DeniseWorthen, for HAFSv2 code freeze, we definitely would like to have the ufs-weather-model side ready as early as possible. If the CMEPS hot fix is not in the develop branch, we will try to point to the CMEPS feature branch in that case (not ideal though). Thanks!

@jkbk2004
Copy link

jkbk2004 commented Feb 5, 2024

@binli2337 we can combine with ufs-community/ufs-weather-model#2115 and ufs-community/ufs-weather-model#2111. These are at the weather model level and no baseline changes. @uturuncoglu can you apply a quick fix today?

@DeniseWorthen
Copy link
Collaborator

Again, my concern is a) hera downtime tomorrow and b) the time it takes to get a combined pr tested and ready. Remember, once we combine them, we need to run just to confirm that the combined PRs have the intended impacts on the tests.

@BrianCurtis-NOAA
Copy link
Collaborator

BrianCurtis-NOAA commented Feb 5, 2024

It is my recommendation to bring the hotfix into this PR, and get Hera started immediately to beat the downtime tomorrow. I personally agree with Denise, and to me it makes better sense due to the high importance of this work towards a HAFS deadline. What I would hope from HAFS is that bugs like this are found before it makes it to the UFSWM so we don't need such a hotfix in the future.

@jkbk2004
Copy link

jkbk2004 commented Feb 5, 2024

@DeniseWorthen re-test is inevitable once we receive new commit.

@DeniseWorthen
Copy link
Collaborator

The hotfix will require re-creating baselines for the two new tests only. That should be relatively quick on hera, and then hera RTs can kick off today.

@jkbk2004
Copy link

jkbk2004 commented Feb 5, 2024

@uturuncoglu @BinLiu-NOAA please, move faster to apply the fix. @DeniseWorthen @BrianCurtis-NOAA Sanity check can be done with faster machine.

@BinLiu-NOAA
Copy link

Thanks @jkbk2004 @BrianCurtis-NOAA and @DeniseWorthen, I am updating this PR with the bug fixes now. Much appreciated.

@BrianCurtis-NOAA
Copy link
Collaborator

@uturuncoglu @BinLiu-NOAA please, move faster to apply the fix. @DeniseWorthen @BrianCurtis-NOAA Sanity check can be done with faster machine.

Don't worry about sanity check. Just run baselines and we can address if we start seeing failures.

@uturuncoglu
Copy link
Collaborator Author

Sorry all I could not respond to you about recent discussion due to time difference. I think we already made a decision by applying the fixes for this PR. Anyway, let me know if you need anything from my side. Thanks again all for your help.

@FernandoAndrade-NOAA
Copy link

The RT rerun on #2028 has completed successfully, please proceed with the merge process.

@BinLiu-NOAA
Copy link

The RT rerun on #2028 has completed successfully, please proceed with the merge process.

Thanks @FernandoAndrade-NOAA as well as all the code/repo Admins for the speedy RT processing! Much appreciated! @DeniseWorthen looks like this PR is ready to merge probably. Thanks!

@DeniseWorthen DeniseWorthen merged commit 624920d into NOAA-EMC:emc/develop Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.