
Upgrade 1-Degree to 025-Degree RYF Configuration #48

Merged: 10 commits merged into 025deg_jra55do_ryf on Apr 23, 2024

Conversation

@ezhilsabareesh8 (Contributor)

This pull request upgrades the 1-degree RYF configuration to a higher-resolution 0.25-degree RYF configuration.

Changes Implemented:

  1. A new CICE grid has been generated from the MOM supergrid to resolve the longitude mismatch issue described in PRs 1 and 2.

  2. The CFL violation error caused by bad departure points has been fixed by adjusting the time steps of CICE and MOM, similar to the OM2 quarter-degree configurations (a brief sketch appears at the end of this description). Note that only the time step has been changed; ndtd is unchanged, to avoid truncation errors in MOM6 (refer to the linked discussion).

  3. Fixed restart file issue: multiple restart files were being generated by MOM6 due to netCDF file limits. This has been addressed and fixed in the linked PR.

  4. Updated datm_in, drof_in, and nuopc.runconfig with a nomask ESMF mesh file for the atmosphere and runoff components to resolve NaNs.

  5. Used a default ice initial condition for cold start, which will be updated with the initial conditions generated by running this configuration for a longer period.

  6. All input files for this configuration have been updated with OM2 input files (refer to the linked page).

  7. This 0.25-degree configuration has been tested and run for two years without encountering any errors.

Note that the MOM6 parameters listed here still need to be updated, and the block size in ice_in needs to be optimised for performance.
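A brief sketch of the timestep change in item 2. The MOM_input value matches the diff reviewed below; the ice_in line is an assumption, based on the later discussion that the CICE timestep is currently tied to the coupling timestep:

  ! MOM_input
  DT = 1350.0                  ! was 1800.0 in the 1-degree configuration

  ! ice_in, &setup_nml (assumed: CICE timestep tied to the coupling timestep)
  dt = 1350.0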

@anton-seaice (Contributor) left a comment:

Well done Ezhil! Apologies for the many questions!

Can you rebase this to use the latest RYF config changes please?

5. Used a default ice initial condition for cold start, which will be updated with the initial conditions generated by running this configuration for a longer period.

For the 1deg we used initial conditions from a 3-hour run of OM2; did we still want to generate those?

Shall we update ice_in to match the OM2 config now or in a later PR?

Does @minghangli-uni have suggestions for performance changes or is that also a later PR?

Shall we try and make a clean commit history to explain each change of some sort? (Or is this all one merge commit?)

Review comments on manifests/input.yaml (7), metadata.yaml, config.yaml (2), and manifests/restart.yaml (outdated, resolved).
@minghangli-uni commented Apr 3, 2024

Shall we update ice_in to match the OM2 config now or in a later PR?
Does @minghangli-uni have suggestions for performance changes or is that also a later PR?

I suggest so. For 0.25deg, the recommended processor_shape is square-ice rather than slenderX2; similarly, the recommended distribution_type is roundrobin rather than cartesian. These two changes omit land-only blocks and improve load balance.

This is the updated &domain_nml in ice_in, which is consistent with OM2.

&domain_nml
  block_size_x = 16
  block_size_y = 15
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = 1000
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"
/

For comparison, the default 1deg setting:

&domain_nml
  block_size_x = 16
  block_size_y = 15
  distribution_type = "cartesian"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = 1000
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "slenderX2"
/

With the updated ice_in, the speedup of the ice component is around 35%, calculated from the mean time of two consecutive runs.

Shall we try and make a clean commit history to explain each change of some sort? (Or is this all one merge commit?)

A clean commit history is better for users to track and understand changes made.

@anton-seaice (Contributor) commented Apr 3, 2024

Thanks @minghangli-uni. I think you can push that change straight to this branch (025deg_jra55do_ryf_iss101), or if you prefer, make a PR into this branch for Ezhil to review first.

@minghangli-uni
Sure thing. I can make a PR into this branch soon.

MOM_input (outdated):

@@ -1,7 +1,7 @@
 ! This input file provides the adjustable run-time parameters for version 6 of the Modular Ocean Model (MOM6).

 ! === module MOM ===
-DT = 1800.0
+DT = 1350.0


Should DT_THERM be an integer multiple of DT?

Reply from the author (@ezhilsabareesh8):

Ideally, DT_THERM should be an integer multiple of DT and less than the forcing or coupling time-step (DT = 1350.0). However, there are no warnings or errors with the current DT_THERM, but we have to reduce it. Any thoughts, @aekiss?

Reply from a Collaborator:

If we want to do what you've done here, we'll also need to set THERMO_SPANS_COUPLING = True, which may require some testing.

But do we really want to change the coupling timestep? Is this done only because the CICE and coupling timesteps are currently linked? Could we instead try to relax this requirement?

(Sorry, lots of questions!)

Reply from the author (@ezhilsabareesh8):

This issue will be handled in a separate PR, as discussed here.

@dougiesquire (Collaborator) commented Apr 8, 2024:

I think we need to decide what to do about DT_THERM here. The current settings are confusing:
DT_THERM > DT but THERMO_SPANS_COUPLING = False (by default).

With these settings, I think the value of DT_THERM in MOM_input will not be used; instead, DT_THERM will be reset internally to DT. If we do want DT_THERM > DT, we need to set THERMO_SPANS_COUPLING = True, which will also require changing to DIABATIC_FIRST = False due to limitations in the MOM6 NUOPC cap.

Thoughts @aekiss?

Reply from a Contributor:

It's certainly confusing. Do we set single_step_call anywhere? It defaults to true, in which case DT, DT_THERM and THERMO_SPANS_COUPLING are ignored, according to this.

Additional relevant documentation in MOM6 cap is here.

Reply from a Contributor:

As discussed in today's meeting, let's do option 1 for now.

We can then explore option 2 with different DT_THERM in subsequent tests and PRs. (Note that when THERMO_SPANS_COUPLING = True, DT_THERM will be rounded down to an integer number of coupling timesteps.)
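For concreteness, here is a sketch of what the two options might look like in MOM_input. The DT_THERM = 5400.0 value in option 2 is taken from a test reported later in this thread; the pairing of settings follows the constraints discussed above and is illustrative, not a committed configuration:

  ! Option 1 (sketch): thermodynamics at the baroclinic/coupling timestep
  DT = 1350.0
  DT_THERM = 1350.0
  THERMO_SPANS_COUPLING = False

  ! Option 2 (sketch): thermodynamic timestep spanning coupling intervals
  DT = 1350.0
  DT_THERM = 5400.0             ! rounded down to a whole number of coupling timesteps
  THERMO_SPANS_COUPLING = True
  DIABATIC_FIRST = False        ! the NUOPC cap cannot span coupling with DIABATIC_FIRST = True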

@aekiss (Contributor) commented Apr 10, 2024:

Given how confusingly these parameters interact, it will be good to have MOM_parameter_doc.* added to the repos to provide a sanity check on what is actually being used for the tests when we do option 2 - see COSIMA/access-om3#117. We could do this manually until CI is in place to automate it.

@aekiss (Contributor) commented Apr 10, 2024:

In option 2, if DT_THERM exceeds the coupling timestep, but the coupling timestep matches the dynamic timestep in CICE and baroclinic DT in MOM6, would it make sense to set the thermo timestep in CICE to match that in MOM?

This would involve setting ndtd in CICE to DT_THERM/DT and setting the CICE thermo timestep to ndtd*DT.
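A sketch of that pairing, assuming for illustration DT_THERM = 2 × DT (these are illustrative values, not proposed settings):

  ! MOM_input (illustrative)
  DT = 1350.0
  DT_THERM = 2700.0             ! 2 x DT

  ! ice_in, &setup_nml (illustrative)
  ndtd = 2                      ! DT_THERM / DT
  dt = 2700.0                   ! CICE thermo timestep = ndtd * DT; dynamics still substep at DT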

Reply from a Collaborator:

Open question: can CICE run with a thermo timestep greater than the coupling timestep?

@ezhilsabareesh8 (Contributor, Author)

For the 1deg we used initial conditions from a 3-hour run of OM2; did we still want to generate those?

There's an ongoing discussion about the ice initial conditions as highlighted in issue #50. We initially used the default option due to uncertainties in using OM2-generated initial conditions when switching to a C-grid, as discussed here.

The question remains: should we generate these initial conditions, or run the current configuration for a longer duration and examine the ice-area output to judge whether OM2-generated initial conditions are needed?

@dougiesquire (Collaborator) left a comment:

Thanks @ezhilsabareesh8. A couple of initial comments. I'd still like to try to run the configuration, but I haven't had time today.

I think we maybe need to discuss whether we really want to be changing the coupling timestep here (and if so, what we want to do with MOM's thermodynamic timestep).

Also, from what I can tell, the tool to generate the CICE grid isn't yet finalised. Should this wait until there is a single established tool?

Review comments on MOM_input, config.yaml, ice_in, metadata.yaml, nuopc.runconfig, and manifests/input.yaml (resolved).
@dougiesquire (Collaborator)

@ezhilsabareesh8 I see you just force-pushed changes, which has made GitHub think that some of my review comments are outdated. Please view my comments in this conversation rather than in the "Files changed" tab, where a number of comments have disappeared.

@anton-seaice (Contributor)

For the 1deg we used initial conditions from a 3-hour run of OM2; did we still want to generate those?

There's an ongoing discussion about the ice initial conditions as highlighted in issue #50. We initially used the default option due to uncertainties in using OM2-generated initial conditions when switching to a C-grid, as discussed here.

The question remains: should we generate these initial conditions, or run the current configuration for a longer duration and examine the ice-area output to judge whether OM2-generated initial conditions are needed?

I think we should generate the initial conditions, so that:

  • the config is as close as possible to OM2 for verification of output, and
  • we reduce the number of potential stability issues during model spin-up (noting the same initial conditions will probably be used for the CM3 initial set-up too).

@anton-seaice (Contributor)

I think we maybe need to discuss whether we really want to be changing the coupling timestep here (and if so, what we want to do with MOM's thermodynamic timestep).

I guess we could reduce the CICE timestep without reducing the coupling timestep? Is that what you are thinking? Isn't changing the coupling timestep here just matching the OM2 configuration?

Also, from what I can tell, the tool to generate the CICE grid isn't yet finalised. Should this wait until there is a single established tool?

I think we should progress:

  • We will have to update all the configs when the CICE grid is settled.
  • Finalising this PR allows other improvements to be made (noting this is still a config under development), e.g. any CICE parameter configs, performance tuning, etc.

@minghangli-uni
But do we really want to change the coupling timestep? Is this done only because the CICE and coupling timesteps are currently linked? Could we instead try to relax this requirement?

@dougiesquire In the current NUOPC setup, if the coupling timestep equals or exceeds 1800 s, the two issues below popped up:

  1. FATAL from PE 100: write energy: Ocean velocity has been truncated too many times
  2. (abort ice) error = (diagnostic abort) ERROR: bad departure points

From my perspective, the issues mainly stem from CICE, irrespective of the value assigned to ndtd in the current tests, whether it's set to 1 or 2.

@dougiesquire (Collaborator)

I guess we could reduce the CICE timestep without reducing the coupling timestep? Is that what you are thinking?

Yeah, really just wondering whether that's even sensible. Probably not. And yeah, in OM2 the atmosphere-to-ice coupling is fixed at 3 hourly, but the ice-to-ocean coupling changes with the ocean-baroclinic/ice-thermodynamic timestep (which are the same). In MOM6 we have to think about what we want to do with DT_THERM - i.e. do we want this to be longer than the coupling timestep?

@dougiesquire (Collaborator) commented Apr 4, 2024

@dougiesquire In the current NUOPC setup, if the coupling timestep equals or exceeds 1800 s, the two issues below popped up

Okay, so maybe we don't really have a choice (at least for now). Changing the coupling timestep along with the ice and mom-baroclinic timesteps is certainly the easiest approach and aligns best with OM2. So the question that remains is what to do now with DT_THERM. In the long run, I presume we'll want this to be longer than the coupling timestep, so maybe now is the time to give THERMO_SPANS_COUPLING a whirl?

(P.S. the reason I'm so hung up on DT_THERM/THERMO_SPANS_COUPLING is because I'll need to think about how this impacts WOMBAT coupling)

@anton-seaice (Contributor)

For reference, Kieran did test a patch to the CICE NUOPC driver to run CICE several times per coupling timestep when he was messing around with the atmosphere coupling. It's looking like it won't be needed though.

@anton-seaice (Contributor)

Also - can someone make the 025deg_jra55do_ryf branch write-protected?

@minghangli-uni commented Apr 4, 2024

Okay, so maybe we don't really have a choice (at least for now). Changing the coupling timestep along with the ice and mom-baroclinic timesteps is certainly the easiest approach and aligns best with OM2. So the question that remains is what to do now with DT_THERM. In the long run, I presume we'll want this to be longer than the coupling timestep, so maybe now is the time to give THERMO_SPANS_COUPLING a whirl?

@dougiesquire These two, COSIMA/mom6-panan#28 and https://github.com/jsimkins2/nwa25/tree/main/run, could be useful for dt_therm selection.

For a quick update, my current test case with DT=1350 and DT_THERM=5400 can run up to 2 years and is still running now.

@dougiesquire (Collaborator) commented Apr 4, 2024

For a quick update, my current test case with DT=1350 and DT_THERM=5400 can run up to 2 years and is still running now.

Did you also set THERMO_SPANS_COUPLING = True? I'm guessing not, because the NUOPC cap cannot handle THERMO_SPANS_COUPLING = True with DIABATIC_FIRST = True.

@minghangli-uni
Did you also set THERMO_SPANS_COUPLING = True? I'm guessing not, because the NUOPC cap cannot handle THERMO_SPANS_COUPLING = True with DIABATIC_FIRST = True.

No. THERMO_SPANS_COUPLING is False by default, and DIABATIC_FIRST = False.

@dougiesquire (Collaborator)

No. THERMO_SPANS_COUPLING is False by default, and DIABATIC_FIRST = False.

If you do not set THERMO_SPANS_COUPLING = True then your thermodynamic timestep is just getting reset internally to the baroclinic timestep. Also, have you changed the DIABATIC_FIRST setting because in this PR it is set to True?

@minghangli-uni
have you changed the DIABATIC_FIRST setting because in this PR it is set to True?

It was discussed in https://forum.access-hive.org.au/t/namelist-configuration-discussion-meeting/1917/8, where it was agreed to leave it at the default for now.

@ezhilsabareesh8 (Contributor, Author)

have you changed the DIABATIC_FIRST setting because in this PR it is set to True?

It was discussed in https://forum.access-hive.org.au/t/namelist-configuration-discussion-meeting/1917/8, where it was agreed to leave it at the default for now.

Hi @minghangli-uni, did you change DIABATIC_FIRST to True in the 1deg_jra55do_ryf branch? This PR is rebased from that.

@minghangli-uni
For 1deg_jra55do_ryf, it is True. But following the MOM6 parameter discussion, I haven't run 1deg_jra55do_ryf.

@dougiesquire (Collaborator) commented Apr 4, 2024

I think the DIABATIC_FIRST issue should be handled in a separate PR. Here, as in our other config branches, it is currently set to True.

ADDED: Of course, that is, unless we decide to have DT_THERM > the coupling timestep. In which case, we need to set THERMO_SPANS_COUPLING = True for it to have any effect. Then we will also have to change to DIABATIC_FIRST = False since the NUOPC cap isn't set up to run with THERMO_SPANS_COUPLING = True and DIABATIC_FIRST = True.

@dougiesquire (Collaborator)

@ezhilsabareesh8 I'm just doing a test run with this PR and if that's successful I'll approve. Are you happy for me to rebase this and deal with the conflicts?

@aekiss (Contributor) commented Apr 10, 2024

Do we also want to merge this or do that as a separate PR?
https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135
COSIMA/access-om3#135

@ezhilsabareesh8 (Contributor, Author)

I'm just doing a test run with this PR and if that's successful I'll approve. Are you happy for me to rebase this and deal with the conflicts?

Thanks @dougiesquire, yes that would be great.

@dougiesquire (Collaborator)

Do we also want to merge this or do that as a separate PR?
https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135
COSIMA/access-om3#135

I think the changes in https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135 that aren't already included in this PR should be handled in separate PR(s). @minghang, can you have another look at the changes here to check that you're happy with them before we merge. The CICE block_size_x, block_size_y and max_blocks, in particular, are different than what you propose in COSIMA/access-om3#135

@minghangli-uni
Hi @dougiesquire My account is @minghangli-uni instead of @minghang, which you have been referring to.

Regarding the parameters block_size_x=30 and block_size_y=27, these values were directly grabbed from the OM2 technical report. And max_blocks=8 is calculated from this code. As I used 288 cores, the corresponding value for max_blocks was 8. However, for the current config, which employs 240 cores, max_blocks should be set to 10.

Given that we have reliable references to validate these parameter values, I prefer to maintain block_size_x=30, block_size_y=27, and max_blocks=10.

I am happy with the rest of the changes.
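For reference, a rough check of that arithmetic (a Python sketch, not the referenced script, which remains the authoritative calculation):

  # Block-count arithmetic for the 0.25-degree CICE grid (sketch only)
  nx_global, ny_global = 1440, 1080
  block_size_x, block_size_y = 30, 27

  blocks_total = (nx_global // block_size_x) * (ny_global // block_size_y)  # 48 * 40 = 1920

  for nprocs in (288, 240):
      print(nprocs, blocks_total / nprocs)
  # 288 cores -> 6.67 blocks per PE on average (max_blocks = 8 gives headroom)
  # 240 cores -> 8.00 blocks per PE on average (max_blocks = 10 gives headroom)

With roundrobin, some PEs receive more blocks than the average, so max_blocks sits above these averages.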

@dougiesquire (Collaborator)

My account is @minghangli-uni instead of @minghang, which you have been referring to.

Whoops, sorry!

@minghangli-uni
No problem at all. My concern was that, with the wrong person quoted, I might not receive your messages in time.

@anton-seaice (Contributor)

Regarding the parameters block_size_x=30 and block_size_y=27, these values were directly grabbed from the OM2 technical report. And max_blocks=8 is calculated from this code. As I used 288 cores, the corresponding value for max_blocks was 8. However, for the current config, which employs 240 cores, max_blocks should be set to 10.

The larger block sizes seem like a more sensible default :)

Feel free to mess with the numbers as much as you would like during the profiling work. The cice doco states:

In practice, blocks should probably not have fewer than about 8 to 10 grid cells in each direction, and more square blocks tend to optimize the volume-to-surface ratio important for communication cost. Often 3 to 8 blocks per processor provide the decomposition flexibility to create reasonable load balance configurations.

So moving to bigger blocks rather than increasing max_blocks might make sense.

Also see https://cice-consortium-cice.readthedocs.io/en/main/user_guide/ug_implementation.html#fig-distribscorecard . I don't know why we use 'sectrobin' either.

Anyway - this is for future work, not this PR :)

@minghangli-uni
We use roundrobin

https://github.com/COSIMA/MOM6-CICE6/blob/185b44f46f10aa4700fd3b094dfa7fec9484f1e9/ice_in#L70

@dougiesquire (Collaborator)

@ezhilsabareesh8 I'm just doing a test run with this PR and if that's successful I'll approve.

So this ran for me, but it took 19 hours to complete a 1-year run. Is that expected?

@minghangli-uni
Yes, with the current configuration, using DT = DT_THERM = 1350 s, a year-long run took approximately 19 hours to complete.

@dougiesquire (Collaborator) left a comment:

Thanks @ezhilsabareesh8 et al. LGTM and ran as is.

@anton-seaice (Contributor)

Can this get merged?

@dougiesquire (Collaborator)

@ezhilsabareesh8 we're happy for you to merge

@ezhilsabareesh8 (Contributor, Author)

@ezhilsabareesh8 we're happy for you to merge

Thanks @dougiesquire, I will merge then.

@ezhilsabareesh8 merged commit 56fe064 into 025deg_jra55do_ryf on Apr 23, 2024.
@dougiesquire deleted the 025deg_jra55do_ryf_iss101 branch on April 23, 2024 07:09.