Problem with 'use_leap_years' #343

abouchat · 2019-08-07T19:11:14Z

When running the QC test case with the 'boxrestore' option activated, I get errors in the thermo step as soon as Jan 1st 2000 is reached. This seems to be due to having the option 'use_leap_years = .true.' activated in the namelist (2000 is a leap year). If I manually put 'use_leap_years = .false.' then the case runs and there are no errors.

Maybe related to #162 ?
Might also be related to using 'ktherm = 0' (i.e. zero-layer thermodynamics) in conjunction with leap years since the errors seem to originate from the 'zerolayer_temperature' routine. See sample of errors below:

(zerolayer_temperature)Thermo iteration does not converge,
 (zerolayer_temperature)Ice thickness:  0.210543751420792
 (zerolayer_temperature)Snow thickness:  6.959153238059593E-002
 (zerolayer_temperature)dTsf, Tsf_errmax:  -325.110616024919       5.0000000000
00000E-004
 (zerolayer_temperature)Tsf:  -1982.97625185358
 (zerolayer_temperature)fsurfn:  -5880.71078384918
 (zerolayer_temperature)fcondtopn, fcondbot  -5874.50719981974       -5874.5071
9981974
   (icepack_warnings_setabort) T :file icepack_therm_0layer.F90 :line
303
(zerolayer_temperature) zerolayer_temperature: Thermo iteration does not conver
ge
(icepack_warnings_aborted) ... (thermo_vertical)
(icepack_warnings_aborted) ... (icepack_step_therm1)
(icepack_step_therm1) ice: Vertical thermo error:
(icepack_warnings_aborted) ... (icepack_step_therm1)

and so on....

The text was updated successfully, but these errors were encountered:

phil-blain · 2019-08-07T19:32:53Z

The use_leap_year namelist flag is only active for the option 'boxrestore', and in the test suites the 'boxrestore' option only runs for 1 or 2 days (I think) so this might not have been noticed before.

dabail10 · 2019-08-07T19:35:56Z

This is very curious. I don't know why the leap year option would cause a thermodynamic error like this. Did you try turning on use_leap_years for a ktherm=1 or ktherm=2 case? I have a feeling this is just coincidental.

dabail10 · 2019-08-07T19:37:07Z

Actually, one potential issue is that shortwave is not being generated correctly for the "leap days". However, I would think this is only a problem when you reach Feb 29.

eclare108213 · 2019-08-07T22:00:14Z

The code tends to fail with thermodynamic errors if anything is wrong with any of the forcing, so most likely having the extra day is causing some forcing field to be computed or read in incorrectly, for leap years. Could be shortwave, might be something else. Good catch.

abouchat · 2019-08-09T13:23:26Z

I ran new cases with ktherm=1 or ktherm=2 and use_leap_years on, and the thermodynamic also fails when reaching Jan. 1st 2000 in both cases.

dabail10 · 2019-08-09T15:40:07Z

Thanks for checking. In the CESM, the driver deals with leap years so we haven't seen this in CICE before. Good catch as Elizabeth said.

eclare108213 · 2019-08-09T17:16:55Z

@abouchat would you mind setting up a test to debug the forcing? Use -s debug for the setup, and swap out CICE_RunMod.F90 with the version that prints out various fields (including forcing), called CICE_RunMod.F90_debug (it should be in the same driver directory, and identical except for the print calls). You'll probably need to change the timestep on which printing begins, in ice_diagnostics.F90 -- it just needs to start a few timesteps before the error occurs, so that you can see what changes. If you don't have time, one of the others (maybe me) can do it. Thanks!

abouchat · 2019-08-09T18:43:28Z

I am afraid I won't have time to complete this as my internship at ECCC ends on Aug. 13th, and I won't be able to access the output after this. However, @phil-blain mentioned he will take care of this soon!

phil-blain · 2019-08-28T18:19:32Z

I'm looking at this now. Just setting use_leap_year=true and compiling in debug mode makes the code abort with a division by zero error :

 Forcing data year =         2005
 Atmospheric data files:
 /users/dor/afsg/phb/local/FORCING/CICE_data/forcing/gx1/COREII/MONTHLY/cldf.omi
 p.dat
 /users/dor/afsg/phb/local/FORCING/CICE_data/forcing/gx1/COREII/MONTHLY/prec.nmy
 r.dat
 /users/dor/afsg/phb/local/FORCING/CICE_data/forcing/gx1/COREII/4XDAILY/u_10.200
 5.dat
 /users/dor/afsg/phb/local/FORCING/CICE_data/forcing/gx1/COREII/4XDAILY/v_10.200
 5.dat
 /users/dor/afsg/phb/local/FORCING/CICE_data/forcing/gx1/COREII/4XDAILY/t_10.200
 5.dat
 /users/dor/afsg/phb/local/FORCING/CICE_data/forcing/gx1/COREII/4XDAILY/q_10.200
 5.dat
  Current forcing data year =         2005
forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source
cice               0000000001889FC5  Unknown               Unknown  Unknown
cice               0000000001887D87  Unknown               Unknown  Unknown
cice               000000000183E104  Unknown               Unknown  Unknown
cice               000000000183DF16  Unknown               Unknown  Unknown
cice               00000000017CFB36  Unknown               Unknown  Unknown
cice               00000000017D45B6  Unknown               Unknown  Unknown
libpthread.so.0    00007F5118376330  Unknown               Unknown  Unknown
cice               0000000000A70FC6  ice_forcing_mp_co        2286  ice_forcing.F90
cice               0000000000A6A6E6  ice_forcing_mp_ly        2177  ice_forcing.F90
cice               0000000000A2FBBC  ice_forcing_mp_ge         541  ice_forcing.F90
cice               0000000000437DB2  cice_initmod_mp_c         192  CICE_InitMod.F90
cice               0000000000436D6B  cice_initmod_mp_c          51  CICE_InitMod.F90
cice               0000000000436511  MAIN__                     44  CICE.F90
cice               000000000043649E  Unknown               Unknown  Unknown
libc.so.6          00007F5117FBEF45  Unknown               Unknown  Unknown
cice               00000000004363A9  Unknown               Unknown  Unknown

That is on the gx1 grid. It does not happen on gx3.
I will run the code in Totalview to investigate where this comes from.

phil-blain · 2019-08-28T19:45:20Z

the divide by zero error comes from line 2286 in function compute_shortwave of ice_forcing.F90:

e = 1.e5*Qa(i,j)/(0.622_dbl_kind + 0.378_dbl_kind*Qa(i,j))

For some reason the denominator is exactly zero for some indices. In my case it errors at
iblk = 2
i = 32
j =21
when creating the case with -g gx1 -p 1x1.

So the values in Qa are different depending on if use_leap_years is set are not...

eclare108213 · 2019-08-28T20:19:16Z

That's weird. Qa would have to be negative for the denominator to be zero, so it's getting corrupted. Are you testing using data that has realistic values for 366 days in a year? If not, then it does not make sense to set use_leap_years=T, and perhaps we should add a consistency check on this with an abort. The new JRA55 data does include leap days, I believe, but probably (hopefully?) only for 2008.

phil-blain · 2019-08-28T20:20:18Z

The values in Qa are different because of the values of ixm, ixp which are passed to subroutine read_data at line 2134 of ice_forcing.F90, in subroutine LY_data.
The indices ixm, ixp themselves are computed using recnum, which is computed at line 2104 in LY_data using the variable yday from module ice_calendar.

      recnum = 4*int(yday) - 3 + int(real(sec,kind=dbl_kind)/sec6hr)

When use_leap_years=false, I get yday=1, but with use_leap_years=true, I get yday=362, leading to different values of recnum.
yday is set in init_calendar of ice_calendar.F90 so something is fishy in this calendar initialization... so this is definitely related to #162.

phil-blain · 2019-08-28T20:23:40Z

I'm using COREII forcing with the standard namelist values, so

 year_init      = 1997
 use_restart_time = .true.
fyear_init      = 2005
ycycle          = 1

maybe there is an interaction between year_init and fyear_init... I will try some other combination of options. However this happens at initialization so I think it is related to the calendar initialization

eclare108213 · 2019-08-28T20:24:56Z

I don't think the COREII forcing includes leap days, so it probably won't work no matter what years you put in the namelist.

phil-blain · 2019-08-28T20:50:37Z

I agree, but I would have thought that the problem would manifest itself later on. I investigated further and here is what I found:
the value of yday is computed first (=1) in init_calendar (called from cice_init) along with other calendar variables. These variables all get the same values if use_leap_year is set to True or False (except basis_seconds, but from what I understand I think this is normal for this variable).
Then, cice_init calls calendar(time) (line 182) and then the subroutine calendar computes new values for yday (and other values) at line 248 :

call sec2time(nyr,month,mday,basis_seconds+ttime)

After this call, with use_leap_years=false we get :

nyr = 2017 (off by 20 years from year_init, as we mentioned in #350)
month = 1
mday = 1

but with use_leap_years=true we get :

nyr = 2016
month = 12
mday = 27 (4 days difference with the above)

then yday is computed:

 yday = mday + daycal(month)   ! day of the year

So this explain why yday has different values : this comes from the subroutine sec2time which computes stuff differently when use_leap_years=True.

phil-blain · 2019-08-28T21:32:39Z

When using use_restart_time=.false., the divide by zero is not encountered and the code seems to run ok.
Same when using year_init=2005.

I think this issue should be taken care of carefully as part of #162. I wanted to see if a quick fix was possible before the release, but it seems it is not...
I think aborting the code early in the initialization process would be safer, but what should we check for ? @eclare108213 do you think we should add this abort before the next release or we wait ?

dabail10 · 2019-08-28T21:34:16Z

Did you try re-running with use_restart_time = .false.? I think the problem is that it is counting steps from year_init to the date on the restart file. When leap_years is .true., this means it counts the extra leap days between 1997 and 2017. Now, that said it shouldn't matter for the forcing, because the forcing cycle repeats right? When reading the LY data, the relevant variables that it needs it to call interp_coeff. The variable maxrec is hard coded to 365x4 = 1460 as the LY data do not have leap days. The variable recnum is the time slice in the model year. So, ixm will be the first slice in the LY data it needs. ixx is the second slice and ixp is not used in this case. So, marching through the year:

recnum, ixm, ixx, ixm (leap), ixx (leap)
1, 2, 1, 2, 1
2, 3, 2, 3, 2
3, 4, 3, 4, 3
...

1459, 1460, 1459, 1460, 1459
1460, 1, 1460, 1, 1460
1461, *, *, 2, 1
1462, *, *, 3, 2
1463, *, *, 4, 3
1464, *, *, 5, 4

So, in a leap year, you are getting January 1st/2nd slices from the LY 4x daily data. Maybe not ideal, but it works. I just used the following code in NCL:

recnum = ispan(1,1464,1)
ixm = mod(recnum+1460,1460)+1
ixx = mod(recnum-1,1460)+1

Just saw the note about use_restart_time = .false. So, I guess the issue has to do with the difference between the restart time and fyear_init.
Dave

eclare108213 · 2019-08-30T16:35:22Z

Please add a check/abort to subroutine init_forcing_atmo. I'd check whether the forcing data is JRA55 or default, and abort if it's not. Hard-coding is not optimal, but this is a temporary fix until we can overhaul the time manager.

Alternatively, we could reset use_leap_year to false if the forcing is not JRA55, but we're moving away from less-than-transparent changes like that. Aborting is annoying but better.

rallard77 · 2019-09-11T13:38:49Z

Sure, we can do that... Rick

…

On 9/11/2019 8:32 AM, Elizabeth Hunke wrote: How about using 2008 instead of 2005 for the JRA55 test? Sent with BlackBerry Work (www.blackberry.com) From: rallard77 ***@***.******@***.***>> Date: Wednesday, Sep 11, 2019, 7:27 AM To: CICE-Consortium/CICE ***@***.******@***.***>> Cc: Hunke, Elizabeth Clare ***@***.******@***.***>>, Mention ***@***.******@***.***>> Subject: Re: [CICE-Consortium/CICE] Problem with 'use_leap_years' (#343) The present JRA55gx1 test is short and for 2005. We can add a test for 2008 with use_leap_years = true if the group feels it would be worthwhile. Rick On 9/11/2019 8:10 AM, Matthew Turner wrote: > > Is the JRA55 test sufficient enough for testing use_leap_years? Or do > we need/want additional tests? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#343?email_source=notifications&email_token=AG63LB5XQ6AZXTL4AFGA4C3QJDU3PA5CNFSM4IKDN232YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6ONSCY#issuecomment-530372875>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AG63LB5BFQYKTLCIGWVIFGTQJDU3PANCNFSM4IKDN23Q>. > — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#343?email_source=notifications&email_token=AB6LTHYRI5GPY5XCZFAFBUDQJDW3NA5CNFSM4IKDN232YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6OPIKY#issuecomment-530379819>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AB6LTHYX2WKE2THILRALJETQJDW3NANCNFSM4IKDN23Q>. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#343?email_source=notifications&email_token=AG63LB2AVDAO2WNFPJBLDJDQJDXNJA5CNFSM4IKDN232YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6OPXJI#issuecomment-530381733>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AG63LB3SLIGRT3HLHZLG3CLQJDXNJANCNFSM4IKDN23Q>.

apcraig · 2021-02-02T20:55:20Z

I have a new time manager which should fix many of the problems, although I'm not sure we understand all of them. I'm also reviewing the ice_forcing.F90 code and also testing the older time manager myself. I'm a little surprised by a few things in the implement. In particular, that we still compute the "record number" of the input data instead of reading the time axis from the file. I understand this is probably a hold-over from binary input days, but it's far from ideal. As part of that, it's very difficult to deal with cases where the model calendar and the input data calendar are different. In the case where the input data is gregorian, but we're running noleap in the model, the skipped input data will be Dec 31 on leap years. That seems rather arbitrary. For cases where the model is gregorian and the input data is no leap, it's likely the model is just going to abort trying to read data on Dec 31 that doesn't exist. All this also has to assume the input data frequency (8x, 4x, etc) and there are no checks that the dates are consistent in any way. It's all hardwired into the code.

I'm a little reluctant to fix all this now, but I will review the implementation and add some extra checks to the JRA55 forcing where I can. I'm not going to touch the other forcing modes at this point. I think if we do improve anything, it should be the JRA55 forcing. If I think it's doable, I might go ahead and rewrite some of it. Let me know if that raises any concerns.

eclare108213 · 2021-02-02T21:28:03Z

LOL! @apcraig thanks for pointing this out. Now that you say it, of course I recognize that you're right, but I hadn't even considered how hard-wired the forcing code is. It indeed is a hold-over from binary input, which we aren't long past (e.g. the gx3 core forcing still isn't completely deprecated). My recommendation is to consider fixing these issues for JRA and not worry about the older forcing routines -- I'd like to remove them from the code completely at some point.

apcraig · 2021-02-02T21:32:42Z

In terms of JRA55 forcing, it looks like the model reads the forcing every timestep and that it effectively does no interpolation because it reads the same record into the two input data
slots. Is this what we expect? You can see this in the code here,

     fieldname = 'airtmp'
     call ice_read_nc(ncid,recnum,fieldname,Tair_data(:,:,1,:),debug_n_d, &
          field_loc=field_loc_center, &
          field_type=field_type_scalar)
     call ice_read_nc(ncid,recnum,fieldname,Tair_data(:,:,2,:),debug_n_d, &
          field_loc=field_loc_center, &
          field_type=field_type_scalar)

where recnum is the same in both calls and this is called every timestep even if recnum is not changing. I would have expected that the JRA55 is read only when needed and that it reads two consecutive input times and interpolates between them.

One other thing, it also looks like when the data is shifting from one year to the next, the forcing is going to not interpolate from the end of the prior year, but from the end of the current year in some forcing cases.

Something else I've noticed, the JRA55 input files do have a time axis on them and it seems to be correct for the gregorian calendar. All that is good. But the first timestep on the file is basically Jan 1, 00z. The documentation in the code for the JRA55 data says

!-------------------------------------------------------------------                        
! 3-hourly data                                                                             
!                                                                                           
! Assume that the 3-hourly value is located at the end of the                               
!  3-hour period.  This is the convention for NCEP reanalysis data.                         
!  E.g. record 1 gives conditions at 3 am GMT on 1 January.                                 
!-------------------------------------------------------------------

That suggests to me the time axis defined on the JRA55 input files is off by 3 hours or the model is assuming the data is offset by 3 hours and it isn't. We need to make these consistent. @rallard77 @daveh150 any comments on any of these points? Thanks.

daveh150 · 2021-02-02T22:08:57Z

Thanks for finding this bug Tony. What should be happening is the 2nd call to ‘ice_read_nc’ for each variable should read recnum+1. A check that needs to happen though is that recnum+1 exists. On the last day of a year, the code will try to read a non-existent record. IN this case it probably should read the first record. I can work some code to make this fix. Let me look more into your second question regarding 3 hour offset vs time when we wrote it into the interpolated files. @eclare108213 edited this entry to remove the redundant text included in the email reply.

apcraig · 2021-02-02T22:30:21Z

@daveh150, thanks for the clarification. I will fix this in the code. I'm working on a few other things in the same space. And I will try to address the wrap-around issue as well. For a single year, it's fine. For multiple years, you really do want to read data from two different years during the period where you're between the last timestep on one file and the first from the other.

daveh150 · 2021-02-02T22:43:21Z

Sounds good, Thanks Tony! Let me know if I can help out at all. @eclare108213 edited this entry to remove the redundant text included in the email reply.

apcraig · 2021-02-27T01:47:48Z

I think we can close this soon. The JRA55 bugs have been addressed in #562. I believe the problems that @phil-blain pointed out in the initial date when switching between leap years off and on will be fixed with the new time manager. The new time manager keeps track of the date, not the time in seconds and uses the date information on restart. It could be that there are still some issues, but the JRA55 forcing should work fine with noleap or leap calendars. With noleap it'll just be skipping Dec 31 data of leap years. If someone can summarize a test for me to do with the new time manager, I can give that a try. I do think our experience and bug fixing in the longer JRA55 runs has provided a relatively robust place for testing.

apcraig · 2021-02-27T01:49:57Z

I think one outstanding issue maybe unrelated to the time manager is

Do we have an issue with a divide by zero with Qa that we need to code around or abort on?

apcraig · 2021-03-16T23:07:19Z

This should be fixed in #566

apcraig mentioned this issue Aug 13, 2019

time manager concerns #162

Closed

apcraig assigned apcraig, eclare108213, dabail10, proteanplanet, phil-blain and abouchat Aug 13, 2019

apcraig added Forcing Priority: High Software Engineering Type: Bug labels Aug 13, 2019

rallard77 mentioned this issue Aug 28, 2019

Added new capability in ice_forcing.F90 to test with JRA55 atmospheri… #350

Merged

15 tasks

phil-blain mentioned this issue Nov 24, 2020

Replace forcing for gx1 #249

Closed

phil-blain mentioned this issue Jan 26, 2021

JRA forcing 1996 to 1997 #558

Closed

apcraig mentioned this issue Feb 5, 2021

Update JRA55 forcing implementation #562

Merged

16 tasks

apcraig mentioned this issue Feb 27, 2021

Update Time Manager #566

Merged

16 tasks

apcraig closed this as completed Mar 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with 'use_leap_years' #343

Problem with 'use_leap_years' #343

abouchat commented Aug 7, 2019

phil-blain commented Aug 7, 2019

dabail10 commented Aug 7, 2019

dabail10 commented Aug 7, 2019

eclare108213 commented Aug 7, 2019

abouchat commented Aug 9, 2019

dabail10 commented Aug 9, 2019

eclare108213 commented Aug 9, 2019

abouchat commented Aug 9, 2019

phil-blain commented Aug 28, 2019

phil-blain commented Aug 28, 2019

eclare108213 commented Aug 28, 2019

phil-blain commented Aug 28, 2019 •

edited

Loading

phil-blain commented Aug 28, 2019 •

edited

Loading

eclare108213 commented Aug 28, 2019

phil-blain commented Aug 28, 2019 •

edited

Loading

phil-blain commented Aug 28, 2019

dabail10 commented Aug 28, 2019

eclare108213 commented Aug 30, 2019

rallard77 commented Sep 11, 2019 via email

apcraig commented Feb 2, 2021

eclare108213 commented Feb 2, 2021

apcraig commented Feb 2, 2021

daveh150 commented Feb 2, 2021 via email •

edited by eclare108213

Loading

apcraig commented Feb 2, 2021

daveh150 commented Feb 2, 2021 via email •

edited by eclare108213

Loading

apcraig commented Feb 27, 2021

apcraig commented Feb 27, 2021

apcraig commented Mar 16, 2021

Problem with 'use_leap_years' #343

Problem with 'use_leap_years' #343

Comments

abouchat commented Aug 7, 2019

phil-blain commented Aug 7, 2019

dabail10 commented Aug 7, 2019

dabail10 commented Aug 7, 2019

eclare108213 commented Aug 7, 2019

abouchat commented Aug 9, 2019

dabail10 commented Aug 9, 2019

eclare108213 commented Aug 9, 2019

abouchat commented Aug 9, 2019

phil-blain commented Aug 28, 2019

phil-blain commented Aug 28, 2019

eclare108213 commented Aug 28, 2019

phil-blain commented Aug 28, 2019 • edited Loading

phil-blain commented Aug 28, 2019 • edited Loading

eclare108213 commented Aug 28, 2019

phil-blain commented Aug 28, 2019 • edited Loading

phil-blain commented Aug 28, 2019

dabail10 commented Aug 28, 2019

eclare108213 commented Aug 30, 2019

rallard77 commented Sep 11, 2019 via email

apcraig commented Feb 2, 2021

eclare108213 commented Feb 2, 2021

apcraig commented Feb 2, 2021

daveh150 commented Feb 2, 2021 via email • edited by eclare108213 Loading

apcraig commented Feb 2, 2021

daveh150 commented Feb 2, 2021 via email • edited by eclare108213 Loading

apcraig commented Feb 27, 2021

apcraig commented Feb 27, 2021

apcraig commented Mar 16, 2021

phil-blain commented Aug 28, 2019 •

edited

Loading

phil-blain commented Aug 28, 2019 •

edited

Loading

phil-blain commented Aug 28, 2019 •

edited

Loading

daveh150 commented Feb 2, 2021 via email •

edited by eclare108213

Loading

daveh150 commented Feb 2, 2021 via email •

edited by eclare108213

Loading