Divide-by-zero in cvmix_math_cubic_root_find() #65

nichannah · 2016-06-02T03:09:05Z

Newton's method is encountering a stationary point. Suggest using one of the bracketed algorithms?

mnlevy1981 · 2016-06-02T05:36:50Z

Would it be possible to include a block of code along the lines of

if (abs(slope).lt.CVMIX_MATH_NEWTON_TOL) then
  print*, 'x0 = ', x0
  print*, 'root = ', root
  print*, 'coeffs(:) = ', coeffs
end if

Just prior to root = root - fun_val/slope line? Knowing the values of x0, coeffs(:), and root that are resulting in the divide-by-zero might be helpful in figuring out how to define the bracket if we fall back to a bracketing algorithm.

Relatedly, another ocean parameterization library I'm working on has a logging module that makes it easy to pass status messages / error codes back to the GCM. While it wouldn't fix your problem, I think having something like that (and being able to trace the calls that are resulting in the divide-by-zero, in this case perhaps getting k values from compute_OBL_depth) would be extremely useful. So I might implement something like that first, and then try to find a better root-finder afterwards.

I'll try to find some time this weekend or next to work on this, but if you guys make any progress at GFDL please keep me in the loop!

nichannah · 2016-06-03T13:07:38Z

Hi Michael,
Here is some output just prior to the crash. I agree that a return value would be useful in this case. Regarding the bracketing, is there any harm in making this wide (deep)? It could be something that is only done when Newton's method fails.

slope = 0.000000000000000E+000
x0 = -2.05240021255362
root = -2.05240021255362
coeffs(:) = 2.939341553426429E+030 4.296445014121396E+030
2.093375837620301E+030 3.399882447875836E+029
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
MOM6 000000000077D8E5 cvmix_math_mp_cvm 205 cvmix_math.F90
MOM6 0000000000A5E8F4 cvmix_kpp_mp_cvmi 1342 cvmix_kpp.F90
MOM6 000000000073FEEC mom_kpp_mp_kpp_ca 607 MOM_KPP.F90

nichannah · 2016-06-09T06:14:42Z

Michael, would you accept a temporary solution to this problem whereby we do newton's method on the cubic/quadratic and if that fails we fall back to doing it on a linear function? This will involve adding a return value to cvmix_math_cubic_root_find(). It has the advantage of being quick/easy to implement so I can close the MOM6 issue. We can then come up with a more robust solution later (perhaps using brent or similar).

If you're happy with this then I'll send a pull request with the changes.

mnlevy1981 · 2016-06-10T20:49:31Z

Falling back to a linear function sounds fine, although adjusting your initial guess might be a good option as well. For this particular case, our initial guess of 0.5*(depth(k)+depth(k+1)) happens to fall directly on a local optimum (or saddle point) of the cubic; I wonder if an initial guess of 0.25*(depth(k)+depth(k+1)) or 0.75*(depth(k)+depth(k+1)) would iterate to the root?

Also, it might be worth investigating why the initial guess is a point where the slope is 0. I'm a little surprised (but I suppose it could be a coincidence), so it would be nice to see depth(k-1:k+1) and Ri_bulk(k-1:k+1) (from the call to cvmix_math_poly_interp at cvmix_kpp.F90:1398).

nichannah · 2016-06-15T05:21:07Z

depth(k-1:k+1): -2.05240021240362 -2.05240021250362
-2.05240021260362
Ri_bulk(k-1:k+1): -2.739995556420065E-005 -7.835367389899912E-007
0.340014162061937
slope: 0.000000000000000E+000
forrtl: error (65): floating invalid
Image PC Routine Line Source
MOM6 00000000004C5E33 cvmix_math_mp_cvm 200 cvmix_math.F90
MOM6 0000000000576AE8 cvmix_kpp_mp_cvmi 1408 cvmix_kpp.F90
MOM6 00000000004B8893 mom_kpp_mp_kpp_ca 607 MOM_KPP.F90

…t guess if this is the case. Closes CVMix#65.

adcroft · 2016-06-15T12:54:00Z

This is a case where the column has some vanished layers in the interior. The block that needs to catch this situation is:

    if (k.eq.size(Ri_bulk)) then
      OBL_depth = abs(OBL_limit)
    elseif (k.eq.0) then
      OBL_depth = abs(zt_cntr(1))
    else
      if (k.eq.1) then
        call cvmix_math_poly_interp(coeffs, CVmix_kpp_params_in%interp_type,  &
                               depth(k:k+1), Ri_bulk(k:k+1))
      else
        call cvmix_math_poly_interp(coeffs, CVmix_kpp_params_in%interp_type,  &
                               depth(k:k+1), Ri_bulk(k:k+1), depth(k-1),      &
                               Ri_bulk(k-1))
      end if
      coeffs(1) = coeffs(1)-CVmix_kpp_params_in%ri_crit

      OBL_depth = -cvmix_math_cubic_root_find(coeffs, 0.5_cvmix_r8 *          &
                                                      (depth(k)+depth(k+1)))
.
.
. 
    end if

I think something like this would avoid calling the solver with constant depths:

      if ( abs( (depth(k+1)-depth(k) ) > 0. ) then ! This could use a tolerance rather than 0., to be robust 
        OBL_depth = -cvmix_math_cubic_root_find(coeffs, 0.5_cvmix_r8 *          &
                                                        (depth(k)+depth(k+1)))
      else
        ! Ri_bulk crossed the Ri_crit between k and k+1 but the thickness is vanished
        OBL_depth = 0.5_cvmix_r8 * (depth(k)+depth(k+1))
      endif

mnlevy1981 · 2016-06-15T16:32:14Z

Yeah, I was going to say if depth doesn't change between levels k-1 and k+1, then the cubic interpolation is going to be nonsense: you are trying to find cubic polynomial P(x) such that

P(-2.05240021260362) = -2.739995556420065E-005
P(-2.05240021260362) = -7.835367389899912E-007
P(-2.05240021260362) = 0.340014162061937

and it's computing a 4th constraint that is probably

P'(-2.05240021260362) = inf

I'm not familiar with what you guys are calling "vanishing levels", and wrote most of this code assuming that depth(k) > depth(k+1) for all k, so it isn't surprising that have 3 levels with the same depth and different bulk Richardson numbers is causing a problem.

nichannah · 2016-06-16T06:28:16Z

Just to be clear, depth(k) > depth(k+1) does hold, e.g. in this case depth(k-1:k+1) is:

-2.05240021240362
-2.05240021250362
-2.05240021260362

However I see that suitability of the interpolation is probably limited in this case!

To avoid this the tolerance mentioned in @adcroft code suggestion would need to be dependent on the minimum layer thickness.

adcroft · 2016-06-16T12:54:44Z

The highlighted digits make all the difference. I guess better than the tolerance variant above would be a normalization of the local coordinate so that coeffs are order Ri_bulk. Something like:

    if (k.eq.size(Ri_bulk)) then
      OBL_depth = abs(OBL_limit)
    elseif (k.eq.0) then
      OBL_depth = abs(zt_cntr(1))
    else
      ! dstar is the locally normalized coordinate d* = ( d-depth(k-1) ) / ( depth(k-1) - depth(k+1) )
      dstar(1) = 0. ! Corresponds to depth(k-1)
      dstar(3) = 1. ! Corresponds to depth(k+1)
      if (k>1 .and. depth(k-1) - depth(k+1) > 0.) then
        dstar(2) = ( depth(k) - depth(k+1) ) / ( depth(k-1) - depth(k+1) ) ! 0. < dstar(2) < 1.
        call cvmix_math_poly_interp(coeffs, CVmix_kpp_params_in%interp_type,  &
                               dstar(2:3), Ri_bulk(k:k+1), dstar(1), Ri_bulk(k-1))
      else
        ! For k=1 or vanished separations resort to linear interpolation
        dstar(2) = 0.0_cvmix_r8
        call cvmix_math_poly_interp(coeffs, CVmix_kpp_params_in%interp_type,  &
                               dstar(2:3), Ri_bulk(k:k+1))
      endif
      coeffs(1) = coeffs(1)-CVmix_kpp_params_in%ri_crit

      OBL_depth = depth( max(1,k-1) ) &
                  - cvmix_math_cubic_root_find(coeffs, &
                                               0.5_cvmix_r8 * ( dstar(2)+dstar(3) ) ) &
                    * ( depth( max(1,k-1) ) - depth(k+1) )
.
.
. 
    end if

mnlevy1981 · 2016-06-16T15:00:37Z

Wow, I totally missed the single digit change! I'll use these values of depth and Ri_bulk to create a new regression test, and then see how the regression test performs with the normalized depth values.

As detailed in issue CVMix#65 there is a problem with our polynomial interpolation / root-finding process if the levels are very close together (GFDL ran into an issue when levels were 1e-10 m thick). This commit adds a new test to try to reproduce that problem and also see if normalizing the depths before interpolating would help avoid the problem.

mnlevy1981 · 2016-06-16T16:03:26Z

Okay, I have a pretty basic test in reg_tests/math/ on mnlevy1981/bugfix/better_interp -- my results aren't identical to yours (probably a combination of compiler differences and the fact that I hard-coded depth and Ri_bulk based on your text output above, which might be missing a digit or two). But running my test with gfortran I get output that looks like

 Interpolating with depth
 ----
   2.9393415534264454E+030   4.2964450141214149E+030   2.0933758376203048E+030   3.3998824478758336E+029
 Polynomial is 0 when depth =   -2.0524332880733298     

 Interpolating with normalized depth
 ----
   1.0199375875839882       -4.7597833756802608        6.7997665835970169       -2.7199066334388067     
 Polynomial is 0 when depth =   -2.0524002125037253

We expect the root to be between -2.05240021260362 and -2.05240021240362; the normalized depth provides a root in this range while the non-normalized depth does not (we only have 5 sig figs in the latter case!). Intel and NAG show a similar degradation in the precision for non-normalized depth, while PGI [16.5] for some reason is the opposite

 Interpolating with depth
 ----
   2.9393415534264454E+030   4.2964450141214149E+030   2.0933758376203048E+030 
   3.3998824478758336E+029
 Root when depth =    -2.052400212553620     

 Interpolating with normalized depth
 ----
    1.019937587583988        -4.759783375680261         6.799766583597017      
   -2.719906633438807     
 Root when depth =    -2.052400212303515

Based on these results, though, I definitely support using the normalized-depth option in cvmix_kpp. I used the line

  norm_depth = (depth - depth(1)) / (depth(3) - depth(1))

to normalize the depth and

  root = root*(depth(3) - depth(1)) + depth(1)

to transform back to physical space.

nichannah · 2016-07-01T04:26:21Z

Hi @mnlevy1981 thanks for this. I'll make a pull request based on the work above. Alternatively, if your solution is already complete let me know.

nichannah mentioned this issue Jun 2, 2016

Floating divide by zero in cvmix intel compiler DEBUG build mom-ocean/MOM6#297

Closed

nichannah pushed a commit to nichannah/CVMix-src that referenced this issue Jun 15, 2016

Newton's method returns error if finds a stationary point. Change roo…

1949c64

…t guess if this is the case. Closes CVMix#65.

nichannah mentioned this issue Jun 15, 2016

Newton's method returns error if finds a stationary point. Change roo… #66

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Divide-by-zero in cvmix_math_cubic_root_find() #65

Divide-by-zero in cvmix_math_cubic_root_find() #65

nichannah commented Jun 2, 2016

mnlevy1981 commented Jun 2, 2016

nichannah commented Jun 3, 2016

nichannah commented Jun 9, 2016

mnlevy1981 commented Jun 10, 2016 •

edited

Loading

nichannah commented Jun 15, 2016

adcroft commented Jun 15, 2016

mnlevy1981 commented Jun 15, 2016 •

edited

Loading

nichannah commented Jun 16, 2016

adcroft commented Jun 16, 2016

mnlevy1981 commented Jun 16, 2016

mnlevy1981 commented Jun 16, 2016

nichannah commented Jul 1, 2016 •

edited

Loading

Divide-by-zero in cvmix_math_cubic_root_find() #65

Divide-by-zero in cvmix_math_cubic_root_find() #65

Comments

nichannah commented Jun 2, 2016

mnlevy1981 commented Jun 2, 2016

nichannah commented Jun 3, 2016

nichannah commented Jun 9, 2016

mnlevy1981 commented Jun 10, 2016 • edited Loading

nichannah commented Jun 15, 2016

adcroft commented Jun 15, 2016

mnlevy1981 commented Jun 15, 2016 • edited Loading

nichannah commented Jun 16, 2016

adcroft commented Jun 16, 2016

mnlevy1981 commented Jun 16, 2016

mnlevy1981 commented Jun 16, 2016

nichannah commented Jul 1, 2016 • edited Loading

mnlevy1981 commented Jun 10, 2016 •

edited

Loading

mnlevy1981 commented Jun 15, 2016 •

edited

Loading

nichannah commented Jul 1, 2016 •

edited

Loading