numerical differences in gfortran vs. ifort and release vs. debug #154

Open
BenjaminTJohnson opened this issue Jul 26, 2024 · 1 comment

BenjaminTJohnson commented Jul 26, 2024

This issue documents a longstanding (and generally ignored) problem with CRTM: some ctest results differ between release and debug builds, and between gfortran and ifort. I don't know that there's a clear solution, but the fact that it only affects certain ctests suggests that there might be a fix.

The largest difference is on the order of 1e-11, so in no way would this impact anything useful.

ifort, Release:

-- Project version : 3.1.0
-- Fortran compiler : /opt/intel/oneapi/2022.1/compiler/2022.0.1/linux/bin/intel64/ifort
-- Fortran compiler flags :  -assume byterecl -fPIC
-- Build type : Release
-- Fortran compiler flags for release : -O3 -ip -unroll -inline -no-heap-arrays

ifort, Debug:

-- Project version : 3.1.0
-- Fortran compiler : /opt/intel/oneapi/2022.1/compiler/2022.0.1/linux/bin/intel64/ifort
-- Fortran compiler flags :  -assume byterecl -fPIC
-- Build type : DEBUG
-- Fortran compiler flags for debug : -O0 -g -check bounds -traceback -warn -heap-arrays -fpe-all=0 -fpe:0 -ftz -check all

gfortran, Release:

-- Project version : 3.1.0
-- Fortran compiler : /home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/gcc-14.1.0-lg64yhqjdx56qc37ds2rnvguco7tkyug/bin/gfortran
-- Fortran compiler flags : -I/home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/netcdf-fortran-4.6.1-l5onh6o5qivl4qkq7thsiwyn3pge3k62/include -D_REAL8_ -ffree-line-length-none
-- Build type : RELEASE
-- Fortran compiler flags for release : -O3 -funroll-all-loops -fopenmp -finline-functions 

gfortran, Debug:

-- Project version : 3.1.0
-- Fortran compiler : /home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/gcc-14.1.0-lg64yhqjdx56qc37ds2rnvguco7tkyug/bin/gfortran
-- Fortran compiler flags : -I/home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/netcdf-fortran-4.6.1-l5onh6o5qivl4qkq7thsiwyn3pge3k62/include -D_REAL8_ -ffree-line-length-none
-- Build type : DEBUG
-- Fortran compiler flags for debug : -O0 -g -fcheck=bounds -ffpe-trap=invalid,zero,overflow -fbacktrace
@BenjaminTJohnson

gfortran debug vs ifort release (reference)

	 13 - test_forward_Simple_atms_n21 (NUMERICAL)
	 14 - test_forward_Simple_cris-fsr_n21 (NUMERICAL)
	 15 - test_forward_Simple_v.abi_g18 (NUMERICAL)
	 16 - test_forward_Simple_atms_npp (NUMERICAL)
	 17 - test_forward_Simple_cris399_npp (NUMERICAL)
	 18 - test_forward_Simple_v.abi_gr (NUMERICAL)
	 19 - test_forward_Simple_abi_g18 (NUMERICAL)
	 20 - test_forward_Simple_modis_aqua (NUMERICAL)
	 34 - test_forward_ClearSky_cris-fsr_n21 (Failed)
	 41 - test_forward_Aircraft_cris-fsr_n21 (Failed)
	 44 - test_forward_ScatteringSwitch_cris-fsr_n21 (Failed)
	 53 - test_forward_SOI_v.abi_g18 (Failed)
	 56 - test_forward_SOI_v.abi_gr (Failed)
	130 - test_adjoint_Simple_modis_aqua (Failed)
	140 - test_tangent_linear_Simple_cris-fsr_n21 (Failed)
	141 - test_tangent_linear_Simple_v.abi_g18 (Failed)
	143 - test_tangent_linear_Simple_cris399_npp (Failed)
	144 - test_tangent_linear_Simple_v.abi_gr (Failed)
	145 - test_tangent_linear_Simple_abi_g18 (Failed)
	146 - test_tangent_linear_Simple_modis_aqua (Failed)
	149 - test_tangent_linear_ClearSky_v.abi_g18 (Failed)
	152 - test_tangent_linear_ClearSky_v.abi_gr (Failed)
1/22 Test  #13: test_forward_Simple_atms_n21 .................***Exception: Numerical  0.14 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f3f1d4253ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 2/22 Test  #14: test_forward_Simple_cris-fsr_n21 .............***Exception: Numerical  5.41 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7ffbb32d03ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
3/22 Test  #15: test_forward_Simple_v.abi_g18 ................***Exception: Numerical  0.16 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fd60cd223ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
4/22 Test  #16: test_forward_Simple_atms_npp .................***Exception: Numerical  0.15 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f7439a223ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
5/22 Test  #17: test_forward_Simple_cris399_npp ..............***Exception: Numerical  1.10 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fa2828683ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 6/22 Test  #18: test_forward_Simple_v.abi_gr .................***Exception: Numerical  0.16 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f9e21c933ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 7/22 Test  #19: test_forward_Simple_abi_g18 ..................***Exception: Numerical  0.19 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fae6852c3ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 8/22 Test  #20: test_forward_Simple_modis_aqua ...............***Exception: Numerical  0.20 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f54a91c23ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14

End of exception errors

9/22 Test  #34: test_forward_ClearSky_cris-fsr_n21 ...........***Failed    0.96 sec
> diff -y test_forward_ClearSky_cris-fsr_n21_gfortran_debug.txt test_forward_ClearSky_cris-fsr_n21_gfortran_release.txt | grep "|"
1/1 Test #34: test_forward_ClearSky_cris-fsr_n21 ...***Failed    2.07 sec		      |	1/1 Test #34: test_forward_ClearSky_cris-fsr_n21 ...***Failed    1.76 sec
CRTM_Tests    =   2.07 sec*proc (1 test)						      |	CRTM_Tests    =   1.76 sec*proc (1 test)
Total Test time (real) =   2.12 sec							      |	Total Test time (real) =   1.85 sec
----

So, apart from run times, there is no difference between debug and release using gfortran: the test fails in both builds.
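The `diff -y ... | grep "|"` output above only flags lines whose timings changed. A minimal sketch of a timing-insensitive comparison follows, assuming the logs are compared line by line; the helper names `normalize` and `differing_lines` are hypothetical and not part of CRTM or ctest.

```python
import re

# Matches ctest timing fields such as "2.07 sec" (assumed format, taken from the log above).
_TIMING = re.compile(r"\d+\.\d+ sec")


def normalize(line: str) -> str:
    """Mask timing fields so that run-time jitter does not count as a difference."""
    return _TIMING.sub("<t> sec", line.rstrip())


def differing_lines(a_lines, b_lines):
    """Return the line pairs that still differ once timings are masked.

    Lines are paired positionally; extra lines in the longer input are ignored.
    """
    return [(a, b) for a, b in zip(a_lines, b_lines) if normalize(a) != normalize(b)]


debug = [
    "1/1 Test #34: test_forward_ClearSky_cris-fsr_n21 ...***Failed    2.07 sec",
    "CRTM_Tests    =   2.07 sec*proc (1 test)",
    "Total Test time (real) =   2.12 sec",
]
release = [
    "1/1 Test #34: test_forward_ClearSky_cris-fsr_n21 ...***Failed    1.76 sec",
    "CRTM_Tests    =   1.76 sec*proc (1 test)",
    "Total Test time (real) =   1.85 sec",
]

print(differing_lines(debug, release))
```

With the three flagged lines above as input, nothing remains once the timings are masked, which is consistent with the gfortran debug and release outputs being otherwise identical.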

Here's a "summary" of the differences observed for this specific test:

668K -rw-r--r--  1 bjohnson domain users 3.5M Jul 26 19:01 diff_gd_ir.txt
 512 -rw-r--r--  1 bjohnson domain users 269K Jul 26 19:01 diff_gd_id.txt
 512 -rw-r--r--  1 bjohnson domain users 3.5M Jul 26 19:01 diff_id_ir.txt
 512 -rw-r--r--  1 bjohnson domain users 269K Jul 26 19:02 diff_id_gr.txt
 512 -rw-r--r--  1 bjohnson domain users 3.5M Jul 26 19:02 diff_ir_gr.txt
 512 -rw-r--r--  1 bjohnson domain users  338 Jul 26 19:03 diff_gd_gr.txt

where gd = gfortran_debug, gr = gfortran_release, id = ifort_debug, and ir = ifort_release.

The most differences occur whenever ifort release is one side of the comparison (diff files of about 3.5M). Fewer differences occur when comparing ifort debug to either gfortran build (about 269K). The only comparison with almost no difference is gfortran debug vs. gfortran release (338 bytes).

Here's an example of the differences between gfortran release and ifort release:

<...>
Radiance: num1 = 1.74107149954568E+00, num2 = 1.74107149954568E+00, percent_difference = 1.14779975713045E-13%
Brightness Temperature: num1 = 3.15166486936910E+02, num2 = 3.15166486936910E+02, percent_difference = 1.80359972322143E-14%
Stokes: num1 = 1.74107149954568E+00, num2 = 1.74107149954568E+00, percent_difference = 1.14779975713045E-13%
Up Radiance: num1 = 1.67948500636161E-01, num2 = 1.67948500636161E-01, percent_difference = 4.95787259377050E-14%
Down Radiance: num1 = 1.95273381650301E-01, num2 = 1.95273381650301E-01, percent_difference = 4.26411045597613E-14%
Down Solar Radiance: num1 = 3.68903722986171E+00, num2 = 3.68903722986171E+00, percent_difference = 2.40761576627791E-14%
Radiance: num1 = 1.73341571892138E+00, num2 = 1.73341571892138E+00, percent_difference = 5.12386272955224E-14%
Brightness Temperature: num1 = 3.15104527619051E+02, num2 = 3.15104527619050E+02, percent_difference = 3.60790873367136E-14%
Stokes: num1 = 1.73341571892138E+00, num2 = 1.73341571892138E+00, percent_difference = 5.12386272955224E-14%

The values that produced the largest percent difference:
Down Solar Radiance: num1 = 3.27014581763374E-71, num2 = 3.27014568271136E-71, percent_difference = 4.12588283764277E-06%
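
To put that largest value in context, here is a small sketch of a percent-difference calculation. The choice of denominator (the larger magnitude of the two values) is an assumption; the printed digits do not reveal which denominator the CRTM test code uses, and `percent_difference` is a hypothetical helper, not a CRTM routine.

```python
def percent_difference(a: float, b: float) -> float:
    """Percent difference relative to the larger magnitude (assumed convention)."""
    scale = max(abs(a), abs(b))
    if scale == 0.0:
        # Both values are exactly zero: report no difference instead of dividing by zero.
        return 0.0
    return 100.0 * abs(a - b) / scale


# Values copied from the "largest percent difference" line above.
num1 = 3.27014581763374e-71
num2 = 3.27014568271136e-71

print(percent_difference(num1, num2))  # relative difference, in percent
print(abs(num1 - num2))                # absolute difference
```

For this pair the relative difference comes out near the reported 4.1e-06%, while the absolute difference is about 1.3e-78, so the largest percent difference comes from a value that is itself vanishingly small.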

An example of the differences between gfortran debug and ifort debug:

<...>
Radiance: num1 = 1.93928626122432E+00, num2 = 1.93928626122432E+00, percent_difference = 4.57992426110106E-14%
Stokes: num1 = 1.93928626122432E+00, num2 = 1.93928626122432E+00, percent_difference = 4.57992426110106E-14%
Up Radiance: num1 = 1.34696913862698E-01, num2 = 1.34696913862698E-01, percent_difference = 8.24237907749579E-14%
Down Radiance: num1 = 1.49189426105494E-01, num2 = 1.49189426105494E-01, percent_difference = 5.58127536384566E-14%
Radiance: num1 = 1.90834969337093E+00, num2 = 1.90834969337093E+00, percent_difference = 5.81771269952125E-14%
Stokes: num1 = 1.90834969337093E+00, num2 = 1.90834969337093E+00, percent_difference = 5.81771269952125E-14%
Up Radiance: num1 = 1.78588101289685E-01, num2 = 1.78588101289685E-01, percent_difference = 6.21666850483101E-14%
Up Radiance: num1 = 1.93723696734729E-01, num2 = 1.93723696734729E-01, percent_difference = 5.73096138127808E-14%
Up Radiance: num1 = 1.95642859357333E-01, num2 = 1.95642859357333E-01, percent_difference = 5.67474339861994E-14%
Down Radiance: num1 = 2.29191736945382E-01, num2 = 2.29191736945382E-01, percent_difference = 4.84407963141242E-14%
Up Radiance: num1 = 1.67019298349406E-01, num2 = 1.67019298349406E-01, percent_difference = 6.64727391144080E-14%

The values that produced the largest percent difference:
Down Solar Radiance: num1 = 1.98443571457145E-11, num2 = 1.98443571457185E-11, percent_difference = 1.98973185487457E-11%

Overall these differences are tiny, but I wanted to document them. The numerical exception ("Floating-point exception - erroneous arithmetic operation.") appears to be a "bug" in the float comparison routine (every backtrace points to Compare_Float_Numbers.f90:697), and is likely related to underflow.
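
As an illustration of how a tolerance comparison can fail on underflowed inputs, here is a hedged sketch. It is not the CRTM implementation: `compares_within_digits` is a hypothetical function, and the use of `log10` to scale the tolerance is an assumption made for illustration. In Python, `math.log10(0.0)` raises `ValueError`, which plays the role that a trapped floating-point exception plays in a debug Fortran build.

```python
import math
import sys


def compares_within_digits(x: float, y: float, n: int) -> bool:
    """Hypothetical check: do x and y agree to roughly n significant digits?"""
    smallest = min(abs(x), abs(y))
    if smallest < sys.float_info.min:
        # Zero or subnormal operand (for example after underflow): log10 would
        # fail on zero, so fall back to an absolute check against the smallest
        # normal double instead.
        return abs(x - y) < sys.float_info.min
    # Scale the tolerance by the decimal exponent of the smaller operand.
    exponent = math.floor(math.log10(smallest))
    tolerance = 10.0 ** (exponent - n)
    return abs(x - y) < tolerance


# Values from the gfortran release vs. ifort release comparison above.
a, b = 3.27014581763374e-71, 3.27014568271136e-71
print(compares_within_digits(a, b, 6))      # loose tolerance
print(compares_within_digits(a, b, 9))      # tight tolerance
print(compares_within_digits(0.0, 0.0, 9))  # guarded: never reaches log10(0)
```

The guard is the point of the sketch: without it, a zero operand reaches `log10` and the comparison itself fails, regardless of how close the two values are.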
