seg fault in gfdl_fv_sat_adj #128
evankalina
started this conversation in
General
Replies: 1 comment 1 reply
-
@evankalina I was wondering whether you are running one of the standard HAFS application-level regression tests, or a different configuration or test case. Typically, this kind of error indicates that the model is experiencing numerical instability. Also, if you rerun the forecast, does it fail at the same model integration time step?
-
While using the latest HAFS develop with the FV3_HAFS_v0_gfdlmp_tedmf_nonsst physics suite and HYCOM coupling, I get a segmentation fault during the initialization phase of the forecast job on Orion:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
hafs_forecast.x 00000000044B24D9 Unknown Unknown Unknown
libpthread-2.17.s 00002B0BCC7E65D0 Unknown Unknown Unknown
hafs_forecast.x 000000000367A902 fv_sat_adj_mp_wqs 1248 gfdl_fv_sat_adj.F90
hafs_forecast.x 0000000003676B86 fv_sat_adj_mp_fv_ 658 gfdl_fv_sat_adj.F90
hafs_forecast.x 000000000367387D fv_sat_adj_mp_fv_ 336 gfdl_fv_sat_adj.F90
hafs_forecast.x 00000000031EFFED fdlmp_tedmf_nonss 166 ccpp_FV3_HAFS_v0_gfdlmp_tedmf_nonsst_fast_physics_cap.F90
hafs_forecast.x 0000000003136840 ccpp_static_api_m 481 ccpp_static_api.F90
hafs_forecast.x 000000000250D491 fv_mapz_mod_mp_la 842 fv_mapz.F90
The seg fault occurs during the calculation of saturation vapor pressure in gfdl_fv_sat_adj.F90, which uses a table lookup:
You might think that the index variable "it" is outside the bounds of tablew, but that seems unlikely based on how it is calculated:
tablew is declared as allocatable, though, so attempting to use it before it has been allocated would trigger a seg fault. When tablew is allocated, it is given a length of 2621.
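To make the two possibilities concrete, here is a minimal Python sketch of a clamped table lookup. It is not the Fortran in gfdl_fv_sat_adj.F90: only the length of 2621 comes from this post, while the base temperature `T_MIN`, the 0.1 K spacing, and the names `lookup_index` and `table_lookup` are assumptions made for illustration. Under those assumptions the index stays within 1..2621 for any finite temperature, and an explicit guard turns the "table never allocated" case into a readable error instead of a crash.

```python
TABLE_LEN = 2621               # table length quoted in the post
T_MIN = 273.16 - 160.0         # assumed base temperature of the table (K)


def lookup_index(ta):
    """Return (it, frac): 1-based table index and interpolation weight."""
    ap1 = 10.0 * max(ta - T_MIN, 0.0) + 1.0   # assumed 0.1 K spacing
    ap1 = min(float(TABLE_LEN), ap1)          # clamp to the last entry
    it = int(ap1)                             # truncate toward zero
    return it, ap1 - it


def table_lookup(table, slope, ta):
    """Linear table lookup with a guard for a table that was never built."""
    if table is None or slope is None:
        # Python analogue of checking allocated(tablew) in Fortran
        raise RuntimeError("lookup table used before initialization")
    it, frac = lookup_index(ta)
    return table[it - 1] + frac * slope[it - 1]   # shift to 0-based indexing


if __name__ == "__main__":
    # The clamped index stays in bounds even for extreme temperatures.
    for ta in (0.0, 150.0, 273.16, 400.0, 1.0e6):
        it, _ = lookup_index(ta)
        assert 1 <= it <= TABLE_LEN
    # An uninitialized table is caught by the guard.
    try:
        table_lookup(None, None, 280.0)
    except RuntimeError as err:
        print("caught:", err)
```

In the Fortran code, the equivalent check would be a test on allocated(tablew) before the first lookup, which would confirm or rule out the unallocated-table explanation.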
BTW, this failure is identical to one reported by @BinLiu-NOAA in ufs-weather-model issue #718 back in July 2021. He fixed the problem by updating to a newer version of UWM. However, I'm using more recent code (last updated 1/26/22), so unless the bug was fixed and then reintroduced, I do not think it is a version issue.
In Bin's post, he suggests that the appearance of this bug may be sensitive to processor layout. I'm using 20x12 and could try some different ones.
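For picking alternatives to try, one option is to list every layout that keeps the same total task count as 20x12 (240 tasks), so the job's resource request does not change. This is only a sketch: the helper name `layouts_with_same_task_count` is hypothetical, and it does not check whether a given layout is actually valid for the model domain.

```python
def layouts_with_same_task_count(nx, ny):
    """List all (layout_x, layout_y) pairs whose product equals nx * ny."""
    total = nx * ny
    return [(x, total // x) for x in range(1, total + 1) if total % x == 0]


if __name__ == "__main__":
    for layout_x, layout_y in layouts_with_same_task_count(20, 12):
        print(f"{layout_x}x{layout_y}")
```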
If you have any thoughts about this error, please post.