Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacement of Float command with real command #631

Merged
merged 40 commits into from
Oct 17, 2023

Conversation

jderber-NOAA
Copy link
Contributor

@jderber-NOAA jderber-NOAA commented Sep 28, 2023

The float command is used throughout the GSI to convert an integer value to a real value. However, the float command only converts to a real4 value. Most of the GSI uses real8 (along with some real16 and real4). The real command allows the conversion to a specified precision (r_kind (real8), r_quad(real16), r_single(real*4). Note that the Intel documentation appears to state that the float command internally uses the real command, so no issues should result from these changes.

The float command is removed from the gsi and replaced with the real command. This pull request addresses issue #627. More details can be found in the issue.

Fixes #627

A few very small changes (e.g. removing multiplication by one, modifying float(n) + float(m) to real(n+m,r_kind)) were also made. (Sorry I just cannot stop myself).

The code can be found in jderber-NOAA/GSI.git in branch float

  • Bug fix (non-breaking change which fixes an issue)

These code changes have been tested using regression tests (ctests) on Hera. All regression tests passed, except one where the only issue was timing. The machine was very busy at the time so this should not be a real issue.

Checklist

  • [x ] My code follows the style guidelines of this project
  • [x ] I have performed a self-review of my own code
  • [none] I have commented my code, particularly in hard-to-understand areas
  • [x ] New and existing tests pass with my changes
  • [none ] Any dependent changes have been merged and published

DUE DATE for this PR is 11/9/2023. If this PR is not merged into develop by this date, the PR will be closed and returned to the developer.

@RussTreadon-NOAA
Copy link
Contributor

NOAA-EMC/GSI develop was updated several times over the past few days. When you have time, please update jderber-NOAA:float to the current head, fae4bbf, of develop

Copy link
Contributor

@azadeh-gh azadeh-gh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested the changes locally, and everything works as expected.

rjs=float(js)
rje=float(je)
ris=real(is,r_single)
rie=real(is,r_single)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be rie = real(ie, r_single) ?

@XuanliLi-NOAA
Copy link

I found the float command is used in a few other files. Do we want to change them into the real command as well?

atms_spatial_average_mod.f90: Line 844 XT = 1.0_r_kind / FLOAT( N )
raflib.f90: Line 4491 ANRMX=ANORM*RANGE/FLOAT(N)
raflib.f90: Line 4498 THR=THR/FLOAT(N)
ssmis_spatial_average_mod.f90: Line 1554 XT = 1.0 / FLOAT( N )

@jderber-NOAA
Copy link
Contributor Author

You are absolutely correct in both your comments. I am ashamed that I did not search for FLOAT, just float. I will make both of the changes. Good job. This is why we have the reviews. Thank you.

@jderber-NOAA
Copy link
Contributor Author

Will also update to latest version of develop!

@XuanliLi-NOAA
Copy link

Not a problem, that's why we are here. Look forward to see the new version.

@jderber-NOAA
Copy link
Contributor Author

Thank you Azadeh as well. Checking out the code and making sure it works as advertised is important!

Copy link

@XuanliLi-NOAA XuanliLi-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me!

@jderber-NOAA
Copy link
Contributor Author

Changes were made to fix 2 issues pointed out by Xuanli. Also, code was update to the current head of develop.

Regression tests were rerun. All tests passed except rtma. Output from rtma was examined and was the same for contrl and updat. Failure for rtma was only due to timing - so all is OK.

@jderber-NOAA
Copy link
Contributor Author

Thank you Xuanli for relooking at changes.

@XuanliLi-NOAA
Copy link

Thank you for making these changes to GSI.

@RussTreadon-NOAA
Copy link
Contributor

@jderber-NOAA , since @azadeh-gh and @XuanliLi-NOAA have approved this PR, we can move forward with getting these changes into develop. Your branch, jderber-NOAA:float is currently two commits behind the head of NOAA-EMC/GSI develop. Please update your branch with the most recent updates from develop. After this, would you please run the standard suite of ctests on Hera. Since you do not have Orion or WCOSS2 access, I'll run the tests on each of these machines. Given successful ctests results, we can schedule this PR for merger into develop.

@jderber-NOAA
Copy link
Contributor Author

Float updated to head of develop.

@RussTreadon-NOAA
Copy link
Contributor

WCOSS2 ctests

Install jderber-NOAA:float on Dogwood. Run ctests with following results

russ.treadon@dlogin03:/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr631/build> ctest -j 9
Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr631/build
    Start 1: global_3dvar
    Start 2: global_4dvar
    Start 3: global_4denvar
    Start 4: hwrf_nmm_d2
    Start 5: hwrf_nmm_d3
    Start 6: rtma
    Start 7: rrfs_3denvar_glbens
    Start 8: netcdf_fv3_regional
    Start 9: global_enkf
1/9 Test #8: netcdf_fv3_regional ..............   Passed  485.79 sec
2/9 Test #5: hwrf_nmm_d3 ......................   Passed  498.94 sec
3/9 Test #7: rrfs_3denvar_glbens ..............   Passed  727.10 sec
4/9 Test #9: global_enkf ......................   Passed  730.26 sec
5/9 Test #4: hwrf_nmm_d2 ......................   Passed  1027.34 sec
6/9 Test #3: global_4denvar ...................   Passed  1449.52 sec
7/9 Test #1: global_3dvar .....................   Passed  1803.82 sec
8/9 Test #2: global_4dvar .....................   Passed  1804.16 sec
9/9 Test #6: rtma .............................***Failed  5413.12 sec

89% tests passed, 1 tests failed out of 9

Total Test time (real) = 5413.20 sec

The following tests FAILED:
          6 - rtma (Failed)

A check of the rtma ctest finds that it failed timing checks

The runtime for rtma_hiproc_updat is 225.958403 seconds.  This has exceeded maximum allowable threshold time of 205.794615 seconds,
resulting in Failure of timethresh2 the regression test.

The case has Failed the scalability test.
The slope for the update (2.098648 seconds per node) is less than that for the control (30.492293 seconds per node).

A check of the wall times

rtma_hiproc_contrl/stdout:The total amount of wall time                        = 187.086014
rtma_hiproc_updat/stdout:The total amount of wall time                        = 225.958403
rtma_loproc_contrl/stdout:The total amount of wall time                        = 217.578307
rtma_loproc_updat/stdout:The total amount of wall time                        = 221.761107

finds that the hiproc_updat ran longer than the `loproc_updat. This is odd. A rerun of the rtma test resulted in a pass

russ.treadon@dlogin03:/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr631/build> ctest -R rtma
Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr631/build
    Start 6: rtma
1/1 Test #6: rtma .............................   Passed  3608.42 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 3608.50 sec

with the updat and contrl wall times being much more consistent

tmpreg_rtma/rtma_hiproc_contrl/stdout:The total amount of wall time                        = 190.691934
tmpreg_rtma/rtma_hiproc_updat/stdout:The total amount of wall time                        = 193.241115
tmpreg_rtma/rtma_loproc_contrl/stdout:The total amount of wall time                        = 221.302087
tmpreg_rtma/rtma_loproc_updat/stdout:The total amount of wall time                        = 221.443565

Orion ctests
Run ctests on Orion with the following results

Orion-login-2:/work2/noaa/da/rtreadon/git/gsi/pr631/build$ ctest -j 9
Test project /work2/noaa/da/rtreadon/git/gsi/pr631/build
    Start 1: global_3dvar
    Start 2: global_4dvar
    Start 3: global_4denvar
    Start 4: hwrf_nmm_d2
    Start 5: hwrf_nmm_d3
    Start 6: rtma
    Start 7: rrfs_3denvar_glbens
    Start 8: netcdf_fv3_regional
    Start 9: global_enkf
1/9 Test #8: netcdf_fv3_regional ..............   Passed  962.75 sec
2/9 Test #5: hwrf_nmm_d3 ......................   Passed  977.74 sec
3/9 Test #7: rrfs_3denvar_glbens ..............   Passed  1026.99 sec
4/9 Test #4: hwrf_nmm_d2 ......................   Passed  1027.94 sec
5/1 Test #9: global_enkf ......................   Passed  1063.61 sec
6/9 Test #6: rtma .............................   Passed  1811.74 sec
7/9 Test #3: global_4denvar ...................   Passed  2151.01 sec
8/9 Test #1: global_3dvar .....................   Passed  2330.21 sec
9/9 Test #2: global_4dvar .....................   Passed  2342.54 sec

100% tests passed, 0 tests failed out of 9

Total Test time (real) = 2342.55 sec

WCOSS2 (Dogwood) and Orion ctest results are acceptable.

@jderber-NOAA
Copy link
Contributor Author

Hera ctest results:

All but 1 regression tests passed. Only the rrfs_3denvar_glbens test did not pass. Looking at the results shows identical results in all tests. The wall times are:
rrfs_3denvar_glbens_hiproc_contrl/stdout:The total amount of wall time = 66.139956
rrfs_3denvar_glbens_hiproc_updat/stdout:The total amount of wall time = 82.715353
rrfs_3denvar_glbens_loproc_contrl/stdout:The total amount of wall time = 117.029998
rrfs_3denvar_glbens_loproc_updat/stdout:The total amount of wall time = 112.954982
The hiproc run times have the are significantly slower for the update. The loproc differences make sense. Rerunning rrfs_3denvar regression test.

Rerun of rrfs3denvar passed
Wall times:
rrfs_3denvar_glbens_hiproc_contrl/stdout:The total amount of wall time = 81.901227
rrfs_3denvar_glbens_hiproc_updat/stdout:The total amount of wall time = 82.601564
rrfs_3denvar_glbens_loproc_contrl/stdout:The total amount of wall time = 111.882076
rrfs_3denvar_glbens_loproc_updat/stdout:The total amount of wall time = 115.468559
Failure in previous test was probably run-to-run variability.

With Russ's tests, I think this one is ready to go.

Copy link
Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two peer reviews completed with approval. ctests run on WCOSS2 (Dogwood), Hera, and Orion with acceptable results. GSI Handling Review team consulted. No objection to merging PR #631 into develop

Approve.

@RussTreadon-NOAA RussTreadon-NOAA merged commit f76d872 into NOAA-EMC:develop Oct 17, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove float command from GSI
4 participants