Bit reproducibility of wind in output files when reading from restart (WRST switch) #181

ajhenrique · 2020-03-30T16:47:34Z

The WRST switch was added to provide an option to save wind data in WW3 restart files for several applications. Tests have shown that when WND output is requested, so that the input wind data interpolated to the wave grids are saved in out_grd, the wind data in out_grd files from restarted runs at the initial time are not bit identical (b4b) to the wind fields saved in out_grd from the preceding run at the corresponding time step. Most other parameters are bit identical. Wind data in out_grd files from subsequent output times are also b4b when compared to overlapping outputs from the originating run.

JessicaMeixner-NOAA · 2020-03-30T16:57:42Z

Here are some details/updates I found while looking into this issue:

Both wind and Charnock are different in the out_grd.
Some differences in the out_grd wind fields arose because we take the sin and cos of val or val+/-2pi and get round-off (or the values were 2pi round-off from each other). Making sure the direction was between 0 and 2pi minimized the differences, but some remained.
I think the wind in out_grd might differ beyond the 2*pi issues because, depending on what TW0 vs TWN is, things can actually be different based on the logic in W3UWND (see here: https://github.com/NOAA-EMC/WW3/blob/production/GFS.v16/model/ftn/w3updtmd.ftn#L529-L543), which honestly makes me slightly concerned that we are just getting lucky that everything else is b4b. But I don't think we could consistently get b4b answers everywhere else if this were actually an issue too, so?
I did confirm that the wind we are reading and writing with WRST is the same. @ajhenrique I think the reason you saw some differences before is that, in addition to reading a restart at the restart time, you are writing one at that restart time, and because TWN is assumed to be the correct time for that, different wind fields are written and they do have different values.
The restart that is written is different (i.e., if you request a restart file to be written at the initialization time of a run that itself started from a restart, so same valid time, two different runs). One issue here with WRST is the assumption that you are writing the restart at the time that corresponds to TWN, which is not the case when you immediately write a restart.
Charnock in the output is also different in the binary out_grd file, but this is not an official field that makes it to the grib file, and the difference is probably because it's not being calculated at initialization, which is an issue we ran into in the esmf cap. But this issue could easily be postponed.
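The 2pi round-off mentioned in the list above can be illustrated with a minimal Python sketch. It is not WW3 code: `wrap_direction`, `components` and `bitwise_equal` are hypothetical helpers, and double precision stands in for whatever real kind the model uses. The sketch only shows why a direction and the same direction plus one full turn need not give bit-identical wind components, and why wrapping to [0, 2pi) can reduce such differences without guaranteeing they vanish.

```python
import math
import struct

TWO_PI = 2.0 * math.pi


def wrap_direction(theta):
    """Map an angle in radians onto [0, 2*pi)."""
    wrapped = math.fmod(theta, TWO_PI)
    if wrapped < 0.0:
        wrapped += TWO_PI
    # A tiny negative input can round up to exactly 2*pi after the shift.
    if wrapped >= TWO_PI:
        wrapped = 0.0
    return wrapped


def components(speed, theta):
    """Return (u, v) for a speed and a direction in radians."""
    return speed * math.cos(theta), speed * math.sin(theta)


def bitwise_equal(a, b):
    """True when two floats share the same IEEE-754 bit pattern."""
    return struct.pack("<d", a) == struct.pack("<d", b)


if __name__ == "__main__":
    theta = 0.75                 # illustrative direction, radians
    shifted = theta + TWO_PI     # same direction, one full turn later
    ref = components(5.0, theta)
    raw = components(5.0, shifted)
    fixed = components(5.0, wrap_direction(shifted))
    print("unwrapped b4b:", all(bitwise_equal(x, y) for x, y in zip(ref, raw)))
    print("wrapped   b4b:", all(bitwise_equal(x, y) for x, y in zip(ref, fixed)))
```

Note that `theta + TWO_PI` is itself rounded, so the wrapped angle may still differ from `theta` in the last bits; this is consistent with wrapping reducing the differences rather than removing all of them.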

Based on my tests, I do not believe this is a WRST switch issue (or that it is a WRST switch issue alone). There certainly should be an update as to whether TW0 or TWN winds are written in the restart file when using WRST, and the manual should be updated to state that the restart time is assumed to correspond to one of those two times. However, it is my opinion that there are some more basic issues for wind and Charnock in the out_grd binary files at t=0 from a restart run. While it is of course ideal for these to be bit for bit, one solution is to disregard, or not write, output at the initial time, since one would already have this information from the original run when restarting.
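To make the TW0 vs TWN concern concrete, here is a hedged Python sketch of a generic linear time interpolation. It is an assumption for illustration only and does not reproduce the W3UWND logic linked above; `interp_wind` and `count_endpoint_mismatches` are hypothetical names. The idea is that the same valid time can be reached either as the end of one interval (weight 1 applied to the TW0 and TWN fields) or by using the stored field directly, and in floating point those two routes are not guaranteed to agree bit for bit.

```python
import random


def interp_wind(w0, wn, rd):
    """Linear interpolation between fields valid at TW0 (rd=0) and TWN (rd=1)."""
    return w0 + rd * (wn - w0)


def count_endpoint_mismatches(samples, seed=0):
    """Count random cases where interpolating to TWN does not return wn exactly."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(samples):
        w0 = rng.uniform(-30.0, 30.0)  # illustrative wind component, m/s
        wn = rng.uniform(-30.0, 30.0)
        # Same valid time, two routes: end of the old interval (rd=1)
        # versus the stored TWN field used directly.
        if interp_wind(w0, wn, 1.0) != wn:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    n = 100000
    print(count_endpoint_mismatches(n), "of", n, "samples are not bit identical")
```

Any mismatches are at round-off level, which is why they would only show up in a b4b comparison and not in a tolerance-based one.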

ajhenrique · 2020-03-30T17:13:21Z

@JessicaMeixner-NOAA thanks for the updates on the wind from WRST b4b issue. Ideally, we would want to have all these discrepancies sorted out as this would ensure the code is correct. Note that all results are fully b4b when winds are read from files, not from the restart file, so that the issue must be somehow related to the addition of the WRST switch.

I looked at ways to change the scripts in order to avoid writing output at the initial time step and mask out the problem with WRST wind bit reproducibility in the coupled run as you suggest. This, however, will complicate the scripting in a way that I would prefer to avoid at this point. Solving the problem at its root would help us not only correct the code, but keep scripting simpler (which will benefit operations at NCEP), and also avoid product changes and affecting downstream dependencies (eg, V&V, AWIPS, NAWIPS, etc) that would require adjustments.

JessicaMeixner-NOAA · 2020-03-30T17:36:06Z

The test case I was using to debug this can be replicated as follows:

git clone https://github.com/jessicameixner-noaa/ufs-weather-model
cd ufs-weather-model
git checkout feature/unit_test
git submodule update --init --recursive
cd tests
./utest -n fv3_gfdlmprad -c std -k #this just needs to be run once
./utest -n fv3_gfdlmprad -r restart -k

This test can be run on Hera and is likely not completely portable (because of my changes, not the unit tests themselves). The 'baseline area' will be generated at
/scratch1/NCEPDEV/stmp4/$USER/FV3_UT/UNIT_TEST

and the run directories (which are saved by the -k option above) will be located at:
/scratch1/NCEPDEV/stmp2/$USER/FV3_UT

After running ./utest -n fv3_gfdlmprad -r restart -k you will get a directory (such as: /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_UT/ut_6755 ) which will have the directories:
fv3_gfdlmprad_restart fv3_gfdlmprad_std
From there you can do diffs for the various files.
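For the diffs, a byte-level comparison is what b4b means in practice. The following is a minimal Python sketch under stated assumptions: `first_difference` and `compare_run_dirs` are hypothetical helpers, and the only thing taken from the description above is that the two run directories sit side by side.

```python
from pathlib import Path


def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One chunk is a prefix of the other: files differ in length.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended with no difference found
            offset += len(a)


def compare_run_dirs(dir_a, dir_b, pattern="*"):
    """Map each file name present in both directories to its first differing offset."""
    results = {}
    for path_a in sorted(Path(dir_a).glob(pattern)):
        path_b = Path(dir_b) / path_a.name
        if path_a.is_file() and path_b.is_file():
            results[path_a.name] = first_difference(path_a, path_b)
    return results
```

A call such as `compare_run_dirs("fv3_gfdlmprad_std", "fv3_gfdlmprad_restart", pattern="*out_grd*")` would then list which files are b4b (value `None`); the glob pattern is an assumption about the file names. A byte offset alone does not say which field differs, which is where the text debugging files help.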

There are a few extra WW3 output files generated for ease of debugging: YYYYMMDD.HHMMSS.out_txt.glo_30m, which has text output from w3iogo, and YYYYMMDD.HHMMSS.rstTXT.read.glo_30m and YYYYMMDD.HHMMSS.rstTXT.write.glo_30m, which contain the wind in x and y space as written to the file and as read in when restarting with WRST.

JessicaMeixner-NOAA · 2020-03-30T17:43:09Z

@aliabdolali if you do not want to use the WW3 that ufs-weather-model points to, which includes my debugging updates/tries, you will want to point to the production/GFS.v16 branch of WW3.

aliabdolali · 2020-04-04T14:18:43Z

Hi, @JessicaMeixner-NOAA @ajhenrique
I found the fix for non-identical wind fields in restart files.
I created a test for the global grid (WW3 only) and tested my fix. I did not push the fix to the feature branch yet, could you test it and let me know if it works? This fix applies to Current too.
See attached.
WindB4B.pdf

JessicaMeixner-NOAA · 2020-04-05T16:56:49Z

@aliabdolali Great catch! This is awesome.

I can test with the set-up I described above but it does not have currents, so maybe we should just wait for @ajhenrique to test with the full systems to know for sure.

ajhenrique · 2020-04-05T22:51:38Z

@aliabdolali I've tested the proposed fix by running a canned case representing the coupled system with IAU for 3h+48h, WW3 with a 3-grid mosaic (Arctic PS 9km NH 1/6 deg, SH 14/ deg), generating restarts at 3h+24h and running a restarted leg from 24h-48h, then comparing the wave binary outputs from the overlapping period. Results are with the esmf8.0.1 branch and all the most recent changes to the WW3 code for speeding up initialization, internal interpolation, etc. I ran the canned case on WCOSS P3 (Dell) and Hera. In both cases the initial output step had outputs that were not b4b (wind fields had small discrepancies between runs), but all co-located outputs were b4b thereafter.

ajhenrique · 2020-04-06T03:33:10Z

Here are figures indicating where the issues occur. In my test case, out of 6=876,960 grid points there were 176 that were not b4b. Wind speeds at those points were typically relatively low, mostly <1 m/s, but some reached >3 m/s. The first figure shows the gridded wind speeds from the first run, superposed with markers (red and yellow) where they are not reproduced in the second run. There are also two figures showing a series of wind speeds from the 176 non-b4b points, and their ratio. Figures are all from one of the three grids in the GFSv16-wave grid mosaic.
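A count like the one above can be obtained by comparing the two wind-speed fields bit by bit. The sketch below is a minimal illustration, assuming the fields have already been read into equal-length sequences and are stored as 32-bit reals; reading out_grd itself is outside its scope, and `float32_bits`, `non_b4b_points` and `speed_ratios` are hypothetical helpers.

```python
import struct


def float32_bits(value):
    """Return the IEEE-754 single-precision bit pattern of a value as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def non_b4b_points(field_a, field_b):
    """Return indices of grid points whose 32-bit patterns differ."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same number of grid points")
    return [
        i
        for i, (a, b) in enumerate(zip(field_a, field_b))
        if float32_bits(a) != float32_bits(b)
    ]


def speed_ratios(field_a, field_b, indices):
    """Ratio second/first at the given points, skipping zero reference speeds."""
    return [field_b[i] / field_a[i] for i in indices if field_a[i] != 0.0]


if __name__ == "__main__":
    first_run = [1.0, 2.5, 0.0, 7.25]        # illustrative values, not model data
    second_run = [1.0, 2.500001, -0.0, 7.25]
    bad = non_b4b_points(first_run, second_run)
    print(len(bad), "of", len(first_run), "points are not b4b at indices", bad)
    print("ratios:", speed_ratios(first_run, second_run, bad))
```

Comparing bit patterns rather than values means that 0.0 and -0.0 count as different, which matches what a byte-level diff of the binary files would report.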

MatthewMasarik-NOAA · 2024-04-10T17:53:01Z

May be related to #1134.

aliabdolali mentioned this issue May 6, 2021

Fb ufs regtests #350

Merged
