Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow diagnostic accumulation bucket to change in fv3atm integration + develop]: Minor documentation bug fixes #2296 #2281

Merged
merged 28 commits into from
May 28, 2024

Conversation

DusanJovic-NOAA
Copy link
Collaborator

@DusanJovic-NOAA DusanJovic-NOAA commented May 14, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

The code updates to allow diagnostic accumulation bucket to change during integration in FV3ATM.

NOTE This PR replaces #2128

Commit Message:

* UFSWM - Add new capability to allow diagnostic accumulation bucket to change in fv3atm integration
  * FV3 - Add new capability to allow diagnostic accumulation bucket to change in fv3atm integration 
  * UFSWM - Update requirements.txt; fix EMC broken link in Glossary; fix sample rt.sh run command for single RT run (add compiler)

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Updates/Changes Baselines.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@DusanJovic-NOAA DusanJovic-NOAA added the Baseline Updates Current baselines will be updated. label May 14, 2024
@DusanJovic-NOAA
Copy link
Collaborator Author

Regression test passed on Hera against new baselines created using test_changes.list from this PR. RegressionTests_hera.log

@DusanJovic-NOAA
Copy link
Collaborator Author

GitHub action (Pull Request Tests / Check if repos are up to date) failed with this error:

cat: /home/runner/id_file: No such file or directory
Error: Process completed with exit code 1.

ufs-weather-model and fv3atm branches used in this PR are up to date as of this moment.

@DusanJovic-NOAA
Copy link
Collaborator Author

I wanted to confirm that the PR works on Hercules. I first ran:

./rt.sh -e -c -b test_changes.list

to generate new set of baselines, then I ran:

./rt.sh -e -m

to verify that all tests pass against newly create baselines. On test failed control_wam_debug_gnu:

$ cat logs/log_hercules/rt_control_wam_debug_gnu.log 

baseline dir = /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/REGRESSION_TEST/control_wam_debug_gnu
working dir  = /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_3803095/control_wam_debug_gnu
Checking test control_wam_debug_gnu results ....
 Comparing sfcf019.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf019.nc .....USING NCCMP......OK

  0: The total amount of wall time                        = 109.138552
  0: The maximum resident set size (KB)                   = 501964

Test control_wam_debug_gnu FAIL Tries: 2

It failed because it is comparing the outputs against the current baselines, not the one I created. Because this test (control_wam_debug_gnu) is not listed in test_changes.list. It is not listed because we do not run it on Hera, where I created that list. I could manually add it to that list, but it might fail to run on Hera (it must be a reason why we are skipping this test on Hera). And this means that developers must manually run all the tests on all platforms to confirm everything works as expected.

Bottom line, relying on the list of tests that require a new set of baselines to be created by running rt.sh that automatically creates such a list is unreliable.

Because tests we run on Hera are not a strict superset of all tests we run on all other platforms.

@DusanJovic-NOAA
Copy link
Collaborator Author

Another example is this group of three regional_ifi_* tests that are run only on Acorn.

COMPILE | ifi | intel | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v15_thompson_mynn_lam3km -D32BIT=ON -DREQUIRE_IFI=ON | + acorn | fv3 |
RUN | regional_ifi_control                              | + acorn                    | baseline |                                                                                            
RUN | regional_ifi_decomp                               | + acorn                    |          |
RUN | regional_ifi_2threads                             | + acorn                    |          |

@BrianCurtis-NOAA
Copy link
Collaborator

If you run -e -c -b test_changes.list then why aren't you running -e -m -b test_changes.list ?

@BrianCurtis-NOAA
Copy link
Collaborator

-e -m should be using the REGRESSION_TESTS directory to compare against, which should have a bunch of symlinks to develop-2024XXXX where tests were not in test_changes.list

I think using -e -c -b test_changes.list and then subsequently -e -m should still work though. with -b test_changes.list it may have tests that do not belong to the current machine but the way it's written should skip those.

It looks like you were using the baselines you created:

baseline dir = /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/REGRESSION_TEST/control_wam_debug_gnu
working dir  = /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_3803095/control_wam_debug_gnu

Maybe i'm missing something?

@DusanJovic-NOAA
Copy link
Collaborator Author

If you run -e -c -b test_changes.list then why aren't you running -e -m -b test_changes.list ?

Because -b <file_name> is used only when -c is used. The -m runs all the tests.

@DusanJovic-NOAA
Copy link
Collaborator Author

DusanJovic-NOAA commented May 15, 2024

-e -m should be using the REGRESSION_TESTS directory to compare against, which should have a bunch of symlinks to develop-2024XXXX where tests were not in test_changes.list

Yes, -m is using the REGRESSION_TESTS directory to compare against, that's the purpose of -m.

I think using -e -c -b test_changes.list and then subsequently -e -m should still work though. with -b test_changes.list it may have tests that do not belong to the current machine but the way it's written should skip those.

It does not work if test_changes.list is not created on the same machine. For example if test_changes.list is created on Hera, but the -c -b test_changes.list and then subsequently -m on Hercules. It will work if you are running on a machine which runs test that are strict subset of the test on Hera. Tests on Hercules are not.

The problem are not test that do not belong to the current machine, those are skipped, the problem are test that should be in test_changes.list but are not, because they are skipped on Hera.

It looks like you were using the baselines you created:

baseline dir = /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/REGRESSION_TEST/control_wam_debug_gnu
working dir  = /work2/noaa/stmp/djovic/stmp/djovic/FV3_RT/rt_3803095/control_wam_debug_gnu

Yes, I'm using the baselines I created, because it contains newly created baselines. BUT only new baselines as it would be created on Hera, not the baselines we actually need on Hercules. And that's what I explained in my previous comment.

Bottom line, for all platforms which run tests that are not strict subset of Hera's test, like Hercules and Acorn, we need separate test_changes.list. Unless we make Hera run union of all tests from all platforms. Which we can not at this moment because of missing IFI library on Hera.

Maybe i'm missing something?

@BrianCurtis-NOAA
Copy link
Collaborator

I do understand better, thanks. In a perfect case we'd have all tests running on all machines with gnu and intel. With IRI as you mentioned, those can only be run on Acorn.

I'm curious your thoughts on how to tackle this issue.

@FernandoAndrade-NOAA
Copy link
Collaborator

FernandoAndrade-NOAA commented May 24, 2024

Hi @DusanJovic-NOAA, please go ahead and sync your branch/resolve conflicts and bring in changes from #2296 which itself brings in changes from #2292 as well. There should be no baseline changes.

@DusanJovic-NOAA
Copy link
Collaborator Author

Hi @DusanJovic-NOAA, please go ahead and sync your branch/resolve conflicts and bring in changes from #2296 which itself brings in changes from #2292 as well. There should be no baseline changes.

Done. I'm running full test on Hera to verify that the test_changes.list has correct list of tests that need new baselines.

@jkbk2004 jkbk2004 changed the title Allow diagnostic accumulation bucket to change in fv3atm integration Allow diagnostic accumulation bucket to change in fv3atm integration + develop]: Minor documentation bug fixes #2296 May 28, 2024
@jkbk2004 jkbk2004 added the Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. label May 28, 2024
@jkbk2004
Copy link
Collaborator

All tests are completed ok. We can start merging this pr.

@FernandoAndrade-NOAA FernandoAndrade-NOAA merged commit 317e530 into ufs-community:develop May 28, 2024
3 checks passed
@DusanJovic-NOAA DusanJovic-NOAA deleted the fhbucket branch May 29, 2024 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Baseline Updates Current baselines will be updated. Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ability to handle accumulations and max/min computations over multiple time windows
6 participants