
Adds an alternative python workflow generation path #698

Merged
merged 47 commits into from
Apr 26, 2022


danielabdi-noaa
Collaborator

@danielabdi-noaa danielabdi-noaa commented Mar 10, 2022

DESCRIPTION OF CHANGES:

This PR provides an alternative workflow generation path using python3.

TESTS CONDUCTED:

Hera

All WE2E tests have been run on Hera using both the Python workflow generator and the existing shell-script workflow generator.
Although there were about 24 failures, both workflow generation paths resulted in the same set of successfully completed tests.

Orion

All WE2E tests have been run on Orion, with 21 failures, even better than on Hera.

Jet

On hold. Jet seems to be missing a lot of data; with the latest develop, the workflow could not even be generated because of missing data:

EXTRN_MDL_SOURCE_BASEDIR_ICS = "/mnt/lfs4/BMC/wrfruc/UFS_SRW_app/staged_extrn_mdl_files/FV3GFS/nemsio"

Unit tests

Unit tests for the new files can be run as follows:

python3 -m unittest -b \
	check_ruc_lsm.py \
	create_diag_table_file.py \
	create_model_configure_file.py \
	link_fix.py \
	set_cycle_dates.py \
	set_extrn_mdl_params.py \
	set_FV3nml_sfc_climo_filenames.py \
	set_FV3nml_ens_stoch_seeds.py \
	set_gridparams_ESGgrid.py \
	set_gridparams_GFDLgrid.py \
	set_ozone_param.py \
	set_predef_grid_params.py \
	set_thompson_mp_fix_files.py \
	get_crontab_contents.py

..............
----------------------------------------------------------------------
Ran 14 tests in 1.448s

OK

DEPENDENCIES:

None

DOCUMENTATION:

None

ISSUE (optional):

None

CONTRIBUTORS (optional):

@christinaholtNOAA for the suggestion

Contributor

@christinaholtNOAA christinaholtNOAA left a comment


@danielabdi-noaa Thank you for this huge effort! I have gotten about halfway through a review, and a few of the comments I've made so far are applicable in many places within the code. In general I think it is useful to use Python-native commands where we can so that we can more easily leverage exception handling. We may also find that it will be faster than spinning up subprocesses to do things Python will handle for us (mostly untested conjecture).

I have not finished my review just yet. I have only made it through ush/set_thompson_mp_fix_files.py and need to step away from it for a little while (maybe until Monday).
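To illustrate the point about Python-native commands, here is a minimal sketch (hypothetical helper names, not code from this PR) that replaces a `mkdir -p` subprocess and a `sed -i` substitution with `os.makedirs` and `re.subn`. Failures surface as ordinary Python exceptions that callers can catch, rather than exit codes to check. Note that the replacement string follows Python `re` syntax (for example `\g<0>` for the whole match, where sed uses `&`).

```python
import os
import re


def make_dir(path):
    """Native replacement for subprocess.run(["mkdir", "-p", path])."""
    try:
        os.makedirs(path, exist_ok=True)
    except OSError as err:
        # e.g. permission denied, or a regular file already exists at `path`
        raise RuntimeError(f"Could not create directory {path!r}: {err}") from err


def sed_in_place(path, pattern, replacement):
    """Native replacement for `sed -i "s/pattern/replacement/g" path`.

    Returns the number of substitutions made.
    """
    with open(path) as f:
        text = f.read()
    new_text, count = re.subn(pattern, replacement, text)
    with open(path, "w") as f:
        f.write(new_text)
    return count
```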

ush/create_diag_table_file.py (resolved)
ush/create_diag_table_file.py (outdated, resolved)
ush/create_diag_table_file.py (resolved)
ush/create_model_configure_file.py (resolved)
ush/generate_FV3LAM_wflow.py (outdated, resolved)
ush/set_FV3nml_sfc_climo_filenames.py (outdated, resolved)
ush/set_FV3nml_sfc_climo_filenames.py (outdated, resolved)
ush/set_cycle_dates.py (resolved)
ush/set_ozone_param.py (outdated, resolved)
ush/set_predef_grid_params.py (resolved)
@danielabdi-noaa danielabdi-noaa force-pushed the python_workflow branch 3 times, most recently from c28dde1 to d80b1ca, March 14, 2022 15:22
Contributor

@christinaholtNOAA christinaholtNOAA left a comment


@danielabdi-noaa Thanks again for all this work. A few more similar comments below. I think I made it through all of it this time...

ush/set_thompson_mp_fix_files.py (outdated, resolved)
ush/setup.py (resolved)
ush/setup.py (outdated, resolved)
ush/setup.py (outdated, resolved)
ush/setup.py (outdated, resolved)
ush/setup.py (resolved)
ush/setup.py (resolved)
Contributor

@christinaholtNOAA christinaholtNOAA left a comment


@danielabdi-noaa This is looking really nice. I have just a couple of super minor comments below.

I think things are ready.

ush/setup.py (outdated, resolved)
ush/setup.py (resolved)
@gsketefian
Collaborator

@danielabdi-noaa This generally looks good to me. Since it's hard to look through every line of this new code, it would be nice to at least try it out. Do you have any instructions on how to run the python version of the workflow generation? Say I want to run a single experiment with a given experiment configuration file.

@danielabdi-noaa
Collaborator Author

@gsketefian To use the Python workflow generation path, the only difference is that you execute generate_FV3LAM_wflow.py instead of the corresponding shell script generate_FV3LAM_wflow.sh. The Python path is set up to read the shell-script config file config.sh by default (it can also read YAML), so you don't need to make any other changes.
It may be best to run both workflow paths for a specific test case and compare the generated files (XML file, namelist file, etc.). I've run the full regression test on Hera after modifying run_WE2E_tests.sh to use generate_FV3LAM_wflow.py and got the same failures and passes as with the shell-script path. It would be good to have the tests run on other systems.
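One way to do that comparison is sketched below, assuming two experiment directories generated from the same config, one per path. `compare_expt_dirs` is a hypothetical helper, and the file names in the usage note are illustrative. `filecmp.cmp(..., shallow=False)` compares contents byte by byte, so it will also flag harmless differences such as paths that embed the experiment directory name.

```python
import filecmp
import os


def compare_expt_dirs(dir_sh, dir_py, filenames):
    """Compare selected generated files between two experiment directories.

    Returns a dict mapping each file name to "same", "different", or
    "missing" (absent from at least one of the two directories).
    """
    report = {}
    for name in filenames:
        a = os.path.join(dir_sh, name)
        b = os.path.join(dir_py, name)
        if not (os.path.isfile(a) and os.path.isfile(b)):
            report[name] = "missing"
        elif filecmp.cmp(a, b, shallow=False):
            report[name] = "same"
        else:
            report[name] = "different"
    return report
```

For example, `compare_expt_dirs("expt_sh", "expt_py", ["FV3LAM_wflow.xml", "input.nml", "var_defns.sh"])`; for any file reported as "different", `difflib.unified_diff` can show what changed.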

@gsketefian
Collaborator

@danielabdi-noaa Ok thanks. I will try it out on Hera and at least one more platform.

@gsketefian
Collaborator

@danielabdi-noaa I've been running tests on both Hera and Orion. I decided that it's best (and enough) to run the wflow_features tests (i.e. no need to run the tests in grids_extrn_mdls_suites_community and grids_extrn_mdls_suites_nco). There are a total of 35 of those. I've run 10 tests on Hera (including the bash counterparts) with the same results as with bash (2 failures), so I'm not worried about the failures so far. I'm planning on running the rest today.

I've also run 6 tests on Orion with some failures, although I haven't run the bash counterparts there yet. I had to make some adjustments to get things to run on Orion, but nothing directly related to this PR. I'm working through those failures now and will do more tests today.

One thing I noticed is that for a pre-existing file or directory that needs to be renamed, the python version adds the date and time to the end of the name instead of "oldNN". That is good since it identifies when it was renamed. But can we also add the string "old" somewhere in the rename? E.g. instead of renaming specify_DOT_OR_USCORE to say specify_DOT_OR_USCORE_20220407_100006, rename it to specify_DOT_OR_USCORE_old_20220407_100006. That's just to make it clear that it's an older copy.

I'll keep you posted with more results soon. Thanks.
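A minimal sketch of the suggested naming scheme, using a hypothetical helper `rename_with_old_suffix` (the PR's actual implementation may differ). Including seconds in the timestamp makes collisions unlikely, and the sketch refuses to overwrite an existing target rather than silently clobbering it.

```python
import os
from datetime import datetime


def rename_with_old_suffix(path, now=None):
    """Move an existing file or directory out of the way.

    The new name is "<path>_old_<YYYYMMDD_HHMMSS>", e.g.
    specify_DOT_OR_USCORE -> specify_DOT_OR_USCORE_old_20220407_100006.
    Returns the new path, or None if `path` does not exist.
    """
    if not os.path.lexists(path):
        return None
    now = now or datetime.now()
    new_path = f"{path.rstrip(os.sep)}_old_{now.strftime('%Y%m%d_%H%M%S')}"
    if os.path.lexists(new_path):
        raise FileExistsError(f"{new_path} already exists")
    os.rename(path, new_path)
    return new_path
```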

@danielabdi-noaa
Collaborator Author

@gsketefian Thank you for putting in the effort to test it; I know it is an unusually big PR!
I will make the changes you suggested for renaming old folders.
One thing I should mention is that the Python code could be a couple of commits behind, which could make some difference.
It was up to date last week, but I haven't checked since then.

I just got access to the wrfruc project on Jet and Niagara, so I will run the full regression tests there too.
I've run a couple of tests on Jet, and it looks like it runs fine there.

Thanks again!

@gsketefian
Collaborator

@danielabdi-noaa You're welcome. And sorry it takes a while for me to get to these big PRs.

I did notice that the Python scripts are not completely up to date, so I'm testing with slightly older hashes. In particular, I am not using the latest hashes that went in with PR #233. If you want to update to the latest, that would be great (I guess you'll have to do that anyway before merging). Let me know if you do. Some tests (specifically those with the GFS_2017_gfdlmp[_regional] suite) are failing, and they would be fixed if you merged in the latest from develop.

Also, on Orion, I need to update the data that's available to run several of the tests since it's not up-to-date with the latest data.

@gsketefian
Collaborator

@danielabdi-noaa One thing I'm finding on Orion is that after a test finishes (either successfully or not), the cron job for it is not removed from the user's cron table. The simplest case to demo this is with the deactivate_tasks test (which completes successfully). I ran it with the bash scripts, and the cron job was removed. Then I tried with python, and it wasn't.

Do you have an account on Orion? I imagine it would be hard to debug if you don't. The interesting thing is that this doesn't happen on Hera, which is very similar to Orion.

Note that on Orion, you have to be on login node 1 to be able to use cron. I can provide more details about this if you like.
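The cause of the Orion difference isn't identified in this thread; one thing worth checking is whether the removal step matches the crontab entry exactly as it was written (a later commit in this PR, "Bug fix with existing cron line and quotes", touched this area). Below is a minimal sketch of a text-level removal with hypothetical helper names. It assumes a `crontab` that accepts `-` for standard input (cronie and Vixie cron do) and that the entry's quoting is reproduced exactly.

```python
import subprocess


def remove_crontab_line(contents, cron_line):
    """Return crontab `contents` with every entry matching `cron_line` removed.

    Lines are compared after stripping surrounding whitespace, so trailing
    spaces do not prevent a match; quoting inside the line must still match.
    """
    target = cron_line.strip()
    kept = [ln for ln in contents.splitlines() if ln.strip() != target]
    # Many cron implementations require the last entry to end with a newline.
    return "\n".join(kept) + "\n" if kept else ""


def install_crontab(contents):
    """Replace the current user's crontab with `contents` (via `crontab -`)."""
    subprocess.run(["crontab", "-"], input=contents, text=True, check=True)
```

The current contents can be read with `subprocess.run(["crontab", "-l"], capture_output=True, text=True).stdout`.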

@danielabdi-noaa
Collaborator Author

I haven't used my Orion account in a long time, but I will try to reactivate it and see if I can reproduce the problem.
Thanks

@willmayfield
Collaborator

willmayfield commented Apr 14, 2022

@danielabdi-noaa Thanks for your work this is great!

I tried generating and running the GST case by hand on Cheyenne and ran into some minor issues:

1. While bringing in environment variables, the (NPL) from the Cheyenne workflow environment prompt ends up causing an error for some reason.
2. When EXPT_BASEDIR is an empty string in the configs (as it is by default), indexing EXPT_BASEDIR[0] gives an error in setup.py.
3. MACHINE="cheyenne" fails when it is compared to the list of upper-case machine names in valid_param_vals.yaml.

I also tried it on Orion and still got the same problems 2 and 3 there. If I avoid triggering these checks, it generated and ran successfully on both platforms!
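Issues 2 and 3 can be avoided with checks along these lines. This is a sketch with hypothetical function names, not the fix that went into setup.py; `default_basedir` stands in for whatever default location the workflow uses, and treating a relative EXPT_BASEDIR as relative to it is an assumption.

```python
import os


def resolve_expt_basedir(expt_basedir, default_basedir):
    """Return an absolute experiment base directory.

    An empty value (the default in the configs) falls back to
    `default_basedir`; a relative value is taken relative to it.
    """
    if not expt_basedir:
        return os.path.abspath(default_basedir)
    # os.path.isabs avoids expt_basedir[0], which raises IndexError on "".
    if os.path.isabs(expt_basedir):
        return expt_basedir
    return os.path.abspath(os.path.join(default_basedir, expt_basedir))


def validate_machine(machine, valid_machines):
    """Case-insensitive check of MACHINE against the list of valid names.

    Returns the upper-case name so that later comparisons can be exact.
    """
    machine_uc = machine.upper()
    if machine_uc not in {m.upper() for m in valid_machines}:
        raise ValueError(f"MACHINE={machine!r} is not one of {sorted(valid_machines)}")
    return machine_uc
```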

@danielabdi-noaa
Collaborator Author

@willmayfield Thank you for testing and finding these issues -- I will try to fix them later!

@danielabdi-noaa
Collaborator Author

danielabdi-noaa commented Apr 14, 2022

@willmayfield I believe I've fixed the last two issues, but I cannot test the Cheyenne issue because I do not have access to the machine. Could you give me more information to help debug the problem? There is a brittle section of code in the environment-variable import/export that could be the cause:

https://github.com/danielabdi-noaa/regional_workflow/blob/python_workflow/ush/python_utils/environment.py#L239-L245

It skips Python functions and lowercase environment variables, since those are unlikely to be variables defined by the SRW App, which are what I want to import/export. Could something on Cheyenne be making it through this filter and causing problems?
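A minimal sketch of that kind of filtering, together with a stricter array check of the sort a later commit in this PR describes ("Check both starting/ending characters are brackets for shell variable to be considered an array"). The function names are hypothetical and this is not the code in environment.py. Requiring parentheses at both ends stops a value such as a prompt beginning with "(NPL)" from being parsed as an array, although a value that is exactly "(NPL)" would still pass.

```python
def is_shell_array(value):
    """Treat a value as a bash array literal only if it is wrapped in
    parentheses at BOTH ends, e.g. "( 0 6 12 18 )"."""
    value = value.strip()
    return len(value) >= 2 and value.startswith("(") and value.endswith(")")


def parse_shell_array(value):
    # Naive whitespace split; does not handle quoted elements with spaces.
    return value.strip()[1:-1].split()


def select_srw_vars(variables):
    """Keep entries that look like SRW App settings: all-upper-case names
    whose values are not Python functions."""
    return {
        name: value
        for name, value in variables.items()
        if name == name.upper() and not callable(value)
    }


def import_env_vars(environ):
    """Return selected environment variables, converting array literals."""
    return {
        name: parse_shell_array(value) if is_shell_array(value) else value
        for name, value in select_srw_vars(environ).items()
    }
```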

@danielabdi-noaa danielabdi-noaa force-pushed the python_workflow branch 2 times, most recently from c58a5e9 to 95a50f3, April 20, 2022 01:14
@JeffBeck-NOAA
Collaborator

@willmayfield, do you have time to test @danielabdi-noaa's changes to this PR on Cheyenne? @gsketefian, does this need to be retested on Hera?

@danielabdi-noaa
Collaborator Author

@JeffBeck-NOAA I believe Will has tested and confirmed my latest changes fix the problem on Cheyenne. Our communications shifted to email a couple of days ago.

@JeffBeck-NOAA
Collaborator

@danielabdi-noaa, thanks! I will go ahead and approve. @gsketefian, did you want to test one more time on Hera?

@christinaholtNOAA christinaholtNOAA added the ci-hera-intel-WE (kicks off automated workflow test on Hera with Intel) and ci-jet-intel-WE (kicks off automated workflow test on Jet with Intel) labels Apr 25, 2022
@venitahagerty venitahagerty removed the ci-jet-intel-WE label Apr 25, 2022
@venitahagerty
Collaborator

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/876620093/20220425230518/ufs-srweather-app
Build was Successful
Rocoto jobs started
Experiment failed: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2022-04-25 23:42:15 +0000 :: fe2 :: Task get_extrn_ics, jobid=2642517, in state DEAD (FAILED), ran for 15.0 seconds, exit status=256, try=1 (of 1)
Long term tracking will be done on 9 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE

@danielabdi-noaa danielabdi-noaa merged commit bf0fae9 into ufs-community:develop Apr 26, 2022
mark-a-potts added a commit that referenced this pull request May 11, 2022
* Add Gaea as a supported platform for the regional_workflow (#734)

* Updates to port regional workflow to gaea

* Temp change with -v as batch option

* new fixes for gaea/slurm

* Updated time for make lbcs

* added TEST data directory path

* Update gaea.sh

* fixes for PR

* Add more parameters to CSV file containing WE2E test info (#740)

## DESCRIPTION OF CHANGES: 
The script/function `get_WE2Etest_names_subdirs_descs.sh` (which is called from `run_WE2E_tests.sh` if needed) creates a CSV (Comma-Separated Value) file named `WE2E_test_info.csv` that contains information about the WE2E tests.  Currently, this CSV file contains only 3 columns: the test name, any alternate names for the test, and the test description.  In order to have a more complete summary of the WE2E tests, this PR modifies `get_WE2Etest_names_subdirs_descs.sh` so that additional information is included in the CSV file.  This additional information consists of the values of the following experiment variables for each test:
```
PREDEF_GRID_NAME
CCPP_PHYS_SUITE
EXTRN_MDL_NAME_ICS
EXTRN_MDL_NAME_LBCS
DATE_FIRST_CYCL
DATE_LAST_CYCL
CYCL_HRS
INCR_CYCL_FREQ
FCST_LEN_HRS
LBC_SPEC_INTVL_HRS
NUM_ENS_MEMBERS
```
In addition, the script uses this information to calculate the number of times each test calls the forecast model (e.g. if the test uses 3 different cycle dates, then the forecast model will be called 3 times; if it is an ensemble test for a single cycle, the test will call the forecast model as many times as the number of ensemble members).  
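The calculation described above can be sketched as follows. This is an illustration in Python, not the Bash logic in `get_WE2Etest_names_subdirs_descs.sh`; `count_fcst_runs` is a hypothetical name, and the assumptions are stated in the docstring. In particular, combining cycles and ensemble members as a product is consistent with, but not stated by, the two examples above.

```python
from datetime import datetime


def count_fcst_runs(date_first_cycl, date_last_cycl, cycl_hrs, num_ens_members=1):
    """Estimate how many times a test calls the forecast model.

    Assumptions (for illustration only): dates are "YYYYMMDD" strings, there
    is one cycle per hour listed in `cycl_hrs` on every day from
    date_first_cycl to date_last_cycl inclusive, every ensemble member runs
    in every cycle, and INCR_CYCL_FREQ is ignored.
    """
    first = datetime.strptime(date_first_cycl, "%Y%m%d").date()
    last = datetime.strptime(date_last_cycl, "%Y%m%d").date()
    if last < first:
        raise ValueError("DATE_LAST_CYCL is earlier than DATE_FIRST_CYCL")
    num_days = (last - first).days + 1
    num_cycles = num_days * len(cycl_hrs)
    # A deterministic (non-ensemble) test runs one forecast per cycle.
    return num_cycles * max(1, num_ens_members)
```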

## TESTS CONDUCTED: 
The script `run_WE2E_tests.sh` was called, which in turn calls `get_WE2Etest_names_subdirs_descs.sh`.  This created a new CSV file containing the new fields (columns).  The CSV file was imported into Google Sheets (using "|" as the field/column separator) and looked correct.

## DOCUMENTATION:
The documentation is for the most part already within the `get_WE2Etest_names_subdirs_descs.sh`.  This PR slightly modifies that documentation to update it.

* Update directory structure of NCO mode (#743)

* update vertical structure of NCO mode

* update sample script for nco

* Fix typo on write component of new RRFS CONUS

* Default CCPP physics option is FV3_GFS_v16 (#746)

* Updated the default CCPP physics option to FV3_GFS_v16

* Updated the default CCPP physics option to FV3_GFS_v16 in config_defaults.sh

Co-authored-by: Natalie Perlin <[email protected]>

* Adds an alternative python workflow generation path (#698)

* Workflow in python starting to work.

* Use new python_utils package structure.

* Some bug fixes.

* Use uppercase TRUE/FALSE in var_dfns

* Use config.sh by default.

* Minor bug fixes.

* Remove config.yaml

* Update to the latest develop

* Remove quotes from numbers in predef grid.

* Minor bug fix.

* Move validity checker to the bottom of setup

* Add more unit tests.

* Update with python_utils changes.

* Update to latest develop additions (Need to re-run regression test)

* Use set_namelist and fill_jinja_template as python functions.

* Replace sed regex searches with python re.

* Use python realpath.

* Construct settings as dictionary before passing to fill_jinja and set_namelist

* Use yaml for setting predefined grid parameters.

* Use xml parser for ccpp phys suite definition file.

* Remove more run_command calls.

* Simplify some func argument processing.

* Move different config format parsers to same file.

* Use os.path.join for the sake of macosx

* Remove remaining func argument processing via os.environ.

* Minor bug fix in set_extrn_mdl_params.sh

* Add suite defn in test_data.

* Minor fixes on unittest on jet.

* Simplify boolean condition checks.

* Include old in renaming of old directories

* Fix conflicting yaml !join tag for paths and strings.

* Bug fix with setting sfcperst dict.

* Imitate "readlink -m" with os.path.realpath instead of os.readlink

* Don't use /tmp as that is shared by multiple users.

* Bug fix with cron line, maintain quotes around TRUE/FALSE.

* Update to latest develop (untested)

* Bug fix with existing cron line and quotes.

* Bug fix with case-sensitive MACHINE name, and empty EXPT_DIR.

* Update to latest develop

* More updates.

* Bug fix thanks to @willmayfield! Check both starting/ending
characters are brackets for shell variable to be considered an array.

* Make empty EXPT_BASEDIR workable.

* Update to latest develop

* Update in predef grid.

* Check f90nml as well.

Co-authored-by: Daniel Abdi <[email protected]>

* Fix typo and crontab issue on wcoss dell in workflow python scripts (#750)

* Fix typo and failure on wcoss

* fix new line issue on wcoss dell

* remove capture_output

* Get USER from environment

Co-authored-by: Daniel Abdi <[email protected]>

* Add new WE2E configs (#748)

## DESCRIPTION OF CHANGES: 
Added two new WE2E config files for the Sub-CONUS Indianapolis domain to support the upcoming SRW release. 

In addition, modified the external data used in the `config.specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS.sh` to match more common datasets used in the WE2E testing process. 

## TESTS CONDUCTED: 
Successfully ran the new WE2E tests (`config.SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR.sh`, `config.SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta.sh`) and `config.specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS.sh` on NOAA Parallel Works AWS instance.

## DEPENDENCIES:
None.

## DOCUMENTATION:
No documentation changes are required.

* Added a fixed WoF grid and the python tool to determine the write component parameters (#733)

* Added a fixed WoF grid and the python tool to determine the write component parameters

* Update set_predef_grid_params.sh

* Renamed file as recommended and removed unused lines

* Modified comment

Co-authored-by: JeffBeck-NOAA <[email protected]>
Co-authored-by: WYH@MBP <[email protected]>

* Replace env with modulefiles in scripts (#752)

* change env to mod

* update we2e script

* WE2E script improvements for usability (#745)

## DESCRIPTION OF CHANGES: 
* Modifications to `run_WE2E_tests.sh`:
  * Add examples to help/usage statement
* Modifications to `check_expts_status.sh`:
  * Add arguments list that can be processed by `process_args`
  * Add new optional arguments:  `num_log_lines`, `verbose`
  * Include a help/usage message

## TESTS CONDUCTED:
* Ran `run_WE2E_tests.sh --help` from the command line and got the expected help message.
* Ran `check_expts_status.sh --help` from the command line and got the expected help message.
* Used `run_WE2E_tests.sh` to run a set of 2 WE2E tests -- works as expected.
* Used `check_expts_status` to check on the status of the 2 tests run above and got the expected status message.
 
## DEPENDENCIES:
PR #[241](ufs-community/ufs-srweather-app#241)

## DOCUMENTATION:
A lot of this PR is documentation in the scripts.  There is an accompanying documentation PR #[241](ufs-community/ufs-srweather-app#241) into ufs-srweather-app.

* Standardize static data across Tier-1 platforms; fix and improve IC and LBC data retrieval (#744)

* Bug fixes (grid size + suppress screen output from module load) (#756)

## DESCRIPTION OF CHANGES: 
1) Adjust y-direction size of write-component grid of `SUBCONUS_Ind_3km` predefined grid from 195 to 197 (this was an oversight in PR #725).
2) Redirect output of module load in launch script (`launch_FV3LAM_wflow.sh`) to `/dev/null` to avoid unwanted screen output (which was introduced in PR #[238](ufs-community/ufs-srweather-app#238) in ufs-srweather-app and is about how to load the `regional_workflow` environment and is not relevant in this context).

## TESTS CONDUCTED: 
1) Plotted the `SUBCONUS_Ind_3km` grid to ensure it has correct size (it does).
2) Manually ran `launch_FV3LAM_wflow.sh` from the command line to verify that screen output is suppressed (it is).

* Update default SPP ISEED array in config_defaults.sh to use unique values (#759)

* Modify RRFS North America 3- and 13-km domain configuration and WE2E test.

* Modify default ISEED values for SPP

* Fix grid in WE2E test

* Update workflow python scripts (#760)

* update python scripts

* Change output file name of run_post to meet NCO standards (#758)

* change output file name

* change variable name

* update python script

* remove duplicates

* add a check for empty variables

* move variable to common area

* clean up unnecessary comments

* update scripts

* remove duplicate

* update python scripts

* fix user-staged dir path issue in python script

* Add POST_OUTPUT_DOMAIN_NAME to WE2E tests for new grids (#763)

* Add new var to we2e tests for new grids

* rename we2e tests for custom grid

* remove unnecessary $

Co-authored-by: Mark Potts <[email protected]>
Co-authored-by: gsketefian <[email protected]>
Co-authored-by: Chan-Hoo.Jeon-NOAA <[email protected]>
Co-authored-by: Natalie Perlin <[email protected]>
Co-authored-by: Natalie Perlin <[email protected]>
Co-authored-by: danielabdi-noaa <[email protected]>
Co-authored-by: Daniel Abdi <[email protected]>
Co-authored-by: Daniel Abdi <[email protected]>
Co-authored-by: EdwardSnyder-NOAA <[email protected]>
Co-authored-by: Yunheng Wang <[email protected]>
Co-authored-by: JeffBeck-NOAA <[email protected]>
Co-authored-by: WYH@MBP <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
christinaholtNOAA added a commit to NOAA-GSL/regional_workflow that referenced this pull request Jun 7, 2022
* Add missing user-defined stochastic physics options; fix stochastic physics seed generation script (ufs-community#704)

## DESCRIPTION OF CHANGES: 
Add missing user-defined options for tendency-based stochastic physics and fix the ensemble-based seed generation script to work regardless of whether stochastic physics is turned on or not.

## TESTS CONDUCTED: 
Tested on Hera using the following WE2E configurations with and without stochastic physics:

config.grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_HRRR.sh
config.community_ensemble_2mems.sh

## ISSUE (optional): 
[Issue ufs-community#702](ufs-community#702)

## CONTRIBUTORS (optional): 
Thanks to @mkavulich and @chan-hoo for finding this problem.

* Add namelist option for netCDF4 when running with the 3-km NA domain; update NAM HPSS settings and WE2E tests (ufs-community#707)

* Change to netcdf4 when using the NA 3-km domain

* Update HPSS paths for NAM data

* Update NAM HPSS locations and dates for WE2E tests.

* Remove lines from merge.

* Tweaks to allow compiler and build_env_fn to be specified in the run_WE2E_test.sh script (ufs-community#711)

* Changed 20200304 to 20200303 in ush/mrms_pull_topofhour.py (ufs-community#712)

* Remove unused rocoto directory in ush (ufs-community#720)

* Fix bug for nco we2e tests on Orion; re-organize we2e input data and nco we2e tests (ufs-community#713)

* Update machine script for orion

* Update machine script for wcoss_dell_p3

* Update we2e run script for wcoss and orion

* Reorganize nco we2e tests

* remove machine based logic

* Add symlink for nco inline post test

* Added stand-alone verification scripts (feature/issue_683_standaloneVX) (ufs-community#726)

* Grid-stat and point-stat run scripts.

* Stand-alone scripts for verification.

* Added comments to gridvx scripts.

* Added qsub_job.sh and added comments to provide context on running Vx.

* remove machine base logic (ufs-community#727)

* Allow user-defined file names for input template files (ufs-community#717)

* Allow multiple template names

* parameterize file_TMPL_FN and add a we2e test

* Increase maxtries_task for make_grid/orog/sfc_climo

* Modify file name and description

* Changes to RRFS 3- and 13-km domains, setup.sh script bug fixes, make_ics task modification, and tweaks to stochastic physics namelist settings (ufs-community#721)

* Modify RRFS North America 3- and 13-km domain configuration and WE2E test.

* Change sotyp_from_climo to "true" based on operational RAP grib2 files.

* Update for changes to stochastic physics namelist options.

* Check for DO_ENSEMBLE="TRUE" when running ensemble verification and turn off VX when running in NCO mode.

* Revert to 3-km domain.

* Remove commented-out GFDL grid for the RRFS_NA_13km domain

* Add RRFS_NA_13km WE2E test

* Changes to comments.

* Adding 25 km tests to Jet/Hera suites. (ufs-community#718)

* Add a small 3km predefined grid over Indianapolis for testing (ufs-community#725)

* Add 3km grid over Indianapolis.  This is about 600km x 600km in extent (200 x 200 grid points).  It is intended for use in the WE2E tests.

* Edit comments.

* Use Python tool for get_extrnl_mdl_file tasks (ufs-community#681)

These changes hook in the Python-based data ingest tool, replacing the previous scripts that handled this work as part of the get_extrn_mdl_file task. No attempt was made in this PR to replace the NOMADS fetching script with the Python utility, but the NOMADS data location has been added to the data_locations.yml file.

The functionality to write the data summary file has also been added to the Python tool to match the capabilities of the existing workflow tools.

* Increase size of RRFS CONUS grid (ufs-community#724)

Co-authored-by: Benjamin.Blake EMC <[email protected]>
Co-authored-by: Benjamin.Blake EMC <[email protected]>
Co-authored-by: Benjamin.Blake EMC <[email protected]>
Co-authored-by: chan-hoo <[email protected]>

* add include-style quality mark options in metplus confs (ufs-community#738)

* Add Gaea as a supported platform for the regional_workflow (ufs-community#734)

* Updates to port regional workflow to gaea

* Temp change with -v as batch option

* new fixes for gaea/slurm

* Updated time for make lbcs

* added TEST data directory path

* Update gaea.sh

* fixes for PR

* Add more parameters to CSV file containing WE2E test info (ufs-community#740)

## DESCRIPTION OF CHANGES: 
The script/function `get_WE2Etest_names_subdirs_descs.sh` (which is called from `run_WE2E_tests.sh` if needed) creates a CSV (Comma-Separated Value) file named `WE2E_test_info.csv` that contains information about the WE2E tests.  Currently, this CSV file contains only 3 columns: the test name, the names of any alternate names for the test, and the test description.  In order to have a more complete summary of the WE2E tests, this PR modifies `get_WE2Etest_names_subdirs_descs.sh` so that additional information is included in the CSV file.  This additional information consists of the values of the following experiment variables for each test:
```
PREDEF_GRID_NAME
CCPP_PHYS_SUITE
EXTRN_MDL_NAME_ICS
EXTRN_MDL_NAME_LBCS
DATE_FIRST_CYCL
DATE_LAST_CYCL
CYCL_HRS
INCR_CYCL_FREQ
FCST_LEN_HRS
LBC_SPEC_INTVL_HRS
NUM_ENS_MEMBERS
```
In addition, the script uses this information to calculate the number of times each test calls the forecast model (e.g. if the test uses 3 different cycle dates, then the forecast model will be called 3 times; if it is an ensemble test for a single cycle, the test will call the forecast model as many times as the number of ensemble members).  

## TESTS CONDUCTED: 
The script `run_WE2E_tests.sh` was called that in turn calls `get_WE2Etest_names_subdirs_descs.sh`.  This created a new CSV file that contained the new fields (columns).  The CSV file was imported into Google Sheets (using "|" as the field/column separator) and looked correct.

## DOCUMENTATION:
The documentation is for the most part already within the `get_WE2Etest_names_subdirs_descs.sh`.  This PR slightly modifies that documentation to update it.

* Update directory structure of NCO mode (ufs-community#743)

* update vertical structure of NCO mode

* update sample script for nco

* Fix typo on write component of new RRFS CONUS

* Default CCPP physics option is FV3_GFS_v16 (ufs-community#746)

* Updated the default CCPP physics option to FV3_GFS_v16

* Updated the default CCPP physics option to FV3_GFS_v16 in config_defaults.sh

Co-authored-by: Natalie Perlin <[email protected]>

* Adds an alternative python workflow generation path (ufs-community#698)

* Workflow in python starting to work.

* Use new python_utils package structure.

* Some bug fixes.

* Use uppercase TRUE/FALSE in var_dfns

* Use config.sh by default.

* Minor bug fixes.

* Remove config.yaml

* Update to the latest develop

* Remove quotes from numbers in predef grid.

* Minor bug fix.

* Move validity checker to the bottom of setup

* Add more unit tests.

* Update with python_utils changes.

* Update to latest develop additions (Need to re-run regression test)

* Use set_namelist and fill_jinja_template as python functions.

* Replace sed regex searches with python re.

* Use python realpath.

* Construct settings as dictionary before passing to fill_jinja and set_namelist

* Use yaml for setting predefined grid parameters.

* Use xml parser for ccpp phys suite definition file.

* Remove more run_command calls.

* Simplify some func argument processing.

* Move different config format parsers to same file.

* Use os.path.join for the sake of macosx

* Remove remaining func argument processing via os.environ.

* Minor bug fix in set_extrn_mdl_params.sh

* Add suite defn in test_data.

* Minor fixes on unittest on jet.

* Simplify boolean condition checks.

* Include old in renaming of old directories

* Fix conflicting yaml !join tag for paths and strings.

* Bug fix with setting sfcperst dict.

* Imitate "readlink -m" with os.path.realpath instead of os.readlink

* Don't use /tmp as that is shared by multiple users.

* Bug fix with cron line, maintain quotes around TRUE/FALSE.

* Update to latest develop (untested)

* Bug fix with existing cron line and quotes.

* Bug fix with case-sensitive MACHINE name, and empty EXPT_DIR.

* Update to latest develop

* More updates.

* Bug fix thanks to @willmayfield! Check both starting/ending
characters are brackets for shell variable to be considered an array.

* Make empty EXPT_BASEDIR workable.

* Update to latest develop

* Update in predef grid.

* Check f90nml as well.

Co-authored-by: Daniel Abdi <[email protected]>
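
The commit "Use xml parser for ccpp phys suite definition file" swaps text searches for a real XML parser. Below is a minimal sketch using the standard library's `xml.etree.ElementTree`. It assumes the usual CCPP suite definition layout (`<suite>` containing `<group>`, `<subcycle>` and `<scheme>` elements). The helper name `get_suite_schemes` and the sample XML are hypothetical and illustrative, not the PR's actual code.

```python
import xml.etree.ElementTree as ET


def get_suite_schemes(sdf_text):
    """Return (suite_name, [scheme, ...]) from a CCPP suite definition XML string.

    Hypothetical helper: collects the text of every <scheme> element in document order.
    """
    root = ET.fromstring(sdf_text)
    suite_name = root.get("name")
    # iter() walks all descendants, so schemes inside any group/subcycle are found
    schemes = [elem.text.strip() for elem in root.iter("scheme") if elem.text]
    return suite_name, schemes


# Illustrative, heavily trimmed suite definition (not a complete real suite)
SAMPLE_SDF = """<suite name="FV3_GFS_v16" version="1">
  <group name="time_vary">
    <subcycle loop="1">
      <scheme>GFS_time_vary_pre</scheme>
    </subcycle>
  </group>
  <group name="physics">
    <subcycle loop="1">
      <scheme>GFS_suite_interstitial_1</scheme>
      <scheme>sfc_diff</scheme>
    </subcycle>
  </group>
</suite>
"""

if __name__ == "__main__":
    name, schemes = get_suite_schemes(SAMPLE_SDF)
    print(name, schemes)
```

Parsing the tree, rather than grepping for tags, is robust to whitespace, comments, and attribute ordering in the file.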

* Fix typo and crontab issue on wcoss dell in workflow python scripts (ufs-community#750)

* Fix typo and failure on wcoss

* fix new line issue on wcoss dell

* remove capture_output

* Get USER from environment

Co-authored-by: Daniel Abdi <[email protected]>
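
The #750 commits remove `capture_output` and read `USER` from the environment. `capture_output` (and `text`) were only added to `subprocess.run` in Python 3.7, so one plausible reason is an older interpreter on WCOSS Dell; that is an assumption, not something the commits state. A minimal sketch of the 3.6-compatible pattern follows; `run_command` and `get_user` are hypothetical names.

```python
import os
import subprocess
import sys


def run_command(cmd):
    """Run cmd and return (returncode, stdout, stderr) as text.

    Uses stdout/stderr=PIPE instead of capture_output=True and
    universal_newlines=True instead of text=True, since both of the
    latter require Python 3.7+.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def get_user():
    """Read the login name from the environment, falling back to an empty string."""
    return os.environ.get("USER", "")


if __name__ == "__main__":
    rc, out, _ = run_command([sys.executable, "-c", "print('hello')"])
    # Strip the trailing newline before using the captured output
    print(rc, out.strip(), get_user())
```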

* Add new WE2E configs (ufs-community#748)

## DESCRIPTION OF CHANGES: 
Added two new WE2E config files for the Sub-CONUS Indianapolis domain to support the upcoming SRW release. 

In addition, modified the external data used in the `config.specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS.sh` to match more common datasets used in the WE2E testing process. 

## TESTS CONDUCTED: 
Successfully ran the new WE2E tests (`config.SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR.sh`, `config.SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta.sh`) and `config.specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS.sh` on a NOAA Parallel Works AWS instance.

## DEPENDENCIES:
None.

## DOCUMENTATION:
No documentation changes are required.

* Added a fixed WoF grid and the python tool to determine the write component parameters (ufs-community#733)

* Added a fixed WoF grid and the python tool to determine the write component parameters

* Update set_predef_grid_params.sh

* Renamed file as recommended and removed unused lines

* Modified comment

Co-authored-by: JeffBeck-NOAA <[email protected]>
Co-authored-by: WYH@MBP <[email protected]>

* Replace env with modulefiles in scripts (ufs-community#752)

* change env to mod

* update we2e script

* WE2E script improvements for usability (ufs-community#745)

## DESCRIPTION OF CHANGES: 
* Modifications to `run_WE2E_tests.sh`:
  * Add examples to help/usage statement
* Modifications to `check_expts_status.sh`:
  * Add arguments list that can be processed by `process_args`
  * Add new optional arguments:  `num_log_lines`, `verbose`
  * Include a help/usage message

## TESTS CONDUCTED:
* Ran `run_WE2E_tests.sh --help` from the command line and got the expected help message.
* Ran `check_expts_status.sh --help` from the command line and got the expected help message.
* Used `run_WE2E_tests.sh` to run a set of 2 WE2E tests -- works as expected.
* Used `check_expts_status.sh` to check on the status of the 2 tests run above and got the expected status message.
 
## DEPENDENCIES:
PR #[241](ufs-community/ufs-srweather-app#241)

## DOCUMENTATION:
A lot of this PR is documentation in the scripts.  There is an accompanying documentation PR #[241](ufs-community/ufs-srweather-app#241) into ufs-srweather-app.

* Standardize static data across Tier-1 platforms; fix and improve IC and LBC data retrieval (ufs-community#744)

* Bug fixes (grid size + suppress screen output from module load) (ufs-community#756)

## DESCRIPTION OF CHANGES: 
1) Adjust the y-direction size of the write-component grid of the `SUBCONUS_Ind_3km` predefined grid from 195 to 197 (this was an oversight in PR ufs-community#725).
2) Redirect the output of the module load in the launch script (`launch_FV3LAM_wflow.sh`) to `/dev/null` to avoid unwanted screen output. That output was introduced in PR #[238](ufs-community/ufs-srweather-app#238) in ufs-srweather-app, describes how to load the `regional_workflow` environment, and is not relevant in this context.

## TESTS CONDUCTED: 
1) Plotted the `SUBCONUS_Ind_3km` grid to ensure it has correct size (it does).
2) Manually ran `launch_FV3LAM_wflow.sh` from the command line to verify that screen output is suppressed (it is).

* Update default SPP ISEED array in config_defaults.sh to use unique values (ufs-community#759)

* Modify RRFS North America 3- and 13-km domain configuration and WE2E test.

* Modify default ISEED values for SPP

* Fix grid in WE2E test

* Update workflow python scripts (ufs-community#760)

* update python scripts

* Change output file name of run_post to meet NCO standards (ufs-community#758)

* change output file name

* change variable name

* update python script

* remove duplicates

* add a check for empty variables

* move variable to common area

* clean up unnecessary comments

* update scripts

* remove duplicate

* update python scripts

* fix user-staged dir path issue in python script

* Add POST_OUTPUT_DOMAIN_NAME to WE2E tests for new grids (ufs-community#763)

* Add new var to we2e tests for new grids

* rename we2e tests for custom grid

* remove unnecessary $

* Modifications to `CODEOWNERS` file (ufs-community#757)

* Add @gspetro-NOAA, @natalie-perlin, and @EdwardSnyder-NOAA to CODEOWNERS so they are notified of all PRs and can review them.

* Remove duplicates in CODEOWNERS; remove users who will no longer be working with the repo.

* Adding a python utility for summarizing compute. (ufs-community#769)

Adds a utility that summarizes Rocoto database computational usage information.

* Add github actions for python unittests. (ufs-community#747)

* Add github actions for python unittests.

* Include all python scripts in ush

* Skip defining QUILTING params when it is set to False

* Update py_workflow

* Update unittest for set_extrn_mdl_params.

* Updates from develop.

Co-authored-by: Daniel Shawul <[email protected]>
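
"Skip defining QUILTING params when it is set to False" can be pictured as a guard around the write-component settings. This is a minimal sketch, assuming the settings live in a plain dictionary; the helper name and the `WRTCMP_*` keys used here are illustrative, not taken from the PR's code.

```python
def build_wrtcmp_settings(cfg):
    """Return write-component settings, or an empty dict when QUILTING is off.

    Hypothetical helper; the WRTCMP_* key names are illustrative.
    """
    if not cfg.get("QUILTING", False):
        # Without quilting the write component is unused, so its parameters are skipped
        return {}
    keys = ("WRTCMP_write_groups", "WRTCMP_write_tasks_per_group", "WRTCMP_output_grid")
    # Copy only the write-component keys that are actually present
    return {key: cfg[key] for key in keys if key in cfg}


if __name__ == "__main__":
    print(build_wrtcmp_settings({"QUILTING": False, "WRTCMP_write_groups": 1}))
    print(build_wrtcmp_settings({"QUILTING": True, "WRTCMP_write_groups": 1}))
```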

* Update sample script for NCO mode (ufs-community#771)

* update config.nco.sh

* Add comment

* Feature/noaacloud (ufs-community#767)

* updates for noaacloud

* working version

* fixes for noaacloud

* added extra modules for post

* removed cheyenne-specific crontab editing section (ufs-community#773)

* Pin down hera miniconda3 module file version. (ufs-community#770)

Pin down the version of miniconda3 on Hera, and do not append to the module path.

* update staged data dir (ufs-community#774)

Co-authored-by: JeffBeck-NOAA <[email protected]>
Co-authored-by: Mark Potts <[email protected]>
Co-authored-by: michelleharrold <[email protected]>
Co-authored-by: Chan-Hoo.Jeon-NOAA <[email protected]>
Co-authored-by: gsketefian <[email protected]>
Co-authored-by: BenjaminBlake-NOAA <[email protected]>
Co-authored-by: Benjamin.Blake EMC <[email protected]>
Co-authored-by: Benjamin.Blake EMC <[email protected]>
Co-authored-by: Benjamin.Blake EMC <[email protected]>
Co-authored-by: chan-hoo <[email protected]>
Co-authored-by: Will Mayfield <[email protected]>
Co-authored-by: Natalie Perlin <[email protected]>
Co-authored-by: Natalie Perlin <[email protected]>
Co-authored-by: danielabdi-noaa <[email protected]>
Co-authored-by: Daniel Abdi <[email protected]>
Co-authored-by: Daniel Abdi <[email protected]>
Co-authored-by: EdwardSnyder-NOAA <[email protected]>
Co-authored-by: Yunheng Wang <[email protected]>
Co-authored-by: WYH@MBP <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
Co-authored-by: Daniel Shawul <[email protected]>
ywangwof added a commit that referenced this pull request Jun 9, 2022
* Added workflow for RRFS_v1nssl

* Renamed FV3_RRFS_v1nssl to FV3_WoFS_v0 and added WE2E tests configuration files for suite FV3_WoFS_v0

* Added SUBCONUS test for WoFS suite

* Set water_nc initial value to zero for FV3GFS data

* Add Gaea as a supported platform for the regional_workflow (#734)

* Updates to port regional workflow to gaea

* Temp change with -v as batch option

* new fixes for gaea/slurm

* Updated time for make lbcs

* added TEST data directory path

* Update gaea.sh

* fixes for PR

* Add more parameters to CSV file containing WE2E test info (#740)

## DESCRIPTION OF CHANGES: 
The script/function `get_WE2Etest_names_subdirs_descs.sh` (which is called from `run_WE2E_tests.sh` if needed) creates a CSV (Comma-Separated Value) file named `WE2E_test_info.csv` that contains information about the WE2E tests.  Currently, this CSV file contains only 3 columns: the test name, any alternate names for the test, and the test description.  To provide a more complete summary of the WE2E tests, this PR modifies `get_WE2Etest_names_subdirs_descs.sh` so that additional information is included in the CSV file.  This additional information consists of the values of the following experiment variables for each test:
```
PREDEF_GRID_NAME
CCPP_PHYS_SUITE
EXTRN_MDL_NAME_ICS
EXTRN_MDL_NAME_LBCS
DATE_FIRST_CYCL
DATE_LAST_CYCL
CYCL_HRS
INCR_CYCL_FREQ
FCST_LEN_HRS
LBC_SPEC_INTVL_HRS
NUM_ENS_MEMBERS
```
In addition, the script uses this information to calculate the number of times each test calls the forecast model (e.g. if the test uses 3 different cycle dates, then the forecast model will be called 3 times; if it is an ensemble test for a single cycle, the test will call the forecast model as many times as the number of ensemble members).  
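
The counting rule above amounts to "number of cycles times number of ensemble members". A minimal sketch follows, under a simplifying assumption that is not taken from the script: one cycle per hour in `CYCL_HRS` on every day from `DATE_FIRST_CYCL` to `DATE_LAST_CYCL` (inclusive, `YYYYMMDD`), with `INCR_CYCL_FREQ` ignored. The function name is hypothetical.

```python
from datetime import datetime


def count_fcst_calls(date_first_cycl, date_last_cycl, cycl_hrs, num_ens_members=1):
    """Estimate how many times a WE2E test runs the forecast model.

    Simplifying assumption: one cycle per hour in cycl_hrs on every day from
    date_first_cycl to date_last_cycl (inclusive, "YYYYMMDD"); INCR_CYCL_FREQ is ignored.
    """
    first = datetime.strptime(date_first_cycl, "%Y%m%d")
    last = datetime.strptime(date_last_cycl, "%Y%m%d")
    num_days = (last - first).days + 1
    num_cycles = num_days * len(cycl_hrs)
    # A deterministic (non-ensemble) run counts as a single member
    return num_cycles * max(num_ens_members, 1)


if __name__ == "__main__":
    # 3 cycle dates, deterministic -> 3 forecast calls
    print(count_fcst_calls("20190701", "20190703", ["00"]))
    # 1 cycle, 2-member ensemble -> 2 forecast calls
    print(count_fcst_calls("20190701", "20190701", ["00"], num_ens_members=2))
```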

## TESTS CONDUCTED: 
The script `run_WE2E_tests.sh` was run, which in turn calls `get_WE2Etest_names_subdirs_descs.sh`.  This created a new CSV file containing the new fields (columns).  The CSV file was imported into Google Sheets (using "|" as the field/column separator) and looked correct.
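
Because the file uses "|" rather than a comma as the separator, it can also be read programmatically by setting the delimiter explicitly. A minimal sketch using the standard `csv` module; the helper name, header names, and sample row are hypothetical, not the actual contents of `WE2E_test_info.csv`.

```python
import csv
import io


def read_we2e_test_info(csv_text):
    """Parse "|"-separated WE2E test-info text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="|")
    return list(reader)


# Hypothetical header and row, for illustration only
SAMPLE = (
    "Test Name|Alternate Test Names|Test Description|PREDEF_GRID_NAME|NUM_ENS_MEMBERS\n"
    "sample_test||Sample description, with a comma|RRFS_CONUS_25km|2\n"
)

if __name__ == "__main__":
    for row in read_we2e_test_info(SAMPLE):
        print(row["Test Name"], row["PREDEF_GRID_NAME"], row["NUM_ENS_MEMBERS"])
```

Using "|" lets free-text descriptions contain commas without any quoting.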

## DOCUMENTATION:
The documentation is for the most part already within the `get_WE2Etest_names_subdirs_descs.sh`.  This PR slightly modifies that documentation to update it.

* Update directory structure of NCO mode (#743)

* update vertical structure of NCO mode

* update sample script for nco

* Fix typo on write component of new RRFS CONUS

* Default CCPP physics option is FV3_GFS_v16 (#746)

* Updated the default CCPP physics option to FV3_GFS_v16

* Updated the default CCPP physics option to FV3_GFS_v16 in config_defaults.sh

Co-authored-by: Natalie Perlin <[email protected]>

* Adds an alternative python workflow generation path (#698)

* Workflow in python starting to work.

* Use new python_utils package structure.

* Some bug fixes.

* Use uppercase TRUE/FALSE in var_dfns

* Use config.sh by default.

* Minor bug fixes.

* Remove config.yaml

* Update to the latest develop

* Remove quotes from numbers in predef grid.

* Minor bug fix.

* Move validity checker to the bottom of setup

* Add more unit tests.

* Update with python_utils changes.

* Update to latest develop additions (Need to re-run regression test)

* Use set_namelist and fill_jinja_template as python functions.

* Replace sed regex searches with python re.

* Use python realpath.

* Construct settings as dictionary before passing to fill_jinja and set_namelist

* Use yaml for setting predefined grid parameters.

* Use xml parser for ccpp phys suite definition file.

* Remove more run_command calls.

* Simplify some func argument processing.

* Move different config format parsers to same file.

* Use os.path.join for the sake of macosx

* Remove remaining func argument processing via os.environ.

* Minor bug fix in set_extrn_mdl_params.sh

* Add suite defn in test_data.

* Minor fixes on unittest on jet.

* Simplify boolean condition checks.

* Include old in renaming of old directories

* Fix conflicting yaml !join tag for paths and strings.

* Bug fix with setting sfcperst dict.

* Imitate "readlink -m" with os.path.realpath instead of os.readlink

* Don't use /tmp as that is shared by multiple users.

* Bug fix with cron line, maintain quotes around TRUE/FALSE.

* Update to latest develop (untested)

* Bug fix with existing cron line and quotes.

* Bug fix with case-sensitive MACHINE name, and empty EXPT_DIR.

* Update to latest develop

* More updates.

* Bug fix thanks to @willmayfield! Check both starting/ending
characters are brackets for shell variable to be considered an array.

* Make empty EXPT_BASEDIR workable.

* Update to latest develop

* Update in predef grid.

* Check f90nml as well.

Co-authored-by: Daniel Abdi <[email protected]>
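
Two of the bug fixes listed above can be sketched in a few lines of Python. The array check assumes shell arrays are written with parentheses, e.g. `CYCL_HRS=( "00" "12" )` (the commit message says "brackets"). The `readlink -m` stand-in relies on `os.path.realpath`, which resolves symlinks and tolerates missing path components, whereas `os.readlink` only returns the target of a single symlink and raises `OSError` for anything else. Both helper names are hypothetical.

```python
import os


def is_shell_array(value):
    """Treat a shell value as an array only if it both starts and ends with parentheses.

    Hypothetical helper: checking only the first character would misclassify
    a value such as "(not an array" as an array.
    """
    value = value.strip()
    return len(value) >= 2 and value.startswith("(") and value.endswith(")")


def readlink_m(path):
    """Approximate GNU `readlink -m`: return a canonical absolute path with
    symlinks resolved, without requiring the path components to exist."""
    return os.path.realpath(path)


if __name__ == "__main__":
    print(is_shell_array('( "00" "12" )'), is_shell_array("(not an array"))
    print(readlink_m("/nonexistent_dir/../some/./path"))
```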

* Fix typo and crontab issue on wcoss dell in workflow python scripts (#750)

* Fix typo and failure on wcoss

* fix new line issue on wcoss dell

* remove capture_output

* Get USER from environment

Co-authored-by: Daniel Abdi <[email protected]>

* Add new WE2E configs (#748)

## DESCRIPTION OF CHANGES: 
Added two new WE2E config files for the Sub-CONUS Indianapolis domain to support the upcoming SRW release. 

In addition, modified the external data used in the `config.specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS.sh` to match more common datasets used in the WE2E testing process. 

## TESTS CONDUCTED: 
Successfully ran the new WE2E tests (`config.SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR.sh`, `config.SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta.sh`) and `config.specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS.sh` on a NOAA Parallel Works AWS instance.

## DEPENDENCIES:
None.

## DOCUMENTATION:
No documentation changes are required.

* Added a fixed WoF grid and the python tool to determine the write component parameters (#733)

* Added a fixed WoF grid and the python tool to determine the write component parameters

* Update set_predef_grid_params.sh

* Renamed file as recommended and removed unused lines

* Modified comment

Co-authored-by: JeffBeck-NOAA <[email protected]>
Co-authored-by: WYH@MBP <[email protected]>

* Replace env with modulefiles in scripts (#752)

* change env to mod

* update we2e script

* WE2E script improvements for usability (#745)

## DESCRIPTION OF CHANGES: 
* Modifications to `run_WE2E_tests.sh`:
  * Add examples to help/usage statement
* Modifications to `check_expts_status.sh`:
  * Add arguments list that can be processed by `process_args`
  * Add new optional arguments:  `num_log_lines`, `verbose`
  * Include a help/usage message

## TESTS CONDUCTED:
* Ran `run_WE2E_tests.sh --help` from the command line and got the expected help message.
* Ran `check_expts_status.sh --help` from the command line and got the expected help message.
* Used `run_WE2E_tests.sh` to run a set of 2 WE2E tests -- works as expected.
* Used `check_expts_status.sh` to check on the status of the 2 tests run above and got the expected status message.
 
## DEPENDENCIES:
PR #[241](ufs-community/ufs-srweather-app#241)

## DOCUMENTATION:
A lot of this PR is documentation in the scripts.  There is an accompanying documentation PR #[241](ufs-community/ufs-srweather-app#241) into ufs-srweather-app.

* Standardize static data across Tier-1 platforms; fix and improve IC and LBC data retrieval (#744)

* Bug fixes (grid size + suppress screen output from module load) (#756)

## DESCRIPTION OF CHANGES: 
1) Adjust the y-direction size of the write-component grid of the `SUBCONUS_Ind_3km` predefined grid from 195 to 197 (this was an oversight in PR #725).
2) Redirect the output of the module load in the launch script (`launch_FV3LAM_wflow.sh`) to `/dev/null` to avoid unwanted screen output. That output was introduced in PR #[238](ufs-community/ufs-srweather-app#238) in ufs-srweather-app, describes how to load the `regional_workflow` environment, and is not relevant in this context.

## TESTS CONDUCTED: 
1) Plotted the `SUBCONUS_Ind_3km` grid to ensure it has correct size (it does).
2) Manually ran `launch_FV3LAM_wflow.sh` from the command line to verify that screen output is suppressed (it is).

* Update default SPP ISEED array in config_defaults.sh to use unique values (#759)

* Modify RRFS North America 3- and 13-km domain configuration and WE2E test.

* Modify default ISEED values for SPP

* Fix grid in WE2E test

* Update workflow python scripts (#760)

* update python scripts

* Change output file name of run_post to meet NCO standards (#758)

* change output file name

* change variable name

* update python script

* remove duplicates

* add a check for empty variables

* move variable to common area

* clean up unnecessary comments

* update scripts

* remove duplicate

* update python scripts

* fix user-staged dir path issue in python script

* Add POST_OUTPUT_DOMAIN_NAME to WE2E tests for new grids (#763)

* Add new var to we2e tests for new grids

* rename we2e tests for custom grid

* remove unnecessary $

* Modifications to `CODEOWNERS` file (#757)

* Add @gspetro-NOAA, @natalie-perlin, and @EdwardSnyder-NOAA to CODEOWNERS so they are notified of all PRs and can review them.

* Remove duplicates in CODEOWNERS; remove users who will no longer be working with the repo.

* Adding a python utility for summarizing compute. (#769)

Adds a utility that summarizes Rocoto database computational usage information.

* Add github actions for python unittests. (#747)

* Add github actions for python unittests.

* Include all python scripts in ush

* Skip defining QUILTING params when it is set to False

* Update py_workflow

* Update unittest for set_extrn_mdl_params.

* Updates from develop.

Co-authored-by: Daniel Shawul <[email protected]>

* Update sample script for NCO mode (#771)

* update config.nco.sh

* Add comment

* Feature/noaacloud (#767)

* updates for noaacloud

* working version

* fixes for noaacloud

* added extra modules for post

* removed cheyenne-specific crontab editing section (#773)

* Pin down hera miniconda3 module file version. (#770)

Pin down the version of miniconda3 on Hera, and do not append to the module path.

* update staged data dir (#774)

* Changed to smaller SUBCONUS_Ind_3km domain

Co-authored-by: Mark Potts <[email protected]>
Co-authored-by: gsketefian <[email protected]>
Co-authored-by: Chan-Hoo.Jeon-NOAA <[email protected]>
Co-authored-by: Natalie Perlin <[email protected]>
Co-authored-by: Natalie Perlin <[email protected]>
Co-authored-by: danielabdi-noaa <[email protected]>
Co-authored-by: Daniel Abdi <[email protected]>
Co-authored-by: Daniel Abdi <[email protected]>
Co-authored-by: EdwardSnyder-NOAA <[email protected]>
Co-authored-by: JeffBeck-NOAA <[email protected]>
Co-authored-by: WYH@MBP <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
Co-authored-by: Christina Holt <[email protected]>
Co-authored-by: Daniel Shawul <[email protected]>
Labels
ci-hera-intel-WE Kicks off automated workflow test on hera with intel
6 participants