MET_ensemble_verification_only_vx_time_lag no longer works on Tier 1 machines #900
Comments
@mkavulich, some input on the changes from PR #864, and on whether they could have affected this test, would be really helpful! I'm not sure the …
I was able to clone the develop branch on Orion, build the SRW App, then submit the …

The test was fundamentally changed in PR #864 to require the verification data to be pulled from HPSS (please see lines 31-34 of the MET_ensemble_verification_only_vx_time_lag configuration file). The test no longer uses the staged data, so it can only be run on Hera and Jet. Note also that the data in question appear to include restricted data. If you aren't a member of the rstprod project, you will be unable to pull the necessary data from HPSS, and the test will fail.
@natalie-perlin, I can confirm that the removal of lines 31-34 in the …
@MichaelLueken, thanks for jumping in with a reply. Your summary is correct: these two WE2E tests are intended to only check for data on HPSS. I used HPSS data for the …

If there is a desire to make the time-lag test use staged data, that would be fine, but at least one of the verification tests should pull data from HPSS to test that functionality.
@mkavulich @MichaelLueken, thank you for your comments.
@natalie-perlin, as discussed in today's meeting, please log into AIM and request access to the rstprod project. You will be asked to provide a justification for access. If you include:
access should be granted. Once you have access to rstprod on RDHPCS, let the HPSS helpdesk know that you have been granted rstprod access on RDHPCS, so that you can pull the tarball containing restricted data from HPSS. The email for the HPSS helpdesk is [email protected]. Skylar Nelson is the lead for the HPSS helpdesk, so copying him on the email might expedite the process. Closing this issue now.
MET verification tests use the met and metplus modules from the software stacks on Tier 1 machines; these changes were implemented in PR #826.
Since then, subsequent changes have affected the MET verification tasks, and MET_ensemble_verification_only_vx_time_lag no longer seems to work (tested on Hera, Gaea, Orion, and the new platform Derecho).
The get_obs_ccpa, get_obs_mrms, and get_obs_ndas tasks fail.
Log files can be viewed on Hera:
/scratch1/NCEPDEV/stmp2/Natalie.Perlin/SRW/expt_dirs/MET_ensemble_verification_only_vx_time_lag/log/get_obs_ndas_2021050500.log, get_obs_mrms_2021050500.log, get_obs_ccpa_2021050500.log
Attached are the get_obs_*_2021050500.log files, var_defns.sh and generated FV3LAM_wflow.xml workflow.
Expected behavior
The MET_ensemble_verification_only_vx_time_lag test passes on Hera (Intel and GNU), Gaea, Orion, Jet, and Derecho. The get_obs_ccpa, get_obs_mrms, and get_obs_ndas tasks do not need to run, because the data are staged on these systems.
Current behavior
The failing tasks are get_obs_ccpa, get_obs_mrms, and get_obs_ndas.
Machines affected
Any system running SRW
Steps To Reproduce
Example for Orion:
In MET_ensemble_verification_only_vx_time_lag tests run before PR #826 was merged, the get_obs_ccpa, get_obs_mrms, and get_obs_ndas tasks were not run, because all of the data were staged on each machine.
An example of a successful MET_ensemble_verification_only_vx_time_lag test on Hera:
SRW base directory: /scratch1/NCEPDEV/stmp2/Natalie.Perlin/SRW/srw-dev-met
Experiment directory: /scratch1/NCEPDEV/stmp2/Natalie.Perlin/SRW/INTEL/MET_ensemble_verification_only_vx_time_lag
Detailed Description of Fix (optional)
The fix may be related to the configuration in the parm/wflow/*.yaml and ./ush/machine/verify_*.yaml files.
Additional Information (optional)
The machine files used in PR #826 differ from both their current versions and their earlier versions (before PR #826) in how the OBS data directories are set. For example, PR #826 used the following settings for Hera:
The current hera.yaml, and the machine file from before PR #826 was merged, contain the following:
Possible Implementation (optional)
Output (optional)
get_obs_ccpa_2021050500.log
get_obs_ndas_2021050500.log
get_obs_mrms_2021050500.log
var_defns.sh.txt
FV3LAM_wflow.xml.txt