Refine SEEPS processing logic and output naming conventions #2882
Comments
The QA review is 99% complete, and we are almost ready to hand over the code changes. The pull request will combine bug fixes with code tidying to make the code less ambiguous. One key issue is that this code has been translated between many different languages, some of which are row-major and others column-major. In these situations matrix indexing becomes highly ambiguous, and we found some cases where data was being used in a (wrongly) transposed form.
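To illustrate the row-major versus column-major ambiguity described above, here is a minimal sketch in Python. It is not taken from the MET source: the helper names, the 3x3 values, and the `[observed][forecast]` indexing are all hypothetical. It shows that a buffer written in column-major order and read back as row-major silently yields the transposed table.

```python
# Hypothetical 3x3 table indexed as table[observed][forecast].
# Each value encodes its own (row, column) position, e.g. 12 = row 1, column 2.
table = [
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
]


def flatten(matrix, order):
    """Flatten a 2D list into a 1D list in row-major or column-major order."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if order == "row":
        return [matrix[r][c] for r in range(n_rows) for c in range(n_cols)]
    if order == "col":
        return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]
    raise ValueError("order must be 'row' or 'col'")


def unflatten(flat, n_rows, n_cols, order):
    """Rebuild a 2D list from a flat buffer, assuming the given storage order."""
    if order == "row":
        return [[flat[r * n_cols + c] for c in range(n_cols)] for r in range(n_rows)]
    if order == "col":
        return [[flat[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]
    raise ValueError("order must be 'row' or 'col'")


# Written by a column-major language ...
flat = flatten(table, "col")

# ... and read back with the wrong and the right assumption.
wrong = unflatten(flat, 3, 3, "row")
right = unflatten(flat, 3, 3, "col")

print(right == table)            # the matching assumption round-trips
print(wrong[0][1], table[1][0])  # the mismatched one swaps row and column
```

Naming the two index roles explicitly (as the later column renaming does) removes the need to remember which storage order a given file or language uses.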
Recommend serving up the gridded SEEPS climatology data via this DTCenter Zenodo community. @mpm-meto and @RachelNorth, please sign in to Zenodo and send me your usernames so that I can add you as curators of the DTCenter community. Once this data is posted to Zenodo, we'll need to update the MET User's Guide with instructions for finding and using it.
We will also want to test with the data once it is accessible; @hsoh-u worked on SEEPS previously and should be able to help.
We don't have permission to create a branch. Could this be fixed, or could someone create a branch for us?
Rachel and I have found a way to edit using a fork. However, this gives us two separate branches rather than one shared branch, so the branches will need to be merged. We don't want to create extra work, but we don't know enough about git (yet) to know how to do that.
Changes to variable names to make them less ambiguous and bug fixes associated with SEEPS QA in dtcenter#2882
@mpm-meto and @RachelNorth, please find the results of our work today in this feature branch on the MET repo: As a reminder, it currently fails to compile in
@RachelNorth and @mpm-meto, this is a reminder about 2 topics. Based on our recent work together, it looks like we'll be modifying the output column names in the SEEPS line type, replacing If the PR for this work, does change those column names...
@bikegeek, I'm tagging you here to put this on your radar.
@mpm-meto and @RachelNorth, FYI, I got this feature branch compiling again with the commit listed right above this comment. I did merge the latest changes from the That commit includes a lot of standardizing of debug log messages as well. Although, once you're satisfied with the functionality, I'd recommend removing some of them... since there are so many. It's good to have useful debugging, but each one slows down the code slightly. Even if it's not printed by the logger, it still takes time to construct the message, pass it to the logger, and have the logger decide whether or not to print it. I did also manually trigger this testing.yml GitHub action run. It compiled the code and ran all the unit tests, one of which failed with:
So clearly, we need to update the variable names in @robdarvell hopefully the following commands will help in getting this branch compiled there:
Then @mpm-meto and @RachelNorth will make edits as needed in the source code.
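The point above about debug messages costing time even when they are not printed can be sketched as follows. This uses Python's standard `logging` module rather than the MET logger, and the logger name and the `expensive_summary` helper are hypothetical. The idea is that a message built eagerly is paid for on every call, whereas a message guarded by a level check is only built when it will actually be written.

```python
import logging

# Count how often the (hypothetical) expensive formatting helper runs.
calls = {"count": 0}


def expensive_summary(values):
    """Stand-in for costly message construction, e.g. dumping a whole array."""
    calls["count"] += 1
    return ", ".join(f"{v:.3f}" for v in values)


logger = logging.getLogger("seeps_demo")
logger.setLevel(logging.WARNING)          # DEBUG messages are filtered out
logger.addHandler(logging.NullHandler())  # keep the demo quiet

values = [0.1, 0.2, 0.3]

# Eager: the message string is built before the logger decides to drop it.
logger.debug("values: " + expensive_summary(values))
eager_calls = calls["count"]

# Guarded: the message is only built if DEBUG output is enabled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("values: " + expensive_summary(values))
guarded_calls = calls["count"] - eager_calls

print(eager_calls, guarded_calls)
```

Whether a guard like this or simply deleting the messages is the better fit depends on how often the code path runs; inside a per-point loop either option avoids the repeated construction cost.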
@mpm-meto and @RachelNorth, FYI, I ran the following
The result can be found in: I also re-triggered another regression test run for the MET-feature_2882_seeps_qa branch. That compiled the MET code, saw that there's a new timestamp on the input data tarfile, automatically pulled the updated inputs, ran all the unit tests, and then diffed against our previous output. The good news is that all the tests ran. However some differences were flagged, as can be seen by downloading this diff artifact tar file. You'll find files in there with the For example, comparing
Those numbers differ a lot. And I also assume that we need to modify the actual output column names of Please take a look and let me know what the next steps are. Would you like me to change the output column names on this feature branch?
When meeting on 9/20/24, @mpm-meto confirmed that @JohnHalleyGotway should proceed with the changes to the output column names described above and submit a PR to include these changes in the MET-12.0.0-beta6 development release. @mpm-meto will coordinate with @RachelNorth to confirm that no additional changes are needed at this time. However, they plan to test this functionality in MET-12.0.0-beta6 for accuracy and will submit a bugfix issue if they encounter any unexpected behavior.
* Update seeps.h Change variable names to reduce ambiguity for interpretation and aid useability.
* Update seeps.cc Pull through variable name changes and renaming of functions to aid legibility and clarity. Introduced some additional debug print statements.
* Update grid-stat.rst Add documentation about the location of the gridded climatology files for SEEPS and which environment variable to use.
* Replace read_seeps_scores() with get_seeps_climo_grid()
* Manually merging Rachel's patch-1 changes.
* Getting close to getting these seeps changes to compile. But it's failing in pair_data_point.cc
* Per #2882, get branch feature_2882_seeps_qa compiling again. Recommend revisiting the volume of SEEPS-related Debug log messages and reducing them once its fully tested.
* Per #2882, need to update the handling of the PPT24_seepsweights_grid.nc file name. Rename as _v12.0.nc for the updated version with the new names so that the existing regressions tests and nightly builds for main_v11.1 and develop continue to work. We can remove the _v12.0 once this feature branch is merged into develop but for the time being, we need both versions to exist.
* Per #2882, rename the SEEPS columns from S12, S13, S21, S23, S31, S32 to the more descriptive ODFL, ODFH, OLFD, OLFH, OHFD, OHFL names.
* Per #2882, update SEEPS details
* Per #2882, store and report the weighted mean fcst and mean obs, just like the SEEPS score itself so that they're handled in a consistent manner. Note however that it's hard-coded to NOT write the weighted means/score, only the unweighted ones.
* Per #2882, change SEEPS debug log levels and correct the storage of mean_fcst and mean_obs values.
* Per #2882, correct SEEPS column name lookups
* Per #2882, call is_bad_data() instead of is_eq(..., -9999.0) to get rid of compiler warning message.
* Per #2882, add 2 more variations of the is_eq() function with mixed float and double inputs to satisfy compiler pb2nc compiler warnings.
* Per #2882, switch from dynamically allocated arrays to std::vector
* Per #2882, enhance Stat-Analysis to write the SEEPS line type to an output .stat file.
* Per #2882, update the aggregated seeps computation to use better-initialized vectors.
* Per #2882, resolve a few more SonarQube code smells.
* Per #2882, now that this PR is ready to merge, remove the v12.0 version number from the gridded SEEPS climo file name ci-skip-all
---------
Co-authored-by: mpm-meto <[email protected]>
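For anyone with scripts that read the old SEEPS column names, the rename listed above can be sketched as a simple lookup. This is an illustration only: the old and new names are paired in the order the commit message lists them, the example header is a hypothetical subset of the SEEPS line type, and reading the new names as "Observed/Forecast" with "Dry/Light/Heavy" categories is an interpretation rather than something stated in this thread.

```python
# Old and new names, paired in the order given in the commit message.
OLD_NAMES = ["S12", "S13", "S21", "S23", "S31", "S32"]
NEW_NAMES = ["ODFL", "ODFH", "OLFD", "OLFH", "OHFD", "OHFL"]
OLD_TO_NEW = dict(zip(OLD_NAMES, NEW_NAMES))


def rename_seeps_columns(header):
    """Return a copy of the header with old SEEPS column names replaced."""
    return [OLD_TO_NEW.get(name, name) for name in header]


# Hypothetical subset of a SEEPS header, for illustration only.
old_header = ["TOTAL", "S12", "S13", "S21", "S23", "S31", "S32", "SEEPS"]
new_header = rename_seeps_columns(old_header)

print(new_header)
```

Names that are not in the lookup pass through unchanged, so the same function can be applied to headers that already use the new names.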
* Per #2887, update NumArray::vals() to return a reference to the vector rather a pointer to doubles. * Per #2887, switch over the whole ContingencyTable class heirarchy from storing integer counts to storing double-precision weights. * Add ContingencyTable::is_integer() member function to check whether the table contains all integers * Per #2887, update parse_stat_line.cc to get it to compile after changing PCT to store thresholds in a std::vector. * Per #2887, update PCTInfo::clear() logic. * Per #2887, update ctc_by_row() logic to create reproducible results with the develop branch. * Per #2887, update logic of define_prob_bins() to add a final >=1.0 threshold if needed. While ==0.1 works fine, I found that ==0.05 did not because the last >=1.0 threshold was missing likely do to floating point precision issues. This change should fix that problem. * Per #2887, update roc_auc() function to match the develop branch * Per #2887, fix bug if computation of far() * Per #2887, replaced all ==0 integer equality checks with calls to is_eq() instead and fix a couple of equations to snuff out diffs in some CTS statistics. * Per #2887, address some of the 34 SonarQube code smells flagged for this PR. Note that the compute_ci.h/.cc changes are necessary and good since we should be computing CI's using doubles instead of integer counts. * Per #2887, update run_sonarqube.sh to specify the target CXX standard as 11. The hope is that that will limit the findings to only those features available in the C++11 standard. * Per #2887, update to SonarQube version 6.1.0.4477 released on 6/27/2024. * Per #2887, updating build_met_sonarqube.sh to specify --std=c++11 since c++17 is used by default * Per #2887, swap in a much simpler implementation of the ORSS statistic to match the equation listed in the MET User's Guide. * Per #2887, update grid_stat and library code to actually apply the grid_weight_flag settings to the computation of contingency table counts and statistics. 
* Per #2887, fix the handling of bad data in the ORSS equation. * Per #2887, add Npairs member to the ContingencyTable class, eliminate the n() accessor function, and carefully replace references to n() with n_pairs() for the integer number of matched pairs or total() with the double-precision sum of the weights. * Per #2887, reset Npairs = 0 for ContingencyTable::zero_out() * Per #2883, need to call set_n_pairs() in a few spots to set ECLV TOTAL column correctly ci-run-unit * Per #2887, call set_n_pairs() when aggregating PCT data in Series-Analysis ci-run-unit * Per #2887, update stat_analysis to parse the TOTAL column for the PCT and MCTC line types. * Pet #2882, call set_n_pairs() after set_size() ci-run-unit * Per #2887, reconfigure existing Ensemble-Stat unit test to request probabilistic output to see that it's impacted by the grid_weight_flag setting. * Per #2887, update Ensemble-Stat test to provide climo stdev data * Per #2887, add grid_weight_flag to the list of config options for Grid-Stat and Ensemble-Stat. * Per #2887, disable FHO output if grid_weight_flag != NONE. * Per #2887, revise the existing unit_grid_weight.xml unit tests for Grid-Stat to write CTC/CTS/MCTC/MCTS output and for the DESC column to be populated to indicate the type of grid weighting that was applied. * Per #2887, relatively small changes to drive down SonarQube code smells. Also, switch from total() to n_pairs() when computing confidence intervals. * Per #2887, more SonarQube tweaks * Per #2887, more SonarQube tweaks. * Per #2887, more SonarQube tweaks. * Per #2887, whitespace only changes. * Per #2287, fix path the seeps climo grid. * Per #2887, update the grid_weight_flag documentation. * Per #2887, tweak the wording.
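The `define_prob_bins()` note in the commit message above describes a final `>=1.0` threshold going missing for some bin widths because of floating-point precision. The sketch below shows one way that kind of problem can arise and be repaired. It is not the MET implementation: `naive_bins`, `with_final_threshold`, and the tolerance value are hypothetical. Thresholds are accumulated by repeated addition, so the last value may land slightly above or below 1.0, and the repair step either snaps it to 1.0 or appends 1.0 when it is absent.

```python
import math


def naive_bins(width):
    """Accumulate thresholds 0, width, 2*width, ... while the value is <= 1.0.

    Repeated addition can drift, so the final threshold may be slightly off
    from 1.0 or missing altogether.
    """
    thresholds = []
    value = 0.0
    while value <= 1.0:
        thresholds.append(value)
        value += width
    return thresholds


def with_final_threshold(thresholds, tol=1e-9):
    """Ensure the list ends with exactly 1.0, adding it only if needed."""
    fixed = list(thresholds)
    if math.isclose(fixed[-1], 1.0, abs_tol=tol):
        fixed[-1] = 1.0   # snap a slightly-off final value to exactly 1.0
    else:
        fixed.append(1.0)  # the final threshold was lost to rounding
    return fixed


bins_005 = with_final_threshold(naive_bins(0.05))
bins_010 = with_final_threshold(naive_bins(0.1))

print(len(bins_005), bins_005[-1])
print(len(bins_010), bins_010[-1])
```

An alternative that avoids the drift entirely is to compute each threshold as `i * width` from an integer index, but the repair step above stays closer to the "add a final threshold if needed" wording of the commit.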
* Per #2887, update NumArray::vals() to return a reference to the vector rather a pointer to doubles. * Per #2887, switch over the whole ContingencyTable class heirarchy from storing integer counts to storing double-precision weights. * Add ContingencyTable::is_integer() member function to check whether the table contains all integers * Per #2887, update parse_stat_line.cc to get it to compile after changing PCT to store thresholds in a std::vector. * Per #2887, update PCTInfo::clear() logic. * Per #2887, update ctc_by_row() logic to create reproducible results with the develop branch. * Per #2887, update logic of define_prob_bins() to add a final >=1.0 threshold if needed. While ==0.1 works fine, I found that ==0.05 did not because the last >=1.0 threshold was missing likely do to floating point precision issues. This change should fix that problem. * Per #2887, update roc_auc() function to match the develop branch * Per #2887, fix bug if computation of far() * Per #2887, replaced all ==0 integer equality checks with calls to is_eq() instead and fix a couple of equations to snuff out diffs in some CTS statistics. * Per #2887, address some of the 34 SonarQube code smells flagged for this PR. Note that the compute_ci.h/.cc changes are necessary and good since we should be computing CI's using doubles instead of integer counts. * Per #2887, update run_sonarqube.sh to specify the target CXX standard as 11. The hope is that that will limit the findings to only those features available in the C++11 standard. * Per #2887, update to SonarQube version 6.1.0.4477 released on 6/27/2024. * Per #2887, updating build_met_sonarqube.sh to specify --std=c++11 since c++17 is used by default * Per #2887, swap in a much simpler implementation of the ORSS statistic to match the equation listed in the MET User's Guide. * Per #2887, update grid_stat and library code to actually apply the grid_weight_flag settings to the computation of contingency table counts and statistics. 
* Per #2887, fix the handling of bad data in the ORSS equation. * Per #2887, add Npairs member to the ContingencyTable class, eliminate the n() accessor function, and carefully replace references to n() with n_pairs() for the integer number of matched pairs or total() with the double-precision sum of the weights. * Per #2887, reset Npairs = 0 for ContingencyTable::zero_out() * Per #2883, need to call set_n_pairs() in a few spots to set ECLV TOTAL column correctly ci-run-unit * Per #2887, call set_n_pairs() when aggregating PCT data in Series-Analysis ci-run-unit * Per #2887, update stat_analysis to parse the TOTAL column for the PCT and MCTC line types. * Pet #2882, call set_n_pairs() after set_size() ci-run-unit * Per #2887, reconfigure existing Ensemble-Stat unit test to request probabilistic output to see that it's impacted by the grid_weight_flag setting. * Per #2887, update Ensemble-Stat test to provide climo stdev data * Per #2887, add grid_weight_flag to the list of config options for Grid-Stat and Ensemble-Stat. * Per #2887, disable FHO output if grid_weight_flag != NONE. * Per #2887, revise the existing unit_grid_weight.xml unit tests for Grid-Stat to write CTC/CTS/MCTC/MCTS output and for the DESC column to be populated to indicate the type of grid weighting that was applied. * Per #2279, add the MaskSID struct to store information about station id names and corresponding weights. * Per #2279, add new PointWeightType enumeration along with code to parse it. * Per #2279, adding point_weight_flag option to all Point-Stat and Ensemble-Stat config file and tweaking whitespace. * Per #2279, add point_weight_flag to the Point-Stat and Ensemble-Stat config class. Also remove sue unneeded wgt_dp argument for the add_point_obs() functions. Plan to add logic to set the point weights only AFTER all the observations have been collected for each verification task. * Per #2279, use the default_weight contstant instead of the literal 1.0 value. 
* Per #2279, add stubs for actually applying the point_weight_flag settings. * Per #2279, fix PairBase to actually set point weight values parsed from station id masks. * Per #2279, trying to fix 2 sonarqurqube bugs * Per #2279, fix a couple bugs parsing the SID weights and add a new unit_point_weight.xml unit test to run Point-Stat on scalar and probability inputs weighting the stations by their elevation. Still need to add Ensemble-Stat calls. * Per #2279, fix small bug ci-run-unit * Per #2279, add ensemble_stat calls to unit_point_weight.xml * Per #2279, add documentation about the point_weight_flag configuration option. * Per #2279, working on debug and warning messages. * Per #2279, tweak the user's guide * Per #2279, switch MaskSID::sid_list from a vector of pairs to a simpler map named sid_map. * Per #2279, fix the madis2nc call to parse_sid_mask() * Per #2279, move MaskSID from vx_config over into dedicated vx_util/mask_sid.h and .cc to be consistent with mask_poly.h. I note that the members of the MaskSID struct were not being initialized properly. So making it a complete class was the right solution. * Per #2279, another change to make it compile. * Per #2279, more tweaks to get it to compile.
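The commit notes above describe moving the contingency table from integer counts to double-precision weights, with a separate count of matched pairs alongside the sum of weights. A minimal sketch of that distinction follows. The `WeightedTable` class and its method names are hypothetical and only loosely mirror the `n_pairs()`, `total()`, and `is_integer()` names mentioned in the commits; it is not the MET `ContingencyTable` class.

```python
class WeightedTable:
    """Toy contingency table that stores weights instead of integer counts."""

    def __init__(self, n_rows, n_cols):
        self.cells = [[0.0] * n_cols for _ in range(n_rows)]
        self.n_pairs = 0  # integer number of matched pairs added

    def add(self, row, col, weight=1.0):
        """Add one matched pair to a cell with the given weight."""
        self.cells[row][col] += weight
        self.n_pairs += 1

    def total(self):
        """Sum of all weights, which differs from n_pairs when weights != 1."""
        return sum(sum(row) for row in self.cells)

    def is_integer(self):
        """True if every cell holds a whole number, as with unweighted counts."""
        return all(float(v).is_integer() for row in self.cells for v in row)


table = WeightedTable(2, 2)
table.add(0, 0)          # default weight of 1.0
table.add(1, 1)
unweighted_is_integer = table.is_integer()
unweighted_total = table.total()

table.add(0, 1, weight=0.5)  # e.g. a grid or station weight
weighted_is_integer = table.is_integer()
weighted_total = table.total()

print(table.n_pairs, weighted_total, weighted_is_integer)
```

Keeping the pair count separate from the weight total matters wherever a statistic needs a sample size, such as the confidence-interval change noted in the commits.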
Following the addition of the SEEPS code to PointStat and GridStat a science quality assurance (QA) review of the code needs to be carried out.
Met Office will conduct this code review. Changes implemented by DTC.
Assignee
Checklist
See the METplus Workflow for details.
Branch name:
bugfix_<Issue Number>_main_<Version>_<Description>
Pull request:
bugfix <Issue Number> main_<Version> <Description>
Select: Reviewer(s) and Development issue
Select: Milestone as the next bugfix version
Select: Coordinated METplus-X.Y Support project for support of the current coordinated release
Branch name:
bugfix_<Issue Number>_develop_<Description>
Pull request:
bugfix <Issue Number> develop <Description>
Select: Reviewer(s) and Development issue
Select: Milestone as the next official version
Select: MET-X.Y.Z Development project for development toward the next official release