point_stat & nearest neighbour matching #1158

cbridge234 · 2021-09-15T00:54:55Z

cbridge234
Sep 15, 2021

Hi METplus team,

I hope you are well.

I am writing from the Bureau of Meteorology in Australia where we have begun replacing our existing NWP verification software with METplus. Part of this work has involved intercomparing verification results from our existing software with those from METplus.

The intercomparison activity has included verification of high-resolution, deterministic, NWP screen temperature forecasts - using the HiRA framework to calculate the CRPS. In MET, as you know, this is available through point_stat.

Having produced CRPS values from both verification systems the results match quite well. But there are some differences. To explore these differences I looked at the individual forecast/observation pairs produced by each system. Both MET and our existing software package create these pairs using the "nearest neighbour" approach to match NWP grid points to the observation location. In many cases the forecast/observation pairs from both verification systems match. But sometimes I found differences in the choice of the nearest neighbour forecast grid point identified by the two verification systems.

I did a couple of manual spot checks to identify the nearest neighbour myself and my results agreed with those from our existing verification software. In both cases the different choice of nearest neighbour occurred when the observation location was almost halfway between two grid points. Specifically, the lat/lon forecast grid spacing is 0.0135 deg and the observation location was 0.007 deg from point A and 0.0065 deg from point B. Our existing software selected point B as the nearest neighbour whereas MET chose point A. (There was no masking applied that could interfere with the "nearest neighbour" choice.)

What I noticed in the MET log file was the spatial dimensions of the forecast grid were rounded to 3 decimal places so the grid spacing was reported as 0.014 degrees instead of 0.0135 degrees.

DEBUG 3: Grid Definition: Projection: Lat/Lon Nx: 891 Ny: 742 lat_ll: -37.986 lon_ll: -144.014 delta_lat: 0.014 delta_lon: 0.014

I am not sure if MET's spatial precision really is to three decimal places but I did wonder if that might explain the discrepancy in finding the nearest neighbour. An observation 0.007 deg from point A would also be 0.007 deg from point B if the grid spacing were 0.014 deg. In this equidistant scenario picking point A as the nearest neighbour would be a legitimate option. Do you know if there is such a limit on spatial precision in MET and could it explain what I am seeing?

Any help would be much appreciated. I can provide the forecast/obs data and MET config file if needed.

Thanks,
Chris

Answered by JohnHalleyGotway

georgemccabe · 2021-09-15T15:39:24Z

georgemccabe
Sep 15, 2021
Maintainer

Hi Chris,

An example of the forecast/obs data and your MET config file would be very helpful to look into this discrepancy. You can attach the files to this discussion if they are below the size limit or upload them to our FTP server if they are too large (see #954).

I have assigned this issue to @JohnHalleyGotway who will take a closer look at your data when it is available.

1 reply

cbridge234 Sep 16, 2021
Author

Thanks George, I'll get on to it.
Chris

JohnHalleyGotway · 2021-09-15T16:05:40Z

JohnHalleyGotway
Sep 15, 2021
Maintainer

Hi Chris,

I read through this discussion post and see that you've already done a lot of investigation on your end. I appreciate that! I'm glad to hear that there's a lot of agreement between the MET output and the existing BOM system. As to why there is a discrepancy on the nearest neighbor choice for some of the observations, particularly those close to the middle of a grid box, your proposed explanation sounds reasonable to me. If the BOM software uses a grid with delta_lat = delta_lon = 0.0135, and MET is using 0.014, that difference would likely manifest in the behavior you describe.

That being said, no, MET does not have a limitation of 3 decimal places for the grid. However, data stored in GRIB1 files is limited in exactly this way, I believe. In fact we ran into something very similar when working with grids from NOAA/NCEP. They have a set of pre-defined grids, see here, each of which is identified by an integer. In MET, these can be used by name, like "G212" means NOAA/NCEP grid number 212. However, some of the grids on the website are defined using more than 3 decimal places. But in GRIB1, the grid parameters are stored as integers in thousandths of a degree, thus limiting them to 3 decimal places. This issue does not exist in GRIB version 2, to my knowledge.
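
To make the GRIB1 limitation concrete, here is a minimal Python sketch. It assumes the grid increment is written as an integer number of millidegrees and that the encoder rounds half up; the rounding mode and the helper name `to_grib1_millidegrees` are assumptions for illustration and are not taken from any GRIB library.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_grib1_millidegrees(value_deg: str) -> int:
    """Hypothetical helper: encode degrees as an integer count of millidegrees."""
    millideg = Decimal(value_deg) * 1000
    # Assumed rounding mode; a real encoder may round differently.
    return int(millideg.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

true_delta = Decimal("0.0135")
stored = to_grib1_millidegrees("0.0135")   # integer that would be written
decoded_delta = Decimal(stored) / 1000     # value a reader would get back

# Error per grid cell, and the drift accumulated across 890 cells (891 points)
per_cell_error = decoded_delta - true_delta
drift = per_cell_error * 890

print(stored, decoded_delta, per_cell_error, drift)
```

Under these assumptions a 0.0135 degree increment comes back as 0.014 degrees, and the half-millidegree error per cell grows to several tenths of a degree at the far edge of an 891-point row.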

So please take a step back and look at how you're running Point-Stat. MET extracts the grid from the gridded forecast data you pass to it. How is that grid defined in that data file? If it's helpful, feel free to send us a sample forecast data file. If it's small enough (i.e. 1 GRIB record), you can just tar/zip it and attach it to this discussion. Otherwise you can post to our anonymous FTP site that @georgemccabe mentioned.

Also note that MET supports interpolation methods for UPPER_LEFT, UPPER_RIGHT, LOWER_RIGHT, and LOWER_LEFT to select the 4 corners of the grid box in which each observation point falls. While they don't solve this issue, they may be useful in your debugging.

Thanks!

0 replies

cbridge234 · 2021-09-16T05:09:24Z

cbridge234
Sep 16, 2021
Author

Hi John,
Thanks for your quick reply.

I have also come across the GRIB1 problem you described. In this case though I am using NETCDF data - I should have mentioned that in my original post. I've also gone through my config file and trimmed out what I thought might be potentially problematic but I still get the same result. I am still quite new to MET/METplus though…

Anyway I have attached the following files, zipped up, to see if you are able to shed some light on this one:
• A netCDF4 Fcst file containing a single 6hr screen temperature forecast grid for New South Wales in Australia: scrn_temp_v2020020212.nc
• A netCDF4 observation file containing a single reading from a single automatic weather station (WMO id 95770): single_ob_v2020020212.nc
• Two config files – sfc_temp.conf and a modified version of PointStatConfig_wrapped
• The stat file that was produced: point_stat_060000L_20200202_120000V.stat
• The log file produced: metplus.log.20210916022811

I have also included a picture to try and explain what's happening.

I ran METplus v4 like this:
run_metplus.py -c sfc_temp.conf

Do let me know if the attachments don't make it through or if anything is unclear.

Best regards,
Chris

data_config_and_outputs.zip

0 replies

JohnHalleyGotway · 2021-09-20T20:17:02Z

JohnHalleyGotway
Sep 20, 2021
Maintainer

Chris, thanks for sending along your sample data. Before jumping into the details of this one point, I wanted to check to see how MET is interpreting the input grid. I did this by running the pcp_combine tool and checking the NetCDF output file. This call to pcp_combine just reads the air_temperature input data and writes the result out to another NetCDF file:

pcp_combine -add scrn_temp_v2020020212.nc 'name="air_temperature"; level="(0,*,*)";' add.nc

Here's the grid def from that file:

ncdump -h add.nc
...
		:Projection = "LatLon" ;
		:lat_ll = "-37.986500 degrees_north" ;
		:lon_ll = "144.013504 degrees_east" ;
		:delta_lat = "0.013500 degrees" ;
		:delta_lon = "0.013504 degrees" ;
		:Nlat = "742 grid_points" ;
		:Nlon = "891 grid_points" ;

And I suspect it's that "delta_lon" setting of 0.013504, rather than 0.0135, that's causing the behavior you're seeing. That could cause the difference in choice between neighboring grid points.

But the question is why. Here's the line where that is set:
https://github.com/dtcenter/MET/blob/ba739729040dff1b513b4d44a296bc098167103f/met/src/libcode/vx_data2d_nccf/nccf_file.cc#L2910

So it's just computed as the difference between the first and second values of longitude. And further down, the code checks to make sure those diffs remain constant over the grid.

But I'm still not quite sure why we're getting 0.013504 instead of 0.0135. I suspect that if we can correct this, then the expected corner of the grid box will be chosen.

1 reply

cbridge234 Sep 21, 2021
Author

Hi John,
Thanks for the detective work here, I think you are on to something. The input grid started life as a Unified Model fields file on a rotated pole grid. I transformed it to a netCDF regular lat/lon grid to use with METplus but I wonder if I've subtly goofed the transformation. Let me dig into this and get back to you.
Cheers,
Chris

JohnHalleyGotway · 2021-09-21T16:51:58Z

JohnHalleyGotway
Sep 21, 2021
Maintainer

Chris, following up with some testing details. I ran the following commands to run pcp_combine and then manually reset the delta_lon value to "0.0135".

pcp_combine -add scrn_temp_v2020020212.nc 'name="air_temperature"; level="(0,*,*)";' scrn_temp_v2020020212_pcp_combine.nc
ncatted -a delta_lon,global,o,c,"0.0135 degrees" -a lon_ll,global,o,c,"144.0135 degrees_east" scrn_temp_v2020020212_pcp_combine.nc -o scrn_temp_v2020020212_FIX_LON.nc

Testing with the original and modified versions of the gridded data does result in a different choice of nearest neighbor.
The original inputs choose the upper-left corner:

VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG  FCST_VALID_END  OBS_LEAD OBS_VALID_BEG   OBS_VALID_END   FCST_VAR        FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL INDEX OBS_SID OBS_LAT OBS_LON OBS_LVL OBS_ELV FCST      OBS       OBS_QC CLIMO_MEAN CLIMO_STDEV CLIMO_CDF
V10.0.0 WRF   NA   060000    20200202_120000 20200202_120000 000000   20200202_110000 20200202_110000 air_temperature K          0,*,*    TMP     NA        L0      ADPSFC FULL    NEAREST     1           NA          NA         NA         NA    MPR           1     1 95770.0 -33.267 151.567       0       0 296.96875 296.10001      2         NA          NA        NA
V10.0.0 WRF   NA   060000    20200202_120000 20200202_120000 000000   20200202_110000 20200202_110000 air_temperature K          0,*,*    TMP     NA        L0      ADPSFC FULL    UPPER_LEFT  1           NA          NA         NA         NA    MPR           1     1 95770.0 -33.267 151.567       0       0 296.96875 296.10001      2         NA          NA        NA

The modified inputs choose the upper-right corner:

VERSION MODEL DESC FCST_LEAD FCST_VALID_BEG  FCST_VALID_END  OBS_LEAD OBS_VALID_BEG   OBS_VALID_END   FCST_VAR        FCST_UNITS FCST_LEV OBS_VAR OBS_UNITS OBS_LEV OBTYPE VX_MASK INTERP_MTHD INTERP_PNTS FCST_THRESH OBS_THRESH COV_THRESH ALPHA LINE_TYPE TOTAL INDEX OBS_SID OBS_LAT OBS_LON OBS_LVL OBS_ELV FCST      OBS       OBS_QC CLIMO_MEAN CLIMO_STDEV CLIMO_CDF
V10.0.0 WRF   NA   060000    20200202_120000 20200202_120000 000000   20200202_110000 20200202_110000 air_temperature K          0,*,*    TMP     NA        L0      ADPSFC FULL    NEAREST     1           NA          NA         NA         NA    MPR           1     1 95770.0 -33.267 151.567       0       0 297.53125 296.10001      2         NA          NA        NA
V10.0.0 WRF   NA   060000    20200202_120000 20200202_120000 000000   20200202_110000 20200202_110000 air_temperature K          0,*,*    TMP     NA        L0      ADPSFC FULL    UPPER_RIGHT 1           NA          NA         NA         NA    MPR           1     1 95770.0 -33.267 151.567       0       0 297.53125 296.10001      2         NA          NA        NA
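
The switch from the upper-left to the upper-right corner can be reproduced with simple arithmetic. The sketch below is a one-dimensional illustration only: it assumes the nearest column is found by rounding (obs_lon - lon_ll) / delta_lon, ignores latitude, and uses the 3-decimal OBS_LON printed in the MPR lines above. The function `nearest_index` is hypothetical and is not MET's implementation.

```python
def nearest_index(obs_lon: float, lon_ll: float, delta_lon: float) -> int:
    """Index of the grid column closest to obs_lon on a regular lat/lon grid."""
    return round((obs_lon - lon_ll) / delta_lon)

obs_lon = 151.567  # OBS_LON as printed in the MPR output (3 decimals)

# Grid as intended: lon_ll and delta_lon as set with ncatted
idx_modified = nearest_index(obs_lon, 144.0135, 0.0135)

# Grid as parsed from the original file (values from the ncdump header)
idx_original = nearest_index(obs_lon, 144.013504, 0.013504)

print(idx_original, idx_modified)
```

With these numbers the observation sits at a fractional column index of about 559.52 on the intended grid but about 559.35 on the parsed grid, so the nearest column moves one cell to the west. The small per-cell error in delta_lon only matters because it accumulates over roughly 560 columns.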

4 replies

JohnHalleyGotway Sep 21, 2021
Maintainer

Some more followup. I added some print statements where the code determines the grid definition from the lat/lon dimensions. I also called "cout.precision(25)" to print LOTS of precision and I see:

lat_values[0] = -37.986499786376953125 lat_values[1] = -37.97299957275390625 dlat = 0.013500213623046875
lon_values[0] = 144.0135040283203125 lon_values[1] = 144.027008056640625 dlon = 0.0135040283203125

So you can see that "dlat" rounds nicely to 0.0135 while "dlon" is 0.013504. These sorts of precision issues are pretty tedious. I see two options here:

• You can try re-creating this NetCDF file, using more precision when writing the lat/lon values, so that we get dlon = 0.0135 instead of 0.013504.
• You can instead manually override the grid definition that MET is reading using the "set_attr_grid" configuration option described in this configuration section.

For example, rerunning Point-Stat using your original inputs but tweaking the configuration (see below), MET does select the UPPER_RIGHT grid point as the nearest neighbor:

fcst = {
   // LatLon GridSpec String Format "latlon Nx Ny lat_ll lon_ll delta_lat delta_lon"
   set_attr_grid = "latlon 891 742 -37.9865 144.0135 0.0135 0.0135";
   field = [ { name  = "air_temperature"; level = [ "(0,*,*)" ]; } ];
}

See LatLon grid specification string in Appendix B.
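
If the grid string has to be produced for many runs, it could be generated rather than typed by hand. The following Python sketch builds a string in the "latlon Nx Ny lat_ll lon_ll delta_lat delta_lon" layout quoted in the config comment above; the helper name `latlon_gridspec` is hypothetical and not part of MET or METplus.

```python
def latlon_gridspec(nx, ny, lat_ll, lon_ll, delta_lat, delta_lon):
    """Hypothetical helper: build a LatLon grid specification string for MET."""
    parts = ["latlon", nx, ny, lat_ll, lon_ll, delta_lat, delta_lon]
    # str() on a Python float gives the shortest decimal that round-trips,
    # so 144.0135 stays "144.0135" rather than picking up noise digits.
    return " ".join(str(p) for p in parts)

spec = latlon_gridspec(891, 742, -37.9865, 144.0135, 0.0135, 0.0135)
print(f'set_attr_grid = "{spec}";')
```

The printed line can then be pasted into the fcst dictionary of the Point-Stat configuration.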

However, having to manually define the grid each time you run is pretty onerous. We do have an issue for making the addition of "named" pre-defined grids much easier. But that really doesn't help that much.

Please let me know if you have any suggestions for how to make the parsing of grids from NetCDF files more reliable. I suppose we could add some configuration option to specify the grid specification precision. But I'm guessing this level of detail would elude 99% of users. Most wouldn't know to use that config option even if it did exist! I definitely see this as one advantage of GRIB2 data: fewer issues in the significant digits used to define the grid.

cbridge234 Sep 21, 2021
Author

Hi John,
I've just replied to one of your earlier posts without noticing these new ones, sorry! It's great that you have come up with these solutions. Let me take a closer look at my input file and see if I can clean up the precision and I'll get back to you.
Appreciate your efforts here.
Chris

cbridge234 Sep 28, 2021
Author

Hi John,

First, following your investigations I have taken a closer look at the precision of the longitudes in my netCDF file and you were right.
At first glance the longitudes seemed fine:
ncdump -v longitude scrn_temp_iris.nc
longitude = 144.0135, 144.027 , 144.0405….

But things looked different when I added another couple of decimal places - I saw what you saw.
ncdump -p 9 -v longitude scrn_temp_iris.nc
longitude = 144.013504, 144.027008 , 144.040497…
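
The two ncdump listings are consistent with the same stored values printed at different precision. The Python sketch below assumes ncdump prints single-precision values with 7 significant digits by default, and it assumes the third stored longitude is 144 + 2654/65536 (the first two are the values John printed earlier in this thread).

```python
# First three longitudes written as exact single-precision values
# (144 plus a whole number of 2**-16 steps). The third one is inferred
# from the 9-digit listing and is an assumption.
lons = [144 + 885 / 65536, 144 + 1770 / 65536, 144 + 2654 / 65536]

seven_digits = [format(v, ".7g") for v in lons]  # like the default listing
nine_digits = [format(v, ".9g") for v in lons]   # like `ncdump -p 9`

print(seven_digits)
print(nine_digits)
```

At 7 significant digits the values look like clean multiples of 0.0135; the extra digits only appear once more precision is requested.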

As a quick fix I recreated the netCDF file, forcing the longitudes to be as I expected them. Similar to what you did with ncatted. Sure enough, this replicated what you found and led to a perfect match in the fcst/obs pairs produced by METplus and the VER verification system that I have been comparing against. So you were on the money - mystery solved.

To find out if there is a smarter way to do the Fieldsfile to netCDF conversion and avoid the longitude precision problem I asked some of my colleagues. They used a different approach but, in fact, this also showed similar precision problems. The plot below shows differences between the true longitude in the original Fieldsfile (horizontal orange line), my netCDF file (green), and my colleagues' netCDF file converted using another utility (blue) as a function of longitude index in the netCDF file. Both netCDF conversion processes lead to (different!) differences in longitude compared to the original file. Current thinking is that both conversion approaches are limited by single precision. This might need more exploration.
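
The single-precision explanation can be checked with the standard library alone. This Python sketch rounds 144.0135 to the nearest 32-bit float and lists the two increments closest to 0.0135 that a difference of two such longitudes can take; `to_float32` is a helper written for this illustration.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

lon0 = to_float32(144.0135)
print(f"{lon0:.19f}")  # exact decimal expansion of the stored value

# Spacing between adjacent float32 values for magnitudes in [128, 256)
ulp = 2.0 ** -16

# A difference of two stored longitudes is a whole multiple of ulp,
# so 0.0135 itself cannot occur; these are its two nearest neighbours.
lower = math.floor(0.0135 / ulp) * ulp
upper = math.ceil(0.0135 / ulp) * ulp
print(lower, upper)
```

The upper candidate equals the dlon that MET reported. Latitudes near -38 fall in the range where float32 spacing is four times finer (2**-18), which is consistent with dlat coming out much closer to 0.0135.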

Another alternative is for me to try a Fieldsfile to GRIB conversion instead. As you mentioned, GRIB2 might be easier to work with in this respect than netCDF.

Second, your suggestion for making the handling of netCDF grids more reliable in METplus sounds quite reasonable, though I take your point that it could be too much down in the detail for most people. Coming up with good solutions to this sort of thing strikes me as being tricky - what should METplus provide versus how well should the user know the technical limitations of their data/metadata? My only suggestion for this particular case is whether the line you pointed out in nccf_file.cc:
double dlon = rescale_lon(lon_values[1] - lon_values[0]);
could be modified to something like:
double dlon = rescale_lon((lon_values[last] - lon_values[0])/(num_lons-1));
This might get round precision problems with determining dlon from neighbouring grid points. But there may well be good reasons for the current implementation being as it is.
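
As a rough check of this idea, the Python sketch below compares the two estimates on a synthetic single-precision grid. The grid is generated here for illustration and is not the file from this thread, so the adjacent-point difference it produces differs from the 0.013504 seen above; `to_float32` is a helper written for this sketch.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Synthetic longitudes: ideal 0.0135 degree spacing, stored as float32
n = 891
lons = [to_float32(144.0135 + 0.0135 * i) for i in range(n)]

# Current approach: difference of the first two points
dlon_adjacent = lons[1] - lons[0]

# Proposed approach: total span divided by the number of intervals
dlon_span = (lons[-1] - lons[0]) / (n - 1)

print(abs(dlon_adjacent - 0.0135), abs(dlon_span - 0.0135))
```

The rounding error in the two end points is spread over 890 intervals, so the span-based estimate lands far closer to 0.0135 than the adjacent-point estimate on this synthetic grid. It would not help if the stored coordinates carried a systematic error rather than independent rounding.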

Finally, thanks for coming up with the two solutions you provided. Fixing up the longitudes before the file sees METplus seems to work fine for me. But the "set_attr_grid" config option that you described is a useful alternative too.

Cheers,
Chris

JohnHalleyGotway Sep 28, 2021
Maintainer

Chris, thanks for following up to confirm!

At this point I'm going to LOCK this conversation to prevent future posts. Our team has decided that once discussions have been answered, and the answer has been confirmed, we'll lock them. We want to encourage users to ask new questions in new discussions rather than posting to old ones. Hopefully that'll make the questions/answers easier for other users to follow.

So if/when more issues/questions arise, please feel free to start a new discussion.
