point_stat & nearest neighbour matching #1158
-
Hi METplus team,

I hope you are well. I am writing from the Bureau of Meteorology in Australia, where we have begun replacing our existing NWP verification software with METplus. Part of this work has involved intercomparing verification results from our existing software with those from METplus. The intercomparison has included verification of high-resolution, deterministic NWP screen temperature forecasts, using the HiRA framework to calculate the CRPS. In MET, as you know, this is available through point_stat.

The CRPS values produced by the two verification systems match quite well, but there are some differences. To explore these differences I looked at the individual forecast/observation pairs produced by each system. Both MET and our existing software package create these pairs using the "nearest neighbour" approach to match NWP grid points to the observation location. In many cases the forecast/observation pairs from both verification systems match, but sometimes the two systems identified a different nearest neighbour forecast grid point.

I did a couple of manual spot checks to identify the nearest neighbour myself, and my results agreed with those from our existing verification software. In both cases the different choice of nearest neighbour occurred when the observation location was almost halfway between two grid points. Specifically, the lat/lon forecast grid spacing is 0.0135 deg and the observation location was 0.007 deg from point A and 0.0065 deg from point B. Our existing software selected point B as the nearest neighbour, whereas MET chose point A. (There was no masking applied that could interfere with the "nearest neighbour" choice.)

What I noticed in the MET log file was that the spatial dimensions of the forecast grid were rounded to 3 decimal places, so the grid spacing was reported as 0.014 degrees instead of 0.0135 degrees. I am not sure if MET's spatial precision really is limited to three decimal places, but I did wonder if that might explain the discrepancy in finding the nearest neighbour. An observation 0.007 deg from point A would also be 0.007 deg from point B if the grid spacing were 0.014 deg. In this equidistant scenario, picking point A as the nearest neighbour would be a legitimate option.

Do you know if there is such a limit on spatial precision in MET, and could it explain what I am seeing? Any help would be much appreciated. I can provide the forecast/obs data and MET config file if needed.

Thanks,
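The arithmetic in the question can be checked with a few lines of Python. This is only a sketch of the reasoning in the post, not of how MET or the Bureau's software actually selects a neighbour; the function name and the tie tolerance are assumptions made for illustration.

```python
import math

def nearer_point(spacing_deg: float, dist_to_a_deg: float) -> str:
    """Return which of two adjacent grid points (A or B) is nearer.

    The observation lies between A and B, dist_to_a_deg from A, so its
    distance to B is the grid spacing minus that value.
    """
    dist_to_b_deg = spacing_deg - dist_to_a_deg
    # Hypothetical tolerance: treat distances within 1e-9 deg as a tie.
    if math.isclose(dist_to_a_deg, dist_to_b_deg, abs_tol=1e-9):
        return "tie"
    return "A" if dist_to_a_deg < dist_to_b_deg else "B"

# True spacing from the post: B is 0.0065 deg away, A is 0.007 deg away.
print(nearer_point(0.0135, 0.007))
# Spacing rounded to 3 decimals: both points are 0.007 deg away.
print(nearer_point(0.014, 0.007))
```

With the true spacing the observation is strictly nearer to B; with the rounded spacing the two candidates are equidistant, so either choice would be defensible.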
-
Hi Chris,

An example of the forecast/obs data and your MET config file would be very helpful for looking into this discrepancy. You can attach

I have assigned this issue to @JohnHalleyGotway, who will take a closer look at your data when it is available.
-
Hi Chris,

I read through this discussion post and see that you've already done a lot of investigation on your end. I appreciate that! I'm glad to hear that there's a lot of agreement between the MET output and the existing BOM system.

As to why there is a discrepancy in the nearest neighbor choice for some of the observations, particularly those close to the middle of a grid box, your proposed explanation sounds reasonable to me. If the BOM software uses a grid with delta_lat = delta_lon = 0.0135 and MET is using 0.014, that difference would likely manifest in the behavior you describe.

That being said, no, MET does not have a limitation of 3 decimal places for the grid. However, I believe data stored in GRIB1 files is limited in exactly this way. In fact, we ran into something very similar when working with grids from NOAA/NCEP. They have a set of pre-defined grids, see here, each of which is identified by an integer. In MET, these can be used by name, so "G212" means NOAA/NCEP grid number 212. However, some of the grids on the website are defined using more than 3 decimal places. But in GRIB1, the grid parameters are stored as integers in thousandths of a degree, thus limiting them to 3 decimal places. This issue does not exist in GRIB version 2, to my knowledge.

So please take a step back and look at how you're running Point-Stat. MET extracts the grid from the gridded forecast data you pass to it. How is that grid defined in that data file? If it's helpful, feel free to send us a sample forecast data file. If it's small enough (i.e. 1 GRIB record), you can just tar/zip it and attach it to this discussion. Otherwise you can post to our anonymous FTP site that @georgemccabe mentioned.

Also note that MET supports interpolation methods for UPPER_LEFT, UPPER_RIGHT, LOWER_RIGHT, and LOWER_LEFT to select the 4 corners of the grid box in which each observation point falls. While they don't solve this issue, they may be useful in your debugging.

Thanks!
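The GRIB1 limitation described above can be sketched in Python as follows. This is an illustration of storing an angle as an integer number of thousandths of a degree, not the code of any GRIB library; the function names are hypothetical, and the round-half-up rule is an assumption, since a real encoder may round or truncate differently.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_millidegrees(value_deg: str) -> int:
    """Encode an angle in degrees as an integer count of thousandths of a degree.

    Decimal is used so that 0.0135 is handled exactly rather than as a
    binary floating-point approximation. Rounding mode is an assumption.
    """
    scaled = Decimal(value_deg) * 1000
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def from_millidegrees(stored: int) -> Decimal:
    """Decode the stored integer back to degrees (at most 3 decimal places)."""
    return Decimal(stored) / 1000

stored = to_millidegrees("0.0135")
print(stored, from_millidegrees(stored))
```

Under these assumptions a grid increment of 0.0135 degrees cannot survive the round trip: it comes back as a 3-decimal value, which is the kind of loss described for the NOAA/NCEP grids.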
-
Hi John,

I have also come across the GRIB1 problem you described. In this case, though, I am using NetCDF data. I should have mentioned that in my original post. I've also gone through my config file and trimmed out what I thought might be potentially problematic, but I still get the same result. I am still quite new to MET/METplus though…

Anyway, I have attached the following files, zipped up, to see if you are able to shed some light on this one:

I have also included a picture to try and explain what's happening.

I ran METplus v4 like this:

Do let me know if the attachments don't make it through or if anything is unclear.

Best regards,
-
Chris, thanks for sending along your sample data. Before jumping into the details of this one point, I wanted to check to see how MET is interpreting the input grid. I did this by running the pcp_combine tool and checking the NetCDF output file. This call to pcp_combine just reads the air_temperature input data and writes the result out to another NetCDF file:
Here's the grid def from that file:
And I suspect it's that "delta_lon" setting of 0.013504, rather than 0.0135, that's causing the behavior you're seeing. That could cause the difference in choice between neighboring grid points. But the question is why. Here's the line where that is set:

So it's just computed as the difference between the first and second values of longitude. And further down, the code checks to make sure those diffs remain constant over the grid. But I'm still not quite sure why we're getting 0.013504 instead of 0.0135. I suspect that if we can correct this, then the expected corner of the grid box will be chosen.
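As a rough sketch of the logic described above (take the first difference of the longitude values as delta_lon, then check that the differences stay constant), the Python below shows one way such a first difference can drift away from the nominal spacing. It is not MET's code. The assumption that the coordinate values are stored in single precision is not confirmed anywhere in this thread, and the starting longitude, grid length, and tolerances are hypothetical.

```python
import struct

def as_float32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def first_difference(coords):
    """Grid increment taken as the second coordinate minus the first."""
    return coords[1] - coords[0]

def is_regular(coords, tol):
    """Check that every consecutive difference matches the first one within tol."""
    delta = first_difference(coords)
    return all(abs((b - a) - delta) <= tol for a, b in zip(coords, coords[1:]))

NOMINAL = 0.0135   # spacing quoted in the original post
LON0 = 112.0       # hypothetical western edge of the grid

# Assumption: longitudes were written to the file as 32-bit floats.
lons = [as_float32(LON0 + i * NOMINAL) for i in range(50)]

delta = first_difference(lons)
print(f"nominal spacing : {NOMINAL:.9f}")
print(f"first difference: {delta:.9f}")
print("regular within 1e-5:", is_regular(lons, 1e-5))
print("regular within 1e-9:", is_regular(lons, 1e-9))
```

In this sketch the first difference differs from 0.0135 by a few millionths of a degree, yet the grid still passes a regularity check with a loose tolerance. Whether this is what happens with the file in this thread would need to be confirmed by inspecting the stored coordinate type and values.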
-
Chris, following up with some testing details. I ran the following commands to run pcp_combine and then manually reset the delta_lon value to "0.0135".
Testing using the original and modified versions of the gridded data does result in a different choice for nearest neighbor:
The modified inputs choose the upper-right corner:
The original inputs choose the upper-left corner:
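To illustrate why a change in delta_lon from 0.013504 to 0.0135 can flip the chosen grid point, here is a small Python sketch. It assumes the nearest column is found by dividing the offset from the grid origin by delta_lon and rounding, which may not match MET's actual implementation. The origin and observation longitude are hypothetical values picked so that the observation sits 0.007 deg past one column and 0.0065 deg short of the next on a 0.0135 deg grid, as in the original post.

```python
import math

def nearest_index(coord: float, origin: float, delta: float) -> int:
    """Index of the nearest grid column, assuming a regular grid from origin."""
    return math.floor((coord - origin) / delta + 0.5)

LON0 = 100.0       # hypothetical grid origin
OBS_LON = 113.507  # hypothetical observation, about 1000 columns from the origin

idx_nominal = nearest_index(OBS_LON, LON0, 0.0135)     # spacing as intended
idx_computed = nearest_index(OBS_LON, LON0, 0.013504)  # spacing reported in the thread

print("column with delta_lon = 0.0135  :", idx_nominal)
print("column with delta_lon = 0.013504:", idx_computed)
```

The per-cell error of 0.000004 deg is tiny, but it accumulates with distance from the origin. Around a thousand columns in, it amounts to roughly 0.004 deg, which is enough to move an observation near the middle of a grid box across the halfway line.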