Adding 'latitude' and 'longitude' to acceptable coordinate names #38
All these additions do is add 'latitude' and 'longitude' to the acceptable coordinate names, in compliance with the netCDF CF 1.6 conventions. The netCDF COARDS convention names 'lat' and 'lon' are still supported.
I like this. I'm wondering if it would be within reason to generalize this even further, to allow the user to specify their own lat and lon coordinate names, e.g. if their data uses 'x' and 'y' or 'LONGITUDES' and 'LATITUDES' or any other variants. |
Thanks for the PR, that's a good point... We need to think carefully about this because @spencerahill 's comment can be further generalized to
'y' seems a bad idea because the coordinate value is latitude, but other options all seem reasonable. But too many aliases will cause confusion. Also, what if a grid object contains multiple valid variable names? Even worse for boundary variables:
You name it. I am happy to discuss how to handle this potential chaos... |
One way is to have a signature like
where |
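It might be worthwhile to take a step back and look at what functionality other popular regridding tools offer in this area. The two I'm most familiar with are NCO and CDO; both are command-line tools. My understanding is that both of these tools default to using coordinate information in the dataset (netCDF file). NCO seems to be the most flexible and works like this:
- if the grid dataset has variables with coordinates attributes, these are used to define the grid
- if no variables have the coordinates attribute, then some basic heuristics are used to determine where to find the coordinate information
- finally, these can all be overridden with command line options (e.g. ncremap -R "--rgr lat_nm=xq --rgr lon_nm=zj" -d dst.nc -O ~/rgr in.nc # Manual)
So xarray provides a lot of the necessary metadata to make the first two steps possible. One suggestion would be to first look at the grid variables and see if we can determine their coordinate variables; next, look for common names; finally, if we can't find a coordinate variable, raise an error. Of course, lon_name and lat_name in the signature should be optional, probably defaulting to None. |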
I suggest that we support by default a limited number of them but add the
option to override it with a user provided keyword
…On Fri, Oct 19, 2018 at 11:02 PM Joe Hamman ***@***.***> wrote:
It might be worthwhile to take a step back and look at what functionality
other popular regridding tools offer in this area. The two I'm most
familiar with are NCO and CDO, both are command line tools. My
understanding is that both of these tools default to using coordinate
information in the dataset (netCDF file). NCO seems to be the most flexible
and works like this:
- if the grid dataset has variables with coordinates attributes, these
are used to define the grid
- if no variables have the coordinates attribute, then some basic
heuristics are used to determine where to find the coordinate information
- finally, these can all be overridden with command line options (e.g. ncremap
-R "--rgr lat_nm=xq --rgr lon_nm=zj" -d dst.nc -O ~/rgr in.nc # Manual)
So xarray provides a lot of the necessary metadata to make the first two
steps possible. One suggestion would be to first look at the grid variables
and see if we can determine their coordinate variables, next, look for
common names, finally, if we can't find a coordinate variable, raise an
error. Of course, the signature of lon_name and lat_name should be
optional, probably defaulting to None.
|
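The lookup order proposed in this thread (an explicit user-supplied name first, then a short list of conventional names, then an error) could be sketched roughly as follows. The function name `find_coord_names` and the candidate tuples are illustrative assumptions, not the code in this PR; `variables` can be anything that supports `in`, such as `ds.variables` or a plain list of names.

```python
# Hypothetical helper; names and candidate lists are assumptions for illustration.
LON_CANDIDATES = ("lon", "longitude")
LAT_CANDIDATES = ("lat", "latitude")


def find_coord_names(variables, lon_name=None, lat_name=None):
    """Return (lon_name, lat_name) found in a collection of variable names."""

    def pick(explicit, candidates, kind):
        # 1. An explicit user-supplied name always wins, but it must exist.
        if explicit is not None:
            if explicit not in variables:
                raise ValueError(f"{kind} variable {explicit!r} not found")
            return explicit
        # 2. Otherwise fall back to a short list of conventional names.
        for name in candidates:
            if name in variables:
                return name
        # 3. Nothing matched: raise instead of printing a message.
        raise ValueError(
            f"no {kind} variable found; tried {list(candidates)}. "
            "Pass an explicit name to override."
        )

    return (
        pick(lon_name, LON_CANDIDATES, "longitude"),
        pick(lat_name, LAT_CANDIDATES, "latitude"),
    )
```

Keeping the override as an optional keyword that defaults to None means the common case needs no extra arguments, while unusual names such as 'x' and 'y' remain usable without renaming the dataset.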
raise ValueError
except ValueError:
print('Must have coordinates compliant with NETCDF COARDS or CF conventions')
|
As written, this function doesn't return anything. It needs something like
return lon_name, lat_name |
I like this.
Yes, I think so. Otherwise, to use xESMF, a user has to first rename their coordinate(s), run xESMF, and then, if their pipeline requires the original names down the line, rename them back. |
Thanks @spencerahill for catching that. As I said, I suggest supporting a limited number of variable names for latitude and longitude. Stick with the big conventions like COARDS and CF, and add the option to override the automatic detection when the user provides the variable name. |
Added kwargs to the regridder class so users can supply coordinate names for either ds_in or ds_out. Note: still need to figure out exactly how to handle __call__() and regrid_dataarray for an xarray.DataArray when user-supplied variable names are given.
Adding the latitude and longitude names as class variables. This allows them to be passed to regridder.regrid_dataarray without additional frustration.
lat_name = 'lat_b'
lon_name = 'lon_b'
# NETCDF CF 1.6 complaint
elif 'latitude_b' in ds.variables:
Does the COARDS convention specify boundary names? I can't find anything here
CF convention uses 'lat_bnds' for boundaries but its size is (N, 2), not N+1 as expected by ESMF (#32 (comment)).
The current 'lat_b' (including the proposed 'latitude_b') does not follow existing convention, but is rather my arbitrary choice. I don't particularly like this choice but switching to 'lat_bnds' is even more confusing due to the above size issue. So we can't really say we are "looking for CF-compliant names" here.
Also a typo: "complaint" -> "compliant"
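To make the size mismatch above concrete: for a 1D axis with contiguous cells, N (lower, upper) bound pairs can be collapsed into N+1 edges. The sketch below is only an illustration under that assumption. `bounds_to_edges` is a hypothetical name, it uses plain Python sequences rather than any xESMF or ESMF call, and it does not cover 2D curvilinear grids.

```python
def bounds_to_edges(bounds):
    """Convert N (lower, upper) pairs into a list of N+1 cell edges.

    Assumes a 1D axis whose cells are contiguous, i.e. each cell's upper
    bound equals the next cell's lower bound (compared exactly).
    """
    bounds = [tuple(pair) for pair in bounds]
    if not bounds:
        raise ValueError("bounds must contain at least one cell")
    # Check that neighbouring cells share an edge.
    for (_, upper), (lower, _) in zip(bounds, bounds[1:]):
        if upper != lower:
            raise ValueError("cells are not contiguous; cannot build edges")
    # All lower bounds, plus the upper bound of the last cell.
    return [lower for lower, _ in bounds] + [bounds[-1][1]]
```

For example, the two cells (-90, 0) and (0, 90) share the edge at 0, so the pair list collapses to three edges.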
lat_bnds is for regional boundary conditions. It isn't for the same thing as edge or corners of the grid cell.
lat_out=None,
lon_out=None,
lat_b_out=None,
lon_b_out=None):
I am concerned about adding 8 more arguments to the function signature. It looks fairly complicated to a new user. Also, lat_in can be easily misinterpreted as latitude values, instead of the variable name in ds_in (before a user looks at the docstring).
How about consolidating them into a single name_dict argument, just like for xarray.Dataset.rename?
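A rough sketch of how a single rename-style dictionary could stand in for the eight keyword arguments is shown below. It assumes the same direction as xarray.Dataset.rename (existing name mapped to new name); `apply_name_dict` is a hypothetical helper operating on a plain dict of variables, not an xESMF function.

```python
def apply_name_dict(variables, name_dict=None):
    """Return a copy of `variables` with keys renamed according to `name_dict`.

    `name_dict` maps existing names to the names the regridder expects,
    e.g. {"latitude": "lat", "longitude": "lon"}.
    """
    name_dict = name_dict or {}
    # Fail early if the user asks to rename something that is not there.
    missing = [old for old in name_dict if old not in variables]
    if missing:
        raise ValueError(f"cannot rename {missing}: not found in variables")
    # Names absent from name_dict are passed through unchanged.
    return {name_dict.get(name, name): value for name, value in variables.items()}
```

With this shape, a dataset using 'latitude' and 'longitude' only needs `name_dict={"latitude": "lat", "longitude": "lon"}`, and datasets that already follow the defaults pass nothing.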
I like that idea.
I can see this point. A simple way would be adding a Or a global configuration capability like
Or even as context manager:
So people can set any coordinate names they are accustomed to. Would this be more intuitive & convenient from a user's perspective? This also avoids complicating the major API, considering that the majority of users should be OK with the default settings. |
This might be over-engineering, but if a user really wants to mix multiple names, a list of candidate names can also be possible:
I would prefer to let users explicitly code up the rules they want, rather than to have some implicit heuristics for them. The latter might lead to tricky conditions that are hard to explain & debug. |
@JiaweiZhuang those are all cool ideas. But I do wonder if all but the But ultimately whatever you decide is fine. My final 2 cents on this issue is just that whatever is implemented needs tests...there could be a fair number of tricky corner cases. |
@JiaweiZhuang I like the idea of being able to pass a configure dictionary. Do you mean to be able to pass multiple keys for it to search through to find in the configure? If so I like the idea. This way multiple keywords could be loaded at once. It could be very advantageous if you open multiple datasets with different definitions of variables. |
How about searching for lon and lat variables also by checking the standard_name and units attributes? |
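One way the attribute-based search could look is sketched below. It matches on the CF standard_name values 'latitude' and 'longitude' and on the unit strings the CF conventions accept for those axes. `find_by_attrs` is a hypothetical helper that takes a plain mapping of variable name to attribute dict, and it refuses to guess when zero or several variables match.

```python
# Unit strings accepted by the CF conventions for latitude and longitude.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}


def find_by_attrs(attrs_by_name):
    """Return (lon_name, lat_name) by inspecting standard_name and units."""

    def match(standard_name, units):
        hits = [
            name
            for name, attrs in attrs_by_name.items()
            if attrs.get("standard_name") == standard_name
            or attrs.get("units") in units
        ]
        # Zero or multiple matches are ambiguous, so raise rather than guess.
        if len(hits) != 1:
            raise ValueError(
                f"expected exactly one {standard_name} variable, found {hits}"
            )
        return hits[0]

    return match("longitude", LON_UNITS), match("latitude", LAT_UNITS)
```

With xarray, the input mapping could be built as `{name: var.attrs for name, var in ds.variables.items()}`. Note that boundary variables carrying the same units would make the match ambiguous and trigger the error, which is one of the corner cases raised earlier in the thread.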
86e1e45
to
0a3d391
Compare
metpy has a |
@bbakernoaa and @JiaweiZhuang - I had a workaround that I now realize is very similar to this pull request (it is here: pochedls@7ec903d), though it doesn't deal with the bounds (and my get_axis_ids function is a little different). Is there any reason the pull request in this conversation can't be merged (after conflicts are resolved)? Let me know if I can help. |
Adding 'latitude' and 'longitude' to the acceptable coordinate names to be compliant with the netCDF CF 1.6 conventions. It adds a new method called get_latlon_names that checks whether the xr.DataArray uses 'lat' (netCDF COARDS convention) or 'latitude' (netCDF CF convention).