Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding 'latitude' and 'longitude' to acceptable coordinate names #38

Closed
wants to merge 4 commits into from

Conversation

bbakernoaa
Copy link

Adding 'latitude' and 'longitude' to acceptable coordinate names to be compliant with netCDF CF 1.6 conventions. It adds a new method called get_latlon_names that checks if 'lat' or 'latitude' or netCDF COARDS convention or netCDF CF convention is in the xr.DataArray

All these additions do is to add 'latitude' and 'longitude' to the acceptable coordinate names in compliance with netCDF CF 1.6 conventions.  It still supports the netCDF COARDS convention of 'lat' and 'lon'
@spencerahill
Copy link

I like this. I'm wondering if it would be within reason to generalize this even further, to allow the user to specify their own lat and lon coordinate names, e.g. if their data uses 'x' and 'y' or 'LONGITUDES' and 'LATITUDES' or any other variants.

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Oct 20, 2018

Thanks for the PR, that's a good point...

We need to think carefully about this because @spencerahill 's comment can be further generalized to

  • 'lat'
  • 'latitude'
  • 'latitudes'
  • 'LAT'
  • 'LATITUDES'
  • 'y'
  • ...

'y' seems a bad idea because the coordinate value is latitude, but other options all seem reasonable. But too many aliases will cause confusion. Also, what if a grid object contains multiple valid variable names?

Even worse for boundary variables:

  • 'lat_b'
  • 'lat_bnds'
  • 'latitude_b'
  • 'latitude_bnds'
  • ...

You name it. I am happy to discuss how to handle this potential chaos...

@JiaweiZhuang
Copy link
Owner

One way is to have a signature like

Regridder(grid_in, grid_out, ..., lon_name = 'lon', lat_name = 'lat')

where lon_name can be overwritten by users. But is this really more convenient than simply renaming the variable?

@jhamman
Copy link

jhamman commented Oct 20, 2018

It might be worthwhile to take a step back and look at what functionality other popular regridding tools offer in this area. The two I'm most familiar with are NCO and CDO, both are command line tools. My understanding is that both of these tools default to using coordinate information in the dataset (netCDF file). NCO seems to be the most flexible and works like this:

  • if the grid dataset has variables with coordinates attributes, these are used to define the grid
  • if no variables have the coordinates attribute, then some basic heuristics are used to determine where to find the coordinate information
  • finally, these can all be overridden with command line options (e.g. ncremap -R "--rgr lat_nm=xq --rgr lon_nm=zj" -d dst.nc -O ~/rgr in.nc # Manual)

So xarray provides a lot of the necessary metadata to make the first two steps possible. One suggestion would be to first look at the grid variables and see if we can determine their coordinate variables, next, look for common names, finally, if we can't find a coordinate variable, raise an error. Of course, the signature of lon_name and lat_name should be optional, probably defaulting to None.

@bbakernoaa
Copy link
Author

bbakernoaa commented Oct 20, 2018 via email

raise ValueError
except ValueError:
print('Must have coordinates compliant with NETCDF COARDS or CF conventions')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written this doesn't function doesn't return anything. Needs something like

Suggested change
return lon_name, lat_name

@spencerahill
Copy link

So xarray provides a lot of the necessary metadata to make the first two steps possible. One suggestion would be to first look at the grid variables and see if we can determine their coordinate variables, next, look for common names, finally, if we can't find a coordinate variable, raise an error. Of course, the signature of lon_name and lat_name should be optional, probably defaulting to None.

I like this.

But is this really more convenient than simply renaming the variable?

Yes, I think so. Otherwise in order to use xESMF a user has to first rename their coordinate(s), then use xESMF, and then if their pipeline requires the original name down the line to rename them back to the original.

@bbakernoaa
Copy link
Author

Thanks @spencerahill for catching that.

Like I said I suggest that you should support a limited number of named variables for latitude and longitude. Stick with the big conventions like COARDS and CF and add the option to override the automatic detection if the variable is provided.

Added in kwargs for the regridder class so users could supply either ds_in or ds_out coordinate names.  

note still need to figure out exactly how to deal with the call __call__() for the xarray.DataArray regrid_dataarray class when user-supplied variables are found.
Adding the latitude and longitude names as class variables. This allows it to be passed to the regridder.regrid_dataarray without additional frustration
lat_name = 'lat_b'
lon_name = 'lon_b'
# NETCDF CF 1.6 complaint
elif 'latitude_b' in ds.variables:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the COARD convention specify boundary names? I can't find anything here

CF convention uses 'lat_bnds' for boundaries but its size is (N, 2), not N+1 as expected by ESMF (#32 (comment)).

The current 'lat_b' (including the proposed 'latitude_b') does not follow existing convention, but is rather my arbitrary choice. I don't particularly like this choice but switching to 'lat_bnds' is even more confusing due to the above size issue. So we can't really say we are "looking for CF-compliant names" here.

Also a typo: "complaint" -> "compliant"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lat_bnds is for regional boundary conditions. It isn't for the same thing as edge or corners of the grid cell.

lat_out=None,
lon_out=None,
lat_b_out=None,
lon_b_out=None):
Copy link
Owner

@JiaweiZhuang JiaweiZhuang Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about adding 8 more arguments to the function signature. Looks fairly complicated to a new user. Also, lat_in can be easily misinterpreted as latitude values, instead of the variable name in ds_in (before a user looks at the docstring).

How about consolidating them to a single name_dict argument, just like for xarray.Dataset.rename?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea.

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Oct 22, 2018

Otherwise in order to use xESMF a user has to first rename their coordinate(s), then use xESMF, and then if their pipeline requires the original name down the line to rename them back to the original.

I can see this point.

A simple way would be adding a rename_dict argument to allow overwriting the default names. For example
xe.Regridder(..., rename_dict = {'lon': 'x', 'lat': 'y', 'lon_b': 'x_b', 'lat': 'y_b'})
This basically integrates xarray.Dataset.rename into the regridder API...

Or a global configuration capability like dask.config.set:

xe.config.set(rename_dict = {'lon': 'x', 'lat': 'y', 'lon_b': 'x_b', 'lat': 'y_b'})

Or even as context manager:

rename_dict1 = {'lon': 'x', 'lat': 'y', 'lon_b': 'x_b', 'lat': 'y_b'}
rename_dict2 = {'lon': 'longitude', 'lat': 'latitude', 
                'lon_b': 'longitude_b', 'lat'_b: 'latitude_b'}

with xe.config.set(rename_dict = rename_dict1):
    xe.Regridder(...)  # automatically use `rename_dict1`

with xe.config.set(rename_dict = rename_dict2):
    xe.Regridder(...)  # automatically use `rename_dict2`

So people can set any coordinate names they are accustomed to.

Would this be more intuitive & convenient from a user's perspective? This also avoids complicating the major API, considering that the majority of users should be OK with the default settings.

@JiaweiZhuang
Copy link
Owner

This might be over-engineering, but if a user really wants to mix multiple names, a list of candidate names can also be possible:

xe.config.set(rename_dict = {'lon': ['lon', 'longitude', ''x'], 
                             'lat': ['lat', 'latitude', 'y']})
#  the look-up priority is the order of names in the list: 'lon', then 'longitude', then 'x'

I would prefer to let users explicitly code up the rules they want, rather than to have some implicit heuristics for them. The later might lead to tricky conditions that are hard to explain & debug.

@spencerahill
Copy link

@JiaweiZhuang those are all cool ideas. But I do wonder if all but the name_dict argument are too much for now. After all, this may not even turn out to be a big use case.

But ultimately whatever you decide is fine. My final 2 cents on this issue is just that whatever is implemented needs tests...there could be a fair number of tricky corner cases.

@bbakernoaa
Copy link
Author

@JiaweiZhuang I like the idea of being able to pass a configure dictionary. Do you mean to be able to pass multiple keys for it to search through to find in the configure? If so I like the idea. This way multiple keywords could be loaded at once. It could be very advantageous if you open multiple datasets with different definitions of variables.

@stefraynaud
Copy link

How about searching for lon and lat variables also by checking the standard_name and units attributes?
For instance for longitudes, the standard_name must starts with longitude or longitude_at_.
Here are the conventions:
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#latitude-coordinate
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#longitude-coordinate

@ahuang11
Copy link

ahuang11 commented Oct 24, 2019

metpy has a parse_cf() function that labels coordinates' attrs with _metpy_axis and the corresponding axis https://unidata.github.io/MetPy/latest/_modules/metpy/xarray.html#MetPyDatasetAccessor.parse_cf, but that would mean having to make metpy a dependency (or check if it metpy is installed and if so, allow users to have this additional functionality)

pochedls added a commit to pochedls/xESMF that referenced this pull request Oct 25, 2019
@pochedls
Copy link

@bbakernoaa and @JiaweiZhuang - I had a workaround that I now realize is very similar to this pull request (it is here: pochedls@7ec903d), though it doesn't deal with the bounds (and my get_axis_ids function is a little different). Is there any reason the pull request in this conversation can't be merged (after conflicts are resolved)? Let me know if I can help.

@JiaweiZhuang
Copy link
Owner

@pochedls I summarized my concerns and proposals at #74. If you'd like to take a look at that it would be great!

@bbakernoaa bbakernoaa closed this Mar 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants