-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some CF support for lon and lat #73
Conversation
Codecov Report
@@ Coverage Diff @@
## master #73 +/- ##
=========================================
+ Coverage 95.76% 96.2% +0.44%
=========================================
Files 6 7 +1
Lines 260 343 +83
=========================================
+ Hits 249 330 +81
- Misses 11 13 +2
Continue to review full report at Codecov.
|
With the first commit, coordinates and bounds are automatically identified and renamed, if they respect CF conventions which are based on name, units and standard_name. With the second commit, output coordinate names are automatically restored. |
@stefraynaud Thanks for the PR! I had a long backlog and didn't get time to look at this -- sorry for the delay. I agree that some degree of decoding feature is useful. On the other hand I am also worried about putting complex logic into xesmf, especially tricky edges cases like: which variable actually got used if the dataset contains all three variables To that end I summarized a meta-issue at #74. Would you be able to build the CF-convention support on top of the proposed Also, will |
CF_SPECS = { | ||
'lon': { | ||
'name': ['lon', 'longitude'], | ||
'units': ['degrees_east', 'degree_east', 'degree_E', 'degrees_E', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the units and other attributes seems a bit too heavy for xesmf? A "CF-checker" is better for a standalone package or pushed to upstream like xarray.decode_cf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that a CF checker for xarray is globally missing.
However, xarray is intended to remain of general purpose, and not to introduce notions of geographic and physical spatialisation.
As for the fact that these attributes are checked is the strict application of the CF conventions for longitude and latitude.
This prevent focusing on names.
xesmf/cf.py
Outdated
ds: xarray.DataArray or xarray.Dataset | ||
''' | ||
# Suffixes for bounds | ||
bsuffixes = ('_b', '_bounds', '_bnds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here assumes a priority of '_b' > '_bounds' > '_bnds'
. If more than one of them exist in the dataset, could it lead to a name conflict error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main caveat of this PR because it rely on names and not on attributes.
Don't blame me : a solution is to push forward the application of CF conventions and to use the bounds
attribute as explained here. This makes unique link between coordinates and bounds.
Also note that renaming the coordinate name alone does not solve CF-convention -- CF uses a boundary shape of |
You are right, but this is beyond the scope of this PR which is focused on finding variables in datasets. By the way, a conversion from the |
It first look ate the "bounds" attribute, then search for names using usual prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try to avoid renaming coordinates.
I believe the search for coordinates might be too complex to be clearly communicated to users. I can imagine questions such as "why did it work on this file and not this one ?"
We should strive to reduce the potential confusion between CF bounds and grid corners.
} | ||
|
||
|
||
def get_coord_name_from_specs(ds, specs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is certainly a more extensive search than what #74 provides. The question is do we want this ? Also, from a user perspective, all the different ways coordinates can be identified might create some confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, it is the strict application of the CF conventions, which must be considered to be the reference for formatting dataset, and which are increasingly being followed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #94 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Checking with units first since they are mandatory. And then standard_name, just in case.
|
||
|
||
def get_bounds_name_from_coord(ds, coord_name, | ||
suffixes=['_b', '_bnds', '_bounds']): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could get really confusing, because this function could either return netCDF bounds
or the grid corners (_b
). I suggest the code clearly distinguishes bounds from corners.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep probably. I made no difference in this PR regarding the bounds/corners problem.
return bounds_name | ||
warnings.warn('invalid bounds name: ' + bounds_name) | ||
|
||
# From suffixed names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the identification by suffix.
lon_name = get_lon_name(ds) | ||
if lon_name is not None: | ||
|
||
# Search for bounds and rename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not have to rename coordinates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, renaming is for internal use only, just to make the code clearer. It is possible to not rename :)
Fix lon/lat extraction for DataArrays
This PR adds some support for CF conventions to search for longitude and latitude in Datasets, and rename them to 'lon' and 'lat'. Bounds data arrays are also renamed if found.
A module named 'cf' was created for that and unitests were added.