-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADD: map_over_sweeps accessor and decorator #203
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #203 +/- ##
==========================================
+ Coverage 92.19% 92.22% +0.02%
==========================================
Files 24 24
Lines 4871 4888 +17
==========================================
+ Hits 4491 4508 +17
Misses 380 380
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@syedhamidali Thanks for being that fast. Yes, this is around the lines I was talking before. I've added a couple of comments and suggestion, which we can discuss.
@kmuehlbauer Thanks for these suggestions. Do these changes meet your expectations? I’ve run the methods on some data, and they’re working well. Let me know your thoughts! |
what about using we just use |
5 vs 16 characters? OK if we count @aladinor How would we handle this with |
that's a really good question. We need to make sure it only uses the "sweep" nodes! |
Yes, this is what we are doing (or want to do) in our boilerplate code. The user should not have to care about that, so we have to take responsibility for that. The only thing which remains, and this is already somehow mis-implemented in the current datatree-accessors, is that the original datatree is overwritten instead of returning a newly created tree. |
Hi @kmuehlbauer and @aladinor, Thanks for your suggestions and for pointing out the node issue, I hadn’t caught that earlier. I’ve made the necessary adjustments and would appreciate your feedback. |
@syedhamidali Not to be mistaken or misunderstood, It's good to see such excellent contributions coming in. What I mean't and with the possibilities xarray gives us, we might get some really neat functionality into xradar without doing that much and keeping our maintenance burden low. How about something like this (works with latest xarray-main and xradar- import xradar as xd
from open_radar_data import DATASETS
# Fetch the sample radar file
filename = DATASETS.fetch("sample_sgp_data.nc")
# Open the radar file into a DataTree object
dtree = xd.io.open_cfradial1_datatree(filename)
# define the xradar decorator to just work on sweep nodes
# and let other nodes out unchanged
# thats quick'ndirty, so the code might not be up to standards
from functools import wraps
def map_over_sweeps(func):
@wraps(func)
def wrapper_func(*args, **kwargs):
if "range" in args[0].dims:
return func(*args, **kwargs)
else:
return args[0]
return wrapper_func
# Define a function to calculate rain rate from reflectivity
# apply the decorators
@xr.map_over_datasets
@map_over_sweeps
def calculate_rain_rate(ds, ref_field='DBZH'):
def _rain_rate(dbz, a=200.0, b=1.6):
Z = 10.0 ** (dbz / 10.0)
return (Z / a) ** (1.0 / b)
ds = ds.assign(RAIN_RATE=_rain_rate(ds[ref_field]))
ds.RAIN_RATE.attrs = {'units': 'mm/h', 'long_name': 'Rain Rate'}
return ds
# Apply the function across all sweeps
dtree = dtree.pipe(calculate_rain_rate, ref_field='corrected_reflectivity_horizontal')
# Inspect the first sweep to see the added RAIN_RATE field
print(dtree['sweep_0'].data_vars) So finally the only function we have to maintain is the |
Thanks, @kmuehlbauer! You’re right—ds.pipe(func) and da.pipe(func) already cover what .apply does, so no need to add redundancy. |
Thanks, @aladinor! I agree that parameterizing map_over_datasets for sweeps is a cleaner approach. If we decide to create a wrapper for xradar, handling non-sweep nodes carefully will be key. Appreciate your thoughts! I gave it a try, something like this, but there are a lot of issues with that, maybe you can take a look def apply(self, func, *args, **kwargs):
radar = self.xarray_obj
# Apply the function only to the sweep nodes
dt_sweeps = radar.match("sweep_*")
modified_sweeps = dt_sweeps.map_over_datasets(func, *args, **kwargs)
# Rebuild the DataTree, including non-sweep nodes
new_tree = {"/": radar.ds}
new_tree.update({node.path: node.ds for node in radar.subtree if node.path != "/"})
new_tree.update({node.path: modified_sweeps[node.path].to_dataset() for node in modified_sweeps.subtree})
return xr.DataTree.from_dict(new_tree) AttributeError: Mutation of the DatasetView is not allowed, please use `.__setitem__` on the wrapping DataTree node, or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_datasets`,use `.copy()` first to get a mutable version of the input dataset.
Raised whilst mapping function over node with path /sweep_0 Now, tried to fix this error def apply(self, func, *args, **kwargs):
""" """
radar = self.xarray_obj # Get the DataTree object
# Match only the sweep nodes (e.g., sweep_0, sweep_1, etc.)
dt_sweeps = radar.match("sweep_*")
# Apply the function to all sweep datasets, ensuring each dataset is copied before modification
def apply_func(ds, *args, **kwargs):
return func(ds.copy(), *args, **kwargs) # Copy the dataset before applying the function
# Apply the function across the matched sweep nodes using map_over_datasets
modified_sweeps = dt_sweeps.map_over_datasets(apply_func, *args, **kwargs)
# Rebuild the DataTree, including non-sweep nodes (like root nodes) if necessary
tree_dict = {"/": radar.ds} # Add the root dataset ("/" path)
tree_dict.update({node.path: node.ds for node in radar.subtree if node.path != "/"}) # Add non-sweep nodes
tree_dict.update({node.path: modified_sweeps[node.path] for node in modified_sweeps.subtree})
# Return the updated DataTree
return xr.DataTree.from_dict(tree_dict) KeyError: 'Already a node object at path /sweep_0' |
Thanks again, @kmuehlbauer! I like the idea of using map_over_sweeps to focus on sweep nodes, keeping maintenance light. The decorator approach works well. What’s your take on encouraging users to rely on decorators for applying functions? Seems efficient, but open to your thoughts! This seems to work fine with the new xarray-datatree, without relying on decorators def map_over_sweeps(self, func, *args, **kwargs):
""" """
radar = self.xarray_obj
# Create a new tree dictionary
tree = {"/": radar.ds} # Start with the root Dataset
# Add all nodes except the root
tree.update({node.path: node.ds for node in radar.subtree if node.path != "/"})
# Apply the function to all sweep nodes and update the tree dictionary
tree.update(
{
node.path: func(radar[node.path].to_dataset(), *args, **kwargs)
for node in radar.match("sweep*").subtree
if node.path.startswith("/sweep")
}
)
# Return a new DataTree constructed from the modified tree dictionary
return xr.DataTree.from_dict(tree) |
OK, so from my perspective xarray is already encouraging the use of decorators via accessors. With the next xarray version the We could imagine following that path by introducing a |
@syedhamidali Your above example (#203 (comment)) would then look like: # define the xradar decorator to just work on sweep nodes
# and let other nodes out unchanged
# thats quick'ndirty, so the code might not be up to standards
import functools
import xarray as xr
def map_over_sweeps(func):
@functools.wraps(func)
def _func(*args, **kwargs):
# we do not have access to any names or other DataTree specifics
# so we need to rely on the layout of the given dataset
# maybe we can add something like `is_sweep` to check for that?
if "range" in args[0].dims:
return func(*args, **kwargs)
else:
return args
# we need another wrap here for the xarray decorator
@xr.map_over_datasets
def _map_over_sweeps(*args, **kwargs):
return _func(*args, **kwargs)
return _map_over_sweeps
# defined inside XarrayAccessor
def map_over_sweeps(self, func, *args, **kwargs):
""" """
# add our decorator for input function
@map_over_sweeps
def _func(*args, **kwargs):
return func(*args, **kwargs)
return self.xarray_obj.pipe(_func, *args, **kwargs) The user would then have two options, one via accessor and one via decorator. There are surely use cases for both. The good thing is, that both ways are implemented solely by our and the xarray decorator and we only use xarray machinery (beside our little "range"-hack). import xradar as xd
from open_radar_data import DATASETS
# Fetch the sample radar file
filename = DATASETS.fetch("sample_sgp_data.nc")
# Open the radar file into a DataTree object
dtree = xd.io.open_cfradial1_datatree(filename)
# without decorator
def calculate_rain_rate(ds, ref_field='DBZH'):
def _rain_rate(dbz, a=200.0, b=1.6):
Z = 10.0 ** (dbz / 10.0)
return (Z / a) ** (1.0 / b)
ds = ds.assign(RAIN_RATE=_rain_rate(ds[ref_field]))
ds.RAIN_RATE.attrs = {'units': 'mm/h', 'long_name': 'Rain Rate'}
return ds
# with decorator
@map_over_sweeps
def calculate_rain_rate2(ds, ref_field='DBZH'):
def _rain_rate(dbz, a=200.0, b=1.6):
Z = 10.0 ** (dbz / 10.0)
return (Z / a) ** (1.0 / b)
ds = ds.assign(RAIN_RATE=_rain_rate(ds[ref_field]))
ds.RAIN_RATE.attrs = {'units': 'mm/h', 'long_name': 'Rain Rate'}
return ds
# invocation via accessor
dtree2 = dtree.xradar.map_over_sweeps(calculate_rain_rate, ref_field='corrected_reflectivity_horizontal')
display(dtree2["sweep_0"])
# invocation via decorator + pipe
dtree3 = dtree.pipe(calculate_rain_rate2, ref_field='corrected_reflectivity_horizontal')
display(dtree3["sweep_0"]) |
@kmuehlbauer I've one quick question, should I also add the |
Yes, It would be great to have the decorator available directly from the module, instead importing from submodule. |
@kmuehlbauer, okay, I am on it, thanks! |
@kmuehlbauer It should work when you merge the nightly branch |
@syedhamidali Great! This looks really nice, thanks! The nightly run is green, so it works! Let's wait until the new xarray version is out, or we just merge it into the feature branch. Not sure, what's less invasive. |
@syedhamidali Unfortunately xarray has removed the decorator for map_over_datasets. So this will need some fixing. I'll try to fix it today 😬 |
@syedhamidali I've fixed the decorator issue, tweaked one test and fixed some docs/notebooks. Please have another look, if you are good with the PR. Then we should merge. |
Hi @kmuehlbauer, Thank you for handling the decorator issue and the additional tweaks! Everything looks great on my end after reviewing the changes, so I’m good with the PR. I appreciate your support with this and the updates to the docs/notebooks. Let me know if there’s anything else you’d like me to address. |
Great, then let's get this in! Thanks for sticking to this during the datatree conversion process. |
history.md
This PR introduces an apply method to the xradar accessor, allowing users to apply functions to DataTree, Dataset, and DataArray objects seamlessly. This enhancement follows the discussion on generalizing the approach using accessors and aims to provide a more consistent and intuitive API.
Applying to DataTree
Applying to Dataset
Applying to DataArray