Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADD: map_over_sweeps accessor and decorator #203

Merged
merged 33 commits into from
Oct 28, 2024

Conversation

syedhamidali
Copy link
Contributor

@syedhamidali syedhamidali commented Aug 31, 2024

This PR introduces an apply method to the xradar accessor, allowing users to apply functions to DataTree, Dataset, and DataArray objects seamlessly. This enhancement follows the discussion on generalizing the approach using accessors and aims to provide a more consistent and intuitive API.

Applying to DataTree

import xradar as xd
from open_radar_data import DATASETS

# Fetch the sample radar file
filename = DATASETS.fetch("sample_sgp_data.nc")

# Open the radar file into a DataTree object
dtree = xd.io.open_cfradial1_datatree(filename)

# Define a function to calculate rain rate from reflectivity
def calculate_rain_rate(ds, ref_field='DBZH'):
    def _rain_rate(dbz, a=200.0, b=1.6):
        Z = 10.0 ** (dbz / 10.0)
        return (Z / a) ** (1.0 / b)

    ds['RAIN_RATE'] = _rain_rate(ds[ref_field])
    ds['RAIN_RATE'].attrs = {'units': 'mm/h', 'long_name': 'Rain Rate'}
    return ds

# Apply the function across all sweeps
dtree = dtree.xradar.apply(calculate_rain_rate, ref_field='corrected_reflectivity_horizontal')

# Inspect the first sweep to see the added RAIN_RATE field
print(dtree['sweep_0'].data_vars)
Data variables:
    corrected_reflectivity_horizontal  (azimuth, range) float32 1MB -5.617 .....
    reflectivity_horizontal            (azimuth, range) float32 1MB ...
    RAIN_RATE                          (azimuth, range) float32 1MB 0.01625 ....

Applying to Dataset

# Extract a single sweep as a Dataset
ds = dtree['sweep_0'].to_dataset()

# Apply the function to the Dataset using the xradar accessor
ds = ds.xradar.apply(calculate_rain_rate, ref_field='corrected_reflectivity_horizontal')

# Inspect the Dataset to see the new RAIN_RATE field
print(ds['RAIN_RATE'])
<xarray.DataArray 'RAIN_RATE' (azimuth: 400, range: 667)> Size: 1MB
array([[0.01624733, 0.04791909, 0.00855976, ..., 0.02479567, 0.02921897,
               nan],
       [0.01765691, 0.05320087, 0.00698366, ..., 0.01000767,        nan,
               nan]], dtype=float32)
Coordinates:
    time       (azimuth) datetime64[ns] 3kB 2011-05-20T06:42:11.039436300 ......
  * range      (range) float64 5kB 0.0 60.0 120.0 ... 3.99e+04 3.996e+04
  * azimuth    (azimuth) float64 3kB 0.8281 1.719 2.594 ... 358.1 359.0 360.0
    elevation  (azimuth) float64 3kB ...
    latitude   float64 8B ...
    longitude  float64 8B ...
    altitude   float64 8B ...
Attributes:
    units:      mm/h
    long_name:  Rain Rate

Applying to DataArray

# Extract the reflectivity field as a DataArray
da = ds['corrected_reflectivity_horizontal']

# Define a function to add a constant value to the DataArray
def add_constant(da, constant=10):
    return da + constant

# Apply the function using the xradar accessor
da_modified = da.xradar.apply(add_constant, constant=5)

# Inspect the modified DataArray
print(da_modified)
<xarray.DataArray 'corrected_reflectivity_horizontal' (azimuth: 400, range: 667)> Size: 1MB
array([[ -0.6171875,   6.8984375,  -5.0703125, ...,   2.3203125,
          3.4609375,         nan],
       [ -0.671875 ,   7.28125  ,  -3.1171875, ...,         nan,
         -8.4765625,         nan]], dtype=float32)
Coordinates:
    time       (azimuth) datetime64[ns] 3kB 2011-05-20T06:42:11.039436300 ......
  * range      (range) float64 5kB 0.0 60.0 120.0 ... 3.99e+04 3.996e+04
  * azimuth    (azimuth) float64 3kB 0.8281 1.719 2.594 ... 358.1 359.0 360.0
    elevation  (azimuth) float64 3kB ...
    latitude   float64 8B ...
    longitude  float64 8B ...
    altitude   float64 8B ...

Copy link

codecov bot commented Aug 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.22%. Comparing base (617bd34) to head (d5a1c18).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
+ Coverage   92.19%   92.22%   +0.02%     
==========================================
  Files          24       24              
  Lines        4871     4888      +17     
==========================================
+ Hits         4491     4508      +17     
  Misses        380      380              
Flag Coverage Δ
notebooktests 78.70% <100.00%> (+0.07%) ⬆️
unittests 90.60% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@syedhamidali syedhamidali added the enhancement New feature or request label Aug 31, 2024
@syedhamidali syedhamidali self-assigned this Aug 31, 2024
xradar/util.py Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
xradar/util.py Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@kmuehlbauer kmuehlbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@syedhamidali Thanks for being that fast. Yes, this is around the lines I was talking before. I've added a couple of comments and suggestion, which we can discuss.

@syedhamidali
Copy link
Contributor Author

@kmuehlbauer Thanks for these suggestions. Do these changes meet your expectations? I’ve run the methods on some data, and they’re working well. Let me know your thoughts!

docs/history.md Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
@aladinor
Copy link
Member

aladinor commented Sep 1, 2024

what about using we just use map_over_subtree functionality already implemented in xarray.datatree?

@kmuehlbauer
Copy link
Collaborator

what about using we just use map_over_subtree functionality already implemented in xarray.datatree?

5 vs 16 characters? OK if we count xradar.apply it's still 12 vs 16. 😁

@aladinor How would we handle this with map_over_subtree if we only want to apply this on sweep-nodes?

@aladinor
Copy link
Member

aladinor commented Sep 1, 2024

that's a really good question. We need to make sure it only uses the "sweep" nodes!

@kmuehlbauer
Copy link
Collaborator

We need to make sure it only uses the "sweep" nodes!

Yes, this is what we are doing (or want to do) in our boilerplate code. The user should not have to care about that, so we have to take responsibility for that. The only thing which remains, and this is already somehow mis-implemented in the current datatree-accessors, is that the original datatree is overwritten instead of returning a newly created tree.

@syedhamidali
Copy link
Contributor Author

Hi @kmuehlbauer and @aladinor,

Thanks for your suggestions and for pointing out the node issue, I hadn’t caught that earlier. I’ve made the necessary adjustments and would appreciate your feedback.

@kmuehlbauer
Copy link
Collaborator

@syedhamidali Not to be mistaken or misunderstood, It's good to see such excellent contributions coming in.

What I mean't and with the possibilities xarray gives us, we might get some really neat functionality into xradar without doing that much and keeping our maintenance burden low.

How about something like this (works with latest xarray-main and xradar-xarray-nightly branch :

import xradar as xd
from open_radar_data import DATASETS

# Fetch the sample radar file
filename = DATASETS.fetch("sample_sgp_data.nc")

# Open the radar file into a DataTree object
dtree = xd.io.open_cfradial1_datatree(filename)


# define the xradar decorator to just work on sweep nodes 
# and let other nodes out unchanged
# thats quick'ndirty, so the code might not be up to standards
from functools import wraps

def map_over_sweeps(func):

    @wraps(func)
    def wrapper_func(*args, **kwargs):
        if "range" in args[0].dims:
            return func(*args, **kwargs)
        else:
            return args[0]
    return wrapper_func


# Define a function to calculate rain rate from reflectivity
# apply the decorators

@xr.map_over_datasets
@map_over_sweeps
def calculate_rain_rate(ds, ref_field='DBZH'):
    def _rain_rate(dbz, a=200.0, b=1.6):
        Z = 10.0 ** (dbz / 10.0)
        return (Z / a) ** (1.0 / b)

    ds = ds.assign(RAIN_RATE=_rain_rate(ds[ref_field]))
    ds.RAIN_RATE.attrs = {'units': 'mm/h', 'long_name': 'Rain Rate'}
    return ds

# Apply the function across all sweeps
dtree = dtree.pipe(calculate_rain_rate, ref_field='corrected_reflectivity_horizontal')

# Inspect the first sweep to see the added RAIN_RATE field
print(dtree['sweep_0'].data_vars)

So finally the only function we have to maintain is the map_over_sweeps-decorator. I think it should be possible to even hide the xr.map_over_datasets-decorator, doing the coding of the wrapper functions the right way.

@syedhamidali
Copy link
Contributor Author

@syedhamidali I think this looks good, but some thoughts of concern:

  • dtree.xradar.apply(func) - this is really neat, as it works only on sweep nodes
  • ds.xradar.apply(func) - isn't this more or less the same as ds.pipe(func)? I do not really see any additional benefit here
  • da.xradar.apply(func) - same as above

I do not think we should add another function which does actually the same as .pipe.

Also I'm not sure if .apply is the best naming in this context. Over in xarray map_over_subtree was changed in map_over_datasets for ruling out any misunderstandings. Maybe map_over_datasets can be parameterized like name="sweep" where "sweep" is a string which the node-name should contain. Then we could just let xarray do the hard work.

Pinging also @aladinor and @mgrover1 for input.

Thanks, @kmuehlbauer! You’re right—ds.pipe(func) and da.pipe(func) already cover what .apply does, so no need to add redundancy.

@syedhamidali
Copy link
Contributor Author

syedhamidali commented Oct 16, 2024

Thanks, @aladinor! I agree that parameterizing map_over_datasets for sweeps is a cleaner approach. If we decide to create a wrapper for xradar, handling non-sweep nodes carefully will be key. Appreciate your thoughts!

I gave it a try, something like this, but there are a lot of issues with that, maybe you can take a look

def apply(self, func, *args, **kwargs):
    radar = self.xarray_obj

    # Apply the function only to the sweep nodes
    dt_sweeps = radar.match("sweep_*")
    modified_sweeps = dt_sweeps.map_over_datasets(func, *args, **kwargs)

    # Rebuild the DataTree, including non-sweep nodes
    new_tree = {"/": radar.ds}
    new_tree.update({node.path: node.ds for node in radar.subtree if node.path != "/"})
    new_tree.update({node.path: modified_sweeps[node.path].to_dataset() for node in modified_sweeps.subtree})

    return xr.DataTree.from_dict(new_tree)
AttributeError: Mutation of the DatasetView is not allowed, please use `.__setitem__` on the wrapping DataTree node, or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_datasets`,use `.copy()` first to get a mutable version of the input dataset.
Raised whilst mapping function over node with path /sweep_0

Now, tried to fix this error

def apply(self, func, *args, **kwargs):
    """    """

    radar = self.xarray_obj  # Get the DataTree object

    # Match only the sweep nodes (e.g., sweep_0, sweep_1, etc.)
    dt_sweeps = radar.match("sweep_*")

    # Apply the function to all sweep datasets, ensuring each dataset is copied before modification
    def apply_func(ds, *args, **kwargs):
        return func(ds.copy(), *args, **kwargs)  # Copy the dataset before applying the function

    # Apply the function across the matched sweep nodes using map_over_datasets
    modified_sweeps = dt_sweeps.map_over_datasets(apply_func, *args, **kwargs)

    # Rebuild the DataTree, including non-sweep nodes (like root nodes) if necessary
    tree_dict = {"/": radar.ds}  # Add the root dataset ("/" path)
    tree_dict.update({node.path: node.ds for node in radar.subtree if node.path != "/"})  # Add non-sweep nodes
    tree_dict.update({node.path: modified_sweeps[node.path] for node in modified_sweeps.subtree})

    # Return the updated DataTree
    return xr.DataTree.from_dict(tree_dict)
KeyError: 'Already a node object at path /sweep_0'

@syedhamidali
Copy link
Contributor Author

syedhamidali commented Oct 16, 2024

Thanks again, @kmuehlbauer! I like the idea of using map_over_sweeps to focus on sweep nodes, keeping maintenance light. The decorator approach works well. What’s your take on encouraging users to rely on decorators for applying functions? Seems efficient, but open to your thoughts!

This seems to work fine with the new xarray-datatree, without relying on decorators

def map_over_sweeps(self, func, *args, **kwargs):
    """    """
    radar = self.xarray_obj

    # Create a new tree dictionary
    tree = {"/": radar.ds}  # Start with the root Dataset

    # Add all nodes except the root
    tree.update({node.path: node.ds for node in radar.subtree if node.path != "/"})

    # Apply the function to all sweep nodes and update the tree dictionary
    tree.update(
        {
            node.path: func(radar[node.path].to_dataset(), *args, **kwargs)
            for node in radar.match("sweep*").subtree
            if node.path.startswith("/sweep")
        }
    )

    # Return a new DataTree constructed from the modified tree dictionary
    return xr.DataTree.from_dict(tree)

@kmuehlbauer
Copy link
Collaborator

The decorator approach works well. What’s your take on encouraging users to rely on decorators for applying functions? Seems efficient, but open to your thoughts!

OK, so from my perspective xarray is already encouraging the use of decorators via accessors. With the next xarray version the map_over_datasets-decorator is first introduced.

We could imagine following that path by introducing a map_over_sweeps-decorator. And I agree that such map_over_sweeps-decorator should be backed by a dtree.xradar.map_over_sweeps accessor based functionality as you suggested above. I'd try to align these two features as much as possible. So I'd try to re-use the decorator in the accessor-function. That means, instead of reimplementing the distribution of func to the sweeps and rebuilding the tree by ourselves I'd rely on xarray machinery and just do the selection of the sweeps inside the decorator. Does that make sense? BTW, xarray does the same with the implementation of map_over_datasetsfunction.

@kmuehlbauer
Copy link
Collaborator

@syedhamidali Your above example (#203 (comment)) would then look like:

# define the xradar decorator to just work on sweep nodes 
# and let other nodes out unchanged
# thats quick'ndirty, so the code might not be up to standards
import functools
import xarray as xr

def map_over_sweeps(func):
    
    @functools.wraps(func)
    def _func(*args, **kwargs):
        # we do not have access to any names or other DataTree specifics
        # so we need to rely on the layout of the given dataset
        # maybe we can add something like `is_sweep` to check for that?
        if "range" in args[0].dims:
            return func(*args, **kwargs)
        else:
            return args

    # we need another wrap here for the xarray decorator
    @xr.map_over_datasets
    def _map_over_sweeps(*args, **kwargs):
        return _func(*args, **kwargs)
            
    return _map_over_sweeps

# defined inside XarrayAccessor
def map_over_sweeps(self, func, *args, **kwargs):
    """    """
    # add our decorator for input function
    @map_over_sweeps
    def _func(*args, **kwargs):
        return func(*args, **kwargs)
    
    return self.xarray_obj.pipe(_func, *args, **kwargs)

The user would then have two options, one via accessor and one via decorator. There are surely use cases for both. The good thing is, that both ways are implemented solely by our and the xarray decorator and we only use xarray machinery (beside our little "range"-hack).

import xradar as xd
from open_radar_data import DATASETS

# Fetch the sample radar file
filename = DATASETS.fetch("sample_sgp_data.nc")

# Open the radar file into a DataTree object
dtree = xd.io.open_cfradial1_datatree(filename)

# without decorator
def calculate_rain_rate(ds, ref_field='DBZH'):
    def _rain_rate(dbz, a=200.0, b=1.6):
        Z = 10.0 ** (dbz / 10.0)
        return (Z / a) ** (1.0 / b)

    ds = ds.assign(RAIN_RATE=_rain_rate(ds[ref_field]))
    ds.RAIN_RATE.attrs = {'units': 'mm/h', 'long_name': 'Rain Rate'}
    return ds

# with decorator
@map_over_sweeps
def calculate_rain_rate2(ds, ref_field='DBZH'):
    def _rain_rate(dbz, a=200.0, b=1.6):
        Z = 10.0 ** (dbz / 10.0)
        return (Z / a) ** (1.0 / b)

    ds = ds.assign(RAIN_RATE=_rain_rate(ds[ref_field]))
    ds.RAIN_RATE.attrs = {'units': 'mm/h', 'long_name': 'Rain Rate'}
    return ds

# invocation via accessor
dtree2 = dtree.xradar.map_over_sweeps(calculate_rain_rate, ref_field='corrected_reflectivity_horizontal')
display(dtree2["sweep_0"])

# invocation via decorator + pipe
dtree3 = dtree.pipe(calculate_rain_rate2, ref_field='corrected_reflectivity_horizontal')
display(dtree3["sweep_0"])

@kmuehlbauer kmuehlbauer mentioned this pull request Oct 17, 2024
@syedhamidali
Copy link
Contributor Author

syedhamidali commented Oct 17, 2024

@kmuehlbauer I've one quick question, should I also add the from .accessors import map_over_sweeps # noqa to the xradar/__init__.py so that we can call it @xd.map_over_sweeps?

@kmuehlbauer
Copy link
Collaborator

@syedhamidali

I've one quick question, should I also add the from .accessors import map_over_sweeps # noqa to the xradar/__init__.py so that we can call it @xd.map_over_sweeps?

Yes, It would be great to have the decorator available directly from the module, instead importing from submodule.

@syedhamidali
Copy link
Contributor Author

@kmuehlbauer, okay, I am on it, thanks!

@syedhamidali
Copy link
Contributor Author

@kmuehlbauer It should work when you merge the nightly branch

@kmuehlbauer
Copy link
Collaborator

@syedhamidali Great! This looks really nice, thanks! The nightly run is green, so it works! Let's wait until the new xarray version is out, or we just merge it into the feature branch. Not sure, what's less invasive.

@kmuehlbauer kmuehlbauer added this to the 0.8.0 milestone Oct 28, 2024
@kmuehlbauer
Copy link
Collaborator

@syedhamidali Unfortunately xarray has removed the decorator for map_over_datasets. So this will need some fixing. I'll try to fix it today 😬

@kmuehlbauer kmuehlbauer changed the title ADD: Apply accessor ADD: map_over_sweeps accessor and decorator Oct 28, 2024
@kmuehlbauer
Copy link
Collaborator

@syedhamidali I've fixed the decorator issue, tweaked one test and fixed some docs/notebooks.

Please have another look, if you are good with the PR. Then we should merge.

@kmuehlbauer kmuehlbauer mentioned this pull request Oct 28, 2024
2 tasks
@syedhamidali
Copy link
Contributor Author

Hi @kmuehlbauer, Thank you for handling the decorator issue and the additional tweaks! Everything looks great on my end after reviewing the changes, so I’m good with the PR. I appreciate your support with this and the updates to the docs/notebooks. Let me know if there’s anything else you’d like me to address.
Thanks again!

@kmuehlbauer
Copy link
Collaborator

Great, then let's get this in! Thanks for sticking to this during the datatree conversion process.

xradar/accessors.py Outdated Show resolved Hide resolved
xradar/accessors.py Outdated Show resolved Hide resolved
@kmuehlbauer kmuehlbauer merged commit 1336b60 into openradar:main Oct 28, 2024
12 checks passed
@syedhamidali syedhamidali deleted the sweeps_accessor branch October 30, 2024 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

4 participants