Support DataTree in Xarray's top level functions #9106

Open
8 tasks
TomNicholas opened this issue Jun 12, 2024 · 3 comments
Labels
enhancement topic-DataTree Related to the implementation of a DataTree class


TomNicholas commented Jun 12, 2024

Is your feature request related to a problem?

Sometimes you might want to map one of the xarray top-level functions (especially xr.concat or xr.merge) over DataTree objects.

Whilst this could potentially be done manually, we could also imagine generalizing top-level functions to handle this out of the box.

Describe the solution you'd like

We would like the following to work:

xr.concat([dt1, dt2], dim='time')

It would return a single DataTree, with xr.concat applied to the sets of datasets in corresponding nodes.

Describe alternatives you've considered

We could instead leave xarray's top-level functions unchanged but still ensure that it's relatively easy to achieve this using map_over_subtree, i.e.

concat_datatrees = datatree.map_over_subtree(xr.concat)
dt_concatenated = concat_datatrees([dt1, dt2], dim='time')

This would still require generalizing map_over_subtree to understand iterables of DataTree objects though (see zarr-developers/VirtualiZarr#84 (comment)).

Finally we could just not support this at all, in which case the only way for users to concatenate contents of datatrees node-wise is via something like

ds_concatenated = xr.concat([tree[node].ds for tree in [dt1, dt2]], dim="time")

but called once for every node path (node) in the trees.

Additional context

See zarr-developers/VirtualiZarr#84 (comment) for an example of wanting to do this in VirtualiZarr (cc @jonas-spaeth).

This was actually already something we partly discussed in the datatree design meeting (#8747), but I forgot what the conclusion was (do you remember @keewis @flamingbear @owenlittlejohns?).

Checklist

@TomNicholas
Member Author

We also have to consider skipping dimensions that are only defined on some levels - see #9778 (comment)

@shoyer shoyer changed the title Automatically map top-level functions over DataTree objects? Map top-level functions over DataTree objects Nov 14, 2024
@shoyer shoyer changed the title Map top-level functions over DataTree objects Support DataTree in Xarray's top level functions Nov 14, 2024

shoyer commented Nov 14, 2024

apply_ufunc would also be a very high-leverage function to which to add DataTree support.


shoyer commented Nov 14, 2024

Let's make small issues for all of these to make these more discoverable?
