Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Array API in NamedArray #8344

Draft
wants to merge 427 commits into
base: main
Choose a base branch
from

Conversation

Illviljan
Copy link
Contributor

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@github-actions github-actions bot added the topic-NamedArray Lightweight version of Variable label Oct 19, 2023
@Illviljan Illviljan changed the title Add mean to array_api Add mean to NamedArray._array_api Oct 19, 2023
@@ -144,3 +146,79 @@ def real(
xp = _get_data_namespace(x)
out = x._new(data=xp.real(x._data))
return out


# %% Statistical Functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Illviljan, i'm curious to hear your thoughts on adopting a similar structure/organization as that used in Numpy so as to it easier to incorporate new array API compliant functionality in the future. the new structure would group functionality under /namedarray/_array_api/*.py instead of of putting everything in _array_api.py.

e.g.

├── _creation_functions.py
├── _data_type_functions.py
├── _dtypes.py
├── _elementwise_functions.py
├── _indexing_functions.py
├── _manipulation_functions.py
├── _searching_functions.py
├── _set_functions.py
├── _sorting_functions.py
├── _statistical_functions.py
├── linalg.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I just thought it was too few functions to bother with it right now.

Cubed has similar structure too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect! i intend to add a few more functions to the array_api and i'm happy to put this structure in place

@dcherian
Copy link
Contributor

dcherian commented Dec 7, 2023

@Illviljan can you comment on why this approach is useful if we can't advertise array_api compliance based on upstream input?

@Illviljan
Copy link
Contributor Author

I notice sklearn for example is slowly getting array api support and I think it would be a great win to get these kind of packages to just work out of the box.
I think it's lazy of us not to give it a serious attempt considering how much our maintainers have pushed for array standardization in the past.

There's two distinct programming styles between array api vs. xarray:

  • function-focused, xp.mean(x) - array api, requires axis but it is fine to add the optional kwarg dim and encourage that one over axis in the docs.
  • method-focused, x.mean() - xarray/namedarray, we can do anything so I think we should enforce using dim instead of axis.

Due to this, I'm not yet convinced it is impossible for us to not support both. Most stuff xarray do is simply an extension of the narrow array api. Internally we have to deal with array apis and the axis parameter anyway so we will have most tools available to easily make these functions.

namedarray imports duck_array_ops.py, don't you think it's very similar to _array_api.py?

from xarray.core import duck_array_ops

@Illviljan
Copy link
Contributor Author

Illviljan commented Dec 7, 2023

This particular PR has simply grown because of the lack of typing in .reduce.
It was supposed to simply just add a mean function in _array_api.py but then you go down the typing rabbit hole...

@dcherian
Copy link
Contributor

dcherian commented Dec 8, 2023

I'm not yet convinced it is impossible for us to not support both.

Sure, but the way to proceed would be to build consensus around this in an issue discussion or at the bi-weekly meeting.

requires axis but it is fine to add the optional kwarg dim and encourage that one over axis in the docs.

For example, this assertion has had very little support within the core team in discussions.

@Illviljan
Copy link
Contributor Author

I'm not yet convinced it is impossible for us to not support both.

Sure, but the way to proceed would be to build consensus around this in an issue discussion or at the bi-weekly meeting.

It's hard to change peoples mind sometimes, so I like to have some working examples for discussion.
I don't feel like these array api focused PRs improving on a non-public module is disturbing anyone.

requires axis but it is fine to add the optional kwarg dim and encourage that one over axis in the docs.

For example, this assertion has had very little support within the core team in discussions.

Still, there is duck_array_ops.py that does basically the same thing.

It is the only way to be compliant to both ways and xarray in general seem to enjoy the method-chaining style more. So adding array_api compatible equivalents wont change current workflows much but will however allow considerable more workflows automatically. Being able to pass around attrs and dims might be enough for users.

@dcherian
Copy link
Contributor

It's hard to change peoples mind sometimes, so I like to have some working examples for discussion.

Don't you have enough already?

Still, there is duck_array_ops.py that does basically the same thing.

These ops are applied to the wrapped array not to an xarray object.

@Illviljan
Copy link
Contributor Author

Still, there is duck_array_ops.py that does basically the same thing.

These ops are applied to the wrapped array not to an xarray object.

Yes, but it doesn't look insurmountable to me for namedarray to replace duck_array_ops.py with the _array_api + some extra functions to get the same result and the widened possibilities.

Thank you for the input @dcherian. I won't merge this without approval.
I might continue testing stuff on this branch until I come to the same conclusion as you.

@dcherian
Copy link
Contributor

generally expanding duck_array_ops to use the array_api more widely would be a great improvement :) still not sure about handling namedarray at that level though

@Illviljan Illviljan mentioned this pull request Jan 17, 2024
1 task
xarray/core/variable.py Outdated Show resolved Hide resolved
Illviljan and others added 30 commits September 25, 2024 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-NamedArray Lightweight version of Variable
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

3 participants