Add overloads to get_axis_num #8547

Illviljan · 2023-12-12T21:23:41Z

Add overloads to .get_axis_num because you will get the same type out as you put in.

Seen in #8344.

Illviljan · 2023-12-12T21:43:00Z

I'm skeptical get_axis_num should be this public, especially in namedarray where there is no backwards compatibility to think about. Because one of xarray's selling points is to not have to deal with axis-likes. This is quite easy too:

a = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=("x", "y"))
print(a.dims.index("y"))
1
print(a.get_axis_num("y"))
1

But that's a discussion for another PR.

max-sixty · 2023-12-13T00:11:31Z

print(a.dims.index("y"))
1

Good point! I guess the main advantage is the ability to pass tuples to get_axis_num. (Though we could alternatively inherit and override that). I like the idea of using more standard methods...

Though the python error message for .index is particularly bad:

da.dims.index('foo')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 1
----> 1 da.dims.index('foo')

ValueError: tuple.index(x): x not in tuple

Illviljan · 2023-12-13T09:16:00Z

print(a.dims.index("y"))
1
Good point! I guess the main advantage is the ability to pass tuples to get_axis_num. (Though we could alternatively inherit and override that). I like the idea of using more standard methods...

When is it even required for a 3rd party user to use it? We of course have to use a helper like this internally, but when should a user need it?

Though the python error message for .index is particularly bad:

da.dims.index('foo')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 1
----> 1 da.dims.index('foo')

ValueError: tuple.index(x): x not in tuple

I think it's fine? The value you tried to search for in the tuple wasn't there. Makes sense to me.
If we are going to hide it within our own public method I agree that it might be too generic.

dcherian · 2023-12-13T15:41:55Z

but when should a user need it?

Any time you need to get out of Xarray and pass an axis kwarg, but I agree that it's sugar for dims.index.

Illviljan · 2023-12-13T17:27:43Z

but when should a user need it?

Any time you need to get out of Xarray and pass an axis kwarg, but I agree that it's sugar for dims.index.

Is that what we want to encourage?

It feels out of scope to me for xarray to also handle cases outside xarray-land. We have apply_ufunc (although, quite advanced) that can handle a lot of these cases where we haven't implemented a particular function.

I mainly care about namedarray here. I think the method goes against the spirit of the package and don't think it should be one of the first methods to go public.

Here's some funny little examples:

# A typical Hashable: 
a = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=("x", "y"))
a.dims.index("y")
1
a.get_axis_num("y")
1

# Another Hashable:
b = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=(2023, None))
b.dims.index(None)
1
b.get_axis_num(None)
1

# And another Hashable:
c = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=(("x", "y"), ("r",)))
c.dims.index(("r",))
1
c.get_axis_num(("r",)) # ValueError: 'r' not found in array dimensions (('x', 'y'), ('r',))

# And another Hashable:
d = xr.namedarray.core.NamedArray(data=np.array([[1, 2, 3], [4, 5, 6]]), dims=(b"x", b"y"))
d.dims.index(b"y")
1
d.get_axis_num(b"y") # ValueError: 121 not found in array dimensions (b'x', b'y')

max-sixty · 2023-12-13T18:04:48Z

ValueError: tuple.index(x): x not in tuple
I think it's fine? The value you tried to search for in the tuple wasn't there. Makes sense to me.
If we are going to hide it within our own public method I agree that it might be too generic.

I guess we disagree a bit there — the value it gives (x) isn't the input — it neither prints the actual input foo nor the possible values.

Maybe we make private? I generally agree we shouldn't encourage it. I do think allowing tuples + giving a better error message are useful, though.

Would also support inheriting from tuple and then just doing the da.dims.index('foo'). That would also let us do #8332 with DimsTuple(...) - ['foo', 'bar']

TomNicholas · 2023-12-13T22:25:40Z

It feels out of scope to me for xarray to also handle cases outside xarray-land. We have apply_ufunc (although, quite advanced) that can handle a lot of these cases where we haven't implemented a particular function.

80% of the scientists who use xarray regularly don't even know that apply_ufunc exists, let alone are able to use it (even I struggle to use it sometimes). It's extremely common for users to use xarray for a while (especially to start off analysis by opening netCDF-like files) then if they get stuck drop back into numpy to apply some more specific custom computation or manipulation.

Obviously we want users to be able to express everything they want to do in xarray as intuitively as possible, and we shouldn't encourage them to leave xarray, but I don't think we should actively make it harder for them to do so either.

To give a concrete example - many operations in scipy etc. don't quite have the correct API to immediately work with apply_ufunc, and it's often easier for the user to just do:

weird_analysis_func_expecting_numpy_arrays(da.data, axis=da.get_axis_num('x'))

I think this is fine. It's a bit of a special/weird method, but I don't think we should get rid of it.

Here's some funny little examples:

These are interesting, but to me represent issues with either (a) our handling of dimensions as hashables or (b) the current implementation of get_axis_num.

Illviljan · 2023-12-14T21:20:19Z

I think those issues also show that it can be better to just rely on python's own tuple methods instead. Here's how they could have done it instead in less code:

weird_analysis_func_expecting_numpy_arrays(da.data, axis=da.dims.index('x'))

I think this is fine. It's a bit of a special/weird method, but I don't think we should get rid of it.

I'm not really pushing to deprecate and remove. I just don't think it should be added to NamedArray as one of the first available, public methods. da.dims.index works better and is shorter for most axis cases.

TomNicholas · 2023-12-15T09:44:48Z

better to just rely on python's own tuple methods instead. Here's how they could have done it instead in less code:

Except that (as @max-sixty noted above) using .index won't work for applying the function over more than one dimension, i.e.

weird_analysis_func_expecting_numpy_arrays(da.data, axis=da.dims.index(['x', 'y']))

ValueError: tuple.index(x): x not in tuple

We could override .dims to return an object which implements a special version of .index, but at that point you still have a dedicated method for solving this problem, you're just debating the syntax for it.

headtr1ck · 2023-12-15T16:03:17Z

Should the input also be typed as str | Iterable[Hashable] to align with many other methods?

Illviljan · 2023-12-15T20:06:48Z

using .index won't work for applying the function over more than one dimension,

Yup, the (few?) cases you would need multiple axis it would look like:

axes = tuple(da.dims.index(v) for v in ('x', 'y'))
weird_analysis_func_expecting_numpy_arrays(da.data, axis=axes)

Should the input also be typed as str | Iterable[Hashable] to align with many other methods?

The issue is then the dim None or (2023, "b") wont pass mypy.

Things would have been so much more simple if dims was typed as tuple[str, ...] 😆

max-sixty · 2023-12-15T20:42:11Z

Possibly we have strayed too far from the original issue, but one quick comment

The issue is then the dim None or (2023, "b") wont pass mypy.

I think this is OK! For code that wants to be more robust, passing dim=[None] / dim=[(2023, "b")] can be required. At least, that's how I'm holding the move to str | Iterable[Hashable]

Illviljan · 2023-12-15T22:23:58Z

Possibly we have strayed too far from the original issue, but one quick comment

The issue is then the dim None or (2023, "b") wont pass mypy.

I think this is OK! For code that wants to be more robust, passing dim=[None] / dim=[(2023, "b")] can be required. At least, that's how I'm holding the move to str | Iterable[Hashable]

Hmm, I find it inconsistent and a constant source for typing difficulties.
For example, I don't like having to do a workaround for dim=var.dims[0] just to get mypy passing, it seem to be as canonical as you can get.
I'm warming up to something like Hashable | list[Hashable] to be easily able to discern between singular and plural dims,

max-sixty · 2023-12-15T23:34:41Z

I'm warming up to something like Hashable | list[Hashable] to be easily able to discern between singular and plural dims,

IIRC, the issue with this is that tuples don't work — ('x', 'y') will be a single dim — #6142 for discussion

* main: (153 commits) Add overloads to get_axis_num (pydata#8547) Fix CI: temporary pin pytest version to 7.4.* (pydata#8682) Bump the actions group with 1 update (pydata#8678) [namedarray] split `.set_dims()` into `.expand_dims()` and `broadcast_to()` (pydata#8380) Add chunk-friendly code path to `encode_cf_datetime` and `encode_cf_timedelta` (pydata#8575) Fix NetCDF4 C version detection (pydata#8675) groupby: Don't set `method` by default on flox>=0.9 (pydata#8657) Fix automatic broadcasting when wrapping array api class (pydata#8669) Fix unstack method when wrapping array api class (pydata#8668) Fix `variables` arg typo in `Dataset.sortby()` docstring (pydata#8670) dt.weekday_name - removal of function (pydata#8664) Add `dev` dependencies to `pyproject.toml` (pydata#8661) CI: Pin scientific-python/upload-nightly-action to release sha (pydata#8662) Update HOW_TO_RELEASE.md by clarifying where RTD build can be found (pydata#8655) ruff: use extend-exclude (pydata#8649) new whats-new section (pydata#8652) xfail another test on windows (pydata#8648) use first element of residual in _nonpolyfit_1d (pydata#8647) whatsnew for v2024.01.1 implement `isnull` using `full_like` instead of `zeros_like` (pydata#7395) ...

Illviljan added 2 commits December 12, 2023 22:15

Add overload to get_axis_num

6231ae2

fix in namedarray

48caa38

Illviljan added the plan to merge Final call for comments label Dec 13, 2023

andersy005 added 2 commits January 29, 2024 11:28

Merge branch 'main' into typing_get_axis_num

64ea6cb

Merge branch 'main' into typing_get_axis_num

b070782

andersy005 enabled auto-merge (squash) January 29, 2024 23:23

andersy005 disabled auto-merge January 30, 2024 00:08

andersy005 merged commit 81f38f3 into pydata:main Jan 30, 2024
28 of 29 checks passed

andersy005 mentioned this pull request Jan 30, 2024

Simplify get_axis_num #8330

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add overloads to get_axis_num #8547

Add overloads to get_axis_num #8547

Illviljan commented Dec 12, 2023

Illviljan commented Dec 12, 2023 •

edited

Loading

max-sixty commented Dec 13, 2023

Illviljan commented Dec 13, 2023

dcherian commented Dec 13, 2023

Illviljan commented Dec 13, 2023

max-sixty commented Dec 13, 2023

TomNicholas commented Dec 13, 2023

Illviljan commented Dec 14, 2023 •

edited

Loading

TomNicholas commented Dec 15, 2023 •

edited

Loading

headtr1ck commented Dec 15, 2023 •

edited

Loading

Illviljan commented Dec 15, 2023 •

edited

Loading

max-sixty commented Dec 15, 2023

Illviljan commented Dec 15, 2023 •

edited

Loading

max-sixty commented Dec 15, 2023

Add overloads to get_axis_num #8547

Add overloads to get_axis_num #8547

Conversation

Illviljan commented Dec 12, 2023

Illviljan commented Dec 12, 2023 • edited Loading

max-sixty commented Dec 13, 2023

Illviljan commented Dec 13, 2023

dcherian commented Dec 13, 2023

Illviljan commented Dec 13, 2023

max-sixty commented Dec 13, 2023

TomNicholas commented Dec 13, 2023

Illviljan commented Dec 14, 2023 • edited Loading

TomNicholas commented Dec 15, 2023 • edited Loading

headtr1ck commented Dec 15, 2023 • edited Loading

Illviljan commented Dec 15, 2023 • edited Loading

max-sixty commented Dec 15, 2023

Illviljan commented Dec 15, 2023 • edited Loading

max-sixty commented Dec 15, 2023

Illviljan commented Dec 12, 2023 •

edited

Loading

Illviljan commented Dec 14, 2023 •

edited

Loading

TomNicholas commented Dec 15, 2023 •

edited

Loading

headtr1ck commented Dec 15, 2023 •

edited

Loading

Illviljan commented Dec 15, 2023 •

edited

Loading

Illviljan commented Dec 15, 2023 •

edited

Loading