Grouper, Resampler as public api #8840

Merged
merged 32 commits into from
Jul 18, 2024

Contributor

@dcherian dcherian commented Mar 15, 2024

  1. Expose UniqueGrouper, BinGrouper and TimeResampler as public API under xarray.groupers
  2. Allow providing these objects to .resample and .groupby
  3. This is a step toward grouping by multiple variables, grouping by a dask array, and unblocking "chunking to a frequency" (Support specifying chunk sizes using labels (e.g. frequency string) #7559)

We do not yet officially support custom Grouper or Resampler objects. I'd like to clear up the API a bit before exposing that.
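As a rough illustration of points 1 and 2 above, here is a minimal sketch of passing the new objects to `.groupby` and `.resample`. It assumes an xarray release that includes this PR; the array, coordinate names (`label`, `x`), bin edges, and frequency are made up for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.groupers import BinGrouper, TimeResampler, UniqueGrouper

# Toy data (illustrative only): six daily values with two extra coordinates.
da = xr.DataArray(
    np.arange(6.0),
    dims="time",
    coords={
        "time": pd.date_range("2024-01-01", periods=6, freq="D"),
        "label": ("time", ["a", "b", "a", "b", "a", "b"]),
        "x": ("time", [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]),
    },
    name="values",
)

# Group by the unique values of the "label" coordinate.
by_label = da.groupby(label=UniqueGrouper()).mean()

# Group by bins of the "x" coordinate: (0, 3] and (3, 6].
by_bin = da.groupby(x=BinGrouper(bins=[0, 3, 6])).mean()

# Resample along "time" into two-day windows.
every_2d = da.resample(time=TimeResampler(freq="2D")).sum()
```

Each keyword names the variable to group over and the object describes how to group it, so the existing calls such as `da.groupby("label")` and `da.resample(time="2D")` keep working alongside the new form.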


@dcherian dcherian marked this pull request as ready for review June 21, 2024 22:35
@dcherian
Contributor Author

This is finally ready for review! Please take a look at the API and docs.

* main:
  promote floating-point numeric datetimes to 64-bit before decoding (pydata#9182)
  also pin `numpy` in the all-but-dask CI (pydata#9184)
  temporarily remove `pydap` from CI (pydata#9183)
  temporarily pin `numpy<2` (pydata#9181)
  Change np.core.defchararray to np.char (pydata#9165) (pydata#9166)
  Fix example code formatting for CachingFileManager (pydata#9178)
  Slightly improve DataTree repr (pydata#9064)
  switch to unit `"D"` (pydata#9170)
  Docs: Add page with figure for navigating help resources (pydata#9147)
  Add test for pydata#9155 (pydata#9161)
  Remove mypy exclusions for a couple more libraries (pydata#9160)
  Include numbagg in type checks (pydata#9159)
  Improve zarr chunks docs (pydata#9140)
@dcherian dcherian closed this Jun 30, 2024
@dcherian dcherian reopened this Jun 30, 2024
@dcherian
Contributor Author

dcherian commented Jul 11, 2024

Thanks @keewis and @Illviljan

#9233 completes the deprecation cycles for base and loffset. I'll merge it after this one to avoid dealing with a bunch of conflicts.

The mypy failures appear unrelated now.

@Illviljan
Contributor

These look related, same problem as before I guess:

xarray/core/dataset.py: note: In member "groupby" of class "Dataset":
xarray/core/dataset.py:10321: error: Incompatible types in assignment (expression has type "Mapping[Hashable, Any]", variable has type "dict[str, Grouper]")  [assignment]
xarray/core/dataarray.py: note: In member "groupby" of class "DataArray":
xarray/core/dataarray.py:6778: error: Incompatible types in assignment (expression has type "Mapping[Hashable, Any]", variable has type "dict[str, Grouper]")  [assignment]
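For context, a minimal stdlib-only sketch of how this kind of `[assignment]` error can arise and one way to avoid it. `Grouper`, `merge_groupers`, and `groupby` below are hypothetical stand-ins written for illustration, not the actual xarray code.

```python
from __future__ import annotations

from collections.abc import Hashable, Mapping
from typing import Any, cast


class Grouper:
    """Hypothetical stand-in for a grouper base class."""


def merge_groupers(
    positional: Mapping[Hashable, Any] | None, keyword: Mapping[str, Any]
) -> Mapping[Hashable, Any]:
    """Hypothetical helper: return whichever of the two mappings was supplied."""
    if positional is not None and keyword:
        raise ValueError("pass groupers positionally or as keywords, not both")
    if positional is not None:
        return positional
    # Mapping is invariant in its key type, so a cast is needed here.
    return cast("Mapping[Hashable, Any]", keyword)


def groupby(
    group: Mapping[Hashable, Any] | None = None, **groupers: Grouper
) -> dict[str, Grouper]:
    # mypy rejects the direct assignment below, because the helper returns
    # Mapping[Hashable, Any] rather than dict[str, Grouper]:
    #     rgroupers: dict[str, Grouper] = merge_groupers(group, groupers)
    # One alternative to "# type: ignore" is to rebuild the dict explicitly.
    # Note that str(name) converts any non-string keys to strings.
    merged = merge_groupers(group, groupers)
    return {str(name): grouper for name, grouper in merged.items()}
```

Rebuilding the dict costs a copy but keeps the annotation honest; widening the variable's annotation to `Mapping[Hashable, Any]` or adding `# type: ignore[assignment]` are the other usual options.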

* main:
  exclude the bots from the release notes (pydata#9235)
  switch the documentation to run with `numpy>=2` (pydata#9177)
  `numpy` 2 compatibility in the iris code paths (pydata#9156)
  `numpy` 2 compatibility in the `netcdf4` and `h5netcdf` backends (pydata#9136)
  Fix time indexing regression in `convert_calendar` (pydata#9192)
  Use duckarray assertions in test_coding_times (pydata#9226)
  Use reshape and ravel from duck_array_ops in coding/times.py (pydata#9225)
  Cleanup test_coding_times.py (pydata#9223)
  Only use necessary dims when creating temporary dataarray (pydata#9206)
  Fix two bugs in DataTree.update() (pydata#9214)
  Use numpy 2.0-compat `np.complex64` dtype in test (pydata#9217)
@dcherian
Contributor Author

I ended up using `type: ignore` after struggling for a while...

@dcherian dcherian added plan to merge Final call for comments and removed needs review labels Jul 17, 2024
* main:
  Enable pandas type checking (pydata#9213)
  Per-variable specification of boolean parameters in open_dataset (pydata#9218)
  test push
  Added a space to the documentation (pydata#9247)
  Fix typing for test_plot.py (pydata#9234)
  Allow mypy to run in vscode (pydata#9239)
  Revert "Test main push"
  Test main push
  Revert "Update _typing.py"
  Update _typing.py
  Add a `.drop_attrs` method (pydata#8258)
@dcherian
Contributor Author

I'll merge tomorrow if there are no comments.

One thing to think about is whether we want UniqueGrouper or CategoricalGrouper or CategoryGrouper.

Collaborator

@headtr1ck headtr1ck left a comment


Looks good, only a couple of minor details.
Feel free to ignore these as well :P

@dcherian dcherian enabled auto-merge (squash) July 18, 2024 03:18
@dcherian dcherian merged commit b55c783 into pydata:main Jul 18, 2024
28 checks passed
dcherian added a commit that referenced this pull request Jul 22, 2024
* main:
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push
  Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
dcherian added a commit that referenced this pull request Jul 24, 2024
* main: (54 commits)
  Adding `open_datatree` backend-specific keyword arguments (#9199)
  [pre-commit.ci] pre-commit autoupdate (#9202)
  Restore ability to specify _FillValue as Python native integers (#9258)
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push
  Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
  Allow mypy to run in vscode (#9239)
  Revert "Test main push"
  ...
dcherian added a commit to JoelJaeschke/xarray that referenced this pull request Jul 25, 2024
…monotonic-variable

* main: (995 commits)
  Adding `open_datatree` backend-specific keyword arguments (pydata#9199)
  [pre-commit.ci] pre-commit autoupdate (pydata#9202)
  Restore ability to specify _FillValue as Python native integers (pydata#9258)
  add backend intro and how-to diagram (pydata#9175)
  Fix copybutton for multi line examples in double digit ipython cells (pydata#9264)
  Update signature for _arrayfunction.__array__ (pydata#9237)
  Add encode_cf_datetime benchmark (pydata#9262)
  groupby, resample: Deprecate some positional args (pydata#9236)
  Delete ``base`` and ``loffset`` parameters to resample (pydata#9233)
  Update dropna docstring (pydata#9257)
  Grouper, Resampler as public api (pydata#8840)
  Fix mypy on main (pydata#9252)
  fix fallback isdtype method (pydata#9250)
  Enable pandas type checking (pydata#9213)
  Per-variable specification of boolean parameters in open_dataset (pydata#9218)
  test push
  Added a space to the documentation (pydata#9247)
  Fix typing for test_plot.py (pydata#9234)
  Allow mypy to run in vscode (pydata#9239)
  Revert "Test main push"
  ...
@dcherian dcherian deleted the grouper-public-api branch November 17, 2024 16:15
Labels
plan to merge Final call for comments topic-groupby
Development

Successfully merging this pull request may close these issues.

Update GroupBy constructor for grouping by multiple variables, dask arrays
5 participants