Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: split xarray into multi-ouput #113

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

kmuehlbauer
Copy link
Contributor

@kmuehlbauer kmuehlbauer commented Aug 1, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • It looks like the 'xarray-core' output doesn't have any tests.

@kmuehlbauer
Copy link
Contributor Author

@conda-forge-admin please rerender

Copy link
Contributor

github-actions bot commented Aug 1, 2024

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the latest webservices GitHub actions workflow run for errors. You can also ping conda-forge/core for further assistance or you can try rerendeing locally.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/xarray-feedstock/actions/runs/10194306297.

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

recipe/meta.yaml Outdated Show resolved Hide resolved
@kmuehlbauer
Copy link
Contributor Author

There is an unresolved issue conda/conda-build#5424 where {{ Python }} isn't working.

There seems to be a workaround here conda-forge/ariadne-codegen-feedstock#3. I'll adapt to this solution for now.

@kmuehlbauer
Copy link
Contributor Author

@conda-forge-admin please rerender

@kmuehlbauer
Copy link
Contributor Author

If someone knows how to handle the below warnings, please comment or push the needed changes. Otherwise this should create two packages xarray-core and xarray as planned. Thanks!

 ********************************************************************************
          ############################
          # Package would be ignored #
          ############################
          Python recognizes 'xarray.backends' as an importable package[^1],
          but it is absent from setuptools' `packages` configuration.

          This leads to an ambiguous overall configuration. If you want to distribute this
          package, please make sure that 'xarray.backends' is explicitly added
          to the `packages` configuration field.

          Alternatively, you can also rely on setuptools' discovery methods
          (for example by using `find_namespace_packages(...)`/`find_namespace:`
          instead of `find_packages(...)`/`find:`).

          You can read more about "package discovery" on setuptools documentation page:

          - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html

          If you don't want 'xarray.backends' to be distributed and are
          already explicitly excluding 'xarray.backends' via
          `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
          you can try to use `exclude_package_data`, or `include-package-data=False` in
          combination with a more fine grained `package-data` configuration.

          You can read more about "package data files" on setuptools documentation page:

          - https://setuptools.pypa.io/en/latest/userguide/datafiles.html


          [^1]: For Python, any directory (with suitable naming) can be imported,
                even if it does not contain any `.py` files.
                On the other hand, currently there is no concept of package data
                directory, all directories are treated like packages.
          ********************************************************************************


@keewis
Copy link
Contributor

keewis commented Aug 1, 2024

that's something that needs to be fixed in the main repository, you also get that by running python -m build to create the build artifacts (sdist and wheel).


package:
name: {{ name|lower }}
name: xarray-suite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this mean that we'd get a separate page here: https://anaconda.org/conda-forge/xarray If so, it might be worth figuring out if we can keep this as xarray

commands:
- pip check
outputs:
- name: xarray-core
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A while back we went from a fully loaded xarray to a "core" but the upstream devs decided for a breaking change and keep the xarray name instead of creating xarray-core. In a way, this is another breaking change. While the patter here is common in conda-forge*, I would go for xarray to be the same and avoid another breaking change and add a xarray-extras (or better yet, use the optional names upstream.)

* The main reason for that is b/c we, conda-forge, tried to ship packages with as much extras we could but we did not had the multiple outputs at the time. When multiple outputs landed, we used the -core pattern to avoid breaking changes. Xarray was an exception at the request of the upstream devs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see pydata/xarray#9149 for the reasoning of the change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that current xarray is just the minimal

    - python >=3.9
    - numpy >=1.23
    - pandas >=2.0
    - packaging >=23.1

while the rest of the packages is just introduced via run_constrained.

I see. So the impact for current users is, that they would get the new xarray-core plus the new

        # accel
        - flox >=0.7
        - opt_einsum
        - numbagg
        # io & cloud
        - fsspec
        - netcdf4 >=1.6.0
        - zarr >=2.14
        # plotting
        - matplotlib-base >=3.7
        # misc
        - pooch

Not sure what to do now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish they were at this position a few years back, when I did not want to make a breaking change. So, here we go, another breaking change! Go for it. At this point I lost the will to try to communicate what the implications of these changes are to the users and the maintainers here and upstream need to figure this out by themselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to do now.

@kmuehlbauer, my suggestion is to avoid another breaking change, do the multiple outputs but instead of creating xarray-core, create xarray-optional-deps-names-here for the optional deps that are in the pyproject.toml. That would make xarray experience much closer to PyPI's.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcherian, @shoyer, @max-sixty, do you have any thoughts here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm concerned I would be good with

LGTM. I would not create the name -minimal to avoid confusion but as long as the xarray one doesn't change and the others reflect what is in the pyproject.toml, I'm OK. That not only avoid another breaking change but also mimics the original intention in the PyPI package.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the extent that some burden falls on conda, I can see that our choices are annoying. It sounds like this isn't the first our of changes, which should bring us pause.

That said, even taking into account that pause, I'm fairly confident that the default experience when using xarray should be a package with enough dependencies for it to operate well, and that means including the "highly recommended dependencies".

I don't see how we can realistically make something like xarray-plus the default experience, so I don't see any other alternative than making xarray have the "highly recommended dependencies". Do others?

(even if I'm correct, it doesn't mean that we did this well, and so apologies to @ocefpaf...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ignore what it was but focus on what it can be. My only question is: why not the same as the package published on PyPI to ensure conda-forge and pip users will have a similar experience?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much thought this would be the same change for PyPI and conda.

(Unless someone thought differently? Strong agree with @ocefpaf that they should be the same...)

@kmuehlbauer
Copy link
Contributor Author

kmuehlbauer commented Aug 1, 2024

Proposal

  • xarray: plain xarray as of now (as is in main of this feedstock repo, keep current status)
  • xarray-min: xarray with minimal dependencies to get users working (as laid out by @dcherian)

From here naming as in pyproject.toml

  • xarray-accel: acceleration dependencies
  • xarray-io: reading/writing capabilities
  • xarray-parallel: parallel processing capapbilities
  • xarray-viz: full vizualization capabilities
  • xarray-complete: all of the above

@conda-forge/xarray Any other thoughts and comments?

@kmuehlbauer kmuehlbauer changed the title ENH: split into xarray-core and xarray ENH: split xarray into multi-ouput Aug 1, 2024
@dcherian
Copy link
Contributor

dcherian commented Aug 1, 2024

Thanks for your help here @ocefpaf. Our thinking has moved quite a bit in response to the many support issues we've been seeing.


FWIW my original proposal was to have a 3-6 month deprecation cycle where both xarray and xarray-core meant the same thing to give enough time for the users that care about minimal dependencies to move away.

@max-sixty

This comment was marked as resolved.

@kmuehlbauer
Copy link
Contributor Author

@max-sixty I've added some comments above, hope it's clearer now.

It's a proposal to prevent that breaking change.

@max-sixty

This comment was marked as resolved.

@kmuehlbauer
Copy link
Contributor Author

But then what is "with minimal dependencies to get users working" — is that with the "highly-recommended-dependencies"?

Yep, that was the idea. And I totally understand that this might be more confusing for users than a breaking change.

I'll put this PR on hold for now.

@kmuehlbauer kmuehlbauer marked this pull request as draft August 1, 2024 19:59
@kmuehlbauer
Copy link
Contributor Author

I made some GitHub searches to get numbers for the impact of breaking changes.

  • org:conda-forge xarray language:YAML
    • 808 feedstocks listing xarray in meta.yaml
  • xarray language:YAML
    • 16.4k files where xarray is used in some CI requirement file

@dcherian
Copy link
Contributor

dcherian commented Aug 2, 2024

Github search is a great idea!

16.4k files where xarray is used in some CI requirement file

9k have xarray and matplotlib
8k have netCDF4 and Xarray.
4.3k have xarray and cftime
4.3k have Xarray and fsspex
2.2k have Xarray and pooch
2.3k have netCDF4 and Zarr and Xarray

In summary I think at least 50% of impacted users won't care.

@ocefpaf
Copy link
Member

ocefpaf commented Aug 2, 2024

In summary I think at least 50% of impacted users won't care.

Please note that what is in those CIs are not all of conda-forge users! Actually, if that is in a, CI I'm pretty sure it is a user who can easily take a breaking change (although they are the ones that will complain the loudest). I'm more concern about the actual users who will see their research envs break and no clue where/how to even ask for help.

@dcherian
Copy link
Contributor

dcherian commented Aug 2, 2024

the actual users who will see their research envs break

I have the counter opinion. Research envs tend to be a bit kitchen-sink in my experience, so they might not care. In fact, they probably just have most of this stuff in there already.

I'm not sure what you mean by "break". Are you worried about solving challenges.

IMO the people who will be most annoyed are libraries with CI whose env creation time might go up some. OR they're trying to be careful about testing in a minimal-deps configuration, and now that's broken.

@ocefpaf
Copy link
Member

ocefpaf commented Aug 2, 2024

I have the counter opinion. Research envs tend to be a bit kitchen-sink in my experience, so they might not care.

I'm happy that is not true. Otherwise I'd be out of my job! I'm also happy you are on the more evolved side of the spectrum, where people do know more about the tools they are using. You are lucky and don't know it ;-p

With that said. I'm removing myself from this thread. I'll just leave re-iterating what I said above:

  1. If we are changing things I'd prefer have a 1-1 match with the same xarray experience from PyPI. @kmuehlbauer proposal in ENH: split xarray into multi-ouput #113 (comment) is exactly that.
  2. I won't oppose this change. It is just frustrating that years ago I was defending it and could not convince the upstream devs. Now, things are reversed :-/

@max-sixty
Copy link

Thank you @ocefpaf . I definitely empathize with the frustrations, and apologize from our end that we haven't been more careful. (I realize the apology isn't that helpful now. Though you can use these words against us if anyone comes back to try and to change it back...)


@dcherian can you confirm that this is your idea too? I had thought so but worth ensuring we're all on the same page.

  1. If we are changing things I'd prefer have a 1-1 match with the same xarray experience from PyPI.

@dcherian
Copy link
Contributor

If we are changing things I'd prefer have a 1-1 match with the same xarray experience from PyPI.

My proposal was to have the default xarray be more full featured so people opt-out by intentionally using xarray-core. I also proposed that we have xarray be synonymous with xarray-core for 6 months, and announce that xarray will change to have many more dependencies after that period on the blog and in the release notes.

@max-sixty
Copy link

My proposal was to have the default xarray be more full featured so people opt-out by intentionally using xarray-core. I also proposed that we have xarray be synonymous with xarray-core for 6 months, and announce that xarray will change to have many more dependencies after that period on the blog and in the release notes.

Would PyPI also have xarray & xarray-core? I was thinking this is how we'd design it.

@dcherian
Copy link
Contributor

Would PyPI also have xarray & xarray-core? I was thinking this is how we'd design it.

Ah I see, I agree.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 25, 2024

Would PyPI also have xarray & xarray-core? I was thinking this is how we'd design it.

Ah I see, I agree.

+1

As long as the conda-forge pkg and the PyPI packages are aligned, I'm ok with this change.

@shoyer
Copy link
Contributor

shoyer commented Sep 25, 2024

Is there a standard or concensus about the best practice for maintaining conda-forge packages with large numbers of optional dependencies?

Originally (many years ago), I think I argued for making xarray minimal, because that made sense to me and was consistent with other platforms like pypi. But it seems that the community has standardized on the different approach (e.g., see matplotlib vs matplotlib-base), and sticking with something different is not helpful for xarray users.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 26, 2024

Is there a standard or concensus about the best practice for maintaining conda-forge packages with large numbers of optional dependencies?

Not really. We were trying to accomodate some historical awkwardness from the fact that, at first, people wanted to have all the optional dependencies, then they wanted lightweight packages, but without breaking the existing package names. That is why you see all these -core or -base out there. The original name was the fully loaded first incarnation of the conda package, and the core/base the bare minimum that showed up later. Nowadays we usually recommend folks, when adding new packages, to go for the minimum, add run constraints for the optional ones. Hopefully upstream devs added nice ImportError messages and the user can just install what is missing when they try to use it. (IMO a better pattern than the obscure and non-standard [extras/optional/whatever name that is 99% of the time not documented and we need to open the pyproject.toml to figure it out] in pip. But I digress...) One can go the extra mile and add the -extras, -plotting, etc, following the PyPI form of their packages. Be aware that adds some extra maintenance burden, use multiple outputs in that case is highly recommended.

Originally (many years ago), I think I argued for making xarray minimal, because that made sense to me and was consistent with other platforms like pypi. But it seems that the community has standardized on the different approach (e.g., see matplotlib vs matplotlib-base), and sticking with something different is not helpful for xarray users.

Indeed xarray went the other way and decided for a breaking change, at the time I was against it. Now, we are going for another breaking change. I would not do this if the reason is "consistency with other packages you see in conda-forge" b/c the ones that are like xarray's current form are "invisible" but they are far more common than the core/base packages out there. So, while there is no consensus, the core/base is only recommended for old packages to avoid breakages.

Again, as long as xarray keeps 1-1 parity with PyPI, do whatever you want here. The only thing I don't want is to explain why people installing from conda-forge are getting a different behavior than PyPI in their envs.

@shoyer
Copy link
Contributor

shoyer commented Sep 26, 2024

Again, as long as xarray keeps 1-1 parity with PyPI, do whatever you want here. The only thing I don't want is to explain why people installing from conda-forge are getting a different behavior than PyPI in their envs.

Well, I definitely don't think we should be changing PyPI to make "xarray" point to a project with many optional dependencies. That would be entirely non-standard -- no other packages in the numerical Python community do that, to my understanding.

I suspect there are some demographic differences between the PyPI and Conda-Forge communities, which may be worth taking into consideration. Users on PyPI are much more likely to be "incidental" users of Xarray that mostly use pure Python packages, where Conda-Forge users are more likely to need niche scientific/geospatial packages (which is why they're using Conda-Forge in the first place). So it may be more helpful and less disruptive to Conda-Forge users if "xarray" bundles in more dependencies, though certainly breaking changes are not ideal.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 26, 2024

As the person answering to why conda diverges from PyPI in every issue, conference and meetup, I disagree. I also don't think that demographics difference exists. More of a bias from the communities we are in than a real conda vs PyPI user base breakdown. I can assure you that is not true for most conda-forge package users.

BTW, we struggle a lot with reproducible envs across different ecosystems. With the recent Anaconda kerfuffle on the ToS and the rise of uv, many are moving away from conda packages and using PyPI ones. While I disagree with the reasons and there is a lot of disinformation on the matter, those people will be hit those differences, that should not exist in the first place.

I am to blame for loading the original conda-forge packages with optional deps. If I could I would go back in time and make the PyPI parity standard from day 1. Anyway, as I said before, it is up to you'all, even if I don't recommend this change. At least I have this thread to point people to 😬

@max-sixty
Copy link

Well, I definitely don't think we should be changing PyPI to make "xarray" point to a project with many optional dependencies. That would be entirely non-standard -- no other packages in the numerical Python community do that, to my understanding.

This was the plan I was advocating for!

I guess this issue might not be the forum for this discussion, but why the difference between PyPI & Conda? Is it just what's standard?

I'm applying the same logic to the PyPI case — folks starting out want an off-the-shelf package that works well, and when we avoid including dependencies that make xarray much better, we give them a worse product. Instead we should be giving them a reasonable balance of functionality and size — even if we've done the work to allow xarray to work in some form without the dependency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants