Request: add option to return strict API #180

Closed
stefanv opened this issue Aug 29, 2024 · 12 comments

@stefanv

stefanv commented Aug 29, 2024

When writing code that should be widely compatible with multiple Array implementations, it would be helpful if array-api-compat could return a strict Array API namespace that does not blend the Array API with the target type's namespace.

E.g., in:

import numpy as np
import array_api_compat
import dask.array as da

d = da.from_array(np.arange(256), chunks=(12,))
nx = array_api_compat.get_namespace(d)

The nx object contains non-Array-API-compatible members, such as flatnonzero. This caused some confusion when examining dask/dask#11298.

While developing, I'd like to get a namespace that only consists of Array API functions. I presume the additional blending is done to produce a minimally modified version of a library namespace that, like NumPy, supports the Array API, but also does a bunch of other things. However, I'd argue that this is unhelpful in the context of developing for the Array API. In our example above, I'd be tempted to use flatnonzero when, in fact, that function is not in the array API. Worse, in the case above, that would not work (the only guaranteed Array API compatible dask implementations come from array-api-compat).

Since the blending of namespaces has the potential for confusion, and reduces the utility of the namespace in coding for Array API compatibility, I am curious why it is the default. However, since it is the default, I'd like to request a new feature, along the lines of:

nx = array_api_compat.get_namespace(d, strict=True)

@lucascolley mentioned to me the array-api-strict package for use in test suites, but while helpful, it is somewhat orthogonal to my request here.

I am only now really starting to dig into Array API compatibility, thinking about scikit-image in particular, so apologies if I missed some obvious existing mechanisms of doing what I want.
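
To illustrate the kind of filtering the strict=True request above describes, here is a minimal sketch, not array-api-compat's implementation. The helper strict_view, the STANDARD_NAMES subset, and the stand-in blended namespace are all hypothetical; a real version would need the full list of names from the standard.

```python
import types

# Illustrative subset only; the real standard defines many more names.
STANDARD_NAMES = frozenset({"asarray", "sum", "nonzero", "reshape"})


def strict_view(namespace, allowed=STANDARD_NAMES):
    """Return a new namespace exposing only the allow-listed names."""
    kept = {
        name: getattr(namespace, name)
        for name in dir(namespace)
        if name in allowed
    }
    return types.SimpleNamespace(**kept)


# Stand-in for a blended namespace: standard functions plus a library extra.
blended = types.SimpleNamespace(
    asarray=list,
    sum=sum,
    nonzero=lambda x: [i for i, v in enumerate(x) if v],
    flatnonzero=lambda x: [i for i, v in enumerate(x) if v],  # not in the standard
)

nx = strict_view(blended)
print(sorted(vars(nx)))            # ['asarray', 'nonzero', 'sum']
print(hasattr(nx, "flatnonzero"))  # False
```

A filter like this only trims the names on the namespace; it does not change what the remaining functions accept or return.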

@lucascolley
Contributor

@lucascolley mentioned to me the array-api-strict package for use in test suites, but while helpful, it is somewhat orthogonal to my request here.

I think it's useful to spell out how this is orthogonal:

  • array-api-strict is a separate package, so something else you need to install. If we had a strict flag like this, just an array library and array-api-compat would be needed.

Are there other ways in which it is orthogonal?

At least for me, installing array-api-strict isn't a blocker for my workflows. I can see that it might be nice to be able to do this with just array-api-compat, though.

@stefanv
Author

stefanv commented Aug 29, 2024

I think what I'd like is essentially array-api-strict, but with the option of having it derive not only from NumPy, but also from dask, or cupy, or any of the other Array types supported by array-api-compat.

@lucascolley
Contributor

but with the option of having it derive not only from NumPy, but also from dask, or cupy, or any of the other Array types

Could you explain why deriving from other array types is useful? They should all be equivalent if they all implement the standard, right? Maybe you are thinking about catching bugs in array-api-compat / the array libraries?

@stefanv
Author

stefanv commented Aug 29, 2024

Yes, exactly; finding bugs, perhaps testing my own array API library, etc. Could be useful in a test suite too.

@ntessore

I am also interested in this. For what it's worth, I have mostly tried to solve the problem with static typing, along the lines of data-apis/array-api#589 by @nstarman. It seems promising, but also a lot of duplication of signatures, etc.

@lithomas1
Contributor

Non-Array-API names are bleeding in because we do a star import from dask.array (and likewise in the other wrapped modules).

I forget why this is, but I think it's so you can use a module wrapped by array-api-compat as you normally would, without the non-Array-API functions/attributes going missing.

cc @asmeurer

@rgommers
Member

rgommers commented Sep 2, 2024

Yes, exactly; finding bugs, perhaps testing my own array API library, etc. Could be useful in a test suite too.

I don't think the explanation about why this is useful is concrete enough. We have several packages with distinct purposes. Here you're talking about two other packages:

  1. for "testing my own array API library", you want the array-api-tests test suite
  2. for testing code of libraries that are adding support for array API libraries, there is array-api-strict. If tests pass with array-api-strict, you're not using anything outside of the standard.

array_api_compat.get_namespace(d, strict=True)

Note that this cannot hide, for example, methods on the array and dtype objects, so it is probably strictly worse than array_api_strict if you really care about not discovering things outside the standard. I suggest trying to use array_api_strict as the array library and seeing if you actually miss something in practice.

@stefanv
Author

stefanv commented Sep 3, 2024

I would like a somewhat strict dask API that is array API compatible. It sounds like the conclusion here is "we don't want to provide that, because we don't think you need it", so I'll close the issue.

@stefanv stefanv closed this as completed Sep 3, 2024
@lucascolley
Contributor

lucascolley commented Sep 3, 2024

I would like a somewhat strict dask API that is array API compatible. It sounds like the conclusion here is "we don't want to provide that, because we don't think you need it", so I'll close the issue.

FWIW I would be happy to provide it, I just don't quite see the merits as worth working on yet. So far, it would enable us to:

  • do (some of) array-api-strict's job without array-api-strict (for someone looking to consume Dask via the array API)
  • do (some of) array-api-tests' job without array-api-tests (for someone looking to wrap Dask to be array API compatible)
  • try to find bugs in array-api-compat or Dask by hand in case array-api-tests has missing coverage
  • try to find bugs in array-agnostic code by hand in case array-api-strict has missing coverage

I think that's enough to put it on a wishlist, but personally I feel like my time would be better spent contributing to array-api-strict or array-api-tests. Unless I've missed a use-case?

@asmeurer
Member

asmeurer commented Sep 3, 2024

(sorry for being late to reply here. I've been on PTO)

It would help here to be more concrete about what sorts of bugs you would expect to find that aren't currently findable by using array-api-strict.

I think there probably is room to extend array-api-strict to give the functionality you are after here. array-api-strict has a flags feature that allows enabling or disabling different flags that change how the library works to emulate different behaviors that might be seen by real array API packages in the wild. For instance, there are flags to disable data-dependent shape behavior, which is optional in the standard. Since you mentioned Dask, we could add some flags to array-api-strict to make it act like a lazy library (data-apis/array-api-strict#58). A potential issue with this is that some aspects of lazy array behavior haven't been fully codified by the standard yet (see data-apis/array-api#748 (comment)).

As for implementing this in the compat library, it wouldn't be too difficult to add a flag that makes the returned namespace only have standard functions. But you should consider whether this would actually be useful or not. The main problem is that this would be a pretty far cry from the sort of "strictness" you'd get from array-api-strict. For instance,

  • The functions that remain might accept additional keyword arguments that aren't in the standard (like out).
  • Those functions might accept additional types of inputs (like Python scalars or NumPy arrays).
  • Functions might accept dtype combinations that aren't required by the standard (like sin(integer_array)).
  • The array objects themselves would still have all the additional methods, because we do not wrap the array objects at all in this package. (see https://data-apis.org/array-api-compat/dev/special-considerations.html)

The array-api-strict library does not have any of these issues, and does provide true "strictness". I would really hesitate to implement full "strictness" like this here, because it would be a lot of work to get working and I don't see how it would add much value beyond what array-api-strict already provides.

Especially if you consider the last bullet point, the only real way to change the attributes on the array object is to wrap it in a separate object. In that case, you'd really not be using array library X anymore, but rather a wrapper library. This is what array-api-strict already is. There'd be no difference except it would be wrapping library X instead of NumPy, but if everything is strictly wrapped, it shouldn't really matter what the underlying library is.
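
The point about wrapping can be sketched as follows. This is a toy illustration, not how array-api-strict is implemented: FakeArray is a stand-in for a library's array type, and StrictArray with its one-name allow-list is hypothetical.

```python
class FakeArray:
    """Stand-in for a library array with one standard and one extra member."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def shape(self):  # part of the standard's array object
        return (len(self._data),)

    def tolist(self):  # library extra, not part of the standard
        return list(self._data)


class StrictArray:
    """Hypothetical wrapper that forwards only allow-listed attributes."""

    _ALLOWED = frozenset({"shape"})  # illustrative subset

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only called when normal lookup fails, so _wrapped itself resolves normally.
        if name in self._ALLOWED:
            return getattr(self._wrapped, name)
        raise AttributeError(f"{name!r} is not exposed by the strict wrapper")


raw = FakeArray([1, 2, 3])
strict = StrictArray(raw)
print(strict.shape)                                       # (3,)
print(hasattr(raw, "tolist"), hasattr(strict, "tolist"))  # True False
```

Once every array is wrapped like this, user code only ever sees the wrapper type, which is why the choice of underlying library stops mattering.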

@stefanv
Author

stefanv commented Sep 3, 2024

All comments here make perfect sense from the perspective of a developer who wants to implement Array API compatible behavior in their library.

But consider how I arrived here: skimage used to support dask in certain contexts, and then our related tests started failing. Wondering what caused it, I found the place where dask changed. But then I thought, "as a library that is widely touted to work with the Array API, how does dask operate in an Array API environment?" So, my next thought was: let's take dask through the steps that an Array API-compatible implementation would take, and see what happens. When I did so, I ended up with a namespace that looked nothing like the Array API, and this was rather confusing, since I don't have a list of Array API functions in my head.

When I realized what had happened, I requested this feature, because it would have saved me some time if I could have asked for an Array API compatible-ish dask namespace, and would have avoided some confusion.

My guess was that it would be trivial to add (and would hopefully avoid some confusion for other developers in future too), but if it isn't, I really don't want to spend time arguing for it. Given that skimage is mostly implemented in Cython, it is not a good use-case for the Array API (as far as I understand), and I was trying to sort out this one-off dask bug. For skimage, the solution was easy enough: drop the dask tests, since they didn't imply a meaningful contract with the users anyway.

@asmeurer Thank you for the detailed explanation around what array-api-strict provides; I appreciate it.

@asmeurer
Member

asmeurer commented Sep 3, 2024

The question is whether running your skimage function through array-api-strict would have solved your problem. Based on your description, it sounds like it would have, because array-api-strict only has the functions that are in the array API. But it's also very strict in other ways. The use-case you describe is really what array-api-strict was designed for. The array API isn't very strict in disallowing behaviors beyond what it specifies, and all libraries do implement more functions, keyword arguments, dtypes, etc. So the only way to test whether your code is really portable is to test against every possible array library, or to test against a strict minimal implementation like array-api-strict.
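
As a rough sketch of that testing approach: the same array-agnostic function is run against whichever namespaces happen to be installed. The helper available_namespaces and the example function normalize are hypothetical, and the code assumes array_api_strict and numpy can each be used directly as the namespace.

```python
import importlib


def available_namespaces(candidates=("array_api_strict", "numpy")):
    """Import whichever candidate array namespaces are installed."""
    namespaces = {}
    for name in candidates:
        try:
            namespaces[name] = importlib.import_module(name)
        except ImportError:
            continue  # skip libraries that are not installed
    return namespaces


def normalize(x, xp):
    """Scale x so its elements sum to one, using only standard functions."""
    return x / xp.sum(x)


results = {}
for name, xp in available_namespaces().items():
    x = xp.asarray([1.0, 2.0, 5.0])
    results[name] = float(xp.sum(normalize(x, xp)))
    # The last column shows whether a non-standard name is visible here.
    print(name, results[name], hasattr(xp, "flatnonzero"))
```

If normalize had reached for something like xp.flatnonzero, the strict namespace would be expected to fail with an AttributeError while a full library namespace would not, and that failure is the signal of non-portable code.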

Given that skimage is mostly implemented in Cython, it is not a good use-case for the Array API (as far as I understand), and I was trying to sort out this one-off dask bug. For skimage, the solution was easy enough: drop the dask tests, since they didn't imply a meaningful contract with the users anyway.

I think SciPy is similar, and they are implementing array API support at least for the pure Python functions. But maybe more of SciPy is pure Python/NumPy than scikit-image (I'm not super familiar with the codebases of either, so I can't really speak to this).

6 participants