Add a test for usability of duck arrays with chunks property #8739

Open
wants to merge 6 commits into
base: main

hmaarrfk
Contributor

xref: #8733

xarray/tests/test_variable.py F

================================================ FAILURES ================================================
____________________________ TestAsCompatibleData.test_duck_array_with_chunks ____________________________

self = <xarray.tests.test_variable.TestAsCompatibleData object at 0x7f3d1b122e60>

    def test_duck_array_with_chunks(self):
        # Non indexable type
        class CustomArray(NDArrayMixin, indexing.ExplicitlyIndexed):
            def __init__(self, array):
                self.array = array
    
            @property
            def chunks(self):
                return self.shape
    
            def __array_function__(self, *args, **kwargs):
                return NotImplemented
    
            def __array_ufunc__(self, *args, **kwargs):
                return NotImplemented
    
    
        array = CustomArray(np.arange(3))
        assert is_chunked_array(array)
        var = Variable(dims=("x"), data=array)
>       var.load()

/home/mark/git/xarray/xarray/tests/test_variable.py:2745: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/mark/git/xarray/xarray/core/variable.py:936: in load
    self._data = to_duck_array(self._data, **kwargs)
/home/mark/git/xarray/xarray/namedarray/pycompat.py:129: in to_duck_array
    chunkmanager = get_chunked_array_type(data)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (CustomArray(array=array([0, 1, 2])),), chunked_arrays = [CustomArray(array=array([0, 1, 2]))]
chunked_array_types = {<class 'xarray.tests.test_variable.TestAsCompatibleData.test_duck_array_with_chunks.<locals>.CustomArray'>}
chunkmanagers = {'dask': <xarray.namedarray.daskmanager.DaskManager object at 0x7f3d1b1568f0>}

    def get_chunked_array_type(*args: Any) -> ChunkManagerEntrypoint[Any]:
        """
        Detects which parallel backend should be used for given set of arrays.
    
        Also checks that all arrays are of same chunking type (i.e. not a mix of cubed and dask).
        """
    
        # TODO this list is probably redundant with something inside xarray.apply_ufunc
        ALLOWED_NON_CHUNKED_TYPES = {int, float, np.ndarray}
    
        chunked_arrays = [
            a
            for a in args
            if is_chunked_array(a) and type(a) not in ALLOWED_NON_CHUNKED_TYPES
        ]
    
        # Asserts all arrays are the same type (or numpy etc.)
        chunked_array_types = {type(a) for a in chunked_arrays}
        if len(chunked_array_types) > 1:
            raise TypeError(
                f"Mixing chunked array types is not supported, but received multiple types: {chunked_array_types}"
            )
        elif len(chunked_array_types) == 0:
            raise TypeError("Expected a chunked array but none were found")
    
        # iterate over defined chunk managers, seeing if each recognises this array type
        chunked_arr = chunked_arrays[0]
        chunkmanagers = list_chunkmanagers()
        selected = [
            chunkmanager
            for chunkmanager in chunkmanagers.values()
            if chunkmanager.is_chunked_array(chunked_arr)
        ]
        if not selected:
>           raise TypeError(
                f"Could not find a Chunk Manager which recognises type {type(chunked_arr)}"
E               TypeError: Could not find a Chunk Manager which recognises type <class 'xarray.tests.test_variable.TestAsCompatibleData.test_duck_array_with_chunks.<locals>.CustomArray'>

/home/mark/git/xarray/xarray/namedarray/parallelcompat.py:158: TypeError
============================================ warnings summary ============================================
xarray/testing/assertions.py:9
  /home/mark/git/xarray/xarray/testing/assertions.py:9: DeprecationWarning: 
  Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
  (to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
  but was not found to be installed on your system.
  If this would cause problems for you,
  please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
          
    import pandas as pd

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================== short test summary info =========================================
FAILED xarray/tests/test_variable.py::TestAsCompatibleData::test_duck_array_with_chunks - TypeError: Could not find a Chunk Manager which recognises type <class 'xarray.tests.test_variable.Te...
====================================== 1 failed, 1 warning in 0.77s ======================================
(dev) ✘-1 ~/git/xarray [add_test_for_duck_array|✔] 
  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst
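The failure above comes down to two checks disagreeing: the test's `assert is_chunked_array(array)` passes, yet no registered chunk manager claims the array's type. The sketch below models that mismatch in plain Python, without xarray or NumPy. Every class and function in it (`FakeDaskArray`, `FakeDaskManager`, the simplified `is_chunked_array` and `get_chunked_array_type`) is a hypothetical stand-in, and it assumes the module-level check is attribute-based, which is only inferred from the passing assertion in the test.

```python
# Illustrative stand-ins only; none of these are xarray's real classes.


class FakeDaskArray:
    """Stand-in for an array type that a chunk manager knows about."""

    chunks = ((3,),)


class CustomArray:
    """Duck array that exposes `chunks` but belongs to no chunk manager."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def shape(self):
        return (len(self.values),)

    @property
    def chunks(self):
        return self.shape


class FakeDaskManager:
    """Recognises arrays by type, as a manager for one backend might."""

    def is_chunked_array(self, data):
        return isinstance(data, FakeDaskArray)


def is_chunked_array(data):
    # Assumed attribute-based check: anything with `chunks` counts as chunked.
    return hasattr(data, "chunks")


def get_chunked_array_type(data, chunkmanagers):
    # Type-based check: ask each registered manager whether it owns the array.
    selected = [m for m in chunkmanagers.values() if m.is_chunked_array(data)]
    if not selected:
        raise TypeError(
            f"Could not find a Chunk Manager which recognises type {type(data)}"
        )
    return selected[0]


managers = {"dask": FakeDaskManager()}
array = CustomArray(range(3))

print(is_chunked_array(array))  # the attribute check passes
try:
    get_chunked_array_type(array, managers)
except TypeError as err:
    print("lookup failed:", err)
```

In this model, any object with a `chunks` attribute passes the first check, but the manager lookup still raises because no manager recognises the type.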

@hmaarrfk
Contributor Author

If this change in behavior is acceptable (raising an error vs. returning None), then I can probably work on fixing the mypy errors.
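To make the two outcomes in that comment concrete, here is a minimal sketch of a lookup that returns `None` when no manager recognises the array, and a loader that treats `None` as "pass the data through unchanged". This is not the code in this PR, and the comment does not say which direction the PR takes. All names (`find_manager_or_none`, `load`, `KnownChunked`, `KnownManager`, `UnknownChunked`) are hypothetical and do not correspond to xarray's API.

```python
# Hypothetical sketch; names do not correspond to xarray's API.


class KnownChunked:
    """Stand-in for a chunked array that has a registered manager."""

    def __init__(self, values):
        self.values = list(values)
        self.chunks = ((len(self.values),),)


class KnownManager:
    """Stand-in manager that only recognises KnownChunked."""

    def is_chunked_array(self, data):
        return isinstance(data, KnownChunked)

    def compute(self, data):
        # A real manager would trigger computation; here we just unwrap.
        return list(data.values)


class UnknownChunked:
    """Has `chunks`, but no manager claims it."""

    chunks = ((3,),)


def find_manager_or_none(data, managers):
    """Return the first manager that recognises `data`, else None."""
    for manager in managers:
        if manager.is_chunked_array(data):
            return manager
    return None


def load(data, managers):
    """Compute via a manager when one exists; otherwise pass data through."""
    manager = find_manager_or_none(data, managers)
    if manager is None:
        return data
    return manager.compute(data)
```

The trade-off is that returning `None` lets unrecognised duck arrays pass through, while raising surfaces a missing chunk manager immediately.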

@TomNicholas added the topic-chunked-arrays label (Managing different chunked backends, e.g. dask) on Jul 26, 2024