
TypeError for generic numpy.ndarray fields #155

Open
tgpfeiffer opened this issue Jul 30, 2021 · 2 comments · May be fixed by #156
Labels: enhancement (New feature or request)

@tgpfeiffer

Since numpy 1.21 the numpy.ndarray class is generic for mypy and needs to be annotated with type parameters, and unfortunately this means that I cannot use dacite.from_dict any more when one member of the dictionary is such an ndarray.

Minimal example:

from dataclasses import dataclass
from typing import List
import dacite
import numpy as np
import numpy.typing as npt


@dataclass
class Foo:
    x: List[float]


@dataclass
class Bar:
    x: npt.NDArray[np.float64]


print(dacite.from_dict(data_class=Foo, data={"x": [1.0, 2.0]}))
print(dacite.from_dict(data_class=Bar, data={"x": np.array([1.0, 2.0])}))

In the code above, Foo works fine, but creating Bar fails with

Traceback (most recent call last):
  File "nptest.py", line 20, in <module>
    print(dacite.from_dict(data_class=Bar, data={"x": np.array([1.0, 2.0])}))
  File ".../python3.8/site-packages/dacite/core.py", line 60, in from_dict
    transformed_value = transform_value(
  File ".../lib/python3.8/site-packages/dacite/types.py", line 36, in transform_value
    return collection_cls(transform_value(type_hooks, cast, item_cls, item) for item in value)
TypeError: expected sequence object with len >= 0 or a single integer 

Adding config=dacite.Config(check_types=False) to the from_dict() call does not help either; it fails with the same error.

If I change the type of Bar.x from npt.NDArray[np.float64] to a plain np.ndarray then it works again, but mypy complains

error: Missing type parameters for generic type "ndarray"  [type-arg]

I am not exactly sure what that error message means. Is there anything I can do to work around this?
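One possible stopgap (an assumption on my part, not an official recommendation from either project) is to keep the runtime annotation as plain np.ndarray and silence mypy for that one line:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Bar:
    # Stopgap: the unparameterized np.ndarray keeps dacite's conversion
    # working; the type: ignore silences mypy's
    # "Missing type parameters for generic type" error on this line.
    x: np.ndarray  # type: ignore[type-arg]


bar = Bar(x=np.array([1.0, 2.0]))
```

This trades away the element-type information in the annotation, so it is only a workaround until dacite handles the parameterized form.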

@tgpfeiffer
Author

@konradhalas Please let me ask for your opinion on how I should address this issue.

As I wrote in the issue above, generic type annotations for numpy.ndarray are possible since numpy >= 1.20, and mypy requires them unless `# type: ignore[type-arg]` is used. numpy.typing.NDArray[T] is an alias for numpy.ndarray[Any, numpy.dtype[T]], and then we run into a number of issues with the conversion in

dacite/dacite/types.py

Lines 25 to 36 in d2206b2

    if is_generic_collection(target_type) and isinstance(value, extract_origin_collection(target_type)):
        collection_cls = value.__class__
        if issubclass(collection_cls, dict):
            key_cls, item_cls = extract_generic(target_type, defaults=(Any, Any))
            return collection_cls(
                {
                    transform_value(type_hooks, cast, key_cls, key): transform_value(type_hooks, cast, item_cls, item)
                    for key, item in value.items()
                }
            )
        item_cls = extract_generic(target_type, defaults=(Any,))[0]
        return collection_cls(transform_value(type_hooks, cast, item_cls, item) for item in value)
as the "new" ndarray is now considered a generic collection by the code in is_generic_collection(), whereas the "old" ndarray is not:

  1. The code gets the type of the items of the generic collection using item_cls = extract_generic(target_type, defaults=(Any,))[0], i.e., the first generic argument is always used. However, for numpy the correct one would be the second.
  2. The code builds the instance using collection_cls(transform_value(type_hooks, cast, item_cls, item) for item in value) but numpy.ndarray cannot work with a generator as a parameter. That is actually the reason for the error message I shared in the issue description.
  3. The code assumes that passing the items of the converted collection into the constructor reproduces the collection, but that is not the case for numpy.ndarray. If the items in the array are [1, 2, 3] and we pass this list into the numpy.ndarray constructor then it will build an array with shape (1, 2, 3), not with these values.
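Points 2 and 3 are easy to verify directly (assuming a recent numpy):

```python
import numpy as np

# Point 3: the low-level ndarray constructor treats its first argument
# as a *shape*, not as data.
a = np.ndarray([2, 3])
print(a.shape)  # (2, 3) -- an uninitialized 2x3 array, not array([2, 3])

# Point 2: a generator is neither a shape sequence nor an integer, which
# produces exactly the TypeError from the traceback above.
try:
    np.ndarray(x for x in [1.0, 2.0])
except TypeError as exc:
    print(exc)
```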

These assumptions would all have to be addressed for numpy.ndarray to work like a normal generic collection, and I don't feel that is reasonable.

Therefore I would like to suggest adding an exception specifically for numpy.ndarray that prevents this class from being treated as a generic collection, something along the lines of

--- a/dacite/types.py
+++ b/dacite/types.py
@@ -142,11 +142,15 @@ def is_generic_collection(type_: Type) -> bool:
         return False
     origin = extract_origin_collection(type_)
     try:
-        return bool(origin and issubclass(origin, Collection))
+        return bool(origin and issubclass(origin, Collection) and not skip_generic_conversion(origin))
     except (TypeError, AttributeError):
         return False
 
 
+def skip_generic_conversion(origin: Type) -> bool:
+    return origin.__module__ == "numpy" and origin.__qualname__ == "ndarray"
+
+
 def extract_generic(type_: Type, defaults: Tuple = ()) -> tuple:
     try:
         if hasattr(type_, "_special") and type_._special:

which makes the conversion work both for plain numpy.ndarray and for numpy.typing.NDArray[numpy.float64] members. This is a bit hacky, but in particular it means we do not have to introduce a dependency on numpy. Would that be an acceptable approach for you?
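The check relies only on attributes every class carries, so it can be illustrated without dacite (a sketch of the helper from the diff above, under the assumption that name-matching is acceptable):

```python
from typing import Type

import numpy as np


def skip_generic_conversion(origin: Type) -> bool:
    # Identify numpy.ndarray by name so the caller never has to
    # import numpy itself.
    return origin.__module__ == "numpy" and origin.__qualname__ == "ndarray"


print(skip_generic_conversion(np.ndarray))  # True
print(skip_generic_conversion(list))        # False: list lives in "builtins"
```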

One remaining issue is that the converted value fails the type check at the end:

wrong value type for field "a" - should be "ndarray" instead of value "[1 2 3]" of type "ndarray"

which happens because the final check in is_instance boils down to

return isinstance(value, type_)

and for these annotations we have something like

>>> isinstance(numpy.array([1,2,3]), numpy.typing.NDArray[numpy.float64])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tgp/.envs/dacite/lib64/python3.8/site-packages/numpy/typing/_generic_alias.py", line 137, in __instancecheck__
    raise TypeError("isinstance() argument 2 cannot be a "
TypeError: isinstance() argument 2 cannot be a parameterized generic

I am not entirely sure how best to address this. I have tried adding a check like

--- a/dacite/types.py
+++ b/dacite/types.py
@@ -127,6 +127,9 @@ def is_instance(value: Any, type_: Type) -> bool:
         return is_instance(value, extract_init_var(type_))
     elif is_type_generic(type_):
         return is_subclass(value, extract_generic(type_)[0])
+    elif is_generic(type_):
+        origin = extract_origin_collection(type_)
+        return isinstance(value, origin)
     else:
         try:
             # As described in PEP 484 - section: "The numeric tower"

which does work, but it makes the test_is_instance_with_not_supported_generic_types test fail. I do not know whether that test was added mainly for documentary purposes (so it would be fine to remove it) or whether you actually want to keep this behavior.
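Standalone, the fallback in that diff amounts to checking the value against the alias's unparameterized origin (a sketch; in dacite, extract_origin_collection essentially reads this attribute):

```python
import numpy as np
import numpy.typing as npt

t = npt.NDArray[np.float64]

# isinstance(value, t) raises TypeError for a parameterized generic,
# but the alias's origin is plain np.ndarray, which isinstance accepts.
origin = t.__origin__
print(origin is np.ndarray)                      # True
print(isinstance(np.array([1.0, 2.0]), origin))  # True
```

Note that this intentionally ignores the type parameters, i.e. an int64 array would also pass the check against NDArray[np.float64].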

I have a PR more or less ready, but I wanted to check first whether you have any opinion on the proposed changes. Please let me know your thoughts.

@tgpfeiffer tgpfeiffer linked a pull request Aug 10, 2021 that will close this issue
@konradhalas konradhalas added the enhancement New feature or request label Apr 12, 2022
@Mjboothaus

Yes - this is a problem for me too :(

     67     if config.check_types and not is_instance(value, field.type):
---> 68         raise WrongTypeError(field_path=field.name, field_type=field.type, value=value)
     69 except KeyError:
     70     try:

WrongTypeError: wrong value type for field "z" - should be "array" instead of value "[0. 0. 0. ... 0. 0. 0.]" of type "ndarray"
