
ExtraBasicDiscretizer not working with scikit-learn 1.4 #211

Open
jose-matos opened this issue Aug 15, 2024 · 0 comments
First of all, thank you for your work. I appreciate it. :-)

I ran the tutorial and there is a single example that does not work: the one that uses `ExtraBasicDiscretizer`:

```python
disc = ExtraBasicDiscretizer(feat_names[:3], n_bins=3, strategy='uniform')
X_train_brl_df = disc.fit_transform(pd.DataFrame(X_train[:, :3], columns=feat_names[:3]))
X_test_brl_df = disc.transform(pd.DataFrame(X_test[:, :3], columns=feat_names[:3]))
```

The problem occurs in the second and third lines:

When calling `X_train_brl_df = disc.fit_transform(pd.DataFrame(X_train[:, :3], columns=feat_names[:3]))` I get:

```
/usr/lib64/python3.13/site-packages/sklearn/preprocessing/_discretization.py:248: FutureWarning: In version 1.5 onwards, subsample=200_000 will be used by default. Set subsample explicitly to silence this warning in the mean time. Set subsample=None to disable subsampling explicitly.
  warnings.warn(

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 X_train_brl_df = disc.fit_transform(pd.DataFrame(X_train[:, :3], columns=feat_names[:3]))

File /usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py:295, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    293 @wraps(f)
    294 def wrapped(self, X, *args, **kwargs):
--> 295     data_to_wrap = f(self, X, *args, **kwargs)
    296     if isinstance(data_to_wrap, tuple):
    297         # only wrap the first output for cross decomposition
    298         return_tuple = (
    299             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    300             *data_to_wrap[1:],
    301         )

File /usr/lib64/python3.13/site-packages/sklearn/base.py:1098, in TransformerMixin.fit_transform(self, X, y, **fit_params)
   1083         warnings.warn(
   1084             (
   1085                 f"This object ({self.__class__.__name__}) has a `transform`"
   (...)
   1093             UserWarning,
   1094         )
   1096 if y is None:
   1097     # fit method of arity 1 (unsupervised transformation)
-> 1098     return self.fit(X, **fit_params).transform(X)
   1099 else:
   1100     # fit method of arity 2 (supervised transformation)
   1101     return self.fit(X, y, **fit_params).transform(X)

File /usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py:295, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    293 @wraps(f)
    294 def wrapped(self, X, *args, **kwargs):
--> 295     data_to_wrap = f(self, X, *args, **kwargs)
    296     if isinstance(data_to_wrap, tuple):
    297         # only wrap the first output for cross decomposition
    298         return_tuple = (
    299             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    300             *data_to_wrap[1:],
    301         )

File /usr/lib/python3.13/site-packages/imodels/discretization/discretizer.py:391, in ExtraBasicDiscretizer.transform(self, X)
    389 # One-hot encode the ordinal DF
    390 disc_onehot_np = self.encoder_.transform(disc_ordinal_df_str)
--> 391 disc_onehot = pd.DataFrame(
    392     disc_onehot_np, columns=self.encoder_.get_feature_names_out())
    394 # Name columns after the interval they represent (e.g. 0.1_to_0.5)
    395 for col, bin_edges in zip(self.dcols, self.discretizer_.bin_edges_):

File /usr/lib64/python3.13/site-packages/pandas/core/frame.py:856, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    848         mgr = arrays_to_mgr(
    849             arrays,
    850             columns,
   (...)
    853             typ=manager,
    854         )
    855     else:
--> 856         mgr = ndarray_to_mgr(
    857             data,
    858             index,
    859             columns,
    860             dtype=dtype,
    861             copy=copy,
    862             typ=manager,
    863         )
    864 else:
    865     mgr = dict_to_mgr(
    866         {},
    867         index,
   (...)
    870         typ=manager,
    871     )

File /usr/lib64/python3.13/site-packages/pandas/core/internals/construction.py:336, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    331 # _prep_ndarraylike ensures that values.ndim == 2 at this point
    332 index, columns = _get_axes(
    333     values.shape[0], values.shape[1], index=index, columns=columns
    334 )
--> 336 _check_values_indices_shape_match(values, index, columns)
    338 if typ == "array":
    339     if issubclass(values.dtype.type, str):

File /usr/lib64/python3.13/site-packages/pandas/core/internals/construction.py:420, in _check_values_indices_shape_match(values, index, columns)
    418 passed = values.shape
    419 implied = (len(index), len(columns))
--> 420 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (192, 1), indices imply (192, 9)
```
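
For what it's worth, my guess from the traceback is that the one-hot encoding step is returning a scipy sparse matrix under scikit-learn 1.4, which pandas then cannot line up with the nine one-hot column names. Here is a minimal sketch that only mimics the steps visible in the traceback (it is not the imodels implementation) and runs fine once the output is densified:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Synthetic stand-in for X_train[:, :3]; 192 rows to match the shapes in the traceback.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(192, 3)), columns=['f0', 'f1', 'f2'])

# Ordinal binning followed by string conversion, as in the traceback above
# (subsample=None also silences the FutureWarning).
kbd = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform', subsample=None)
ordinal = pd.DataFrame(kbd.fit_transform(X), columns=X.columns).astype(int).astype(str)

enc = OneHotEncoder()                # default output is a scipy sparse matrix
onehot = enc.fit_transform(ordinal)
print(type(onehot), onehot.shape, len(enc.get_feature_names_out()))  # (192, 9), 9 names

# Passing the sparse matrix straight to pd.DataFrame with the nine column names is
# where I suspect the (192, 1) vs (192, 9) mismatch comes from; converting to a
# dense array first builds the expected frame.
dense_df = pd.DataFrame(onehot.toarray(), columns=enc.get_feature_names_out())
print(dense_df.shape)                # (192, 9)
```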

If I run the third line before the second, `X_test_brl_df = disc.transform(pd.DataFrame(X_test[:, :3], columns=feat_names[:3]))`, I get:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[16], line 2
      1 disc = ExtraBasicDiscretizer(feat_names[:3], n_bins=3, strategy='uniform')
----> 2 X_test_brl_df = disc.transform(pd.DataFrame(X_test[:, :3], columns=feat_names[:3]))

File /usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py:295, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    293 @wraps(f)
    294 def wrapped(self, X, *args, **kwargs):
--> 295     data_to_wrap = f(self, X, *args, **kwargs)
    296     if isinstance(data_to_wrap, tuple):
    297         # only wrap the first output for cross decomposition
    298         return_tuple = (
    299             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    300             *data_to_wrap[1:],
    301         )

File /usr/lib/python3.13/site-packages/imodels/discretization/discretizer.py:385, in ExtraBasicDiscretizer.transform(self, X)
    369 """
    370 Discretize the data.
    371 
   (...)
    381     binned space. All other features remain unchanged.
    382 """
    384 # Apply discretizer transform to get ordinally coded DF
--> 385 disc_ordinal_np = self.discretizer_.transform(X[self.dcols])
    386 disc_ordinal_df = pd.DataFrame(disc_ordinal_np, columns=self.dcols)
    387 disc_ordinal_df_str = disc_ordinal_df.astype(int).astype(str)

AttributeError: 'ExtraBasicDiscretizer' object has no attribute 'discretizer_'
```

OK, in hindsight I understand why this fails: the discretizer has not been trained (no fit was called before the transform).
Running the third line after the second gives an error similar to the one from the second line.
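
Just to illustrate that point (and, presumably, the reason it surfaces as an AttributeError rather than a NotFittedError is that `ExtraBasicDiscretizer.transform` reaches for `self.discretizer_` directly instead of checking the fitted state first): a plain scikit-learn transformer fails in the same situation, e.g.:

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import KBinsDiscretizer

kbd = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform', subsample=None)
try:
    # transform() before fit(): the fitted attributes (bin_edges_, etc.) do not exist yet
    kbd.transform(np.zeros((5, 3)))
except NotFittedError as err:
    print(type(err).__name__, '-', err)
```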
