Error when executing MRMR - IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match) #41

MrVtR · 2024-04-01T19:01:05Z

Hello, I get the error below when running mrmr. As far as I could find by searching here, this error means that the X and y passed in have different shapes, but mine have the same shape. Is there a way to solve it?

My data, run through the mrmr example (it's a matrix of 0s and 1s):

from mrmr import mrmr_classif
selected_features = mrmr_classif(X=X_dtm, y=y, K=10)

The error when I execute with my data:

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\externals\loky\process_executor.py", line 463, in _process_worker
    r = call_item()
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\externals\loky\process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 589, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 589, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py", line 31, in _f_classif
    return X.apply(lambda col: _f_classif_series(col, y)).fillna(0.0)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 9423, in apply
    return op.apply().__finalize__(self, method="apply")
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 678, in apply
    return self.apply_standard()
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 798, in apply_standard
    results, res_index = self.apply_series_generator()
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 814, in apply_series_generator
    results[i] = self.f(v)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py", line 31, in <lambda>
    return X.apply(lambda col: _f_classif_series(col, y)).fillna(0.0)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py", line 29, in _f_classif_series
    return sklearn_f_classif(x[x_not_na].to_frame(), y[x_not_na])[0][0]
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 1029, in __getitem__
    key = check_bool_indexer(self.index, key)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 2506, in check_bool_indexer
    raise IndexingError(
pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
"""

The above exception was the direct cause of the following exception:

IndexingError                             Traceback (most recent call last)
Cell In [54], line 10
      3 # from sklearn.datasets import make_classification
      4 # X, y = make_classification(n_samples = 1000, n_features = 50, n_informative = 10, n_redundant = 40)
      5 # X = pd.DataFrame(X)
      6 # y = pd.Series(y)
      7 
      8 # select top 10 features using mRMR
      9 from mrmr import mrmr_classif
---> 10 selected_features = mrmr_classif(X=X_dtm, y=y, K=10)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py:171, in mrmr_classif(X, y, K, relevance, redundancy, denominator, cat_features, cat_encoding, only_same_domain, return_scores, n_jobs, show_progress)
    168 relevance_args = {'X': X, 'y': y}
    169 redundancy_args = {'X': X}
--> 171 return mrmr_base(K=K, relevance_func=relevance_func, redundancy_func=redundancy_func,
    172                  relevance_args=relevance_args, redundancy_args=redundancy_args,
    173                  denominator_func=denominator_func, only_same_domain=only_same_domain,
    174                  return_scores=return_scores, show_progress=show_progress)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\main.py:98, in mrmr_base(K, relevance_func, redundancy_func, relevance_args, redundancy_args, denominator_func, only_same_domain, return_scores, show_progress)
     44 def mrmr_base(K, relevance_func, redundancy_func,
     45               relevance_args={}, redundancy_args={},
     46               denominator_func=np.mean, only_same_domain=False,
     47               return_scores=False, show_progress=True):
     48     """General function for mRMR algorithm.
     49 
     50     Parameters
   (...)
     95         List of selected features.
     96     """
---> 98     relevance = relevance_func(**relevance_args)
     99     features = relevance[relevance.fillna(0) > 0].index.to_list()
    100     relevance = relevance.loc[features]

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py:45, in f_classif(X, y, n_jobs)
     44 def f_classif(X, y, n_jobs):
---> 45     return parallel_df(_f_classif, X, y, n_jobs=n_jobs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py:17, in parallel_df(func, df, series, n_jobs)
     15 n_jobs = min(cpu_count(), len(df.columns)) if n_jobs == -1 else min(cpu_count(), n_jobs)
     16 col_chunks = np.array_split(range(len(df.columns)), n_jobs)
---> 17 lst = Parallel(n_jobs=n_jobs)(
     18     delayed(func)(df.iloc[:, col_chunk], series)
     19     for col_chunk in col_chunks
     20 )
     21 return pd.concat(lst)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1952, in Parallel.__call__(self, iterable)
   1946 # The first item from the output is blank, but it makes the interpreter
   1947 # progress until it enters the Try/Except block of the generator and
   1948 # reach the first `yield` statement. This starts the aynchronous
   1949 # dispatch of the tasks to the workers.
   1950 next(output)
-> 1952 return output if self.return_generator else list(output)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1595, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1592     yield
   1594     with self._backend.retrieval_context():
-> 1595         yield from self._retrieve()
   1597 except GeneratorExit:
   1598     # The generator has been garbage collected before being fully
   1599     # consumed. This aborts the remaining tasks if possible and warn
   1600     # the user if necessary.
   1601     self._exception = True

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1699, in Parallel._retrieve(self)
   1692 while self._wait_retrieval():
   1693 
   1694     # If the callback thread of a worker has signaled that its task
   1695     # triggered an exception, or if the retrieval loop has raised an
   1696     # exception (e.g. `GeneratorExit`), exit the loop and surface the
   1697     # worker traceback.
   1698     if self._aborting:
-> 1699         self._raise_error_fast()
   1700         break
   1702     # If the next job is not ready for retrieval yet, we just wait for
   1703     # async callbacks to progress.

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1734, in Parallel._raise_error_fast(self)
   1730 # If this error job exists, immediatly raise the error by
   1731 # calling get_result. This job might not exists if abort has been
   1732 # called directly or if the generator is gc'ed.
   1733 if error_job is not None:
-> 1734     error_job.get_result(self.timeout)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:736, in BatchCompletionCallBack.get_result(self, timeout)
    730 backend = self.parallel._backend
    732 if backend.supports_retrieve_callback:
    733     # We assume that the result has already been retrieved by the
    734     # callback thread, and is stored internally. It's just waiting to
    735     # be returned.
--> 736     return self._return_or_raise()
    738 # For other backends, the main thread needs to run the retrieval step.
    739 try:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:754, in BatchCompletionCallBack._return_or_raise(self)
    752 try:
    753     if self.status == TASK_ERROR:
--> 754         raise self._result
    755     return self._result
    756 finally:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Alexanderstaehle · 2024-04-04T11:26:10Z

I have the same issue

smazzanti · 2024-05-08T19:30:37Z

@MrVtR can you check whether X and y have the same index?
assert np.all(X.index == y.index)
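
A minimal sketch of why matching shapes are not enough here, assuming X is a pandas DataFrame and y a pandas Series. The traceback shows mrmr evaluating `y[x_not_na]`, where `x_not_na` is a boolean Series built from a column of X, and pandas aligns that mask to y by index label rather than by position. The data below is a hypothetical stand-in for `X_dtm` and `y` (the real data is not shown in this issue), and resetting both indexes is one possible workaround, not necessarily the right fix for every dataset. `Index.equals` is used for the check because it also returns False, instead of raising, when the lengths differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X_dtm and y: same number of rows, different index labels.
X_dtm = pd.DataFrame(np.array([[1, 0], [0, 1], [1, 1], [0, 0]]), columns=["a", "b"])
y = pd.Series([0, 1, 0, 1], index=[10, 11, 12, 13])  # e.g. labels left over from a filter or split

same_length = len(X_dtm) == len(y)        # the shapes agree
same_index = X_dtm.index.equals(y.index)  # the index labels do not

# Reproduce the failing step from the traceback: a boolean mask built from X used to index y.
mask = X_dtm["a"].notna()
error_name = None
try:
    y[mask]
except Exception as exc:  # pandas raises IndexingError here
    error_name = type(exc).__name__

# Possible workaround (assumes the rows of X and y are already in the same order):
# drop the old labels so both objects share a fresh 0..n-1 index.
X_aligned = X_dtm.reset_index(drop=True)
y_aligned = y.reset_index(drop=True)
selected_rows = y_aligned[X_aligned["a"].notna()]  # now indexes without error

# With aligned inputs the original call would become:
# selected_features = mrmr_classif(X=X_aligned, y=y_aligned, K=10)
```

If the rows of X and y are not in the same order, resetting the index would silently pair the wrong labels with the wrong rows, so the ordering should be confirmed first.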
