
Error when executing MRMR - IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match) #41

Open
MrVtR opened this issue Apr 1, 2024 · 2 comments

MrVtR commented Apr 1, 2024

Hello, I'm getting the error below when running mrmr. From what I found here, this error means that the X and y being used have different shapes, but mine don't. Is there a way to solve it?

My data compared with the mrmr example data (my data is a matrix of 0s and 1s):
[Three screenshots comparing my data with the mrmr example data]

from mrmr import mrmr_classif
selected_features = mrmr_classif(X=X_dtm, y=y, K=10)

The error when I run it with my data:

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\externals\loky\process_executor.py", line 463, in _process_worker
    r = call_item()
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\externals\loky\process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 589, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 589, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py", line 31, in _f_classif
    return X.apply(lambda col: _f_classif_series(col, y)).fillna(0.0)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 9423, in apply
    return op.apply().__finalize__(self, method="apply")
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 678, in apply
    return self.apply_standard()
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 798, in apply_standard
    results, res_index = self.apply_series_generator()
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 814, in apply_series_generator
    results[i] = self.f(v)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py", line 31, in <lambda>
    return X.apply(lambda col: _f_classif_series(col, y)).fillna(0.0)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py", line 29, in _f_classif_series
    return sklearn_f_classif(x[x_not_na].to_frame(), y[x_not_na])[0][0]
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 1029, in __getitem__
    key = check_bool_indexer(self.index, key)
  File "C:\Users\vitor\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 2506, in check_bool_indexer
    raise IndexingError(
pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
"""

The above exception was the direct cause of the following exception:

IndexingError                             Traceback (most recent call last)
Cell In [54], line 10
      3 # from sklearn.datasets import make_classification
      4 # X, y = make_classification(n_samples = 1000, n_features = 50, n_informative = 10, n_redundant = 40)
      5 # X = pd.DataFrame(X)
      6 # y = pd.Series(y)
      7 
      8 # select top 10 features using mRMR
      9 from mrmr import mrmr_classif
---> 10 selected_features = mrmr_classif(X=X_dtm, y=y, K=10)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py:171, in mrmr_classif(X, y, K, relevance, redundancy, denominator, cat_features, cat_encoding, only_same_domain, return_scores, n_jobs, show_progress)
    168 relevance_args = {'X': X, 'y': y}
    169 redundancy_args = {'X': X}
--> 171 return mrmr_base(K=K, relevance_func=relevance_func, redundancy_func=redundancy_func,
    172                  relevance_args=relevance_args, redundancy_args=redundancy_args,
    173                  denominator_func=denominator_func, only_same_domain=only_same_domain,
    174                  return_scores=return_scores, show_progress=show_progress)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\main.py:98, in mrmr_base(K, relevance_func, redundancy_func, relevance_args, redundancy_args, denominator_func, only_same_domain, return_scores, show_progress)
     44 def mrmr_base(K, relevance_func, redundancy_func,
     45               relevance_args={}, redundancy_args={},
     46               denominator_func=np.mean, only_same_domain=False,
     47               return_scores=False, show_progress=True):
     48     """General function for mRMR algorithm.
     49 
     50     Parameters
   (...)
     95         List of selected features.
     96     """
---> 98     relevance = relevance_func(**relevance_args)
     99     features = relevance[relevance.fillna(0) > 0].index.to_list()
    100     relevance = relevance.loc[features]

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py:45, in f_classif(X, y, n_jobs)
     44 def f_classif(X, y, n_jobs):
---> 45     return parallel_df(_f_classif, X, y, n_jobs=n_jobs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\mrmr\pandas.py:17, in parallel_df(func, df, series, n_jobs)
     15 n_jobs = min(cpu_count(), len(df.columns)) if n_jobs == -1 else min(cpu_count(), n_jobs)
     16 col_chunks = np.array_split(range(len(df.columns)), n_jobs)
---> 17 lst = Parallel(n_jobs=n_jobs)(
     18     delayed(func)(df.iloc[:, col_chunk], series)
     19     for col_chunk in col_chunks
     20 )
     21 return pd.concat(lst)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1952, in Parallel.__call__(self, iterable)
   1946 # The first item from the output is blank, but it makes the interpreter
   1947 # progress until it enters the Try/Except block of the generator and
   1948 # reach the first `yield` statement. This starts the aynchronous
   1949 # dispatch of the tasks to the workers.
   1950 next(output)
-> 1952 return output if self.return_generator else list(output)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1595, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1592     yield
   1594     with self._backend.retrieval_context():
-> 1595         yield from self._retrieve()
   1597 except GeneratorExit:
   1598     # The generator has been garbage collected before being fully
   1599     # consumed. This aborts the remaining tasks if possible and warn
   1600     # the user if necessary.
   1601     self._exception = True

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1699, in Parallel._retrieve(self)
   1692 while self._wait_retrieval():
   1693 
   1694     # If the callback thread of a worker has signaled that its task
   1695     # triggered an exception, or if the retrieval loop has raised an
   1696     # exception (e.g. `GeneratorExit`), exit the loop and surface the
   1697     # worker traceback.
   1698     if self._aborting:
-> 1699         self._raise_error_fast()
   1700         break
   1702     # If the next job is not ready for retrieval yet, we just wait for
   1703     # async callbacks to progress.

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:1734, in Parallel._raise_error_fast(self)
   1730 # If this error job exists, immediatly raise the error by
   1731 # calling get_result. This job might not exists if abort has been
   1732 # called directly or if the generator is gc'ed.
   1733 if error_job is not None:
-> 1734     error_job.get_result(self.timeout)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:736, in BatchCompletionCallBack.get_result(self, timeout)
    730 backend = self.parallel._backend
    732 if backend.supports_retrieve_callback:
    733     # We assume that the result has already been retrieved by the
    734     # callback thread, and is stored internally. It's just waiting to
    735     # be returned.
--> 736     return self._return_or_raise()
    738 # For other backends, the main thread needs to run the retrieval step.
    739 try:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py:754, in BatchCompletionCallBack._return_or_raise(self)
    752 try:
    753     if self.status == TASK_ERROR:
--> 754         raise self._result
    755     return self._result
    756 finally:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
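The last frames show where this fails: mrmr's `_f_classif_series` builds a boolean mask from each column of X and then uses it on y (`y[x_not_na]`). pandas aligns a boolean Series with the object it indexes by index label, not by position, so matching shapes are not enough: if y's labels are not all present in X's index, the lookup raises this IndexingError. The commented-out example in the cell wraps both arrays in `pd.DataFrame(...)` and `pd.Series(...)`, which gives both the same default RangeIndex; that is likely why the example runs. A minimal pandas-only sketch of the mechanism (toy data invented for illustration; `X`, `y` and `x_not_na` only mirror the names in the traceback):

```python
import pandas as pd

# Toy data with identical shapes: 4 rows, 1 feature, 4 labels.
# X gets the default RangeIndex 0..3 (as pd.DataFrame(array) would),
# while y carries other labels, e.g. ones kept after slicing a larger Series.
X = pd.DataFrame({"f0": [0, 1, 1, 0]})
y = pd.Series([1, 0, 1, 0], index=[10, 11, 12, 13])

x = X["f0"]
x_not_na = x.notna()  # boolean mask indexed 0..3, inherited from X's index

try:
    # Same pattern as mrmr's _f_classif_series: a boolean Series applied to y.
    # pandas aligns the mask to y by label, and labels 10..13 are not in the mask.
    y[x_not_na]
    raised = False
except pd.errors.IndexingError:
    raised = True

print(X.shape[0] == y.shape[0], raised)  # True True

# Relabelling y by position makes the labels match, and the mask then works.
y_aligned = y.reset_index(drop=True)
print(y_aligned[x_not_na].tolist())  # [1, 0, 1, 0]
```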
@Alexanderstaehle

I have the same issue

@smazzanti
Owner

@MrVtR, can you check whether X and y have the same index?
assert np.all(X.index == y.index)
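If that check fails, one possible workaround is to give both objects the same positional index before calling `mrmr_classif`. This is a sketch under one important assumption: row i of `X_dtm` already corresponds to element i of `y` (only the reporter can confirm that). `align_for_mrmr` is a hypothetical helper, not part of the mrmr package:

```python
import pandas as pd


def align_for_mrmr(X: pd.DataFrame, y: pd.Series) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: make X and y share the same index before mrmr_classif.

    Assumes row i of X already corresponds to element i of y. reset_index only
    relabels rows; it never reorders them, so it cannot repair a real misalignment.
    """
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} labels")
    # Index.equals works for any lengths, whereas `X.index == y.index`
    # raises a ValueError when the lengths differ.
    if X.index.equals(y.index):
        return X, y
    return X.reset_index(drop=True), y.reset_index(drop=True)


# Demo: same shape, different labels (e.g. X rebuilt from a NumPy array,
# y sliced out of a larger Series).
X_demo = pd.DataFrame({"f0": [0, 1, 1], "f1": [1, 1, 0]})
y_demo = pd.Series([1, 0, 1], index=[7, 3, 5])

print(X_demo.index.equals(y_demo.index))  # False
X_fixed, y_fixed = align_for_mrmr(X_demo, y_demo)
print(X_fixed.index.equals(y_fixed.index))  # True
```

With the names from the issue this would be `X_dtm, y = align_for_mrmr(X_dtm, y)` followed by the original `mrmr_classif(X=X_dtm, y=y, K=10)` call. If the row orders actually differ (for example, y was shuffled or split separately from X_dtm), relabelling hides the problem instead of fixing it; in that case rebuild X_dtm and y from the same rows.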
