
[Multi modality pipeline] Are number of cells and var attributes required to be the same while performing rna atac embedding? #343

Open
yojetsharma opened this issue Oct 1, 2024 · 2 comments


@yojetsharma

yojetsharma commented Oct 1, 2024

I preprocessed the snRNA modality of my multiome data with scanpy. It has 59,000 cells and its own var attributes (‘highly_variable’), while the obs attributes are ‘sample’ and ‘leiden’. The ATAC modality, processed with SnapATAC2, has the same obs attributes but different var attributes.
When I run assert (rna.obs_names == atac.obs_names).all(), I get an error saying “Lengths must match to compare”.

@yojetsharma yojetsharma changed the title Are number of cells and var attributes required to be the same while performing rna atac embedding? [Multi modality pipeline] Are number of cells and var attributes required to be the same while performing rna atac embedding? Oct 1, 2024
@kaizhang
Owner

kaizhang commented Oct 2, 2024

The number of variables doesn't need to be the same. ATAC and RNA must, however, share exactly the same barcodes, i.e., the data must come from the same cells. You can run assert (rna.obs_names == atac.obs_names).all() yourself to make sure.
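Because AnnData's obs_names is a pandas Index, the elementwise check above raises a ValueError (rather than failing the assert) whenever the two modalities have different numbers of cells. A minimal sketch of a diagnostic that never raises and reports which barcodes differ is shown below; compare_barcodes is a hypothetical helper, and the barcodes are toy values standing in for rna.obs_names and atac.obs_names.

```python
import pandas as pd


def compare_barcodes(rna_names: pd.Index, atac_names: pd.Index) -> dict:
    """Summarize how two barcode indexes (e.g. rna.obs_names, atac.obs_names) differ.

    Unlike `(rna_names == atac_names).all()`, this does not raise when the
    lengths differ.
    """
    shared = rna_names.intersection(atac_names)
    return {
        "n_rna": len(rna_names),
        "n_atac": len(atac_names),
        "n_shared": len(shared),
        # Barcodes present in only one modality.
        "only_rna": list(rna_names.difference(atac_names)),
        "only_atac": list(atac_names.difference(rna_names)),
        # True only if both indexes hold the same barcodes in the same order.
        "identical": rna_names.equals(atac_names),
    }


# Toy barcodes standing in for rna.obs_names and atac.obs_names.
rna_names = pd.Index(["AAAC-1", "AAAG-1", "AACT-1"])
atac_names = pd.Index(["AAAC-1", "AACT-1"])

try:
    (rna_names == atac_names).all()
except ValueError as err:
    # Different lengths: pandas refuses the elementwise comparison.
    print("elementwise compare failed:", err)

print(compare_barcodes(rna_names, atac_names))
```

Here the summary would show three RNA barcodes, two ATAC barcodes, two shared, and 'AAAG-1' present only in RNA.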

@yojetsharma
Author

yojetsharma commented Oct 2, 2024

After running assert (rna.obs_names == atac.obs_names).all() I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[27], line 1
----> 1 assert (rna.obs_names == atac.obs_names).all()

File ~/.conda/envs/scarches/lib/python3.9/site-packages/pandas/core/ops/common.py:72, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     68             return NotImplemented
     70 other = item_from_zerodim(other)
---> 72 return method(self, other)

File ~/.conda/envs/scarches/lib/python3.9/site-packages/pandas/core/arraylike.py:42, in OpsMixin.__eq__(self, other)
     40 @unpack_zerodim_and_defer("__eq__")
     41 def __eq__(self, other):
---> 42     return self._cmp_method(other, operator.eq)

File ~/.conda/envs/scarches/lib/python3.9/site-packages/pandas/core/indexes/base.py:6962, in Index._cmp_method(self, other, op)
   6957         return arr
   6959 if isinstance(other, (np.ndarray, Index, ABCSeries, ExtensionArray)) and len(
   6960     self
   6961 ) != len(other):
-> 6962     raise ValueError("Lengths must match to compare")
   6964 if not isinstance(other, ABCMultiIndex):
   6965     other = extract_array(other, extract_numpy=True)

ValueError: Lengths must match to compare

Since mine is a multiome experiment, could this be because the barcodes are not in the same order in the two modalities?
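A differing order alone would not produce this particular error. With equal lengths, pandas compares the two indexes position by position and returns a boolean array, so a pure ordering difference makes the assert fail with an AssertionError instead of a ValueError. The sketch below illustrates this with toy barcodes (assumed values, not from the thread) and shows one way to check for "same set, different order" and to compute the reordering.

```python
import pandas as pd

# Same barcodes, different order (toy data).
rna_names = pd.Index(["AAAC-1", "AAAG-1", "AACT-1"])
atac_names = pd.Index(["AACT-1", "AAAC-1", "AAAG-1"])

# Equal lengths: == compares position by position and does not raise.
elementwise = rna_names == atac_names
print(elementwise)        # [False False False]
print(elementwise.all())  # False, so the assert would raise AssertionError

# Do both modalities hold the same set of barcodes, ignoring order?
same_set = rna_names.sort_values().equals(atac_names.sort_values())
print(same_set)  # True

# Positions in atac_names that put it in rna_names order (-1 marks a missing barcode).
order = atac_names.get_indexer(rna_names)
print(order)  # [1 2 0]
print(atac_names[order].equals(rna_names))  # True
```

If atac is an in-memory anndata.AnnData and every RNA barcode is present in it, indexing by names (atac[rna.obs_names]) should give the same reordering.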

Update: there seems to be a barcode mismatch between my multiome RNA data (preprocessed with scanpy) and my multiome ATAC data (preprocessed with SnapATAC2). Any idea how I can prevent this?
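Assuming the mismatch comes from each pipeline keeping a slightly different set of cells after its own QC filtering, one way to reconcile the two objects is to keep only the barcodes present in both modalities, in the same order. The sketch below uses small pandas DataFrames as toy stand-ins for rna.obs and atac.obs; shared_barcodes is a hypothetical helper. If the shared set is much smaller than expected, the barcode strings may be formatted differently by the two pipelines (for example, a suffix or sample prefix), which this sketch does not handle.

```python
import pandas as pd


def shared_barcodes(rna_names: pd.Index, atac_names: pd.Index) -> pd.Index:
    """Barcodes present in both modalities, kept in the RNA order."""
    return rna_names[rna_names.isin(atac_names)]


# Toy stand-ins for rna.obs and atac.obs (one row per barcode).
rna_obs = pd.DataFrame({"leiden": ["0", "1", "0", "2"]},
                       index=["AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"])
atac_obs = pd.DataFrame({"sample": ["s1", "s1", "s1"]},
                        index=["AACT-1", "AAAC-1", "ACGT-1"])

common = shared_barcodes(rna_obs.index, atac_obs.index)
print(list(common))  # ['AAAC-1', 'AACT-1']

# Subset both tables to the shared barcodes, in the same order.
rna_obs = rna_obs.loc[common]
atac_obs = atac_obs.loc[common]
assert (rna_obs.index == atac_obs.index).all()

# With in-memory anndata.AnnData objects the equivalent would be:
#   rna = rna[common].copy()
#   atac = atac[common].copy()
```

This reconciles the objects after the fact rather than preventing cells from being dropped. Subsetting a backed SnapATAC2 object may require a different call than the in-memory AnnData indexing shown in the comment.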
