ZeroDivisionError specific to snapatac2 pipeline or incompatibility issue with scvi-tools? #338

Open
yojetsharma opened this issue Sep 10, 2024 · 3 comments

Comments

@yojetsharma

yojetsharma commented Sep 10, 2024

query= snap.pp.make_gene_matrix(atac, snap.genome.hg38)
query
AnnData object with n_obs × n_vars = 58534 × 60606
    obs: 'sample', 'leiden'
reference=snap.read("GEX.h5ad", backed=None)
AnnData object with n_obs × n_vars = 187285 × 2000
    obs: 'sample', 'cell_type'
    var: 'highly_variable'
query.obs['cell_type']=pd.NA
data = ad.concat(
    [reference, query],
    join='inner',
    label='batch',
    keys=["reference", "query"],
    index_unique='_',
)
data
AnnData object with n_obs × n_vars = 245819 × 1397
    obs: 'sample', 'cell_type', 'batch'
sc.pp.filter_genes(data, min_cells=5)
sc.pp.highly_variable_genes(
    data,
    n_top_genes = 3000,
    flavor="seurat_v3",
    batch_key="batch",
    subset=True
)
scvi.model.SCVI.setup_anndata(data, batch_key="batch")
vae = scvi.model.SCVI(
    data,
    n_layers=2,
    n_latent=30,
    gene_likelihood="nb",
    dispersion="gene-batch",
)
vae.train(max_epochs=1000, early_stopping=True)
INFO: GPU available: True (cuda), used: True
2024-09-10 00:49:45 - INFO - GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
2024-09-10 00:49:45 - INFO - TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
2024-09-10 00:49:45 - INFO - IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
2024-09-10 00:49:45 - INFO - HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
2024-09-10 00:49:45 - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/user/miniconda3/envs/scvi-env/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
/home/user/miniconda3/envs/scvi-env/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.

Epoch 984/1000:  98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉  | 984/1000 [1:50:50<01:48,  6.76s/it, v_num=1, train_loss_step=399, train_loss_epoch=428]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: 425.723. Signaling Trainer to stop.

ax = vae.history['elbo_train'][1:].plot()
vae.history['elbo_validation'].plot(ax=ax)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[22], line 2
      1 ax = vae.history['elbo_train'][1:].plot()
----> 2 vae.history['elbo_validation'].plot(ax=ax)

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_core.py:1000, in PlotAccessor.__call__(self, *args, **kwargs)
    997             label_name = label_kw or data.columns
    998             data.columns = label_name
-> 1000 return plot_backend.plot(data, kind=kind, **kwargs)

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/__init__.py:71, in plot(data, kind, **kwargs)
     69         kwargs["ax"] = getattr(ax, "left_ax", ax)
     70 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 71 plot_obj.generate()
     72 plot_obj.draw()
     73 return plot_obj.result

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py:454, in MPLPlot.generate(self)
    452 self._make_plot()
    453 self._add_table()
--> 454 self._make_legend()
    455 self._adorn_subplots()
    457 for ax in self.axes:

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py:792, in MPLPlot._make_legend(self)
    790     title = leg.get_title().get_text()
    791     # Replace leg.LegendHandles because it misses marker info
--> 792     handles = leg.legendHandles
    793     labels = [x.get_text() for x in leg.get_texts()]
    795 if self.legend:

AttributeError: 'Legend' object has no attribute 'legendHandles'

data.obs["celltype_scanvi"] = 'Unknown'
ref_idx = data.obs['batch'] == "reference"
data.obs["celltype_scanvi"][ref_idx] = data.obs['cell_type'][ref_idx]
/tmp/ipykernel_2619671/134013430.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data.obs["celltype_scanvi"][ref_idx] = data.obs['cell_type'][ref_idx]

lvae = scvi.model.SCANVI.from_scvi_model(
    vae,
    adata=data,
    labels_key="celltype_scanvi",
    unlabeled_category="Unknown",
)

File ~/miniconda3/envs/scvi-env/lib/python3.10/site-packages/scvi/module/_scanvae.py:170, in SCANVAE.__init__(self, n_input, n_batch, n_labels, n_hidden, n_latent, n_layers, n_continuous_cov, n_cats_per_cov, dropout_rate, dispersion, log_variational, gene_likelihood, y_prior, labels_groups, use_labels_groups, linear_classifier, classifier_parameters, use_batch_norm, use_layer_norm, **vae_kwargs)
    147 self.encoder_z2_z1 = Encoder(
    148     n_latent,
    149     n_latent,
   (...)
    156     return_dist=True,
    157 )
    159 self.decoder_z1_z2 = Decoder(
    160     n_latent,
    161     n_latent,
   (...)
    166     use_layer_norm=use_layer_norm_decoder,
    167 )
    169 self.y_prior = torch.nn.Parameter(
--> 170     y_prior if y_prior is not None else (1 / n_labels) * torch.ones(1, n_labels),
    171     requires_grad=False,
    172 )
    173 self.use_labels_groups = use_labels_groups
    174 self.labels_groups = np.array(labels_groups) if labels_groups is not None else None

ZeroDivisionError: division by zero

I posted this issue on the scvi-tools forum and got the following response:

Hi, we have never tested transferring to gene activation scores, and it doesn’t sound right to me. However, to fix your issue: you are setting all celltypes to None. The way you then update it with celltypes is incorrect and won’t update your matrix (see the pandas warning). You should do:

`data.obs.loc[ref_idx, "celltype_scanvi"] = data.obs.loc[ref_idx, 'cell_type']`
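
For anyone landing here, a minimal sketch of the suggested `.loc` fix on a toy table. The DataFrame below is a made-up stand-in for `data.obs` (cell names and labels are invented for illustration); only the column names `batch`, `cell_type` and `celltype_scanvi` come from the code above. The final check is an assumption about what matters here: the traceback fails on `1 / n_labels`, which would be consistent with no labeled category reaching the model.

```python
import pandas as pd

# Toy stand-in for data.obs; cell names and labels are invented.
obs = pd.DataFrame(
    {
        "batch": ["reference", "reference", "query", "query"],
        "cell_type": ["T cell", "B cell", pd.NA, pd.NA],
    },
    index=["c1_reference", "c2_reference", "c3_query", "c4_query"],
)

obs["celltype_scanvi"] = "Unknown"
ref_idx = obs["batch"] == "reference"

# One .loc call with row mask and column label together, so the values are
# written into obs itself rather than into a temporary copy.
obs.loc[ref_idx, "celltype_scanvi"] = obs.loc[ref_idx, "cell_type"]

# Sanity check: at least one label other than "Unknown" should now be present.
labeled = sorted(set(obs["celltype_scanvi"]) - {"Unknown"})
print(labeled)
```

If `labeled` comes back empty on the real `data.obs`, the reference labels were not copied over, and it is probably worth fixing that before calling `SCANVI.from_scvi_model`.
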
@emidalla

I have the same issue. Using the whole reference (i.e. scRNAseq data coming from four donors), everything worked. Then I subset the reference to use only data from the donor of the scATAC-seq data and got the 'division by zero' error.

@yojetsharma
Author

It gives this error if you subset it and not otherwise? Because I have been subsetting the data in all the runs. I will try without subsetting and update.

@emidalla

> It gives this error if you subset it and not otherwise? Because I have been subsetting the data in all the runs. I will try without subsetting and update.

Exactly, only upon subsetting, i.e. (I think) ending up with some empty cell group
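
A rough pre-flight check along these lines might help tell the two situations apart (no labeled cells at all versus some categories left empty after subsetting). This is only a sketch: `label_counts` is a hypothetical helper, not part of scvi-tools or snapatac2, and the toy labels are invented. It relies on pandas reporting zero-count categories when `value_counts()` is called on a categorical column.

```python
import pandas as pd


def label_counts(labels: pd.Series, unlabeled_category: str = "Unknown") -> pd.Series:
    """Count cells per label and fail early if no labeled category has any cells.

    Hypothetical helper for illustration; not an scvi-tools function.
    """
    # Casting to category makes value_counts() report zero-count categories too.
    counts = labels.astype("category").value_counts()
    labeled = counts.drop(unlabeled_category, errors="ignore")
    if (labeled > 0).sum() == 0:
        raise ValueError(
            "No labeled cells found: every cell is unlabeled or the label column is empty."
        )
    return counts


# Toy example: 'B cell' is still a declared category but lost all its cells,
# as can happen after subsetting a categorical column.
labels = pd.Series(
    pd.Categorical(
        ["Unknown", "Unknown", "T cell"],
        categories=["B cell", "T cell", "Unknown"],
    )
)
counts = label_counts(labels)
empty_categories = counts[counts == 0].index.tolist()
print(empty_categories)
```

On the real object this would be called as `label_counts(data.obs["celltype_scanvi"])` before `SCANVI.from_scvi_model`. If empty categories show up, `data.obs["celltype_scanvi"].cat.remove_unused_categories()` drops them from a categorical column; whether that is what triggers the error here is a guess, not something confirmed in this thread.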
