Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

obs_names resets to original barcodes when assigned via AnnDataset #278

Open
newtonharry opened this issue Mar 30, 2024 · 0 comments
Open

Comments

@newtonharry
Copy link

newtonharry commented Mar 30, 2024

Hi,

So I'm currently working on a multi-sample pipeline and I've followed your tutorial. I currently have to rename barcodes such that they're the same across snATAC-seq and snRNA-seq data. I'm appending the sample to the beginning of the barcode as such:

E14_data_directory = f"{data_directory}/snATAC-seq/E14"
E18_data_directory = f"{data_directory}/snATAC-seq/E18"


def generate_file_paths(file_names, dir):
    input_files = [f"{dir}/fragments/{file_name}.tsv.gz" for file_name in file_names]
    adata_samples = [f"{dir}/anndata/{file_name}.h5ad" for file_name in file_names]

    return input_files, adata_samples


E18_file_names = ["E18_Hom_1", "E18_Hom_2", "E18_Con_1", "E18_Con_2"]
E18_input_files, E18_adata_samples = generate_file_paths(
    E18_file_names, E18_data_directory
)

E14_file_names = ["E14_Hom_1", "E14_Hom_2", "E14_Con_1", "E14_Con_2"]
E14_input_files, E14_adata_samples = generate_file_paths(
    E14_file_names, E14_data_directory
)


combined_file_names = [*E14_file_names, *E18_file_names]
combined_input_files = [*E14_input_files, *E18_input_files]
combined_adata_samples = [*E14_adata_samples, *E18_adata_samples]

# NOTE: ONLY set this to true if you have RAM mounted as a file system (like /dev/shm)
use_ram = False

# Genome to use
chrom_sizes = snap.genome.mm39.chrom_sizes


if use_ram:
    os.mkdir("/dev/shm/snap_tmp_dir")
    adatas = snap.pp.import_data(
        combined_input_files,
        sorted_by_barcode=False,
        file=combined_adata_samples,
        tempdir="/dev/shm/snap_tmp_dir",
        chrom_sizes=snap.genome.mm10,
        min_num_fragments=1000,
    )
else:
    adatas = snap.pp.import_data(
        combined_input_files,
        sorted_by_barcode=False,
        file=combined_adata_samples,
        tempdir=f"{data_directory}/tmp",
        chrom_sizes=snap.genome.mm10,
        min_num_fragments=1000,
    )

for adata, sample_name in zip(adatas, combined_file_names):
    adata.obs_names = [f"{sample_name}:{barcode}" for barcode in adata.obs_names]

atac_data_adataset = snap.AnnDataSet(
        adatas=[(name, adata) for name, adata in zip(combined_file_names, adatas)],
        filename=f"{data_directory}/snATAC-seq/atac_brain_data.h5ads"
    )

This code successfully renames each barcode within each sample and the renamed barcodes persist when I load it into an AnnDataSet object. When I try to do that same to an AnnDataSet object, the indexes reset when I reload it. I think there must be no synchronisation happening between obs_names and obs.index in your pyanndata. Perhaps it's a good idea to be able to rename the obs_names from an AnnDataSet object instead of having to load each AnnData and assign it there.

@newtonharry newtonharry changed the title obs_names resets to original barcodes within AnnDataset upon reloading or construction of AnnDataSet obs_names resets to original barcodes when assigned via AnnDataset Mar 30, 2024
@newtonharry newtonharry changed the title obs_names resets to original barcodes when assigned via AnnDataset obs_names resets to original barcodes when assigned via AnnDataSet Mar 30, 2024
@newtonharry newtonharry changed the title obs_names resets to original barcodes when assigned via AnnDataSet obs_names resets to original barcodes when assigned via AnnDataset Mar 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant