normCytof failed for large number of samples #403

Open
jeffsun905 opened this issue Sep 20, 2024 · 5 comments

Comments

@jeffsun905

Hi,
We are trying to process over 150 samples, and it failed at normCytof (a critical step for making them comparable, as the data came from multiple batches) with a segmentation fault. Our guess is that it ran out of memory or something related. Is there a limit on how many samples CATALYST can handle, and do you have any suggestions for running a large project like this?

@SamGG

SamGG commented Sep 20, 2024

You are probably trying to normalize the whole set at once. If you choose a reference FCS file, each file only has to be normalized against that reference, meaning that only two FCS files are loaded in memory at the same time. The vignette only demonstrates how to use the functions. Instead of starting from scratch (or from the vignette), you'll save a lot of energy by examining the (perhaps too many) published workflows, such as https://github.com/prybakowska/CyTOF_analysis_Pipeline1/blob/master/pipeline.R at line 16 (and associated functions) or https://github.com/prybakowska/CytoQP/blob/master/CytoQP_script.R.
Alternatively, you may use the nearly original bead normalization given at https://biosurf.org/cytof_data_scientist.html#313_Performing_bead_normalization.
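As a rough sketch of the one-file-at-a-time idea (not taken from the linked pipelines): the loop below hands each FCS file to a normalizer together with the shared reference, so only that file and the reference need to be in memory at once. `normalize_each`, `normalize_one` and `dry_run` are hypothetical names. In practice `normalize_one` would wrap `prepData()` and `normCytof(..., norm_to = ref)` with your own panel and metadata and write the result to disk; the placeholder used here only records what would be processed, so the snippet runs without CATALYST.

```r
# Hypothetical helper: process files one by one against a shared reference.
# `normalize_one` is passed in so the loop can be tried without CATALYST.
normalize_each <- function(files, ref, normalize_one) {
  out <- lapply(files, function(f) normalize_one(f, ref))
  names(out) <- basename(files)
  out
}

# Placeholder normalizer (assumption): a real one would call prepData() and
# normCytof(..., norm_to = ref) and save the normalized data to disk.
dry_run <- function(f, ref) paste(basename(f), "->", basename(ref))

files <- sprintf("fcs/myfcs%d.fcs", 1:5)  # placeholder paths
res <- normalize_each(files, ref = "fcs/myfcs1.fcs", normalize_one = dry_run)
print(res)
```

Writing each result to disk inside `normalize_one`, rather than returning it, keeps the memory footprint flat regardless of how many files are processed.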
HTH

@jeffsun905
Author

jeffsun905 commented Sep 22, 2024

Thank you very much for the hints. Yes, I was trying to do that. The function worked fine for ~50 samples, so I thought it should be fine since our system has ~1 TB of memory. While working on the suggested pipeline, which does one-vs-one normalization, I also tested splitting the big dataset into small batches and passing one batch at a time to the normCytof function with a common reference sample specified. To my surprise, the line plot is very different from the one without a reference specified (i.e., all provided samples normalized together; here I tested just 5 samples): the "after" lines are all close to 0, whereas in the other plot they have values similar to the "before" lines. I would greatly appreciate your insights into the difference.
Line plot with a reference specified:
image
Line plot without reference:
image
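
For reference, the batch split described above can be sketched in base R as follows. The file paths and the batch size of 25 are made up for illustration; each element of `batches` would then be passed to `prepData()` and `normCytof()` with the same `norm_to` reference.

```r
files <- sprintf("fcs/myfcs%d.fcs", 1:150)  # placeholder paths
batch_size <- 25                             # arbitrary choice for illustration

# Assign consecutive files to groups 1, 2, 3, ... of at most `batch_size` each.
batches <- split(files, ceiling(seq_along(files) / batch_size))

print(lengths(batches))
```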

@HelenaLC
Owner

Something looks off in the first plot. Could you share the relevant code showing how you ran the normalization with a fixed reference for the split batches?

@jeffsun905
Author

Here is the code:

  1. refsam <- "fcs/myfcs1.fcs"
  2. sce.test5 <- prepData(c("fcs/myfcs1.fcs","fcs/myfcs2.fcs","fcs/myfcs3.fcs","fcs/myfcs4.fcs","fcs/myfcs5.fcs"),channelfile, md, transform = TRUE, truncate_max_range = FALSE) #here the channelfile has three columns of "antigen" "fcs_colname" "marker_class"; the md file has "file_name" "sample_id" "condition" "patient_id" "batch" "age" "sex"
  3. mynorm.ref <- normCytof(sce.test5, beads = "dvs", k = 50, norm_to = refsam, assays = c("counts", "exprs"), overwrite = TRUE, plot = T)
  4. mynorm.noref <- normCytof(sce.test5, beads = "dvs", k = 50, assays = c("counts", "exprs"), overwrite = TRUE, plot = T)
    The first plot is from line 3 with a reference fcs specified. The second is from line 4 (we have done this before and it looks pretty normal).
    Session info (relevant entries only):
    R version 4.3.2 (2023-10-31)
    CATALYST_1.26.1

@jeffsun905
Author

jeffsun905 commented Sep 27, 2024

Are you able to replicate the issue? I suspect that when the reference is specified, the first figure may somehow have used transformed data. I looked a bit more at the function, and there are two lines of code that compute "bl", depending on whether a reference is provided. The scales of the two are very different (roughly 100-fold: about 2k vs. 20-something), which might explain the odd line plot. Does this affect the one-vs-one normalization suggested in the first response (https://github.com/prybakowska/CyTOF_analysis_Pipeline1/blob/master/pipeline.R), which seems to use the same function? Thank you for looking into this!
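
One way to probe the transformed-data hypothesis is to check what an arcsinh transform would do to a value on the larger scale. A minimal sketch, assuming the cofactor of 5 that `prepData()` uses by default and a made-up bead intensity of 2000 (the actual "bl" values are not reproduced here):

```r
raw_value <- 2000  # made-up intensity on the larger ("2k") scale
cofactor <- 5      # assumption: prepData()'s default cofactor

transformed <- asinh(raw_value / cofactor)
print(transformed)              # roughly 6.68
print(raw_value / transformed)  # fold difference between the two scales
```

Comparing such transformed values with the smaller "bl" values would show whether the transform alone can account for the observed difference in scale.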


3 participants