normCytof failed for large number of samples #403

jeffsun905 · 2024-09-20T17:00:06Z

Hi,
We are trying to process over 150 samples, and it fails at normCytof (a critical step to make them comparable, as the data came from multiple batches) with a segmentation fault. Our guess is that it runs out of memory or something related. Is there a limit on how many samples CATALYST can process, and do you have any suggestions for running a large project like this?
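A quick way to sanity-check the out-of-memory guess is to estimate how large the expression matrices get. This is a back-of-the-envelope sketch that assumes the assays are stored as dense double-precision matrices (8 bytes per value). The per-sample event count and channel number below are hypothetical placeholders, not figures from this project.

```r
# Rough size (GB) of dense double-precision assays: 8 bytes per stored value.
# All inputs are hypothetical; replace them with your own event counts and panel size.
assay_gb <- function(n_samples, events_per_sample, n_channels, n_assays = 2) {
  n_samples * events_per_sample * n_channels * n_assays * 8 / 1e9
}

# e.g. 150 samples x 500,000 events x 50 channels, "counts" + "exprs" assays
assay_gb(150, 5e5, 50)   # 60
```

Intermediate copies made during processing are not counted here, so actual peak usage is likely higher than this estimate.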

SamGG · 2024-09-20T20:30:47Z

You are probably trying to normalize the whole set at once. If you choose a reference FCS file, each file just has to be normalized against that reference, meaning that only two FCS files are loaded in memory at the same time. The vignette only illustrates how to use the functions. Instead of starting from scratch (or from the vignette), you'll save a lot of energy by examining the (perhaps too many) published workflows, such as https://github.com/prybakowska/CyTOF_analysis_Pipeline1/blob/master/pipeline.R at line 16 (and the associated functions) or https://github.com/prybakowska/CytoQP/blob/master/CytoQP_script.R.
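To make the memory argument concrete, here is a simplified, self-contained sketch of bead normalization against a fixed reference. It is modeled on the published bead-normalization scheme (running-median smoothing of bead intensities, a least-squares slope through the origin against baseline values, and linear interpolation of the slopes over acquisition time). It is not CATALYST's normCytof code, and the function names and simulated data are hypothetical. The point is that the baseline is computed once from the reference, after which each sample can be processed on its own.

```r
# Hypothetical sketch, NOT CATALYST's implementation.
# Bead matrices: rows = bead events ordered by acquisition time, cols = bead channels.
smooth_beads <- function(b, k = 51) {
  # running median of each bead channel along acquisition order (k must be odd)
  apply(b, 2, function(v) stats::runmed(v, k, endrule = "median"))
}

bead_slopes <- function(bead_int, baseline, k = 51) {
  s <- smooth_beads(bead_int, k)
  # per bead event: slope m minimising sum((baseline - m * s)^2), fitted through the origin
  drop(s %*% baseline) / rowSums(s^2)
}

normalize_sample <- function(cells, cell_time, bead_int, bead_time, baseline, k = 51) {
  m <- bead_slopes(bead_int, baseline, k)
  # interpolate slopes from bead times onto every event's acquisition time
  f <- stats::approx(bead_time, m, xout = cell_time, rule = 2)$y
  cells * f   # row i is multiplied by f[i]
}

# Simulated raw intensities (made-up numbers): the sample loses 50% signal over the run
set.seed(1)
n_beads <- 500
ref_beads <- matrix(rnorm(n_beads * 4, mean = 2000, sd = 100), ncol = 4)
drift <- seq(1, 0.5, length.out = n_beads)
smp_beads <- matrix(rnorm(n_beads * 4, mean = 2000, sd = 100), ncol = 4) * drift
bead_time <- seq_len(n_beads)

baseline <- colMeans(smooth_beads(ref_beads))   # only the reference is needed for this
norm_beads <- normalize_sample(smp_beads, bead_time, smp_beads, bead_time, baseline)
round(colMeans(norm_beads))                     # back near the reference level
```

In a real workflow, the loop over samples would read one FCS file, normalize it, write it out and discard it, so memory holds only the reference baseline plus one sample.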
Alternatively, you may use the nearly original bead normalization described at https://biosurf.org/cytof_data_scientist.html#313_Performing_bead_normalization.
HTH

jeffsun905 · 2024-09-22T17:03:58Z

Thank you very much for the hints. Yes, I was trying to normalize everything at once. The function worked fine for ~50 samples, so I thought it would be fine, as our system has ~1 TB of memory. While working on the suggested pipeline, which does one-vs-one normalization, I also tested splitting the big dataset into small batches and passing one batch at a time to normCytof with a common reference sample specified. To my surprise, the line plot is very different from the one without a reference (i.e., all provided samples normalized together; here I just tested 5 samples): with the reference, the "after" lines are all close to 0, whereas without it they have values similar to the "before" lines. I would greatly appreciate your insights into the differences.
Line plot with a reference specified:

Line plot without reference:
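The batching strategy described above could be organised roughly as follows. This is a hypothetical sketch: the file names, batch size and loop body are placeholders, and the CATALYST calls appear only as comments (they are not executed here and would need your own panel and metadata).

```r
# Hypothetical sketch: split a long list of FCS files into fixed-size batches
# and normalize every batch against the same reference file.
files <- sprintf("fcs/myfcs%d.fcs", 1:153)   # hypothetical file names
refsam <- files[1]
batch_size <- 25

# batch index 1, 1, ..., 2, 2, ... -> a list of character vectors, in file order
batches <- split(files, ceiling(seq_along(files) / batch_size))

for (b in batches) {
  # In real use, each iteration could run something like (not executed here):
  # sce <- CATALYST::prepData(b, panel, md[md$file_name %in% basename(b), ])
  # res <- CATALYST::normCytof(sce, beads = "dvs", k = 50, norm_to = refsam)
  # ...then save the normalized data and rm(sce, res) before the next batch.
  message(sprintf("batch of %d files, first: %s", length(b), b[1]))
}
```

Keeping the reference fixed across batches is what makes the batches comparable; the batch size only trades memory for the number of iterations.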

HelenaLC · 2024-09-22T18:58:48Z

Something looks off in the first plot. Could you provide the relevant code showing how you did the normalization using a fixed reference for the split batches?

jeffsun905 · 2024-09-23T01:52:38Z

Here is the code:

```r
library(CATALYST)
refsam <- "fcs/myfcs1.fcs"
fcs <- c("fcs/myfcs1.fcs", "fcs/myfcs2.fcs", "fcs/myfcs3.fcs", "fcs/myfcs4.fcs", "fcs/myfcs5.fcs")
# channelfile (panel) columns: "antigen", "fcs_colname", "marker_class"
# md columns: "file_name", "sample_id", "condition", "patient_id", "batch", "age", "sex"
sce.test5 <- prepData(fcs, channelfile, md, transform = TRUE, truncate_max_range = FALSE)
mynorm.ref <- normCytof(sce.test5, beads = "dvs", k = 50, norm_to = refsam, assays = c("counts", "exprs"), overwrite = TRUE, plot = TRUE)
mynorm.noref <- normCytof(sce.test5, beads = "dvs", k = 50, assays = c("counts", "exprs"), overwrite = TRUE, plot = TRUE)
```

The first plot is from the mynorm.ref call (reference FCS specified via norm_to = refsam). The second is from the mynorm.noref call (no reference); we have run this before, and it looks normal.

Session info (relevant entries only):
R version 4.3.2 (2023-10-31)
CATALYST_1.26.1

jeffsun905 · 2024-09-27T14:51:46Z

Are you able to replicate the issue? I suspect that when a reference is specified, the first figure may somehow have used transformed data. I looked a bit more at the function, and there are two lines of code that compute the baseline values (bl), depending on whether a reference is provided. The two differ in scale by roughly 100-fold (about 2k vs. 20-something), which might explain the odd line plot. Does this affect the one-vs-one normalization suggested in the first response (https://github.com/prybakowska/CyTOF_analysis_Pipeline1/blob/master/pipeline.R), which seems to use the function as well? Thank you for looking into this!
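The suspected scale mismatch can be illustrated with the numbers from this comment. This is a hypothetical sketch, not CATALYST's code: it assumes the per-event normalization factor is a least-squares slope through the origin between smoothed bead intensities and the baseline, as in the published bead-normalization scheme. If the baseline is about 100 times smaller than the bead intensities it is compared with, the factor drops to about 0.01, which would pull all normalized values towards 0, consistent with the "after" lines in the first plot.

```r
# Least-squares slope through the origin: the factor m minimising sum((bl - m * s)^2)
slope <- function(s, bl) sum(s * bl) / sum(s^2)

bead_raw <- c(2000, 2100, 1900)   # smoothed bead intensities on the raw scale (illustrative)
bl_same  <- c(2000, 2050, 1950)   # baseline on the same scale
bl_small <- c(20, 21, 19)         # baseline ~100x smaller, like the "20-something" values

slope(bead_raw, bl_same)    # about 0.999: data are barely rescaled
slope(bead_raw, bl_small)   # about 0.01: every event is shrunk ~100-fold
```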

HelenaLC mentioned this issue on Nov 29, 2024 in "Parallelised PrepData()? #413" (Open).