Non-negative Matrix Factorization for Oligo Lineage scRNA-seq Data
This repository contains an R script (script.R) for analyzing single-cell RNA sequencing (scRNA-seq) data using Non-negative Matrix Factorization (NMF) with the Seurat and RcppML libraries. Below is a step-by-step guide to reproducing the analysis:
- Load Required Libraries
  - Seurat: library(Seurat)
  - dplyr: library(dplyr)
  - RcppML: library(RcppML)
- Set Seed for Reproducibility
  set.seed(200)
- Load Seurat Object
  - Replace /data/nasser/Manuscript/processedobject/ODC35_woClus8_subclust3_res0.15_NK with your own path.
    ol <- readRDS("/data/nasser/Manuscript/processedobject/ODC35_woClus8_subclust3_res0.15_NK")
- Set Cell Type Identities
  Idents(ol) <- "CellType"
- Subset the Data
  - Choose specific cell types for analysis (iODC, iOPC, iPPC_0, iPPC_1, iPPC_2).
    pd <- subset(ol, idents = c("iODC", "iOPC", "iPPC_0", "iPPC_1", "iPPC_2"))
  - If subsetting by iCEP is required, uncomment and modify the corresponding line in the script.
- Optional: Visualize Using DimPlot
  - Uncomment DimPlot(pd) to visualize the subset data.
- Clean the Cluster (Optional)
  - Filter cells by their UMAP coordinates to keep only the region of interest (umap1 > -2 & umap2 > 1).
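The coordinate filter can be sketched in base R on a toy embedding. The matrix, cell names, and the column names umap1 and umap2 are placeholders. With a real Seurat object the coordinates would likely come from Embeddings(pd, reduction = "umap"), whose column names depend on the Seurat version, and the kept cells could be passed to subset(pd, cells = keep_cells).

```r
# Toy UMAP embedding (cells x 2); the coordinates are invented for illustration.
umap <- matrix(
  c(-3.0, 2.0,
     0.0, 3.0,
     1.0, 0.0,
    -1.0, 1.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), c("umap1", "umap2"))
)

# Keep only cells inside the region umap1 > -2 & umap2 > 1.
in_region <- umap[, "umap1"] > -2 & umap[, "umap2"] > 1
keep_cells <- rownames(umap)[in_region]
print(keep_cells)
```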
- Extract Expression Matrix
  - Extract the RNA expression matrix from the subsetted Seurat object.
    expression_matrix <- LayerData(object = pd, assay = "RNA", layer = "data")
  - Remove rows with NA or null values and those with all zero values.
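A minimal sketch of the row-filtering step, using a small dense matrix as a stand-in for the real expression matrix. The gene and cell names and the values are made up. The matrix returned by LayerData is typically sparse, in which case the rowSums method from the Matrix package would probably be needed instead of the base one.

```r
# Toy stand-in for the expression matrix (genes x cells); values are invented.
expression_matrix <- matrix(
  c( 1, 0, 2,
     0, 0, 0,
    NA, 3, 1,
     4, 5, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("cell1", "cell2", "cell3"))
)

# Drop rows (genes) that contain any NA value.
keep_complete <- rowSums(is.na(expression_matrix)) == 0
expression_matrix <- expression_matrix[keep_complete, , drop = FALSE]

# Drop rows (genes) that are zero in every cell.
keep_nonzero <- rowSums(expression_matrix != 0) > 0
expression_matrix <- expression_matrix[keep_nonzero, , drop = FALSE]

print(rownames(expression_matrix))
```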
- Set Number of Clusters
  - Determine the number of clusters based on unique cell types.
    num_clusters <- length(unique(pd$CellType))
- Perform NMF
  - Apply NMF to the expression matrix.
    nmf_result <- nmf(expression_matrix, k = num_clusters, tol = 1e-4, maxit = 500)
- Extract Basis (W) and Coefficient (H) Matrices
  - Retrieve the basis (W) and coefficient (H) matrices from the NMF result.
    W <- nmf_result$w
    H <- nmf_result$h
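For intuition about what W and H represent, the sketch below factorizes a small random non-negative matrix in base R using the classic Lee-Seung multiplicative updates. This is a swapped-in textbook method for illustration only, not the solver RcppML uses. The toy matrix A, the rank k = 2, and the iteration count are assumptions.

```r
set.seed(200)

# Toy non-negative matrix (6 genes x 5 cells); values are random placeholders.
A <- matrix(runif(6 * 5), nrow = 6)
k <- 2  # hypothetical number of factors

# Random non-negative starting points for the basis (W) and coefficients (H).
W <- matrix(runif(nrow(A) * k), nrow = nrow(A))
H <- matrix(runif(k * ncol(A)), nrow = k)
eps <- 1e-9  # guards against division by zero

err_start <- norm(A - W %*% H, "F")

# Multiplicative updates keep W and H non-negative while reducing the
# Frobenius reconstruction error of A ~ W %*% H.
for (i in seq_len(200)) {
  H <- H * (t(W) %*% A) / (t(W) %*% W %*% H + eps)
  W <- W * (A %*% t(H)) / (W %*% H %*% t(H) + eps)
}

err_end <- norm(A - W %*% H, "F")
cat("error before:", err_start, "after:", err_end, "\n")
```

Each column of W is a gene-loading profile for one factor, and each row of H gives that factor's weight in every cell.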
- Identify Most Influential Genes
  - Determine the top influential genes for each cluster.
  - Store the results in the influential_genes list.
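One way to pick the top genes per factor is to rank the rows of W within each column. The sketch below does this in base R on a toy W. The values, the factor names, and the cutoff top_n are hypothetical, and the script itself may select genes differently.

```r
# Toy basis matrix W (genes x factors); values are invented for illustration.
W <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.7, 0.3,
    0.0, 0.6),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), c("nmf1", "nmf2"))
)

top_n <- 2  # hypothetical cutoff; the script may use a different number

# For each factor (column of W), rank genes by loading and keep the top_n names.
influential_genes <- lapply(seq_len(ncol(W)), function(j) {
  ranked <- order(W[, j], decreasing = TRUE)
  rownames(W)[ranked[seq_len(top_n)]]
})
names(influential_genes) <- colnames(W)

print(influential_genes)
```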
- Convert Results to Data Frame
  - Convert the list of influential genes to a tidy data frame (influential_genes_df).
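A tidy layout here means one row per (cluster, gene) pair. The base R sketch below shows one possible conversion. The list contents and the column names cluster and gene are assumptions, and the script may use dplyr for this step instead.

```r
# Hypothetical list of top genes per factor, as produced by the previous step.
influential_genes <- list(
  nmf1 = c("geneA", "geneC"),
  nmf2 = c("geneB", "geneD")
)

# One row per (cluster, gene) pair: repeat each list name once per gene.
influential_genes_df <- data.frame(
  cluster = rep(names(influential_genes), lengths(influential_genes)),
  gene = unlist(influential_genes, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(influential_genes_df)
```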
- Example Visualization
  - Generate feature plots for each cluster using influential genes.
- Adjust paths (readRDS) and parameters (subset, nmf) based on your specific data and analysis requirements.
- Ensure all necessary libraries (Seurat, dplyr, RcppML) are installed and loaded before running the script.