Analysis of an unknown single-cell dataset for the Single Cell and Spatial Omics class of the QCB master
- QC Metrics: Evaluate number of genes detected per cell, total counts per cell, and percentage of reads mapping to mitochondrial genes or spike-in RNAs (ERCCs).
- Filtering: Remove cells with low counts, low gene detection, or high mitochondrial content (or high ERCC counts).
- Normalization: Apply `LogNormalize` to scale each cell's counts to a common total, then log-transform the result.
- Variable Feature Identification: Use `FindVariableFeatures` to identify highly variable genes.
- Scaling: Standardize each gene to zero mean and unit variance across cells using `ScaleData`.
- PCA: Perform Principal Component Analysis to reduce data dimensionality.
- UMAP/t-SNE: Further reduce dimensions for visualization using UMAP or t-SNE.
- Clustering: Group cells into clusters using the Louvain or Leiden algorithm based on gene expression patterns.
- Marker Genes: Conduct differential expression analysis (`FindMarkers`) to identify cluster-specific marker genes.
- Cell Cycle Scoring: Assign cell cycle phases (G1, S, G2/M) to each cell using known cell cycle markers.
- Cluster Annotation: Annotate clusters based on known marker genes or databases (e.g., CellMarker, PanglaoDB).
- Hypothesis Formation: Formulate a hypothesis about the tissue origin and cell types based on clusters and marker genes.
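
The steps above can be sketched as a single Seurat function. This is a minimal illustration, not the exact code used in the analysis. The function name `run_sc_workflow`, the input `counts` (a genes x cells raw count matrix), and all thresholds and parameter values are assumptions. The `^MT-` pattern and the `cc.genes.updated.2019` gene lists assume human gene symbols, which may not hold for an unknown dataset. `FindAllMarkers` is used here as a convenience wrapper that runs the `FindMarkers` test for every cluster.

```r
# Hypothetical wrapper around the Seurat workflow described above.
# Every default value below is an illustrative assumption, not a recommendation.
run_sc_workflow <- function(counts,
                            min_features = 200,   # assumed minimum genes per cell
                            max_percent_mt = 10,  # assumed mitochondrial cutoff (%)
                            n_dims = 10,          # assumed number of PCs to use
                            resolution = 0.5) {   # assumed clustering resolution
  obj <- Seurat::CreateSeuratObject(counts = counts)

  # QC metrics: nFeature_RNA and nCount_RNA are added when the object is created.
  # "^MT-" assumes human gene symbols; a pattern such as "^ERCC-" could be used
  # in the same way for spike-ins if they are present.
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(obj, pattern = "^MT-")

  # Filtering: keep cells passing the (assumed) thresholds.
  md <- obj@meta.data
  keep <- rownames(md)[md$nFeature_RNA >= min_features &
                         md$percent.mt <= max_percent_mt]
  obj <- subset(obj, cells = keep)

  # Normalization, variable features, scaling.
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize",
                               scale.factor = 10000)
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                      nfeatures = 2000)
  obj <- Seurat::ScaleData(obj)  # scales the variable features by default

  # Dimensionality reduction: PCA, then UMAP on the first n_dims components.
  obj <- Seurat::RunPCA(obj)
  obj <- Seurat::RunUMAP(obj, dims = seq_len(n_dims))

  # Clustering: Louvain is the default; algorithm = 4 would select Leiden.
  obj <- Seurat::FindNeighbors(obj, dims = seq_len(n_dims))
  obj <- Seurat::FindClusters(obj, resolution = resolution)

  # Marker genes: positive markers for each cluster versus all other cells.
  markers <- Seurat::FindAllMarkers(obj, only.pos = TRUE)

  # Cell cycle scoring with the human gene lists shipped with Seurat.
  cc <- Seurat::cc.genes.updated.2019
  obj <- Seurat::CellCycleScoring(obj, s.features = cc$s.genes,
                                  g2m.features = cc$g2m.genes)

  list(object = obj, markers = markers)
}
```

A call such as `res <- run_sc_workflow(raw_counts)` (with `raw_counts` a hypothetical count matrix) would return the processed object and the marker table. Cluster annotation and the tissue hypothesis would then be based on `res$markers`.
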
This analysis was conducted using the following R packages:
- Seurat: An R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
- ggplot2: An R package for data visualization based on the grammar of graphics.
- tidyverse: A collection of R packages for data manipulation and visualization, including ggplot2.
- patchwork: An R package to combine separate ggplots into a single graphic.
- HGNChelper: Functions for identifying and correcting HGNC human gene symbols and MGI mouse gene symbols.
- ggraph: Visualization of graph networks using ggplot2.
- igraph: Network analysis and visualization.
- data.tree: Create and manipulate tree structures from hierarchical data.
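
Before running the analysis, it may help to check that the packages listed above are installed. The following is a small sketch in base R. The variable names are hypothetical, and how any missing packages get installed (CRAN, Bioconductor, or otherwise) is left open.

```r
# Packages named in the list above.
required_pkgs <- c("Seurat", "ggplot2", "tidyverse", "patchwork",
                   "HGNChelper", "ggraph", "igraph", "data.tree")

# requireNamespace() returns TRUE/FALSE without raising an error.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach everything only when all packages are available.
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```
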