sanitize_datatable = function(df, ...) {
# remove dashes which cause wrapping
DT::datatable(df, ..., rownames=gsub("-", "_", rownames(df)),
colnames=gsub("-", "_", colnames(df)))
}
Overview
Project: r project
PI: r PI
Analyst: r analyst
Experiment: r experiment
Aim: r aim
Samples and metadata
###Main factor of interest###
# This will be used to order the samples and annotate the QC figures
FOI <- "tissue_type"
## Edited May 8th: metadata is now read from the nf-core input CSV, which may contain duplicated rows when a sample was run on multiple lanes; keep one row per sample ##
meta_df <- read_csv(metadata_fn) %>% arrange(.data[[FOI]]) %>% distinct(sample, .keep_all = TRUE)
order <- meta_df$sample
ggplot(meta_df, aes(.data[[FOI]], fill = .data[[FOI]])) +
geom_bar() + ylab("") + xlab("") +
scale_fill_cb_friendly()
# get min and max percent mapped reads for reference
min_pct_mapped <- round(min(metrics$mapped_reads/metrics$total_reads)*100,1)
max_pct_mapped <- round(max(metrics$mapped_reads/metrics$total_reads)*100,1)
Mapping rate
The genomic mapping rate represents the percentage of reads mapping to the reference genome. We want to see consistent mapping rates between samples and over 70% mapping. These samples have mapping rates (r min_pct_mapped - r max_pct_mapped%).
We expect the box plots to be consistent between samples, i.e. the distribution of counts across genes should be similar.
# all_of() selects the column named by the FOI string (.data[[ ]] inside select() is deprecated)
metrics_small <- metrics %>% dplyr::select(sample, all_of(FOI))
metrics_small <- left_join(sample_names, metrics_small)
counts <-
assays(se)[["counts"]] %>%
as_tibble() %>%
filter(rowSums(.)!=0) %>%
gather(name, counts)
### Just for this analysis, not to be permanently added#####
counts$name <- gsub("X4","4", counts$name)
counts <- left_join(counts, metrics_small, by = c("name" = "sample"))
ggplot(counts, aes(factor(name, levels = order),
log2(counts+1),
fill = .data[[FOI]])) +
geom_boxplot() +
scale_fill_cb_friendly() +
coord_flip() + xlab("") +
ggtitle("Counts per gene, all non-zero genes") +
scale_color_cb_friendly()
Sample similarity analysis
In this section, we look at how well the different groups in the dataset cluster with each other. Samples from the same group should ideally cluster together. We use Principal Component Analysis (PCA).
Principal component analysis (PCA) {.tabset}
Principal Component Analysis (PCA) is a statistical technique used to simplify high-dimensional data by identifying patterns and reducing the number of variables. In the context of gene expression, PCA helps analyze large datasets containing information about the expression levels of thousands of genes across different samples (e.g., tissues, cells).
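As a rough illustration of the idea described above, the sketch below runs a PCA on a small made-up expression matrix using base R's `prcomp()`. It is not the `degPCA()` call used in this report; the matrix, the gene and sample names, and the injected group shift are all assumptions chosen only to show what the variance explained and the sample coordinates look like.

```r
# Toy expression matrix: 6 genes (rows) x 4 samples (columns); values are made up.
set.seed(1)
expr <- matrix(rnorm(24, mean = 8, sd = 1), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# Shift three genes in two samples to mimic a difference between groups.
expr[1:3, c("sample3", "sample4")] <- expr[1:3, c("sample3", "sample4")] + 4

# prcomp() expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components (what a PCA plot displays).
scores <- pca$x[, 1:2]

print(round(var_explained, 3))
print(scores)
```

In a real analysis the input would be variance-stabilized counts rather than simulated values, and samples from the same group would be expected to sit close together in the `scores` table.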
raw_counts <- assays(se)[["counts"]] %>% round() %>%
as_tibble() %>%
filter(rowSums(.)!=0) %>%
as.matrix()
### Just for this analysis, not to be permanently added#####
colnames(raw_counts) <- gsub("X4","4", colnames(raw_counts))
vst <- vst(raw_counts)
# fix sample names
coldat_for_pca <- as.data.frame(metrics)
rownames(coldat_for_pca) <- coldat_for_pca$sample
coldat_for_pca <- coldat_for_pca[colnames(raw_counts),]
pca1 <- degPCA(vst, coldat_for_pca,
condition = FOI, data = T)[["plot"]]
pca2 <- degPCA(vst, coldat_for_pca,
condition = FOI, data = T, pc1="PC3", pc2="PC4")[["plot"]]
pca1 + scale_color_cb_friendly()
pca2 + scale_color_cb_friendly()
## Hierarchical clustering
vst_cor <- cor(vst)
annotation_cols <- cb_friendly_pal('grey')(length(unique(coldat_for_pca[[FOI]])))
names(annotation_cols) <- unique(coldat_for_pca[[FOI]])
## Note: I am still unable to get the annotation colors to work with the CB palette ##
pheatmap(vst_cor, annotation_col = coldat_for_pca[,FOI, drop=F], show_rownames = T, show_colnames = T, color = cb_friendly_pal('heatmap')(15))
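One thing that may be worth checking for the annotation colors: `pheatmap()` takes them through its `annotation_colors` argument, which expects a named list with one element per annotation column, each element being a named vector that maps group labels to colors. The sketch below only shows that shape. The annotation table, the group labels, and the hex colors are hypothetical stand-ins, and it is an assumption that the CB palette output can be dropped in the same way.

```r
FOI_example <- "tissue_type"  # same column name as FOI above

# Hypothetical sample annotation; group labels and hex colors are made up.
coldat_example <- data.frame(
  tissue_type = c("liver", "liver", "kidney", "kidney"),
  row.names = paste0("sample", 1:4),
  stringsAsFactors = FALSE
)

groups <- unique(coldat_example[[FOI_example]])
group_cols <- c("#E69F00", "#56B4E9")[seq_along(groups)]
names(group_cols) <- groups

# Named list: one element per annotation column, each a named color vector.
ann_colors <- setNames(list(group_cols), FOI_example)
str(ann_colors)

# Draw a small heatmap only if pheatmap is installed.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(1)
  m <- cor(matrix(rnorm(40), nrow = 10,
                  dimnames = list(NULL, rownames(coldat_example))))
  pheatmap::pheatmap(m,
                     annotation_col = coldat_example[, FOI_example, drop = FALSE],
                     annotation_colors = ann_colors)
}
```

The list name has to match the annotation column name exactly, and the vector names have to match the group labels, otherwise pheatmap falls back to its default colors.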
R session
List of tools and their versions used to generate the QC report.
sessionInfo()
Read metrics {.tabset}
Total reads
Here, we want to see consistency and a minimum of 20 million reads.
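A minimal sketch of how the 20 million read threshold could be checked programmatically. The `metrics_example` table, its sample names, and its read counts are made up for illustration; the real `metrics` object in this report may be structured differently.

```r
# Hypothetical metrics table; sample names and read counts are made up.
metrics_example <- data.frame(
  sample = c("s1", "s2", "s3"),
  total_reads = c(25.1e6, 18.4e6, 31.0e6),
  stringsAsFactors = FALSE
)

min_reads <- 20e6  # minimum depth stated in the text

# Flag samples that fall below the minimum sequencing depth.
metrics_example$below_min <- metrics_example$total_reads < min_reads
low_depth <- metrics_example$sample[metrics_example$below_min]

print(low_depth)
```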
Mapping rate
The genomic mapping rate represents the percentage of reads mapping to the reference genome. We want to see consistent mapping rates between samples and over 70% mapping. These samples have mapping rates (r min_pct_mapped - r max_pct_mapped%).
Number of genes detected
The number of genes represented in every sample is expected to be consistent and over 20K (blue line).
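For reference, one simple way to count detected genes per sample from a count matrix is sketched below. The toy matrix is made up, and treating a gene as detected when it has at least one read is an assumption; the metric plotted in this report may use a different definition.

```r
# Toy count matrix: 5 genes x 3 samples; values are made up.
counts_example <- matrix(
  c(0, 12, 3,
    5,  0, 6,
    9,  7, 1,
    0,  0, 0,
    2,  4, 8),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), c("s1", "s2", "s3"))
)

# Assumption: a gene is "detected" in a sample if it has at least one read.
genes_detected <- colSums(counts_example > 0)

print(genes_detected)
```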
Gene detection saturation
This plot shows how complex the samples are. We expect samples with more reads to detect more genes.
Exonic mapping rate
Here we are looking for consistency, and exonic mapping rates around 70% or 75% (blue and red lines, respectively).
Intronic mapping rate
Here, we expect a low intronic mapping rate (at most 15% to 20%).
Intergenic mapping rate
Here, we expect a low intergenic mapping rate, which is true for all samples.
rRNA mapping rate
Samples should have a ribosomal RNA (rRNA) "contamination" rate below 10%.
5'->3' bias
There should be little bias, i.e. the values should be close to 1, or at least consistent among samples.
Counts per gene - all genes
We expect the box plots to be consistent between samples, i.e. the distribution of counts across genes should be similar.