generated from RajLabMSSM/echoverseTemplate
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Fibril_Seurat.RMD
283 lines (212 loc) · 9.68 KB
/
Fibril_Seurat.RMD
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
---
title: "Parkinson Disease Stimulation Exp - Unstim Samples"
author: "Ashley Richardson"
date: "2024-01-22"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### Rmarkdown to pre-process scRNA seq from fibril stimulated PBMCs.
## Create the Seurat Object.
```{r Library Loading, warning=FALSE}
.libPaths(c("/hpc/packages/minerva-centos7/rpackages/4.2.0/site-library", "/hpc/packages/minerva-centos7/rpackages/bioconductor/3.15", .libPaths()))
library(sctransform, lib.loc = "/hpc/users/richaa21/.Rlib")
library(Seurat, lib.loc = "/hpc/users/richaa21/.Rlib")
library(patchwork)
library(dplyr)
```
```{r Create Seurat Object, paged.print=TRUE, warning=FALSE}
pbmc.data <- Read10X(data.dir = "/sc/arion/projects/ad-omics/ashley/data/aF/outs/per_sample_outs/aF/count/sample_filtered_feature_bc_matrix")
pbmc.f <- CreateSeuratObject(counts = pbmc.data, project = "fibril", min.cells = 3, min.features = 200)
pbmc.f
```
Lets check how many cells and features we are starting with.
```{r}
length(colnames(pbmc.f)) #number of cells
```
```{r}
length(rownames(pbmc.f)) #number of features
```
## Quality Control (QC)
a. QC mitochondiral genes. The PercentFeatureSet() calculates the % of counts originating from a set of features - here we are first looking at mitochondiral features, which are genes starting with MT.
b. By using [[ ]], I am adding columns to the pbmc matrix to store this QC data.
```{r QC: Mitochondria content, warning=FALSE}
pbmc.f[["percent.mt"]] <- PercentageFeatureSet(pbmc.f, pattern = "^MT-")
# Show QC metrics for the first 5 cells
head([email protected], 5)
```
```{r QC: VLNplot, warning = FALSE}
library(farver, lib.loc = "/hpc/packages/minerva-centos7/rpackages/4.2.0/site-library")
VlnPlot(pbmc.f, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
```
```{r QC: ScatterPLot, warning=FALSE}
plot1 <- FeatureScatter(pbmc.f, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc.f, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1
plot2
```
```{r QC: remove cells with high MT, warning = FALSE}
# I will select for %mt under 10% and features greater than 200 to capture alive healthy cells.
pbmc.f <- subset(pbmc.f, subset = nFeature_RNA > 200 & percent.mt < 10)
head([email protected])
```
## Normalize the data.
The data is normalized based on the feature (number of genes in a cell) by the total expression. This number is multiplied by 10,000 and then log transformed. The function to do this is "NormalizeData." The values specied below are the default values of this function.
```{r Normalize Data}
pbmc.f <- NormalizeData(pbmc.f, normalization.method = "LogNormalize", scale.factor = 10000)
```
## Identification of highly variable features
```{r Identify Highly Variable Features, warning = FALSE}
pbmc.f <- FindVariableFeatures(pbmc.f, selection.method = "vst", nfeatures = 2000)
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc.f), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(pbmc.f)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE, xnudge = 0, ynudge = 0)
plot1
plot2
```
## Scaling the Data
The data is scaled by doing a linear transformation.
The ScaleData function does this by shifting the expression of genes so that the mean expression becomes 0 and the variance is 1.
By default, only variable genes are scaled. This is changed by features = all.genes
```{r Scale Data, warning=FALSE}
##Scaling RNA data, we only scale the variable features here for efficiency
all.genes <- rownames(pbmc.f)
pbmc.f <- ScaleData(pbmc.f, vars.to.regress = c("percent.mt"))
```
## Perform linear dimensional reduction
For the first principal components, RunPCA shows the most positive (correlated) and most negative (anticorrelated) genes
```{r Linear Dimensional Reduction, warning=FALSE}
pbmc.f <- RunPCA(pbmc.f, features = VariableFeatures(object = pbmc.f))
print(pbmc.f[["pca"]], dims = 1:5, nfeatures = 5)
```
The PCA results can be visualized in different ways.
```{r Dim Plot of PCs, warning=FALSE}
VizDimLoadings(pbmc.f, dims = 1:2, reduction = "pca")
```
```{r PC2 vs PC1 scatter plot, warning=FALSE}
DimPlot(pbmc.f, reduction = "pca") + NoLegend()
```
```{r Heatmap of PC1, warning=FALSE}
DimHeatmap(pbmc.f, dims = 1, cells = 500, balanced = TRUE)
```
```{r Heatmap of all PCs}
DimHeatmap(pbmc.f, dims = 1:15, cells = 500, balanced = TRUE)
```
## Determine the ‘dimensionality’ of the dataset
Cells will be clustered based on PCA. How many PC to use is dependent on many factors. For example, if trying to analyze a rare cell subset, you might want to add more PCs. Usually, the first 10 is good to see dimensionality of the data.
```{r ElbowPlot of PCs, warning=FALSE}
ElbowPlot(pbmc.f)
```
## Cluster the cells.
```{r Cluster Cells, warning=FALSE}
##We select the top 15 PCs for clustering and tSNE based on PCElbowPlot
pbmc.f <- FindNeighbors(pbmc.f, reduction = "pca", dims = 1:15)
pbmc.f <- FindClusters(pbmc.f, resolution = 0.5, verbose = FALSE)
```
```{r Head Clusters, warning=FALSE}
head(Idents(pbmc.f), 5)
```
## Run non-linear dimensional reduction (UMAP/tSNE)
```{r, warning=FALSE}
pbmc.f <- RunUMAP(pbmc.f, dims = 1:15)
```
```{r umap, warning=FALSE}
# note that you can set `label = TRUE` or use the LabelClusters function to help label
# individual clusters
DimPlot(pbmc.f, reduction = "umap", label = TRUE)
```
```{r}
pbmc.f <- RunTSNE(pbmc.f, reduction = "pca", dims = 1:15)
DimPlot(pbmc.f, reduction = "tsne", label = TRUE)
```
Lets check our metadata now of the seurat object to see what has been added.
```{r}
head([email protected])
#We now have seurat clusters and RNA_snn_res.0.5 columns added.
```
## Finding differentially expressed features (cluster biomarkers)
```{r markers for every cluster, warning=FALSE}
# find markers for every cluster compared to all remaining cells, report only the positive
# ones
markers_f <- FindAllMarkers(pbmc.f, only.pos = TRUE)
cluster5.markers <- FindMarkers(pbmc.f, ident.1 = 5, logfc.threshold = 0.25, test.use = "roc", only.pos = TRUE)
```
VlnPlot will display the differential expression across the clusters.
For example, I am looking here at CD8A and CD4 expression in the clusters.
```{r vlnplot of specified marker, warning=FALSE}
VlnPlot(pbmc.f, features = c("CD8A", "CD4"))
```
The raw counts can also be shown instead by adding some parameters.
```{r, warning=FALSE}
VlnPlot(pbmc.f, features = c("CD8A", "CD4"), slot = "counts", log = TRUE)
```
```{r umap feature plot, warning=FALSE}
FeaturePlot(pbmc.f, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",
"CD8A"))
```
### Combine Seurat object with demuxlet output.
## First, load and edit the .best demuxlet output file to make it more compatible.
```{r}
.libPaths(c("/hpc/packages/minerva-centos7/rpackages/4.2.0/site-library", "/hpc/packages/minerva-centos7/rpackages/bioconductor/3.15", .libPaths()))
library(patchwork)
library(tidyr)
f_demuxlet = read.delim("/sc/arion/projects/ad-omics/ashley/data/aF.best", header = T, stringsAsFactors = F, check.names = F)
head(f_demuxlet)
```
To edit: I will split the Best column into multiple columns.
```{r}
f_demuxlet_edit = f_demuxlet %>%
mutate(BEST = gsub("-01","", BEST)) %>%
separate(BEST, into=c("DMX_classification.global","DMX_maxID","DMX_secondID"), sep="-") %>%
separate(DMX_maxID, into=c("DMX_garbage1","DMX_garbage2","DMX_maxID"), sep ="_") %>%
separate(DMX_secondID, into=c("DMX_garbage3","DMX_garbage4","DMX_secondID"), sep ="_") %>%
select(-contains("garbage"))
head(f_demuxlet_edit)
```
```{r}
table(f_demuxlet_edit$DMX_classification.global) #num of singlets and doublets
```
```{r}
table(f_demuxlet_edit$DMX_maxID) #number of cells identified as each donor
```
```{r}
table(f_demuxlet_edit[,c("DMX_classification.global","DMX_maxID")]) #number of singlets or doublets identified as each donor
```
## Next, I will add this edited demuxlet to the Seurat object.
```{r}
f_demuxlet_edit.subset <- f_demuxlet_edit[f_demuxlet_edit$BARCODE %in% colnames(pbmc.f),]
[email protected] <- cbind([email protected],f_demuxlet_edit.subset$DMX_maxID,f_demuxlet_edit.subset$DMX_classification.global, f_demuxlet_edit.subset$BARCODE)
head([email protected])
```
## Pre-Processing the new object.
The Seurat Object has already been preprocessed in my case, so this should be clean)
```{r}
pbmc.f[["percent.mt"]] <- PercentageFeatureSet(pbmc.f, pattern = "^MT-")
library(cowplot)
# look at distribution of metrics by classification
plot_grid(VlnPlot(pbmc.f, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, group.by = "f_demuxlet_edit.subset$DMX_maxID"))
```
```{r}
VlnPlot(pbmc.f, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, group.by = "f_demuxlet_edit.subset$DMX_classification.global")
```
Now, select for MT content under 10% and for nfeatureRNA > 200 to ensure getting alive cells.
I previously did this, so should see no change in cell numnbers.
```{r}
pbmc.f <- subset(pbmc.f, subset = nFeature_RNA > 200 & percent.mt < 10)
length(colnames(pbmc.f))
length(rownames(pbmc.f))
```
Add column to meta data to identify seurat object as Basline condition. Also rename some columns for clarity purposes.
```{r}
[email protected]$condition <- 'Fibril'
names([email protected])[names([email protected]) == "f_demuxlet_edit.subset$DMX_classification.global"] <- "DMX_classification.global"
names([email protected])[names([email protected]) == "f_demuxlet_edit.subset$DMX_maxID"] <- "DMX_maxID"
names([email protected])[names([email protected]) == "f_demuxlet_edit.subset$BARCODE"] <- "Barcode"
head([email protected])
```
```{r Save as .finalRDS file}
saveRDS(pbmc.f, file = "/sc/arion/projects/ad-omics/ashley/PD_Stim/pbmc.fibril.final.RDS")
```