Gene sets and functions for working with them.
This package contains the following data sets:
smoking
: Blood transcriptome gene signatures associated with cigarette smoking, from the Huan et al. 2016 meta-analysis: http://dx.doi.org/10.1093/hmg/ddw288
stress
: Gene expression levels in blood found as signatures of stress in five different studies. See the in-package description (?stress) for details.
To install the package from GitHub:
devtools::install_github("3inar/geneset")
If you're working with gene sets from MSigDB, it's quite likely that you have files in the Gene Matrix Transposed (GMT) format; the load_gmt() function will read a .gmt file into a gset object:
library(geneset)
geneset <- load_gmt("tests/testthat/testgmt.gmt") # dummy .gmt file for testing
geneset
## $names
## [1] "set1" "set2" "name w space"
##
## $descriptions
## [1] "description 1" "description 2" "description3"
##
## $genesets
## $genesets[[1]]
## [1] "a" "b" "c"
##
## $genesets[[2]]
## [1] "d" "e" "f"
##
## $genesets[[3]]
## [1] "a" "b" "c" "d" "e" "f" "g"
##
##
## attr(,"class")
## [1] "gset"
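For readers unfamiliar with the format: each line of a GMT file is tab-separated, holding a set name, a description, and then the gene symbols. The following is a minimal base-R sketch of that layout, not the package's load_gmt() implementation; parse_gmt_lines and the sample lines are hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of geneset): split GMT-style lines into
# names, descriptions, and gene vectors.
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  list(
    names        = vapply(fields, function(f) f[1], character(1)),
    descriptions = vapply(fields, function(f) f[2], character(1)),
    genesets     = lapply(fields, function(f) f[-(1:2)])  # drop name and description
  )
}

# Two made-up GMT lines: name <TAB> description <TAB> gene <TAB> gene ...
gmt_text <- c("set1\tdescription 1\ta\tb\tc",
              "set2\tdescription 2\td\te\tf")

parsed <- parse_gmt_lines(gmt_text)
parsed$names
parsed$genesets[[2]]
```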
You can subset gset objects like you would a vector. There is also a length() method for them that returns the number of sets in the gset:
geneset[2]
## $names
## [1] "set2"
##
## $descriptions
## [1] "description 2"
##
## $genesets
## $genesets[[1]]
## [1] "d" "e" "f"
##
##
## attr(,"class")
## [1] "gset"
geneset[c(TRUE, FALSE, TRUE)]
## $names
## [1] "set1" "name w space"
##
## $descriptions
## [1] "description 1" "description3"
##
## $genesets
## $genesets[[1]]
## [1] "a" "b" "c"
##
## $genesets[[2]]
## [1] "a" "b" "c" "d" "e" "f" "g"
##
##
## attr(,"class")
## [1] "gset"
length(geneset)
## [1] 3
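Vector-like subsetting of this kind can be built with S3 methods. The sketch below shows one way to do it on a toy class with the same three fields as the printed output above. The class name toyset and both methods are assumptions made for illustration; the package's own methods may be written differently.

```r
# Toy object mirroring the fields shown in the printed gset output.
toy <- structure(
  list(
    names        = c("set1", "set2", "name w space"),
    descriptions = c("description 1", "description 2", "description3"),
    genesets     = list(c("a", "b", "c"),
                        c("d", "e", "f"),
                        c("a", "b", "c", "d", "e", "f", "g"))
  ),
  class = "toyset"  # hypothetical class name, not "gset"
)

# Subsetting applies the same index to all three parallel fields.
`[.toyset` <- function(x, i) {
  structure(
    list(names        = x$names[i],
         descriptions = x$descriptions[i],
         genesets     = x$genesets[i]),
    class = "toyset"
  )
}

# The length is the number of sets, not the number of list fields.
length.toyset <- function(x) length(x$names)

length(toy)
toy[c(TRUE, FALSE, TRUE)]$names
```

Keeping names, descriptions, and gene vectors in parallel fields means a single index, whether numeric or logical, selects consistent entries from all of them.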
Inevitably, some gene sets will contain symbols that, for one reason or another, aren't present in the data set you're investigating. These can be removed with gsintersect():
mygenes <- c("a", "b", "d", "e", "f")
geneset <- gsintersect(geneset, mygenes); geneset
## $names
## [1] "set1" "set2" "name w space"
##
## $descriptions
## [1] "description 1" "description 2" "description3"
##
## $genesets
## $genesets[[1]]
## [1] "a" "b"
##
## $genesets[[2]]
## [1] "d" "e" "f"
##
## $genesets[[3]]
## [1] "a" "b" "d" "e" "f"
##
##
## attr(,"class")
## [1] "gset"
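The same restriction can be illustrated on a plain list of character vectors with base R's intersect(). This is only a sketch of the idea, assuming the gene sets are stored as a list; it is not a description of how gsintersect() is implemented.

```r
# Plain-list stand-in for the gene sets in the dummy .gmt file.
sets <- list(c("a", "b", "c"),
             c("d", "e", "f"),
             c("a", "b", "c", "d", "e", "f", "g"))
mygenes <- c("a", "b", "d", "e", "f")

# Keep only the symbols of each set that also occur in mygenes.
restricted <- lapply(sets, intersect, mygenes)
restricted
```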
Perhaps a two-gene set is too small to be taken seriously, for whatever reason. gsfilter() will remove gene sets with cardinality outside the provided limits:
geneset <- gsfilter(geneset, min=3); geneset
## $names
## [1] "set2" "name w space"
##
## $descriptions
## [1] "description 2" "description3"
##
## $genesets
## $genesets[[1]]
## [1] "d" "e" "f"
##
## $genesets[[2]]
## [1] "a" "b" "d" "e" "f"
##
##
## attr(,"class")
## [1] "gset"
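Size filtering of this kind comes down to comparing each set's cardinality against the limits. The base-R sketch below works on a plain list; filter_by_size and its max argument are hypothetical names chosen for illustration, and only the min argument appears in the gsfilter() call above.

```r
# Hypothetical helper (not part of geneset): keep sets whose size lies
# within [min, max].
filter_by_size <- function(sets, min = 0, max = Inf) {
  sizes <- lengths(sets)  # number of genes in each set
  sets[sizes >= min & sizes <= max]
}

sets <- list(c("a", "b"),
             c("d", "e", "f"),
             c("a", "b", "d", "e", "f"))

kept <- filter_by_size(sets, min = 3)
lengths(kept)
```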