unistbig/netGO

netGO is an R/Shiny package for network-integrated pathway enrichment analysis.
netGO provides interactive visualization of enrichment analysis results and the related networks.

Currently, netGO supports analysis for four species (Human, Mouse, Arabidopsis thaliana, and Yeast).
These data are available from the netGO-Data repository.

📋 Prerequisites

The following R packages (in alphabetical order) must be installed before running netGO.

devtools, doParallel, doSNOW, DT, foreach, googleVis, htmlwidgets, shiny, shinyCyJS, shinyjs, V8

  • Most of the packages are available from CRAN, but shinyCyJS must be installed from GitHub.

  • Linux users have to install V8 after installing the other packages.

  • Note that netGO is not supported on CentOS 8, because V8 is not available for CentOS 8.

System libraries required for V8:
On Debian / Ubuntu: libv8-dev or libnode-dev
On Fedora: v8-devel

The following code installs the required packages (the package version is noted in each comment).

install.packages('devtools') # 2.2.1
library(devtools) # make sure the Rcpp package is installed
install_github('unistbig/shinyCyJS')
install.packages('doParallel') # 1.0.15
install.packages('doSNOW') # 1.0.18
install.packages('DT') # 0.11
install.packages('foreach') # 1.4.7
install.packages('googleVis') # 0.6.4
install.packages('htmlwidgets') # 1.5.1
install.packages('shiny') # 1.4.0
install.packages('shinyjs') # 1.0
install.packages('V8') # 2.3
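To avoid reinstalling packages that are already present, one can first check which CRAN dependencies are missing. The sketch below uses a hypothetical helper, missing_packages (not part of netGO), that only reports missing packages; shinyCyJS is left out because it is installed from GitHub.

```r
# CRAN dependencies listed above (shinyCyJS comes from GitHub, so it is excluded)
cran_pkgs <- c("devtools", "doParallel", "doSNOW", "DT", "foreach",
               "googleVis", "htmlwidgets", "shiny", "shinyjs", "V8")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file(package = p) returns "" when package p is not found.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(found)]
}

to_install <- missing_packages(cran_pkgs)
print(to_install)
```

Passing to_install to install.packages() then installs only what is missing; on Linux, V8 may still need to be installed last, as noted above.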

🔧 Running with an example data

The following code runs netGO on the breast tumor dataset (GEO GSE3744).

library(devtools)
install_github('unistbig/netGO') # install netGO library

library(netGO) # load netGO library
DownloadExampleData() # Download and load the breast tumor data
obj = netGO(genes = brca[1:30], genesets, network, genesetV) 

# The user may also load the pre-calculated result using the following command
# load("brcaresult.RData")   

For custom data analysis, first build the genesetV object and then run netGO:

library(netGO)
userGenesetV = BuildGenesetV(genesets = userGenesets, network = userNetwork)
obj = netGO(genes = userGenes, genesets = userGenesets, network = userNetwork, genesetV = userGenesetV)
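userGenesets must be a named list of gene-sets (see Data Formats). If gene-set membership is stored as a two-column table, base R's split() produces that structure. The gene-set names, genes, and column names below are made-up examples.

```r
# Hypothetical membership table: one row per (gene-set, gene) pair
membership <- data.frame(
  geneset = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gene    = c("TP53", "MDM2", "BRCA1", "BRCA2", "TP53"),
  stringsAsFactors = FALSE
)

# split() groups genes by gene-set name into a named list of character vectors
userGenesets <- split(membership$gene, membership$geneset)

# Input genes as a character vector (e.g., differentially expressed genes)
userGenes <- c("TP53", "BRCA1", "EGFR")

str(userGenesets)
```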

Running this example takes 5 to 25 minutes, depending on the system. The analysis results of netGO are shown below.

The analysis results can be visualized using the following code:

netGOVis(obj, genes = brca[1:30], genesets, network, R = 50, Q = 0.25) # visualize netGO's result

To access the results without the Shiny web application, the following functions can be used to export them as tables or graph objects:

# exportGraphTxt
table = exportGraphTxt(brca[1:30], geneset = genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']], network) # table
head(table)

# exportGraph
graph = exportGraph(brca[1:30], geneset = genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']], network) # shinyCyJS graph object
shinyCyJS(graph)

# exportTable
table = exportTable(obj, R = 50, Q = 0.25) # table
head(table)

dtable = exportTable(obj, type='D', R = 50, Q = 0.25) # data.table
dtable

📝 Data

Example Datasets (netGO-Data repository)

Human

| Data | genes | genesets | network | genesetV |
| --- | --- | --- | --- | --- |
| Breast Tumor | brca.RData | c2gs.RData | networkString.RData / networkHumannet.RData | genesetVString1,2.RData / genesetVHumannet1,2.RData |
| P53 | p53.RData | c2gs.RData | networkString.RData / networkHumannet.RData | genesetVString1,2.RData / genesetVHumannet1,2.RData |
| Diabetes | dg.RData | cpGenesets.RData | networkString.RData / networkHumannet.RData | cpgenesetV1,2.RData |

The breast tumor data can be downloaded with the DownloadExampleData function (recommended).

Arabidopsis thaliana

| Data | genes | genesets | network | genesetV |
| --- | --- | --- | --- | --- |
| ShadowResponse | Aragenes.RData | KEGGara.RData | networkAranet.RData | AragenesetV.RData |

Mouse & Yeast (gene-sets and networks available)

| Species | genesets | network |
| --- | --- | --- |
| Mouse | KEGGmouse.Rdata | networkMousenet.Rdata |
| Yeast | KEGGyeast.Rdata | networkYeastnet.Rdata |

Data Formats

netGO requires the following four data types.

  • genes : a character vector of input genes (e.g., differentially expressed genes).

  • genesets : a named list of gene-sets consisting of groups of genes to be tested.

  • network : a numeric matrix of network data. The network scores are normalized to the unit interval [0,1] by dividing each score by the maximum score.

  • genesetV : a numeric matrix of pre-calculated interaction data between genes and gene-sets.
    The matrix must have dimensions (number of genes) × (number of gene-sets).
    It can be built with the BuildGenesetV function, using the network and genesets objects as input arguments.

    genesetV = BuildGenesetV(network = network, genesets = genesets)
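Custom interaction data often come as an edge list rather than a matrix. The sketch below, assuming a hypothetical edge list with columns from, to, and score and undirected interactions (as in the symmetric example matrix under the netGO function), builds a gene-by-gene matrix and normalizes it by dividing every score by the maximum score, as described above.

```r
# Hypothetical edge list: one row per interacting gene pair with a raw score
edges <- data.frame(
  from  = c("A", "A", "B"),
  to    = c("B", "C", "C"),
  score = c(100, 760, 324),
  stringsAsFactors = FALSE
)

genes_all <- sort(unique(c(edges$from, edges$to)))

# Square gene-by-gene matrix; 0 means no interaction
network <- matrix(0, nrow = length(genes_all), ncol = length(genes_all),
                  dimnames = list(genes_all, genes_all))

# Fill both (from, to) and (to, from) so the matrix is symmetric
idx <- cbind(edges$from, edges$to)
network[idx] <- edges$score
network[idx[, 2:1]] <- edges$score

# Normalize to the unit interval [0, 1] by dividing by the maximum score
network <- network / max(network)
round(network, 3)
```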

⚪ Functions


1. netGO

The netGO function tests the significance of each gene-set for the input gene list
and returns a data frame of the gene-sets, their p-values and q-values from netGO+, Fisher’s exact test, and netGO (optional), as well as their network interaction and overlap scores.

Input arguments

  • genes: a character vector of input genes (e.g., differentially expressed genes).

  • genesets: a named list of gene-sets consisting of groups of genes.

  • network: a numeric matrix of network data. The network scores are normalized to the unit interval [0,1], where 1 represents a strong interaction and 0 no interaction.

    A B C
    A 0 0.1 0.76
    B 0.1 0 0.324
    C 0.76 0.324 0
  • genesetV: a numeric matrix of pre-calculated interaction data between genes and gene-sets.
    This object can be built with BuildGenesetV function.

    Gene-set1 Gene-set2 Gene-set3
    A 0.837 1.647 0.074
    B 0 1.75 0.113
    C 0.464 0.486 2.442
  • alpha (optional): a numeric parameter ( ≥ 1; the default is 20) that weights the contribution of network connections in enrichment analysis.

  • beta (optional): a numeric parameter (∈[0,1]; the default is 0.5) that balances the weights between the relative and absolute network scores.

  • nperm (optional): a numeric parameter that determines the bin size (number of genes) used during resampling. The default is NULL, which assigns approximately 2000 genes to each bin.
  • pvalue (optional): a boolean parameter that determines whether to return Q-values only (FALSE) or both P-values and Q-values (TRUE).
  • plus (optional): a boolean parameter that determines whether to run both netGO and netGO+ (plus = FALSE) or netGO+ only (plus = TRUE, default).
  • verbose (optional): a boolean parameter that determines whether to print detailed progress messages while netGO runs.

Note that the input genes should be gene symbols when using the default networks and gene-sets (STRING and MSigDB).
Other types of gene names are also allowed if the corresponding customized data (networks and gene-set data) are used.
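Before running netGO, it can help to check that the input genes use the same identifiers as the network. The sketch below assumes, as in the example matrix above, that the network's row names are gene identifiers; the genes and scores are toy values, and the last input gene deliberately uses a different ID type.

```r
# Toy network with gene symbols as row and column names
syms <- c("TP53", "MDM2", "CDKN1A")
network <- matrix(c(0,    0.1,   0.76,
                    0.1,  0,     0.324,
                    0.76, 0.324, 0),
                  nrow = 3, dimnames = list(syms, syms))

# Input genes; the last entry is an Ensembl-style ID, not a symbol
genes <- c("TP53", "CDKN1A", "ENSG00000141510")

# Input genes that do not appear among the network's gene names
unmatched <- setdiff(genes, rownames(network))
# Fraction of input genes found in the network
matched_fraction <- mean(genes %in% rownames(network))

unmatched
matched_fraction
```

A high share of unmatched genes usually points to mismatched identifier types between the input genes and the network or gene-set data.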


2. netGOVis

The netGOVis function visualizes the analysis results in a web browser (Google Chrome is recommended).
The resulting graphs (svg format) and table are downloadable from the web browser.

Input arguments

  • obj: the data frame of analysis results returned by the netGO function.
    Its columns include the gene-set name, the p- and q-values evaluated using netGO (optional), netGO+, and Fisher’s exact test, and the overlap and network scores.
  • genes, genesets, network: the same as those in the netGO function.
  • R (optional): gene-set rank threshold. The default is 50 (the top 50 gene-sets in either method will be shown).
  • Q (optional): gene-set Q-value threshold. The default is 0.25 (gene-sets with Q-value ≤ 0.25 will be used).
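The sketch below illustrates one reading of how R and Q might combine: a gene-set is kept if, in at least one method, it ranks within the top R by q-value and its q-value is ≤ Q. This combination rule, the helper select_genesets, and the column names q_netGOplus and q_fisher are assumptions for illustration, not netGOVis's actual code or netGO's actual output columns.

```r
# Hypothetical result table; netGO's real column names may differ
res <- data.frame(
  geneset     = c("SET_A", "SET_B", "SET_C", "SET_D"),
  q_netGOplus = c(0.01, 0.30, 0.20, 0.90),
  q_fisher    = c(0.40, 0.05, 0.60, 0.80),
  stringsAsFactors = FALSE
)

# Assumed rule: keep a gene-set if, for at least one method,
# it ranks within the top R (smallest q-values) and has q-value <= Q
select_genesets <- function(res, R = 50, Q = 0.25) {
  passes <- function(q) rank(q, ties.method = "min") <= R & q <= Q
  res[passes(res$q_netGOplus) | passes(res$q_fisher), , drop = FALSE]
}

select_genesets(res, R = 2, Q = 0.25)
```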

After running the netGOVis function, the user may see the following logs in the R console,

and the user's default web browser (netGO was built for the Chrome environment) will display the following interactive visualization:


3. BuildGenesetV

The BuildGenesetV function builds the genesetV object from the given network and gene-sets.
genesetV holds pre-calculated interaction data between genes and gene-sets, which reduces the running time of netGO.

Input arguments

  • genesets, network: the same as those in the netGO function.

4. DownloadExampleData

This function downloads the example data (breast tumor, GSE3744) into the user's working directory and loads it into the user's R environment.
Note that if the data files already exist in the working directory, this function will not download them again, so we recommend removing them and downloading them again whenever the netGO package is updated.

Input arguments

  • none

The R objects brca, genesets, genesetV, network, and obj will be loaded.

5. exportGraph

The exportGraph function exports network data from the netGO analysis result as a graph object that can be displayed with the shinyCyJS function.

Input arguments

  • genes, network : the same as those in the netGO function.

  • geneset : a character vector of gene symbols (e.g., an element of the genesets object used in netGO).

For example,

geneset = genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']]
graph = exportGraph(brca[1:30], geneset = geneset, network) # shinyCyJS graph object

shinyCyJS(graph)

However, R's default viewer (unlike a web browser) does not apply the layout functions, as shown below.


6. exportGraphTxt

The exportGraphTxt function exports network data from the netGO analysis result in table format.

Input arguments

  • genes, network, geneset : the same as those in the exportGraph function.

For example,

table = exportGraphTxt(brca[1:30], geneset, network)
head(table)

The exported data look like this:

| geneA | geneB | strength | type |
| --- | --- | --- | --- |
| A | B | 0.1 | Inter |
| C | D | 0.82 | Inner |

'Inter' means geneB belongs to the intersection of the input genes and the gene-set. 'Inner' means geneB belongs to the set difference geneset \ genes (gene-set members that are not input genes).
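The two sets behind the type column can be illustrated with base R set operations. This is a toy sketch of the definitions above, not netGO's internal code; the gene vectors are made up.

```r
genes   <- c("TP53", "BRCA1", "EGFR")             # toy input genes
geneset <- c("TP53", "MDM2", "CDKN1A", "BRCA1")   # toy gene-set

inter <- intersect(genes, geneset)  # in both: corresponds to 'Inter'
inner <- setdiff(geneset, genes)    # gene-set members not in the input: 'Inner'

# Label each gene-set member the way the type column labels geneB
type <- ifelse(geneset %in% genes, "Inter", "Inner")
names(type) <- geneset
type
```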


7. exportTable

exportTable exports the netGO result object as a table or a data.table.

Input arguments

  • obj, R, Q : the same as those in the netGOVis function.

For example,

table = exportTable(obj, R = 50, Q = 0.25) # table
head(table)

dtable = exportTable(obj, type='D', R = 50, Q = 0.25) # data.table
dtable

The exported data have the format as follows:

| geneset name | netGO+ q-value | Fisher q-value |
| --- | --- | --- |
| genesetA | 0.11 | 0.2 |

📘 Visualization and exploration of netGO analysis results

The netGO analysis results are visualized through three panels: interaction networks, list of significant gene-sets, and the bubble chart.

Interaction Network

  • The network panel displays the input genes, selected gene-set, and the network connections between the two.
  • Sky blue (#48dbfb) nodes represent input genes (e.g., differentially expressed genes).
  • Yellow (#feca57) nodes represent genes in the selected gene-set.
  • Green (#1dd1a1) nodes represent the intersection of input genes and the gene-set.
  • The edge width represents the strength of interaction between two nodes.
  • Genes without edges are not displayed.
  • The gene-set can be selected by clicking on the gene-set name in the upper-right panel.
  • The user can download the graph image in SVG format.

Significant gene-sets

  • This panel contains the list of significant gene-sets as well as their Q-values (or P-values) evaluated from netGO, netGO+, and Fisher’s exact test. The table can be downloaded by clicking the ‘Download Table’ button in its upper-right corner.

Bubble chart

  • This module plots the bubble chart of significant gene-sets for the netGO+ results.
  • The overlap (x-axis) and network (y-axis) scores of the significant gene-sets are represented.
  • The size of each bubble represents the significance of the gene-set on a -log10 scale, -log10(Q-value).
  • Hovering over or clicking on a bubble shows its statistical values.
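The -log10 transform maps smaller Q-values to larger bubbles. A minimal illustration with toy netGO+ Q-values (not real results):

```r
q <- c(SET_A = 0.01, SET_B = 0.1, SET_C = 0.25)  # toy netGO+ Q-values
bubble_size <- -log10(q)                         # smaller Q-value -> larger bubble
round(bubble_size, 3)
```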

😊 Contact

📝 License

This project is MIT licensed