Script 5 - getGmt Error #2

fi4sko · 2023-04-26T13:32:09Z

Some of you get errors like the one below when importing some of the PlantGSEA GMT files with getGmt():

> broadSet.C2.ALL <- getGmt("Osa.DetailInfo.csv", geneIdType=SymbolIdentifier())
Error in validObject(.Object) : 
  invalid class “GeneSetCollection” object: each setName must be distinct
In addition: Warning message:
In getGmt("Osa.DetailInfo.csv", geneIdType = SymbolIdentifier()) :
  5788 record(s) contain duplicate ids: 'DE_NOVO'_IMP_BIOSYNTHETIC_PROCESS, 'DE_NOVO'_PYRIMIDINE_NUCLEOBASE_BIOSYNTHETIC_PROCESS, ..., ZINC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY, ZINC_ION_TRANSPORT

The error is caused by duplicated gene set names in the file. I don't know why the file contains such duplicates, but they can be removed easily in R with the quick solution below.
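Before removing anything, it can help to see which names are repeated and how often. The following is a minimal sketch, not part of the original script: `duplicated_set_names` is a hypothetical helper, and it assumes the gene set name is the first tab-separated field on each line. The demo file is made up so the snippet runs without the PlantGSEA download.

```r
# Sketch: list gene set names that occur more than once in a GMT-style file.
# Assumption: the set name is the text before the first tab on each line.
duplicated_set_names <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]          # ignore empty lines
  set_names <- sub("\t.*$", "", lines)   # keep only the first field
  counts <- table(set_names)
  counts[counts > 1]                     # names seen at least twice
}

# Made-up demo data (not taken from PlantGSEA)
demo <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdesc\tg1\tg2",
             "SET_B\tdesc\tg3",
             "SET_A\tdesc\tg4"), demo)

dups <- duplicated_set_names(demo)
print(dups)
```

On the real file, the call would be `duplicated_set_names("Osa.DetailInfo.csv")`.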

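# Quick solution
# 1. Add the ".csv" extension to the downloaded file; for rice, the file downloaded from PlantGSEA is named "Osa.DetailInfo"
library(GSEABase)  # provides getGmt() and SymbolIdentifier()
# 2. Read the file (tab-separated, no header row)
tmp = read.csv("Osa.DetailInfo.csv", header = FALSE, sep = "\t")
# 3. Make a tibble (optional; as_tibble() replaces the deprecated as.tibble())
tmp = tibble::as_tibble(tmp)
# 4. Remove rows with a duplicated gene set name (first column), keeping the first occurrence
tmp = tmp[!duplicated(tmp$V1), ]
# 5. Write the new file; quote = FALSE stops write.table from wrapping fields in double quotes
write.table(tmp, "OsaUnique.csv", sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
# 6. Read the file as GMT
broadSet.Osa.Unique = getGmt("OsaUnique.csv", geneIdType = SymbolIdentifier())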
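A possible alternative is to deduplicate the raw lines instead of going through a data frame. `read.csv` sizes the table from the first few lines of input, so a file whose rows have different numbers of fields may not round-trip cleanly; working line by line avoids that and leaves every kept line unchanged. This is only a sketch: `dedupe_gmt` is a hypothetical helper, it assumes the set name is the first tab-separated field, and keeping the first occurrence assumes that the repeated entries are interchangeable, which is not verified here. The demo data is made up.

```r
# Sketch: drop lines whose gene set name was already seen, keeping the first one.
# Assumption: the set name is the text before the first tab on each line.
dedupe_gmt <- function(infile, outfile) {
  lines <- readLines(infile, warn = FALSE)
  lines <- lines[nzchar(lines)]          # ignore empty lines
  set_names <- sub("\t.*$", "", lines)   # first field of every line
  keep <- !duplicated(set_names)
  writeLines(lines[keep], outfile)
  invisible(sum(!keep))                  # number of lines dropped
}

# Made-up demo data (not taken from PlantGSEA)
infile  <- tempfile(fileext = ".gmt")
outfile <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdesc\tg1\tg2",
             "SET_B\tdesc\tg3",
             "SET_A\tdesc\tg4"), infile)

dropped <- dedupe_gmt(infile, outfile)
result  <- readLines(outfile)
print(result)
```

With the real file this would be `dedupe_gmt("Osa.DetailInfo.csv", "OsaUnique.csv")`, followed by the same `getGmt("OsaUnique.csv", geneIdType = SymbolIdentifier())` call as in step 6.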
