Some of you may get an error like the following when importing some of the PlantGSEA GMT files:
> broadSet.C2.ALL <- getGmt("Osa.DetailInfo.csv", geneIdType=SymbolIdentifier())
Error in validObject(.Object) :
invalid class “GeneSetCollection” object: each setName must be distinct
In addition: Warning message:
In getGmt("Osa.DetailInfo.csv", geneIdType = SymbolIdentifier()) :
5788 record(s) contain duplicate ids: 'DE_NOVO'_IMP_BIOSYNTHETIC_PROCESS, 'DE_NOVO'_PYRIMIDINE_NUCLEOBASE_BIOSYNTHETIC_PROCESS, ..., ZINC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY, ZINC_ION_TRANSPORT
The error is caused by duplicated gene set names. I don't know why such duplicates occur in the file, but they can be easily removed in R with the code below.
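# Quick solution
library(GSEABase)  # provides getGmt() and SymbolIdentifier()
library(tibble)    # provides as_tibble()
# 1. Add the ".csv" extension to the downloaded file; here, for rice, the file downloaded from PlantGSEA is named "Osa.DetailInfo"
# 2. Read the file (tab-separated, no header row)
tmp <- read.csv("Osa.DetailInfo.csv", header = FALSE, sep = "\t")
# 3. Make a tibble (as.tibble() is deprecated in favour of as_tibble())
tmp <- as_tibble(tmp)
# 4. Remove duplicates: keep only the first row for each gene set name (column V1)
tmp <- tmp[!duplicated(tmp$V1), ]
# 5. Write the new file; quote = FALSE keeps literal double quotes out of the set names and gene IDs
write.table(tmp, "OsaUnique.csv", sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
# 6. Read the file as GMT
broadSet.Osa.Unique <- getGmt("OsaUnique.csv", geneIdType = SymbolIdentifier())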
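A possible alternative, in case the table-based round trip causes trouble: GMT lines usually have different numbers of genes, and read.csv() may pad or reshape such ragged rows. The sketch below works on raw lines instead and keeps the first line for each set name. The helper name dedup_gmt_lines and the output file name "OsaUnique.gmt" are made up for illustration, and I have not tested this against every PlantGSEA file.

```r
# Hypothetical helper (not part of GSEABase): keep only the first line
# for each gene set name in a vector of GMT lines.
dedup_gmt_lines <- function(lines) {
  # Drop completely empty lines.
  lines <- lines[nzchar(lines)]
  # The set name is everything before the first tab.
  set_names <- sub("\t.*$", "", lines)
  lines[!duplicated(set_names)]
}

gmt_in  <- "Osa.DetailInfo"   # file as downloaded from PlantGSEA
gmt_out <- "OsaUnique.gmt"    # assumed output name
if (file.exists(gmt_in)) {
  writeLines(dedup_gmt_lines(readLines(gmt_in)), gmt_out)
  # Then import as before (requires GSEABase):
  # broadSet.Osa.Unique <- GSEABase::getGmt(gmt_out, geneIdType = GSEABase::SymbolIdentifier())
}
```

As with the quick solution above, this silently discards the later copies of a duplicated set, so check whether the duplicates carry different gene lists before relying on it.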