Script 5 - getGmt Error #2

Open
fi4sko opened this issue Apr 26, 2023 · 0 comments

fi4sko commented Apr 26, 2023

Some of you get the following error when importing certain PlantGSEA gmt files with getGmt:

> broadSet.C2.ALL <- getGmt("Osa.DetailInfo.csv", geneIdType=SymbolIdentifier())
Error in validObject(.Object) : 
  invalid class “GeneSetCollection” object: each setName must be distinct
In addition: Warning message:
In getGmt("Osa.DetailInfo.csv", geneIdType = SymbolIdentifier()) :
  5788 record(s) contain duplicate ids: 'DE_NOVO'_IMP_BIOSYNTHETIC_PROCESS, 'DE_NOVO'_PYRIMIDINE_NUCLEOBASE_BIOSYNTHETIC_PROCESS, ..., ZINC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY, ZINC_ION_TRANSPORT

The error is caused by gene set names that appear more than once in the file. I don't know why these duplicates occur, but they can be removed in R with the quick solution below.
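Before removing anything, it can help to see which names are actually repeated. The following is a minimal sketch in base R, assuming the first tab-separated field on each line is the gene set name; `duplicated_set_names` is a hypothetical helper name, not part of GSEABase.

```r
# Hypothetical helper: list the gene set names that occur more than once
# in a tab-separated GMT-style file. Assumes the set name is the first field.
duplicated_set_names <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]            # ignore empty lines
  set_names <- sub("\t.*$", "", lines)     # keep the text before the first tab
  unique(set_names[duplicated(set_names)])
}

# Example call (file name taken from the issue):
# duplicated_set_names("Osa.DetailInfo.csv")
```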

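# Quick solution
library(GSEABase)  # provides getGmt() and SymbolIdentifier()
library(tibble)    # provides as_tibble()
# 1. Add the ".csv" extension to the downloaded file; for rice, the file downloaded from PlantGSEA is named "Osa.DetailInfo"
# 2. Read the file (tab-separated, no header row)
tmp = read.csv("Osa.DetailInfo.csv", header = FALSE, sep = "\t")
# 3. Convert to a tibble (as.tibble() is deprecated in favour of as_tibble())
tmp = as_tibble(tmp)
# 4. Remove duplicates, keeping the first row for each gene set name (column V1)
tmp = tmp[!duplicated(tmp$V1), ]
# 5. Write the new file; quote = FALSE keeps quotation marks out of the set names
write.table(tmp, "OsaUnique.csv", sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
# 6. Read the de-duplicated file as GMT
broadSet.Osa.Unique = getGmt("OsaUnique.csv", geneIdType = SymbolIdentifier())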
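As an alternative, the same de-duplication can be sketched without the data frame round trip by working on raw lines, which leaves every retained line byte-for-byte unchanged. This is only a sketch under the assumption that the set name is the first tab-separated field; `dedupe_gmt` and the output file name are hypothetical. Like the steps above, it keeps the first occurrence of each name and drops later ones, so any genes listed only in a later duplicate are lost.

```r
# Hypothetical helper: drop lines whose gene set name (first tab-separated
# field) has already been seen, and write the remaining lines unchanged.
dedupe_gmt <- function(infile, outfile) {
  lines <- readLines(infile, warn = FALSE)
  lines <- lines[nzchar(lines)]            # ignore empty lines
  set_names <- sub("\t.*$", "", lines)     # text before the first tab
  keep <- !duplicated(set_names)           # TRUE for the first occurrence only
  writeLines(lines[keep], outfile)
  invisible(sum(!keep))                    # number of lines removed
}

# Example usage (requires GSEABase for the second line):
# dedupe_gmt("Osa.DetailInfo", "OsaUnique.gmt")
# broadSet.Osa.Unique <- getGmt("OsaUnique.gmt", geneIdType = SymbolIdentifier())
```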