Oryza sativa: Found GMT files use different gene codes from that used in BioMart #4

IngoGiebel · 2023-04-26T16:30:53Z

None of these files fully adheres to the GMT standard, which requires the genes to be separated by tabs. In these files the genes are separated by ",". That issue can of course be worked around. Once it is, however, a knockout problem arises: the gene codes differ from the codes used in the reference genome file "https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-56/plants/fasta/oryza_sativa/cdna/".

For example:
BioMart gene codes: Os12g0469300, Os07g0249200
MSU Rice Genome Annotation Project gene codes (used in the GMT files): LOC_Os01g07760, LOC_Os01g40630, LOC_Os03g59220
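
A minimal sketch of how the two problems could be handled in a script. It assumes that each line of the downloaded files has the set name and a description as its first two tab-separated fields, with the genes following as a comma-separated list. The two ID patterns are inferred only from the examples above, and the function names are hypothetical. The sketch repairs the separator and reports which naming scheme an ID follows. It does not translate MSU codes into BioMart codes, which would need a separate mapping table.

```python
import re

# Assumed ID patterns, inferred only from the examples in this thread.
BIOMART_ID = re.compile(r"^Os\d{2}g\d{7}$")      # e.g. Os12g0469300
MSU_ID = re.compile(r"^LOC_Os\d{2}g\d{5}$")      # e.g. LOC_Os01g07760


def normalize_gmt_line(line):
    """Return the GMT line with tab-separated genes.

    Assumption: fields 1 and 2 are the set name and the description;
    every later field may contain comma-separated genes.
    """
    fields = line.rstrip("\r\n").split("\t")
    head, rest = fields[:2], fields[2:]
    genes = [g.strip() for field in rest for g in field.split(",") if g.strip()]
    return "\t".join(head + genes)


def id_scheme(gene_id):
    """Classify a gene code as 'biomart', 'msu' or 'unknown'."""
    if BIOMART_ID.match(gene_id):
        return "biomart"
    if MSU_ID.match(gene_id):
        return "msu"
    return "unknown"


if __name__ == "__main__":
    raw = "SET_1\tan example set\tLOC_Os01g07760,LOC_Os01g40630\n"
    fixed = normalize_gmt_line(raw)
    print(fixed)
    print({g: id_scheme(g) for g in fixed.split("\t")[2:]})
```

Counting the schemes over a whole file gives a quick check of whether a gene set collection can be used directly with the BioMart-based quantification results.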

At http://plants.ensembl.org/Oryza_sativa/Location/Viewdb=core;g=Os03g0786000;r=3:32624612-32627796;t=Os03t0786000-01 I found the following information (and only there) when displaying the details of one of the genes:

Transcript: LOC_Os01g02240.1.1
Gene: LOC_Os01g02240
Protein product: LOC_Os01g02240.1
Location: Chromosome 1: 678,778-684,594
Gene type: Msu gene
Strand: Reverse
Base pairs: 4,758
Amino acids: 1,585
Analysis: Genes (MSU)
Annotation method: Gene annotation by MSU Rice Genome Annotation Project dated 2011-10-31. These genes are included alongside the IRGSP annotations, but are not included in Compara or BioMart.

Genome Analysis
rGREAT: an R/Bioconductor package for functional enrichment on genomic regions

Unfortunately, I could not find any other suitable GMT files that use the BioMart gene codes (the codes I get from kallisto with the reference genome file, and from tximport).

fi4sko · 2023-04-26T19:39:32Z

It seems the MSU Rice Genome Annotation Project is very dated. I would not spend much time on it. What I would suggest is getting the annotations using Mercator: https://www.plabipd.de/mercator_main.html
It requires a FASTA file of the peptides for the genome you used to map the reads, and it gives you an annotation within minutes.
However, it will require a bit of coding to turn it into GMT gene sets. I will try it tomorrow.
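
A rough sketch of what that bit of coding might look like. Everything about the input layout is an assumption, not something confirmed in this thread: a tab-separated table with a header row containing the columns BINCODE, NAME and IDENTIFIER, with values optionally wrapped in single quotes, and bin rows without an assigned gene having an empty IDENTIFIER. The function name is hypothetical. Each bin becomes one GMT line.

```python
def mercator_to_gmt(table_text):
    """Turn an (assumed) Mercator result table into GMT lines, one per bin.

    Assumed layout: tab-separated, header row with BINCODE, NAME and
    IDENTIFIER columns, values optionally wrapped in single quotes.
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = [h.strip().strip("'") for h in lines[0].split("\t")]
    col = {name: i for i, name in enumerate(header)}
    i_code, i_name, i_gene = col["BINCODE"], col["NAME"], col["IDENTIFIER"]

    sets = {}  # bin code -> (bin name, [identifiers]); insertion order is kept
    for line in lines[1:]:
        cells = [c.strip().strip("'") for c in line.split("\t")]
        if len(cells) <= max(i_code, i_name, i_gene):
            continue  # malformed or truncated row
        gene = cells[i_gene]
        if not gene:
            continue  # bin row without an assigned identifier
        name, genes = sets.setdefault(cells[i_code], (cells[i_name], []))
        if gene not in genes:
            genes.append(gene)

    # GMT: set name, description, then the members, all tab-separated.
    return ["\t".join([code, name] + genes) for code, (name, genes) in sets.items()]
```

Since the input is a peptide FASTA, the identifiers in the resulting sets would be whatever IDs that FASTA uses, so they may still need to be mapped to the gene codes used in the quantification.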

fi4sko · 2023-04-26T19:42:09Z

Did it work with gost in gprofiler2?

IngoGiebel · 2023-04-26T20:35:59Z

gprofiler2 works fine! Well, the graph could be nicer, but yes, it works.


IngoGiebel commented Apr 26, 2023

- GO (Gene Ontology) gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_GO

- Gene Family based gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_GFam

- KEGG gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_KEGG

- PO gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_PO
