It seems the MSU Rice Genome Annotation Project annotation is very dated, so I would not spend much time on it. What I would suggest instead is getting the annotations using Mercator: https://www.plabipd.de/mercator_main.html
It requires a FASTA file of the peptides for the genome you used to map the reads, and it gives you an annotation within minutes.
However, it will require a bit of coding to turn the result into GMT gene sets. I will try it tomorrow.
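As a rough idea of the "bit of coding" mentioned above, here is a minimal Python sketch that groups sequence identifiers by bin name and emits one GMT line per bin. It assumes the Mercator result is a tab-separated table with a header row containing `NAME` and `IDENTIFIER` columns and values wrapped in single quotes; that layout, the function name `mercator_to_gmt_lines`, and the `mercator_bin` description text are assumptions for illustration, so check them against your actual output file.

```python
from collections import defaultdict


def mercator_to_gmt_lines(lines, description="mercator_bin"):
    """Turn a Mercator-style annotation table into GMT lines.

    Assumed input (not verified against a real file): tab-separated text,
    first row is a header with NAME and IDENTIFIER columns, values may be
    wrapped in single quotes.
    """
    sets = defaultdict(list)  # bin name -> identifiers, in order of appearance
    header = None
    for line in lines:
        fields = [f.strip().strip("'") for f in line.rstrip("\n").split("\t")]
        if header is None:
            # Remember the column positions from the header row.
            header = {name: i for i, name in enumerate(fields)}
            continue
        needed = max(header["NAME"], header["IDENTIFIER"])
        if len(fields) <= needed:
            continue  # malformed or truncated row
        name = fields[header["NAME"]]
        identifier = fields[header["IDENTIFIER"]]
        if not name or not identifier:
            continue  # bin row without an assigned sequence
        if identifier not in sets[name]:
            sets[name].append(identifier)
    # GMT: set name, description, then the members, all tab-separated.
    return ["\t".join([name, description] + ids) for name, ids in sets.items()]


if __name__ == "__main__":
    # Placeholder rows; the bin names and identifiers are made up.
    example = [
        "BINCODE\tNAME\tIDENTIFIER\tDESCRIPTION\tTYPE",
        "'1.1'\t'bin A'\t'seq_1'\t'some description'\tT",
        "'1.1'\t'bin A'\t'seq_2'\t'some description'\tT",
        "'2'\t'bin B'\t''\t''\t",
    ]
    for gmt_line in mercator_to_gmt_lines(example):
        print(gmt_line)
```

The identifiers in the resulting gene sets are whatever was in the peptide FASTA headers, so they should match the reference used for mapping as long as both come from the same release. If the FASTA carries transcript or protein IDs rather than gene IDs, an extra step to collapse them to gene level would still be needed.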
Checked GMT files: http://structuralbiology.cau.edu.cn/PlantGSEA/download.php
- GO (Gene Ontology) gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_GO
- Gene Family based gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_GFam
- KEGG gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_KEGG
- PO gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_PO
None of these files fully adheres to the GMT standard, which states that the genes must be separated by tabs. In these files the genes are separated by ",". That issue can of course be tackled. When doing so, however, a blocking problem arises: the gene codes in the GMT files differ from the codes used in the reference genome file "https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-56/plants/fasta/oryza_sativa/cdna/".
For example:
BioMart gene codes: Os12g0469300, Os07g0249200
MSU Rice Genome Annotation Project gene codes (used in the GMT files): LOC_Os01g07760, LOC_Os01g40630, LOC_Os03g59220
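The separator issue described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each line of the downloaded files has a set name and a description separated by tabs, followed by the genes joined with ","; that layout and the function name `normalize_gmt_line` are assumptions. The optional `id_map` argument is a hypothetical MSU-to-BioMart lookup that you would have to build from a real mapping table; the values in the usage example are placeholders, not real mappings.

```python
def normalize_gmt_line(line, id_map=None):
    """Rewrite one gene-set line so that every gene is tab-separated.

    Assumed input layout: name <TAB> description <TAB> gene1,gene2,...
    If id_map is given, genes are translated through it and genes
    without a mapping are dropped.
    """
    fields = line.rstrip("\n").split("\t")
    name = fields[0]
    description = fields[1] if len(fields) > 1 else ""
    genes = []
    for field in fields[2:]:
        for gene in field.split(","):
            gene = gene.strip()
            if not gene:
                continue
            if id_map is not None:
                gene = id_map.get(gene)
                if gene is None:
                    continue  # no counterpart in the target annotation
            if gene not in genes:
                genes.append(gene)
    return "\t".join([name, description] + genes)


if __name__ == "__main__":
    raw = "SET_1\texample set\tLOC_Os01g07760,LOC_Os01g40630,LOC_Os03g59220\n"
    print(normalize_gmt_line(raw))

    # Hypothetical mapping with placeholder target IDs.
    placeholder_map = {"LOC_Os01g07760": "TARGET_ID_1"}
    print(normalize_gmt_line(raw, id_map=placeholder_map))
```

Fixing the separators is the easy part. Without a trustworthy mapping between the two ID systems, the converted gene sets still will not match the IDs coming out of kallisto and tximport.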
At http://plants.ensembl.org/Oryza_sativa/Location/Viewdb=core;g=Os03g0786000;r=3:32624612-32627796;t=Os03t0786000-01 I found the following information (and only there) when displaying the information for one of the genes:
- Transcript: LOC_Os01g02240.1.1
- Gene: LOC_Os01g02240
- Protein product: LOC_Os01g02240.1
- Location: Chromosome 1: 678,778-684,594
- Gene type: Msu gene
- Strand: Reverse
- Base pairs: 4,758
- Amino acids: 1,585
- Analysis: Genes (MSU)
- Annotation method: Gene annotation by MSU Rice Genome Annotation Project dated 2011-10-31. These genes are included alongside the IRGSP annotations, but are not included in Compara or BioMart.
Genome Analysis
rGREAT: an R/bioconductor package for functional
enrichment on genomic regions
Unfortunately, I could not find any other suitable GMT files that use the BioMart gene codes (the ones I get from kallisto with the reference genome file and from tximport).