AVE cannot parse valid GFF file #13

Open
mkempenaar opened this issue Jun 24, 2013 · 3 comments
@mkempenaar

Firstly, a proper GFF file should contain a version number in its header (like '##gff-version 3'), but AVE treats this header line as a chromosome ID.

Also, the validator that I use (an installable one, from http://genometools.org/) says that the attributes column cannot contain keys containing capitals, since these are reserved for a number of predefined keys. Validation therefore fails for keys like 'Change', 'Strain', etc.

Furthermore, gene annotations are quoted strings that can contain semicolons, which crashes ave_tools.
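To illustrate the semicolon problem, here is a minimal sketch of an attribute splitter that ignores semicolons inside double quotes. It is not ave_tools code: the function name `split_attributes` and the sample values are made up for illustration. Strict GFF3 expects a literal semicolon in a value to be percent-encoded as `%3B`, so this only shows a tolerant way to read files that quote values instead.

```python
def split_attributes(column):
    """Split a GFF attributes column on semicolons outside double quotes.

    Hypothetical helper for illustration; returns a dict of key -> value.
    """
    fields, current, in_quotes = [], [], False
    for char in column:
        if char == '"':
            # Toggle quoted state; keep the quote so it can be stripped later.
            in_quotes = not in_quotes
            current.append(char)
        elif char == ';' and not in_quotes:
            # A semicolon outside quotes ends the current key=value field.
            fields.append(''.join(current))
            current = []
        else:
            current.append(char)
    fields.append(''.join(current))

    pairs = {}
    for field in fields:
        field = field.strip()
        if not field:
            continue  # skip empty fields, e.g. after a trailing semicolon
        key, _, value = field.partition('=')
        pairs[key] = value.strip('"')
    return pairs


# Made-up example line, not taken from the linked annotation file.
attrs = split_attributes('ID=gene1;Note="kinase; putative";Name=abc')
print(attrs)
```

A plain `column.split(';')` would cut the quoted `Note` value in two, which is the kind of input that trips up a naive parser.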

@mkuzak
Member

mkuzak commented Jun 24, 2013

Hi Marcel,
I agree that a GFF file may contain comment lines. I'll add support for this to the parser.
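As a rough sketch of what that could look like (not the actual AVE parser; the function name `iter_feature_lines` is an assumption), lines starting with `#` can be skipped before the columns are split. This covers both `##` directives such as the version header and ordinary `#` comments.

```python
def iter_feature_lines(lines):
    """Yield the tab-separated columns of each feature line.

    Hypothetical helper: skips blank lines, '#' comments and '##' directives
    such as '##gff-version 3', so they are never read as a chromosome ID.
    """
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip() or line.startswith('#'):
            continue
        yield line.split('\t')


# Made-up input for illustration.
sample = [
    '##gff-version 3\n',
    '# a comment\n',
    '\n',
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n',
]
for columns in iter_feature_lines(sample):
    print(columns[0], columns[8])
```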

According to the GFF3 specification (http://www.sequenceontology.org/gff3.shtml), keys with capital letters are allowed:
"All attributes that begin with an uppercase letter are reserved for later use. Attributes that begin with a lowercase letter can be used freely by applications."
I understand that using upper case could cause problems in the future if attributes used by AVE are added to the GFF specification. I decided to use upper case to stick to the format used by the 1001 Genomes Project. I have to think about which is better.
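For reference, the rule quoted above can be checked with a few lines of Python. This is only a sketch under assumptions: the tag set below is the list of predefined attribute tags as I read it from the GFF3 specification, and `reserved_key_violations` is a made-up name, not a GenomeTools or AVE function.

```python
# Predefined attribute tags from the GFF3 specification (assumed complete here).
GFF3_PREDEFINED_TAGS = frozenset({
    'ID', 'Name', 'Alias', 'Parent', 'Target', 'Gap', 'Derives_from',
    'Note', 'Dbxref', 'Ontology_term', 'Is_circular',
})


def reserved_key_violations(keys):
    """Return keys that start with an uppercase letter but are not predefined.

    These are the keys a strict validator would be expected to flag.
    """
    return [
        key for key in keys
        if key[:1].isupper() and key not in GFF3_PREDEFINED_TAGS
    ]


print(reserved_key_violations(['ID', 'Change', 'Strain', 'note']))
```

Keys such as 'Change' and 'Strain' are flagged, while lowercase variants like 'change' would pass.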

Can you give an example of a gene annotation with a quoted string?

cheers,
Mateusz

@mkempenaar
Author

Hi Mateusz,

Thanks for the quick response. I've removed the GFF version header and replaced the semicolons in the files, and that worked just fine. Here are a few example lines of a GFF file that fails: https://gist.github.com/mkempenaar/eb1b3d341e0b0b09cd1a

The full file can be downloaded from: http://solanaceae.plantbiology.msu.edu/data/PGSC_DM_v3_2.1.11_pseudomolecule_annotation.gff.zip

Thanks!

@mkuzak
Member

mkuzak commented Jun 25, 2013

Thank you for this example. I'll see what I can do so that the import works on those quoted attribute values.

Mateusz
