Genome Coordinates into mzTab proteomics #177

Open
ypriverol opened this issue May 14, 2019 · 1 comment

@ypriverol

@andrewrobertjones @timosachsenberg @jgriss :

We have an ongoing project to map our identifications to Ensembl genome coordinates, and we have been doing this for a while. Internally, PRIDE moved to mzTab a long time ago, so our PSMs are stored in mzTab for every project. We have a tool that reads the mzTab and tries to map the PSMs onto reference genomes. However, we would also like to keep that information in the mzTab files, as we did in mzIdentML 1.2. This is really important to us because we want to annotate our datasets.

I was checking the current implementation in mzid 1.2, where this information is represented in the PeptideEvidence objects like this:

<PeptideEvidence dBSequence_ref="dbseq_generic|A_ENSP00000471242.1|" peptide_ref="LALWEGR_" start="606" end="612" pre="R" post="S" isDecoy="false" id="LALWEGR_generic|A_ENSP00000471242.1|_606_612">
    <userParam name="psm_count" value="1"></userParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002640" name="peptide end on chromosome" value="98424581"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002641" name="peptide exon count" value="2"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002642" name="peptide exon nucleotide sizes" value="11,10"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002643" name="peptide start positions on chromosome" value="98412025,98424571"></cvParam>
</PeptideEvidence>
<PeptideEvidence dBSequence_ref="dbseq_generic|A_ENSP00000479861.1|" peptide_ref="GRLYPWGVVEVENPEHNDFLK_" start="290" end="310" pre="R" post="L" isDecoy="false" id="GRLYPWGVVEVENPEHNDFLK_generic|A_ENSP00000479861.1|_290_310">
    <userParam name="psm_count" value="2"></userParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002640" name="peptide end on chromosome" value="241343880"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002641" name="peptide exon count" value="1"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002642" name="peptide exon nucleotide sizes" value="63"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002643" name="peptide start positions on chromosome" value="241343819"></cvParam>
</PeptideEvidence>
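
For readers who want to see what these four terms carry, here is a minimal Python sketch that pulls the genome-coordinate cvParams out of one PeptideEvidence element. The accessions are copied from the example above; the field names (`end`, `exon_count`, `exon_sizes`, `starts`), the function name `genome_mapping`, and the shortened snippet are assumptions made for illustration, not part of mzIdentML or any PRIDE tool.

```python
import xml.etree.ElementTree as ET

# Accessions as they appear in the example above. The short field names on
# the right are hypothetical, chosen only for this sketch.
GENOME_CV = {
    "MS:1002640": "end",
    "MS:1002641": "exon_count",
    "MS:1002642": "exon_sizes",
    "MS:1002643": "starts",
}
LIST_FIELDS = {"exon_sizes", "starts"}


def genome_mapping(peptide_evidence):
    """Collect the genome-coordinate cvParams of one PeptideEvidence element."""
    mapping = {}
    for node in peptide_evidence.iter():
        # Real mzIdentML files declare a default namespace, so compare only
        # the local part of the tag name.
        if node.tag.rsplit("}", 1)[-1] != "cvParam":
            continue
        field = GENOME_CV.get(node.get("accession"))
        if field is None:
            continue
        value = node.get("value")
        if field in LIST_FIELDS:
            mapping[field] = [int(v) for v in value.split(",")]
        else:
            mapping[field] = int(value)
    return mapping


# Shortened copy of the first element above (most attributes dropped).
SNIPPET = """
<PeptideEvidence peptide_ref="LALWEGR_" start="606" end="612" id="pe_1">
    <userParam name="psm_count" value="1"></userParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002640" name="peptide end on chromosome" value="98424581"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002641" name="peptide exon count" value="2"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002642" name="peptide exon nucleotide sizes" value="11,10"></cvParam>
    <cvParam cvRef="PSI-MS" accession="MS:1002643" name="peptide start positions on chromosome" value="98412025,98424571"></cvParam>
</PeptideEvidence>
"""

element = ET.fromstring(SNIPPET)
print(genome_mapping(element))
```

The point of the sketch is that one mapping is already four separate values, two of which are lists, which is what makes a flat PSM table awkward.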

I would like to reuse the cvParam style used in mzid, but mzTab has no PeptideEvidence concept. This annotation would therefore have to be added to the PSM section using optional CV parameter columns. With optional columns, we don't need to change the mzTab schema. The problem is that, because these are PSMs, each one can map to multiple genome coordinates. Suggestions?
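
A rough sketch of what the optional-column route could look like for a single mapping, in Python. It assumes the mzTab 1.0 convention for optional CV parameter columns, `opt_{identifier}_cv_{accession}_{parameter name}` with spaces replaced by underscores; check that against the specification before relying on it. The helper names `opt_cv_column` and `psm_genome_cells` are hypothetical, and the sketch deliberately does not solve the multiple-mapping case raised above.

```python
# The four PSI-MS terms used in the mzIdentML example above.
GENOME_TERMS = [
    ("MS:1002640", "peptide end on chromosome"),
    ("MS:1002641", "peptide exon count"),
    ("MS:1002642", "peptide exon nucleotide sizes"),
    ("MS:1002643", "peptide start positions on chromosome"),
]


def opt_cv_column(accession, name, identifier="global"):
    """Build an optional CV column header (naming convention is an assumption)."""
    return f"opt_{identifier}_cv_{accession}_{name.replace(' ', '_')}"


def psm_genome_cells(mapping):
    """Render ONE genome mapping as cell values, in GENOME_TERMS order."""
    return [
        str(mapping["end"]),
        str(mapping["exon_count"]),
        ",".join(str(n) for n in mapping["exon_sizes"]),
        ",".join(str(n) for n in mapping["starts"]),
    ]


headers = [opt_cv_column(acc, name) for acc, name in GENOME_TERMS]

# Values from the first PeptideEvidence element in the example above.
mapping = {
    "end": 98424581,
    "exon_count": 2,
    "exon_sizes": [11, 10],
    "starts": [98412025, 98424571],
}

# mzTab is tab-separated, so the extra columns are appended to the PSM row.
print("\t".join(headers))
print("\t".join(psm_genome_cells(mapping)))
```

With one mapping per PSM this needs four extra columns and no schema change. A second mapping for the same PSM has no obvious place to go without an additional grouping rule.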

@andrewrobertjones

Hi @ypriverol, yes, I understand the issue. There is no easy way to compress the info into one CV param, even for a single mapping, so you will need to use multiple CV params if you go down this route. You would then also have to perform some complicated grouping to handle multiple mappings.

Given that the mzTab files are largely for internal consumption (and proBed is absolutely designed for this case anyway), you should either convert to proBed properly, or just come up with a hacky userParam that covers this with your own fixed format, e.g. "98424581:2:11,10:98412025,98424571;241343880:1:63:241343819" from the above example.
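
The fixed format suggested above can be sketched as a small encoder and decoder in Python. The field order (end, exon count, exon sizes, start positions, joined by ":" with mappings separated by ";") is inferred from how the example string lines up with the cvParam values earlier in the thread; the function names and dictionary keys are hypothetical.

```python
def encode_mappings(mappings):
    """Serialise genome mappings as 'end:count:sizes:starts;...'."""
    chunks = []
    for m in mappings:
        chunks.append(":".join([
            str(m["end"]),
            str(m["exon_count"]),
            ",".join(str(n) for n in m["exon_sizes"]),
            ",".join(str(n) for n in m["starts"]),
        ]))
    return ";".join(chunks)


def decode_mappings(text):
    """Parse the string produced by encode_mappings back into dictionaries."""
    mappings = []
    for chunk in text.split(";"):
        end, count, sizes, starts = chunk.split(":")
        mappings.append({
            "end": int(end),
            "exon_count": int(count),
            "exon_sizes": [int(n) for n in sizes.split(",")],
            "starts": [int(n) for n in starts.split(",")],
        })
    return mappings


# The example string quoted in the comment above.
EXAMPLE = "98424581:2:11,10:98412025,98424571;241343880:1:63:241343819"

decoded = decode_mappings(EXAMPLE)
print(decoded)
print(encode_mappings(decoded) == EXAMPLE)
```

Because the whole value lives in one cell, any number of mappings fits into a single optional column, at the cost of a format that only the producing and consuming tools understand.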
