Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

define variable modifications in calculateFragments() #167

Open
adder opened this issue Oct 24, 2016 · 17 comments
Open

define variable modifications in calculateFragments() #167

adder opened this issue Oct 24, 2016 · 17 comments
Assignees

Comments

@adder
Copy link
Contributor

adder commented Oct 24, 2016

Hi,
It seems you can only define fixed modifications with calculateFragments().
Am I right? If not, how to do it?
I think a feature like this could be useful.
Posible ways of implementing this are:
1)
Provide a vector of masses/modifications with length = peptide sequence length.
The position in the vector corresponds to the position in the peptide sequence

use lettercode in the peptide sequence and link the lettercode to a modification wit a names list.
(much like fragmentpeptide() and PeptideSpectrum() in the OrgMassSpecR R package)

What do you guys think?

@sgibb
Copy link
Collaborator

sgibb commented Oct 24, 2016

Maybe I don't completely understand your request but at least your second suggestion is already possible:

library("MSnbase")

calculateFragments("ACE", modifications=c(C=57.02146))
# Modifications used: C=57.02146
#          mz ion type pos z seq
# 1  72.04439  b1    b   1 1   A
# 2 232.07504  b2    b   2 1  AC
# 3 361.11763  b3    b   3 1 ACE
# 4 148.06043  y1    y   1 1   E
# 5 308.09108  y2    y   2 1  CE
# 6 379.12819  y3    y   3 1 ACE
# 7 115.06278 y1_   y_   1 1   E
# 8 275.09343 y2_   y_   2 1  CE
# 9 346.13054 y3_   y_   3 1 ACE

calculateFragments("ACE", modifications=c(C=57.02146, A=-1))
# Modifications used: C=57.02146, A=-1
#          mz ion type pos z seq
# 1  71.04439  b1    b   1 1   A
# 2 231.07504  b2    b   2 1  AC
# 3 360.11763  b3    b   3 1 ACE
# 4 148.06043  y1    y   1 1   E
# 5 308.09108  y2    y   2 1  CE
# 6 378.12819  y3    y   3 1 ACE
# 7 115.06278 y1_   y_   1 1   E
# 8 275.09343 y2_   y_   2 1  CE
# 9 345.13054 y3_   y_   3 1 ACE

@sgibb sgibb self-assigned this Oct 24, 2016
@adder
Copy link
Contributor Author

adder commented Oct 24, 2016

If I get it correctly, calculateFragments("ACE", modifications=c(C=57.02146, A=-1)) still defines a fixed modification. (So all alanines in the peptide sequence (eg. ACAE) will have a -1 mass shift)
Sometimes you want to define a variable modification (eg. only the second A is modified)

You could this by giving a modification vector with the modification giving at the postion in the vector matching the position in the sequence
calculateFragments("ACAE", fixed.modifications=c(C=57.02146),variable.modifications=c(0,0,-1,0))

the OrgMassSpecR R package solved this by using a lettercode in the peptide sequence and link the lettercode to a modification with list. (see fragmentpeptide() and PeptideSpectrum() functions)
You could this by giving a modification vector with the modification giving at the postion in the vector matching the position in the sequence
eg.
calculateFragments("ACAmE", fixed.modifications=c(C=57.02146),variable.modifications=c(m = -1))

A more biological relevant use case would be phosphorylation. Eg, serine can get phosporylated but not all serines in the protein will be phosporylated.
Say that you have a spectra identified wiht a phosogroup at the 2nd serine in the sequence.
It would be nice to be able to label the spectrum using the identified peptide sequence with calculateFragments()
I hope I'm clear enough this time :)

@sgibb
Copy link
Collaborator

sgibb commented Oct 24, 2016

Ok, thanks for the clarification. Great idea. I like your second solution more than the first. For the first you have to know the length of the peptide sequence before (or count it manually).

@sgibb
Copy link
Collaborator

sgibb commented Oct 24, 2016

@lgatto we changed the way the modification argument was handled in MSnbase => 1.17.6. Currently the modification is added to the amino acid (before the modification replaced the amino acid).

In favor to handle the fixed.modifications and variable.modifications in the same way I would prefer to replace the original mass with the modification (I would use a single argument modification with upper letters for general/fixed amino acid modifications and lower letters for individual/variable modifications). I know this changes are not user-friendly. Or do you have another suggestion? I am sorry for breaking the API so often.

@sgibb
Copy link
Collaborator

sgibb commented Oct 24, 2016

Sry, we have to rethink the handling of modification once again. We changed to add instead of replace because we allow modifications to Cterm and Nterm: #47 (comment)

  • for Nterm/Cterm add would be useful (we don't know before which amino acid is at the N/C term, so replace would not be an option)
  • for variable modifications replace would be useful (otherwise we also need to know that e.g. p is the phosphorylated serine and have to calculate S + p which would require some preprocessing by the user)
  • for fixed modifications it doesn't really matter (currently add; we changed that in MSnbase 1.17.6; see comment above)

@adder
Copy link
Contributor Author

adder commented Oct 24, 2016

Hi,
Letters in the sequence to define modification are fine be me. :)
Maybe some extra comments:
1)
allow multi letter codes (so we are not restricted by the 24 letters in the alphabet to define PTMs)
eg, SphosAMPSoxE to define phospo on S and oxidation on second S

Not sure what you mean by adding or replacing the original mass with modification mass.
But it might be useful to know that modification masses are usually added to original masses.
eg the modification defined in the unimod database are defined as mass differences. Not sure if it make a big difference on your code internally

Maybe something to consider, nterm en cterm modification might also be variable.
If seen following notation in other software. (not sure if there is 'official nomenclature' for this)
no modification
H-SAMPLE-OH
nterm acetylation
ace-SAMPLE-OH
But since you only have 1 nterm or cterm in peptide sequence, putting it as a fixed modification would work.

Finally, since you can work with mzid format in MSnBase. MZidentML also has an specific way to handle modified peptide sequence as defined in MZidentML guidelines.

Actually you could use a PSM entry in the mzid format to calculate the fragments and label a spectra.
But this is probably something less straightforward to add to the current functions short-term :)

@lgatto
Copy link
Owner

lgatto commented Oct 27, 2016

@sgibb - trying to catch up (still travelling, until next week).

A few thoughts

  • Following mzIdentML conventions, as mentioned by @adder in the post above could be a winning strategy in the long run. Something to keep in mind for the future, especially as calculateFragments seems to be a much used function, based on the feature requests we have.
  • I don't like the upper/lower case convention (but I could live with it). I don't think it is a clean interface (probably error prone) and it relies on messy parsing code to disentangle fixed from variable mods. What about having two independent arguments, fixed.modifications and variable.modifications? Or one that defines which mods are variable (fixed = TRUE would be the default, and a user would set something like fixed = c(TRUE, FALSE, TRUE, FALSE).
  • Finally, we could work on a new calculateFragments2 function, with a new API that supports everything we need, without worrying about backwards compatibility. When that one becomes stable, we can start a formal deprecation/replacement process.

@adder
Copy link
Contributor Author

adder commented Oct 27, 2016

@lgatto A reply on your 2nd comment.

Or one that defines which mods are variable (fixed = TRUE would be the default, and a user would set something like fixed = c(TRUE, FALSE, TRUE, FALSE).

You also have to be able to specify where the variable modificiation is positioned. Eg. if you have a sequence with 2 serines. You can have a Phospo group on the first but not on the second serine.

@lgatto
Copy link
Owner

lgatto commented Oct 27, 2016

You also have to be able to specify where the variable modificiation is positioned. Eg. if you have a sequence with 2 serines. You can have a Phospo group on the first but not on the second serine.

Yes, of course. My point here is to handle variable and fixed modifications independently, rather than tangle them together using, for example upper and lower case. These two parameters could be could be lists with positions and masses, ... or anything that makes sense/is convenient in terms of user interface and coding.

@sgibb - what about a Modification class with slots aa, position, mass, ... We could then define a list of Modification (or a Modifications class) the user could choose from, or create their own, and then pass them to the calculateFragments function. I don't want us to over-engineer this, but it might be a reasonable possibility.

@sgibb
Copy link
Collaborator

sgibb commented Oct 28, 2016

Adding a new class would be surely the cleanest solution and even support unimod would be great but is that what the user really wants?

The mzIdentMl code is very similar to your class suggestion:

<Modification location="10" residues="K" monoisotopicMassDelta="138.0680796"> 
    <cvParam accession="XLMOD:02001" cvRef="XLMOD" name="DSS"/>
    <cvParam accession="MS:1002509" cvRef="PSI-MS" name="cross-link donor" value="11309529182388590588"/>
  </Modification> 

Would be something like this pseudocode be more useable?

mod <- Modification(residues="C", location=2, avgMassDelta=57.02146)
calculateFragments("ACE", modifications=mod)

Or for the to be written unimod package:

library(unimod)
m <- unimod::query(id=4)
calculateFragments("ACE", modifications=mod)

@sgibb
Copy link
Collaborator

sgibb commented Jul 1, 2017

Sorry for being so slow and quiet regarding this topic. I think the best way would be to provide a Modification class (with unimod support). That's why I started (the long planned) unimod package. @adder and @lgatto you are welcome to contribute.

The planned class: https://github.com/ComputationalProteomicsUnit/unimod/blob/master/R/AllClasses.R (nothing implemented yet).

BTW: @lgatto I wanted to develop it under the umbrella of the CPU but I don't have write access to the CPU/unimod repository. So I had to fork it. We could change this at any time.

EDIT: urls updated to CPU

@lgatto
Copy link
Owner

lgatto commented Jul 1, 2017

Excellent initiative! I have now given you admin rights to the CPU repo.

@yafeng
Copy link

yafeng commented Dec 14, 2017

upvote +1
I am also interested to know if this pseudocode has been implemented already?
mod <- Modification(residues="C", location=2, avgMassDelta=57.02146) calculateFragments("ACE", modifications=mod)

It can be very useful to define variable modifications when calculating fragment ions.

@yafeng
Copy link

yafeng commented Dec 15, 2017

For those people who wants to predict MS2 fragments with variable modifications, there is a temporary solution here. https://www.github.com/yafeng/SpectrumAI/
the function predict_MS2_spectrum is written in Spectra_functions.R script.
it works like this:
fragments = predict_ms2_spectrum("+229.163EGNNPAEN+0.985GDAK+229.163R", product_ion_charge = 1)
"+229.163EGNNPAEN+0.985GDAK+229.163R" is the Peptide sequence output from MSGFplus.
It only generates b, y ions (no neutral loss, up to charge state 2).

@sgibb
Copy link
Collaborator

sgibb commented Dec 15, 2017

@yafeng Unfortunately I had no time to implement it yet. I have to fix a few issues in MSnbase before. Next I would focus on unimod to provide a unique interface for modifications. Subsequently I am going to work on this issue. If you have interest in the unimod package help is very welcome.

@higsch
Copy link

higsch commented Nov 26, 2018

Hi there,
what is the current status of this issue? I recently had to implement mirror plotting of peptide spectra with variable modifications. E.g. only one out of several methionines oxidised. Is there any help needed?

@sgibb
Copy link
Collaborator

sgibb commented Nov 26, 2018

Unfortunately I have much to do with my real job. We started to implement the unimod modifications in the unimod package. Currently it can download and parse the data from https://unimod.org and can calculate the mass of given sequences with fixed modifications.

We would need this functionality in more than one package (MSnbase, Pbase, topdownr, ...). So your help would be very welcome. Please don't hesitate to create issues in the unimod repository if you want to help and/or need some explanation what we did.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants