define variable modifications in calculateFragments() #167

adder · 2016-10-24T09:31:59Z

Hi,
It seems you can only define fixed modifications with calculateFragments().
Am I right? If not, how to do it?
I think a feature like this could be useful.
Posible ways of implementing this are:
1)
Provide a vector of masses/modifications with length = peptide sequence length.
The position in the vector corresponds to the position in the peptide sequence

use lettercode in the peptide sequence and link the lettercode to a modification wit a names list.
(much like fragmentpeptide() and PeptideSpectrum() in the OrgMassSpecR R package)

What do you guys think?

sgibb · 2016-10-24T14:24:53Z

Maybe I don't completely understand your request but at least your second suggestion is already possible:

library("MSnbase")

calculateFragments("ACE", modifications=c(C=57.02146))
# Modifications used: C=57.02146
#          mz ion type pos z seq
# 1  72.04439  b1    b   1 1   A
# 2 232.07504  b2    b   2 1  AC
# 3 361.11763  b3    b   3 1 ACE
# 4 148.06043  y1    y   1 1   E
# 5 308.09108  y2    y   2 1  CE
# 6 379.12819  y3    y   3 1 ACE
# 7 115.06278 y1_   y_   1 1   E
# 8 275.09343 y2_   y_   2 1  CE
# 9 346.13054 y3_   y_   3 1 ACE

calculateFragments("ACE", modifications=c(C=57.02146, A=-1))
# Modifications used: C=57.02146, A=-1
#          mz ion type pos z seq
# 1  71.04439  b1    b   1 1   A
# 2 231.07504  b2    b   2 1  AC
# 3 360.11763  b3    b   3 1 ACE
# 4 148.06043  y1    y   1 1   E
# 5 308.09108  y2    y   2 1  CE
# 6 378.12819  y3    y   3 1 ACE
# 7 115.06278 y1_   y_   1 1   E
# 8 275.09343 y2_   y_   2 1  CE
# 9 345.13054 y3_   y_   3 1 ACE

adder · 2016-10-24T14:58:38Z

If I get it correctly, calculateFragments("ACE", modifications=c(C=57.02146, A=-1)) still defines a fixed modification. (So all alanines in the peptide sequence (eg. ACAE) will have a -1 mass shift)
Sometimes you want to define a variable modification (eg. only the second A is modified)

You could this by giving a modification vector with the modification giving at the postion in the vector matching the position in the sequence
calculateFragments("ACAE", fixed.modifications=c(C=57.02146),variable.modifications=c(0,0,-1,0))

the OrgMassSpecR R package solved this by using a lettercode in the peptide sequence and link the lettercode to a modification with list. (see fragmentpeptide() and PeptideSpectrum() functions)
You could this by giving a modification vector with the modification giving at the postion in the vector matching the position in the sequence
eg.
calculateFragments("ACAmE", fixed.modifications=c(C=57.02146),variable.modifications=c(m = -1))

A more biological relevant use case would be phosphorylation. Eg, serine can get phosporylated but not all serines in the protein will be phosporylated.
Say that you have a spectra identified wiht a phosogroup at the 2nd serine in the sequence.
It would be nice to be able to label the spectrum using the identified peptide sequence with calculateFragments()
I hope I'm clear enough this time :)

sgibb · 2016-10-24T16:47:30Z

Ok, thanks for the clarification. Great idea. I like your second solution more than the first. For the first you have to know the length of the peptide sequence before (or count it manually).

sgibb · 2016-10-24T17:06:21Z

@lgatto we changed the way the modification argument was handled in MSnbase => 1.17.6. Currently the modification is added to the amino acid (before the modification replaced the amino acid).

In favor to handle the fixed.modifications and variable.modifications in the same way I would prefer to replace the original mass with the modification (I would use a single argument modification with upper letters for general/fixed amino acid modifications and lower letters for individual/variable modifications). I know this changes are not user-friendly. Or do you have another suggestion? I am sorry for breaking the API so often.

sgibb · 2016-10-24T17:13:40Z

Sry, we have to rethink the handling of modification once again. We changed to add instead of replace because we allow modifications to Cterm and Nterm: #47 (comment)

for Nterm/Cterm add would be useful (we don't know before which amino acid is at the N/C term, so replace would not be an option)
for variable modifications replace would be useful (otherwise we also need to know that e.g. p is the phosphorylated serine and have to calculate S + p which would require some preprocessing by the user)
for fixed modifications it doesn't really matter (currently add; we changed that in MSnbase 1.17.6; see comment above)

adder · 2016-10-24T17:55:50Z

Hi,
Letters in the sequence to define modification are fine be me. :)
Maybe some extra comments:
1)
allow multi letter codes (so we are not restricted by the 24 letters in the alphabet to define PTMs)
eg, SphosAMPSoxE to define phospo on S and oxidation on second S

Not sure what you mean by adding or replacing the original mass with modification mass.
But it might be useful to know that modification masses are usually added to original masses.
eg the modification defined in the unimod database are defined as mass differences. Not sure if it make a big difference on your code internally

Maybe something to consider, nterm en cterm modification might also be variable.
If seen following notation in other software. (not sure if there is 'official nomenclature' for this)
no modification
H-SAMPLE-OH
nterm acetylation
ace-SAMPLE-OH
But since you only have 1 nterm or cterm in peptide sequence, putting it as a fixed modification would work.

Finally, since you can work with mzid format in MSnBase. MZidentML also has an specific way to handle modified peptide sequence as defined in MZidentML guidelines.

Actually you could use a PSM entry in the mzid format to calculate the fragments and label a spectra.
But this is probably something less straightforward to add to the current functions short-term :)

lgatto · 2016-10-27T06:43:31Z

@sgibb - trying to catch up (still travelling, until next week).

A few thoughts

Following mzIdentML conventions, as mentioned by @adder in the post above could be a winning strategy in the long run. Something to keep in mind for the future, especially as calculateFragments seems to be a much used function, based on the feature requests we have.
I don't like the upper/lower case convention (but I could live with it). I don't think it is a clean interface (probably error prone) and it relies on messy parsing code to disentangle fixed from variable mods. What about having two independent arguments, fixed.modifications and variable.modifications? Or one that defines which mods are variable (fixed = TRUE would be the default, and a user would set something like fixed = c(TRUE, FALSE, TRUE, FALSE).
Finally, we could work on a new calculateFragments2 function, with a new API that supports everything we need, without worrying about backwards compatibility. When that one becomes stable, we can start a formal deprecation/replacement process.

adder · 2016-10-27T12:40:45Z

@lgatto A reply on your 2nd comment.

Or one that defines which mods are variable (fixed = TRUE would be the default, and a user would set something like fixed = c(TRUE, FALSE, TRUE, FALSE).

You also have to be able to specify where the variable modificiation is positioned. Eg. if you have a sequence with 2 serines. You can have a Phospo group on the first but not on the second serine.

lgatto · 2016-10-27T15:19:59Z

You also have to be able to specify where the variable modificiation is positioned. Eg. if you have a sequence with 2 serines. You can have a Phospo group on the first but not on the second serine.

Yes, of course. My point here is to handle variable and fixed modifications independently, rather than tangle them together using, for example upper and lower case. These two parameters could be could be lists with positions and masses, ... or anything that makes sense/is convenient in terms of user interface and coding.

@sgibb - what about a Modification class with slots aa, position, mass, ... We could then define a list of Modification (or a Modifications class) the user could choose from, or create their own, and then pass them to the calculateFragments function. I don't want us to over-engineer this, but it might be a reasonable possibility.

sgibb · 2016-10-28T07:26:41Z

Adding a new class would be surely the cleanest solution and even support unimod would be great but is that what the user really wants?

The mzIdentMl code is very similar to your class suggestion:

<Modification location="10" residues="K" monoisotopicMassDelta="138.0680796"> 
    <cvParam accession="XLMOD:02001" cvRef="XLMOD" name="DSS"/>
    <cvParam accession="MS:1002509" cvRef="PSI-MS" name="cross-link donor" value="11309529182388590588"/>
  </Modification>

Would be something like this pseudocode be more useable?

mod <- Modification(residues="C", location=2, avgMassDelta=57.02146)
calculateFragments("ACE", modifications=mod)

Or for the to be written unimod package:

library(unimod)
m <- unimod::query(id=4)
calculateFragments("ACE", modifications=mod)

sgibb · 2017-07-01T14:46:50Z

Sorry for being so slow and quiet regarding this topic. I think the best way would be to provide a Modification class (with unimod support). That's why I started (the long planned) unimod package. @adder and @lgatto you are welcome to contribute.

The planned class: https://github.com/ComputationalProteomicsUnit/unimod/blob/master/R/AllClasses.R (nothing implemented yet).

~~BTW: @lgatto I wanted to develop it under the umbrella of the CPU but I don't have write access to the CPU/unimod repository. So I had to fork it. We could change this at any time.~~

EDIT: urls updated to CPU

lgatto · 2017-07-01T15:39:13Z

Excellent initiative! I have now given you admin rights to the CPU repo.

yafeng · 2017-12-14T17:22:54Z

upvote +1
I am also interested to know if this pseudocode has been implemented already?
mod <- Modification(residues="C", location=2, avgMassDelta=57.02146) calculateFragments("ACE", modifications=mod)

It can be very useful to define variable modifications when calculating fragment ions.

yafeng · 2017-12-15T14:54:00Z

For those people who wants to predict MS2 fragments with variable modifications, there is a temporary solution here. https://www.github.com/yafeng/SpectrumAI/
the function predict_MS2_spectrum is written in Spectra_functions.R script.
it works like this:
fragments = predict_ms2_spectrum("+229.163EGNNPAEN+0.985GDAK+229.163R", product_ion_charge = 1)
"+229.163EGNNPAEN+0.985GDAK+229.163R" is the Peptide sequence output from MSGFplus.
It only generates b, y ions (no neutral loss, up to charge state 2).

sgibb · 2017-12-15T19:47:36Z

@yafeng Unfortunately I had no time to implement it yet. I have to fix a few issues in MSnbase before. Next I would focus on unimod to provide a unique interface for modifications. Subsequently I am going to work on this issue. If you have interest in the unimod package help is very welcome.

higsch · 2018-11-26T08:20:10Z

Hi there,
what is the current status of this issue? I recently had to implement mirror plotting of peptide spectra with variable modifications. E.g. only one out of several methionines oxidised. Is there any help needed?

sgibb · 2018-11-26T13:34:51Z

Unfortunately I have much to do with my real job. We started to implement the unimod modifications in the unimod package. Currently it can download and parse the data from https://unimod.org and can calculate the mass of given sequences with fixed modifications.

We would need this functionality in more than one package (MSnbase, Pbase, topdownr, ...). So your help would be very welcome. Please don't hesitate to create issues in the unimod repository if you want to help and/or need some explanation what we did.

sgibb self-assigned this Oct 24, 2016

sgibb added the feature request label Oct 24, 2016

sgibb mentioned this issue Jul 2, 2017

How to create own Modifications with specific location? rformassspectrometry/unimod#2

Open

sgibb mentioned this issue Feb 11, 2019

New unimod developments rformassspectrometry/unimod#7

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

define variable modifications in calculateFragments() #167

define variable modifications in calculateFragments() #167

adder commented Oct 24, 2016

sgibb commented Oct 24, 2016

adder commented Oct 24, 2016

sgibb commented Oct 24, 2016

sgibb commented Oct 24, 2016

sgibb commented Oct 24, 2016

adder commented Oct 24, 2016

lgatto commented Oct 27, 2016

adder commented Oct 27, 2016

lgatto commented Oct 27, 2016

sgibb commented Oct 28, 2016

sgibb commented Jul 1, 2017 •

edited

Loading

lgatto commented Jul 1, 2017

yafeng commented Dec 14, 2017

yafeng commented Dec 15, 2017 •

edited

Loading

sgibb commented Dec 15, 2017

higsch commented Nov 26, 2018

sgibb commented Nov 26, 2018

define variable modifications in calculateFragments() #167

define variable modifications in calculateFragments() #167

Comments

adder commented Oct 24, 2016

sgibb commented Oct 24, 2016

adder commented Oct 24, 2016

sgibb commented Oct 24, 2016

sgibb commented Oct 24, 2016

sgibb commented Oct 24, 2016

adder commented Oct 24, 2016

lgatto commented Oct 27, 2016

adder commented Oct 27, 2016

lgatto commented Oct 27, 2016

sgibb commented Oct 28, 2016

sgibb commented Jul 1, 2017 • edited Loading

lgatto commented Jul 1, 2017

yafeng commented Dec 14, 2017

yafeng commented Dec 15, 2017 • edited Loading

sgibb commented Dec 15, 2017

higsch commented Nov 26, 2018

sgibb commented Nov 26, 2018

sgibb commented Jul 1, 2017 •

edited

Loading

yafeng commented Dec 15, 2017 •

edited

Loading