
Support for Phosphoproteomics datasets.

Open ypriverol opened this issue 2 years ago • 9 comments

quantms 1.4 phosphoproteomics

In version 1.4 of quantms, phosphoproteomics experiments can be analysed with the Luciphor2 tool. The analysis works as follows:

  • Every search engine reports the phosphosites it identified.
  • Luciphor2 is then used to assess phosphosite localization by computing the false localization rate (FLR); a 1% FDR filter is applied at the msrun level.

The major issue we found during the first iterations with the tool:

Luciphor2 splits the PSMs into groups by charge state, and to build its model it needs at least 50 PSMs per group (this parameter is configurable, but at least 50 is recommended).

  • [ ] Note: This causes major problems for groups such as charge states 4 and 5, where the algorithm may get only a small number of PSMs and the model fails to produce the statistics for each phosphosite.
  • [ ] Even if we lower the required number of PSMs, we will need to test the impact on the results.
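
To make the problem concrete, here is a minimal sketch that counts PSMs per charge state for one msrun and reports the charge states that fall below the threshold. The function name, the example counts, and the way the threshold is passed are assumptions for illustration only; this is not how Luciphor2 is implemented.

```python
from collections import Counter

# Assumed threshold, taken from the recommendation quoted above.
MIN_PSMS_PER_CHARGE = 50


def undersized_charge_states(charges, min_psms=MIN_PSMS_PER_CHARGE):
    """Return {charge: count} for charge states with fewer than min_psms PSMs."""
    counts = Counter(charges)
    return {z: n for z, n in sorted(counts.items()) if n < min_psms}


# Hypothetical msrun: plenty of 2+ and 3+ PSMs, very few 4+ and 5+.
example_charges = [2] * 900 + [3] * 400 + [4] * 35 + [5] * 6
print(undersized_charge_states(example_charges))  # {4: 35, 5: 6}
```

A check like this could run before Luciphor2 to decide whether an msrun needs to be pooled with others.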

Because of this issue, we may need to explore how to group multiple runs into one to increase the number of PSMs per charge state. We have two options here:

  • [ ] Run Luciphor2 at the level of the experiment, grouping all the msruns from all samples together. This is the most accurate approach statistically because the FLRs will be computed at the level of the entire experiment. However, it is also the most computationally expensive, and for large CPTAC datasets it is probably not feasible with Luciphor2, which may not be able to handle more than 100 RAW files and millions of PSMs. Luciphor2 has been tested in quantms only with fewer than 10 RAW files.
  • [ ] Group the msruns by sample, so that only RAW files from the same sample are grouped. This may be difficult in TMT experiments, but in LFQ it may be a middle ground between single msruns and the entire experiment.
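
The trade-off between the grouping levels can be sketched with toy numbers. The record layout `(sample, msrun, charge)` and all counts below are hypothetical, not a quantms data structure; the point is only that pooling raises the per-charge PSM count.

```python
from collections import Counter, defaultdict

# Hypothetical PSM records as (sample, msrun, charge); not a quantms data structure.
PSMS = (
    [("S1", "run1", 4)] * 30
    + [("S1", "run2", 4)] * 25
    + [("S2", "run3", 4)] * 20
    + [("S2", "run4", 4)] * 15
)

# How each grouping scope maps a PSM record to its group name.
SCOPE_KEYS = {
    "msrun": lambda psm: psm[1],
    "sample": lambda psm: psm[0],
    "experiment": lambda psm: "all",
}


def groups_below(psms, scope, min_psms=50):
    """Return sorted (group, charge, count) tuples with fewer than min_psms PSMs."""
    counts = defaultdict(Counter)
    for psm in psms:
        counts[SCOPE_KEYS[scope](psm)][psm[2]] += 1
    return sorted(
        (group, charge, n)
        for group, by_charge in counts.items()
        for charge, n in by_charge.items()
        if n < min_psms
    )


for scope in ("msrun", "sample", "experiment"):
    print(scope, groups_below(PSMS, scope))
```

With these toy counts every single msrun is below 50 for charge 4, sample-level grouping rescues S1 (55 PSMs) but not S2 (35 PSMs), and experiment-level grouping leaves no group below the threshold.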

Alternative approaches:

We should explore the Alanine decoy approach recently published in JPR. By adding decoy phosphosites we may be able to construct a TDA (target-decoy approach) with any of the probability scoring systems available in OpenMS, such as AScore or PhosphoRS.

  • [ ] Review the phospho scoring methods in OpenMS and benchmark them.
  • [ ] Enable Alanine as a possible phosphosite. This requires a definition in the OpenMS modification database.
  • [ ] Develop the TDA approach in OpenMS.
  • [ ] Benchmark the approach against Luciphor2 strategy already existing in quantms 1.2.
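
As a rough illustration of how decoy sites could feed a target-decoy FLR estimate, the sketch below ranks localized sites by score and estimates the FLR at each rank from the cumulative decoy and target counts. Everything here is an assumption for discussion: the function name, the example scores, and in particular `candidate_ratio`, a placeholder for whatever normalization between S/T/Y and Alanine candidate sites the published method prescribes (it should be checked against the paper before use).

```python
def decoy_flr(sites, candidate_ratio=1.0):
    """Estimate an FLR for each site from decoy localizations.

    sites: iterable of (score, is_decoy) pairs, higher score is better.
    candidate_ratio: assumed scaling between target and decoy candidate sites.
    Returns (score, is_decoy, flr) tuples sorted by descending score.
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)
    decoys = targets = 0
    raw = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        raw.append(min(1.0, candidate_ratio * decoys / max(targets, 1)))
    # Enforce monotonicity: a better-scoring site never gets a worse FLR.
    flr = raw[:]
    for i in range(len(flr) - 2, -1, -1):
        flr[i] = min(flr[i], flr[i + 1])
    return [(score, is_decoy, f) for (score, is_decoy), f in zip(ranked, flr)]


# Hypothetical localization scores; True marks a decoy (e.g. Alanine) site.
example_sites = [
    (45.0, False), (40.0, False), (30.0, False), (22.0, True),
    (19.0, False), (10.0, True), (5.0, False),
]
results = decoy_flr(example_sites)
accepted = [s for s, is_decoy, f in results if not is_decoy and f <= 0.01]
print(accepted)  # [45.0, 40.0, 30.0]
```

The same function works with any score where higher is better, which is what would allow AScore and PhosphoRS to be benchmarked on equal terms.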

In addition to developing a new TDA approach based on AScore or PhosphoRS, we could adopt the PTMProphet algorithm in the tool, which would provide the framework to compute the probabilities plus a model for the FLR approach. The disadvantage is that we may need to work with the PTMProphet team to make it a standalone tool that can be included in bioconda/biocontainers; we may also need to write the adapters for OpenMS.

Benchmark and results

We have a large collection of well-annotated CPTAC phosphoproteomics datasets that can be used to perform a reanalysis and generate a phospho map by tumor and cancer type. The focus of the benchmark will be purely technical:

  • [ ] Speed and performance: the final strategy in quantms 1.X must be faster and more scalable than the current Luciphor2 approach.
  • [ ] Results must be accurate and comparable with the Luciphor2 approach and other tools such as MQ.

ypriverol avatar Jan 09 '24 09:01 ypriverol

When I was testing the PhosphoScoring algorithm in OpenMS, I found that the scores of (PhosphoDecoy) hits are all -1, and none of them exceeds 20.

<PeptideIdentification score_type="PhosphoScore" higher_score_better="true" significance_threshold="0.0" MZ="572.7957763671875" RT="5197.272600000000239" spectrum_reference="controllerType=0 controllerNumber=1 scan=12260" >
            <PeptideHit score="-1.0" sequence="ALLSLWYA(PhosphoDecoy)K" charge="2" aa_before="K" aa_after="A" start="9540" end="9548" protein_refs="PH_14022" >
                <UserParam type="string" name="target_decoy" value="target"/>
                <UserParam type="int" name="MS:1002049" value="41"/>
                <UserParam type="int" name="MS:1002050" value="84"/>
                <UserParam type="float" name="MS:1002052" value="8.0197225e-08"/>
                <UserParam type="float" name="MS:1002053" value="2.368002"/>
                <UserParam type="string" name="AssumedDissociationMethod" value="HCD"/>
                <UserParam type="string" name="CTermIonCurrentRatio" value="0.17897747"/>
                <UserParam type="string" name="ExplainedIonCurrentRatio" value="0.43809128"/>
                <UserParam type="string" name="MS2IonCurrent" value="1223758.9"/>
                <UserParam type="string" name="MeanErrorAll" value="8.748462"/>
                <UserParam type="string" name="MeanErrorTop7" value="8.748462"/>
                <UserParam type="string" name="MeanRelErrorAll" value="8.748462"/>
                <UserParam type="string" name="MeanRelErrorTop7" value="8.748462"/>
                <UserParam type="string" name="NTermIonCurrentRatio" value="0.2591138"/>
                <UserParam type="string" name="NumMatchedMainIons" value="7"/>
                <UserParam type="string" name="StdevErrorAll" value="2.5852308"/>
                <UserParam type="string" name="StdevErrorTop7" value="2.5852308"/>
                <UserParam type="string" name="StdevRelErrorAll" value="2.5852308"/>
                <UserParam type="string" name="StdevRelErrorTop7" value="2.5852308"/>
                <UserParam type="float" name="calcMZ" value="572.79376220703125"/>
                <UserParam type="int" name="pass_threshold" value="1"/>
                <UserParam type="int" name="start" value="9541"/>
                <UserParam type="int" name="end" value="9549"/>
                <UserParam type="string" name="isotope_error" value="0"/>
                <UserParam type="string" name="protein_references" value="unique"/>
                <UserParam type="float" name="MSGF:ScoreRatio" value="0.488095238095238"/>
                <UserParam type="float" name="MSGF:Energy" value="43.0"/>
                <UserParam type="float" name="MSGF:lnEValue" value="-0.862046561615997"/>
                <UserParam type="float" name="MSGF:lnExplainedIonCurrentRatio" value="-0.825099751607999"/>
                <UserParam type="float" name="MSGF:lnNTermIonCurrentRatio" value="-1.350102075180515"/>
                <UserParam type="float" name="MSGF:lnCTermIonCurrentRatio" value="-1.719936773473298"/>
                <UserParam type="float" name="MSGF:lnMS2IonCurrent" value="14.017437745527683"/>
                <UserParam type="float" name="MSGF:MeanErrorTop7" value="8.748462"/>
                <UserParam type="float" name="MSGF:sqMeanErrorTop7" value="76.535587365444002"/>
                <UserParam type="float" name="MSGF:StdevErrorTop7" value="2.5852308"/>
                <UserParam type="float" name="SpecEValue_score" value="8.0197225e-08"/>
            </PeptideHit>
</PeptideIdentification>

There is also the case where both (Phospho) and (PhosphoDecoy) are present, but only (Phospho) is scored. It seems that (PhosphoDecoy) was not scored.

<PeptideIdentification score_type="PhosphoScore" higher_score_better="true" significance_threshold="0.0" MZ="582.263427734375" RT="5157.471299999999246" spectrum_reference="controllerType=0 controllerNumber=1 scan=12160" >
            <PeptideHit score="43.996780554197159" sequence="TVLT(Phospho)ELQA(PhosphoDecoy)K" charge="2" aa_before="K K K" aa_after="I I I" start="636 636 635" end="644 644 643" protein_refs="PH_8954 PH_9066 PH_9169" >
                <UserParam type="string" name="target_decoy" value="target"/>
                <UserParam type="int" name="MS:1002049" value="20"/>
                <UserParam type="int" name="MS:1002050" value="66"/>
                <UserParam type="float" name="MS:1002052" value="1.7487604e-07"/>
                <UserParam type="float" name="MS:1002053" value="5.163605"/>
                <UserParam type="string" name="AssumedDissociationMethod" value="HCD"/>
                <UserParam type="string" name="CTermIonCurrentRatio" value="0.049498077"/>
                <UserParam type="string" name="ExplainedIonCurrentRatio" value="0.07711499"/>
                <UserParam type="string" name="MS2IonCurrent" value="1221309.8"/>
                <UserParam type="string" name="MeanErrorAll" value="9.154647"/>
                <UserParam type="string" name="MeanErrorTop7" value="9.154647"/>
                <UserParam type="string" name="MeanRelErrorAll" value="9.154647"/>
                <UserParam type="string" name="MeanRelErrorTop7" value="9.154647"/>
                <UserParam type="string" name="NTermIonCurrentRatio" value="0.027616916"/>
                <UserParam type="string" name="NumMatchedMainIons" value="5"/>
                <UserParam type="string" name="StdevErrorAll" value="5.3728747"/>
                <UserParam type="string" name="StdevErrorTop7" value="5.3728747"/>
                <UserParam type="string" name="StdevRelErrorAll" value="5.3728747"/>
                <UserParam type="string" name="StdevRelErrorTop7" value="5.3728747"/>
                <UserParam type="float" name="calcMZ" value="581.761474609375"/>
                <UserParam type="int" name="pass_threshold" value="1"/>
                <UserParam type="int" name="start" value="636"/>
                <UserParam type="int" name="end" value="644"/>
                <UserParam type="string" name="isotope_error" value="1"/>
                <UserParam type="string" name="protein_references" value="non-unique"/>
                <UserParam type="float" name="MSGF:ScoreRatio" value="0.303030303030303"/>
                <UserParam type="float" name="MSGF:Energy" value="46.0"/>
                <UserParam type="float" name="MSGF:lnEValue" value="-1.641634978966627"/>
                <UserParam type="float" name="MSGF:lnExplainedIonCurrentRatio" value="-2.561161669815697"/>
                <UserParam type="float" name="MSGF:lnNTermIonCurrentRatio" value="-3.585712366261032"/>
                <UserParam type="float" name="MSGF:lnCTermIonCurrentRatio" value="-3.003803216164127"/>
                <UserParam type="float" name="MSGF:lnMS2IonCurrent" value="14.015434447363456"/>
                <UserParam type="float" name="MSGF:MeanErrorTop7" value="16.274927999999999"/>
                <UserParam type="float" name="MSGF:sqMeanErrorTop7" value="470.885833609215865"/>
                <UserParam type="float" name="MSGF:StdevErrorTop7" value="9.551777244444445"/>
                <UserParam type="float" name="SpecEValue_score" value="1.7487604e-07"/>
                <UserParam type="string" name="search_engine_sequence" value="TVLT(Phospho)ELQA(PhosphoDecoy)K"/>
                <UserParam type="float" name="AScore_pep_score" value="44.5564621785445"/>
                <UserParam type="float" name="AScore_1" value="43.996780554197159"/>
            </PeptideHit>
</PeptideIdentification>
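
A quick way to spot this pattern across a whole result file is to compare, for each hit, the number of modifications in the sequence with the number of per-site scores attached. The sketch below uses only the Python standard library on trimmed copies of the two records above, wrapped in a made-up `<ids>` root element so they parse. It assumes that each `AScore_N` UserParam corresponds to one scored site, which should be confirmed against the OpenMS documentation.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copies of the two records above; the <ids> root is added only for parsing.
FRAGMENT = """<ids>
<PeptideIdentification score_type="PhosphoScore" spectrum_reference="controllerType=0 controllerNumber=1 scan=12260">
  <PeptideHit score="-1.0" sequence="ALLSLWYA(PhosphoDecoy)K" charge="2"/>
</PeptideIdentification>
<PeptideIdentification score_type="PhosphoScore" spectrum_reference="controllerType=0 controllerNumber=1 scan=12160">
  <PeptideHit score="43.996780554197159" sequence="TVLT(Phospho)ELQA(PhosphoDecoy)K" charge="2">
    <UserParam type="float" name="AScore_1" value="43.996780554197159"/>
  </PeptideHit>
</PeptideIdentification>
</ids>"""


def summarize_hits(xml_text):
    """List modifications, score, and number of unscored sites per PeptideHit."""
    rows = []
    for hit in ET.fromstring(xml_text).iter("PeptideHit"):
        mods = re.findall(r"\((\w+)\)", hit.get("sequence"))
        # Assumption: one AScore_N UserParam per scored site.
        site_scores = [
            p for p in hit.iter("UserParam")
            if re.fullmatch(r"AScore_\d+", p.get("name", ""))
        ]
        rows.append({
            "sequence": hit.get("sequence"),
            "score": float(hit.get("score")),
            "mods": mods,
            "unscored_sites": len(mods) - len(site_scores),
        })
    return rows


rows = summarize_hits(FRAGMENT)
for row in rows:
    print(row["sequence"], row["score"], row["unscored_sites"])
```

Both example hits end up with one unscored site, which matches the observation that (PhosphoDecoy) is skipped whether or not a (Phospho) site is also present.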

weizhongchun avatar Mar 21 '25 08:03 weizhongchun

I only made AScore not fail ;) I think one needs to also consider PhosphoDecoy for scoring in the AScore algorithm. This needs to be added.

timosachsenberg avatar Mar 21 '25 10:03 timosachsenberg

Hi, thank you for your response. I had assumed AScore was also supposed to score PhosphoDecoy, and then found this situation.

weizhongchun avatar Mar 21 '25 11:03 weizhongchun

hi @weizhongchun I gave it a quick try: https://github.com/OpenMS/OpenMS/pull/7933 . Can you test it?

timosachsenberg avatar Mar 21 '25 19:03 timosachsenberg

Hi @timosachsenberg I tested the modified AScore method, but the resulting idXML file still looks the same, with PhosphoDecoy not scored. I wonder if I made a mistake. Should I send you the input test file so you can test it?

weizhongchun avatar Mar 22 '25 06:03 weizhongchun

Thanks for reporting. I need to do a small fix. Actually it should work if you set the new add_decoys flag. Can you send me a test file (idXML) if that is not the case?

timosachsenberg avatar Mar 22 '25 07:03 timosachsenberg

OK, Thank you for your response. I'll send it on Slack.

weizhongchun avatar Mar 22 '25 07:03 weizhongchun

I think some of them have a score higher than -1 now.

<PeptideHit score="19.221235956407195" sequence="ALLSLITA(PhosphoDecoy)K"

Can you check? I am not 100% familiar with the algorithm, though. PhosphoScoring currently seems to assign the best-scoring permutation (e.g., it changes the output of the original search engine). Can you tell me what you need as output? Then I might be able to assist.

timosachsenberg avatar Mar 22 '25 20:03 timosachsenberg

That should be correct, thank you. Just score (PhosphoDecoy) the same way as (Phospho), then determine whether the site is credible according to the AScore.

weizhongchun avatar Mar 23 '25 00:03 weizhongchun