QC03 with FragPipe + RT + EICExtractor test #233

Open
rolivella opened this issue May 27, 2024 · 42 comments

@rolivella
Contributor

rolivella commented May 27, 2024

Pseudocode

  • Search QC03 with FragPipe, using the file I have in the toy datasets.
  • Get apex RT.
  • Run EICExtractor and check.
@rolivella rolivella self-assigned this May 27, 2024
@rolivella
Contributor Author

QC03 - ALL JOBS DONE IN 3.8 MINUTES

@rolivella
Contributor Author

rolivella commented Jun 4, 2024

Peptide area

It was done with EICExtractor alone, i.e. without identifying the peptide first: #200 (comment)

Protein and peptide counting

FragPipe

proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run1/1_1$ cat peptide.tsv | wc -l
4258
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run1/1_1$ cat protein.tsv | wc -l
834
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run1/1_1$ cat psm.tsv | wc -l
11026
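
Note that `wc -l` also counts the header line of each table, so the figures above are probably one higher than the number of entries. A minimal Python sketch of a header-aware count, assuming each FragPipe TSV starts with exactly one header row; the function name `count_rows` is made up for illustration:

```python
import csv
import io


def count_rows(handle):
    """Count data rows in a tab-separated table, skipping the header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # drop the header row (assumed to be a single line)
    return sum(1 for _ in reader)


# Tiny in-memory stand-in for a peptide.tsv (made-up content).
demo = io.StringIO("Peptide\tProtein\nPEPTIDEK\tP1\nANOTHERK\tP2\n")
print(count_rows(demo))
```

With a real file this would be called as `count_rows(open("peptide.tsv", newline=""))`.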

OpenMS

qcloud@nextflow2:/users/pr/qcloud/test/toy_dataset/190215_Q_QC03_01_04_25ng_6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9$ cat 6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9_hcd_QC_1002011.json
{
  "file" : {
    "checksum" : "2bf4293c4d1c8c891fab774cf973f7e9"
  },
  "data" : [ {
    "parameter" : {
      "qCCV" : "QC:9000001"
    },
    "values" : [ {
      "value" : "1463",
      "contextSource" : "QC:1002011"
    } ]
  } ]
}

qcloud@nextflow2:/users/pr/qcloud/test/toy_dataset/190215_Q_QC03_01_04_25ng_6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9$ cat 6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9_hcd_QC_1002010.json
{
  "file" : {
    "checksum" : "2bf4293c4d1c8c891fab774cf973f7e9"
  },
  "data" : [ {
    "parameter" : {
      "qCCV" : "QC:9000001"
    },
    "values" : [ {
      "value" : "3884",
      "contextSource" : "QC:1002010"
    } ]
  } ]
}
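
The value can be pulled out of these JSON files programmatically instead of reading them by eye. A minimal sketch, assuming every file has the same shape as the two shown above (one `data` entry holding one `values` entry); `read_qc_value` is an illustrative name:

```python
import json


def read_qc_value(text):
    """Return (contextSource, value) from a QC JSON document like the ones above."""
    doc = json.loads(text)
    entry = doc["data"][0]["values"][0]  # assumes a single data/values entry
    return entry["contextSource"], int(entry["value"])


example = """
{
  "file" : { "checksum" : "2bf4293c4d1c8c891fab774cf973f7e9" },
  "data" : [ {
    "parameter" : { "qCCV" : "QC:9000001" },
    "values" : [ { "value" : "1463", "contextSource" : "QC:1002011" } ]
  } ]
}
"""
print(read_qc_value(example))
```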

[Term]
id: QC:1002012
name: total number of PSM HCD
def: "total number of PSM HCD"
is_a: QC:9000001

PSM HCD: 4024

@rolivella rolivella changed the title QC03 with FragPipe + RT + EICExtractor QC03 with FragPipe + RT + EICExtractor test Jun 6, 2024
@rolivella
Contributor Author

rolivella commented Jun 6, 2024

Protein and peptide counting

The counting should be done per fragmentation type: msfragger.activation_types=HCD
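
Switching that setting between runs can be scripted. A minimal sketch, assuming the workflow file is a flat list of `key=value` lines; `set_workflow_option` is a hypothetical helper, not part of FragPipe:

```python
def set_workflow_option(text, key, value):
    """Return the workflow text with `key` set to `value` (appended if absent)."""
    prefix = key + "="
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            found = True
    if not found:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


workflow = "msfragger.activation_types=all\nmsfragger.deisotope=1\n"
print(set_workflow_option(workflow, "msfragger.activation_types", "HCD"), end="")
```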

@rolivella
Contributor Author

rolivella commented Jun 6, 2024

Processed file: QC03_2bf4293c4d1c8c891fab774cf973f7e9.raw

FragPipe

FragPipe version: 22.0
FragPipe workflow: see proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run2-hcd

Summary

HCD       FragPipe --prot 1.0  OpenMS
PSMs      4309                 4024
peptides  4026                 3884
proteins  882                  1463

CID       FragPipe --prot 1.0  OpenMS
PSMs      4182                 3817
peptides  3935                 3683
proteins  859                  1383

ETHCD     FragPipe --prot 1.0  OpenMS
PSMs      724                  2604
peptides  574                  2549
proteins  328                  1172

ETCID     FragPipe --prot 1.0  OpenMS
PSMs      0                    3224
peptides  0                    3110
proteins  0                    1301

The ETCID counts are 0 because QC03_2bf4293c4d1c8c891fab774cf973f7e9.pepXML is empty for ETCID.
Asked why: Nesvilab/FragPipe#1617 --> I should reprocess, but first separate the spectra by activation type.

@rolivella
Contributor Author

rolivella commented Jun 6, 2024

Could we search this QC03 file (HCD, etc.) with PD 2.5? Asked Cristina.

@rolivella
Contributor Author

rolivella commented Jun 6, 2024

QC03

File  Parameter                            Parsing method
QC03  # Peptides CID                       FragPipe
QC03  # Peptides HCD                       FragPipe
QC03  # Peptides ETCID                     FragPipe
QC03  # Peptides ETHCD                     FragPipe
QC03  # Proteins CID                       FragPipe
QC03  # Proteins HCD                       FragPipe
QC03  # Proteins ETCID                     FragPipe
QC03  # Proteins ETHCD                     FragPipe
QC03  Mass Accuracy (ppm)                  OpenMS EICExtractor
QC03  TIC (sum) x1e10                      mzML
QC03  MIT time MS1                         mzML
QC03  MIT MS2                              mzML
QC03  RT Drift (min)                       OpenMS EICExtractor
QC03  Peak Area YVY 100, 10, 1, 0.1, 0.01  Only OpenMS EICExtractor
QC03  Peak Area LLS 100, 10, 1, 0.1, 0.01  Only OpenMS EICExtractor
QC03  Peak Area LGF 100, 10, 1, 0.1, 0.01  Only OpenMS EICExtractor
QC03  Peak Area VTS 100, 10, 1, 0.1, 0.01  Only OpenMS EICExtractor
QC03  Peak Area VVG 100, 10, 1, 0.1, 0.01  Only OpenMS EICExtractor
QC03  Peak Area LAS 100, 10, 1, 0.1, 0.01  Only OpenMS EICExtractor

References:

@rolivella
Contributor Author

As FragPipe does not work with several activation methods in the same file, we should filter as follows:

  • CID: remove_activation=Electron transfer dissociation, select_activation=Collision-induced dissociation
  • HCD: for QExactive and Velos, remove_activation=none, select_activation=none. For other instruments, remove_activation=Electron transfer dissociation, select_activation=High-energy collision-induced dissociation
  • ETCID: select_activation=Electron transfer dissociation, select_activation=Collision-induced dissociation
  • ETHCD: select_activation=Electron transfer dissociation, select_activation=High-energy collision-induced dissociation

@rolivella
Contributor Author

rolivella commented Jun 10, 2024

  • CID: singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in ~/mysoftware/openms/QC03/6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9.ok.mzML -spectra:remove_activation "Electron transfer dissociation" -spectra:select_activation "Collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_CID.mzML
  • HCD: singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in ~/mysoftware/openms/QC03/6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9.ok.mzML -spectra:remove_activation "Electron transfer dissociation" -spectra:select_activation "High-energy collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_HCD.mzML
  • ETCID: singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in ~/mysoftware/openms/QC03/6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9.ok.mzML -spectra:select_activation "Electron transfer dissociation" -spectra:select_activation "Collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_ETCID.mzML
  • ETHCD: singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in ~/mysoftware/openms/QC03/6583a564-93dd-4500-a101-b2fe56496b25_QC03_2bf4293c4d1c8c891fab774cf973f7e9.ok.mzML -spectra:select_activation "Electron transfer dissociation" -spectra:select_activation "High-energy collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_ETHCD.mzML
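
The four commands above differ only in their activation arguments, so they can be generated from one table. A minimal sketch that only builds the argument lists and runs nothing; `STRATEGY` and `filefilter_args` are names made up here, and whether FileFilter honours the same option given twice is not established in this thread:

```python
import shlex

ETD = "Electron transfer dissociation"
CID = "Collision-induced dissociation"
HCD = "High-energy collision-induced dissociation"

# Select/remove rules copied from the commands above.
STRATEGY = {
    "CID": {"remove": [ETD], "select": [CID]},
    "HCD": {"remove": [ETD], "select": [HCD]},
    "ETCID": {"remove": [], "select": [ETD, CID]},
    "ETHCD": {"remove": [], "select": [ETD, HCD]},
}


def filefilter_args(in_path, out_path, activation):
    """Build the FileFilter argument list for one activation type."""
    rule = STRATEGY[activation]
    args = ["FileFilter", "-in", in_path]
    for name in rule["remove"]:
        args += ["-spectra:remove_activation", name]
    for name in rule["select"]:
        args += ["-spectra:select_activation", name]
    return args + ["-out", out_path]


for activation in STRATEGY:
    print(shlex.join(filefilter_args("QC03.mzML", f"QC03_{activation}.mzML", activation)))
```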

@rolivella
Contributor Author

If I test HCD, I get:

Failed in checking /home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_HCD.mzML. Will ignore it.
null
There are corrupted files. Please remove those files and re-start the task.

@rolivella
Contributor Author

If I test ETCID:

ProteinProphet [Work dir: /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run4-etcid]
/fragpipe_bin/fragPipe-22.0/fragpipe/tools/Philosopher/philosopher-v5.1.1 proteinprophet --maxppmdiff 2000000 --output combined /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run4-etcid/filelist_proteinprophet.txt
time="09:46:21" level=info msg="Executing ProteinProphet  v5.1.1"
time="09:46:21" level=error msg="Cannot execute program. there was an error with ProteinProphet, please check your parameters and input files"

@rolivella
Contributor Author

rolivella commented Jun 11, 2024

  • Let's start with what I did at the QCloud for ETHCD:

    • First OpenMS FileFilter with select_activation = Electron transfer dissociation
    • Second OpenMS FileFilter with select_activation = High-energy collision-induced dissociation
  • ETHCD:

    • electron transfer dissociation
    • supplemental beam-type collision-induced dissociation
  • ETCID:

    • electron transfer dissociation
    • supplemental collision-induced dissociation
  • CID: collision-induced dissociation

  • HCD: beam-type collision-induced dissociation

@rolivella
Contributor Author

rolivella commented Jun 11, 2024

OpenMS FileFilter

First CID and HCD

CID
singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9.mzML -spectra:select_activation "Collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_CID.mzML

HCD
singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9.mzML -spectra:select_activation "beam-type collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_HCD.mzML

Second ETCID and ETHCD

I can't because: OpenMS/OpenMS#7499

@rolivella
Contributor Author

FragPipe again for HCD

And again the same error:

Checking spectral files...
Failed in checking /home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_HCD.mzML. Will ignore it.
null
There are corrupted files. Please remove those files and re-start the task.
/home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_HCD.mzML: Scans = 0; ITMS: false; FTMS: false; Isolation sizes = []
Process 'MSFragger' finished, exit code: 1
Process returned non-zero exit code, stopping

@rolivella
Contributor Author

Opened compomics/ThermoRawFileParser#182 at the suggestion of Timo S.

@rolivella
Contributor Author

rolivella commented Jun 12, 2024

Proteowizard mzML conversion + FragPipe

Conversion

For CID, docker run -it --rm -e WINEDEBUG=-all -v /home/proteomics/mysoftware/proteowizard:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert /data/QC03_2bf4293c4d1c8c891fab774cf973f7e9.raw --filter "peakPicking true 1-" --filter "activation cid" --outfile QC03_2bf4293c4d1c8c891fab774cf973f7e9_filtered_cid_centroided.mzML

For HCD, docker run -it --rm -e WINEDEBUG=-all -v /home/proteomics/mysoftware/proteowizard:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert /data/QC03_2bf4293c4d1c8c891fab774cf973f7e9.raw --filter "peakPicking true 1-" --filter "activation hcd" --outfile QC03_2bf4293c4d1c8c891fab774cf973f7e9_filtered_hcd_centroided.mzML

But I don't know which filter to use for ETCID and ETHCD. The documentation says:

activation <precursor_activation_type>
Keeps only spectra whose precursors have the specifed activation type.  It doesn't affect non-MS spectra, and doesn't affect MS1 spectra. Use it to create output files containing only ETD or CID MSn data where both activation modes have been interleaved within a given input vendor data file (eg: Thermo's Decision Tree acquisition mode).
   <precursor_activation_type> is any one of: ETD CID SA HCD HECID BIRD ECD IRMPD PD PSD PQD SID or SORI.

FragPipe

proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run3-cid/1_1$ cat peptide.tsv | wc -l
3982
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run3-cid/1_1$ cat protein.tsv | wc -l
867


proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run2-hcd/1_1$ cat protein.tsv | wc -l
901
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run2-hcd/1_1$ cat peptide.tsv | wc -l
4172

@rolivella
Contributor Author

rolivella commented Jun 12, 2024

Using PD 2.5

Done by Cristina:

QC03_2bf4293c4d1c8c891fab774cf973f7e9_PeptideGroups.zip
QC03_2bf4293c4d1c8c891fab774cf973f7e9_PSMs.zip
QC03_2bf4293c4d1c8c891fab774cf973f7e9-master-high_Proteins.zip

Filtering only the accession from both "Accession" (Proteins file) and "Protein accessions" + "activation" (PSM file):

QC03_2bf4293c4d1c8c891fab774cf973f7e9-master-high_Proteins_only_accession.csv
QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv

Counting proteins:

proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9-master-high_Proteins_only_accession.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep CID | wc -l
2977
proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9-master-high_Proteins_only_accession.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep HCD | wc -l
3125
proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9-master-high_Proteins_only_accession.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep ETD | wc -l
2964
proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9-master-high_Proteins_only_accession.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep EThcD | wc -l
2463

Counting peptides:

proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9_PD25_Peptides.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep CID | wc -l
25574
proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9_PD25_Peptides.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep HCD | wc -l
27016
proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9_PD25_Peptides.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep ETD | wc -l
25458
proteomics@hipnos6:~/mysoftware/bash/merge_csv$ mlr --fs ";" --csv join -j Accession -f QC03_2bf4293c4d1c8c891fab774cf973f7e9_PD25_Peptides.csv QC03_2bf4293c4d1c8c891fab774cf973f7e9-PSM-only-acession.csv | grep EThcD | wc -l
21422
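
Each pipeline above counts joined rows, so an accession matched by several PSMs is counted once per PSM. A minimal sketch that counts distinct accessions per activation type instead, assuming a `;`-separated PSM export with an `Accession` column (the join key used above) and an activation column whose name, `Activation Type`, is a guess; `allowed` stands in for the accessions of the Proteins file:

```python
import csv
import io
from collections import defaultdict


def unique_accessions_by_activation(psm_handle, allowed=None,
                                    activation_column="Activation Type"):
    """Count distinct accessions per activation type in a ';'-separated table."""
    seen = defaultdict(set)
    for row in csv.DictReader(psm_handle, delimiter=";"):
        accession = row["Accession"]
        if allowed is not None and accession not in allowed:
            continue  # mimic the join: keep only accessions from the Proteins file
        seen[row[activation_column]].add(accession)
    return {activation: len(accs) for activation, accs in seen.items()}


# Made-up rows: P1 is matched twice by CID but is counted once.
demo = io.StringIO("Accession;Activation Type\nP1;CID\nP1;CID\nP2;CID\nP1;HCD\n")
print(unique_accessions_by_activation(demo))
```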

@edunivers

High-energy collision-induced dissociation - HCD
Electron transfer dissociation - ETD
Collision-induced dissociation - CID

@rolivella
Contributor Author

rolivella commented Jun 17, 2024

"Eduard" filtering strategy with FileFilter:

  • CID: Select "Collision-induced dissociation" + Remove "Electron transfer dissociation"
  • HCD: Select "High-energy collision-induced dissociation" + Remove "Electron transfer dissociation"
  • EThcD: Select "High-energy collision-induced dissociation" + Select "Electron transfer dissociation"
  • EtciD: Select "Collision-induced dissociation" + Select "Electron transfer dissociation"

@rolivella
Contributor Author

Following Eduard's strategy does not work:

Checking spectral files...
Failed in checking /home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_HCD.mzML. Will ignore it.
null
There are corrupted files. Please remove those files and re-start the task.
/home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_HCD.mzML: Scans = 0; ITMS: false; FTMS: false; Isolation sizes = []
Process 'MSFragger' finished, exit code: 1
Process returned non-zero exit code, stopping

@rolivella
Contributor Author

A new version of FileFilter should be available shortly (if not already): OpenMS/OpenMS#7499 (comment). Should be checked.

@rolivella
Contributor Author

rolivella commented Jun 19, 2024

Why is HCD filtering not working with the current FileFilter version?

After doing the first selection, this is the only activation type left:

<activation>
        <cvParam cvRef="MS" accession="MS:1000044" name="dissociation method" />
</activation>

@rolivella
Contributor Author

rolivella commented Jun 19, 2024

From the source code of FileFilter: https://github.com/OpenMS/OpenMS/blob/2c47a4dad7d14a1a2e3681ff94142da72fee9b64/src/openms/source/FORMAT/HANDLERS/MzMLHandler.cpp

We have that for CID:

 else if (accession == "MS:1000133") //collision-induced dissociation
          {
            spec_.getPrecursors().back().getActivationMethods().insert(Precursor::CID);
          }

For HCD:

 else if (accession == "MS:1000422") //beam-type collision-induced dissociation / HCD
          {
            spec_.getPrecursors().back().getActivationMethods().insert(Precursor::HCD);
          }

For ETCID and ETHCD:

       else if (accession == "MS:1003182"  //electron transfer and collision-induced dissociation
            || accession == "MS:1002679")  // workaround: supplemental collision-induced dissociation (see https://github.com/compomics/ThermoRawFileParser/issues/182)
          {
            spec_.getPrecursors().back().getActivationMethods().insert(Precursor::ETciD);
          }
          else if (accession == "MS:1002631" //electron transfer and higher-energy collision dissociation
            || accession == "MS:1002678") // workaround: supplemental beam-type collision-induced dissociation (see https://github.com/compomics/ThermoRawFileParser/issues/182)
          {
            spec_.getPrecursors().back().getActivationMethods().insert(Precursor::EThcD);
          }
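
To see which of these accessions a given mzML actually carries, the activation blocks can be tallied directly. A minimal sketch using only the accession-to-method pairs from the excerpt above; the function names are illustrative, and any accession not in the excerpt (for example MS:1000044) is simply ignored:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Accession -> OpenMS activation method, as in the MzMLHandler.cpp excerpt.
ACCESSION_TO_METHOD = {
    "MS:1000133": "CID",
    "MS:1000422": "HCD",
    "MS:1003182": "ETciD",
    "MS:1002679": "ETciD",
    "MS:1002631": "EThcD",
    "MS:1002678": "EThcD",
}


def local_name(tag):
    """Strip an XML namespace such as '{http://psi.hupo.org/ms/mzml}'."""
    return tag.rsplit("}", 1)[-1]


def count_activation_sets(source):
    """Count <activation> blocks by the sorted tuple of methods they map to."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "activation":
            continue
        methods = {
            ACCESSION_TO_METHOD[cv.get("accession")]
            for cv in elem
            if cv.get("accession") in ACCESSION_TO_METHOD
        }
        counts[tuple(sorted(methods))] += 1
        elem.clear()  # free memory; the block has already been read
    return counts


demo = io.BytesIO(
    b'<run><activation><cvParam accession="MS:1000044"/></activation>'
    b'<activation><cvParam accession="MS:1000422"/></activation></run>'
)
print(count_activation_sets(demo))
```

An empty tuple in the result marks activation blocks that carry none of the listed accessions.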

@rolivella
Contributor Author

rolivella commented Jun 19, 2024

I tested CID again:

File: QC03_2bf4293c4d1c8c891fab774cf973f7e9_select_activation_cid.mzML
FileFilter command line: singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9.mzML -spectra:select_activation "Collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_select_activation_cid.mzML
FragPipe command line: singularity exec -e --bind /home/proteomics/mysoftware/FragPipe-22-test-qc03:/home/tmp docker://proteomicsunitcrg/fragpipe:22.0 /fragpipe_bin/fragPipe-22.0/fragpipe/bin/fragpipe --headless --config-tools-folder /home/proteomics/mysoftware/FragPipe-extra-tools-22.0 --workflow /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run3-cid/fragpipe-qc03-run3-cid.workflow --manifest /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run3-cid/fragpipe-files.fp-manifest --workdir /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run3-cid

Output: 3033 peptides, 797 proteins

For HCD:

File: QC03_2bf4293c4d1c8c891fab774cf973f7e9_select_activation_hcd.mzML
FileFilter command line: singularity exec ~/mygit/atlas-imgs/ghcr.io-openms-openms-executables-3.1.0.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9.mzML -spectra:select_activation "beam-type collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_select_activation_hcd.mzML
FragPipe command line: singularity exec -e --bind /home/proteomics/mysoftware/FragPipe-22-test-qc03:/home/tmp docker://proteomicsunitcrg/fragpipe:22.0 /fragpipe_bin/fragPipe-22.0/fragpipe/bin/fragpipe --headless --config-tools-folder /home/proteomics/mysoftware/FragPipe-extra-tools-22.0 --workflow /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run2-hcd/fragpipe-qc03-run2-hcd.workflow --manifest /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run2-hcd/fragpipe-files.fp-manifest --workdir /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run2-hcd

Output: 2945 peptides, 774 proteins

But for both I got:

2024-06-19 15:23:21 [ERROR] - There are only 0 MS1 scans in /home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_select_activation_hcd.mzML.
Process 'IonQuant' finished, exit code: 1
Process returned non-zero exit code, stopping

Yet the file does contain spectra, for instance:

        <run id="ru_0" defaultInstrumentConfigurationRef="ic_0" sampleRef="sa_0" startTimeStamp="2019-02-28T10:13:26" defaultSourceFileRef="sf_ru_0">
                <userParam name="mzml_id" type="xsd:string" value="QC03_2bf4293c4d1c8c891fab774cf973f7e9"/>
                <spectrumList count="6356" defaultDataProcessingRef="dp_sp_0">
                        <spectrum id="controllerType=0 controllerNumber=1 scan=3" index="0" defaultArrayLength="424" dataProcessingRef="dp_sp_0">
                                <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" />
                                <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2" />
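
A quick way to confirm what IonQuant is complaining about is to tally the spectra per MS level. A minimal sketch that counts the MS:1000511 ("ms level") cvParams shown above, assuming that term appears once per spectrum; the function names are illustrative:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def count_ms_levels(source):
    """Count 'ms level' cvParams (MS:1000511) by their value."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "cvParam" and elem.get("accession") == "MS:1000511":
            counts[int(elem.get("value"))] += 1
        elif name == "spectrum":
            elem.clear()  # keep memory flat on large files
    return counts


demo = io.BytesIO(
    b'<spectrumList>'
    b'<spectrum><cvParam accession="MS:1000511" value="2"/></spectrum>'
    b'<spectrum><cvParam accession="MS:1000511" value="2"/></spectrum>'
    b'</spectrumList>'
)
levels = count_ms_levels(demo)
print(dict(levels), "MS1 scans:", levels[1])
```

A count of zero for level 1 would match the IonQuant error above.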

@rolivella
Contributor Author

rolivella commented Jun 26, 2024

Done by cchiva using PD 2.5:

File                                                             Counts  Description
QC03_2bf4293c4d1c8c891fab774cf973f7e9-CID-master_Proteins.txt    815     CID Master Proteins
QC03_2bf4293c4d1c8c891fab774cf973f7e9-CID_PeptideGroups.txt      3376    CID Peptide Groups
QC03_2bf4293c4d1c8c891fab774cf973f7e9-CID_PSMs.txt               3557    CID Peptide Spectrum Matches
QC03_2bf4293c4d1c8c891fab774cf973f7e9-ETD-master_Proteins.txt    822     ETD Master Proteins
QC03_2bf4293c4d1c8c891fab774cf973f7e9-ETD_PeptideGroups.txt      3361    ETD Peptide Groups
QC03_2bf4293c4d1c8c891fab774cf973f7e9-ETD_PSMs.txt               3548    ETD Peptide Spectrum Matches
QC03_2bf4293c4d1c8c891fab774cf973f7e9-EThcD-master_Proteins.txt  754     EThcD Master Proteins
QC03_2bf4293c4d1c8c891fab774cf973f7e9-EThcD_PeptideGroups.txt    2787    EThcD Peptide Groups
QC03_2bf4293c4d1c8c891fab774cf973f7e9-EThcD_PSMs.txt             2925    EThcD Peptide Spectrum Matches
QC03_2bf4293c4d1c8c891fab774cf973f7e9-HCD-master_Proteins.txt    832     HCD Master Proteins
QC03_2bf4293c4d1c8c891fab774cf973f7e9-HCD_PeptideGroups.txt      3587    HCD Peptide Groups
QC03_2bf4293c4d1c8c891fab774cf973f7e9-HCD_PSMs.txt               3785    HCD Peptide Spectrum Matches

@rolivella
Contributor Author

Installed the latest OpenMS FileFilter version:

proteomics@hipnos6:~/mysoftware/openms/build/latest$ singularity exec ./ghcr.io-openms-openms-executables-latest.img FileFilter

Version: 3.1.0-pre-develop-2024-06-20 Jun 20 2024, 14:45:36, Revision: e4c490f
-spectra:select_activation <activation>   Retain MSn scans where any of its precursors features a certain activation method (valid:
    'Collision-induced dissociation', 'Post-source decay', 'Plasma desorption', 'Surface-induced dissociation',
    'Blackbody infrared radiative dissociation', 'Electron capture dissociation', 'Infrared multiphoton dissociation',
    'Sustained off-resonance irradiation', 'High-energy collision-induced dissociation', 'Low-energy collision-induced dissociation',
    'Photodissociation', 'Electron transfer dissociation', 'Electron transfer and collision-induced dissociation',
    'Electron transfer and higher-energy collision dissociation', 'Pulsed q dissociation', 'trap-type collision-induced dissociation',
    'beam-type collision-induced dissociation', 'in-source collision-induced dissociation', 'Bruker proprietary method')

@rolivella
Contributor Author

rolivella commented Jun 27, 2024

ETCID:

time="16:14:09" level=info msg="Executing ProteinProphet  v5.1.1"
time="16:14:09" level=error msg="Cannot execute program. there was an error with ProteinProphet, please check your parameters and input files"

ETHCD:

2024-06-27 14:23:44 [ERROR] - There are only 0 MS1 scans in /home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_select_activation_ethcd.mzML.
Process 'IonQuant' finished, exit code: 1
Process returned non-zero exit code, stopping

@rolivella
Contributor Author

After extracting the MS1 spectra with FileFilter and merging them with the FileFilter output using FileMerger:

2024-06-27 14:38:58,566 ERROR - Manifest file contained some badly formatted lines /home/tmp/QC03_2bf4293c4d1c8c891fab774cf973f7e9_select_activation_cid_final.mzML 1 1 DDA

@rolivella
Contributor Author

Asked Timo: OpenMS/OpenMS#7499 (comment)

@rolivella
Contributor Author

rolivella commented Jun 28, 2024

Reply: "can you try to remove the unwanted activation? -spectra:remove_activation
I think this should then also keep the MS1"

This seems to be a second version of Eduard's strategy:

  • CID: Select "Collision-induced dissociation" + Remove "Electron transfer dissociation"
  • HCD: Select "High-energy collision-induced dissociation" + Remove "Electron transfer dissociation"
  • EThcD: Select "High-energy collision-induced dissociation" + Select "Electron transfer dissociation"
  • EtciD: Select "Collision-induced dissociation" + Select "Electron transfer dissociation"

I cannot do it like this because the "Select" removes all MS1 spectra and I cannot add them later with FileMerger.

So I use this strategy instead: remove every activation type except the one I want.

CID, by removing all the rest:

singularity exec ~/mysoftware/openms/build/latest/ghcr.io-openms-openms-executables-latest.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9.mzML -spectra:remove_activation "Electron transfer and higher-energy collision dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd.mzML

singularity exec ~/mysoftware/openms/build/latest/ghcr.io-openms-openms-executables-latest.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd.mzML -spectra:remove_activation "Electron transfer and collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_etcid.mzML

singularity exec ~/mysoftware/openms/build/latest/ghcr.io-openms-openms-executables-latest.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_etcid.mzML -spectra:remove_activation "beam-type collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_etcid_hcd_is_CID.mzML

I ran FragPipe and obtained [799,3038]

For HCD:

singularity exec ~/mysoftware/openms/build/latest/ghcr.io-openms-openms-executables-latest.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_etcid.mzML -spectra:remove_activation "Collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_etcid_cid.mzML

Obtained: [785,3024]

For ETCID:

singularity exec ~/mysoftware/openms/build/latest/ghcr.io-openms-openms-executables-latest.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd.mzML -spectra:remove_activation "Collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_cid.mzML

singularity exec ~/mysoftware/openms/build/latest/ghcr.io-openms-openms-executables-latest.img FileFilter -in QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_cid.mzML -spectra:remove_activation "beam-type collision-induced dissociation" -out QC03_2bf4293c4d1c8c891fab774cf973f7e9_removed_ethcd_cid_hcd_is_ETCID.mzML

Does not work:

time="11:36:33" level=info msg="Executing ProteinProphet  v5.1.1"
time="11:36:33" level=error msg="Cannot execute program. there was an error with ProteinProphet, please check your parameters and input files"
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v6.0.0-rc15 Noctilucent, Build 202105021430-exported (Linux-x86_64))
 (no FPKM) (no groups) (using degen pep info)
Reading in /home/proteomics/mysoftware/FragPipe-22-test-qc03/output-22.0-run4-etcid/1_1/interact-QC03_2bf4293c4d1c8c891fab774cf973f7e9_ETCID.pep.xml...
did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0.05

WARNING: no data - output file will be empty
Process 'ProteinProphet' finished, exit code: 1
Process returned non-zero exit code, stopping

I checked the mzML with FileInfo and:

Activation methods
    MS-Level 2 & ETD (Electron transfer dissociation): 6356
    MS-Level 2 & ETciD (Electron transfer and collision-induced dissociation): 6356

For instance:

<activation>
	<cvParam cvRef="MS" accession="MS:1000598" name="electron transfer dissociation" />
	<cvParam cvRef="MS" accession="MS:1003182" name="electron transfer and collision-induced dissociation" />
	<cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="65.076530456542997" unitAccession="UO:0000266" unitName="electronvolt" unitCvRef="UO"/>
</activation>

If I remove "electron transfer dissociation" I remove both MS:1000598 and MS:1003182 and the resulting file has no data for MS2. What can I do?
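
The behaviour described above is consistent with a filter that drops a spectrum as soon as any one of its activation terms matches the removed term. A toy model of that rule, which is an assumption about FileFilter's semantics and not taken from its source, next to an alternative that matches on the complete set of terms; both function names are made up:

```python
def remove_activation(spectra, unwanted):
    """Keep only spectra whose activation set does not contain `unwanted`."""
    return [s for s in spectra if unwanted not in s]


def select_exact(spectra, wanted):
    """Keep only spectra whose activation set equals `wanted` exactly."""
    wanted = frozenset(wanted)
    return [s for s in spectra if s == wanted]


# Each spectrum is modelled as the set of activation terms on its precursor.
spectra = [
    frozenset({"ETD", "ETciD"}),  # like the <activation> block shown above
    frozenset({"ETD", "ETciD"}),
    frozenset({"CID"}),
]

# Removing ETD also removes every ETciD spectrum, because both terms co-occur.
print(len(remove_activation(spectra, "ETD")))
# Matching the whole set keeps the ETciD spectra and nothing else.
print(len(select_exact(spectra, {"ETD", "ETciD"})))
```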

Asked Timo: OpenMS/OpenMS#7499 (comment)

@rolivella
Contributor Author

Eduard: in the meantime, test Exploris data.

@rolivella
Contributor Author

rolivella commented Jul 5, 2024

Ongoing discussion about CV terms for ETciD and EThcD: HUPO-PSI/psi-ms-CV#285

@rolivella
Contributor Author

rolivella commented Jul 8, 2024

Meanwhile, I will end the test with Q Exactive data (only CID and HCD). Steps:

  • Create FragPipe test folders at 23.
  • Copy test file to 23.
  • Check spectra (CID and HCD).
  • Configure and run FragPipe.
  • Extract RT apex.
  • Run EICextractor.

Results for the file QC03_c33c611fca710ce686c0ab821692d2e7_qexactive

proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd/1_1$ cat protein.tsv | wc -l
3149
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd/1_1$ cat peptide.tsv | wc -l
17996

But I couldn't find the isotopologues:

proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd$ cat */* | grep YVYVADVAAK
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd$ cat */* | grep LLSLGAGEFK
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd$ cat */* | grep LGFTDLFSK
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd$ cat */* | grep VTSGSTSTSR
proteomics@hipnos6:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd$ cat */* | grep LASVSVSR
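
The same check can be done in one pass with a per-sequence tally, which makes it explicit that every target returned zero hits. A minimal sketch doing a plain substring search like the `grep` calls above; `TARGETS` holds the five sequences searched above and `count_hits` is an illustrative name:

```python
TARGETS = ["YVYVADVAAK", "LLSLGAGEFK", "LGFTDLFSK", "VTSGSTSTSR", "LASVSVSR"]


def count_hits(lines, targets=TARGETS):
    """Count, for each target sequence, the lines that contain it."""
    hits = {target: 0 for target in targets}
    for line in lines:
        for target in targets:
            if target in line:
                hits[target] += 1
    return hits


# Made-up lines standing in for the FragPipe output tables.
demo = ["AAAK\tP1", "XXYVYVADVAAKXX\tP2", "LASVSVSR\tP3"]
print(count_hits(demo))
```

On real output, `lines` could be an open file handle for each table in the run folder.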

@rolivella
Contributor Author

Eduard: check light isotopologues at OMSSA. Search for them and take RT apex as the reference for all isotopologues.

@rolivella
Contributor Author

Asked Cristina for the fasta used to search QC03 on PD.

@rolivella
Contributor Author

Ask Eduard how to configure FragPipe GUI at 100:

Lys Heavy 13C6-15N2-K
Arg Heavy 13C6-15N4
Ala Heavy  13C3-15N
Val Heavy 13C5-15N
Leu Heavy  13C6-15N
Phe Heavy 13C9-15N
Ser Heavy 13C3-15N
Thr Heavy 13C4-15N

And the mass shifts from UNIMOD.
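
The mass shift of each label can also be computed from isotope masses, as a cross-check against the UNIMOD values. A minimal sketch, assuming monoisotopic masses of about 13.0033548378 (13C) and 15.0001088982 (15N) against 12 (12C) and 14.0030740048 (14N); `LABELS` and `mass_shift` are illustrative names:

```python
# Mass gained per substituted atom, in Da (approximate isotope masses).
DELTA_13C = 13.0033548378 - 12.0
DELTA_15N = 15.0001088982 - 14.0030740048

# (number of 13C, number of 15N) per heavy residue, from the list above.
LABELS = {
    "Lys": (6, 2),
    "Arg": (6, 4),
    "Ala": (3, 1),
    "Val": (5, 1),
    "Leu": (6, 1),
    "Phe": (9, 1),
    "Ser": (3, 1),
    "Thr": (4, 1),
}


def mass_shift(n_13c, n_15n):
    """Mass difference between the heavy and the light residue, in Da."""
    return n_13c * DELTA_13C + n_15n * DELTA_15N


for residue, (n_c, n_n) in LABELS.items():
    print(f"{residue}: +{mass_shift(n_c, n_n):.6f}")
```

For 13C6-15N2 lysine this gives roughly 8.0142 Da and for 13C6-15N4 arginine roughly 10.0083 Da.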

@rolivella
Contributor Author

After Eduard meeting:

  • Add the light isotopologue amino acid sequences to the FASTA.
  • Check the new workflow file, especially the critical part msfragger.table.var-mods:

# Workflow: qc03


# Please edit the following path to point to the correct location.
# In Windows, please replace single '\' with '\\'
database.db-path=C\:\\rolivella\\database\\2024-05-16-decoys-reviewed-contam-UP000009136.fas

crystalc.run-crystalc=false
database.decoy-tag=rev_
diann.fragpipe.cmd-opts=
diann.generate-msstats=true
diann.heavy=
diann.library=
diann.light=
diann.medium=
diann.q-value=0.01
diann.quantification-strategy=3
diann.quantification-strategy-2=QuantUMS (high accuracy)
diann.run-dia-nn=false
diann.run-dia-plex=false
diann.run-specific-protein-q-value=false
diann.unrelated-runs=false
diann.use-predicted-spectra=false
diatracer.corr-threshold=0.3
diatracer.delta-apex-im=0.01
diatracer.delta-apex-rt=3
diatracer.mass-defect-filter=true
diatracer.mass-defect-offset=0.1
diatracer.rf-max=500
diatracer.run-diatracer=false
diatracer.write-intermediate-files=false
diaumpire.AdjustFragIntensity=true
diaumpire.BoostComplementaryIon=false
diaumpire.CorrThreshold=0
diaumpire.DeltaApex=0.2
diaumpire.ExportPrecursorPeak=false
diaumpire.Q1=true
diaumpire.Q2=true
diaumpire.Q3=true
diaumpire.RFmax=500
diaumpire.RPmax=25
diaumpire.RTOverlap=0.3
diaumpire.SE.EstimateBG=false
diaumpire.SE.IsoPattern=0.3
diaumpire.SE.MS1PPM=10
diaumpire.SE.MS2PPM=20
diaumpire.SE.MS2SN=1.1
diaumpire.SE.MassDefectFilter=true
diaumpire.SE.MassDefectOffset=0.1
diaumpire.SE.NoMissedScan=1
diaumpire.SE.SN=1.1
diaumpire.run-diaumpire=false
fpop.fpop-tmt=false
fpop.label_control=
fpop.label_fpop=
fpop.region_size=1
fpop.run-fpop=false
fpop.subtract-control=false
freequant.mz-tol=10
freequant.rt-tol=0.4
freequant.run-freequant=false
ionquant.excludemods=
ionquant.heavy=
ionquant.imtol=0.05
ionquant.ionfdr=0.01
ionquant.light=
ionquant.locprob=0.75
ionquant.maxlfq=1
ionquant.mbr=1
ionquant.mbrimtol=0.05
ionquant.mbrmincorr=0
ionquant.mbrrttol=1
ionquant.mbrtoprun=10
ionquant.medium=
ionquant.minfreq=0
ionquant.minions=1
ionquant.minisotopes=2
ionquant.minscans=3
ionquant.mztol=10
ionquant.normalization=1
ionquant.peptidefdr=1
ionquant.proteinfdr=1
ionquant.requantify=1
ionquant.rttol=0.4
ionquant.run-ionquant=true
ionquant.tp=0
ionquant.uniqueness=0
ionquant.use-labeling=false
ionquant.use-lfq=true
ionquant.writeindex=0
msbooster.find-best-rt-model=false
msbooster.find-best-spectra-model=false
msbooster.koina-url=
msbooster.predict-rt=true
msbooster.predict-spectra=true
msbooster.rt-model=DIA-NN
msbooster.run-msbooster=false
msbooster.spectra-model=DIA-NN
msfragger.Y_type_masses=
msfragger.activation_types=all
msfragger.allowed_missed_cleavage_1=2
msfragger.allowed_missed_cleavage_2=2
msfragger.analyzer_types=all
msfragger.calibrate_mass=2
msfragger.check_spectral_files=true
msfragger.clip_nTerm_M=true
msfragger.deisotope=1
msfragger.delta_mass_exclude_ranges=(-1.5,3.5)
msfragger.deneutralloss=1
msfragger.diagnostic_fragments=
msfragger.diagnostic_intensity_filter=0
msfragger.digest_max_length=50
msfragger.digest_min_length=7
msfragger.fragment_ion_series=b,y
msfragger.fragment_mass_tolerance=20
msfragger.fragment_mass_units=1
msfragger.group_variable=0
msfragger.intensity_transform=0
msfragger.ion_series_definitions=
msfragger.isotope_error=0/1/2/3
msfragger.labile_search_mode=off
msfragger.localize_delta_mass=false
msfragger.mass_diff_to_variable_mod=0
msfragger.mass_offsets=0
msfragger.mass_offsets_detailed=
msfragger.max_fragment_charge=2
msfragger.max_variable_mods_combinations=5000
msfragger.max_variable_mods_per_peptide=3
msfragger.min_fragments_modelling=2
msfragger.min_matched_fragments=4
msfragger.min_sequence_matches=2
msfragger.minimum_peaks=15
msfragger.minimum_ratio=0.01
msfragger.misc.fragger.clear-mz-hi=0
msfragger.misc.fragger.clear-mz-lo=0
msfragger.misc.fragger.digest-mass-hi=5000
msfragger.misc.fragger.digest-mass-lo=500
msfragger.misc.fragger.enzyme-dropdown-1=stricttrypsin
msfragger.misc.fragger.enzyme-dropdown-2=null
msfragger.misc.fragger.precursor-charge-hi=4
msfragger.misc.fragger.precursor-charge-lo=1
msfragger.misc.fragger.remove-precursor-range-hi=1.5
msfragger.misc.fragger.remove-precursor-range-lo=-1.5
msfragger.misc.slice-db=1
msfragger.num_enzyme_termini=2
msfragger.output_format=pepXML_pin
msfragger.output_max_expect=50
msfragger.output_report_topN=1
msfragger.output_report_topN_dda_plus=5
msfragger.output_report_topN_dia1=5
msfragger.override_charge=false
msfragger.precursor_mass_lower=-20
msfragger.precursor_mass_mode=selected
msfragger.precursor_mass_units=1
msfragger.precursor_mass_upper=20
msfragger.precursor_true_tolerance=20
msfragger.precursor_true_units=1
msfragger.remainder_fragment_masses=
msfragger.remove_precursor_peak=1
msfragger.report_alternative_proteins=true
msfragger.require_precursor=true
msfragger.restrict_deltamass_to=all
msfragger.reuse_dia_fragment_peaks=false
msfragger.run-msfragger=true
msfragger.search_enzyme_cut_1=KR
msfragger.search_enzyme_cut_2=
msfragger.search_enzyme_name_1=stricttrypsin
msfragger.search_enzyme_name_2=null
msfragger.search_enzyme_nocut_1=
msfragger.search_enzyme_nocut_2=
msfragger.search_enzyme_sense_1=C
msfragger.search_enzyme_sense_2=C
msfragger.table.fix-mods=0.0,C-Term Peptide,true,-1; 0.0,N-Term Peptide,true,-1; 0.0,C-Term Protein,true,-1; 0.0,N-Term Protein,true,-1; 0.0,G (glycine),true,-1; 0.0,A (alanine),true,-1; 0.0,S (serine),true,-1; 0.0,P (proline),true,-1; 0.0,V (valine),true,-1; 0.0,T (threonine),true,-1; 57.02146,C (cysteine),true,-1; 0.0,L (leucine),true,-1; 0.0,I (isoleucine),true,-1; 0.0,N (asparagine),true,-1; 0.0,D (aspartic acid),true,-1; 0.0,Q (glutamine),true,-1; 0.0,K (lysine),true,-1; 0.0,E (glutamic acid),true,-1; 0.0,M (methionine),true,-1; 0.0,H (histidine),true,-1; 0.0,F (phenylalanine),true,-1; 0.0,R (arginine),true,-1; 0.0,Y (tyrosine),true,-1; 0.0,W (tryptophan),true,-1; 0.0,B ,true,-1; 0.0,J,true,-1; 0.0,O,true,-1; 0.0,U,true,-1; 0.0,X,true,-1; 0.0,Z,true,-1
msfragger.table.var-mods=15.9949,M,true,3; 42.0106,[^,true,1; 79.96633,STY,false,3; -17.0265,nQnC,false,1; -18.0106,nE,false,1; 4.025107,K,false,2; 6.020129,R,false,2; 8.014199,K,true,2; 10.008269,R,true,2; 6.020129,KR,false,2; 4.007099,S,true,2; 5.010454,T,true,2; 4.007099,A,true,2; 6.013809,V,true,2; 7.017164,L,true,2; 10.027228,F,true,2
msfragger.track_zero_topN=0
msfragger.use_all_mods_in_first_search=false
msfragger.use_detailed_offsets=false
msfragger.use_topN_peaks=150
msfragger.write_calibrated_mzml=false
msfragger.write_uncalibrated_mgf=false
msfragger.zero_bin_accept_expect=0
msfragger.zero_bin_mult_expect=1
opair.activation1=HCD
opair.activation2=ETD
opair.filterOxonium=true
opair.glyco_db=
opair.max_glycans=4
opair.max_isotope_error=2
opair.min_isotope_error=0
opair.ms1_tol=20
opair.ms2_tol=20
opair.oxonium_filtering_file=
opair.oxonium_minimum_intensity=0.05
opair.reverse_scan_order=false
opair.run-opair=false
opair.single_scan_type=false
peptide-prophet.cmd-opts=--decoyprobs --ppm --accmass --nonparam --expectscore
peptide-prophet.combine-pepxml=false
peptide-prophet.run-peptide-prophet=true
percolator.cmd-opts=--only-psms --no-terminate --post-processing-tdc
percolator.keep-tsv-files=false
percolator.min-prob=0.5
percolator.run-percolator=false
phi-report.dont-use-prot-proph-file=false
phi-report.filter=--sequential --picked --prot 0.01
phi-report.pep-level-summary=false
phi-report.print-decoys=false
phi-report.prot-level-summary=true
phi-report.remove-contaminants=false
phi-report.run-report=true
protein-prophet.cmd-opts=--maxppmdiff 2000000
protein-prophet.run-protein-prophet=true
ptmprophet.cmdline=NOSTACK KEEPOLD STATIC EM\=1 NIONS\=b M\:15.9949,n\:42.0106 MINPROB\=0.5
ptmprophet.run-ptmprophet=false
ptmshepherd.adv_params=false
ptmshepherd.annotation-common=false
ptmshepherd.annotation-custom=false
ptmshepherd.annotation-glyco=false
ptmshepherd.annotation-unimod=true
ptmshepherd.annotation_file=
ptmshepherd.annotation_tol=0.01
ptmshepherd.cap_y_ions=
ptmshepherd.decoy_type=1
ptmshepherd.diag_ions=
ptmshepherd.diagmine_diagMinFoldChange=3.0
ptmshepherd.diagmine_diagMinSpecDiff=00.2
ptmshepherd.diagmine_fragMinFoldChange=3.0
ptmshepherd.diagmine_fragMinPropensity=00.1
ptmshepherd.diagmine_fragMinSpecDiff=00.1
ptmshepherd.diagmine_minIonsPerSpec=2
ptmshepherd.diagmine_minPeps=25
ptmshepherd.diagmine_pepMinFoldChange=3.0
ptmshepherd.diagmine_pepMinSpecDiff=00.2
ptmshepherd.glyco_fdr=1.00
ptmshepherd.glyco_isotope_max=3
ptmshepherd.glyco_isotope_min=-1
ptmshepherd.glyco_ppm_tol=50
ptmshepherd.glycodatabase=
ptmshepherd.histo_smoothbins=2
ptmshepherd.iontype_a=false
ptmshepherd.iontype_b=true
ptmshepherd.iontype_c=true
ptmshepherd.iontype_x=false
ptmshepherd.iontype_y=true
ptmshepherd.iontype_z=true
ptmshepherd.iterloc_maxEpoch=100
ptmshepherd.iterloc_mode=false
ptmshepherd.localization_allowed_res=
ptmshepherd.n_glyco=true
ptmshepherd.normalization-psms=true
ptmshepherd.normalization-scans=false
ptmshepherd.output_extended=false
ptmshepherd.peakpicking_mass_units=0
ptmshepherd.peakpicking_minPsm=10
ptmshepherd.peakpicking_promRatio=0.3
ptmshepherd.peakpicking_width=0.002
ptmshepherd.precursor_mass_units=0
ptmshepherd.precursor_tol=0.01
ptmshepherd.print_decoys=false
ptmshepherd.print_full_glyco_params=false
ptmshepherd.prob_mass=0.5
ptmshepherd.remainder_masses=
ptmshepherd.remove_glycan_delta_mass=true
ptmshepherd.run-shepherd=true
ptmshepherd.run_diagextract_mode=false
ptmshepherd.run_diagmine_mode=false
ptmshepherd.run_glyco_mode=false
ptmshepherd.spectra_maxfragcharge=2
ptmshepherd.spectra_ppmtol=20
ptmshepherd.varmod_masses=
quantitation.run-label-free-quant=false
run-psm-validation=true
run-validation-tab=true
saintexpress.fragpipe.cmd-opts=
saintexpress.max-replicates=10
saintexpress.run-saint-express=false
saintexpress.virtual-controls=100
skyline.run-skyline=false
skyline.skyline=true
skyline.skyline-custom=false
skyline.skyline-custom-path=
skyline.skyline-daily=false
skyline.skyline-mode=0
skyline.skyline-mods-mode=Default
speclibgen.convert-pepxml=true
speclibgen.convert-psm=false
speclibgen.easypqp.extras.max_delta_ppm=15
speclibgen.easypqp.extras.max_delta_unimod=0.02
speclibgen.easypqp.extras.max_glycan_qval=1
speclibgen.easypqp.extras.rt_lowess_fraction=0
speclibgen.easypqp.fragment.a=false
speclibgen.easypqp.fragment.b=true
speclibgen.easypqp.fragment.c=false
speclibgen.easypqp.fragment.x=false
speclibgen.easypqp.fragment.y=true
speclibgen.easypqp.fragment.z=false
speclibgen.easypqp.ignore_unannotated=false
speclibgen.easypqp.im-cal=Automatic selection of a run as reference IM
speclibgen.easypqp.labile_mode=Regular (not glyco)
speclibgen.easypqp.neutral_loss=false
speclibgen.easypqp.rt-cal=ciRT
speclibgen.easypqp.select-file.text=
speclibgen.easypqp.select-im-file.text=
speclibgen.keep-intermediate-files=false
speclibgen.run-speclibgen=false
tab-run.delete_calibrated_mzml=false
tab-run.delete_temp_files=false
tab-run.sub_mzml_prob_threshold=0.5
tab-run.write_sub_mzml=false
tmtintegrator.add_Ref=-1
tmtintegrator.aggregation_method=0
tmtintegrator.allow_overlabel=true
tmtintegrator.allow_unlabeled=true
tmtintegrator.best_psm=true
tmtintegrator.channel_num=TMT-6
tmtintegrator.extraction_tool=IonQuant
tmtintegrator.glyco_qval=-1
tmtintegrator.groupby=0
tmtintegrator.log2transformed=true
tmtintegrator.max_pep_prob_thres=0
tmtintegrator.min_ntt=0
tmtintegrator.min_pep_prob=0.9
tmtintegrator.min_percent=0.05
tmtintegrator.min_purity=0.5
tmtintegrator.min_site_prob=-1
tmtintegrator.mod_tag=none
tmtintegrator.ms1_int=true
tmtintegrator.outlier_removal=true
tmtintegrator.philosopher-msstats=false
tmtintegrator.print_RefInt=false
tmtintegrator.prot_exclude=none
tmtintegrator.prot_norm=0
tmtintegrator.psm_norm=false
tmtintegrator.quant_level=2
tmtintegrator.ref_tag=Bridge
tmtintegrator.run-tmtintegrator=false
tmtintegrator.tolerance=20
tmtintegrator.top3_pep=true
tmtintegrator.unique_gene=0
tmtintegrator.unique_pep=false
tmtintegrator.use_glycan_composition=false
workflow.input.data-type.im-ms=false
workflow.input.data-type.regular-ms=true
workflow.misc.save-sdrf=true
workflow.saved-with-ver=22.0
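
Since msfragger.table.var-mods is the critical line, it helps to read it entry by entry. The sketch below splits the value from the workflow above into records. The field order (mass, sites, enabled flag, maximum occurrences per peptide) is an assumption inferred from the values themselves, not taken from the FragPipe documentation, and `VarMod` and `parse_var_mods` are hypothetical helper names.

```python
from typing import NamedTuple


class VarMod(NamedTuple):
    mass: float            # mass shift in Da
    sites: str             # residues or terminus the shift applies to
    enabled: bool          # assumed meaning of the true/false field
    max_occurrences: int   # assumed meaning of the last field


def parse_var_mods(value: str) -> list[VarMod]:
    """Split a var-mods value into one VarMod per ';'-separated entry."""
    mods = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        mass, sites, enabled, max_occ = entry.split(",")
        mods.append(VarMod(float(mass), sites, enabled == "true", int(max_occ)))
    return mods


# Value copied from the workflow file above.
VAR_MODS = (
    "15.9949,M,true,3; 42.0106,[^,true,1; 79.96633,STY,false,3; "
    "-17.0265,nQnC,false,1; -18.0106,nE,false,1; 4.025107,K,false,2; "
    "6.020129,R,false,2; 8.014199,K,true,2; 10.008269,R,true,2; "
    "6.020129,KR,false,2; 4.007099,S,true,2; 5.010454,T,true,2; "
    "4.007099,A,true,2; 6.013809,V,true,2; 7.017164,L,true,2; "
    "10.027228,F,true,2"
)

mods = parse_var_mods(VAR_MODS)
enabled = [m for m in mods if m.enabled]
for m in enabled:
    print(f"{m.sites:>4}  {m.mass:+.6f} Da  (max {m.max_occurrences} per peptide)")
```

Read this way, the workflow enables ten variable modifications, eight of which are heavy labels on very common residues (K, R, S, T, A, V, L, F) applied to the whole database.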

@rolivella
Contributor Author

I keep getting: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
I'll have to look at it more closely.

@rolivella
Contributor Author

rolivella commented Jul 19, 2024

Ran on Makarov for 45 min, with memory usage peaking at 40-50 GB at times. Test done at proteomics@makarov:~/mysoftware/FragPipe-22-test-qc03/output-22.0-run7-qexactive-hcd.

I cannot find combined_ion.tsv with the apex RT even though ionquant.run-ionquant is true, but at least I can find the isotopologues in peptide.tsv:

Peptide	Prev AA	Next AA	Peptide Length	Protein Start	Protein End	Charges	Probability	Spectral Count	Intensity	Assigned Modifications	Observed Modifications	Protein	Protein ID	Entry Name	Gene	Protein Description	Mapped Genes	Mapped Proteins
YVYVADVAAK      K       N       10      233     242     2       0.9996  2       0.000000        10K(8.0142), 4V(6.0138), 7V(6.0138), 8A(4.0071), 9A(4.0071)             sp|Q15166|PON3_HUMAN    Q15166  PON3_HUMAN      PON3    Serum paraoxonase/lactonase 3           sp|YVYISO|YVYISO_HUMAN
LGFTDLFSK	R	W	9	1	9	2	0.9997	2	0.000000	1L(7.0171), 6L(7.0171), 9K(8.0142)		sp|LGFISO|LGFISO_HUMAN	LGFISO				SERPINA4	sp|P29622|KAIN_HUMAN
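
To check these rows programmatically, the "Assigned Modifications" field can be parsed into (position, residue, mass) tuples. A minimal sketch, assuming entries of the form `10K(8.0142)` as in the two rows above; terminal modifications, which have no numeric position, are ignored by this pattern, and the helper names are illustrative.

```python
import re

# Matches entries such as "10K(8.0142)": 1-based position, residue, mass shift.
MOD_PATTERN = re.compile(r"(\d+)([A-Z])\(([-+]?\d+(?:\.\d+)?)\)")


def parse_assigned_mods(field: str) -> list[tuple[int, str, float]]:
    """Return (position, residue, mass) for each residue-level modification."""
    return [(int(pos), aa, float(mass)) for pos, aa, mass in MOD_PATTERN.findall(field)]


def sites_match(peptide: str, mods: list[tuple[int, str, float]]) -> bool:
    """True if every modified position holds the residue named in the entry."""
    return all(peptide[pos - 1] == aa for pos, aa, _ in mods)


yvy_mods = parse_assigned_mods("10K(8.0142), 4V(6.0138), 7V(6.0138), 8A(4.0071), 9A(4.0071)")
lgf_mods = parse_assigned_mods("1L(7.0171), 6L(7.0171), 9K(8.0142)")

print(sites_match("YVYVADVAAK", yvy_mods), sum(m for _, _, m in yvy_mods))
print(sites_match("LGFTDLFSK", lgf_mods), sum(m for _, _, m in lgf_mods))
```

The summed shift per peptide is the total mass difference between the heavy isotopologue and its light sequence.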

@rolivella
Contributor Author

Ask Fengchao: how to search the isotopologue mix together with the human proteome so that the mass shifts are searched only for these peptides and not for all the peptides in the sample. Also, understand why we cannot see combined_ion.tsv.

@rolivella
Contributor Author

Asked: Nesvilab/FragPipe#1686

@rolivella
Contributor Author

According to Fengchao, B, J, Z and X have zero mass. O's mass is 237.14773 Da, and U's mass is 150.95363 Da.

@rolivella
Contributor Author

rolivella commented Jul 31, 2024

Rationale

  • We have the sequence YVYVADVAAK(Heavy), which has a heavy lysine (K). It is not advisable to give K its mass shift as a variable modification in FragPipe (it takes a lot of time and resources), so we replace the letter K with another letter that is not used in any other sequence: Z. That way the mass shift is applied only to the sequences of our peptides of interest, the isotopologues, and not to all the peptides in the FASTA.
  • To configure FragPipe with the correct mass shifts, K(Heavy) must have the same mass as Z plus some delta mass that we don't know yet: K + heavy = Z + delta. In other words, we still tell FragPipe the exact mass to search for, but through a stand-in letter.
  • From this equation we know the K (lysine) residue mass (it's important to note that it is a residue mass; it is tabulated in the PEAKS table, highlighted in blue). According to the PEAKS table, the K residue mass is 128.0950 Da. To compute the mass of the heavy K we need the molecular formula of this amino acid, which is C(6)H(14)N(2)O(2) for the free amino acid, and replace all the carbons, which are 12C, with 13C, and all the nitrogens, which are 14N, with 15N. Doing this, the K(Heavy) mass is probably 154.1199 Da (done with https://chatgpt.com/share/c445bb15-017c-4f5c-8813-18d5b533927f).
  • We also know that in FragPipe Z = 0 (according to Fengchao), so the delta should be the K(Heavy) mass: delta = 154.1199 Da.
  • This delta is what we should enter in the FragPipe variable modifications for the amino acid Z, i.e. a mass shift of 154.1199 Da.
  • Note: this computation should be validated.
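
As a starting point for that validation, the arithmetic can be redone from tabulated values. This is only a sketch under stated assumptions: the monoisotopic lysine residue mass (128.094963 Da, which rounds to the 128.0950 Da quoted above), the 13C6-15N2 shift of 8.014199 Da already used in msfragger.table.var-mods, and Z = 0 as reported by Fengchao. It computes both the heavy residue mass and the heavy free amino acid mass so the two can be compared with 154.1199 Da.

```python
WATER = 18.010565          # monoisotopic mass of H2O, Da
K_RESIDUE = 128.094963     # monoisotopic lysine residue mass (C6H12N2O), Da
K_LABEL_SHIFT = 8.014199   # 13C6-15N2 label shift, Da
Z_BASE_MASS = 0.0          # mass FragPipe assigns to Z, according to Fengchao

# Heavy lysine as it occurs inside a peptide chain (residue).
heavy_k_residue = K_RESIDUE + K_LABEL_SHIFT

# Heavy lysine as a free amino acid (residue plus one water, C6H14N2O2).
heavy_k_free = heavy_k_residue + WATER

# K + heavy = Z + delta  =>  delta = (K residue + label shift) - Z
delta_z = heavy_k_residue - Z_BASE_MASS

print(f"heavy K residue:         {heavy_k_residue:.6f} Da")
print(f"heavy K free amino acid: {heavy_k_free:.6f} Da")
print(f"delta for Z:             {delta_z:.6f} Da")
```

Under these assumptions the free amino acid comes out at about 154.1197 Da, very close to the 154.1199 Da above, while the residue-based value is about 136.1092 Da. Because the formula C(6)H(14)N(2)O(2) describes the free amino acid and peptide masses are built from residue masses, the residue-based delta may be the one to enter for Z; this still needs to be confirmed against a real search.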

To do

  • The previous calculation should be done for:
| Original residue | Placeholder letter |
| --- | --- |
| V | B |
| T | J |
| R | X |
| K | Z |
| L | U/O |
  • In the previous section I did K and Z. I still have to do the others, which will give me a list of the placeholder letters (right-hand side) with their mass shifts.
  • Once I have this list, I have to add the entries to the FragPipe configuration file.
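
The remaining pairs can be worked out with the same equation, original + heavy = placeholder + delta. The sketch below is one possible way to do it, not a validated configuration: it assumes monoisotopic residue masses for the original amino acids, the label shifts already present in msfragger.table.var-mods, and the placeholder base masses reported by Fengchao (0 for B, J, X and Z; non-zero for U and O). All names are illustrative.

```python
# Monoisotopic residue masses of the original amino acids, Da.
RESIDUE_MASS = {"V": 99.068414, "T": 101.047679, "R": 156.101111,
                "K": 128.094963, "L": 113.084064}

# Heavy label shifts, as listed in msfragger.table.var-mods, Da.
LABEL_SHIFT = {"V": 6.013809, "T": 5.010454, "R": 10.008269,
               "K": 8.014199, "L": 7.017164}

# Base mass FragPipe assigns to each placeholder letter, Da.
PLACEHOLDER_BASE = {"B": 0.0, "J": 0.0, "X": 0.0, "Z": 0.0,
                    "U": 150.95363, "O": 237.14773}

# (original residue, placeholder letter) pairs from the table above.
PAIRS = [("V", "B"), ("T", "J"), ("R", "X"), ("K", "Z"), ("L", "U"), ("L", "O")]


def placeholder_delta(original: str, placeholder: str) -> float:
    """delta = (original residue + label shift) - placeholder base mass."""
    heavy = RESIDUE_MASS[original] + LABEL_SHIFT[original]
    return round(heavy - PLACEHOLDER_BASE[placeholder], 6)


DELTAS = {pair: placeholder_delta(*pair) for pair in PAIRS}

for (original, placeholder), delta in DELTAS.items():
    print(f"{original}(Heavy) -> {placeholder}: delta = {delta:+.6f} Da")
```

Because U and O already carry a mass, their deltas come out negative, so it is worth checking whether FragPipe accepts such a shift on those letters before choosing between them for L.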
