Commit df3ceda

Merge pull request #142 from compomics/timsRescore
TIMS²Rescore merge into the main
2 parents 54c54b5 + 77e7c1b commit df3ceda

31 files changed, +1023 -287 lines changed

.github/workflows/publish.yml

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -14,7 +14,7 @@ jobs:
1414
- uses: actions/checkout@v4
1515

1616
- name: Set up Python
17-
uses: actions/setup-python@v4
17+
uses: actions/setup-python@v5
1818
with:
1919
python-version: "3.11"
2020

@@ -29,7 +29,7 @@ jobs:
2929
3030
- name: Test built package
3131
run: |
32-
pip install dist/ms2rescore-*.whl
32+
pip install --only-binary :all: dist/ms2rescore-*.whl
3333
# pytest
3434
ms2rescore --help
3535
@@ -47,14 +47,14 @@ jobs:
4747
steps:
4848
- uses: actions/checkout@v4
4949

50-
- uses: actions/setup-python@v4
50+
- uses: actions/setup-python@v5
5151
with:
5252
python-version: "3.11"
5353

5454
- name: Install package and dependencies
5555
run: |
5656
python -m pip install --upgrade pip
57-
pip install . pyinstaller
57+
pip install --only-binary :all: . pyinstaller
5858
5959
- name: Install Inno Setup
6060
uses: crazy-max/ghaction-chocolatey@v3

.github/workflows/test.yml

Lines changed: 4 additions & 8 deletions
@@ -24,18 +24,14 @@ jobs:
2424
- name: Install dependencies
2525
run: |
2626
python -m pip install --upgrade pip
27-
pip install flake8
27+
pip install ruff
2828
29-
- name: Lint with flake8
30-
run: |
31-
# stop the build if there are Python syntax errors or undefined names
32-
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
33-
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
34-
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
29+
- name: Run Ruff
30+
run: ruff check --output-format=github .
3531

3632
- name: Build and install ms2rescore package
3733
run: |
38-
pip install .[dev]
34+
pip install --only-binary :all: .[dev]
3935
4036
- name: Test with pytest
4137
run: |

Dockerfile

Lines changed: 6 additions & 5 deletions
@@ -1,8 +1,10 @@
1-
FROM ubuntu:focal
1+
FROM python:3.11
2+
3+
# ARG DEBIAN_FRONTEND=noninteractive
24

35
LABEL name="ms2rescore"
46

5-
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ms2rescore
7+
# ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ms2rescore
68

79
ADD pyproject.toml /ms2rescore/pyproject.toml
810
ADD LICENSE /ms2rescore/LICENSE
@@ -11,8 +13,7 @@ ADD MANIFEST.in /ms2rescore/MANIFEST.in
1113
ADD ms2rescore /ms2rescore/ms2rescore
1214

1315
RUN apt-get update \
14-
&& apt-get install --no-install-recommends -y python3-pip procps libglib2.0-0 libsm6 libxrender1 libxext6 \
15-
&& rm -rf /var/lib/apt/lists/* \
16-
&& pip3 install ms2rescore/
16+
&& apt install -y procps \
17+
&& pip install /ms2rescore --only-binary :all:
1718

1819
ENTRYPOINT [""]

docs/source/config_schema.md

Lines changed: 15 additions & 0 deletions
@@ -10,6 +10,7 @@
1010
- **`deeplc`**: Refer to *[#/definitions/deeplc](#definitions/deeplc)*.
1111
- **`maxquant`**: Refer to *[#/definitions/maxquant](#definitions/maxquant)*.
1212
- **`ionmob`**: Refer to *[#/definitions/ionmob](#definitions/ionmob)*.
13+
- **`im2deep`**: Refer to *[#/definitions/im2deep](#definitions/im2deep)*.
1314
- **`rescoring_engine`** *(object)*: Rescoring engine to use and its configuration. Leave empty to skip rescoring and write features to file. Default: `{"mokapot": {}}`.
1415
- **`.*`**: Refer to *[#/definitions/rescoring_engine](#definitions/rescoring_engine)*.
1516
- **`percolator`**: Refer to *[#/definitions/percolator](#definitions/percolator)*.
@@ -47,7 +48,17 @@
4748
- **One of**
4849
- *string*
4950
- *null*
51+
- **`psm_id_rt_pattern`**: Regex pattern to extract retention time from PSM identifier. Requires at least one capturing group. Default: `null`.
52+
- **One of**
53+
- *string*
54+
- *null*
55+
- **`psm_id_im_pattern`**: Regex pattern to extract ion mobility from PSM identifier. Requires at least one capturing group. Default: `null`.
56+
- **One of**
57+
- *string*
58+
- *null*
5059
- **`lower_score_is_better`** *(boolean)*: Bool indicating if lower score is better. Default: `false`.
60+
- **`max_psm_rank_input`** *(number)*: Maximum rank of PSMs to use as input for rescoring. Minimum: `1`. Default: `10`.
61+
- **`max_psm_rank_output`** *(number)*: Maximum rank of PSMs to return after rescoring, before final FDR calculation. Minimum: `1`. Default: `1`.
5162
- **`modification_mapping`** *(object)*: Mapping of modification labels to each replacement label. Default: `{}`.
5263
- **`fixed_modifications`** *(object)*: Mapping of amino acids with fixed modifications to the modification name. Can contain additional properties. Default: `{}`.
5364
- **`processes`** *(number)*: Number of parallel processes to use; -1 for all available. Minimum: `-1`. Default: `-1`.
@@ -57,6 +68,7 @@
5768
- *string*
5869
- *null*
5970
- **`write_report`** *(boolean)*: Write an HTML report with various QC metrics and charts. Default: `false`.
71+
- **`profile`** *(boolean)*: Write a txt report using cProfile for profiling. Default: `false`.
6072
## Definitions
6173

6274
- <a id="definitions/feature_generator"></a>**`feature_generator`** *(object)*: Feature generator configuration. Can contain additional properties.
@@ -75,7 +87,10 @@
7587
- **`ionmob_model`** *(string)*: Path to Ionmob model directory. Default: `"GRUPredictor"`.
7688
- **`reference_dataset`** *(string)*: Path to Ionmob reference dataset file. Default: `"Meier_unimod.parquet"`.
7789
- **`tokenizer`** *(string)*: Path to tokenizer json file. Default: `"tokenizer.json"`.
90+
- <a id="definitions/im2deep"></a>**`im2deep`** *(object)*: Ion mobility feature generator configuration using IM2Deep. Can contain additional properties. Refer to *[#/definitions/feature_generator](#definitions/feature_generator)*.
91+
- **`reference_dataset`** *(string)*: Path to IM2Deep reference dataset file. Default: `"Meier_unimod.parquet"`.
7892
- <a id="definitions/mokapot"></a>**`mokapot`** *(object)*: Mokapot rescoring engine configuration. Additional properties are passed to the Mokapot brew function. Can contain additional properties. Refer to *[#/definitions/rescoring_engine](#definitions/rescoring_engine)*.
93+
- **`train_fdr`** *(number)*: FDR threshold for training Mokapot. Minimum: `0`. Maximum: `1`. Default: `0.01`.
7994
- **`write_weights`** *(boolean)*: Write Mokapot weights to a text file. Default: `false`.
8095
- **`write_txt`** *(boolean)*: Write Mokapot results to a text file. Default: `false`.
8196
- **`write_flashlfq`** *(boolean)*: Write Mokapot results to a FlashLFQ-compatible file. Default: `false`.
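The new `psm_id_rt_pattern` and `psm_id_im_pattern` options each take a regular expression with at least one capturing group. As a rough illustration of that requirement, here is a minimal Python sketch. The helper name `extract_value` and the identifier layout `scan=...;rt=...;im=...` are hypothetical, and this is not the MS²Rescore implementation.

```python
import re
from typing import Optional


def extract_value(psm_id: str, pattern: Optional[str]) -> Optional[float]:
    """Return the first capturing group of `pattern` in `psm_id` as a float.

    A `pattern` of None (the schema default, `null`) means nothing is extracted.
    """
    if pattern is None:
        return None
    compiled = re.compile(pattern)
    # The schema text asks for at least one capturing group.
    if compiled.groups < 1:
        raise ValueError("pattern requires at least one capturing group")
    match = compiled.search(psm_id)
    return float(match.group(1)) if match else None


# Hypothetical PSM identifier; real identifiers depend on the search engine.
psm_id = "scan=1234;rt=35.2;im=0.98"
rt = extract_value(psm_id, r"rt=([\d.]+)")
im = extract_value(psm_id, r"im=([\d.]+)")
print(rt, im)
```

A pattern without parentheses, such as `rt=`, has no capturing group and is rejected by the sketch, which mirrors the "requires at least one capturing group" wording in the schema.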

docs/source/userguide/configuration.rst

Lines changed: 59 additions & 0 deletions
@@ -240,6 +240,65 @@ expression pattern that extracts the decoy status from the protein name:
240240
decoy_pattern = "DECOY_"
241241
242242
243+
Multi-rank rescoring
244+
====================
245+
246+
Some search engines, such as MaxQuant, report multiple candidate PSMs for the same spectrum.
247+
MS²Rescore can rescore multiple candidate PSMs per spectrum. This allows for lower-ranking
248+
candidate PSMs to become the top-ranked PSM after rescoring. This behavior can be controlled with
249+
the ``max_psm_rank_input`` option.
250+
251+
To ensure correct FDR control after rescoring, MS²Rescore filters out lower-ranking PSMs before
252+
final FDR calculation and writing the output files. To allow for lower-ranking PSMs to be included
253+
in the final output - for instance, to consider chimeric spectra - the ``max_psm_rank_output``
254+
option can be used.
255+
256+
For example, to rescore the top 5 PSMs per spectrum and output the best PSM after rescoring,
257+
the following configuration can be used:
258+
259+
.. tab:: JSON
260+
261+
.. code-block:: json
262+
263+
"max_psm_rank_input": 5,
264+
"max_psm_rank_output": 1
265+
266+
.. tab:: TOML
267+
268+
.. code-block:: toml
269+
270+
max_psm_rank_input = 5
271+
max_psm_rank_output = 1
272+
273+
274+
Configuring rescoring engines
275+
=============================
276+
277+
MS²Rescore supports multiple rescoring engines, such as Mokapot and Percolator. The rescoring
278+
engine can be selected and configured with the ``rescoring_engine`` option. For example, to use
279+
Mokapot with a custom train_fdr of 0.1%, the following configuration can be used:
280+
281+
.. tab:: JSON
282+
283+
.. code-block:: json
284+
285+
"rescoring_engine": {
286+
"mokapot": {
287+
"train_fdr": 0.001
288+
} }
289+
290+
.. tab:: TOML
291+
292+
.. code-block:: toml
293+
294+
[ms2rescore.rescoring_engine.mokapot]
295+
train_fdr = 0.001
296+
297+
298+
All options for the rescoring engines can be found in the :ref:`ms2rescore.rescoring_engines`
299+
section.
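The JSON snippets in this section are fragments of a larger configuration object. A minimal Python sketch of assembling a complete, valid JSON document from the same options might look as follows. The top-level `ms2rescore` key is an assumption mirrored from the TOML table header `[ms2rescore.rescoring_engine.mokapot]`, the function name `build_config` is hypothetical, and the range checks follow the minimum and maximum values listed in the configuration schema.

```python
import json


def build_config(max_rank_in: int = 5, max_rank_out: int = 1, train_fdr: float = 0.001) -> dict:
    """Assemble a configuration dict using the options discussed above."""
    # Bounds taken from the schema: ranks have minimum 1, train_fdr lies in [0, 1].
    if max_rank_in < 1 or max_rank_out < 1:
        raise ValueError("rank limits must be >= 1")
    if not 0 <= train_fdr <= 1:
        raise ValueError("train_fdr must be between 0 and 1")
    return {
        # Assumed top-level key, mirrored from the TOML example.
        "ms2rescore": {
            "max_psm_rank_input": max_rank_in,
            "max_psm_rank_output": max_rank_out,
            "rescoring_engine": {"mokapot": {"train_fdr": train_fdr}},
        }
    }


config_text = json.dumps(build_config(), indent=2)
print(config_text)
```

Serializing through `json.dumps` avoids hand-written punctuation mistakes such as missing commas or unbalanced braces.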
300+
301+
243302
244303
All configuration options
245304
=========================
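The multi-rank behaviour added to this file (keep the top candidates per spectrum as input, rescore, then keep only the best-ranked PSMs for output) can be sketched in plain Python. This is an illustration only: the `Psm` class, the helper names, and the scores are made up, and the actual MS²Rescore code may be organized differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Psm:
    spectrum_id: str
    peptide: str
    rank: int      # 1 = best candidate for the spectrum
    score: float   # here: score after rescoring, higher is better


def filter_by_rank(psms: List[Psm], max_rank: int) -> List[Psm]:
    """Keep only PSMs whose rank does not exceed max_rank."""
    return [p for p in psms if p.rank <= max_rank]


def rerank(psms: List[Psm]) -> List[Psm]:
    """Reassign ranks within each spectrum by descending score."""
    by_spectrum = defaultdict(list)
    for p in psms:
        by_spectrum[p.spectrum_id].append(p)
    reranked = []
    for group in by_spectrum.values():
        ordered = sorted(group, key=lambda p: p.score, reverse=True)
        for new_rank, p in enumerate(ordered, start=1):
            reranked.append(Psm(p.spectrum_id, p.peptide, new_rank, p.score))
    return reranked


candidates = [
    Psm("s1", "PEPTIDEK", 1, 0.40),
    Psm("s1", "PEPTLDEK", 2, 0.90),
    Psm("s1", "PEPSIDEK", 7, 0.95),  # dropped: search-engine rank above 5
]
kept = filter_by_rank(candidates, max_rank=5)      # like max_psm_rank_input = 5
final = filter_by_rank(rerank(kept), max_rank=1)   # like max_psm_rank_output = 1
print(final)
```

In this toy input, the candidate originally ranked second becomes the top-ranked PSM after rescoring, which is the situation the new documentation section describes.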

docs/source/userguide/input-files.rst

Lines changed: 17 additions & 9 deletions
@@ -5,23 +5,31 @@ Input files
55
PSM file(s)
66
===========
77

8-
The peptide-spectrum match (PSM) file is generally the output from a proteomics search engine.
9-
This file serves as the main input to MS²Rescore. One or multiple PSM files can be provided at
10-
once. Note that merging PSMs from different MS runs could have an impact on the correctness of
11-
the FDR control.
8+
The **peptide-spectrum match (PSM) file** is generally the output from a proteomics search engine.
9+
This file serves as the main input to MS²Rescore.
1210

13-
Various PSM file types are supported. The type can be specified with the ``psm_file_type`` option.
14-
Check the list of :py:mod:`psm_utils` tags in the
15-
:external+psm_utils:ref:`supported file formats <supported file formats>` section. Depending on the
16-
file extension, the file type can also be inferred from the file name. In that case,
17-
``psm_file_type`` option can be set to ``infer``.
11+
The PSM file should contain **all putative identifications** made by the search engine, including
12+
both target and decoy PSMs. Ensure that the search engine was configured to include decoy entries
13+
in the search database and was operated with **target-decoy competition** enabled (i.e.,
14+
considering both target and decoy sequences simultaneously during the search).
1815

1916
.. attention::
2017
As a general rule, MS²Rescore always needs access to **all target and decoy PSMs, without any
2118
FDR-filtering**. For some search engines, this means that the FDR-filter should be disabled or
2219
set to 100%.
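To illustrate why unfiltered target and decoy PSMs are needed, here is a minimal Python sketch of the common target-decoy FDR estimate (number of decoys divided by number of targets above a score threshold). The function name and the toy scores are hypothetical, and rescoring engines such as Mokapot or Percolator use more elaborate procedures.

```python
from typing import List, Tuple


def fdr_at_threshold(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR among PSMs scoring >= threshold as decoys / targets.

    Each PSM is a (score, is_decoy) pair. Returns 0.0 when no target passes.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Toy data: (score, is_decoy)
psms = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
print(fdr_at_threshold(psms, 0.6))
print(fdr_at_threshold(psms, 0.8))
```

If decoys or low-scoring PSMs had already been removed by an upstream FDR filter, the decoy counts in this estimate would be biased, which is why the filter should be disabled or set to 100%.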
2320

2421

22+
One or multiple PSM files can be provided at once. Note that merging PSMs from different MS runs
23+
could have an impact on the correctness of the FDR control. Combining multiple PSM files should
24+
generally only be done for LC-fractionated mass spectrometry runs.
25+
26+
Various PSM file types are supported. The type can be specified with the ``psm_file_type`` option.
27+
Check the list of :py:mod:`psm_utils` tags in the
28+
:external+psm_utils:ref:`supported file formats <supported file formats>` section. Depending on the
29+
file extension, the file type can also be inferred from the file name. In that case,
30+
the ``psm_file_type`` option can be set to ``infer``.
31+
32+
2533
Spectrum file(s)
2634
================
2735

docs/source/userguide/output-files.rst

Lines changed: 2 additions & 2 deletions
@@ -52,8 +52,8 @@ Rescoring engine files:
5252
| ``<prefix>.<mokapot/percolator>.weights.txt`` | Feature weights, showing feature usage in the rescoring run |
5353
+-------------------------------------------------------------+-------------------------------------------------------------+
5454

55-
If no rescoring engine is selected (or if Percolator was selected), the following files will also
56-
be written:
55+
If no rescoring engine is selected, if Percolator was selected, or in DEBUG mode, the following
56+
files will also be written:
5757

5858
+-------------------------------------------------------------+-----------------------------------------------------------+
5959
| File | Description |
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
.. _timsrescore:
2+
3+
TIMS²Rescore User Guide
4+
=======================
5+
6+
Introduction
7+
------------
8+
9+
`TIMS²Rescore` is a DDA-PASEF-adapted version of `ms2rescore` for rescoring peptide-spectrum matches (PSMs) acquired on Bruker instruments. This guide provides an overview of how to use `timsrescore` in `ms2rescore` effectively.
10+
11+
Installation
12+
------------
13+
14+
Before using `timsrescore`, ensure that you have `ms2rescore` installed on your system. You can install `ms2rescore` using the following command:
15+
16+
.. code-block:: bash
17+
18+
pip install ms2rescore
19+
20+
Usage
21+
-----
22+
23+
To use `timsrescore`, follow these steps:
24+
25+
1. Prepare your input files:
26+
- Ensure that you have the necessary input files, including the PSM file and the spectrum files.
27+
- Make sure that the PSM file is in a format from a supported search engine or in a standard format such as .mzid (see :external+psm_utils:ref:`supported file formats <supported file formats>`).
28+
- Spectrum files can be provided directly as Bruker .d or miniTDF files, or first converted to .mzML format.
29+
30+
2. Run `timsrescore`:
31+
- Open a terminal or command prompt.
32+
- Navigate to the directory where your input files are located.
33+
- Execute the following command:
34+
35+
.. code-block:: bash
36+
37+
timsrescore -p <path_to_psm_file> -s <path_to_spectrum_file> -o <path_to_output_file>
38+
39+
Replace `<path_to_psm_file>`, `<path_to_spectrum_file>`, and `<path_to_output_file>` with the actual paths to your input and output files.
40+
**Note:** By default, timsTOF-specific models will be used for predictions. Optionally, you can further configure settings through a configuration file. For more information on configuring `timsrescore`, refer to the :doc:`configuration` page in the user guide.
41+
42+
3. Review the results:
43+
- Once the `timsrescore` process completes, you will find the rescoring results in the specified output file or, if no output file was specified, in the same directory as the input files.
44+
- If you want a detailed overview of the performance, you can set `write_report` to `true` in the configuration file, use the `--write_report` option on the command line, or run the following command:
45+
46+
.. code-block:: bash
47+
48+
ms2rescore-report <output_prefix>
49+
50+
Replace `<output_prefix>` with the output prefix of the result files.
51+
52+
Additional Options
53+
------------------
54+
55+
`ms2rescore` provides additional options to customize the `timsrescore` process. You can explore these options by running the following command:
56+
57+
.. code-block:: bash
58+
59+
timsrescore --help
60+
61+
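For scripted use, the `timsrescore` invocation shown in the guide can be assembled from Python. The sketch below only builds the argument list from the three flags that appear in the guide (`-p`, `-s`, `-o`); the function name and the example paths are hypothetical, and the command is not executed here.

```python
import shlex
from typing import List


def build_timsrescore_command(psm_file: str, spectrum_path: str, output_path: str) -> List[str]:
    """Build the argument list for the timsrescore call shown in the guide."""
    return ["timsrescore", "-p", psm_file, "-s", spectrum_path, "-o", output_path]


# Hypothetical paths, for illustration only.
cmd = build_timsrescore_command("psms.mzid", "run.d", "results/run")
print(shlex.join(cmd))
```

With `ms2rescore` installed, the list could be passed to `subprocess.run(cmd, check=True)`. Passing a list instead of a single string avoids quoting problems with paths that contain spaces.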

ms2rescore/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
"""MS²Rescore: Sensitive PSM rescoring with predicted MS² peak intensities and RTs."""
22

3-
__version__ = "3.0.3"
3+
__version__ = "3.1.0-dev9"
44

55
from warnings import filterwarnings
66
