Add brackets and hyphens to PTMs #201

Open
wsnoble opened this issue Jun 28, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@wsnoble
Contributor

wsnoble commented Jun 28, 2023

I think we should modify how Casanovo reports PTMs in its predicted sequences. Ideally, we would be compatible with the ProForma specification:

https://github.com/HUPO-PSI/ProForma/blob/master/SpecDocument/ProForma_v2_draft15_February2022.pdf

But even if we don't adopt Unimod IDs in our spec, we should at least make our formatting compatible with the mass difference approach. The only change is that each modification mass is enclosed in square brackets, an N-terminal mod is followed by a hyphen, and a C-terminal mod is preceded by a hyphen. So, for example, -17.027C+57.021PSEC+57.021TC+57.021LDTVVR becomes [-17.027]-C[+57.021]PSEC[+57.021]TC[+57.021]LDTVVR.
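The rewrite described above can be sketched in a few lines of Python. This is only an illustration, not Casanovo or depthcharge code: the function name `to_proforma` is hypothetical, and it assumes the current output uses single-letter residues, with each mass delta written right after the residue it modifies and any leading mass deltas treated as N-terminal mods. C-terminal mods are not handled, because the current string format cannot distinguish them from a mod on the last residue.

```python
import re

# One signed mass delta, e.g. "+57.021" or "-17.027".
_MASS = r"[+-]\d+(?:\.\d+)?"
_MASS_RE = re.compile(_MASS)
# A leading run of mass deltas is treated as N-terminal modification(s).
_NTERM_RE = re.compile(rf"^((?:{_MASS})*)")
# A residue letter followed by zero or more mass deltas.
_RESIDUE_RE = re.compile(rf"([A-Z])((?:{_MASS})*)")


def _bracket(masses: str) -> str:
    """Wrap every mass delta in `masses` in square brackets."""
    return "".join(f"[{m}]" for m in _MASS_RE.findall(masses))


def to_proforma(legacy: str) -> str:
    """Convert e.g. 'C+57.021PS' to the bracketed form 'C[+57.021]PS'.

    Hypothetical helper: assumes the input format described in the lead-in.
    """
    nterm = _NTERM_RE.match(legacy).group(1)
    body = legacy[len(nterm):]

    parts = []
    pos = 0
    for match in _RESIDUE_RE.finditer(body):
        if match.start() != pos:
            # Something between two tokens was not a residue or a mass.
            raise ValueError(f"Unexpected text at position {pos}: {body!r}")
        parts.append(match.group(1) + _bracket(match.group(2)))
        pos = match.end()
    if pos != len(body):
        raise ValueError(f"Unexpected trailing text: {body[pos:]!r}")

    # N-terminal mods are bracketed and followed by a hyphen.
    prefix = _bracket(nterm) + "-" if nterm else ""
    return prefix + "".join(parts)


print(to_proforma("-17.027C+57.021PSEC+57.021TC+57.021LDTVVR"))
```

Raising on unrecognized text, rather than silently skipping it, is a deliberate choice here so that an unexpected token never produces a plausible-looking but wrong sequence.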

@wsnoble added the enhancement label on Jun 28, 2023
@wfondrie
Collaborator

One of the changes I made in the recent depthcharge update is first-class support for ProForma.

Even mskb-style mods are now handled as ProForma internally.

@wsnoble
Contributor Author

wsnoble commented Jun 28, 2023

Does that mean we can close this issue? I.e., will this get handled automatically once the new depthcharge changes are in place, or are additional changes on the casanovo side going to be necessary?

@wfondrie
Collaborator

Yes, but let's leave the issue open and close it with that upgrade.

@bittremieux
Collaborator

Note, though, that according to the official mzTab specification, the sequence column in the output file should contain the unmodified sequence, and modifications should be reported in a separate modifications column (pages 15–16). This is because mzTab predates ProForma. We currently violate this part of the spec as well, because it would be annoying to retokenize the predicted peptide sequences.

@wsnoble
Contributor Author

wsnoble commented Jul 17, 2023

Comet (and Tide) solves this by producing three columns: one with the raw sequence, one with the sequence decorated with mods, and one with just the mods. I think this might be a nice thing to do in Casanovo.
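As a rough sketch of what the three columns could look like, the snippet below splits a bracketed mass-delta sequence into a raw sequence, the decorated sequence, and a mods-only string. The function name `split_columns` is hypothetical, and the mods-only notation (position, hyphen, `CHEMMOD:` plus the mass delta, with position 0 for the N-terminus) is an assumption modeled on my reading of the mzTab modifications column; it should be checked against the spec and is not what Comet or Tide emit.

```python
import re

# Optional N-terminal block: one or more bracketed mods followed by a hyphen.
_NTERM_RE = re.compile(r"^((?:\[[^\]]+\])+)-")
# A residue letter followed by zero or more bracketed mods.
_RESIDUE_RE = re.compile(r"([A-Z])((?:\[[^\]]+\])*)")
# The contents of a single pair of square brackets.
_MOD_RE = re.compile(r"\[([^\]]+)\]")


def split_columns(decorated: str) -> tuple[str, str, str]:
    """Return (raw sequence, decorated sequence, mods-only string).

    Hypothetical helper; the mods-only notation is an assumption.
    C-terminal mods are not handled in this sketch.
    """
    nterm_match = _NTERM_RE.match(decorated)
    nterm = nterm_match.group(1) if nterm_match else ""
    body = decorated[nterm_match.end():] if nterm_match else decorated

    # Position 0 denotes the N-terminus; residues are numbered from 1.
    mods = [f"0-CHEMMOD:{delta}" for delta in _MOD_RE.findall(nterm)]
    residues = []
    for position, match in enumerate(_RESIDUE_RE.finditer(body), start=1):
        residues.append(match.group(1))
        mods.extend(
            f"{position}-CHEMMOD:{delta}"
            for delta in _MOD_RE.findall(match.group(2))
        )

    return "".join(residues), decorated, ",".join(mods)


raw, decorated, mods = split_columns(
    "[-17.027]-C[+57.021]PSEC[+57.021]TC[+57.021]LDTVVR"
)
print(raw)
print(decorated)
print(mods)
```

Keeping the decorated sequence alongside the raw one means downstream tools that expect an unmodified sequence column and tools that parse the bracketed form can both read the same output row.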
