
Optimize beam search #269

Draft · bittremieux wants to merge 4 commits into dev

Conversation

bittremieux
Collaborator

- Move some initialization code for discarding invalid beams (i.e. which tokens are neutral losses or N-terminal residues) to `__init__` so that it is only done when initializing the model, instead of every time a batch of beams is checked for termination (see the first sketch after this list).
- Move peptide detokenizing to the point where it's absolutely needed. This is still a slow step (currently done in DepthCharge) that could maybe be skipped by calculating masses from tokens directly? @wfondrie What do you think?
- Only calculate the peptide mass once and then subtract the masses of the neutral losses, instead of adding the neutral losses to the peptide and then calculating the mass of the entire peptide.
- Early stopping when comparing delta masses for different isotopes.
- Numba JIT compilation of the mass error calculation (see the second sketch below). This requires moving the precursor m/z and charge tensors to the CPU, though, so I don't know whether it's actually beneficial. Any thoughts @wfondrie @melihyilmaz?
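
A minimal, self-contained sketch of the first and third points (all names here are hypothetical, not the actual Casanovo/DepthCharge API):

```python
# Illustrative sketch only; class and attribute names are hypothetical and do
# not reflect the actual Casanovo/DepthCharge code.

H2O = 18.010565  # monoisotopic mass of water
NH3 = 17.026549  # monoisotopic mass of ammonia


class BeamTerminationHelper:
    def __init__(self, idx2token: dict, token_masses: dict):
        # Determine once, at model initialization, which token indices are
        # neutral losses or N-terminal residues, instead of rebuilding these
        # sets every time a batch of beams is checked for termination.
        self.idx2token = idx2token
        self.token_masses = token_masses
        self.neutral_loss_tokens = {
            i for i, tok in idx2token.items() if tok.startswith("-")
        }
        self.nterm_tokens = {
            i for i, tok in idx2token.items() if tok.startswith("+")
        }

    def candidate_masses(self, token_idxs: list) -> list:
        # Calculate the peptide mass a single time ...
        peptide_mass = H2O + sum(
            self.token_masses[self.idx2token[i]] for i in token_idxs
        )
        # ... and derive neutral-loss variants by subtracting the loss mass,
        # instead of appending the loss to the peptide and recomputing the
        # mass of the entire modified peptide.
        return [peptide_mass, peptide_mass - H2O, peptide_mass - NH3]
```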

Because the stop token is always predicted last, we don't need to take peptide order into account.
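
The isotope early stopping and the Numba-compiled mass error check could look roughly like this (a hedged sketch; the constants and the exact comparison are illustrative):

```python
# Illustrative sketch; not the actual Casanovo implementation.
import numba as nb

PROTON = 1.00727646677
ISOTOPE_SPACING = 1.00335  # approximate spacing between isotopic peaks


@nb.njit
def matches_precursor(calc_mass, precursor_mz, charge, max_isotope, tol_ppm):
    """Check whether the calculated peptide mass matches the observed
    precursor for any isotope in 0..max_isotope."""
    obs_mass = (precursor_mz - PROTON) * charge
    for isotope in range(max_isotope + 1):
        shifted = calc_mass + isotope * ISOTOPE_SPACING
        ppm = (shifted - obs_mass) / obs_mass * 1e6
        if abs(ppm) <= tol_ppm:
            return True
        if ppm > tol_ppm:
            # The shifted mass only grows with the isotope index, so once we
            # overshoot the tolerance there is no point checking further
            # isotopes: stop early.
            return False
    return False
```

Because the compiled function operates on plain CPU scalars/arrays, the precursor m/z and charge tensors have to be moved off the GPU before calling it, which is the trade-off flagged in the last bullet point.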

codecov bot commented Dec 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (235420f) 89.43% compared to head (bc56c49) 88.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #269      +/-   ##
==========================================
- Coverage   89.43%   88.71%   -0.73%     
==========================================
  Files          12       12              
  Lines         909      913       +4     
==========================================
- Hits          813      810       -3     
- Misses         96      103       +7     


@wfondrie
Collaborator

wfondrie commented Dec 6, 2023

> Move peptide detokenizing to the point where it's absolutely needed. This is still a slow step (currently done in DepthCharge) that could maybe be skipped by calculating masses from tokens directly? @wfondrie What do you think?

I'm working on a bigger PR for depthcharge right now and can add support for this 👍
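
For reference, one possible shape for that support (purely a sketch, not the actual DepthCharge API) would be a per-token mass lookup aligned with the tokenizer vocabulary, so that masses can be summed directly over token indices without building peptide strings:

```python
# Hypothetical sketch; DepthCharge's tokenizer API may end up looking different.
import torch

H2O = 18.010565


def masses_from_tokens(tokens: torch.Tensor, token_masses: torch.Tensor) -> torch.Tensor:
    """Sum residue masses directly from token indices.

    ``tokens`` has shape (n_beams, max_len) and is padded with index 0;
    ``token_masses`` maps every vocabulary index to its residue mass, with
    index 0 mapped to 0.0 so padding contributes nothing.
    """
    return token_masses[tokens].sum(dim=1) + H2O
```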

> Numba JIT compilation of the mass error calculation. This requires moving the precursor m/z and charge tensors to the CPU, though, so I don't know whether it's actually beneficial. Any thoughts @wfondrie @melihyilmaz?

That's a good question. I don't know either 🤔. I guess the alternative is to move the calculated m/z to the GPU and perform calculations using PyTorch.
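
A rough sketch of that alternative (hypothetical names; the isotope handling is illustrative): the ppm errors for all beams and isotopes can be computed in one vectorized step on whatever device the tensors already live on.

```python
# Illustrative sketch of a fully vectorized, device-agnostic ppm calculation.
import torch

PROTON = 1.00727646677
ISOTOPE_SPACING = 1.00335


def ppm_errors(calc_mass: torch.Tensor, precursor_mz: torch.Tensor,
               charge: torch.Tensor, max_isotope: int = 1) -> torch.Tensor:
    """Compute ppm errors for all beams and isotopes at once.

    All inputs are 1-D tensors of length n_beams; the result has shape
    (n_beams, max_isotope + 1) and stays on the inputs' device.
    """
    obs_mass = (precursor_mz - PROTON) * charge
    isotopes = torch.arange(
        max_isotope + 1, device=calc_mass.device, dtype=calc_mass.dtype
    )
    shifted = calc_mass[:, None] + isotopes[None, :] * ISOTOPE_SPACING
    return (shifted - obs_mass[:, None]) / obs_mass[:, None] * 1e6
```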

@bittremieux
Collaborator Author

bittremieux commented Dec 6, 2023

> I guess the alternative is to move the calculated m/z to the GPU and perform calculations using PyTorch.

Speeding this up with PyTorch would be ideal, of course. However, it would require some significant refactoring to make it efficient. Currently each beam is processed consecutively, and it's not immediately obvious how to vectorize/parallelize that to harness the GPU.

The device changes between initialization and actual running, so doing this in `__init__` is insufficient.
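
If the precomputed values are tensors, one common workaround (a sketch, assuming the model is a `torch.nn.Module`) is to register them as buffers in `__init__` so they follow the module whenever it is moved to another device:

```python
# Sketch only; the attribute name and size are hypothetical.
import torch


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers are moved together with the module by .to(device) / .cuda(),
        # so a tensor precomputed in __init__ ends up on the right device.
        self.register_buffer("token_masses", torch.zeros(30))


model = Model().to("cpu")  # replace with "cuda" at run time; the buffer follows
```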