Optimize beam search #269
base: dev
Conversation
- Move some initialization code for discarding invalid beams (i.e. determining which tokens are neutral losses or N-terminal residues) to `__init__`, so it runs once when the model is initialized instead of every time a batch of beams is checked for termination.
- Move peptide detokenization to the point where it is absolutely needed. This is still a slow step (currently done in DepthCharge) that could maybe be skipped by calculating masses from tokens directly?
- Calculate the peptide mass only once and subtract the masses of neutral losses, instead of adding the neutral losses to the peptide and then calculating the mass of the entire peptide.
- Stop early when comparing delta masses for different isotopes.
- Numba JIT compilation of the mass error calculation.
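The neutral-loss optimization above can be sketched as follows. This is an illustrative toy, not Casanovo's actual code: the names `AA_MASSES`, `NEUTRAL_LOSSES`, and both functions are hypothetical, and only a few residue masses are included.

```python
# Hypothetical sketch: compute the peptide mass once, then derive
# neutral-loss variants by subtracting the loss masses, instead of
# building a modified peptide and recomputing its full mass each time.

AA_MASSES = {"G": 57.02146, "A": 71.03711, "S": 87.03203}  # monoisotopic residue masses (Da)
H2O = 18.010565  # water mass added for the intact peptide
NEUTRAL_LOSSES = {"-H2O": -18.010565, "-NH3": -17.026549}  # illustrative losses


def peptide_mass(residues):
    """Compute the intact peptide mass once from its residues."""
    return sum(AA_MASSES[aa] for aa in residues) + H2O


def candidate_masses(residues):
    """Yield the intact mass plus one variant per neutral loss,
    each obtained by a single subtraction from the precomputed mass."""
    base = peptide_mass(residues)
    yield base
    for delta in NEUTRAL_LOSSES.values():
        yield base + delta
```

The point of the refactor is that `peptide_mass` (the expensive part in practice) runs once per beam rather than once per neutral-loss variant.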
Because the stop token is always predicted last, we don't need to take peptide order into account.
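A minimal sketch of what this observation buys: if the stop token, when present, is always the most recently predicted token, then checking only the last position of a beam suffices to detect termination, and the peptide's token order never needs to be inspected. The token names here are illustrative, not Casanovo's vocabulary.

```python
# Assumed sketch: the stop token can only ever appear as the newest token,
# so beam termination reduces to a single lookup at the end of the sequence.

STOP_TOKEN = "$"  # hypothetical stop symbol


def beam_finished(tokens):
    """A beam is finished iff its newest (last) token is the stop token."""
    return len(tokens) > 0 and tokens[-1] == STOP_TOKEN
```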
**Codecov Report:** All modified and coverable lines are covered by tests ✅

```diff
@@ Coverage Diff @@
##              dev     #269      +/-   ##
==========================================
- Coverage   89.43%   88.71%   -0.73%
==========================================
  Files          12       12
  Lines         909      913       +4
==========================================
- Hits          813      810       -3
- Misses         96      103       +7
```
I'm working on a bigger PR for depthcharge right now and can add support for this 👍
That's a good question. I don't know either 🤔. I guess the alternative is to move the calculated m/z to the GPU and perform the calculations using PyTorch.
Speeding this up with PyTorch would of course be ideal. However, it would require significant refactoring to make it efficient: currently each beam is processed consecutively, and it's not immediately obvious how to vectorize/parallelize that to harness the GPU.
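To illustrate what "vectorizing across beams" could look like, here is a minimal NumPy sketch of a mass error calculation applied to all beams at once; a PyTorch version would look essentially the same with tensors on the GPU. The function name, the isotope handling, and the ppm formula are assumptions for illustration, not Casanovo's actual implementation.

```python
import numpy as np

ISOTOPE_MASS = 1.00335  # approximate mass difference between neighboring isotopes (Da)


def min_ppm_error(calc_masses, obs_mass, max_isotope=2):
    """Smallest absolute ppm error over isotopes 0..max_isotope, per beam.

    Broadcasting replaces the per-beam loop: the (n_iso, 1) isotope column
    combines with the (n_beams,) mass row into an (n_iso, n_beams) grid.
    """
    calc = np.asarray(calc_masses, dtype=np.float64)
    isotopes = np.arange(max_isotope + 1)[:, None]            # shape (n_iso, 1)
    deltas = calc[None, :] - (obs_mass - isotopes * ISOTOPE_MASS)
    ppm = np.abs(deltas) / obs_mass * 1e6                     # (n_iso, n_beams)
    return ppm.min(axis=0)                                    # (n_beams,)
```

Whether this pays off depends on moving the surrounding per-beam bookkeeping (termination checks, caching) into the same batched form, which is the refactoring referred to above.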
> to only do it when initializing the model instead of every time a batch of beams is checked for termination.

The device changes between initialization and actual running, so doing this in `__init__` is insufficient.