optimize performance of array_to_qualitystring #1363

Open
jchorl wants to merge 1 commit into pysam-developers:master from jchorl:jchorl/perf
jchorl commented Oct 2, 2025

While profiling some code, I found that the majority of the time is spent in array_to_qualitystring. This is particularly impactful on huge files with many reads.

The culprit is the allocation, copying, and computation done in Python. This optimization should allow all of that logic to be compiled down to C.
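
To illustrate the kind of overhead described above, here is a minimal pure-Python sketch. It is not the code in this PR and not pysam's implementation. The function names and the fixed Phred+33 offset are assumptions for illustration only. The first function does per-element work in the interpreter, while the second makes one bulk pass using built-ins that run in C.

```python
from array import array

OFFSET = 33  # assumed Phred+33 encoding; illustrative only

# 256-entry table mapping byte value q to q + OFFSET.
# The modulo only matters for values outside the valid quality range.
_TABLE = bytes((q + OFFSET) % 256 for q in range(256))


def qualities_to_string_slow(qualities):
    """Hypothetical baseline: one interpreter-level step per quality value."""
    return "".join(chr(q + OFFSET) for q in qualities)


def qualities_to_string_fast(qualities):
    """Hypothetical bulk version: copy, translate, and decode in C-level built-ins.

    Assumes every quality is small enough that q + OFFSET stays within ASCII.
    """
    return bytes(qualities).translate(_TABLE).decode("ascii")


quals = array("B", [0, 10, 20, 30, 40])
assert qualities_to_string_slow(quals) == qualities_to_string_fast(quals)
```

Both functions return the same string for valid inputs. The second avoids creating a Python object per element, which is the general effect a compiled implementation is aiming for.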

Bench results:

Before:

---------------------------------------------------------- benchmark: 1 tests ----------------------------------------------------------
Name (time in us)                           Min       Max     Mean  StdDev   Median     IQR   Outliers  OPS (Kops/s)  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------
test_fasta_iteration_long_sequences     75.7550  126.3460  78.9250  2.1202  78.4110  0.8720  1160;1541       12.6703   11453           1
----------------------------------------------------------------------------------------------------------------------------------------

After:

-------------------------------------------------------- benchmark: 1 tests -------------------------------------------------------
Name (time in us)                          Min      Max    Mean  StdDev  Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------
test_fasta_iteration_long_sequences     1.2620  14.7180  1.3264  0.1447  1.3130  0.0200  409;1397      753.9372   45268           1
-----------------------------------------------------------------------------------------------------------------------------------

@jmarshall
Member

Thanks, this looks like a good approach.

Eventually I want to add entry points to HTSlib so that we can just call HTSlib's SIMD-optimised versions of these conversions, but this is a big win in the meantime.

Author

jchorl commented Oct 14, 2025

@jmarshall what would be the process to get this merged/released?

Author

jchorl commented Feb 5, 2026

> @jmarshall what would be the process to get this merged/released?

@jmarshall I was just profiling a process and again found this to be a bottleneck. Any chance we can get this merged?
