samtools 1.15 becomes very slow when I set maxcnt to 8,000,000 #1654
Comments
As the NEWS file for 1.15 says:
I don't think anyone tried it on extremely high depth data.
I'm struggling to reproduce the claimed speed increase in tview. #1548 shows the data it was tested on. When I did the testing at the time, I didn't realise that tview had a maximum depth limit given that it has no option to change this depth. So it was high depth, but capped at 8000 so not really high. The test was vertically scrolling in a large pileup. Repeating that test with the default 8000 I see:
So that's 27.7s to 25.1s going from introsort to splaysort. With a
With a single test rather than scrolling, doing just
However, in both cases performance is, frankly, pretty tragic. I certainly don't see any enormous changes here. What sort of speed differences were you seeing?
Clearly with this test, vertical scrolling, it ought to be almost instant. Every vertical scroll causes a complete reload of the entire data. I don't understand why, having laid out the records, it doesn't just redisplay one line lower down! I guess it's discarding the pileup data. It's probably worth figuring that out too, as it'd have a much larger usability improvement for deep data. However, it's a different issue and unrelated to sorting functions.
I want to use samtools to handle extremely high depth data. So I changed
iter->maxcnt = 8000;
in sam.c to 8000000.
After that it becomes so slow that it is unusable.
I changed the same place in samtools 1.12 and it works like a charm.
After some debugging and comparing the differences between 1.12 and 1.15,
I noticed that
ks_introsort(node, tv->n_nodes, tv->aux);
in bam_lpileup.c of 1.12 is replaced by
splaysort(node, tv->n_nodes, tv->aux);
in 1.15. After replacing splaysort with
ks_introsort, 1.15 works as expected.
So my conclusion is that splaysort doesn't work
when maxcnt is large. Is this a bug or is it expected?
Why was ks_introsort replaced by splaysort in 1.15?
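To put numbers on the slowdown described above, the sort call can be timed in isolation at different input sizes. The following is only a minimal sketch: demo_node, demo_node_cmp, demo_sort and demo_time_sort are hypothetical names, the sort key is an assumption, and the standard library qsort stands in for the routine under test (it is neither ks_introsort nor splaysort). Random input may also behave very differently from real pileup data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a layout node; not the samtools struct. */
typedef struct {
    uint32_t level;
    uint32_t cnt;
} demo_node;

/* Order by cnt, then level (an assumed key, chosen for illustration). */
static int demo_node_cmp(const void *a, const void *b) {
    const demo_node *x = *(demo_node *const *)a;
    const demo_node *y = *(demo_node *const *)b;
    if (x->cnt != y->cnt) return x->cnt < y->cnt ? -1 : 1;
    if (x->level != y->level) return x->level < y->level ? -1 : 1;
    return 0;
}

/* The sort under test; replace the body with the routine being compared. */
static void demo_sort(demo_node **nodes, size_t n) {
    qsort(nodes, n, sizeof *nodes, demo_node_cmp);
}

/* Return 1 if the pointer array is in non-decreasing order, else 0. */
static int demo_is_sorted(demo_node **nodes, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (demo_node_cmp(&nodes[i - 1], &nodes[i]) > 0) return 0;
    return 1;
}

/* Build n pseudo-random nodes, sort them, and return the CPU seconds
 * spent inside the sort call only, or -1.0 if allocation fails. */
static double demo_time_sort(size_t n, unsigned seed) {
    if (n == 0) return 0.0;
    demo_node *store = malloc(n * sizeof *store);
    demo_node **nodes = malloc(n * sizeof *nodes);
    if (!store || !nodes) { free(store); free(nodes); return -1.0; }
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        store[i].level = (uint32_t)rand();
        store[i].cnt = (uint32_t)rand() % 16;
        nodes[i] = &store[i];
    }
    clock_t t0 = clock();
    demo_sort(nodes, n);
    clock_t t1 = clock();
    assert(demo_is_sorted(nodes, n));  /* sanity check on the result */
    free(nodes);
    free(store);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling demo_time_sort(8000, 1) and demo_time_sort(8000000, 1) and printing both results gives a rough idea of how the chosen sort scales between the default cap and the raised one.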