Basecalling/demux+modbases: std::bad_alloc #1213
Comments
Hi @sklages, can you repro this with a small dataset, e.g. ~10k reads? Best regards,
@HalfPhoton - I tried two different 10k subsets and both succeeded. I will re-run the large dataset with both v0.8.3 and the current v0.9.0 in order to see if I can actually reproduce the issue and whether both versions behave differently.
@HalfPhoton - Using the version:
Any idea where to start looking for the problem? It may be dataset-specific or system-specific. Both runs had a small memory footprint (less than 20G) and plenty of free disk space. I will run a different dataset which worked before, to exclude the latter.
I ran both versions on another (smaller) dataset; both finished successfully, so it seems to be somehow dataset(-size)-related.
```
[2025-01-19 18:57:20.000] [info] Running: "basecaller" "/models/[email protected]" "." "--modified-bases-models" "/models/[email protected]_5mCG_5hmCG@v3" "--device" "cuda:all"
[2025-01-19 18:57:20.148] [info] > Creating basecall pipeline
```
I’m encountering the same issue on my end. Certain datasets cause a crash with `std::bad_alloc` (not enough memory), despite having over 500GB of free RAM and plenty of disk space available. This behavior only occurs with some datasets. I’m still investigating whether there’s a pattern.
@MueFab - that looks like a bug handling corrupt/invalid data. We're trying to allocate 18446744073709551552 bytes (~18500 PB!), which makes me think we've got a small negative number (that value is -64 reinterpreted as an unsigned 64-bit integer). Is your dataset both small and something you're able to share with us?
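As a quick sanity check on that reasoning (this sketch is not from the original thread), the reported allocation size is exactly what -64 looks like after wrapping around to an unsigned 64-bit integer:

```python
# Reinterpret -64 as an unsigned 64-bit integer (two's complement wraparound).
negative_size = -64
as_uint64 = negative_size & 0xFFFFFFFFFFFFFFFF  # equivalent to negative_size % 2**64

print(as_uint64)                  # 18446744073709551552
print(as_uint64 == 2**64 - 64)    # True: matches the bad_alloc request size
print(round(as_uint64 / 1e15))    # ~18447 PB, i.e. the "~18500 PB" mentioned above
```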
Hi, to get back to this issue: unfortunately I cannot share the dataset as it is sensitive patient data. However, I did some debugging and experimented with different sets of parameters. The pod5 files don't seem to be damaged; at least I am able to open them with the pod5 Python library without any issues. I also tried different settings for the batch size, without success. Currently, it seems to me that it might somehow be related to modified basecalling with the latest model. I was using [email protected] + [email protected]_5mCG_5hmCG@v3 when I experienced the crashes. After downgrading to [email protected] + [email protected][email protected] the issue doesn't seem to appear any longer.
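For reference, the kind of integrity check described above can be done with the pod5 Python package roughly as follows. This is a minimal sketch, not from the original thread; example.pod5 is a placeholder path, and attribute names may vary slightly between pod5 versions:

```python
from pathlib import Path

import pod5  # pip install pod5

path = Path("example.pod5")  # placeholder; point this at one of your files

# A damaged file typically fails while opening or while iterating its records.
with pod5.Reader(path) as reader:
    n_reads = 0
    for read in reader.reads():
        _ = read.read_id  # touch a field so the record actually gets decoded
        n_reads += 1

print(f"{path}: iterated {n_reads} read records without errors")
```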
@MueFab, |
I have a strange issue with `v0.8.3` when basecalling/demuxing in sup mode with modbases on an Nvidia A100/40G: it crashes directly after basecalling has finished (after approx. 53 h). That happened with two datasets, short insert libraries, many reads. Never seen this with `dorado` before. What could cause `dorado` to crash immediately after basecalling has finished? Result files seem to be complete though, e.g.:

Any idea what is going wrong here, what is causing the `std::bad_alloc`?