Does Dorado give a high false positive rate for 6mA? #1234
Hi @Pombeboy, what do you mean by "showed very high percentage of A being called as modified 6mA"? Does this simply include all bases marked in the MM/ML tags, or are you applying any filtering? Dorado will output modification probabilities for any A-base that matches the context (for
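To make "all bases marked in the MM/ML tags" concrete, here is a minimal sketch of reading per-base 6mA probabilities back out of the tags. It assumes the SAM tag conventions (MM deltas count skipped occurrences of the canonical base; an ML byte N covers the probability range N/256 to (N+1)/256), a forward-strand read, and only the first MM entry. `parse_mm_ml` is a hypothetical helper, not part of Dorado or Modkit.

```python
def parse_mm_ml(seq, mm, ml):
    """Map read positions to modification probabilities for the first MM entry.

    Sketch only: assumes a forward-strand read and looks at just the first
    entry of the MM tag (e.g. "A+a.,0,1;"), whose ML values come first.
    """
    entry = mm.rstrip(";").split(";")[0]
    header, *deltas = entry.split(",")
    base = header[0]  # canonical base the deltas count over, e.g. "A"
    # Every occurrence of the canonical base, in read order.
    base_positions = [i for i, b in enumerate(seq) if b == base]
    probs = {}
    idx = -1
    for delta, ml_value in zip(deltas, ml):
        # Skip `delta` occurrences of the base, then land on the next one.
        idx += int(delta) + 1
        # ML stores probability bins of width 1/256; take the bin midpoint.
        probs[base_positions[idx]] = (ml_value + 0.5) / 256
    return probs


# Hypothetical read: As at 1, 4, 6, 8; the MM entry reports the 1st and 3rd A.
print(parse_mm_ml("GATTACAGA", "A+a.,0,1;", [255, 12]))
# → {1: 0.998046875, 6: 0.048828125}
```

For real BAM files, pysam's `AlignedSegment.modified_bases` already handles strand and multiple MM entries, so the hand-rolled parser is only meant to show what the tags encode.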
Hi Malton-ONT, thanks for your message. I will try different thresholds for basecalling. But in my sample that shouldn't contain 6mA, a single read still contains quite a lot of A's with a high likelihood of methylation, such as >80% or even 1. I am new to this; is this normal? The top track is the sample treated with the 6mA enzyme, while the bottom one is the sample that shouldn't contain any 6mA. As you can see, with the default settings lots of 6mA are called in the untreated sample. I assume that we would usually need high coverage so that, in a uniform sample, we can tell whether a base is modified or not. In this case, however, I assume the efficiency of 6mA methylation is not 100%, so I can't really use coverage to gain confidence on whether a certain A is modified. Instead, I would rely on the basecall likelihood on individual reads to know whether a region is frequently modified. But because the unmodified sample shows a high likelihood of 6mA in each read, the background becomes really high. Do you have any suggestions on dealing with this? Thank you!
Threshold: 0.05 (top: modified sample, bottom: unmodified)
Threshold: 0.7 (top: modified sample, bottom: unmodified; the background is very high for individual reads)
Best,
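The per-read background described here depends directly on where the threshold is placed. A minimal sketch, assuming per-position 6mA probabilities for one read and treating A positions that are absent from MM/ML as unmodified; `read_background` and the numbers are hypothetical.

```python
def read_background(seq, probs, threshold, base="A"):
    """Fraction of all `base` positions in a read called as modified at `threshold`.

    Assumes positions missing from `probs` were skipped by the basecaller and
    therefore count as unmodified.
    """
    n_base = seq.count(base)
    if n_base == 0:
        return 0.0
    hits = sum(1 for p in probs.values() if p >= threshold)
    return hits / n_base


# Hypothetical untreated read: 10 As, four of them reported in MM/ML.
read = "AAAAAAAAAA"
probs = {0: 0.9, 3: 0.85, 5: 0.3, 7: 0.06}
print(read_background(read, probs, 0.05))  # → 0.4
print(read_background(read, probs, 0.7))   # → 0.2
```

Raising the threshold removes the weak calls but not the confident ones (0.9, 0.85), which is why a higher threshold alone does not make an untreated read look clean.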
Hello @Pombeboy, could you tell me which basecaller/modification models you used? A couple of things you might want to try:
The …
If you have treated/untreated samples, the DMR function in Modkit will handle the fact that your exogenous 6mA won't be 100% at any given position.
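The idea behind comparing treated against untreated samples can be sketched as a per-position test on filtered call counts. This is not Modkit's DMR statistic: it swaps in a plain one-sided Fisher's exact test as a stand-in, and `fisher_greater` and the counts are hypothetical.

```python
from math import comb


def fisher_greater(mod_t, unmod_t, mod_c, unmod_c):
    """One-sided Fisher's exact test: is the treated modified fraction higher?

    Stand-in for a DMR-style comparison (not Modkit's method). Counts are
    per-position calls that passed filtering (FAILed calls excluded).
    """
    n_t = mod_t + unmod_t  # treated reads at this position
    n_c = mod_c + unmod_c  # control reads at this position
    k = mod_t + mod_c      # modified calls across both samples
    total = comb(n_t + n_c, k)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    return sum(
        comb(n_t, x) * comb(n_c, k - x)
        for x in range(mod_t, min(n_t, k) + 1)
    ) / total


# Hypothetical counts: treated 18 mod / 12 unmod, control 6 mod / 24 unmod.
p = fisher_greater(18, 12, 6, 24)
print(f"one-sided p = {p:.3g}")
```

Because the control's false-positive calls enter the comparison directly, a position is flagged only when the treated sample exceeds that background, not merely when individual reads show high probabilities. A real analysis would also need multiple-testing correction across positions.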
I'm not clear on the difference between "space-saving optimization" and "remove low confidence calls": if the value is below the threshold, then that base is marked as a skip in the MM tag and is therefore absent from the ML tag. I don't know how
Dorado generates a probability for each modification in the model and therefore, implicitly, a probability for the canonical base as well. Bases are "skipped" if the modification probability is below the specified threshold. For a single-mod model like 6mA, a low modification probability is equivalent to a high-confidence canonical base, but for multi-mod models (like 5mC_5hmC, for example) it is possible for one modification to have a very low probability (and therefore be skipped) while the other remains above the threshold (e.g. m = 0.03, h = 0.85, C = 0.12 would still include the
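The relationship between the per-modification probabilities and the implicit canonical probability can be sketched as below. It assumes the per-mod skip rule described in this comment (a modification is omitted when its probability is below the threshold); `emitted_calls` is a hypothetical helper, not Dorado's implementation.

```python
def emitted_calls(mod_probs, threshold):
    """Split one multi-mod site into its implicit canonical probability and the
    modifications that would be written out (p >= threshold, assumed rule);
    the remaining modifications are skipped."""
    canonical = 1.0 - sum(mod_probs.values())
    emitted = {code: p for code, p in mod_probs.items() if p >= threshold}
    return canonical, emitted


# The 5mC_5hmC example from the thread: m = 0.03, h = 0.85.
canonical, emitted = emitted_calls({"m": 0.03, "h": 0.85}, threshold=0.05)
print(round(canonical, 2), emitted)  # → 0.12 {'h': 0.85}
```

For a single-mod 6mA model the same arithmetic reduces to canonical = 1 - p(6mA), which is why a skipped 6mA call reads as a confident canonical A.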
Hey @malton-ont, I think there is a potential sharp edge here.
Correct, a low probability of 6mA corresponds to a high unmodified probability, so concretely say you have Under the default settings, the probability for 6mA (0.3) would be emitted in the ML tag, but say you set Remember that when a call FAILs, we don't use it at all in the calculation of the %-modification at a genomic location. So I suppose you could use Lastly, in the example here:
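The sharp edge can be illustrated with a toy calculation of %-modification at one position. The sketch assumes Modkit-style filtering (a call FAILs when the probability of its most likely state is below the filter threshold, and FAILed calls are dropped from the %-modification) and, as a simplification, that a 6mA probability skipped by the Dorado threshold is read back as a confident canonical call. `call`, `percent_modified` and all numbers are hypothetical.

```python
def call(p_mod, filter_threshold):
    """Single-mod site: return 'mod', 'canonical' or 'fail' (Modkit-style, assumed)."""
    p_canonical = 1.0 - p_mod
    if max(p_mod, p_canonical) < filter_threshold:
        return "fail"
    return "mod" if p_mod > p_canonical else "canonical"


def percent_modified(p_mods, filter_threshold, dorado_threshold=0.0):
    """%-modification at one position from per-read 6mA probabilities.

    dorado_threshold=0.0 means nothing is skipped in this toy model.
    """
    calls = []
    for p in p_mods:
        if p < dorado_threshold:
            # Skipped in MM/ML; assumed to be read back as a confident canonical call.
            calls.append("canonical")
        else:
            calls.append(call(p, filter_threshold))
    passed = [c for c in calls if c != "fail"]  # FAILed calls are not used
    if not passed:
        return None
    return 100.0 * passed.count("mod") / len(passed)


reads = [0.3, 0.3, 0.95, 0.9]  # hypothetical per-read 6mA probabilities
print(percent_modified(reads, filter_threshold=0.8))                        # → 100.0
print(percent_modified(reads, filter_threshold=0.8, dorado_threshold=0.5))  # → 50.0
```

With the Dorado threshold left alone, the two 0.3 calls FAIL and drop out; raising it turns them into canonical calls that do count, so the same reads give a different %-modification.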
Thanks @ArtRand, that's a nice explanation! So I guess the advice to users here is "leave the dorado threshold alone if you're planning to filter in modkit".
Hi ONT community,
I am doing 6mA detection after in situ DNA methylation in an organism that is supposed to have no 6mA. However, after peak calling, my control sample (without 6mA enzyme treatment) showed a very high percentage of A's being called as modified 6mA. Have you experienced something like this before? Thanks.
Best,
J