Filtering bedmethyl file and DMR analysis #364
Comments
Hello @baibhav-bioinfo,
You don't actually need to filter your input data for DMR. The model won't assign a high score or significant p-value to sites with very low coverage. You can find details about the model in the documentation. That said, you may want to ignore positions with low valid coverage so that they don't appear in the output; there is a
You do not have to perform any normalization, however there are
You can find the details of the model in the documentation.
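For anyone who still wants to pre-filter a bedMethyl file by hand, here is a minimal Python sketch. It assumes the usual modkit bedMethyl layout in which Nvalid_cov is the 10th column; please check this against the column description in the documentation for your modkit version. The function name and the threshold of 20 (taken from the question in this issue) are only illustrative, and this is not how modkit filters internally.

```python
# Hypothetical helper: keep only bedMethyl rows with enough valid coverage.
# Assumption: Nvalid_cov is the 10th column (0-based index 9).
NVALID_COV_COL = 9


def filter_bedmethyl(lines, min_valid_cov=20):
    """Yield the bedMethyl lines whose Nvalid_cov is >= min_valid_cov."""
    for line in lines:
        # Skip blank lines and any header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # split() handles both tab- and space-separated columns.
        fields = line.split()
        if int(fields[NVALID_COV_COL]) >= min_valid_cov:
            yield line
```

For a gzipped file, the lines can come from gzip.open(path, "rt"), and the kept lines can be written straight to a new file.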
(1) What if one sample has more reads than the other? Then the --min-valid-coverage cutoff might be biased towards the sample with more overall depth. So isn't it better to normalise? (2) As you said, the comparison is only done if a site is present in at least one replicate of both conditions. What about the sites that are present in only one condition? Those should be interesting to see too.
Hello @baibhav-bioinfo,
I think the "balanced MAP-based p-value" and "balanced effect size" are similar to what you're looking for. I've described how this works in another issue. If one replica has low valid coverage, you don't really want it to have an equal influence on the overall scoring of a position, since there's likely always going to be some sampling bias. By comparing the two values as @kylepalos has done here, you may find some positions that should be investigated.
These positions aren't output right now. But I agree that you may want to see them. For example, maybe there is a C>D event that drops a site out of a replica or condition. I'll see about adding these sites to the output.
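To illustrate why a low-coverage replica should not get equal influence, here is a small Python sketch with made-up counts. It compares an unweighted mean of per-replicate modification fractions with a pooled fraction that weights each replicate by its valid coverage. This is only an arithmetic illustration; it is not modkit's model, and the function names and counts are hypothetical.

```python
# Each replicate is a (n_mod, n_valid) pair for one site. Counts are made up.

def unweighted_mean_fraction(replicates):
    """Average the per-replicate fractions, giving every replicate equal weight."""
    fractions = [n_mod / n_valid for n_mod, n_valid in replicates]
    return sum(fractions) / len(fractions)


def pooled_fraction(replicates):
    """Sum the counts first, so each replicate is weighted by its coverage."""
    total_mod = sum(n_mod for n_mod, _ in replicates)
    total_valid = sum(n_valid for _, n_valid in replicates)
    return total_mod / total_valid


# One replicate has only 2 valid reads, both modified.
replicates = [(2, 2), (10, 50), (12, 48)]
equal_weight = unweighted_mean_fraction(replicates)  # about 0.48
coverage_weight = pooled_fraction(replicates)        # 24 / 100 = 0.24
```

With these counts, the two-read replicate pulls the equal-weight estimate to roughly twice the pooled value, which is the kind of sampling bias described above.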
Thank you so much for the detailed reply. So, if I want to analyse the sites that are only present in one of the conditions, can I use some other method manually?
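One manual approach is plain set arithmetic on the bedMethyl rows. The Python sketch below is a hedged example, not modkit functionality: it treats a site as "present" in a condition if any replicate has a row for it, keys sites on chromosome, start, strand and modification code (assumed to be columns 1, 2, 6 and 4), and returns the shared sites plus the sites unique to each condition. The function names are hypothetical, and modkit's own notion of a site being present may also depend on coverage.

```python
# Assumed bedMethyl columns (0-based): 0 chrom, 1 start, 3 mod code, 5 strand.

def site_keys(lines):
    """Collect (chrom, start, strand, mod_code) keys from one bedMethyl file."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        keys.add((f[0], int(f[1]), f[5], f[3]))
    return keys


def split_sites(cond_a, cond_b):
    """cond_a and cond_b are lists of per-replicate line iterables.

    Returns (shared, only_in_a, only_in_b), where a site belongs to a
    condition if at least one of its replicates has a row for it.
    """
    sites_a = set().union(*(site_keys(rep) for rep in cond_a))
    sites_b = set().union(*(site_keys(rep) for rep in cond_b))
    return sites_a & sites_b, sites_a - sites_b, sites_b - sites_a
```

Each replicate can be passed as an open file handle, for example from gzip.open(path, "rt") for the .bed.gz files used in the command in this issue.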
Do you mean looking for intra-condition variability, i.e. differentially methylated regions between replicates? You can use
Hello @baibhav-bioinfo, another user discovered a bug: when some samples don't have alignments to a contig, the whole contig fails. I have posted a build on that issue.
Hello,
I am using modkit to analyse the results from Dorado.
(1) I have generated the bedmethyl file from the bam file. Now I need filter criteria for "coverage" and "mod_rate" to get rid of noisy predictions.
Can we directly filter on the "Nvalid_cov" column at >= 20 reads, or do we need to normalise it per million reads?
(2) For differential methylation analysis between conditions I am using dmr pair with the following command:
modkit dmr pair -a c6_r1.bed.gz -a c6_r2.bed.gz -a c6_r3.bed.gz -b dr6_r1.bed.gz -b dr6_r2.bed.gz -b dr6_r3.bed.gz -o dmr_result --ref Genome.fa --base A --threads 96 --log-filepath dmr_result.log
(i) I wonder how modkit builds the unified list of sites from both conditions with replicates.
(ii) How does modkit handle the sites that are present in one condition and not in the other?
(iii) Also, what kind of test does modkit apply to get the DMR sites?
Thanks