Filtering bedmethyl file and DMR analysis #364

baibhav-bioinfo · 2025-02-03T15:40:41Z

Hello,
I am using modkit to analyse the results from Dorado.

(1) I have generated a bedMethyl file from a BAM file. Now I need filter criteria for "coverage" and "mod_rate" to get rid of noisy predictions.

Can we filter directly on the "Nvalid_cov" column at >= 20 reads, or do we need to normalise it per million reads?

(2) For differential methylation analysis between conditions I am using dmr pair, with the following command:
modkit dmr pair -a c6_r1.bed.gz -a c6_r2.bed.gz -a c6_r3.bed.gz -b dr6_r1.bed.gz -b dr6_r2.bed.gz -b dr6_r3.bed.gz -o dmr_result --ref Genome.fa --base A --threads 96 --log-filepath dmr_result.log

(i) How does modkit build the unified list of sites from both conditions with replicates?
(ii) How does modkit handle sites that are present in one condition but not in the other?
(iii) What kind of test does modkit apply to call DMR sites?

Thanks

ArtRand · 2025-02-04T19:39:28Z

Hello @baibhav-bioinfo,

(1) I have generated the bedmethyl file from bam file. Now I need a filter criteria for "coverage" and "mod_rate" to get rid of noisy predictions.

You don't actually need to filter your input data for DMR. The model won't assign a high score or significant p-value to sites with very low coverage. You can find details about the model in the documentation. That said, if you simply want to keep positions with low valid coverage out of the output, there is a --min-valid-coverage option for that.

can we directly use the filter on column "Nvalid_cov" as >=20 reads? or do we need to normalise it for per million reads?

You do not have to perform any normalization. However, there are --max-coverages and --cap-coverages options if you have very imbalanced data. With your command the replicates are matched (meaning you have 3 of each), so you will see the balanced output as well.
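If you still want to pre-filter a bedMethyl file yourself on raw "Nvalid_cov", here is a minimal Python sketch. It assumes Nvalid_cov is the 10th field of each row, so please check that against the column description for your modkit version. The function name `filter_bedmethyl`, the threshold, and the two example rows are made up for illustration.

```python
import io

# Assumption: Nvalid_cov is the 10th field (0-based index 9) of a bedMethyl row.
NVALID_COV_COL = 9


def filter_bedmethyl(lines, min_valid_cov=20):
    """Yield bedMethyl rows whose Nvalid_cov is at least min_valid_cov."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        # split() on any whitespace copes with tab- or space-separated count columns
        fields = line.split()
        if int(fields[NVALID_COV_COL]) >= min_valid_cov:
            yield line.rstrip("\n")


# Two made-up rows: one with 25 valid calls, one with 5.
example = io.StringIO(
    "chr1\t100\t101\ta\t25\t+\t100\t101\t255,0,0\t25\t40.00\t10\t15\t0\t0\t0\t0\t0\n"
    "chr1\t200\t201\ta\t5\t+\t200\t201\t255,0,0\t5\t20.00\t1\t4\t0\t0\t0\t0\t0\n"
)
kept = list(filter_bedmethyl(example, min_valid_cov=20))
```

The threshold is applied to the raw count, with no per-million scaling, in line with the answer above.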

(i) I wonder how modkit makes the unified list of sites from both conditions with replicates

A site must be present in at least 1 replicate from each condition.

(ii) how modkit handles the sites which are present in one condition and not in another.

If a site is not present in any of the replicates in one condition, it will not be scored (there's nothing to compare!).

(iii) also what kind of test modkit applies to get the DMR sites

You can find the details of the model in the documentation.
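The site-matching rule from (i) and (ii) can be sketched with plain Python sets. This only illustrates the rule as stated here and is not modkit's implementation. The function name `scorable_sites` and the example site keys are hypothetical.

```python
def scorable_sites(replicates_a, replicates_b):
    """Return sites seen in at least one replicate of each condition.

    Each argument is a list of sets of site keys, one set per replicate.
    """
    sites_a = set().union(*replicates_a)  # seen in any replicate of condition a
    sites_b = set().union(*replicates_b)  # seen in any replicate of condition b
    return sites_a & sites_b  # present on both sides, so there is something to compare


# Made-up sites keyed by (contig, position); the third replicate of a is empty.
a = [{("chr1", 100), ("chr1", 200)}, {("chr1", 100)}, set()]
b = [{("chr1", 100)}, {("chr1", 300)}, {("chr1", 100)}]
scored = scorable_sites(a, b)
```

Here only ("chr1", 100) would be scored: ("chr1", 200) and ("chr1", 300) each appear in just one condition.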

baibhav-bioinfo · 2025-02-04T19:58:17Z

(1) What if one sample has more reads than the other? Then the --min-valid-coverage cutoff might be biased towards the sample with more overall depth. So isn't it better to normalise?

(2) As you said, the comparison is only done if a site is present in at least one replicate of both conditions. What about the sites that are only present in one condition? Those should be interesting to see too.

ArtRand · 2025-02-07T00:38:50Z

Hello @baibhav-bioinfo,

(1) what if we have more number of reads in one sample than other. Then the --min-valid-coverage cutoff might get biased towards the sample with more overall depth. so, isn't it better to normalise?

I think the "balanced MAP-based p-value" and "balanced effect size" are similar to what you're looking for. I've described how this works in another issue. If one replicate has low valid coverage, you don't really want it to have an equal influence on the overall scoring of a position, since there's likely always going to be some sampling bias. By comparing the two values, as @kylepalos has done here, you may find some positions that should be investigated.
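To make the sampling argument concrete, here is a toy Python comparison of two ways to summarise replicates at a single position. This is not modkit's calculation. The function names and the counts are invented, and the only point is how much a 2-read replicate can move an equal-weight average.

```python
def pooled_fraction(counts):
    """Modified fraction after summing counts, so deeper replicates weigh more."""
    n_mod = sum(mod for mod, _ in counts)
    n_valid = sum(valid for _, valid in counts)
    return n_mod / n_valid


def equal_weight_fraction(counts):
    """Mean of per-replicate fractions, so every replicate weighs the same."""
    fractions = [mod / valid for mod, valid in counts if valid > 0]
    return sum(fractions) / len(fractions)


# Hypothetical (n_mod, n_valid) per replicate; the third has only 2 valid calls.
counts = [(45, 50), (40, 50), (0, 2)]
pooled = pooled_fraction(counts)       # 85 / 102
equal = equal_weight_fraction(counts)  # (0.9 + 0.8 + 0.0) / 3
```

With these counts the pooled estimate stays near the two well-covered replicates, while the equal-weight average is pulled down sharply by two reads. A large gap between the two summaries is the kind of signal that flags a position for a closer look.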

(2) like you said the comparison is only done if site is present in at least one replicate of both condition, then what about the sites which are only present in one condition, those should be interesting to see too.

These positions aren't output right now, but I agree that you may want to see them. For example, maybe there is a C>D event that drops a site out of a replicate or condition. I'll see about adding these sites to the output.
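Until such sites are part of the output, one way to list them yourself is to compare the bedMethyl inputs directly. Below is a minimal Python sketch, not a modkit feature. It assumes the first six fields of each row are contig, start, end, modification code, score and strand. The helper names and the truncated example rows are hypothetical.

```python
def site_keys(lines):
    """Collect (contig, start, strand, mod_code) keys from bedMethyl rows."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        f = line.split()
        # Assumed layout: contig, start, end, mod code, score, strand, ...
        keys.add((f[0], int(f[1]), f[5], f[3]))
    return keys


def condition_exclusive(replicates_a, replicates_b):
    """Return (sites only in condition a, sites only in condition b)."""
    sites_a = set().union(*(site_keys(r) for r in replicates_a))
    sites_b = set().union(*(site_keys(r) for r in replicates_b))
    return sites_a - sites_b, sites_b - sites_a


# Made-up rows, truncated to six fields; each inner list stands for one replicate.
cond_a = [
    ["chr1\t100\t101\ta\t30\t+\n", "chr1\t200\t201\ta\t12\t+\n"],
    ["chr1\t100\t101\ta\t28\t+\n"],
]
cond_b = [
    ["chr1\t100\t101\ta\t25\t+\n"],
    ["chr1\t300\t301\ta\t9\t-\n"],
]
only_a, only_b = condition_exclusive(cond_a, cond_b)
```

For real inputs such as c6_r1.bed.gz, each replicate could be an open file handle from `gzip.open(path, "rt")` instead of a list of strings. It is probably worth checking coverage at those positions in the other condition's BAM files before reading anything biological into them.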

baibhav-bioinfo · 2025-02-07T00:49:36Z

Thank you so much for the detailed reply.

So, if I want to analyse the sites which are only present in one of the conditions, can I use another method manually, such as edgeR, adapted to our modification data?

ArtRand · 2025-02-07T02:29:10Z

@baibhav-bioinfo

if i want to analyse the sites which are only present in one of the conditions

Do you mean looking for intra-condition variability? I.e. differentially methylated regions between replicates? You can use dmr multi for that.

ArtRand · 2025-02-18T18:39:42Z

Hello @baibhav-bioinfo,

Another user discovered a bug: when some samples don't have alignments to a contig, the whole contig fails. I have posted a build on that issue.

baibhav-bioinfo changed the title from "Filtering bedmethyl file result based on coverage" to "Filtering bedmethyl file and DMR analysis" on Feb 4, 2025

ArtRand added the labels "question" (Looking for clarification on inputs and/or outputs) and "DMR" (modkit dmr) on Feb 4, 2025
