
Do you use modkit dmr? #361

Open
ArtRand opened this issue Jan 31, 2025 · 2 comments
Labels
good first issue Good for newcomers

Comments


ArtRand commented Jan 31, 2025

Hello everyone.

I'd like to know if you're having issues with modkit dmr, in either the pair or multi variety.

If you're not using it (but you are doing some kind of differential methylation analysis), why not?

Are the outputs hard to interpret, not helpful, or not compatible with other methods?
Is it too slow, or (worse) are there bugs?

One thing that's on my immediate roadmap is to compare an open dataset to a published tool such as DSS. I'm also experimenting with a method to get p-values for regions so you could find significantly differentially methylated regions.

If you're using it and liking it, throw a 👍 on here for fun. But don't hold back if there are things that could be better. Of course I'm not promising I can get to all of them.

@ArtRand ArtRand added the good first issue Good for newcomers label Jan 31, 2025
@kylepalos

I've been using DMR quite a bit and it has been fast and intuitive! Thanks to the devs for making Modkit a very user-friendly tool!

I do have two very minor questions that I couldn't really find answers to elsewhere.
In both cases, I usually perform paired, site-specific analyses, such as:

modkit dmr pair \
-a sample1_rep1.bed.gz -a sample1_rep2.bed.gz \
-b sample2_rep1.bed.gz -b sample2_rep2.bed.gz \
-o DMR.bed \
--ref reference.fasta \
--base A --base T \
--min-valid-coverage 10
  1. When analyzing the outputs with balanced replicates, would you recommend always analyzing the balanced effect sizes and p-values (rather than the unbalanced/raw values)? The effect sizes agree well between raw and balanced, but the p-values agree less; see the attached scatter plots below. I'm not sure if this is expected behavior or if something about my analysis may be off.

  2. This one is even more minor. I often analyze modification mutants where the effects are quite strong, and a substantial fraction of my p-values (balanced or raw) == 0. I realize the exact p-value past a certain point isn't very interesting or informative, but I was wondering whether the reporting range could/should be expanded beyond ~1e-50? This would keep volcano plots and similar graphics from having a massive clump of points at a very similar -log10(p-value). Again, extremely minor and not actually a Modkit issue.
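A minimal sketch of one workaround for the clump of zero p-values on a volcano plot: clamp p-values to a floor before taking -log10, so truncated/underflowed values land at a finite cap instead of infinity. The ~1e-50 floor is an assumption based on the reporting limit described above, and `neg_log10_p` is a hypothetical helper, not part of Modkit:

```python
import math

def neg_log10_p(pvalue, floor=1e-50):
    """Return -log10(p), clamping p to `floor` so p == 0 stays finite.

    Points reported at or below the floor all land at the same cap
    (-log10(floor)), which is the clump described above; raising or
    lowering `floor` moves that cap on the volcano plot's y-axis.
    """
    return -math.log10(max(pvalue, floor))

print(neg_log10_p(1e-10))  # well above the floor: ordinary -log10
print(neg_log10_p(0.0))    # clamped to the floor instead of infinity
```

Jittering the capped points or marking them with a distinct symbol can make it clear they are at the reporting limit rather than at a measured value.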

[Attached: scatter plots comparing raw vs. balanced effect sizes and p-values]

Thanks a lot!


ArtRand commented Feb 8, 2025

@kylepalos Thanks for this!

When analyzing the outputs with balanced replicates, would you recommend always analyzing the balanced effect sizes and p-values (rather than the unbalanced/raw values)? The effect sizes agree well between raw and balanced, but the p-values agree less; see the attached scatter plots below. I'm not sure if this is expected behavior or if something about my analysis may be off.

Let me take a look into this.
