-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dmr pair segment misses clearly differentially methylated region #367
Comments
Hello @Aarhus-kga, I tried to reproduce the result using the colo829 open dataset from epi2me. Here are the steps:
#!/bin/bash
set -eu
mkdir -p data
region="chr11:2,687,019-2,713,072"
echo "> first"
samtools view -bh https://ont-open-data.s3.amazonaws.com/colo829_2024.03/basecalls/colo829bl/sup/PAU59807.d052sup4305mCG_5hmCGvHg38.bam ${region} > data/PAU59807.d052sup4305mCG_5hmCGvHg38_region.bam
samtools index data/PAU59807.d052sup4305mCG_5hmCGvHg38_region.bam
echo "> second"
samtools view -bh https://ont-open-data.s3.amazonaws.com/colo829_2024.03/basecalls/colo829/sup/PAU59949.d052sup4305mCG_5hmCGvHg38.bam ${region} > data/PAU59949.d052sup4305mCG_5hmCGvHg38_region.bam
samtools index data/PAU59949.d052sup4305mCG_5hmCGvHg38_region.bam
echo "> done"
#!/bin/bash
set -euo pipefail
mkdir -p output
samples=(
PAU59807.d052sup4305mCG_5hmCGvHg38_region
PAU59949.d052sup4305mCG_5hmCGvHg38_region
)
reference=path/to/hg38.fasta
ls ${reference} > /dev/null
region="chr11:2,687,019-2,713,072"
for sample in "${samples[@]}"; do
echo "> ${sample}"
modkit pileup data/${sample}.bam stdout --preset traditional --ref ${reference} -t 30 --region ${region} | \
tee >(modkit bm tobigwig stdin output/${sample}_m.bw --mod-codes m -g ${reference}.fai --suppress-progress) \
| bgzip -c > output/${sample}.bed.gz
tabix output/${sample}.bed.gz
done
echo "> done"
set -eu
reference=again/the/path/to/hg38.fasta
normal=output/PAU59807.d052sup4305mCG_5hmCGvHg38_region.bed.gz
tumour=output/PAU59949.d052sup4305mCG_5hmCGvHg38_region.bed.gz
modkit dmr pair \
-a ${normal} \
-b ${tumour} \
-o output/dmr.bed \
--segment output/dmr_segments.bed \
--header \
--ref ${reference} \
--base C \
--threads 80 \
--log-filepath output/modkit_dmr_pair.log \
--force
echo "> done"
In my case I was able to detect the DMR: Here's the segmentation file I got: (I appreciate that this format isn't the easiest to read, but it's the easiest to copy out of the web!)
A couple things you could try:
You could try fiddling with the parameters in the
I would try |
Hi @ArtRand Thank you for your swift answer 😄 Epi2me data My data Your segments run with dmr pair
Looking at the dmr I'm interested in (chr11:2698550-2700889) my data has an effect size of 0.46901054 while the epi2me data has 0.47345015 which is very close. However, our data has a b_pct_modified of 0.15795796 while the epi2me data has 0.005038221. While our data has a low percentage it is not quite as overwhelmingly unmethylated as the epi2me data. Could this be the reason why the region is not being caught as a dmr segment? Fiddling with segmentation optionsFine-grained Significance factor It's awesome that I can now catch the dmr, however, I don't understand the algorithm well enough to tell whether increasing the significance factor could also have some negative effects. Do you know if it will? I also played around with changing |
I have a sample in which I know there is a hypomethylated region (chr11:2698767-2700902) in the KCNQ1OT1 gene. In IGV I can clearly see the difference in methylation between this sample (bottom track) and a 'normal' sample (top track):
I ran dmr pair with segmentation on this sample to see if it would catch it. For the sake of simplicity I used only one normal sample (the same one shown in IGV) as the -a input but in reality we would use more. The segmented output from dmr pair consisted of some very small 'different' segments, most being just a few bps, and the KCNQ1OT1 region was included in a large 'same' segment (ie. dmr pair didn't find the region to be a differentially methylated segment).
For the command I used default settings, apart from adding --segment:
However, when I look at the individual positions of the KCNQ1OT1 region in the non-segmented output file (also output from dmr pair) it looks pretty good. Most of the positions have quite high positive effect sizes and the methylation frequencies look correct. I don't know if it's helpful but here is an excerpt with a few positions from the non-segmented output file:
Could you help me figure out what is going wrong in the segmentation part and how I can fix it?
Thank you in advance 😄
The text was updated successfully, but these errors were encountered: