Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manhattan plots fail #25

Open
Ax3man opened this issue Oct 12, 2023 · 4 comments
Open

Manhattan plots fail #25

Ax3man opened this issue Oct 12, 2023 · 4 comments

Comments

@Ax3man
Copy link

Ax3man commented Oct 12, 2023

Hi Kivanc,

I've updated kGWASflow and tried to run the analysis including aligning the kmers. Unfortunately, generating the Manhattan plot yields errors, with unhelpful logs:

y_max: 19.284643066713652
y_min: 9.220349838454867
Plotting...
/MankFlex/wouter/color_selection_lines/kmergwas/kgwasflow/.snakemake/scripts/tmpk48uhlxn.plot_manhattan.py:141: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  align_kmers_sam_with_all_chrom_sorted = align_kmers_sam_with_all_chrom.sort_values(by=["chr", "bp"], key=natsort.natsort_keygen())

I don't really care about the manhattan plots, I can make those myself, but turning them off here didn't do anything. I tried to check, and as far as I can tell, that option does not appear to be used anywhere? Shouldn't the manhattan rule have an if statement like here?

I'll just continue without the aligned kmers, but thought I would let you know.

@akcorut
Copy link
Owner

akcorut commented Oct 13, 2023

Hi @Ax3man,

Thank you for reporting this issue. You are right about the plot_manhattan setting. I originally planned manhattan plot generation to be optional, but later I decided to keep it permanent as part of align_kmers phase since it wasn't costing much time or memory, and technically any output from align_kmers step should work with plot_manhattan rule without any problem.

Would you mind sharing your SAM file output generated during align_kmers step? It should be in "results/align_kmers/{your_pheno}/{your_pheno}_kmers_alignment.sam". I would like to see what caused the problem during Manhattan plot generation so I can fix it.

I will also make the necessary changes regarding plot_manhattan setting.

Best,
Kivanc

@cedarwarman
Copy link

Hi Kivanc,

I'm getting the same error, I checked for the SAM output as you suggested and there wasn't one. All of log files in that directory were empty, except kmers_align.bowtie2.log, which read:

54 reads; of these:
  54 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    36 (66.67%) aligned exactly 1 time
    18 (33.33%) aligned >1 times
100.00% overall alignment rate

I'm assuming this is why the Manhattan plot isn't working, since the Manhattan plot error message is talking about an empty Series. Not sure why the SAM output isn't being created or why the Manhattan step is running without it, I will let you know if I track it down.

Thanks,

Cedar

@cedarwarman
Copy link

cedarwarman commented Nov 6, 2023

I've generated the SAM output, here are the first few lines:

@HD     VN:1.5  SO:unsorted     GO:query
@SQ     SN:SL4.0ch00    LN:9643250
@SQ     SN:SL4.0ch01    LN:90863682
@SQ     SN:SL4.0ch02    LN:53473368
@SQ     SN:SL4.0ch03    LN:65298490
@SQ     SN:SL4.0ch04    LN:64459972
@SQ     SN:SL4.0ch05    LN:65269487
@SQ     SN:SL4.0ch06    LN:47258699
@SQ     SN:SL4.0ch07    LN:67883646
@SQ     SN:SL4.0ch08    LN:63995357
@SQ     SN:SL4.0ch09    LN:68513564
@SQ     SN:SL4.0ch10    LN:64792705
@SQ     SN:SL4.0ch11    LN:54379777
@SQ     SN:SL4.0ch12    LN:66688036
@PG     ID:bowtie2      PN:bowtie2      VN:2.4.5        CL:"/xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/.snakemake/conda/221bdcf96cc8a42228d668ab0f75056c_/bin/bowtie2-align-s --wrapper basic-0 -p 1 -x /xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/resources/ref/genome/bowtie2_index/genome -f /xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/results/fetch_kmers/anther_pistil_ratio_kmers_list.fa -S /home/u16/cedar/scratch/kGWASflow/aligned_kmers/aligned_kmers.sam"
kmer1_3.707106e-11      16      SL4.0ch09       48997168        1       31M     *       0       0       TATTTCCGCACTTTGTTAGATATTTGGTTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:24G6       YT:Z:UU
kmer2_3.415117e-12      16      SL4.0ch01       10753768        24      31M     *       0       0       GAGGTAATTATCAATTTCTTTTTGACATTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:25T5       YT:Z:UU
kmer3_5.486155e-13      16      SL4.0ch03       15427129        24      31M     *       0       0       TCACTCTGTCTGAAAAAGAATGTCCTAGTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:2C28       YT:Z:UU
kmer4_8.487645e-12      16      SL4.0ch05       48950027        1       31M     *       0       0       AACTCTCCTTTGTAGGTATGTACCTCCATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:1C29       YT:Z:UU

I think the problem is that the regex in the plot_manhattan.py script is not recognizing the chromosome format. I'm using the tomato genome.

@VanOverbeeke
Copy link

Hi @cedarwarman, I had the same issue today and was able to continue making Manhattan plots by replacing this line:

continue # skip chromosome

with name = chromosome_name and it will let you go on with your own chromosome names. If you have many smaller contigs that you want to leave out, you'll have to tweak the regex yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants