Huge difference in mapping rates between two mouse tissues? #2249

Open
jdrnevich opened this issue Dec 2, 2024 · 1 comment

@jdrnevich

I have a bulk RNA-Seq data set of 2x150 PE reads for 36 mouse samples, half from liver and half from muscle (gastrocnemius). FastQC shows only a slight increase in % duplicates and % GC content in the muscle samples relative to the liver samples. Both tissues were aligned to the same GRCm39/GENCODE M33 genome using STAR 2.7.10a in a custom Nextflow script with the following parameters (we add the GTF on the fly rather than building it into the index):

      STAR --runThreadN 8 \
      --genomeDir star-2.7.10a-gencode-GRCm39.primary_assembly \
      --readFilesIn ${read1} ${read2} \
      --sjdbGTFfile gencode.vM33.primary_assembly.annotation.gtf \
      --readFilesCommand gunzip -c \
      --outFileNamePrefix ${pair_id}_ \
      --sjdbGTFtagExonParentGene gene_id \
      --outSAMtype BAM SortedByCoordinate \
      --quantMode GeneCounts \
      --runDirPerm All_RWX \
      --sjdbOverhang 149

The liver samples mapped at a normal rate of ~150 million reads per hour, but the muscle samples are mapping at only ~2 million reads per hour; with ~80 million reads per sample, they are taking ~2 days each! All samples, including muscle, had 80-90% uniquely mapped reads. What could be causing the huge slowdown in mapping the muscle reads? I've googled a bit and found #996, but neither the number of contigs nor low mappability seems to apply to the muscle samples. I've attached the full Log.final.out for one muscle and one liver sample, and also pulled out the mapping speed and % uniquely mapped reads for all. Thanks for any advice (other than waiting for the last few muscle samples to finish)!

$ cat Gastroc_DEPOWER_SPo_16_CAACAATTCG-GCGACCGATT_L006_Log.final.out
                                 Started job on |       Nov 25 05:17:43
                             Started mapping on |       Nov 25 05:21:29
                                    Finished on |       Nov 26 22:57:38
       Mapping speed, Million of reads per hour |       2.31

                          Number of input reads |       96288036
                      Average input read length |       281
                                    UNIQUE READS:
                   Uniquely mapped reads number |       86806641
                        Uniquely mapped reads % |       90.15%
                          Average mapped length |       281.25
                       Number of splices: Total |       101457314
            Number of splices: Annotated (sjdb) |       100750020
                       Number of splices: GT/AG |       100754298
                       Number of splices: GC/AG |       542168
                       Number of splices: AT/AC |       52426
               Number of splices: Non-canonical |       108422
                      Mismatch rate per base, % |       0.24%
                         Deletion rate per base |       0.01%
                        Deletion average length |       3.50
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.55
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       6886393
             % of reads mapped to multiple loci |       7.15%
        Number of reads mapped to too many loci |       355315
             % of reads mapped to too many loci |       0.37%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       2196405
                 % of reads unmapped: too short |       2.28%
                Number of reads unmapped: other |       43282
                     % of reads unmapped: other |       0.04%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%


$ cat Liver_DEPOWER_SPo_16_GTAGGTACAA-CACTCAAGAA_L006_Log.final.out
                                 Started job on |       Nov 23 03:19:24
                             Started mapping on |       Nov 23 03:23:09
                                    Finished on |       Nov 23 04:07:09
       Mapping speed, Million of reads per hour |       115.39

                          Number of input reads |       84622906
                      Average input read length |       281
                                    UNIQUE READS:
                   Uniquely mapped reads number |       69449931
                        Uniquely mapped reads % |       82.07%
                          Average mapped length |       280.69
                       Number of splices: Total |       69362082
            Number of splices: Annotated (sjdb) |       68898721
                       Number of splices: GT/AG |       68788068
                       Number of splices: GC/AG |       470609
                       Number of splices: AT/AC |       28236
               Number of splices: Non-canonical |       75169
                      Mismatch rate per base, % |       0.24%
                         Deletion rate per base |       0.01%
                        Deletion average length |       3.25
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.94
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       5974139
             % of reads mapped to multiple loci |       7.06%
        Number of reads mapped to too many loci |       286417
             % of reads mapped to too many loci |       0.34%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       2416728
                 % of reads unmapped: too short |       2.86%
                Number of reads unmapped: other |       6495691
                     % of reads unmapped: other |       7.68%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
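
As a quick sanity check on the two logs above, the reported speed should roughly equal the number of input reads divided by the wall-clock mapping time. The sketch below is only illustrative arithmetic; `hours_to_map` is a hypothetical helper name, not part of STAR.

```shell
# Hypothetical helper: estimate mapping time in hours from
# $1 = number of input reads and $2 = speed in millions of reads per hour.
hours_to_map() {
  awk -v reads="$1" -v speed="$2" \
    'BEGIN { printf "%.1f\n", reads / (speed * 1e6) }'
}

hours_to_map 96288036 2.31     # muscle sample, prints 41.7
hours_to_map 84622906 115.39   # liver sample, prints 0.7
```

Both values agree with the "Started mapping on" and "Finished on" timestamps (about 41.6 hours for muscle and 44 minutes for liver), so the slow speed reflects time actually spent mapping rather than a reporting glitch.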



$ grep "speed" *Log.final.out
Gastroc_DEPOWER_SPo_16_CAACAATTCG-GCGACCGATT_L006_Log.final.out:       Mapping speed, Million of reads per hour |       2.31
Gastroc_DEPOWER_SPo_25_TAGATCCGAA-ACTTCGATAG_L006_Log.final.out:       Mapping speed, Million of reads per hour |       1.67
Gastroc_DEPOWER_TPo_14_TCTTAACTGG-TTCAGGCCGA_L006_Log.final.out:       Mapping speed, Million of reads per hour |       1.76
Gastroc_DEPOWER_TPo_17_GTCACATCCG-AGAACAGTGA_L006_Log.final.out:       Mapping speed, Million of reads per hour |       1.99
Gastroc_DEPOWER_TPo_21_TGAAGCATCT-GTTAGATACC_L006_Log.final.out:       Mapping speed, Million of reads per hour |       2.27
Gastroc_POWER_SPo_10_TGCGTGCGAA-ATCGGTATGA_L006_Log.final.out:       Mapping speed, Million of reads per hour | 2.14
Gastroc_POWER_SPo_1_TTGACCTAGC-GGAACACCAA_L006_Log.final.out:       Mapping speed, Million of reads per hour |  2.57
Gastroc_POWER_SPo_6_GCTTCAATCA-TTCCACCTGG_L006_Log.final.out:       Mapping speed, Million of reads per hour |  1.96
Gastroc_POWER_TPo_3_AATGGTACCT-ACAACCGTCG_L006_Log.final.out:       Mapping speed, Million of reads per hour |  1.85
Gastroc_POWER_TPo_8_GTAACATTGG-TCGTGGCCAA_L006_Log.final.out:       Mapping speed, Million of reads per hour |  1.71
Gastroc_REPOWER_SPo_27_CGGACTACTT-CCTTAATACG_L006_Log.final.out:       Mapping speed, Million of reads per hour |       2.19
Gastroc_REPOWER_SPo_31_AACGGAGTCC-GACACTCCTA_L006_Log.final.out:       Mapping speed, Million of reads per hour |       2.23
Gastroc_REPOWER_SPo_36_AGGTGTGACC-GAGTTCGGAG_L006_Log.final.out:       Mapping speed, Million of reads per hour |       2.05
Gastroc_REPOWER_TPo_27_CCAGAGTTCC-CGAGAGACAT_L006_Log.final.out:       Mapping speed, Million of reads per hour |       2.15
Gastroc_REPOWER_TPo_33_GACTGACATA-ATGCGGCCAA_L006_Log.final.out:       Mapping speed, Million of reads per hour |       2.75
Liver_DEPOWER_SPo_16_GTAGGTACAA-CACTCAAGAA_L006_Log.final.out:       Mapping speed, Million of reads per hour | 115.39
Liver_DEPOWER_SPo_24_TGCGACTTCG-TTGTTGCAGA_L006_Log.final.out:       Mapping speed, Million of reads per hour | 157.89
Liver_DEPOWER_SPo_25_TTGTCACGTT-TGTGTAGCGG_L006_Log.final.out:       Mapping speed, Million of reads per hour | 149.13
Liver_DEPOWER_TPo_14_CAACGACTGA-CCTGACAGAG_L006_Log.final.out:       Mapping speed, Million of reads per hour | 167.74
Liver_DEPOWER_TPo_17_GATTCGGCTA-AGACCGTTAA_L006_Log.final.out:       Mapping speed, Million of reads per hour | 148.09
Liver_DEPOWER_TPo_21_TGGTGGCTAG-AAGGTCATCG_L006_Log.final.out:       Mapping speed, Million of reads per hour | 147.29
Liver_POWER_SPo_10_ATCCAATAGG-TCGTCATCTT_L006_Log.final.out:       Mapping speed, Million of reads per hour |   153.41
Liver_POWER_SPo_1_GCGATCCTTG-CTCAGACACC_L006_Log.final.out:       Mapping speed, Million of reads per hour |    143.57
Liver_POWER_SPo_6_TGTTCCACTT-AATGGCACGG_L006_Log.final.out:       Mapping speed, Million of reads per hour |    131.55
Liver_POWER_TPo_3_AGACCGTTAA-TGGCAATACA_L006_Log.final.out:       Mapping speed, Million of reads per hour |    140.22
Liver_POWER_TPo_4_ACTATTGACC-GCCGATGGTT_L006_Log.final.out:       Mapping speed, Million of reads per hour |    126.06
Liver_POWER_TPo_8_GCCTAATTCC-AGGTTGCACG_L006_Log.final.out:       Mapping speed, Million of reads per hour |    132.88
Liver_REPOWER_SPo_27_AGGCCAGGAT-ACTCCTGCCT_L006_Log.final.out:       Mapping speed, Million of reads per hour | 134.40
Liver_REPOWER_SPo_31_AACGCCTGTG-CGAGTCCGTT_L006_Log.final.out:       Mapping speed, Million of reads per hour | 138.02
Liver_REPOWER_SPo_36_CGTGTGAGTG-GCCGAACCAA_L006_Log.final.out:       Mapping speed, Million of reads per hour | 122.46
Liver_REPOWER_TPo_27_CGTATGTGAA-AGGTAGTGCG_L006_Log.final.out:       Mapping speed, Million of reads per hour | 147.61
Liver_REPOWER_TPo_29_TACGTCACAA-CGTGGTATGG_L006_Log.final.out:       Mapping speed, Million of reads per hour | 151.13
Liver_REPOWER_TPo_33_GGAAGATCCG-TACCGAATTC_L006_Log.final.out:       Mapping speed, Million of reads per hour | 137.76

$ grep "Uniquely mapped reads %" *Log.final.out
Gastroc_DEPOWER_SPo_16_CAACAATTCG-GCGACCGATT_L006_Log.final.out:                        Uniquely mapped reads % |       90.15%
Gastroc_DEPOWER_SPo_25_TAGATCCGAA-ACTTCGATAG_L006_Log.final.out:                        Uniquely mapped reads % |       91.53%
Gastroc_DEPOWER_TPo_14_TCTTAACTGG-TTCAGGCCGA_L006_Log.final.out:                        Uniquely mapped reads % |       90.46%
Gastroc_DEPOWER_TPo_17_GTCACATCCG-AGAACAGTGA_L006_Log.final.out:                        Uniquely mapped reads % |       87.99%
Gastroc_DEPOWER_TPo_21_TGAAGCATCT-GTTAGATACC_L006_Log.final.out:                        Uniquely mapped reads % |       90.25%
Gastroc_POWER_SPo_10_TGCGTGCGAA-ATCGGTATGA_L006_Log.final.out:                        Uniquely mapped reads % | 89.67%
Gastroc_POWER_SPo_1_TTGACCTAGC-GGAACACCAA_L006_Log.final.out:                        Uniquely mapped reads % |  91.37%
Gastroc_POWER_SPo_6_GCTTCAATCA-TTCCACCTGG_L006_Log.final.out:                        Uniquely mapped reads % |  89.79%
Gastroc_POWER_TPo_3_AATGGTACCT-ACAACCGTCG_L006_Log.final.out:                        Uniquely mapped reads % |  88.38%
Gastroc_POWER_TPo_8_GTAACATTGG-TCGTGGCCAA_L006_Log.final.out:                        Uniquely mapped reads % |  87.24%
Gastroc_REPOWER_SPo_27_CGGACTACTT-CCTTAATACG_L006_Log.final.out:                        Uniquely mapped reads % |       89.29%
Gastroc_REPOWER_SPo_31_AACGGAGTCC-GACACTCCTA_L006_Log.final.out:                        Uniquely mapped reads % |       88.53%
Gastroc_REPOWER_SPo_36_AGGTGTGACC-GAGTTCGGAG_L006_Log.final.out:                        Uniquely mapped reads % |       90.11%
Gastroc_REPOWER_TPo_27_CCAGAGTTCC-CGAGAGACAT_L006_Log.final.out:                        Uniquely mapped reads % |       86.86%
Gastroc_REPOWER_TPo_33_GACTGACATA-ATGCGGCCAA_L006_Log.final.out:                        Uniquely mapped reads % |       89.51%
Liver_DEPOWER_SPo_16_GTAGGTACAA-CACTCAAGAA_L006_Log.final.out:                        Uniquely mapped reads % | 82.07%
Liver_DEPOWER_SPo_24_TGCGACTTCG-TTGTTGCAGA_L006_Log.final.out:                        Uniquely mapped reads % | 87.17%
Liver_DEPOWER_SPo_25_TTGTCACGTT-TGTGTAGCGG_L006_Log.final.out:                        Uniquely mapped reads % | 87.47%
Liver_DEPOWER_TPo_14_CAACGACTGA-CCTGACAGAG_L006_Log.final.out:                        Uniquely mapped reads % | 86.13%
Liver_DEPOWER_TPo_17_GATTCGGCTA-AGACCGTTAA_L006_Log.final.out:                        Uniquely mapped reads % | 86.30%
Liver_DEPOWER_TPo_21_TGGTGGCTAG-AAGGTCATCG_L006_Log.final.out:                        Uniquely mapped reads % | 87.53%
Liver_POWER_SPo_10_ATCCAATAGG-TCGTCATCTT_L006_Log.final.out:                        Uniquely mapped reads % |   82.40%
Liver_POWER_SPo_1_GCGATCCTTG-CTCAGACACC_L006_Log.final.out:                        Uniquely mapped reads % |    84.71%
Liver_POWER_SPo_6_TGTTCCACTT-AATGGCACGG_L006_Log.final.out:                        Uniquely mapped reads % |    83.50%
Liver_POWER_TPo_3_AGACCGTTAA-TGGCAATACA_L006_Log.final.out:                        Uniquely mapped reads % |    85.10%
Liver_POWER_TPo_4_ACTATTGACC-GCCGATGGTT_L006_Log.final.out:                        Uniquely mapped reads % |    82.24%
Liver_POWER_TPo_8_GCCTAATTCC-AGGTTGCACG_L006_Log.final.out:                        Uniquely mapped reads % |    83.67%
Liver_REPOWER_SPo_27_AGGCCAGGAT-ACTCCTGCCT_L006_Log.final.out:                        Uniquely mapped reads % | 84.21%
Liver_REPOWER_SPo_31_AACGCCTGTG-CGAGTCCGTT_L006_Log.final.out:                        Uniquely mapped reads % | 83.51%
Liver_REPOWER_SPo_36_CGTGTGAGTG-GCCGAACCAA_L006_Log.final.out:                        Uniquely mapped reads % | 81.85%
Liver_REPOWER_TPo_27_CGTATGTGAA-AGGTAGTGCG_L006_Log.final.out:                        Uniquely mapped reads % | 87.42%
Liver_REPOWER_TPo_29_TACGTCACAA-CGTGGTATGG_L006_Log.final.out:                        Uniquely mapped reads % | 85.76%
Liver_REPOWER_TPo_33_GGAAGATCCG-TACCGAATTC_L006_Log.final.out:                        Uniquely mapped reads % | 83.89%
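
The per-sample speeds above could also be condensed into one line per tissue. This is a minimal sketch that assumes the log files sit in the current directory and that the tissue name is the part of the file name before the first underscore, as in the listings above; `mean_speed_by_tissue` is a hypothetical helper name.

```shell
# Hypothetical helper: print tissue, number of samples, and mean mapping
# speed (millions of reads per hour) from STAR Log.final.out files.
# Assumes file names like <Tissue>_..._Log.final.out in the current directory.
mean_speed_by_tissue() {
  grep -H "Mapping speed" "$@" |
    awk -F'|' '{
      split($1, parts, "_")      # parts[1] = file name text before first "_"
      sum[parts[1]] += $2        # $2 = the numeric value after the "|"
      n[parts[1]]++
    }
    END { for (t in n) printf "%s\t%d\t%.2f\n", t, n[t], sum[t] / n[t] }' |
    sort
}

# Usage (run in the directory that holds the logs):
#   mean_speed_by_tissue *Log.final.out
```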


@ChristopherBottomsOMRF

Total outsider here (not part of the STAR team), but if it were me, I would create small subsets of several of the FASTQ files and rerun the alignments several times to see if the trend holds.
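
One possible way to make such small test files is sketched below. It simply keeps the first N read pairs of each gzipped FASTQ pair; `subsample_pair` and the file names in the usage comment are hypothetical, and the first reads in a file are not a random sample, so a dedicated subsampling tool may be preferable if that matters.

```shell
# Hypothetical helper: keep the first N read pairs of a gzipped
# paired-end FASTQ set.
#   $1 = R1 file, $2 = R2 file, $3 = number of read pairs, $4 = output prefix
# Each FASTQ record is 4 lines, so N reads = 4*N lines. Taking the same
# number of records from both files keeps the mates in sync.
subsample_pair() {
  gzip -dc "$1" | head -n "$(($3 * 4))" | gzip > "${4}_R1.fastq.gz"
  gzip -dc "$2" | head -n "$(($3 * 4))" | gzip > "${4}_R2.fastq.gz"
}

# Usage (file names are placeholders):
#   subsample_pair muscle_R1.fastq.gz muscle_R2.fastq.gz 1000000 muscle_small
```

The resulting files can be passed to the same STAR command via `--readFilesIn`, so a muscle subset and a liver subset can be timed side by side in minutes rather than days.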
