I have a bulk RNA-Seq data set of 2x150 PE reads for 36 mouse samples, half from liver tissue and half from muscle (gastrocnemius). FastQC shows only a slight increase in % Duplicates and % GC content in the muscle samples relative to the liver samples. Both tissues were aligned to the same GRCm39/Gencode M33 genome using STAR/2.7.10a in a custom Nextflow script with the following parameters (we add the GTF on the fly rather than building it into the index):
The liver samples mapped at a normal rate of ~150 million reads per hour, but the muscle samples are mapping at only around 2 million reads per hour, and with ~80 million reads per sample they are taking ~2 days each! All samples, including muscle, had 80-90% uniquely mapped reads. What could be causing the huge slowdown in mapping the muscle reads? I've googled a bit and found #996, but neither the number of contigs nor low mappability seems to apply to the muscle samples. I've attached a full Log.final.out for one muscle and one liver sample, and also pulled out the mapping speed and % uniquely mapped reads for all samples. Thanks for any advice (other than waiting for the last few muscle samples to finish)!
$ cat Gastroc_DEPOWER_SPo_16_CAACAATTCG-GCGACCGATT_L006_Log.final.out
Started job on | Nov 25 05:17:43
Started mapping on | Nov 25 05:21:29
Finished on | Nov 26 22:57:38
Mapping speed, Million of reads per hour | 2.31
Number of input reads | 96288036
Average input read length | 281
UNIQUE READS:
Uniquely mapped reads number | 86806641
Uniquely mapped reads % | 90.15%
Average mapped length | 281.25
Number of splices: Total | 101457314
Number of splices: Annotated (sjdb) | 100750020
Number of splices: GT/AG | 100754298
Number of splices: GC/AG | 542168
Number of splices: AT/AC | 52426
Number of splices: Non-canonical | 108422
Mismatch rate per base, % | 0.24%
Deletion rate per base | 0.01%
Deletion average length | 3.50
Insertion rate per base | 0.01%
Insertion average length | 1.55
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 6886393
% of reads mapped to multiple loci | 7.15%
Number of reads mapped to too many loci | 355315
% of reads mapped to too many loci | 0.37%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 2196405
% of reads unmapped: too short | 2.28%
Number of reads unmapped: other | 43282
% of reads unmapped: other | 0.04%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
$ cat Liver_DEPOWER_SPo_16_GTAGGTACAA-CACTCAAGAA_L006_Log.final.out
Started job on | Nov 23 03:19:24
Started mapping on | Nov 23 03:23:09
Finished on | Nov 23 04:07:09
Mapping speed, Million of reads per hour | 115.39
Number of input reads | 84622906
Average input read length | 281
UNIQUE READS:
Uniquely mapped reads number | 69449931
Uniquely mapped reads % | 82.07%
Average mapped length | 280.69
Number of splices: Total | 69362082
Number of splices: Annotated (sjdb) | 68898721
Number of splices: GT/AG | 68788068
Number of splices: GC/AG | 470609
Number of splices: AT/AC | 28236
Number of splices: Non-canonical | 75169
Mismatch rate per base, % | 0.24%
Deletion rate per base | 0.01%
Deletion average length | 3.25
Insertion rate per base | 0.01%
Insertion average length | 1.94
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 5974139
% of reads mapped to multiple loci | 7.06%
Number of reads mapped to too many loci | 286417
% of reads mapped to too many loci | 0.34%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 2416728
% of reads unmapped: too short | 2.86%
Number of reads unmapped: other | 6495691
% of reads unmapped: other | 7.68%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
$ grep "speed" *Log.final.out
Gastroc_DEPOWER_SPo_16_CAACAATTCG-GCGACCGATT_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.31
Gastroc_DEPOWER_SPo_25_TAGATCCGAA-ACTTCGATAG_L006_Log.final.out: Mapping speed, Million of reads per hour | 1.67
Gastroc_DEPOWER_TPo_14_TCTTAACTGG-TTCAGGCCGA_L006_Log.final.out: Mapping speed, Million of reads per hour | 1.76
Gastroc_DEPOWER_TPo_17_GTCACATCCG-AGAACAGTGA_L006_Log.final.out: Mapping speed, Million of reads per hour | 1.99
Gastroc_DEPOWER_TPo_21_TGAAGCATCT-GTTAGATACC_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.27
Gastroc_POWER_SPo_10_TGCGTGCGAA-ATCGGTATGA_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.14
Gastroc_POWER_SPo_1_TTGACCTAGC-GGAACACCAA_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.57
Gastroc_POWER_SPo_6_GCTTCAATCA-TTCCACCTGG_L006_Log.final.out: Mapping speed, Million of reads per hour | 1.96
Gastroc_POWER_TPo_3_AATGGTACCT-ACAACCGTCG_L006_Log.final.out: Mapping speed, Million of reads per hour | 1.85
Gastroc_POWER_TPo_8_GTAACATTGG-TCGTGGCCAA_L006_Log.final.out: Mapping speed, Million of reads per hour | 1.71
Gastroc_REPOWER_SPo_27_CGGACTACTT-CCTTAATACG_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.19
Gastroc_REPOWER_SPo_31_AACGGAGTCC-GACACTCCTA_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.23
Gastroc_REPOWER_SPo_36_AGGTGTGACC-GAGTTCGGAG_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.05
Gastroc_REPOWER_TPo_27_CCAGAGTTCC-CGAGAGACAT_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.15
Gastroc_REPOWER_TPo_33_GACTGACATA-ATGCGGCCAA_L006_Log.final.out: Mapping speed, Million of reads per hour | 2.75
Liver_DEPOWER_SPo_16_GTAGGTACAA-CACTCAAGAA_L006_Log.final.out: Mapping speed, Million of reads per hour | 115.39
Liver_DEPOWER_SPo_24_TGCGACTTCG-TTGTTGCAGA_L006_Log.final.out: Mapping speed, Million of reads per hour | 157.89
Liver_DEPOWER_SPo_25_TTGTCACGTT-TGTGTAGCGG_L006_Log.final.out: Mapping speed, Million of reads per hour | 149.13
Liver_DEPOWER_TPo_14_CAACGACTGA-CCTGACAGAG_L006_Log.final.out: Mapping speed, Million of reads per hour | 167.74
Liver_DEPOWER_TPo_17_GATTCGGCTA-AGACCGTTAA_L006_Log.final.out: Mapping speed, Million of reads per hour | 148.09
Liver_DEPOWER_TPo_21_TGGTGGCTAG-AAGGTCATCG_L006_Log.final.out: Mapping speed, Million of reads per hour | 147.29
Liver_POWER_SPo_10_ATCCAATAGG-TCGTCATCTT_L006_Log.final.out: Mapping speed, Million of reads per hour | 153.41
Liver_POWER_SPo_1_GCGATCCTTG-CTCAGACACC_L006_Log.final.out: Mapping speed, Million of reads per hour | 143.57
Liver_POWER_SPo_6_TGTTCCACTT-AATGGCACGG_L006_Log.final.out: Mapping speed, Million of reads per hour | 131.55
Liver_POWER_TPo_3_AGACCGTTAA-TGGCAATACA_L006_Log.final.out: Mapping speed, Million of reads per hour | 140.22
Liver_POWER_TPo_4_ACTATTGACC-GCCGATGGTT_L006_Log.final.out: Mapping speed, Million of reads per hour | 126.06
Liver_POWER_TPo_8_GCCTAATTCC-AGGTTGCACG_L006_Log.final.out: Mapping speed, Million of reads per hour | 132.88
Liver_REPOWER_SPo_27_AGGCCAGGAT-ACTCCTGCCT_L006_Log.final.out: Mapping speed, Million of reads per hour | 134.40
Liver_REPOWER_SPo_31_AACGCCTGTG-CGAGTCCGTT_L006_Log.final.out: Mapping speed, Million of reads per hour | 138.02
Liver_REPOWER_SPo_36_CGTGTGAGTG-GCCGAACCAA_L006_Log.final.out: Mapping speed, Million of reads per hour | 122.46
Liver_REPOWER_TPo_27_CGTATGTGAA-AGGTAGTGCG_L006_Log.final.out: Mapping speed, Million of reads per hour | 147.61
Liver_REPOWER_TPo_29_TACGTCACAA-CGTGGTATGG_L006_Log.final.out: Mapping speed, Million of reads per hour | 151.13
Liver_REPOWER_TPo_33_GGAAGATCCG-TACCGAATTC_L006_Log.final.out: Mapping speed, Million of reads per hour | 137.76
$ grep "Uniquely mapped reads %" *Log.final.out
Gastroc_DEPOWER_SPo_16_CAACAATTCG-GCGACCGATT_L006_Log.final.out: Uniquely mapped reads % | 90.15%
Gastroc_DEPOWER_SPo_25_TAGATCCGAA-ACTTCGATAG_L006_Log.final.out: Uniquely mapped reads % | 91.53%
Gastroc_DEPOWER_TPo_14_TCTTAACTGG-TTCAGGCCGA_L006_Log.final.out: Uniquely mapped reads % | 90.46%
Gastroc_DEPOWER_TPo_17_GTCACATCCG-AGAACAGTGA_L006_Log.final.out: Uniquely mapped reads % | 87.99%
Gastroc_DEPOWER_TPo_21_TGAAGCATCT-GTTAGATACC_L006_Log.final.out: Uniquely mapped reads % | 90.25%
Gastroc_POWER_SPo_10_TGCGTGCGAA-ATCGGTATGA_L006_Log.final.out: Uniquely mapped reads % | 89.67%
Gastroc_POWER_SPo_1_TTGACCTAGC-GGAACACCAA_L006_Log.final.out: Uniquely mapped reads % | 91.37%
Gastroc_POWER_SPo_6_GCTTCAATCA-TTCCACCTGG_L006_Log.final.out: Uniquely mapped reads % | 89.79%
Gastroc_POWER_TPo_3_AATGGTACCT-ACAACCGTCG_L006_Log.final.out: Uniquely mapped reads % | 88.38%
Gastroc_POWER_TPo_8_GTAACATTGG-TCGTGGCCAA_L006_Log.final.out: Uniquely mapped reads % | 87.24%
Gastroc_REPOWER_SPo_27_CGGACTACTT-CCTTAATACG_L006_Log.final.out: Uniquely mapped reads % | 89.29%
Gastroc_REPOWER_SPo_31_AACGGAGTCC-GACACTCCTA_L006_Log.final.out: Uniquely mapped reads % | 88.53%
Gastroc_REPOWER_SPo_36_AGGTGTGACC-GAGTTCGGAG_L006_Log.final.out: Uniquely mapped reads % | 90.11%
Gastroc_REPOWER_TPo_27_CCAGAGTTCC-CGAGAGACAT_L006_Log.final.out: Uniquely mapped reads % | 86.86%
Gastroc_REPOWER_TPo_33_GACTGACATA-ATGCGGCCAA_L006_Log.final.out: Uniquely mapped reads % | 89.51%
Liver_DEPOWER_SPo_16_GTAGGTACAA-CACTCAAGAA_L006_Log.final.out: Uniquely mapped reads % | 82.07%
Liver_DEPOWER_SPo_24_TGCGACTTCG-TTGTTGCAGA_L006_Log.final.out: Uniquely mapped reads % | 87.17%
Liver_DEPOWER_SPo_25_TTGTCACGTT-TGTGTAGCGG_L006_Log.final.out: Uniquely mapped reads % | 87.47%
Liver_DEPOWER_TPo_14_CAACGACTGA-CCTGACAGAG_L006_Log.final.out: Uniquely mapped reads % | 86.13%
Liver_DEPOWER_TPo_17_GATTCGGCTA-AGACCGTTAA_L006_Log.final.out: Uniquely mapped reads % | 86.30%
Liver_DEPOWER_TPo_21_TGGTGGCTAG-AAGGTCATCG_L006_Log.final.out: Uniquely mapped reads % | 87.53%
Liver_POWER_SPo_10_ATCCAATAGG-TCGTCATCTT_L006_Log.final.out: Uniquely mapped reads % | 82.40%
Liver_POWER_SPo_1_GCGATCCTTG-CTCAGACACC_L006_Log.final.out: Uniquely mapped reads % | 84.71%
Liver_POWER_SPo_6_TGTTCCACTT-AATGGCACGG_L006_Log.final.out: Uniquely mapped reads % | 83.50%
Liver_POWER_TPo_3_AGACCGTTAA-TGGCAATACA_L006_Log.final.out: Uniquely mapped reads % | 85.10%
Liver_POWER_TPo_4_ACTATTGACC-GCCGATGGTT_L006_Log.final.out: Uniquely mapped reads % | 82.24%
Liver_POWER_TPo_8_GCCTAATTCC-AGGTTGCACG_L006_Log.final.out: Uniquely mapped reads % | 83.67%
Liver_REPOWER_SPo_27_AGGCCAGGAT-ACTCCTGCCT_L006_Log.final.out: Uniquely mapped reads % | 84.21%
Liver_REPOWER_SPo_31_AACGCCTGTG-CGAGTCCGTT_L006_Log.final.out: Uniquely mapped reads % | 83.51%
Liver_REPOWER_SPo_36_CGTGTGAGTG-GCCGAACCAA_L006_Log.final.out: Uniquely mapped reads % | 81.85%
Liver_REPOWER_TPo_27_CGTATGTGAA-AGGTAGTGCG_L006_Log.final.out: Uniquely mapped reads % | 87.42%
Liver_REPOWER_TPo_29_TACGTCACAA-CGTGGTATGG_L006_Log.final.out: Uniquely mapped reads % | 85.76%
Liver_REPOWER_TPo_33_GGAAGATCCG-TACCGAATTC_L006_Log.final.out: Uniquely mapped reads % | 83.89%
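To put a number on the gap between tissues, the per-file grep output can be collapsed into one mean per tissue. This is only a sketch: the function name `summarize_speed` is made up for illustration, and it assumes the tissue is the part of the file name before the first underscore (`Gastroc_...`, `Liver_...`), as in the listings above.

```shell
# Mean STAR mapping speed per tissue, read from Log.final.out files.
# Assumption: the tissue label is the file-name prefix before the first "_".
summarize_speed() {
  grep -H "Mapping speed" "$@" |
    awk -F'|' '{
      file = $1
      sub(/:.*/, "", file)      # keep only the file name that grep -H prepends
      sub(/.*\//, "", file)     # drop any leading directory
      split(file, parts, "_")   # parts[1] is the tissue prefix
      tissue = parts[1]
      speed[tissue] += $2       # $2 is the value after the "|" separator
      count[tissue]++
    }
    END {
      for (t in speed)
        printf "%s\t%d files\tmean %.2f M reads/h\n", t, count[t], speed[t] / count[t]
    }'
}

# Usage: summarize_speed *Log.final.out
```

The same pattern should work for the "Uniquely mapped reads %" lines if the search string is changed, since awk ignores the trailing `%` when it converts the field to a number.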
Total outsider here (not part of STAR team), but if it were me, I would create a small version of several of the FASTQ files and retry the alignments several times to see if the trend holds.
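One way to make the small test files suggested above is to keep only the first N read pairs of each sample. The sketch below assumes gzipped FASTQ input with R1 and R2 listing reads in the same order; the function name `subsample_pair` and the output naming are hypothetical and not part of the original pipeline.

```shell
# Keep the first N read pairs of a gzipped paired-end FASTQ set.
# Taking the head of both files keeps R1 and R2 in sync, provided the
# two files list reads in the same order (assumed here).
subsample_pair() {
  r1=$1; r2=$2; n_reads=$3; prefix=$4
  n_lines=$(( n_reads * 4 ))   # a FASTQ record is 4 lines
  gzip -dc "$r1" | head -n "$n_lines" | gzip > "${prefix}_R1.fastq.gz"
  gzip -dc "$r2" | head -n "$n_lines" | gzip > "${prefix}_R2.fastq.gz"
}

# Usage (file names are placeholders):
#   subsample_pair muscle_R1.fastq.gz muscle_R2.fastq.gz 1000000 muscle_small
```

The first reads in a file are not always representative of the whole run, so a random subsample (for example with `seqtk sample`, using the same seed for both mates) may be a better choice if the slowdown does not reproduce on the head of the file. Either way, rerunning the small liver and muscle files with identical STAR parameters should show quickly whether the speed difference holds.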