Closed
Description
Hi!
Sorry for posting so many issues over the past few days :) I got an error from salmon index:
[2024-10-29 13:22:09.436] [puff::index::jointLog] [info] Running fixFasta
[2024-10-29 13:22:09.444] [puff::index::jointLog] [error] In FixFasta, two references with the same name but different sequences: RefSeq. We require that all input records have a unique name up to the first whitespace (or user-provided separator) character.
The problem turns out to be in genes.fa, where the records share the same sequence name:
head /scratch/hhu/Xenopus_laevis_v10_1_lambda_spikein/annotation/genes.original.fa
>RefSeq
acaaactacagctcccagcaaccCTTTGCCACCTCGATAGCAAGAAATGTAACAGTTCTTTCAGTGCAACTGAACTCCAAGCTATTAAACTAG
>RefSeq
TTGAGCCACCCACATCATGGACTTTGCCCCTGAGGGCAGATCAGACCCGACAGAGGGCTTATGGGTTAAATAAATCACCTATTGCactaaa
..
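Salmon compares names only up to the first whitespace, so one quick way to see which names collide is to list the header IDs that occur more than once. A minimal sketch, where dup_fasta_names is a hypothetical helper (not part of the pipeline or of salmon):

```shell
# Hypothetical helper: print FASTA header IDs (text after '>' up to the
# first whitespace) that occur more than once in the given file.
dup_fasta_names() {
  grep '^>' "$1" | awk '{ print substr($1, 2) }' | sort | uniq -d
}

# Example: dup_fasta_names annotation/genes.fa
```

On a genes.fa like the one above, this would print RefSeq.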
I think the cause is this command in genes_bed2fasta:
bedtools getfasta -name -s -split -fi /scratch/hhu/Xenopus_laevis_v10_1_lambda_spikein/genome_fasta/genome.fa -bed <(cat /scratch/hhu/Xenopus_laevis_v10_1_lambda_spikein/annotation/genes.bed | cut -f1-12) | sed 's/(.*)//g' | sed 's/:.*//g' > annotation/genes.fa 2> annotation/logs/bed2fasta.log
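The final sed 's/:.*//g' cuts every name at its first colon, so the command cannot handle gene names that themselves contain a colon, like these: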
Chr4L 15610 37088 RefSeq:XR_005966836.1 . - 15610 37088 255,0,0 4 144,83,50,796 0,446,2433,20683
Chr4L 40727 57680 RefSeq:XM_041589621.1 . + 40727 57680 255,0,0 9 283,98,133,116,111,101,102,139,278 0,1109,6513,9213,9409,11052,12648,16142,16676
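To show where the name gets lost, here is the same pair of sed calls applied to a single header. This assumes bedtools getfasta -name -s writes headers as name::chrom:start-end(strand), which recent bedtools versions appear to do; other versions may write only the name, but the last sed cuts at the first colon either way.

```shell
# Assumed header format from `bedtools getfasta -name -s`:
#   name::chrom:start-end(strand)
header='RefSeq:XR_005966836.1::Chr4L:15610-37088(-)'

# The same two sed calls as in genes_bed2fasta:
#   1) 's/(.*)//g' removes the "(strand)" suffix
#   2) 's/:.*//g'  removes everything from the FIRST colon onward,
#      which here is the colon inside the gene name itself
mangled=$(printf '%s\n' "$header" | sed 's/(.*)//g' | sed 's/:.*//g')
echo "$mangled"   # prints RefSeq
```

Both BED entries above end up as RefSeq, which produces exactly the duplicate names salmon rejects.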
This could also be a problem for others using less common GTF files for other species.
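One possible fix, sketched under the assumptions that headers look like name::chrom:start-end(strand) and that chromosome names contain no colons: strip only a trailing coordinate suffix instead of cutting at the first colon. strip_coords is a hypothetical name, not an existing function in the pipeline.

```shell
# Hypothetical replacement for the two sed calls: remove only a trailing
# "::chrom:start-end" plus an optional "(strand)", so colons inside the
# gene name survive. Assumes chromosome names contain no ':'.
strip_coords() {
  sed -E 's/::[^:]+:[0-9]+-[0-9]+(\([+.-]\))?$//'
}

# Example use at the end of the bedtools pipeline:
#   bedtools getfasta ... | strip_coords > annotation/genes.fa
```

Lines without such a suffix (sequence lines, or headers that already hold just the name) pass through unchanged. The names in genes.bed still have to be unique for salmon to accept the result.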
Thanks a lot!
Hanrong