You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Can someone explain to me what the -r option does in gffread?
I have read that it:
"only shows transcripts overlapping coordinate range .. (on chromosome/contig , strand if provided)"
So I thought that an additional file was needed to be provided in order for this option to work. However when I run the command without providing anything other than my annotation and my reference genome, gffread filters down from ~37 000 genes to ~19 000 genes. But I don't understand how this works since I am not providing any information about what chromosome positions to filter out.
The reason I ask is because I have an annotation that seems to have a lot of pseudo genes or isoforms that I want to get rid of. I have tried the options -V, -J and --no-pseudo but they don't really do much difference. However, when i tried the -r option just as a random test, this made a huge difference to my list of transcripts but now I don't really understand why?
-r should take a chromosome:start-end genomic location specification, with an optional strand prefix or suffix, e.g. -r chr1:9000-12000 should only show transcripts overlapping that genomic range on chr1 (no matter the strand), but if you only want the transcripts on the reverse strand in that region, you can write -r -chr1:9000-12000 instead.
There is an example usage for that option at the bottom of this wiki page: https://github.com/gpertea/stringtie/wiki/Extracting-bundle-data-for-debugging
As for what you got there, I am afraid the way I handle the argument there is not that smart, when using -r without intentionally providing any genomic region specification, like you did, the code takes whatever token that follows there as the genomic region and attempts to parse it, but it fails silently if it could not make sense of it (it was -R in your example command line).
This is a very basic region filter and not very efficient, and as you can see it only takes one genomic region currently. You could use bedtools intersect or other specialized tools if you want, say, an intersection with a set of genomic regions specified in a BED file.
Hi,
Can someone explain to me what the -r option does in gffread?
I have read that it:
"only shows transcripts overlapping coordinate range .. (on chromosome/contig , strand if provided)"
So I thought that an additional file was needed to be provided in order for this option to work. However when I run the command without providing anything other than my annotation and my reference genome, gffread filters down from ~37 000 genes to ~19 000 genes. But I don't understand how this works since I am not providing any information about what chromosome positions to filter out.
The reason I ask is because I have an annotation that seems to have a lot of pseudo genes or isoforms that I want to get rid of. I have tried the options -V, -J and --no-pseudo but they don't really do much difference. However, when i tried the -r option just as a random test, this made a huge difference to my list of transcripts but now I don't really understand why?
This is the simple command that I ran:
gffread my_gff.gtf -r -R -F -g my_genome.fasta -y RF_prots.fa -w RF_exons.fa
Grateful for any support!
The text was updated successfully, but these errors were encountered: