Add documentation for reference databases (SMR >= v4.3) #329
Comments
Hello @RAWWiberg,
The main difference between the original databases distributed with SMR and the new ones is that updated SILVA and RFAM releases were used (e.g. SILVA 138). We took SILVA 138 SSURef NR99, SILVA 138 LSURef and the latest RFAM and clustered them at different thresholds to produce the new SMR databases:
fast: bac-16S 85%, 5S & 5.8S RFAM seeds, rest 90%
All SMR databases have a minimum accuracy of 99.8%, so we normally suggest the fast or default versions.
Best,
Hi @ekopylova,
Hi, I just tested the new database using version 4.3.6. However, I get vastly different results compared to the old database (all SILVA + RFAM). The Arabidopsis RNA sample I tested had 5% rRNA with the old version, but with the new version 19% of reads align. Should the values be that different? Also, with the new database I get exactly one value, whereas before it was very useful to see the taxonomic domain the rRNA stems from. Is there an option to get this output as well, or would I have to parse the BLAST output to recover these values? Cheers,
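If parsing the alignment output turns out to be the only route, a minimal sketch of the tally might look like the following. It assumes a tab-separated, BLAST-tabular-style file with the read ID in the first column and the reference ID in the second; `tally_hits` and `classify_by_prefix` are hypothetical helper names, the sample rows are made up, and the prefix rule is only a placeholder for whatever mapping from reference ID to category fits your database.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_hits(lines: Iterable[str], classify: Callable[[str], str]) -> Counter:
    """Count aligned reads per category, using only each read's first listed hit."""
    counts: Counter = Counter()
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, comment lines, and rows without a reference column.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        read_id, subject_id = fields[0], fields[1]
        if read_id in seen:
            continue  # count every read once, even if it has several hits
        seen.add(read_id)
        counts[classify(subject_id)] += 1
    return counts


def classify_by_prefix(subject_id: str) -> str:
    """Placeholder rule (an assumption): split RFAM-derived references from the rest."""
    return "RFAM" if subject_id.startswith("RFAM_") else "other"


# Made-up rows for illustration; real output has more columns.
sample_rows = [
    "read1\tRFAM_14.1_RF00001_5S_rRNA\t98.0",
    "read1\tAB000001.1.1500\t90.0",
    "read2\tAB000001.1.1500\t99.0",
]
print(tally_hits(sample_rows, classify_by_prefix))
```

To break results down by taxonomic domain instead, `classify` would need a lookup from reference ID to lineage, which depends on how the database headers are annotated.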
Hello Nicole,
Hello @ekopylova, Where can we find these new
Hello, The latest databases are here. Best,
Thank you for the link! Best regards,
Are there also separate taxonomy files for the new databases, or do I have to extract the taxonomy from the FASTA files? I'm finding some missing taxonomies, e.g. in the file smr_v4.3_sensitive_db_rfam_seeds.fasta. For example,
In practice, the taxonomies for RFAM_14.1_RF00001_5S_rRNA are not reported in the FASTA definition line. Thanks,
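One way to see how widespread the missing taxonomies are is to scan the definition lines directly. The sketch below assumes a SILVA-style header, where a semicolon-delimited lineage follows the sequence ID after whitespace; that layout is an assumption about these files, `headers_without_lineage` is a hypothetical helper name, and the sample records are invented.

```python
from typing import Iterable, Iterator


def headers_without_lineage(lines: Iterable[str]) -> Iterator[str]:
    """Yield IDs of FASTA records whose definition line carries no lineage."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a definition line
        parts = line[1:].split(None, 1)
        seq_id = parts[0] if parts else ""
        description = parts[1] if len(parts) > 1 else ""
        # Assumption: a lineage looks like "Bacteria;Proteobacteria;...".
        if ";" not in description:
            yield seq_id


# Invented records for illustration only.
sample_fasta = [
    ">AB000001.1.1500 Bacteria;Proteobacteria;Gammaproteobacteria",
    "ACGT",
    ">RFAM_14.1_RF00001_5S_rRNA",
    "GGCC",
]
print(list(headers_without_lineage(sample_fasta)))
```

For a real file, pass an open file handle (for example one opened on smr_v4.3_sensitive_db_rfam_seeds.fasta) in place of `sample_fasta`.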
I just want to +1 this. We have a few complex projects where we're QC'ing RNA data to check for rRNA background in mixed samples (metatrx-like), so even having a rough idea of the taxonomic breakdown would be great.
We need to add documentation on the following:
Hi,
I'm using sortmerna to "clean" some RNA-seq data. I'm wondering about the "new" databases, i.e. those starting with smr_*. The documentation for sortmerna v4.3.3 is not clear on how these were generated; they seem to contain both SILVA and RFAM sequences. Could this be clarified?
Best wishes,
Axel