SILVA reference #21
Comments
We could store the references in the FTP server. |
Hi @sjanssen2, Would it be possible in the near future to also create and make available in QIIME2 a pre-compiled SILVA v132 database? I note your comment here that making the database ready for use in q2-fragment-insertion takes around 2 weeks, which is my main reason for not attempting the steps outlined here by @smirarab. It's great that a pre-compiled SILVA v128 database comes packaged with this plugin in QIIME! I've simply already done some analysis with SILVA v132 and am on a tight schedule, so don't have the time to re-analyse with 128 - at the moment this unfortunately prevents me from using the fragment insertion method to build trees. Cheers, |
Hey there @rachaellappan --- we would love to get some help with this task - are you interested? If you don't have the bandwidth, maybe you could cross-post this request to the QIIME 2 Forum, that way more eyes see this? Thanks! |
Just adding to the discussion. For the GG release we did a lot of benchmarks and basically this is what was used in the fragment insertion paper. However, AFAIK, such benchmarks have not been done in SILVA so it will be great if someone actually did these benchmarks, in case @rachaellappan is interested. |
Regarding benchmarks: there is already a lot of infrastructure in place, for example the wonderful repo https://github.com/caporaso-lab/tax-credit-data/ (which I used a couple of months ago to add SEPP as another tool to assign taxonomy) and of course all the notebooks I used for our paper (https://msystems.asm.org/content/3/3/e00021-18). I think we should first provide the necessary changes for SEPP to deal with different references before we think too hard about benchmark results. |
I'd argue that having them at the same time would be great; as you can imagine, once it's out there, it's out there, and if there is a bug or something wrong that wasn't caught because there were no benchmarks, it can get ugly ... my 2 pesos! |
Hi @thermokarst, I will post to the QIIME2 forum. I would like to help out but I'm not very familiar with what is being done here and whether these steps are all that's required. If I understand correctly, I agree that benchmarking SILVA (to demonstrate/confirm the improvement that fragment insertion offers over de novo trees in the case of SILVA?) would be ideal to do around the same time as providing v132 for SEPP. The SILVA aligned rep set doesn't specify whether it's 16S or 18S - does it contain both? - so the results may be different to GG. I'm probably not the person to do this - no experience with benchmarking =) |
The file used for SILVA package is described here:
https://github.com/smirarab/sepp-refs/blob/master/silva/README.md
It was called
SILVA_128_QIIME_release/rep_set_aligned/99/99_otus_aligned.fasta.gz
Does anyone know if that file did or did not include 18S?
--
Siavash Mirarab
|
In case this hasn't been done yet, I would be glad to pitch in. But I would need the scripts required to process the QIIME formatted SILVA file (SILVA_132_QIIME_release/rep_set_aligned/99/99_alignment.fna) |
Can anyone confirm if these modified steps would be right (taken from https://github.com/smirarab/sepp-refs/tree/master/silva)? 99_alignment.fna has 425098 sequences |
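The sequence count quoted above is easy to re-check before starting a long run. A minimal sketch, assuming the alignment is an ordinary FASTA file in which every record starts with a `>` header line; the helper names are illustrative and are not part of SEPP or QIIME 2:

```python
import gzip
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Tiny in-memory stand-in for an aligned FASTA file.
example = io.StringIO(">seq1\nAC-G\n>seq2\nA--G\n")
print(count_fasta_records(example))  # 2
```

For a real file the call would look like `count_fasta_records(open_maybe_gzip("99_alignment.fna"))`, which streams the file line by line rather than loading it into memory.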
Is this issue still alive? |
Hi Aditya, |
Hi Stefan, sure. I was wondering if I can get started on this at my end, since it's a heavy compute job. All I would need is for someone to confirm the steps that need to be run. Of course, I will share the files for review once done, and perhaps that would be mid-March already |
All I know about Silva is what Siavash did to convert / prepare the data for Silva 128: https://github.com/smirarab/sepp-refs/tree/master/silva Maybe you can deduce from it whether you are dealing with the correct files? |
Yes, Stefan, I went through what Siavash had done and am sure I have the correct files with me. I wasn't entirely clear, though, how the masksites parameter was chosen for the first step. That's where I need some advice, as the total number of sequences is different for v132. Perhaps @smirarab can pitch in? |
Oops, now I see that you already pointed to this link. Sorry for not paying enough attention :-/ |
Any updates on this? We are well past mid-March. |
Hi Aditya, fair point. Sorry for the delay. I started working on SEPP itself to add the ability to easily change the reference in a convenient way for QIIME2 users. This procedure should include a) adding SEPP to a CI system (Travis), b) updating the code style, c) adding the ability to pass info files to the sepp binaries, and d) packaging SEPP as a bioconda recipe. I am happy to receive some code reviews on smirarab/sepp#41 and thus increase visibility and quality. I just downloaded the 3 GB of Silva's QIIME compatible version 132 (https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip). I am pretty confident that the alignment file is I figure you already know the right computational steps to perform, but I am not totally sure if the numeric parameters will also work for the slightly larger 132 release. Guess we will learn that the hard way :-/ |
Aditya,
Sorry for the long silence on this.
The steps you mentioned are mostly correct. However, in the end, you need
to root the tree at the LCA of Archaea.
Hope this helps.
Regards
Siavash
--
Siavash Mirarab
|
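To illustrate what "the LCA of Archaea" means in practice, here is a minimal sketch that locates the lowest common ancestor of a set of tips. It assumes the tree is available as a child-to-parent mapping; the toy tree and node names are hypothetical, and actually re-rooting a Newick tree would be done with a phylogenetics library, which this sketch does not attempt:

```python
def path_to_root(node, parent):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(tips, parent):
    """Lowest common ancestor of `tips` in a tree given as child -> parent."""
    tips = list(tips)
    if not tips:
        raise ValueError("at least one tip is required")
    # Candidate ancestors, ordered from the first tip towards the root.
    common = path_to_root(tips[0], parent)
    for tip in tips[1:]:
        ancestors = set(path_to_root(tip, parent))
        common = [n for n in common if n in ancestors]
    # The first surviving node is the deepest one shared by all tips.
    return common[0]


# Hypothetical toy tree: root -> (n1, n2); n1 -> (arch1, arch2); n2 -> (bact1, euk1)
parent = {
    "arch1": "n1", "arch2": "n1",
    "bact1": "n2", "euk1": "n2",
    "n1": "root", "n2": "root",
}
print(lca(["arch1", "arch2"], parent))  # n1
```

One consequence worth keeping in mind: a single tip labelled Archaea but placed elsewhere in the tree pulls the LCA up towards the root, so the result is only meaningful if the domain labels and the topology agree.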
I am trying to create a bioconda recipe for Siavash's SEPP program (without the large reference files) to support, in the long run, different references like Silva or others. |
Is this still being considered? |
The bioconda package has been created: https://anaconda.org/bioconda/sepp (without reference files), but is not yet integrated into Qiime2. |
Stefan, that's great to hear. Are the updated reference files for SILVA available as well? |
Hi @adityabandla, files for Silva 128 (phylogeny, alignment and info) are shipped with the default Qiime2 install and should be located in Did you succeed in creating a reference for Silva 132? If so, would you be willing to share those files with me / the Qiime community? My PR #32 contains the necessary updates for the qiime2 wrapper to cope with the new parameter for the info file, but it is still not merged into master. Thus, to use references other than Greengenes 13.8 you either have to overwrite the info file each time or use the run-sepp.sh script directly. Best, |
Hi Stefan, sorry, I never managed to get to it. I just started and ran into this error at the very first step
|
Hi @adityabandla I would need much more information about what you are trying to execute to be able to help debugging. |
I am trying to run the following command when I get that error:
run_seqtools.py -masksites 2125 -infile 99_alignment.fna -outfile 99_alignment_masked.fna
Please let me know if you need additional details |
Aditya, is there a place where I can access the 99_alignment.fna file? I
can try to have a look.
--
Siavash Mirarab
|
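For readers unsure what the masking step in the command above does, the sketch below shows one plausible reading. It assumes that `-masksites N` drops alignment columns with fewer than N non-gap characters, which the thread itself does not confirm, and it treats both `-` and `.` as gaps, which is also an assumption. This is an illustration of the idea, not the `run_seqtools.py` implementation:

```python
def mask_gappy_sites(seqs, min_non_gap, gap_chars="-."):
    """Drop alignment columns with fewer than `min_non_gap` non-gap characters.

    `seqs` maps sequence names to aligned strings of equal length.
    """
    if not seqs:
        return {}
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all aligned sequences must have the same length")

    # Count non-gap characters per column.
    counts = [0] * length
    for s in seqs.values():
        for i, ch in enumerate(s):
            if ch not in gap_chars:
                counts[i] += 1

    keep = [i for i, c in enumerate(counts) if c >= min_non_gap]
    return {name: "".join(s[i] for i in keep) for name, s in seqs.items()}


# Hypothetical three-sequence alignment; columns 3 and 5 are all gaps.
toy = {"a": "AC-G.", "b": "A--G.", "c": "AT-G."}
print(mask_gappy_sites(toy, min_non_gap=2))
# {'a': 'ACG', 'b': 'A-G', 'c': 'ATG'}
```

Under this reading, 2125 out of the 425098 sequences mentioned earlier corresponds to a threshold of roughly 0.5% occupancy per column.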
@smirarab Siavash, it's the file I downloaded from the SILVA website, https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip, the particular file being SILVA_132_QIIME_release/rep_set_aligned/99/99_alignment.fna.zip |
@adityabandla @smirarab is there any progress on using SILVA 132? |
I am starting to work on this. Does anyone know if unaligned sites
(alignment sites with a dot) should be removed?
--
Siavash Mirarab
|
I have been working on this and now have the trees. I am having trouble
with rooting the tree. There are several problematic taxa, mentioned below.
Is anyone more familiar with SILVA able to advise what's best to do here?
Should we just remove these? Are they simply misclassified? Or perhaps I
am using the wrong taxonomy file
(SILVA_132_QIIME_release/taxonomy/taxonomy_all/99/raw_taxonomy.txt)?
- AF328210.1.1013 is placed at the root of a clade of Archaea+Eukaryotes
- The following are labelled Bacteria but are found in Archaea+Eukaryotes
ADDN02000002.6651626.6653415
HG975450.19692355.19694028
HG975518.15072943.15074233
HG975523.15290773.15292080
HG975523.15346888.15348277
HG975523.17807403.17809186
HG975523.42881560.42883326
HG975523.42909168.42910966
HG975523.42990106.42991886
MJEQ01037184.11757410.11759206
MJEQ01037184.50691366.50692824
MJEQ01037184.50711173.50712950
MJEQ01037189.76487988.76489751
MJEQ01037194.44875903.44877314
MKYQ01000643.5265419.5267207
MKYQ01000643.5327953.5329707
MKYQ01000643.5348241.5350028
MKYQ01000643.5377951.5379539
MKYQ01000643.5430586.5432383
MKYQ01000643.5449793.5451502
MKYQ01000643.5519287.5521084
MKYQ01000643.5564974.5566721
MKYQ01000643.5616492.5618272
MTTA01000002.173539804.173541537
MTTA01000002.24695889.24697451
--
Siavash Mirarab
|
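A list like the one above can be produced by comparing the tips found in a clade against the taxonomy file. The following is a minimal sketch under stated assumptions: the taxonomy file has two tab-separated columns (sequence ID, then a semicolon-separated lineage), the domain is the first lineage field, and a rank prefix such as `D_0__` may or may not be present. The tip names and lineages in the example are made up:

```python
def domain_of(lineage):
    """Return the first (domain-level) field, without a rank prefix like 'D_0__'."""
    first = lineage.split(";")[0].strip()
    if "__" in first:
        first = first.split("__", 1)[1]
    return first


def read_domains(lines):
    """Map sequence ID -> domain from 'ID<TAB>lineage' lines."""
    domains = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        seq_id, _, lineage = line.partition("\t")
        domains[seq_id] = domain_of(lineage)
    return domains


def unexpected_tips(clade_tips, domains, expected):
    """Tips whose domain is not in `expected` (tips missing from the file count too)."""
    return sorted(t for t in clade_tips if domains.get(t) not in expected)


# Hypothetical taxonomy lines and clade membership.
taxonomy_lines = [
    "tipA\tArchaea;Euryarchaeota",
    "tipB\tD_0__Bacteria;D_1__Firmicutes",
    "tipC\tEukaryota;Opisthokonta",
]
domains = read_domains(taxonomy_lines)
print(unexpected_tips(["tipA", "tipB", "tipC"], domains, {"Archaea", "Eukaryota"}))
# ['tipB']
```

Whether flagged tips should be dropped, relabelled, or kept is the open question in this thread; the sketch only helps enumerate them.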
@smirarab Your question is also related to mine: smirarab/sepp-refs#2. |
I answered your questions there. The issue here has to do with the tree
topology.
|
Any updates on this issue? Thanks! |
I have the trees needed, but I have issues with rooting it, as mentioned
above. I remain hopeful that someone with more familiarity with SILVA can
tell me how the rooting issue should be dealt with.
--
Siavash Mirarab
|
@smirarab the first sequence seems anomalous at first glance, so it might be good to exclude it. For the other sequences, I checked some of the accession numbers and they are from genome or WGS sequence set entries. Those entries sometimes contain contamination from different domains, and I am pretty sure that this is the case here. I think we should discuss how the sequences that are included in the tree are selected and whether that can be optimised to leave these problematic sequences out. By the way, the current SILVA release is 138.1. I am not familiar with QIIME, the fragment placing plugin or SEPP. I think the easiest approach would be for you to send an email to our support address (contact(at)arb-silva.de) giving us a short summary of what data is needed, how it is compiled, and which issues you have (maybe there are more than just the rooting of the trees?). With that information we will then try to help you solve the issues you are facing. We would also like to host the reference files on the SILVA website and see if we can find a way to automatically generate them with new SILVA releases, if possible. All the best, Jan from the SILVA team |
Hi Jan,
I will initiate an email.
Thanks
Siavash
--
Siavash Mirarab
|
Any update on a SILVA reference database formatted for SEPP through qiime2? |
not that I am aware of, unfortunately |
Improvement Description
It should be possible to download the QIIME compatible version of Silva and construct reference phylogeny and alignment for SEPP to enable 18S analyses.
Questions
@josenavas @wasade do you know if release 128 is the latest?
How and where would we host SEPP-compatible references? Within this plugin (which is already 130 MB), on the GitHub repo?