DivNet on rRNA gene counts derived from metagenomes? #128
Hi there, I also have a similar question and just wanted to boost this! Thanks,
@scubalaina I have used DivNet in a similar way to you. When you run it on the subcommunity (i.e., just the RNA pol seqs), you are passing the data to DivNet as counts, right? If so, it will go through its process treating that as samples/community in the right way.
Hi Ryan,
Ok great! I am using reads per kilobase because each gene has a different length and I need to normalize for that. But it's great that using a subcommunity of the data has worked for you, so it should work similarly for mine.
Thanks,
Alaina :)
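For readers less familiar with the normalization mentioned above, here is a minimal Python sketch of reads per kilobase (RPK): each gene's read count divided by its length in kilobases. The gene names, counts and lengths are hypothetical values chosen only for illustration.

```python
def reads_per_kilobase(count, length_bp):
    """Reads per kilobase (RPK): read count divided by gene length in kb."""
    return count / (length_bp / 1000)

# Hypothetical read counts and gene lengths (in bp) for three genes.
counts = {"geneA": 40, "geneB": 40, "geneC": 2}
lengths_bp = {"geneA": 1000, "geneB": 4000, "geneC": 4000}

# Longer genes attract more reads, so RPK down-weights them.
rpk = {g: reads_per_kilobase(counts[g], lengths_bp[g]) for g in counts}
print(rpk)  # {'geneA': 40.0, 'geneB': 10.0, 'geneC': 0.5}
```

Note that the normalized values are no longer whole-number read counts: a 4 kb gene with 2 reads ends up at 0.5.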
@scubalaina Something to keep in mind about normalization: you will be changing the read counts, which could have an effect on variance estimation. Check out this tiny example. It's a silly, contrived example where every gene has the same length, but the counts are still normalized by gene length (i.e., every sample/gene count is reduced by the same factor, which increases the variance). Of course this is just a toy example, but the point is that normalizing could affect variance estimates. In practice, though, I'm not sure how much of an issue it will be. Someone from the Willis lab will have to comment on that.

One other thing if you're doing some normalization: think of a gene in a sample that has a low count, like 2, but is a 4 kb gene, so its "per kilobase" count would be 0.5. Depending on your choice of pseudocount (for example, 0.5 was chosen for the analysis in the DivNet manuscript), that could be about the same as the normalized count. Another thing to keep in mind.

divnet_rpk_variance.R.txt
<https://github.com/adw96/DivNet/files/11871614/divnet_rpk_variance.R.txt>
[image: alpha_div]
<https://user-images.githubusercontent.com/3172014/248871802-872a19b5-5489-4a7f-b19a-b05c684b083d.png>

(Not relevant to this discussion, but I work in a viral ecology lab, so I know some of your papers! Just a cool coincidence 😄)
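A back-of-the-envelope Python sketch of the two points above. It assumes, purely for illustration, that a count-based method treats each value as a Poisson count, so the delta-method variance of log(count) is roughly 1/count. This is not DivNet's actual model; it only shows why shrinking counts by a length factor inflates the apparent uncertainty. All numbers and names are hypothetical.

```python
import math

def approx_log_variance(count):
    """Delta-method approximation Var(log X) ~ 1/lambda for X ~ Poisson(lambda).
    Illustrative assumption only, not DivNet's internal variance estimator."""
    return 1.0 / count

raw_count = 100                    # hypothetical raw read count
gene_length_kb = 4.0               # hypothetical gene length
rpk = raw_count / gene_length_kb   # 25.0 "reads" per kilobase

# Treating the RPK value as if it were a count makes it look 4x noisier.
print(approx_log_variance(raw_count))
print(approx_log_variance(rpk))

# Low-count case: 2 reads on a 4 kb gene lands exactly on a 0.5 pseudocount.
pseudocount = 0.5
low_rpk = 2 / gene_length_kb
print(low_rpk, math.isclose(low_rpk, pseudocount))
```

Under this simplified model, dividing every count by the same factor c multiplies the approximate log-scale variance by c, even though the relative abundances are unchanged.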
Hi Ryan,
Ah I see! That makes sense! Thank you for taking the time to demonstrate. I
really appreciate your help in understanding this all. I clearly needed to
take more stats classes in grad school haha
I wonder how one could avoid compromising the variance calculations without overestimating the abundance of longer genes, which is what happens if gene length isn't accounted for? I did notice that when I ran DivNet on my normalized read counts, the differences in Shannon diversity were no longer significant, or at least the DivNet output had overlapping confidence intervals. I attached the example below of DivNet vs. a Wilcoxon test on vegan's Shannon diversity calculation.
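A small Python sketch, with hypothetical counts, of the plug-in Shannon index that vegan's `diversity()` computes by default (natural log). It shows that dividing every count by the same gene length leaves the point estimate unchanged, because the index depends only on relative abundances; what normalization can change is the uncertainty that a count-based method attaches to that estimate.

```python
import math

def shannon(abundances):
    """Plug-in Shannon index H = -sum(p * ln p), using the natural log
    (the same formula as vegan's default index = "shannon")."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

raw = [40, 40, 20]              # hypothetical counts, equal gene lengths
scaled = [a / 4 for a in raw]   # the same counts divided by a 4 kb length

print(shannon(raw))
print(math.isclose(shannon(raw), shannon(scaled)))  # True
```

A Wilcoxon test on these per-sample point estimates treats each one as fixed, so it can disagree with interval estimates that also account for within-sample uncertainty.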
Should I be interpreting this as no difference between the diversity of
these groups?
Sorry to take up more of your time! I really, really appreciate the help!
Awesome you're in viral ecology! I think I saw you're in Eric Wommack's
lab? Super cool!
Thanks again for your time and help!
Alaina :)
^ Yeah, that's a good question... as far as I know that is still an open research question. Someone from the Willis lab will have to weigh in here.
^ I think you may have forgotten the attachment... I'm not seeing it. (Yep, in Wommack's lab... small world haha!)
Hi,
First of all, thanks for developing this tool!
I have a few metagenomic samples for which I've counted the read pairs mapping as proper pairs to the SSU rRNA genes present in SILVA, and I was thinking of using DivNet to provide more support for my beta diversity analyses.
I definitely have lower counts per gene because shotgun sequencing is untargeted, which I guess could give the added pseudocount a somewhat larger influence.
There will also be more singletons, which might be present in one sample but not in another. Once the uncertainty due to the sampling process is taken into account, these may not be considered enough evidence for a difference, so I suspect this would result in a very conservative analysis.
Even given its potentially conservative nature, would it be correct to use DivNet in my scenario as well?
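To illustrate the pseudocount concern, here is a minimal Python sketch. It assumes counts are log-transformed after adding a pseudocount, as in log-ratio approaches generally; it is not a reproduction of DivNet's code. The 0.5 value is the pseudocount mentioned earlier in this thread for the DivNet manuscript, and `log_shift` is a hypothetical helper.

```python
import math

PSEUDOCOUNT = 0.5  # value mentioned earlier in this thread

def log_shift(count, pseudocount=PSEUDOCOUNT):
    """How far adding a pseudocount moves log(count):
    log(count + pseudocount) - log(count). Requires count > 0."""
    return math.log(count + pseudocount) - math.log(count)

# The shift is large for singletons and negligible for well-sampled genes.
for count in (1, 2, 10, 100):
    print(count, round(log_shift(count), 3))
```

A zero count is the extreme case: its log value, log(0 + 0.5), is set entirely by the pseudocount, which is why low-coverage shotgun data makes that choice matter more.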
Thank you again for your time!
Marco