DivNet on rRNA gene counts derived from metagenomes? #128

Open
mgabriell1 opened this issue Jul 26, 2022 · 7 comments

Comments

mgabriell1 commented Jul 26, 2022

Hi,
First of all, thanks for developing this tool!

I have a few metagenomic samples for which I've estimated the number of read pairs mapping as proper pairs to the SSU rRNA genes present in SILVA, and I was thinking of using DivNet to potentially provide more support for my beta diversity analyses.

I definitely have lower counts per gene due to the untargeted nature of shotgun sequencing, which I guess could result in a somewhat higher influence of the added pseudocount.
There will also be a larger number of singletons that are present in one sample but not in another. Once the uncertainty due to the sampling process is taken into account, these would probably not be considered enough evidence of a difference between samples. So I suspect that this would result in a very conservative analysis.

Even given the potential conservative nature of this, would it be correct to use it also in my scenario?
Thank you again for your time!

Marco
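
To make the pseudocount concern above concrete, here is a minimal Python sketch. The counts are hypothetical, the 0.5 pseudocount is just an illustrative value, and the helper name is made up for this example. It does not call DivNet or reproduce its model; it only shows how much of the composition a pseudocount hands to an unobserved taxon at two sequencing depths.

```python
def proportions_with_pseudocount(counts, pseudocount=0.5):
    """Add a pseudocount to every taxon, then convert to proportions."""
    adjusted = [c + pseudocount for c in counts]
    total = sum(adjusted)
    return [a / total for a in adjusted]

# Hypothetical counts: same observed composition, 100x difference in depth.
deep_counts = [5000, 3000, 2000, 0]     # e.g. amplicon-like depth
shallow_counts = [50, 30, 20, 0]        # e.g. rRNA reads from a metagenome

deep = proportions_with_pseudocount(deep_counts)
shallow = proportions_with_pseudocount(shallow_counts)

# Share of the community assigned to the taxon that was never observed.
print("deep:   ", deep[3])
print("shallow:", shallow[3])
```

With these numbers the unobserved taxon receives roughly 100 times more of the composition in the shallow sample than in the deep one, which is the sense in which the pseudocount matters more at low counts.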

@scubalaina

Hi there,

I also have a similar question and just wanted to boost this!
I'm working with RNAP (B and B' subunit genes separately), which are single-copy markers, so I don't have to worry about copy numbers skewing things. But I'm wondering how the diversity calculations are implemented and interpreted with metagenomic data in which the whole single-gene community accounts for only a very small portion of the reads/members of the community. In other words, their relative abundances will not sum to 1?

Thanks,
Alaina :)

@mooreryan
Contributor

@scubalaina I have used DivNet in a similar way to you. When you run it on the subcommunity (i.e., just the RNA polymerase sequences), you are passing the data to DivNet as counts, right? If so, it will go through its process treating those counts as the samples/community in the right way.
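
A minimal Python sketch of that point, with hypothetical numbers and helper names. It uses a plain plug-in Shannon index, not DivNet's estimator, purely to show that once only the marker-gene counts are supplied, the relative abundances are taken within the marker community and sum to 1, regardless of how small a fraction of the library those reads are.

```python
import math

def relative_abundances(counts):
    """Proportions within the supplied count vector."""
    total = sum(counts)
    return [c / total for c in counts]

def shannon(proportions):
    """Plug-in Shannon index (natural log); zero proportions are skipped."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

total_reads = 2_000_000            # hypothetical library size
marker_counts = [120, 60, 15, 5]   # hypothetical reads per RNAP variant

# As a fraction of the whole library these are tiny and do not sum to 1.
fraction_of_library = [c / total_reads for c in marker_counts]

# Within the marker subcommunity they sum to 1.
within_marker = relative_abundances(marker_counts)

print(sum(fraction_of_library))
print(sum(within_marker))
print(shannon(within_marker))
```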

scubalaina commented Jun 26, 2023 via email

@mooreryan
Contributor

@scubalaina something to keep in mind about normalization: you will be changing the read counts, which could have an effect on the variance estimates. Check out this tiny example. It's a contrived example where every gene has the same length, but the counts are still normalized by gene length (i.e., the counts are reduced equally for all samples/genes in this particular example, which increases the variance). Of course this is just a silly example, but the point is that normalizing could impact the variance estimates. In practice, though, I'm not sure how much of an issue it will be. Someone from the Willis lab will have to comment on that.
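
As a rough analogy for the point above (this is not the attached R script, and DivNet's own model is more involved), here is a Python sketch under simple binomial sampling. The counts, the 4 kb gene length, and the helper name are hypothetical. Dividing every count by the same length leaves the proportions unchanged but shrinks the apparent sample size, so the implied standard error grows.

```python
import math

def binomial_se(count, total):
    """Standard error of an estimated proportion under binomial sampling."""
    p = count / total
    return math.sqrt(p * (1 - p) / total)

gene_length_kb = 4.0                           # hypothetical, same for all genes
raw = [400, 600]                               # hypothetical raw read counts
per_kb = [c / gene_length_kb for c in raw]     # counts after length normalization

se_raw = binomial_se(raw[0], sum(raw))
se_per_kb = binomial_se(per_kb[0], sum(per_kb))

print(se_raw, se_per_kb, se_per_kb / se_raw)
```

Here the proportion is 0.4 in both cases, yet the standard error doubles, because the apparent total drops by a factor of 4 and the standard error scales with one over the square root of the total.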

One other thing if you're doing some normalization: think of a gene in a sample that has a low count like 2 but is a 4 kb gene, so its "per kilobase" count would be 0.5. Depending on your choice of pseudocount (for example, 0.5 was chosen in the DivNet manuscript for the analysis), the pseudocount could be about the same as that normalized count. Another thing to keep in mind.
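
The arithmetic in that last paragraph can be sketched in a few lines of Python. The helper name is hypothetical; the count of 2, the 4 kb length, and the 0.5 pseudocount are the values mentioned above. The sketch compares how far an observed gene stands out from an unobserved one before and after per-kilobase normalization.

```python
def reads_per_kilobase(count, gene_length_bp):
    """Length-normalized count: reads per 1000 bp of gene."""
    return count / (gene_length_bp / 1000)

pseudocount = 0.5
raw_count = 2
rpk = reads_per_kilobase(raw_count, 4000)   # 2 reads on a 4 kb gene

# Ratio of (observed gene + pseudocount) to (unobserved gene + pseudocount).
ratio_raw = (raw_count + pseudocount) / (0 + pseudocount)
ratio_rpk = (rpk + pseudocount) / (0 + pseudocount)

print(rpk, ratio_raw, ratio_rpk)
```

After normalization the observed gene is only twice the pseudocount-only value instead of five times, so the pseudocount contributes as much as the data for that gene.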

[Attachment: divnet_rpk_variance.R.txt]

[Image: alpha_div]

(Not relevant to this discussion, but I work in a viral ecology lab, so I know some of your papers! Just a cool coincidence 😄)

scubalaina commented Jun 26, 2023 via email

@mooreryan
Contributor

I wonder how one could avoid compromising variance calculations without overestimating the abundance of longer genes if gene length isn't accounted for?

^ Yeah, that's a good question...as far as I know that is still an open research question. Someone from the Willis lab will have to weigh in here.

I attached the example below

^ I think you may have forgotten the attachment...I'm not seeing it.

(Yep in Wommack's lab...small world haha!)

@scubalaina

Hi Ryan,

Sorry, I was corresponding via email, so the attachment probably didn't come through on GitHub. Here it is!
[Image: Screenshot 2023-06-26 at 5 52 35 PM]
