Impact of score_threshold on results #76

Open
jdwinkler-lanzatech opened this issue Feb 27, 2022 · 5 comments

Comments

@jdwinkler-lanzatech

Hi,

Thanks for your work on DAS_Tool, I have found it very useful in the past! I do have one question about the impact of the score threshold on the output. I noticed that DAS Tool is (correctly) discarding a lot of low-quality bins for some of my samples. I generally like to do post-hoc assembly quality filtering with CheckM, so I was wondering: does setting score_threshold = 0 mean that DAS Tool effectively only dereplicates the input MAGs and discards any MAG with a negative score?

Thanks for your help!

@cmks
Owner

cmks commented Feb 28, 2022

Yes, that's correct. The output will be a dereplicated set of MAGs with scores above 0; MAGs with scores below 0 will not be reported. Following up with CheckM is a good idea.

@jdwinkler-lanzatech
Author

Great, thanks. I'll follow up with some alternative binners first before resorting to this approach.

@jdwinkler-lanzatech
Author

One additional question: DAS Tool is still removing bins that are presumably contaminated or megabins even with the score threshold of zero. I am guessing that behavior is probably what I want. How should I interpret the negative scores though?

@cmks
Owner

cmks commented Mar 5, 2022

Basically, a negative score means a high fraction of redundant single-copy genes (SCGs). The score is calculated from the number of unique and redundant SCGs per bin. When the number of redundant SCGs is high, it can outweigh the number of unique SCGs and the score turns negative. You can always check the SCG_completeness and SCG_redundancy columns of the .summary file, which show the unique and redundant SCG fractions.
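To make the "outweighed" point concrete, here is a minimal sketch of a score of this general shape: a completeness term minus penalties for duplicated SCGs and for extra SCG copies. It is loosely modeled on the scoring function described in the DAS Tool publication, but the function name, argument names, and the penalty weights (0.6 and 0.5) are assumptions for illustration, not a copy of the tool's code.

```python
def bin_score(unique_scgs, duplicated_scgs, total_scgs, n_reference_scgs,
              duplicate_penalty=0.6, megabin_penalty=0.5):
    """Illustrative SCG-based bin score (hypothetical helper, not DAS Tool's code).

    unique_scgs      -- number of distinct SCGs found in the bin
    duplicated_scgs  -- number of distinct SCGs found more than once
    total_scgs       -- total SCG hits in the bin, counting every copy
    n_reference_scgs -- size of the reference SCG set
    The two penalty weights are assumed values chosen for this example.
    """
    if unique_scgs == 0:
        return 0.0  # no marker genes, nothing to score
    completeness = unique_scgs / n_reference_scgs
    # Penalise the fraction of the bin's SCGs that occur in multiple copies.
    duplicate_term = duplicate_penalty * (duplicated_scgs / unique_scgs)
    # Penalise surplus copies, which grow quickly in merged "megabins".
    extra_copies = total_scgs - unique_scgs
    megabin_term = megabin_penalty * (extra_copies / n_reference_scgs)
    return completeness - duplicate_term - megabin_term


# A clean bin: 45 of 51 reference SCGs, each present once.
clean = bin_score(unique_scgs=45, duplicated_scgs=0, total_scgs=45,
                  n_reference_scgs=51)

# A likely megabin: nearly all SCGs present, but most in several copies.
megabin = bin_score(unique_scgs=50, duplicated_scgs=45, total_scgs=120,
                    n_reference_scgs=51)

print(f"clean bin score:   {clean:.3f}")
print(f"megabin score:     {megabin:.3f}")
```

In this toy setup the clean bin scores positive while the megabin scores negative even though it is nearly "complete", because the two penalty terms together exceed the completeness term.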
If you want DAS Tool to also report bins with high SCG redundancy, or even to dereplicate your full bin set, you can do so by choosing a very low, negative score threshold.
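As a rough way to see what different thresholds would keep, the sketch below reads a tab-separated summary table and lists the bins whose score exceeds a given cutoff. Only the SCG_completeness and SCG_redundancy column names come from this thread; the `bin` and `bin_score` column names, the sample values, and the strict "greater than" comparison are assumptions, so check them against your own .summary file.

```python
import csv
import io

# Made-up example data standing in for a DAS Tool .summary file.
# Column names other than SCG_completeness / SCG_redundancy are assumed.
SAMPLE_SUMMARY = (
    "bin\tSCG_completeness\tSCG_redundancy\tbin_score\n"
    "bin_A\t92.0\t3.9\t0.85\n"
    "bin_B\t88.2\t64.7\t-0.21\n"
    "bin_C\t41.2\t2.0\t0.38\n"
)


def load_bins(handle):
    """Read a tab-separated summary table into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))


def bins_passing(rows, score_threshold, score_column="bin_score"):
    """Return names of bins whose score is strictly above the threshold."""
    return [row["bin"] for row in rows
            if float(row[score_column]) > score_threshold]


rows = load_bins(io.StringIO(SAMPLE_SUMMARY))

# Threshold 0: dereplicated bins with a positive score only.
print(bins_passing(rows, 0.0))

# Very low negative threshold: highly redundant bins are kept too.
print(bins_passing(rows, -1e9))

# Bins worth a closer look in CheckM: high redundancy despite completeness.
flagged = [r["bin"] for r in rows if float(r["SCG_redundancy"]) > 10.0]
print(flagged)
```

To run this on real output, replace `io.StringIO(SAMPLE_SUMMARY)` with an open file handle for your summary file and adjust the column names if they differ.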

@jolespin

I have the same question. My main goal is to dereplicate, not necessarily to score quality, since I do that with CheckM post hoc. Would you recommend setting the score threshold to 0 in this case?


3 participants