Empty outputs with lower coverage #15

Open
bhargava-morampalli opened this issue Apr 6, 2024 · 3 comments
Comments

@bhargava-morampalli

I have been testing yanocomp at multiple coverage levels for the same data, and at lower coverages (less than 70x) the output is empty. Is there anything about how the tool works that causes this?
In general, as the coverage goes down (from 1000x), the output gets truncated at some positions (some positions do have low coverage compared to the overall coverage because of the way filtering was done), and it is completely empty at less than 70x coverage.

It would be helpful if anyone can explain why this is happening. Thank you.

@mparker2
Member

mparker2 commented Apr 6, 2024

Hi @bhargava-morampalli,

Yanocomp is not really being actively developed any more, but I looked at the code to remind myself how it is working...

The minimum coverage is set dynamically depending on the window size used for modelling:

    def set_default_depth(ctx, param, val):
        if val is None:
            win_size = ctx.params['window_size']
            val = max(win_size * 2, 5) # at least as many reads per sample as features
            logger.warn(f'Default min depth set to {val} to match '
                        f'window size {win_size}')
        return val

So, when the default window of 3 adjacent kmers is used for modelling, the min depth (per replicate) is set to 6. This is to ensure that the number of samples used to fit the model is always greater than the number of features.
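
As a quick check of the arithmetic, the rule in the quoted callback can be restated as a standalone function. This is only a sketch: `default_min_depth` is a hypothetical helper, not part of yanocomp, and it simply repeats the `max(win_size * 2, 5)` expression for a few window sizes.

```python
def default_min_depth(window_size: int) -> int:
    """Restate the quoted rule: twice the window size, but never below 5."""
    return max(window_size * 2, 5)


# Window sizes chosen for illustration only; 3 is the default mentioned above.
for win_size in (1, 2, 3, 5):
    print(f"window_size={win_size} -> min depth {default_min_depth(win_size)}")
```

For the default window of 3 this gives 6 reads per replicate, which is far below 70x, so the default threshold alone would not be expected to remove every position.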

Coverage in an RNA sequencing dataset can vary across several orders of magnitude depending on the gene, so it is unclear to me how you can be sure that all positions have approximately 70x, unless you are just testing one gene?
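
One way to see how uneven coverage interacts with a per-replicate depth threshold is to count the positions where every replicate meets it. The sketch below is plain Python with made-up depth values; `positions_passing_depth` is a hypothetical helper, and it assumes that a position is dropped when any replicate falls below the threshold, which is an assumption about the filtering and not something confirmed in this thread.

```python
def positions_passing_depth(depths_per_replicate, min_depth):
    """Return the indices where every replicate has at least min_depth reads.

    depths_per_replicate: list of equal-length lists, one list of
    per-position read depths for each replicate (hypothetical input format).
    """
    n_positions = len(depths_per_replicate[0])
    return [
        i for i in range(n_positions)
        if all(rep[i] >= min_depth for rep in depths_per_replicate)
    ]


# Made-up depths for two replicates over five positions.
rep1 = [80, 70, 4, 65, 90]
rep2 = [75, 5, 3, 60, 85]

passing = positions_passing_depth([rep1, rep2], min_depth=6)
print(passing)  # [0, 3, 4]
```

Running the same kind of check on real per-position depths for each subsample would show whether the missing positions line up with local dips in coverage.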

@bhargava-morampalli
Author

That's exactly right, I am testing it on one gene, and the coverage is close to 70x after filtering based on total bases, but there are definitely areas where the coverage dips a lot (maybe due to which reads were included in the filtered dataset).
Weirdly, there is no output at all below 70x - not sure why that's happening.

@mparker2
Member

mparker2 commented Apr 8, 2024

That is weird, I'm not sure what is occurring in that case. Does it consistently happen with many different subsamples?
