Empty outputs with lower coverage #15

bhargava-morampalli · 2024-04-06T10:55:17Z

I have been testing yanocomp at multiple coverage levels on the same data, and at lower coverages (less than 70x) the output is empty. Is there anything about how the tool works that causes this?
In general, as the coverage goes down (from 1000x), the output becomes truncated at some positions (some positions do have low coverage compared to the overall coverage because of the way filtering was done), and it is completely empty at less than 70x coverage.

It would be helpful if anyone can explain why this is happening. Thank you.

mparker2 · 2024-04-06T11:36:20Z

Hi @bhargava-morampalli,

Yanocomp is not really being actively developed any more, but I looked at the code to remind myself how it is working...

The minimum coverage is set dynamically depending on the window size used for modelling:

yanocomp/yanocomp/gmmtest.py

Lines 245 to 251 in afda4b5

    
    def set_default_depth(ctx, param, val):
        if val is None:
            win_size = ctx.params['window_size']
            val = max(win_size * 2, 5) # at least as many reads per sample as features
            logger.warn(f'Default min depth set to {val} to match '
                        f'window size {win_size}')
        return val

So, when the default window of 3 adjacent kmers is used for modelling, the min depth (per replicate) is set to 6. This is to ensure that the number of samples used to fit the model is always greater than the number of features.
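As a rough sketch of that rule, the default threshold can be worked out for a few window sizes. The function names here are hypothetical and not part of yanocomp, and requiring every replicate to meet the threshold is an assumption based on the "per replicate" wording, not something confirmed from the source:

```python
def default_min_depth(window_size: int) -> int:
    """Same arithmetic as the quoted callback: max(window_size * 2, 5)."""
    return max(window_size * 2, 5)


def passes_depth_filter(replicate_depths, window_size=3):
    """Hypothetical check: keep a position only if every replicate
    reaches the default minimum depth (an assumption, see above)."""
    min_depth = default_min_depth(window_size)
    return all(depth >= min_depth for depth in replicate_depths)


if __name__ == "__main__":
    # Print the default threshold for a few window sizes.
    for win in (1, 3, 5):
        print(f"window_size={win} -> default min depth {default_min_depth(win)}")
    # One replicate at depth 5 falls below the threshold of 6 for window_size=3.
    print(passes_depth_filter([5, 100]))
```

Under this rule, a position covered at 70x in every replicate sits well above the default threshold of 6, so the depth filter alone would not be expected to produce an empty output at that coverage.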

Coverage in an RNA sequencing dataset can vary across several orders of magnitude depending on the gene, so it is unclear to me how you can be sure that all positions have approximately 70x, unless you are just testing one gene?

bhargava-morampalli · 2024-04-06T11:58:05Z

That's exactly right, I am testing it on one gene, and the coverage is close to 70x after filtering based on total bases, but there are definitely areas where the coverage dips a lot (maybe due to which reads were included in the filtered dataset).
Weirdly, there is no output at all below 70x, and I'm not sure why that's happening.

mparker2 · 2024-04-08T09:55:22Z

That is weird, I'm not sure what is occurring in that case. Does it consistently happen with many different subsamples?
