"Solve" the insert size distribution in order to extrapolate the estimate. #174

Closed
rhpvorderman opened this issue Jun 10, 2024 · 3 comments

Comments

@rhpvorderman
Owner

Currently only the insert sizes that were actually found are displayed. These follow a regular statistical distribution. Since the peak of the distribution is usually visible, one half of the distribution is known, so it should in theory be possible to solve for the parameters and estimate the tail end of the distribution.
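A minimal sketch of what "solving" from one half could look like, assuming the insert sizes follow a symmetric normal distribution (an assumption made only for illustration; the real distribution may be skewed). The function names `fit_normal_from_left_half` and `expected_count` are hypothetical and not part of the project. It relies on the fact that, for a symmetric distribution, the mean squared distance to the peak on one side equals the variance.

```python
import math


def fit_normal_from_left_half(counts):
    """Estimate (mode, sigma, total) from a histogram where counts[i] is the
    number of inserts of size i and only sizes up to the peak are trusted.

    Assumes a symmetric normal distribution with a visible peak.
    """
    # The most frequent insert size is taken as the centre of the distribution.
    mode = max(range(len(counts)), key=counts.__getitem__)
    # The peak bin belongs half to each side, so it gets half weight.
    left_weight = sum(counts[:mode]) + counts[mode] / 2
    squared = sum(c * (size - mode) ** 2 for size, c in enumerate(counts[:mode]))
    if left_weight <= 0 or squared == 0:
        raise ValueError("need observations on the left side of the peak")
    # Second moment about the peak on one side equals the variance.
    sigma = math.sqrt(squared / left_weight)
    # By symmetry the full distribution holds twice the left-side weight.
    return mode, sigma, 2 * left_weight


def expected_count(size, mode, sigma, total):
    """Extrapolated number of inserts of a given size under the fitted normal."""
    z = (size - mode) / sigma
    return total * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
```

With the fitted parameters, `expected_count(size, mode, sigma, total)` gives an extrapolated count for sizes in the unobserved tail. The estimate is only as good as the symmetry assumption.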

@rhpvorderman
Owner Author

Ioannis suggested I try a few candidate distributions and check how well each one fits using the Kullback-Leibler divergence: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
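As a rough sketch of that check: the observed histogram and each candidate's expected histogram are normalised to probabilities, and the candidate with the smallest divergence is preferred. The helper names `kl_divergence` and `best_candidate`, and the `epsilon` floor, are assumptions for illustration only.

```python
import math


def kl_divergence(observed, expected, epsilon=1e-12):
    """Return D_KL(P || Q) in nats, where P and Q are the observed and
    expected histograms after normalising each to sum to 1."""
    if len(observed) != len(expected):
        raise ValueError("histograms must have the same number of bins")
    p_total = sum(observed)
    q_total = sum(expected)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("histograms must contain positive mass")
    divergence = 0.0
    for p_raw, q_raw in zip(observed, expected):
        p = p_raw / p_total
        if p == 0:
            continue  # bins without observations contribute nothing
        # Floor q so a zero expected probability does not divide by zero.
        q = max(q_raw / q_total, epsilon)
        divergence += p * math.log(p / q)
    return divergence


def best_candidate(observed, candidates):
    """Return the name of the candidate histogram closest to the observed one.

    candidates maps a name to a list of expected counts per bin.
    """
    return min(candidates, key=lambda name: kl_divergence(observed, candidates[name]))
```

Because only part of the distribution is observed, both histograms here cover the same observed bins only. A low divergence on those bins does not guarantee the candidate describes the unseen tail.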

@rhpvorderman
Owner Author

Apparently the protocol consists of several steps: first the desired median is selected, then smaller and larger inserts are removed in separate steps. This may affect how well the insert sizes map to a standard distribution.

@rhpvorderman
Owner Author

This is infeasible for large insert sizes. It is much better to check this after alignment instead. If the estimate cannot be accurate, it is better not to include it rather than have a potentially misleading graph.
