Using preseq 3.2.0 with BAM as input #74

Open
donaldcampbelljr opened this issue Oct 4, 2024 · 6 comments
Comments

@donaldcampbelljr

Hello,

One of our pipelines currently uses preseq 2.0 with BAM files as input. I recently attempted to upgrade to 3.2.0 and, after following the instructions in the GitHub README and on the ReadTheDocs site, I was unable to get the program to process .bam files. Could you please clarify the steps for building preseq so that it can process .bam files (I ensured that HTSlib was installed before building preseq, per the docs), as well as the proper syntax for the input file and whether the -B flag is required? The CLI does not show the -B option, but the docs reference it specifically for .bam files.

Thank you!

@andrewdavidsmith
Contributor

@donaldcampbelljr When you run ./configure --enable-hts do you see something like this?

checking for hts_version in -lhts... yes

@donaldcampbelljr
Author

Yes, I just confirmed that I see that during the installation:
checking for hts_version in -lhts... yes

@andrewdavidsmith
Contributor

Ok I'll look more closely at it. As of now the code I have downloaded still includes the BAM options. I assume my most recent code would correspond to the most recent release as I likely made that release.

@andrewdavidsmith
Contributor

I think I see the problem. Should be a relatively easy fix. I'll have to find the time. If you don't hear back by end of week ping me.

@donaldcampbelljr
Author

Excellent. Thank you for taking a look so quickly.

@andrewdavidsmith
Contributor

@donaldcampbelljr This is unfortunately more complicated. As briefly as possible: we had been concerned about misinterpreting various internals of BAM/SAM file formats, which were not always uniform, and had made a choice a few years ago to gradually move towards only doing the mathematical stuff preseq was designed for. So some code was removed for BAM format that I would have to put back into the source. Considering it again, I'm a bit less opposed to it than I was a few years ago, but realistically the same kinds of issues might re-emerge: preseq is designed to do specific calculations, and should be independent of any mapping formats.

So the choice I have is either (1) to provide an external method, with substantial documentation, that uses samtools and some Unix command-line tools to achieve the same effect (without much cost to efficiency; this is also how I use preseq myself), or (2) to re-introduce code to take care of the BAM format. I think the SAM format code might still be present, despite us intending to eventually remove it. Right now I don't know which way to go with this, but I'll try some tests to see how conveniently I can do either of (1) vs. (2). In any case, I'm obviously trying to take this seriously.
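
For readers wondering what an external route like option (1) could look like, here is a minimal sketch. It is not the maintainer's method. It assumes SAM text (for example, the output of `samtools view`) is available line by line, and it uses a deliberately simple definition of a duplicate: reads sharing the same reference name, leftmost position, and strand. The function names `counts_histogram` and `format_histogram` are hypothetical, and the sketch ignores paired-end fragments and CIGAR-derived end coordinates, which is the kind of format-specific interpretation discussed above.

```python
from collections import Counter


def counts_histogram(sam_lines):
    """Return {j: number of distinct positions observed exactly j times}."""
    per_position = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100), and supplementary (0x800) records.
        if flag & 0x904:
            continue
        strand = "-" if flag & 0x10 else "+"
        # Assumption: reads sharing (RNAME, POS, strand) are copies of one molecule.
        per_position[(fields[2], int(fields[3]), strand)] += 1
    return Counter(per_position.values())


def format_histogram(hist):
    """Render the histogram as tab-separated 'count<TAB>frequency' lines."""
    return [f"{j}\t{hist[j]}" for j in sorted(hist)]
```

Calling `counts_histogram(sys.stdin)` in a small script fed by `samtools view` would produce a two-column counts histogram. Whether a given preseq build accepts such a histogram as input, and under which option, should be checked against that build's own help output.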
