Support degenerate / gap characters #12

Open
fedarko opened this issue Oct 2, 2023 · 0 comments
Labels
enhancement New feature or request

fedarko commented Oct 2, 2023

Currently, the presence of an N in a sequence makes matrix construction fail with the following error: `Input sequence contains character N; only DNA nucleotides (A, C, G, T) are currently allowed.`

This is a "safe" way of handling the situation, but it is overly cautious. It would be better to allow these characters and to treat any k-mer that contains one as having no matches anywhere.
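The proposed behavior could look roughly like the sketch below. It is not this project's code: `valid_kmer_positions` and `shared_kmer_matches` are hypothetical names, the input is assumed to be uppercase, and only forward matches are considered (reverse-complement matches are left out to keep the example short). A k-mer is indexed only if every character in it is A, C, G, or T, so any k-mer that overlaps an N or other degenerate/gap character never produces a match.

```python
from collections import defaultdict

DNA = set("ACGT")


def valid_kmer_positions(seq, k):
    """Map each k-mer made only of A/C/G/T to its start positions in seq.

    k-mers that overlap any other character (N, gaps, ...) are skipped,
    so they can never match anything.
    """
    positions = defaultdict(list)
    last_bad = -1  # index of the most recent non-ACGT character seen so far
    for i, ch in enumerate(seq):
        if ch not in DNA:
            last_bad = i
        start = i - k + 1
        # The window seq[start:i + 1] is clean only if it begins after last_bad.
        if start >= 0 and start > last_bad:
            positions[seq[start:i + 1]].append(start)
    return dict(positions)


def shared_kmer_matches(s1, s2, k):
    """Return sorted (i, j) pairs where s1[i:i+k] == s2[j:j+k] (forward only)."""
    p1 = valid_kmer_positions(s1, k)
    p2 = valid_kmer_positions(s2, k)
    return sorted(
        (i, j)
        for kmer in p1.keys() & p2.keys()
        for i in p1[kmer]
        for j in p2[kmer]
    )
```

For example, `shared_kmer_matches("ACGNACG", "ACG", 3)` reports matches at positions 0 and 4 of the first sequence, and none for the three 3-mers that overlap the N.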

Some workaround options, in the meantime:

  • Remove these characters from your sequence before creating a dot plot. If you keep track of where the "breaks" are, you can label them on the dot plot to explain the situation.

    • The downside is that this creates "spurious" k-mers that span each "break".
  • Split your sequence into "islands" of non-degenerate/gap characters and analyze each island independently. You could also concatenate the resulting dot plot matrices, although that would require some extra programming work.

  • Replace these characters with DNA nucleotides, possibly chosen at random (as is done, for example, in section 2.7.1 of the BWA paper).
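As a rough illustration of the three workarounds above, here is a minimal preprocessing sketch. The function names (`strip_with_breaks`, `split_islands`, `randomize_non_dna`) are hypothetical helpers and not part of this project, and the input is assumed to be uppercase, with every character other than A, C, G, and T treated as a degenerate/gap character.

```python
import random
import re

NON_DNA = re.compile(r"[^ACGT]+")  # a run of degenerate/gap characters


def strip_with_breaks(seq):
    """Workaround 1: drop non-ACGT runs and record where they were.

    Returns (cleaned, breaks); each break is an index into cleaned at
    which a run was removed, so it can be labeled on the dot plot.
    """
    pieces, breaks = [], []
    pos = 0       # current length of the cleaned sequence
    prev_end = 0  # end of the previous removed run in the original sequence
    for m in NON_DNA.finditer(seq):
        chunk = seq[prev_end:m.start()]
        pieces.append(chunk)
        pos += len(chunk)
        breaks.append(pos)
        prev_end = m.end()
    pieces.append(seq[prev_end:])
    return "".join(pieces), breaks


def split_islands(seq):
    """Workaround 2: return (start, island) for each maximal A/C/G/T run."""
    return [(m.start(), m.group()) for m in re.finditer(r"[ACGT]+", seq)]


def randomize_non_dna(seq, seed=None):
    """Workaround 3: replace each non-ACGT character with a random nucleotide."""
    rng = random.Random(seed)  # pass a seed for reproducible output
    return "".join(ch if ch in "ACGT" else rng.choice("ACGT") for ch in seq)
```

Each island returned by `split_islands` keeps its original start coordinate, which is what you would need in order to place the per-island dot plots back into a shared coordinate system. With `randomize_non_dna`, note that the replaced positions can produce chance matches, so passing a fixed `seed` at least makes the result repeatable.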

@fedarko fedarko added the enhancement New feature or request label Oct 2, 2023