Currently, the presence of `N`s in a sequence makes matrix construction fail with the following error: "Input sequence contains character N; only DNA nucleotides (A, C, G, T) are currently allowed."
This is a "safe" way of handling the situation, but it's over-cautious. It would be better to allow these characters and treat any k-mer that contains one as having no matches anywhere.
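A minimal sketch of that idea, not the actual matrix-construction code: the function name `valid_kmer_positions` and the dict-of-positions output are assumptions made up for illustration. It records only k-mers made entirely of A, C, G, T, so any k-mer overlapping an `N` (or another non-ACGT character) is never indexed and therefore can never match anything. It assumes uppercase input.

```python
from collections import defaultdict

DNA = frozenset("ACGT")


def valid_kmer_positions(seq, k):
    """Map each all-ACGT k-mer in seq to the list of its start positions.

    Hypothetical helper: k-mers containing any other character (e.g. N)
    are skipped, so they have no matches anywhere.
    """
    positions = defaultdict(list)
    last_bad = -1  # index of the most recent non-ACGT character seen
    for i, char in enumerate(seq):
        if char not in DNA:
            last_bad = i
        start = i - k + 1  # start of the k-mer that ends at position i
        # Keep the k-mer only if no bad character falls inside [start, i].
        if start >= 0 and last_bad < start:
            positions[seq[start:i + 1]].append(start)
    return dict(positions)


if __name__ == "__main__":
    # The three 3-mers overlapping the N at index 3 are skipped.
    print(valid_kmer_positions("ACGNACG", 3))
```

Two positions would then count as a match only if they share a key in this mapping, which leaves blank rows and columns in the dot plot wherever an `N` occurs.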
Some workaround options, in the meantime:
Remove these characters from your sequence before creating a dot plot. If you keep track of where the "breaks" are, you can then label them on the dot plot to explain the situation.
The downside of this, of course, is that it creates "spurious" k-mers that span each "break".
Split up your sequence into "islands" of non-degenerate, non-gap characters, and analyze these independently. You could also concatenate the resulting dot plot matrices together, although that would require some extra programming work.
Replace these characters with random (?) DNA nucleotides (as is done, for example, in section 2.7.1 of the BWA paper).
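The workarounds above could be sketched roughly as follows. All three function names are hypothetical helpers written for this issue, not part of any library, and they assume uppercase input in which anything other than A, C, G, T counts as a character to work around. The random replacement uses Python's `random` module with an optional seed; it is only an illustration of the general idea, not a reproduction of what the BWA paper does.

```python
import random
import re


def split_into_islands(seq):
    """Return (start, island) pairs, one per maximal run of A/C/G/T."""
    return [(m.start(), m.group()) for m in re.finditer(r"[ACGT]+", seq)]


def remove_and_track_breaks(seq):
    """Drop non-ACGT characters; also return the break positions.

    Each break is an index in the cleaned sequence where two islands were
    joined, i.e. where spurious k-mers can span the junction.
    """
    islands = split_into_islands(seq)
    cleaned = "".join(island for _, island in islands)
    breaks = []
    offset = 0
    for _, island in islands[:-1]:
        offset += len(island)
        breaks.append(offset)
    return cleaned, breaks


def replace_with_random(seq, seed=None):
    """Replace every non-ACGT character with a randomly chosen nucleotide."""
    rng = random.Random(seed)  # pass a seed for reproducible output
    return "".join(c if c in "ACGT" else rng.choice("ACGT") for c in seq)


if __name__ == "__main__":
    s = "ACGTNNGGA-TT"
    print(split_into_islands(s))
    print(remove_and_track_breaks(s))
    print(replace_with_random(s, seed=0))
```

The `start` offsets from `split_into_islands` are in the original sequence's coordinates, which should help when labelling or stitching together per-island dot plots.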