Update handling of INFO/END when writing records in VCF #1201
Open
rickymagner wants to merge 2 commits into pysam-developers:master from
Hi Ricky, thanks for the explanation. Has there been any update on this?
Author
Hi, I think it's up to the maintainers to decide if they want to change this functionality according to the suggestions made here, so I leave it up to them to respond. I haven't heard anything since initially posting this.
This PR is meant to resolve issue #1200. In particular, it changes how the `write` method for `VariantFile` decides whether to write `INFO/END`. Previously, the code tried to write it only when there were symbolic alleles, but ended up writing it only for insertions, and only when asked for explicitly.

The code change now excludes `INFO/END` from the output if it is not present in the header, and writes it in all cases when it is included in the header. This should allow users to update `END` values using `record.stop`, as in the following examples.

Start with `example.vcf.gz` as:

Here are two blocks of Python code run on it with their respective outputs:
In this case, the output matches `example.vcf.gz` despite editing the `record.stop` positions, because the `END` field is not defined in the header. However, this code block:

produces the following output:
So the user can control whether `END` appears in the INFO fields by toggling whether it is included in the header, and then access it via `record.stop` as usual. I think it makes more conceptual sense to decide whether to print the field based on the header values. Since the `sync` method uses the same formula for determining the `END` coordinate, it should be consistent with the existing paradigm in the other field setters.
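To make the proposed rule concrete, here is a minimal pure-Python sketch of the header-gated decision described above. It is not the pysam or htslib implementation: `ToyRecord` and `format_info` are hypothetical names invented for illustration, and the sketch assumes that a 0-based exclusive `stop` is numerically equal to the 1-based inclusive `END` value.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a VCF record; not pysam's VariantRecord."""
    chrom: str
    start: int  # 0-based start
    stop: int   # 0-based exclusive end (assumed numerically equal to END)
    ref: str
    alt: str
    info: dict = field(default_factory=dict)


def format_info(record: ToyRecord, header_info_ids: set) -> str:
    """Render the INFO column, emitting END only if the header defines it."""
    # Drop any stale END value; END is always derived from record.stop.
    fields = {k: v for k, v in record.info.items() if k != "END"}
    if "END" in header_info_ids:
        fields["END"] = record.stop
    if not fields:
        return "."
    return ";".join(f"{k}={v}" for k, v in fields.items())


rec = ToyRecord("chr1", 99, 150, "A", "<DEL>", {"SVTYPE": "DEL"})
rec.stop = 200  # edit the end coordinate, as with record.stop

# END is not defined in the header, so the edit does not show up in INFO.
print(format_info(rec, {"SVTYPE"}))         # SVTYPE=DEL

# END is defined in the header, so it is always written from rec.stop.
print(format_info(rec, {"SVTYPE", "END"}))  # SVTYPE=DEL;END=200
```

The point of the sketch is that the header is the only switch: allele type plays no role in whether `END` is written.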