vcf: Invalid unstructured header line in VCF 4.3 example complexfile_passed_000.vcf #772

Open
zaeleus opened this issue May 31, 2024 · 2 comments · May be fixed by #773

@zaeleus

zaeleus commented May 31, 2024

test/vcf/4.3/passed/complexfile_passed_000.vcf has an invalid unstructured header line:

##AnalysisTitleBrackets=<"FINRISK: Whole-exome sequencing of Dietary, life style, and genetic determinants of obesity and metabolic syndrome (DILGOM)">

As per The Variant Call Format Specification: VCFv4.3 and BCFv2.2 § 1.4 "Meta-information lines" (2022-11-27):

An unstructured meta-information line consists of [...] a value (which may not be empty and must not start with a ‘<’ character)...
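As an illustration, the rule quoted above can be checked mechanically. The following is a minimal sketch, not taken from any VCF parser: the function name `is_valid_unstructured` is hypothetical, and it checks only the "non-empty value, must not start with `<`" condition, not the full key grammar.

```python
def is_valid_unstructured(line: str) -> bool:
    """Return True if `line` is a valid unstructured meta-information line.

    Hypothetical helper. Checks only the VCFv4.3 § 1.4 condition: the line is
    `##key=value`, and the value is not empty and does not start with '<'.
    The spec's full grammar for keys is NOT validated here.
    """
    if not line.startswith("##"):
        return False
    key, sep, value = line[2:].partition("=")
    if not key or not sep:
        return False
    return value != "" and not value.startswith("<")


# The offending line from complexfile_passed_000.vcf.
TITLE_LINE = (
    '##AnalysisTitleBrackets=<"FINRISK: Whole-exome sequencing of Dietary, '
    'life style, and genetic determinants of obesity and metabolic syndrome (DILGOM)">'
)

print(is_valid_unstructured(TITLE_LINE))              # False: value starts with '<'
print(is_valid_unstructured("##fileformat=VCFv4.3"))  # True
```

Because the value starts with `<`, a reader following the spec cannot treat the line as unstructured, and its contents (a bare quoted string with no `key=value` pairs) do not form a valid structured line either.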

@jmarshall
Member

Previously reported in #608 and #620, but perhaps this time the VCF maintainers will be moved to address this. 😃

#620 lists some options for resolving this.

@jmarshall jmarshall added the vcf label May 31, 2024
@jmarshall
Member

For ease of reference, here is the relevant comment from #620:

The test/vcf/4.[123]/passed/complexfile_passed_000.vcf test files still contain another unstructured meta-information line with angle bracket delimiters:
[##AnalysisTitleBrackets=<…> as above]

The maintainers should also update the ##AnalysisTitleBrackets lines, either just removing them or perhaps rewriting them as ##AnalysisTitleStructured=<ID="FINRISK: … (DILGOM)">.

@d-cameron d-cameron self-assigned this Jun 18, 2024
@jmarshall jmarshall moved this to New items in GA4GH File Formats Jun 18, 2024
@jmarshall jmarshall moved this from New items to Progressing in GA4GH File Formats Jun 18, 2024
jmarshall added a commit to jmarshall/hts-specs that referenced this issue Jun 18, 2024
In VCF v4.3 structured meta-information lines must have an ID subfield.
Split the value into identifier-like ID and text Description subfields
to demonstrate the structured benefit, though for a non-INFO/FORMAT/etc
line there's probably nothing requiring ID's value to be identifier-like.
Fixes samtools#772.

This leaves the similar 4.1 and 4.2 files as is, as it is less clear
whether ID-less structured headers are merely discouraged there.
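To show why the structured rewrite needs quote-aware parsing, here is a rough sketch. It assumes values may be double-quoted, may contain commas, and may use backslash escapes, and it enforces the v4.3 ID requirement mentioned in the commit message. `parse_structured` is a hypothetical name, and the ID value `FINRISK_DILGOM` is invented for illustration; it is not necessarily what the commit uses.

```python
def parse_structured(line: str) -> dict[str, str]:
    """Parse `##key=<k1=v1,k2="v2",...>` into a dict of subfields.

    Sketch only: supports double-quoted values (which may contain commas and
    backslash escapes) and requires an ID subfield, as VCF v4.3 does.
    """
    if not line.startswith("##"):
        raise ValueError("not a meta-information line")
    _, sep, value = line[2:].partition("=")
    if not sep or not (value.startswith("<") and value.endswith(">")):
        raise ValueError("not a structured value")
    body = value[1:-1]
    fields: dict[str, str] = {}
    i = 0
    while i < len(body):
        eq = body.find("=", i)
        if eq == -1:
            raise ValueError(f"expected key=value at offset {i}")
        key = body[i:eq]
        if not key or any(c in key for c in '",'):
            raise ValueError(f"invalid key {key!r}")
        i = eq + 1
        if i < len(body) and body[i] == '"':
            # Quoted value: read up to the closing quote, honouring escapes.
            i += 1
            chars = []
            while True:
                if i >= len(body):
                    raise ValueError("unterminated quoted value")
                c = body[i]
                if c == "\\" and i + 1 < len(body):
                    chars.append(body[i + 1])
                    i += 2
                elif c == '"':
                    i += 1
                    break
                else:
                    chars.append(c)
                    i += 1
            val = "".join(chars)
        else:
            # Unquoted value: runs to the next comma (or the end).
            end = body.find(",", i)
            end = len(body) if end == -1 else end
            val = body[i:end]
            i = end
        fields[key] = val
        if i < len(body):
            if body[i] != ",":
                raise ValueError(f"expected ',' at offset {i}")
            i += 1
            if i == len(body):
                raise ValueError("trailing comma")
    if "ID" not in fields:
        raise ValueError("structured line has no ID subfield")
    return fields


TITLE = ("FINRISK: Whole-exome sequencing of Dietary, life style, and genetic "
         "determinants of obesity and metabolic syndrome (DILGOM)")
BRACKET_LINE = f'##AnalysisTitleBrackets=<"{TITLE}">'
# Hypothetical rewrite in the spirit of the commit: an ID plus a Description.
REWRITTEN = f'##AnalysisTitleStructured=<ID=FINRISK_DILGOM,Description="{TITLE}">'

try:
    parse_structured(BRACKET_LINE)
except ValueError as err:
    print(err)  # expected key=value at offset 0
print(parse_structured(REWRITTEN)["ID"])  # FINRISK_DILGOM
```

A naive `split(",")` would break the Description at "Dietary," and "life style,". That is why the quoted form is handled as a unit here.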