Releases · aprilweilab/picovcf · GitHub

23 Dec 15:16

dcdehaas

v2.1 - Better tools, better unphased data handling, bug fixes (latest release)

Add igdtools, which handles conversion, filtering, and statistics for IGD files.
Proper support for unphased data. Unphased data was previously stored exactly like phased data. Each variant in the index now has a numCopies field. For unphased diploid data, numCopies=1 denotes a heterozygote and numCopies=2 denotes a homozygote for that variant's allele. numCopies=0 is unused because it would correspond to being homozygous for the reference allele. Both igdtools and VCF conversion support unphased data in this form. An example in examples/ demonstrates how to compute runs of homozygosity (ROH) with this format.
Increment the file format to V4 (backwards-compatible). String representations are slightly smaller.
Speed up and simplify IGD writing by constructing each variant row in RAM prior to writing.
Properly clang-format the code in picovcf.hpp

15 Apr 12:46

dcdehaas

v2.0

This release increments the IGD file format version from v2 to v3. There are no significant VCF-related changes.

Simplification of missing data handling. Previously, missing data was stored in its own table as sparse lists and loaded into memory all at once. For any non-trivial amount of missing data, this could use a substantial amount of RAM. Missing data is now stored as just another row among the regular data rows.
Each row can be either sparse or non-sparse. Sparse rows are lists of sample indexes, which is how missing data was stored before. Non-sparse rows are bit vectors, which is how all other data was stored before. For large datasets, this change makes the resulting file significantly smaller.
The API for writing IGD data rows was simplified.
Faster processing of the bitvector representation.
Example in igdpp for computing allele frequency.
In-memory storage of allele values (the strings) was reworked to be significantly smaller: more than 6x smaller for most alleles.

05 Feb 18:34

dcdehaas

v1.0

First release. VCF and VCF.GZ parsing. IGD parsing and creation.
