Releases: aprilweilab/picovcf

v2.1 - Better tools, better unphased data handling, bug fixes

23 Dec 15:16
  • Add igdtools, which handles conversion, filtering, and stats of an IGD file.
  • Proper support for unphased data. Instead of storing it identically to phased data, we now define numCopies on each variant in the index. For unphased diploid data, numCopies=1 is a heterozygote and numCopies=2 is a homozygote. numCopies=0 is unused (it would correspond to homozygous for the reference allele). igdtools supports unphased data in this way, as does VCF conversion. There is an example in examples/ that demonstrates how to compute runs of homozygosity (ROH) using this format.
  • Increment file format to V4 (backwards-compatible). Shrinks string representations a bit.
  • Speed up and simplify IGD writing by constructing each variant row in RAM prior to writing.
  • Properly clang-format the code in picovcf.hpp.
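
To illustrate the numCopies convention for unphased diploid data, here is a minimal sketch that does not use the picovcf API at all. It assumes the copy counts for one sample have already been gathered into a plain vector with one entry per site (0 where the sample appears in no variant row) and that each site has a single alternate allele. The names `Genotype`, `genotypeFromCopies`, and `longestHomozygousRun` are hypothetical. The run length is counted in sites only, so this is far simpler than the ROH example shipped in examples/.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Genotype of one diploid sample at one site (hypothetical helper type).
enum class Genotype { HomRef, Het, HomAlt };

// Map a copy count onto a genotype, following the convention in the notes:
// 0 copies -> homozygous reference (never stored as a row),
// 1 copy   -> heterozygote, 2 copies -> homozygote for the alternate allele.
inline Genotype genotypeFromCopies(std::uint8_t numCopies) {
    assert(numCopies <= 2 && "diploid data has at most two copies");
    switch (numCopies) {
        case 0: return Genotype::HomRef;
        case 1: return Genotype::Het;
        default: return Genotype::HomAlt;
    }
}

// Length, in sites, of the longest stretch without a heterozygote.
// Assumption: copiesPerSite holds one sample's copy count at each site,
// in genomic order.
inline std::size_t longestHomozygousRun(
    const std::vector<std::uint8_t>& copiesPerSite) {
    std::size_t best = 0;
    std::size_t current = 0;
    for (std::uint8_t copies : copiesPerSite) {
        if (genotypeFromCopies(copies) == Genotype::Het) {
            current = 0;  // a heterozygous site ends the current run
        } else {
            ++current;
            if (current > best) best = current;
        }
    }
    return best;
}
```

A real ROH computation would normally measure runs in base pairs and tolerate occasional heterozygous or missing calls; the sketch only shows how numCopies separates heterozygotes from homozygotes.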

v2.0

15 Apr 12:46

This release increments the IGD file format version from v2 to v3. There aren't really any VCF-related changes.

  • Simplification of missing data handling. Previously, it was stored in its own table as sparse lists, and loaded all at once. For any non-trivial amount of missing data this could use a fair amount of RAM. Now missing data is stored as just another row among the "regular" data rows.
  • Each row can be either sparse or not. Sparse rows are lists of sample indexes (like missing data was before). Non-sparse rows are bit vectors (like all the other data was before). This change makes the resulting file significantly smaller than before, for large datasets.
  • The API for writing IGD data rows was simplified.
  • Faster processing of the bitvector representation.
  • Example in igdpp for computing allele frequency.
  • In-memory storage of allele values (the strings) was reworked to be significantly smaller. More than 6x smaller for most alleles.
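
The size benefit of allowing both sparse and non-sparse rows can be sketched as follows. This is an illustration only, not the IGD on-disk layout: it assumes 32-bit sample indexes for sparse rows, one bit per sample for bit-vector rows, and least-significant-bit-first ordering within each byte. All function names are hypothetical, and the rule the library actually uses to choose a representation is not stated in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for a bit-vector row: one bit per sample, rounded up.
inline std::size_t denseRowBytes(std::size_t numSamples) {
    return (numSamples + 7) / 8;
}

// Bytes needed for a sparse row: one index per carrying sample.
// Assumption: indexes are stored as 32-bit integers.
inline std::size_t sparseRowBytes(std::size_t numCarriers) {
    return numCarriers * sizeof(std::uint32_t);
}

// Pick whichever representation is smaller for this row.
inline bool preferSparse(std::size_t numSamples, std::size_t numCarriers) {
    return sparseRowBytes(numCarriers) < denseRowBytes(numSamples);
}

// Expand a sparse list of sample indexes into a bit vector.
// Assumption: bit i of the row lives in byte i / 8, at bit position i % 8.
inline std::vector<std::uint8_t> toBitVector(
    const std::vector<std::uint32_t>& sampleIndexes, std::size_t numSamples) {
    std::vector<std::uint8_t> bits(denseRowBytes(numSamples), 0);
    for (std::uint32_t index : sampleIndexes) {
        assert(index < numSamples && "sample index out of range");
        bits[index / 8] |= static_cast<std::uint8_t>(1u << (index % 8));
    }
    return bits;
}
```

Under these assumptions, a row over 1000 samples costs 125 bytes as a bit vector, so a variant carried by 10 samples (40 bytes as a list) is cheaper to store sparsely, while one carried by 100 samples (400 bytes) is not. Rare variants dominate large datasets, which is consistent with the smaller files described above.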

v1.0

05 Feb 18:34

First release. VCF and VCF.GZ parsing. IGD parsing and creation.