Releases: aprilweilab/picovcf
v2.1 - Better tools, better unphased data handling, bug fixes
- Add `igdtools`, which handles conversion, filtering, and stats of an IGD file.
- Proper support for unphased data. Instead of just treating it identically to phased data (storage-wise), we now have `numCopies` defined on each variant in the index. For unphased diploid data, `numCopies=1` is a heterozygote and `numCopies=2` is a homozygote. `numCopies=0` is unused (it corresponds to homozygous w.r.t. the reference). `igdtools` supports unphased data in this way, as does VCF conversion. There is an example in `examples/` that demonstrates how to compute runs-of-homozygosity (ROH) using this format.
- Increment file format to V4 (backwards-compatible). Shrinks string representations a bit.
- Speed up and simplify IGD writing by constructing each variant row in RAM prior to writing.
- Properly clang-format the code in picovcf.hpp
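
To illustrate the `numCopies` convention for unphased diploid data, here is a minimal, self-contained sketch. It does not use the picovcf API: `VariantRow`, `hetPositions`, and `longestHetFreeSpan` are hypothetical names invented for this illustration, and the "longest heterozygote-free span" is only a toy stand-in for a real ROH computation (see the example shipped in `examples/` for the actual approach).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one indexed variant row; NOT a picovcf type.
struct VariantRow {
    uint64_t position;              // genomic position of the variant
    uint8_t numCopies;              // 1 = heterozygote, 2 = homozygote (0 is unused)
    std::vector<uint32_t> samples;  // individuals carrying that many copies
};

// Positions at which `individual` is heterozygous, i.e. listed in a
// numCopies == 1 row. Assumes `rows` is sorted by ascending position.
std::vector<uint64_t> hetPositions(const std::vector<VariantRow>& rows,
                                   uint32_t individual) {
    std::vector<uint64_t> result;
    for (const auto& row : rows) {
        assert(row.numCopies == 1 || row.numCopies == 2);
        if (row.numCopies == 1 &&
            std::find(row.samples.begin(), row.samples.end(), individual) !=
                row.samples.end()) {
            result.push_back(row.position);
        }
    }
    return result;
}

// Longest stretch (in base pairs) within [start, end] that contains no
// heterozygous site. Assumes `hets` is ascending and lies within [start, end].
uint64_t longestHetFreeSpan(const std::vector<uint64_t>& hets,
                            uint64_t start, uint64_t end) {
    uint64_t best = 0;
    uint64_t prev = start;
    for (uint64_t pos : hets) {
        best = std::max(best, pos - prev);
        prev = pos;
    }
    return std::max(best, end - prev);
}
```

An individual that appears in neither the `numCopies=1` nor the `numCopies=2` row of a variant is treated here as homozygous for the reference, which is why only the heterozygous rows break a run in this sketch.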
v2.0
This release increments the IGD file format version from v2 to v3. There aren't really any VCF-related changes.
- Simplification of missing data handling. Previously, it was stored in its own table as sparse lists, and loaded all at once. For any non-trivial amount of missing data this could use a fair amount of RAM. Now it is just another row in the "regular" data rows.
- Each row can be either sparse or not. Sparse rows are lists of sample indexes (like missing data was before). Non-sparse rows are bit vectors (like all the other data was before). This change makes the resulting file significantly smaller than before for large datasets.
- The API for writing IGD data rows was simplified.
- Faster processing of the bitvector representation.
- Example in `igdpp` for computing allele frequency.
- In-memory storage of allele values (the strings) was reworked to be significantly smaller. More than 6x smaller for most alleles.
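
The sparse versus non-sparse row idea can be sketched as follows. This is an illustration only and does not reflect the actual IGD on-disk layout or the picovcf API: the `Row` type, the function names, the 4-byte index width, and the "pick whichever is smaller" rule are all assumptions made for the example. It also shows how an allele frequency falls out of either representation by counting carriers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row: a sparse list of sample indexes or a dense bit vector.
struct Row {
    bool sparse;
    std::vector<uint32_t> indexes;  // used when sparse
    std::vector<uint8_t> bits;      // used when dense, one bit per sample
};

// Encode distinct carrier indexes using whichever form is smaller
// (assumed rule: 4 bytes per sparse index versus 1 bit per sample).
Row encodeRow(const std::vector<uint32_t>& carriers, size_t numSamples) {
    const size_t sparseBytes = carriers.size() * sizeof(uint32_t);
    const size_t denseBytes = (numSamples + 7) / 8;
    Row row;
    row.sparse = sparseBytes < denseBytes;
    if (row.sparse) {
        row.indexes = carriers;
    } else {
        row.bits.assign(denseBytes, 0);
        for (uint32_t idx : carriers) {
            assert(idx < numSamples);
            row.bits[idx / 8] |= static_cast<uint8_t>(1u << (idx % 8));
        }
    }
    return row;
}

// Number of samples carrying the allele, for either representation.
size_t countCarriers(const Row& row) {
    if (row.sparse) {
        return row.indexes.size();
    }
    size_t count = 0;
    for (uint8_t byte : row.bits) {
        // Clear the lowest set bit until none remain.
        for (; byte != 0; byte &= static_cast<uint8_t>(byte - 1)) {
            ++count;
        }
    }
    return count;
}

// Fraction of samples carrying the allele.
double alleleFrequency(const Row& row, size_t numSamples) {
    assert(numSamples > 0);
    return static_cast<double>(countCarriers(row)) /
           static_cast<double>(numSamples);
}
```

With 64 samples, a single carrier is cheaper as a sparse list (4 bytes versus 8), while four carriers are cheaper as a bit vector (16 bytes versus 8), which is the kind of trade-off that lets rare variants and missing data share one row format.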