All notable changes, updates, and fixes to pod5 will be documented here.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- `pod5 convert fast5` now creates logs when `POD5_DEBUG=1` is set
- `pod5 convert fast5` checks multi-read fast5s at conversion time
- Fixed memory usage growth over time as signal was loaded with large pod5 files.
- Fixed crash loading malicious files (found via fuzz testing)
- Fixed leaks and UB when running unit tests.
- Fixed run-away memory consumption during fast5 conversion
- Updated internal arrow version to 8.0.0.3
- Fixed issue where pod5 would read out of bounds memory when decompressing some reads.
- Refactored `pod5 convert fast5` to use `concurrent.futures` only.
- Added further info to the error message when signal cannot be decompressed by zstd
- Made the merge operation not generate multiple identical run infos.
- Fixed closing uninitialised file handles.
- Fixed `pod5 inspect reads` repeating header
- Fixed a crash with certain pod5 search operations.
- Fix loading large pod5 files on virtual-memory limited systems.
- Added `--output` argument to `pod5 convert fast5` and `to_fast5`, replacing the positional argument of the same name
- Added `--strict` argument to `pod5 convert fast5` to promptly stop on exceptions
- Added readthedocs documentation links in README.md
- Updated developer installation instructions to use `conan<2`
- Reworked `pod5 convert fast5` to tolerate runtime exceptions
- Used the same type `run_info_index_t` for `pod5_get_file_run_info_count` and `pod5_get_file_run_info`.
- Fixed file handle leak in repacker
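The `--strict` flag and the exception-tolerant rework of `pod5 convert fast5` describe two failure policies for a batch conversion built on `concurrent.futures`. The sketch below illustrates the difference under stated assumptions. `convert_all` and `fake_convert` are hypothetical names, not part of pod5. A `ThreadPoolExecutor` is used for brevity, and the actual tool may use processes instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def convert_all(
    paths: List[str],
    convert_one: Callable[[str], str],
    strict: bool = False,
    workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Hypothetical batch driver; not pod5's implementation.

    strict=True  -> stop on the first exception and re-raise it.
    strict=False -> record each failure and keep converting other files.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                if strict:
                    # Cancel work that has not started yet, then propagate.
                    for pending in futures:
                        pending.cancel()
                    raise
                # Tolerant mode: remember the failure and continue.
                errors[path] = exc
    return results, errors


def fake_convert(path: str) -> str:
    """Hypothetical converter used only to exercise the sketch."""
    if path.startswith("bad"):
        raise ValueError(f"cannot convert {path}")
    return path.replace(".fast5", ".pod5")
```

In tolerant mode, one corrupt input only appears in the returned error map. In strict mode, the same input aborts the whole run.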
- Python API supports python 3.11
- Added missing python API wheels on windows
- Changed python API dependency version `pyarrow~=11.0.0` from `8.0.0` to support python 3.11
- Changed python API dependency version `hdf5~=8.0.0` from `v7.0.0` to support python 3.11
- Added `pod5_get_read_count` to find the count of all reads in a file
- Added `pod5_get_read_ids` to retrieve all read ids in a file
- Added `pod5_get_file_run_info` to retrieve a run info at an absolute index in the file
- Added `pod5_free_run_info` to free run infos (replaces `pod5_release_run_info`)
- Added `pod5_get_file_run_info_count` to find the number of run infos in a file
- Added `pod5 filter` tool to subset pod5 files with a simple list of read ids
- Added `tqdm` progress bar to `pod5 subset` (disable with `POD5_PBAR=0`)
- Reworked `pod5 subset` to give better control over resources used
- `pod5 subset` can now parse csv and tsv tables / summaries
- `pod5 repack` now repacks all inputs one-to-one
- Deprecated `pod5_release_run_info` (see `pod5_free_run_info`)
- Removed filepath header line from `pod5 inspect reads`
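`pod5 filter` takes a plain list of read ids, while `pod5 subset` can parse csv and tsv tables. The minimal sketch below shows the two input styles. The function names and the `read_id` / `target` column names are assumptions made for illustration. They are not pod5's actual parsers or defaults.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def parse_read_id_list(text: str) -> Set[str]:
    """One read id per line, blank lines ignored (filter-style input)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def parse_subset_table(
    text: str, id_column: str = "read_id", target_column: str = "target"
) -> Dict[str, List[str]]:
    """Group read ids by output target from a csv or tsv table.

    Column names are hypothetical defaults for this sketch. The delimiter
    is chosen by checking whether the header line contains a tab.
    """
    header = text.splitlines()[0] if text else ""
    delimiter = "\t" if "\t" in header else ","
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        groups[row[target_column]].append(row[id_column])
    return dict(groups)


def filter_reads(records: Iterable[dict], wanted: Set[str]) -> List[dict]:
    """Keep only records whose 'read_id' is in the wanted set."""
    return [record for record in records if record["read_id"] in wanted]
```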
- Added version attributes to `lib-pod5`
- Versioning now controlled by VCS inspection using `setuptools_scm`
- Added more `read_id` getter methods to `Reader`
- Added support for python 3.8 + 3.10 on windows
- Added gcc7 linux build of pod5
- Update to zlib 1.2.13
- Update to zstd 1.5.4
- Pinned `pre-commit=v2.21.0` while supporting `python3.7`
- Reworked `pod5 convert to_fast5` output filenames to allow for `1-1` mapping
- Fixed `pod5 inspect read`
- Fixed `pod5 convert to_fast5` creating an empty fast5 output
- Fixed `pod5 convert to_fast5` ignoring the `--force_overwrite` argument
- Fixed issue where `thread_pool.h` wasn't shipped.
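A one-to-one output mapping keeps each input's relative directory structure under the output root. As a result, two inputs that share a file name in different subdirectories do not overwrite each other. The sketch below uses a hypothetical helper, `one_to_one_outputs`, which is not a pod5 function. It illustrates the idea and refuses to map two inputs to the same output.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Set


def one_to_one_outputs(
    inputs: Iterable[str], input_root: str, output_root: str, suffix: str
) -> Dict[str, str]:
    """Hypothetical mapping of each input to its own output path."""
    root = PurePosixPath(input_root)
    out_root = PurePosixPath(output_root)
    mapping: Dict[str, str] = {}
    seen: Set[str] = set()
    for name in inputs:
        # Keep the path relative to the input root, then swap the suffix.
        relative = PurePosixPath(name).relative_to(root)
        target = str((out_root / relative).with_suffix(suffix))
        if target in seen:
            raise ValueError(f"two inputs map to the same output: {target}")
        seen.add(target)
        mapping[name] = target
    return mapping
```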
- Explicitly re-exported `lib-pod5` public symbols and added `py.typed` marker file to support type-checking.
- Fixed issue where closing many pod5 files in sequence is slow.
- Fixed incorrect python types and adopted python type-checking.
- Linux python 3.11 wheels
- ReadTheDocs documentation support
- OSX arm64 wheel naming corrections - works with wider set of python executables
- Added `Reader.__iter__` method.
- Renamed `EndReason.name` to `EndReason.reason` to access the inner enum and added `EndReason.name` as a property to return the string representation of this enum value.
- `BaseRead`, `Read`, `CompressedRead`, `Calibration` and `Pore` dataclasses are now mutable.
- Removed deprecated `Writer` functions.
- Fixed osx arm64 wheel compatibility for older python versions.
- Fixed EndReason type errors.
- Fixed EndReason in pod5 to fast5 conversion.
- Optimised the file writing utilities
- Restricted exported boost dependencies of conan package to just the boost::headers component.
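After the `EndReason` rename, `reason` holds the enum member and `name` is a derived string. The model below is simplified and rests on some assumptions. The enum members and the lower-case string form are illustrative, and the real class may carry additional fields. Because the dataclass is mutable, reassigning `reason` also changes what `name` returns.

```python
from dataclasses import dataclass
from enum import Enum


class EndReasonEnum(Enum):
    # Example members for this sketch only; the real enum may differ.
    UNKNOWN = 0
    SIGNAL_POSITIVE = 1


@dataclass
class EndReason:
    # Simplified model, not pod5's class: `reason` is the inner enum.
    reason: EndReasonEnum

    @property
    def name(self) -> str:
        # String form derived from `reason` (lower-case is this sketch's choice).
        return self.reason.name.lower()
```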
- Documentation edits
- `Writer.add_reads` now handles both `Read` and `CompressedRead`.
- Deprecated `Writer` methods `add_read_object` and `add_read_objects` in favour of `add_read` and `add_reads` respectively.
- Removed direct pod5 tool scripts.
- Fixed name of internal utils - "pad_file".
- Fixed spelling of various internal variables.
- Fixed `pod5 convert to_fast5`
- Reformatted C++ code with a more consistent format file.
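The `Writer` changes amount to two things: type-based dispatch in `add_reads`, and deprecated aliases that forward to the new names. The toy classes below (`ToyWriter`, `ToyRead`, `ToyCompressedRead`) are hypothetical stand-ins. They are not pod5's `Writer`, `Read` or `CompressedRead`, and they record rows instead of writing a file.

```python
import warnings
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class ToyRead:
    read_id: str
    signal: List[int]


@dataclass
class ToyCompressedRead:
    read_id: str
    signal_chunks: List[bytes]


AnyRead = Union[ToyRead, ToyCompressedRead]


class ToyWriter:
    """Hypothetical writer that records (kind, read_id) rows."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str]] = []

    def add_read(self, read: AnyRead) -> None:
        self.add_reads([read])

    def add_reads(self, reads: Sequence[AnyRead]) -> None:
        # Dispatch on type so callers can mix raw and compressed reads.
        for read in reads:
            if isinstance(read, ToyCompressedRead):
                self.rows.append(("compressed", read.read_id))
            elif isinstance(read, ToyRead):
                self.rows.append(("raw", read.read_id))
            else:
                raise TypeError(f"unsupported read type: {type(read).__name__}")

    # Deprecated aliases: warn, then forward to the new methods.
    def add_read_object(self, read: AnyRead) -> None:
        warnings.warn("add_read_object is deprecated; use add_read",
                      DeprecationWarning, stacklevel=2)
        self.add_read(read)

    def add_read_objects(self, reads: Sequence[AnyRead]) -> None:
        warnings.warn("add_read_objects is deprecated; use add_reads",
                      DeprecationWarning, stacklevel=2)
        self.add_reads(reads)
```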
- Added `pod5` tools entry-point
- Added API to query file version information as written on disk.
- Fixed signal_chunk_size type error in convert-from-fast5
- Replaced `ont_fast5_api` dependency with `vbz_h5py_plugin`
- Restructured Python packaging to include `lib_pod5_format`, which contains the native bindings built from pybind11. `pod5_format` and `pod5_format_tools` are now pure python packages which depend on `lib_pod5_format`.
- Python packages `pod5_format` and `pod5_format_tools` have been merged into a single `pod5` pure-python package.
- `pod5-convert-from-fast5 --output-one-to-one` reworked so that output files maintain the input structure, making this argument more flexible and avoiding filename clobbering.
- Added missing `lib_pod5.update_file` function to pyi.
- `pod5-convert-from-fast5 output` now takes existing directories and writes `output.pod5` (current behaviour) or creates a new file with the given name if it doesn't exist.
- Renamed arguments in tools relating to multi-processing / multi-threading from `-p/--processes` to the more common `-t/--threads`.
- Fixed pod5-inspect erroring when loading data.
- Fixed issue where some files written by versions 0.34 to 0.38 wouldn't load correctly.
- Fixed migrating of large files from older versions.
- Fixed building against the c++ api - previously missing include files.
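The new `output` rule for `pod5-convert-from-fast5` can be expressed as a small path-resolution function. `resolve_output` is a hypothetical name. The sketch only covers the choice between an existing directory and a new file, and it does not handle overwriting.

```python
from pathlib import Path


def resolve_output(output: Path, default_name: str = "output.pod5") -> Path:
    """Hypothetical sketch of the described `output` rule."""
    if output.is_dir():
        # Existing directory: write the default file name inside it.
        return output / default_name
    # Otherwise treat the argument as the path of the file to create.
    return output
```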
- All data in the read table that was previously contained in dictionaries of structs is now stored in the read table, or a new "run info" table.
This change simplifies data access into the pod5 files, and helps users who want to convert the pod5 data to pandas or other arrow-compatible reader formats.
Old data is migrated on load and will continue to work; data can be permanently migrated using the `pod5-migrate` tool.
- Removed support for opening and writing "split" pod5 files. All APIs now expect and return combined pod5 files.
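Moving run-level data out of per-read dictionaries and into a separate run info table is a normalisation step. Each distinct run info is stored once, and reads refer to it. The sketch below makes several assumptions. Reads are plain dicts with an embedded `run_info` dict. `acquisition_id` is used as an example field. Reads are linked to run infos by a hypothetical integer `run_info_index`. The actual on-disk schema may differ.

```python
from typing import Dict, List, Tuple


def split_run_info(reads: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Move embedded run info into its own table, referenced by index.

    Assumes run info values are hashable so identical run infos can be
    detected and stored only once.
    """
    run_infos: List[dict] = []
    index_of: Dict[tuple, int] = {}
    read_rows: List[dict] = []
    for read in reads:
        info = read["run_info"]
        key = tuple(sorted(info.items()))
        if key not in index_of:
            index_of[key] = len(run_infos)
            run_infos.append(dict(info))
        # Flat read row: every field except the nested run info, plus an index.
        row = {k: v for k, v in read.items() if k != "run_info"}
        row["run_info_index"] = index_of[key]
        read_rows.append(row)
    return read_rows, run_infos


# Example input with hypothetical field names.
EXAMPLE_READS = [
    {"read_id": "r1", "run_info": {"acquisition_id": "a"}},
    {"read_id": "r2", "run_info": {"acquisition_id": "a"}},
    {"read_id": "r3", "run_info": {"acquisition_id": "b"}},
]
```

Flat rows like these load directly into pandas or other arrow-compatible readers. That is the motivation the entry above gives for the change.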
- Updated Conan recipe to support building without specifying C++ standard version.
- Bump the Boost and Arrow versions to pick up latest changes.
- Support C++17 + C++20 with the conan package pod5 generates.
- Modified `pod5_format_tools/pod5_convert_to_fast5.py` to separate `pod5_convert_to_fast5_argparser()` and `convert_from_fast5()` out from `pod5_convert_from_fast5.main()`.
- Added `num_samples` field to the read table, containing the total number of samples a read contains. The field is filled in by the API if it doesn't exist.
- File version is now V2, due to the addition of `num_samples`.
- Fixed an issue where multi-threaded access to a single batch could cause a crash discovered by dorado testing.
- Fixed help text in convert to fast5 script.
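For records that lack `num_samples`, the value has to be derived when the file is loaded. The minimal sketch below assumes a hypothetical `chunk_sample_counts` field that lists how many samples each signal chunk holds. pod5's real read schema is not reproduced here.

```python
from typing import Any, Dict


def ensure_num_samples(read: Dict[str, Any]) -> Dict[str, Any]:
    """Fill `num_samples` when a record lacks it (hypothetical schema)."""
    if read.get("num_samples") is None:
        # Total samples in a read = sum of the samples held by each chunk.
        return {**read, "num_samples": sum(read["chunk_sample_counts"])}
    return read
```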