DALI v1.15.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added the GPU audio resampling operator (#3884, #3914 and #3911).
- Improved the performance of the GPU fn.readers.numpy by custom GDS staging (#3894, #3905).
- Added support for video processing and per-frame (temporal) arguments to the warp_affine operator (#3879, #3900).
- Added HEVC support to the GPU frames decoder (#3896).
- Added experimental support for the eager execution of stateless operators as Python functions and readers as iterators (#3887, #3930).
- Added CUDA 11.7 support (#3906).
- Profiling improvements:
Fixed Issues
The following issues were fixed in this release:
- Added the missing device/device synchronization when copying pipeline outputs with copy_to_external (#3953).
- Fixed the buffer synchronization between default and custom stream in a multi-GPU case (#3957).
Improvements
- Fix Python formatting (#3961)
- Fix coverity issues (#3974)
- Add FindReduceGPU and FindRegionGPU kernels (#3951)
- Fix Python formatting (#3965)
- Add .style.yapf file (#3970)
- Update Optical Flow example (#3971)
- Fix per frame pass through (#3959)
- Fixing Python code formatting (#3948)
- Suppress the use of a staging buffer for nvJPEG input if it's already pinned (#3956)
- Fix cyclic dependency import problem in fn.py in python 3.6 (#3955)
- Refactor qa test scripts (#3933)
- Change thread pool creation for eager operators to lazy (#3931)
- Fix sequence shape test (#3949)
- Expose readers as iterators in eager mode (#3930)
- Add Python linter (#3929)
- Remove redundant quote marks from the protobuf version specifier (#3945)
- Skip GDS tests when the GPU is incompatible. (#3941)
- Add sequence processing to warp operator (#3879)
- Add MovingMeanSquareGpu kernel (#3922)
- Pin protobuf to <4 for PaddlePaddle (#3940)
- Update compilation flags for the DALI TensorFlow plugin (#3943)
- Change MultiDevice to MultiGpu test suffix (#3942)
- Bump up the nvidia-tensorflow version to 20.05 in tests (#3938)
- Add FindFirstLastGPU kernel (#3932)
- Adjust PR template to ask for listing existing tests that apply (#3939)
- Pin protobuf to <4 (#3934)
- Add VFR detection (#3921)
- Fix CVE-2022-0562 in libtiff (#3925)
- Update RNN-T pipeline tests to include GPU resampling and silence detection (#3920)
- Add more NVTX ranges to the executor (#3928)
- Add HEVC support for FramesDecoderGpu (#3896)
- Add a thread name to all DALI threads (#3912)
- Add dataclasses pip package to tests deps to fix Python3.6 operator tests (#3926)
- Add fn.experimental.audio_resample GPU (#3911)
- Custom staging for GDS (#3894)
- Update the readme roadmap link to use 2022 one (#3918)
- Support specifying per-frame positional arguments in sequence processing test utility (#3901)
- Move audio resampler CPU implementation to a single compilation unit (#3914)
- Add stateless CPU eager operators (#3887)
- Add CUDA 11.7 support (#3906)
- Add VideoReaderDecoder test for missing labels (#3908)
- Add signal resampling GPU kernel (#3884)
- Optimize parameter passing for ScatterGather GPU (#3905)
- Add references to ops documentation in the tutorials (#3904)
- Enable per-frame operator on GPU (#3900)
Bug Fixes
- Fix dltensor operator tests (#3984)
- Prevent clobbering of outputs before non-blocking copy_to_external finishes. (#3953)
- Fix a bug in AccessOrder when synchronizing with a default stream on the same device, which is not the current device. (#3957)
- Work around GDS memory leak in GDSMem tests. (#3936)
- Fix circular imports in eager mode (#3919)
- Remove intermediate Tensor and use DynamicScratchpad for op tile descriptors. (#3915)
- Add missing moving of order in TensorVector's move assignment/constructor (#3899)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur less often than every 10 to 15 frames, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP.
It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker
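The key-frame requirement in the first known issue can be checked before a video is handed to the loader. The sketch below is a minimal illustration and not part of DALI: the function names are hypothetical, the key-frame indices are assumed to have been extracted beforehand with a separate tool, and the default threshold of 10 frames is a conservative reading of the "10 to 15 frames" guidance.

```python
from typing import Sequence


def max_keyframe_gap(keyframe_indices: Sequence[int], total_frames: int) -> int:
    """Return the largest distance, in frames, between consecutive key frames.

    Frames before the first key frame and the run from the last key frame
    to the end of the stream are counted as gaps too.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not keyframe_indices:
        # No key frames at all: the whole stream is one gap.
        return total_frames
    ordered = sorted(set(keyframe_indices))
    boundaries = ordered + [total_frames]
    gaps = [nxt - cur for cur, nxt in zip(boundaries, boundaries[1:])]
    return max([ordered[0]] + gaps)


def keyframes_dense_enough(
    keyframe_indices: Sequence[int],
    total_frames: int,
    max_allowed_gap: int = 10,  # assumption: conservative end of "10 to 15"
) -> bool:
    """True if key frames occur at least every max_allowed_gap frames."""
    return max_keyframe_gap(keyframe_indices, total_frames) <= max_allowed_gap


if __name__ == "__main__":
    print(keyframes_dense_enough([0, 10, 20], total_frames=30))  # True
    print(keyframes_dense_enough([0, 30], total_frames=60))      # False
```

A stream that fails this check may need to be re-encoded with a shorter key-frame interval before it is used with the video loader.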
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.15.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.15.0
or for CUDA 11:
The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.15.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.15.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.15.0-5080387-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.15.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.15.0-5080390-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.15.0-5080390-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.15.0.tar.gz
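After installing by either route, it can be useful to confirm that the expected build is the one actually present in the environment. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, the package name is taken from the pip commands above, and importlib.metadata is assumed to be available (Python 3.8 or later).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_expected_dali(
    package: str = "nvidia-dali-cuda110",  # or "nvidia-dali-cuda102"
    expected: str = "1.15.0",
) -> bool:
    """True only if the named distribution is installed at the expected version."""
    return installed_version(package) == expected


if __name__ == "__main__":
    for name in ("nvidia-dali-cuda102", "nvidia-dali-cuda110"):
        print(name, installed_version(name))
```

The check reads package metadata only, so it does not import DALI or require a GPU.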
FFmpeg source code:
Libsndfile source code: