Releases: NVIDIA/DALI
DALI v1.19.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added the experimental.decoders.video stand-alone video decoder to decode video on GPU and CPU provided as an in-memory buffer (for example, through an external source) (#4354, #4296).
- Added support to decode indexless videos (#4347, #4302, and #4335).
Fixed Issues
The following issues were fixed in this release:
- Fixed the handling of Caffe LMDB empty samples (without data or labels) (#4266).
Improvements
- Exclude HEVC files from video decoder test. (#4357)
- Fix a typo in Debug Mode documentation (#4355)
- Parallelize gpu video decoding (#4354)
- Make tests for DALI linked dynamically with CUDA more flexible (#4341)
- Update MXNet version used in tests (#4342)
- Enable indexless video decoding for GPU (#4347)
- Prevent obtaining handle values from dead unique handles and stream leases. (#4346)
- Update broadcasting shape simplification logic (#4314)
- Add warning about the end of support for CUDA 10.2 (#4334)
- Frames decoder gpu without index (#4302)
- Enable indexless decoding in CPU video decoder (#4335)
- Update outdated links in the documentation (#4329)
- Add Mixed VideoDecoder (#4296)
- Update cutlass and DALI_deps revision. (#4328)
- Fixes and performance improvements in imgcodec/nvjpeg (#4318)
- Update Jetson build env to support CUDA 11.4 and Orin (#4250)
- Update nvJPEG2k version to 0.6.0 (#4320)
- Add missing documentation to (Future)DecodingResult(Promise). (#4310)
- Update libcudacxx target macros for clang and SM90. (#4315)
- Don't use nvjpegGetHardwareDecoderInfo in pre-11.8 toolkits. (#4325)
- Prune static cuda libraries DALI links with from unused archs (#4317)
- Fix clang warnings (#4312)
- Add pass-through tracking to auto-pinning buffers (#4294)
- Update protobuf (v21.5 to v21.7) (#4313)
- Extended ImageDecoder tests (#4297)
- Refactor OpSchema - move implementation to one translation unit (#4293)
- Emit the warning about the default value change only when using the default. (#4214)
- Reduce the batch size in RN50 data pipeline tests. (#4304)
- Enable ROI adjustment for multi-frame inputs + cleanup. (#4303)
- Use GPU Convert in nvJPEG decoder (#4247)
- Aggregating ImageDecoder (#4224)
- Support palette TIFFs (#4206)
- Refactor video decoder for reusability (#4290)
- Add ROI support to nvJPEG (#4244)
- RemapKernel API (#4284)
- Presteps to image_decoder.* APIs (#4277)
- Add frames decoder CPU without index (#4278)
- Add experimental.decoders.video for CPU (#4270)
- Fix a typo in the documentation (#4258)
- Add orientation to GPU image data Convert (#4232)
- Fix hang in TL1_tensorflow-dali_test (#4255)
- Make test_dltensor_operator.py consistent when the HW decoder is available (#4272)
- Fix issues in DALI in action snippet (#4268)
- Assure operator documentation links to enum types (#4264)
- Support applying orientation in Convert (#4219)
- Add image decoder registry. (#4261)
- Support tiled TIFFs (#4201)
- Bump up TensorFlow version in tests (#4238)
Bug Fixes
- Fix coverity issues (#4349)
- Revert pruning of unused architectures (#4336)
- Fix order of access order waiting in TL's set_order (#4338)
- Fix NVJPEG pinned buffer synchronization. (#4337)
- Change the default order of data storage objects (#4276)
- Fix checking of the return status of the bundle lib tests (#4330)
- Fix executor test - add test operators (#4323)
- Fix parameter propagation in ImageDecoder. (#4309)
- Fix normalization when running GPU color space conversion (#4285)
- Fix support for ANY_DATA in nvJPEG2K (#4299)
- Fix inconsistent tensor recreation in TensorList (#4286)
- Fix no ffmpeg build (#4288)
- Fix libtiff error handling (#4274)
- Fix imgcodec batched APIs and tests (#4263)
- Fix handling of Caffe LMDB without valid data (#4266)
- Move params in PerThreadResources move constructor (#4265)
- Fix fusing the dimensions in SliceFlipNormalizePermutePadGpu (#4234)
- Improve error handling in LibTiffDecoder (#4210)
- Fix exception handling in BatchParallelDecoderImpl (#4262)
- Make nvjpeg decoder use its own thread pool (#4241)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
DALI will drop support for CUDA 10.2 in an upcoming release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream. If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP. It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. As a workaround, you can manually synchronize the device before returning the data from the callback.
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker.
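The external source workaround listed above (synchronize the device before the callback returns) can be sketched as a small wrapper. This is only an illustration under stated assumptions: `with_device_sync` is a hypothetical helper, not part of DALI, and the synchronization function is supplied by the caller from whatever array library produces the GPU data.

```python
from typing import Any, Callable


def with_device_sync(callback: Callable[..., Any],
                     synchronize: Callable[[], None]) -> Callable[..., Any]:
    """Wrap an external-source callback so that `synchronize` runs
    after the data is produced and before it is returned."""

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        data = callback(*args, **kwargs)
        # Block until pending device work that produced `data` has finished.
        synchronize()
        return data

    return wrapped


# Demonstration with stand-ins. In a real pipeline the second argument
# would be a device-level synchronization function from the library in
# use (assumption: for example torch.cuda.synchronize or
# cupy.cuda.runtime.deviceSynchronize).
events = []


def fake_gpu_callback(sample_info):
    events.append("produce")
    return [sample_info]


def fake_synchronize():
    events.append("sync")


source = with_device_sync(fake_gpu_callback, fake_synchronize)
batch = source(0)
```

The wrapped callable keeps the signature of the original callback, so it can be passed wherever the unwrapped callback was used.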
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.19.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.19.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.19.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.19.0
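For scripted installs, the commands above follow a regular pattern that can be assembled programmatically. The sketch below is a convenience under assumptions: `dali_pip_command` is a hypothetical helper, and it only covers the two CUDA variants named in these notes.

```python
from typing import List

# Extra index URL taken from the install commands in these release notes.
DALI_INDEX_URL = "https://developer.download.nvidia.com/compute/redist/"

# CUDA variants that appear in the package names of these releases.
KNOWN_VARIANTS = ("cuda102", "cuda110")


def dali_pip_command(version: str, cuda_variant: str,
                     tf_plugin: bool = False) -> List[str]:
    """Build the pip argument list for a DALI wheel or its TF plugin."""
    if cuda_variant not in KNOWN_VARIANTS:
        raise ValueError(f"unknown CUDA variant: {cuda_variant!r}")
    package = "nvidia-dali-tf-plugin" if tf_plugin else "nvidia-dali"
    requirement = f"{package}-{cuda_variant}=={version}"
    return ["pip", "install", "--extra-index-url", DALI_INDEX_URL, requirement]


command = dali_pip_command("1.19.0", "cuda110")
plugin_command = dali_pip_command("1.19.0", "cuda110", tf_plugin=True)
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.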
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.19.0-6205437-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.19.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.19.0-6205436-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.19.0-6205436-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.19.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.18.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Unified batch representation in the GPU and CPU stages of the pipeline (effort towards conditional execution) (#4253, #4236, #4220, #4189).
- Added support to specify the fill_value argument for each sample in the fn.erase operator (#4182).
- Added support for the memory video file in FramesDecoder (#4184).
- Moved the audio_resample operator out of the experimental module (#4194).
Fixed Issues
The following issues were fixed in this release:
- Fixed an unnecessary synchronization in MakeContiguous (#4248).
- Fixed the Python tool to create the webdataset index (#4226).
- Added a fix to prevent DALI from allocating GPU memory when constructing CPU TensorList (#4203).
- Fixed a PyTorch example to comply with newer PyTorch versions (#4213).
Improvements
- GPU image data conversion (#4208)
- Fix libtiff and libtar vulnerabilities (#4245)
- Update third party dependencies (#4233)
- Reduce batch size in the WebDataset integration using External Source example (#4240)
- Rename the set and copy sample APIs in TensorList (#4236)
- Move nvjpeg decoder files to imgcodec/decoders/nvjpeg/ (#4235)
- Add Nvjpeg decoder (#4178)
- Rename TensorVector to TensorList (#4220)
- Make JPEG HW decoder test to fully use HW and not hybrid approach (#4222)
- Add bulk parameter passing to decoders and factories. (#4212)
- Support any bitdepth in TIFF (#4180)
- Remove TensorList and use only TensorVector (#4189)
- [imgcodec] API adjustments (#4205)
- ROI support for nvjpeg2k decoder (#4175)
- Use deprecated PIL resampling import for Python 3.6, due to lack of availability of a newer version of PIL (#4200)
- Add arithmetic expression broadcasting utils (#4188)
- Support higher TIFF bitdepths (#4174)
- Enable per-sample fill_value argument in Erase operator (#4182)
- Fix python linter errors for the qa/ directory (#4117)
- Fix usage of deprecated np.float in tests (#4192)
- Adjust PIL interpolation types to module PIL.Image.Resampling (#4195)
- Move audio_resample out of experimental module (#4194)
- Support different layouts in imgcodec's Convert (#4157)
- Fix typos in iterator last_batch_policy argument documentation (#4170)
- Fix synchronization in external source tests (#4153)
- Add support for memory video file in FramesDecoder (#4184)
- Support outputting YCbCr in libjpeg-turbo decoder (#4156)
- Use std::exchange in move operator for Tensors (#4183)
Bug Fixes
- Unify buffers caching in CPU/GPU external source (#4253)
- Fix builds without nvJPEG (#4252)
- Separate nvjpeg lib wrapper and stub from the decoder (#4249)
- Prevent unnecessary synchronization in MakeContiguous. (#4248)
- Do not leak DecodeParams (#4242)
- Fix AssertClose bug in Imgcodec tests (#4243)
- Fix bug in CPU Convert (#4237)
- Fix webdataset python index creation script (#4226)
- Fix In memory video decoding tests (#4216)
- Fix UnpackBits (#4227)
- Fix issues detected by Coverity. (#4221)
- Make TensorList constructor for CPU not using GPU memory (#4203)
- Fix the indexing for newer PyTorch (#4213)
- Fix possibly incorrect parallel write access to vector (#4211)
- Fix Layout propagation in TensorVector (#4202)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream. If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP. It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. As a workaround, you can manually synchronize the device before returning the data from the callback.
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker.
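The key-frame requirement in the first known issue can be checked ahead of time once the key-frame positions of a stream are known. The sketch below assumes those positions were already extracted with an external tool. Both function names are hypothetical, and the default limit of 10 is simply the stricter end of the 10 to 15 range quoted above.

```python
from typing import Iterable, List, Tuple


def keyframe_gaps(keyframe_indices: Iterable[int]) -> List[int]:
    """Return the distance, in frames, between consecutive key frames."""
    ordered = sorted(keyframe_indices)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def check_keyframe_spacing(keyframe_indices: Iterable[int],
                           max_gap: int = 10) -> Tuple[bool, int]:
    """Return (ok, largest_gap), where ok means no gap exceeds max_gap.

    Only gaps between key frames are inspected; the tail of the stream
    after the last key frame is not considered.
    """
    gaps = keyframe_gaps(keyframe_indices)
    largest = max(gaps, default=0)
    return largest <= max_gap, largest


dense = check_keyframe_spacing([0, 10, 20, 30])
sparse = check_keyframe_spacing([0, 12, 40])
```

A stream that fails the check is a candidate for re-encoding with a shorter key-frame interval before it is fed to the video loader.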
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.18.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.18.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.18.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.18.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.18.0-5920075-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.18.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.18.0-5920076-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.18.0-5920076-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.18.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.17.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added CUDA 11.8 support.
- Improved color conversion performance and precision (#4139).
- Laid the groundwork for ongoing conditional execution effort (#4149, #4124, #4083, #3827, #4049).
- Laid the groundwork for ongoing effort on improved decoding and processing of images.
- Documentation improvements (#4168, #4102, #4059, #4094).
Fixed Issues
The following issues were fixed in this release:
- Fixed default dtype in color twist family of operators (#4067)
- Fixed handling of TIFFs with palette (#4089)
Improvements
- Separating nvjpeg2k utils in imgcodec (#4160)
- Add NvJpeg2000Decoder (#4114)
- Port operators Python tests to nose2 (#4037)
- Refactor Tensor Vector (#4149)
- Rename ImageDecoder to ImageDecoderFactory. (#4169)
- Add section on deferred setup and shm limit to PES docs (#4168)
- Change pinned version of matplotlib (#4167)
- Add LibTIFF decoder (#4109)
- Make decoder_test_helper.h accept TensorView (#4154)
- Update dependencies (#4152)
- Add color conversion support (#4143)
- Extend the ImageDecoder testing framework to support GPU decoders (#4142)
- Add color space conversion to imgcodec (#4121)
- Fix CVE-2022-34526 (#4133)
- Copy nvjpeg utils into imgcodec (#4148)
- Fix linter for files inside the dali_tf_plugin directory (#4118)
- Add LibJpegTurboDecoder (#4099)
- Color conversion - optimizations and tests (#4139)
- Move to CUDA 11.7U1 (#4137)
- Remove pageable copies from Convolution, Transpose and Warp kernels. (#4141)
- Add AsTensor and related APIs to Tensor Vector (#4124)
- [imgcodec] Add thread index and cuda stream to Decode APIs (#4128)
- Move operator test files (#4125)
- Silence some constexpr-related warnings in NVCC 10. (#4131)
- Move libjpeg-turbo utils/impl to imgcodec directory (#4129)
- Add missing constexpr to vec and mat. (#4130)
- Parse EXIF metadata in PNG imgcodec parser (#4122)
- Add parentheses to assert to avoid using \ (#4123)
- Fix error reported by flake8 5.0.1 (#4120)
- Turn Python linter on by default (#3997)
- Add decoder test framework (#4103)
- Add dali namespace to third_party copy of OpenCV's exif (#4112)
- Parsing EXIF metadata in WebP images (#4087)
- Add PNG parser (#4052)
- Fix OpenCV warning in jpeg compression distortion tests (#4107)
- Document unsupported external source arguments in TF Dataset (#4102)
- Add boilerplate synchronization for batch copying (#4083)
- Pin Numba version to 0.55.2 (#4108)
- Example image decoder using OpenCV (#4036)
- Remove signal handler for SIGKILL (#4015)
- Extract common functions from numpy reader (#4100)
- Add JPEG EXIF parser (#4073)
- Remove video reader warning that a frame has been seen twice (#4092)
- Remove unnecessary logging from resize checkerboard tests (#4086)
- Add Jpeg2000 parser (#4068)
- Fix flake8 warnings (#4074)
- Fix & extend formatting of collections. (#4082)
- Add inherited members to the Pytorch plugin docs (#4094)
- Adjust Doxygen configuration (#4088)
- Add imgcodec compatibility tests (#4057)
- Add restrictions to set_type (#4071)
- Add WebP parser (#4053)
- Add JPEG Parser (#4050)
- Silence buggy GCC warning about freeing non-heap objects. (#4077)
- Add a tool for testing Imgcodec against ImageMagick (#4058)
- BMP parser (#4062)
- Make endian swapping work with ADL. (#4075)
- Add utilities for swapping endianness. (#4069)
- Add PNM parser (#4044)
- Add references to image_processing/index. Add optional ordering to references. (#4059)
- Extract EXIF parser from OpenCV (#4063)
- Fix ifndef guards to be at the end of the file (#4064)
- Stop exposing internal contiguous TV storage (#3827)
- ReadValue extension to support enums (#4060)
- Propagate device_id in ShareData and SetSample APIs (#4049)
- Add TIFF parser (#4040)
- Make the DALI video reader throw an exception when the VFR video is decoded (#4022)
- Add ReadHeader util to parser baseclass (#4042)
Bug Fixes
- Prevent excessive synchronization in MakeContiguous (#4228)
- Prevent overflow in random_resized_crop tests (#4187)
- Fix invalid destruction order in decoder test helper (#4186)
- Added missing const in for loops (#4185)
- Fix coverity issues (#4164)
- Conditional compilation of TIFF Codec (#4166)
- Fix zlib CVE-2022-37434 (#4150)
- Pin matplotlib version to 3.5.2 (#4159)
- Fix parsing of grayscale bitmaps (#4147)
- Install flake8 for xavier builds (#4127)
- Fix handling of TIFFs with palette (#4089)
- Fix missing override in decoder test (#4105)
- Disable HEVC tests for FramesDecoderGpu when it is not supported by the GPU (#4084)
- Fix default dtype in color twist family of operators (#4067)
- Fix libtiff CVE-2022-2058, CVE-2022-2057, CVE-2022-2056 (#4047)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream. If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP. It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. As a workaround, you can manually synchronize the device before returning the data from the callback.
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker.
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.17.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.17.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.17.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.17.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.17.0-5838887-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.17.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.17.0-5838886-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.17.0-5838886-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.17.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.16.1
Key Features and Enhancements
This is a bug-fix release; there are no new features or enhancements.
Fixed Issues
The following issues were fixed in this release:
- Fixed a memory leak in fn.decoders.image on corrupted images (#4138).
- Fixed a memory leak in the libjpeg-turbo decoder implementation in case of corrupted images.
- Fixed a crash in fn.readers.numpy when pad_last_batch is set and more than one thread is used by DALI (#4056).
- Fixed a faulty check that prevented the feed_input method from working after the pipeline was deserialized (#4096).
Improvements
- None
Bug Fixes
- Fix pad_last_batch in GPU NumpyReader (#4056)
- Fix feed_input after deserialization (#4096)
- Fix memory leak in libjpeg-turbo decoder implementation in case of corrupted images (#4138)
- Add zlib to conda recipe (#4173)
- Fix Numba versions in tests (#4111)
- Fix device pick in Numpy reader tests (#4104)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream. If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP. It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- In experimental debug and eager modes, GPU external source is not properly synchronized with DALI internal streams. As a workaround, the user may manually synchronize the device before returning the data from the callback.
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker.
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.16.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.16.1
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.16.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.16.1
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.16.1-5688170-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.16.1.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.1-5688171-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.1-5688171-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.16.1.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.16.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added GPU non-silent region detection operator (#3944, #4001).
- Added experimental support for the eager execution of stateful operators and arithmetic operators (#4016, #3952, #3969, #3990).
- Added antialias flag to Resize operator for improved control over resampling mode used (#4032).
- Added experimental support for custom GPU Numba operators (#3891, #3998, #4006, #4013).
- Added support for processing video and handling of temporal arguments to color-manipulation operators and affine transform operators (#3937, #3946, #3917).
Fixed Issues
The following issues were fixed in this release:
- Fixed DALI + PyTorch Lightning iterator issue resulting in subsequent epochs terminating too early (#3923, #4048).
- Fixed scalars handling by the readers.tfrecord operator (#4024).
- Fixed variable batch size handling by the crop and coord_transform operators (#4045, #3958).
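The iterator fix listed above (#3923, #4048) makes the DALI framework iterator call reset() when iter() is called on it. The toy class below illustrates that pattern in plain Python. It is a hypothetical stand-in, not DALI's implementation, and it shows one plausible way an interrupted epoch could otherwise shorten the next one.

```python
class ResettingIterator:
    """Toy epoch iterator that rewinds whenever iteration starts."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._position = 0

    def reset(self):
        self._position = 0

    def __iter__(self):
        # Rewind on every new iteration, so an epoch that was stopped
        # midway does not leave the next epoch starting from the middle.
        self.reset()
        return self

    def __next__(self):
        if self._position >= len(self._batches):
            raise StopIteration
        batch = self._batches[self._position]
        self._position += 1
        return batch


loader = ResettingIterator(["b0", "b1", "b2"])

first_epoch = []
for batch in loader:
    first_epoch.append(batch)
    if batch == "b1":
        break  # the epoch is abandoned early, for example by a sanity check

# iter() is called again here, which triggers reset().
second_epoch = list(loader)
```

Without the reset() call in `__iter__`, the second pass in this toy example would resume after "b1" and yield only the last batch.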
Improvements
- Add little-endian and big-endian read functions for InputStreams (#4038)
- Add antialias flag to Resize (#4032)
- Reformat python files (#4026)
- Python formatting (#4035)
- Enable nose2 in Python Tests (#4033)
- Imgcodec module boilerplate (interfaces/placeholders/basic logic) (#4029)
- Remove deprecated option options.experimental_optimization.map_vectorization.enabled (#4027)
- Guided contribution tutorial (#4011)
- Fix python formatting (#3982)
- Add eager mode stateful operators (#4016)
- Disable Numba GPU op for incompatible Numba versions (#4025)
- Add missing quote marks to the DALI_AFFINITY_MASK usage example (#4020)
- Add abstract InputStream. Refactor existing FileStreams to use it. (#4019)
- Make DALI iterator call reset() when iter() is called on it (#3923)
- Add eager mode operators coverage test (#3952)
- Add ack for Numba GPU op (#3998)
- Add eager mode arithm ops (#3969)
- Reduce DALI conda package installation time (#3995)
- Add Non-silent region GPU operator (#3944)
- Workaround for nosetests in Python 3.10 (#3986)
- Numba cuda operator (#3891)
- Fix Python formatting (#3992)
- Fix Python formatting (#3988)
- Add examples of processing video that utilize per-frame operator (#3917)
- Per frame affine transforms (#3946)
- Handle partially pruned multi-output external sources (#3975)
- Dependencies update (#3979)
- Doxygen typo (#3989)
- Add per frame parameters support to brightness_contrast and color_twist families (#3937)
- Fix missing return (#3985)
- Support vector alike output for OpSpec::TryGetRepeatedArgument (#3851)
- Fix Python formatting (#3962)
- Fix and reenable optimized Cast kernel (#3976)
Bug Fixes
- Fix lack of reset when iter() is called on the DALI framework iterator (#4048)
- Use actual batch size instead of max batch size in crop_attr.h (#4045)
- Support scalars in readers.tfrecord (#4024)
- Add const char* ctor to ThreadPool (#4005)
- Remove unconditional float16 type mapping in Numba GPU op (#4013)
- Change flake8 config (#4004)
- Fix Numba CI issues (#4006)
- Fix and simplify moving mean squares CPU kernel. (#4001)
- Fix nan check and unused external source arguments in debug mode (#3990)
- Fix fn.coord_transform handling of a default matrix in variable batch case (#3958)
- Fix test_dali_tf_dataset_mnist_eager test (#3991)
- Fix test_dali_tf_dataset_mnist_eager.py and test_dali_tf_dataset_mnist_graph.py tests (#3987)
- Improve handling of "dtype" arguments in OpSchema/OpSpec (#3981)
Breaking API changes
- The shape of scalars read by the readers.tfrecord operator is now () instead of (1,).
- For cubic and linear interpolation modes, the resize operator applies the antialiasing filter by default now. The antialiasing can be turned off with the antialias flag.
Deprecated features
- The triangular interpolation for the resize operator has been deprecated as it is equivalent to linear interpolation with antialiasing on.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream. If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP. It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- In experimental debug and eager modes, GPU external source is not properly synchronized with DALI internal streams. As a workaround, the user may manually synchronize the device before returning the data from the callback.
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker.
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.16.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.16.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.16.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.16.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.16.0-5323000-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.16.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.0-5322998-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.0-5322998-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.16.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.15.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added the GPU audio resampling operator (#3884, #3914 and #3911).
- Improved the performance of the GPU fn.readers.numpy by custom GDS staging (#3894, #3905).
- Added support for video processing and per-frame (temporal) arguments to the warp_affine operator (#3879, #3900).
- Added HEVC support to the GPU frames decoder (#3896).
- Added experimental support for the eager execution of stateless operators as Python functions and readers as iterators (#3887, #3930).
- Added CUDA 11.7 support (#3906).
- Profiling improvements:
Fixed Issues
The following issues were fixed in this release:
- Added the missing device/device synchronization when copying pipeline outputs with copy_to_external (#3953).
- Fixed the buffer synchronization between default and custom stream in a multi-GPU case (#3957).
Improvements
- Fix Python formatting (#3961)
- Fix coverity issues (#3974)
- Add FindReduceGPU and FindRegionGPU kernels (#3951)
- Fix Python formatting (#3965)
- Add .style.yapf file (#3970)
- Update Optical Flow example (#3971)
- Fix per frame pass through (#3959)
- Fixing Python code formatting (#3948)
- Suppress the use of a staging buffer for nvJPEG input if it's already pinned.(#3956)
- Fix cyclic dependency import problem in fn.py in python 3.6 (#3955)
- Refactor qa test scripts (#3933)
- Change thread pool creation for eager operators to lazy (#3931)
- Fix sequence shape test (#3949)
- Expose readers as iterators in eager mode (#3930)
- Add Python linter (#3929)
- Remove redundant quote marks from the protobuf version specifier (#3945)
- Skip GDS tests when the GPU is incompatible. (#3941)
- Add sequence processing to warp operator (#3879)
- Add MovingMeanSquareGpu kernel (#3922)
- Pin protobuf to <4 for Paddle Paddle (#3940)
- Update compilation flags for the DALI TensorFlow plugin (#3943)
- Change MultiDevice to MultiGpu test suffix (#3942)
- Bump up the nvidia-tensorflow version to 20.05 in tests (#3938)
- Add FindFirstLastGPU kernel (#3932)
- Adjust PR template to ask for listing existing tests that apply (#3939)
- Pin protobuf to <4 (#3934)
- Add VFR detection (#3921)
- Fix CVE-2022-0562 in libtiff (#3925)
- Update RNN-T pipeline tests to include GPU resampling and silence detection (#3920)
- Add more NVTX ranges to the executor (#3928)
- Add HEVC support for FramesDecoderGpu (#3896)
- Add a thread name to all DALI threads (#3912)
- Add dataclasses pip package to tests deps to fix Python3.6 operator tests (#3926)
- Add fn.experimental.audio_resample GPU (#3911)
- Custom staging for GDS (#3894)
- Update the readme roadmap link to use 2022 one (#3918)
- Support specifying per-frame positional arguments in sequence processing test utility (#3901)
- Move audio resampler CPU implementation to a single compilation unit (#3914)
- Add stateless CPU eager operators (#3887)
- Add CUDA 11.7 support (#3906)
- Add VideoReaderDecoder test for missing labels (#3908)
- Add signal resampling GPU kernel (#3884)
- Optimize parameter passing for ScatterGather GPU (#3905)
- Add references to ops documentation in the tutorials (#3904)
- Enable per-frame operator on GPU (#3900)
Bug Fixes
- Fix dltensor operator tests (#3984)
- Prevent clobbering of outputs before non-blocking copy_to_external finishes. (#3953)
- Fix a bug in AccessOrder when synchronizing with a default stream on the same device, which is not the current device. (#3957)
- Workaround GDS memory leak in GDSMem tests. (#3936)
- Fix circular imports in eager mode (#3919)
- Remove intermediate Tensor and use DynamicScratchpad for op tile descriptors. (#3915)
- Add missing moving of order in TensorVector's move assignment/constructor (#3899)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP.
It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
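The key-frame requirement in the first known issue can be checked ahead of time if the key-frame positions of a video are known. The following is a minimal sketch, not DALI functionality: the function names are hypothetical, the key-frame indices are assumed to have been extracted with an external tool, and the default limit of 15 simply takes the upper end of the 10 to 15 range stated above.

```python
def max_keyframe_gap(keyframe_indices):
    """Largest distance, in frames, between consecutive key frames."""
    ordered = sorted(keyframe_indices)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=0)


def keyframes_dense_enough(keyframe_indices, limit=15):
    """True if no two consecutive key frames are more than `limit` frames apart.

    `limit` is an assumption; the notes only give a 10 to 15 frame range.
    """
    return max_keyframe_gap(keyframe_indices) <= limit


if __name__ == "__main__":
    print(keyframes_dense_enough([0, 12, 24, 36]))   # gaps of 12 frames
    print(keyframes_dense_enough([0, 30, 60]))       # gaps of 30 frames
```

The sketch only looks at gaps between key frames; it does not account for frames that follow the last key frame.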
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.15.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.15.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.15.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.15.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.15.0-5080387-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.15.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.15.0-5080390-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.15.0-5080390-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.15.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.14.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added HEVC support to the CPU frames decoder (#3885).
- Added the CPU audio resampling operator (#3840).
- Added support for video processing and per-frame (temporal) arguments to the rotate operator (#3820).
- Added support for variable batch size in the debug mode (#3799).
- Performance optimizations:
Fixed Issues
- Fixed the compatibility with TensorFlow 2.9 by adding type propagation to DALIDataset (#3875).
- Added a missing check when the number of files and labels match in the experimental video reader (#3903).
- Added a missing check when the number of samples is greater or equal to the number of shards in readers (#3856).
- Fixed scalars handling in the GPU cast operator (#3924).
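The check on the number of samples versus the number of shards can be motivated with a little arithmetic. The sketch below assumes an even, floor-based partitioning of the dataset into shards; that formula and the function names are illustrative assumptions and are not taken from these notes.

```python
def shard_range(shard_id, num_shards, dataset_size):
    """Half-open [start, end) sample range of one shard under an assumed
    floor-based even split of the dataset."""
    start = shard_id * dataset_size // num_shards
    end = (shard_id + 1) * dataset_size // num_shards
    return start, end


def empty_shards(num_shards, dataset_size):
    """IDs of shards that would receive no samples at all."""
    result = []
    for shard_id in range(num_shards):
        start, end = shard_range(shard_id, num_shards, dataset_size)
        if start == end:
            result.append(shard_id)
    return result


if __name__ == "__main__":
    print(empty_shards(num_shards=4, dataset_size=3))  # fewer samples than shards
    print(empty_shards(num_shards=4, dataset_size=4))  # one sample per shard
```

With fewer samples than shards, at least one shard ends up empty under this split, which is the situation the added check guards against.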
Improvements
- Add support for TensorFlow 2.9. (#3909)
- Remove deprecated usage of numpy types int and long (#3898)
- Add output_dtype and output_ndim arguments to Pipeline constructor (#3877)
- Add hevc support cpu frames decoder (#3885)
- Add a C API call to get the max batch size (#3890)
- Add bool to Pad supported types (#3895)
- Adjust eps in test comparing readers (#3892)
- Fix coverity issues. Do not re-throw worker thread error in the destructor. (#3886)
- Fix memory leak in C API test (#3889)
- Add tutorials references to ops docs - general section (#3869)
- Refactor video tests (#3864)
- Add NonsilentRegion GPU, implemented in terms of the CPU version (#3874)
- Add a check of the decoding progress in the VideoReader (#3858)
- Reduce libaviutils log verbosity to errors and above (#3871)
- Extend C Api to fetch the layout and ndim from External Source (#3862)
- Updated PyTorch-Lightning example with new strategy keyword for Trainer. (#3867)
- Update clang version to 14.02 (#3863)
- Improve cast operator performance (#3783)
- Update CUTLASS to v2.9.0 (#3860)
- Change the way how CUDA pub key is installed (#3866)
- Audio resampling operator for CPU backend (#3840)
- Dependencies update (#3831)
- Optimization of tiled transposition algorithm on small data types (#3730)
- Improve CropMirrorNormalize operator performance (#3771)
- Fix typo (model -> module) (#3848)
- Add a check against changing layout in ES (#3839)
- Add cpu only and variable batch size tests to per-frame operator (#3850)
- Missing f prefix on f-strings fix (#3847)
- Fix handling of arguments with trailing newlines when generating operator docs (#3841)
- Add support for sequence processing to rotate (#3820)
- Fix TF DALIDataset tests that changed layout between iterations (#3836)
- Add ndim argument to the external source operator (#3755)
- Add operators cross-referencing to data loading index (#3823)
- Features required for autoserialization in DALI Backend (#3795)
- Remove gtest RandomBBoxCropTest tests (#3822)
- Update user documentation footer copyright date (#3819)
- Add operator cross-referencing to custom operators tutorials (#3818)
- Fix the default value of resize min_filter in the documentation (#3816)
- Benchmark for Transpose operator (#3785)
- Add operator cross-referencing to data loading section (#3809)
- Update shields.io badges in README.rst. (#3815)
- Add operator cross-referencing to audio processing tutorials (#3806)
- Add operator cross-referencing to video processing tutorials (#3808)
- Add support for variable batch size and NVTX ranges in debug mode (#3799)
- Shutdown() a WorkerThread in the destructor (#3810)
- Improve the redirect (#3801)
Bug Fixes
- Add tests for operator cast. Revert to plain batched cast kernel until the optimized one is fixed. (#3927)
- Fix scalar handling in GPU cast. (#3924)
- Adds check to the experimental video reader if the number of files and labels match (#3903)
- Add type propagation implementation introduced in TF 2.8 (#3875)
- Fix corruption: Change bool to int when querying pointer attributes. (#3873)
- Make libtar and libsnd root paths customizable. (#3872)
- Add check if the number of samples is greater or equal to the number of shards in readers (#3856)
- Fix transposition kernel tests (#3859)
- Fix default argument handling in cuda_vm_resource constructor (#3857)
- Fixes test_coverage case in test_dali_cpu_only.py and test_dali_variable_batch_size.py (#3849)
- Fix rotate assertion warning (#3852)
- Make failure in curl to fail Dockerfile.build.aarch64-linux image build (#3821)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP.
It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plug-in might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have the prebuilt plug-in binary that is shipped with DALI, ensure that the compiler that is used to build TensorFlow exists on the system during the plug-in installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.14.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.14.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.14.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.14.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.14.0-4921279-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.14.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.14.0-4921308-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.14.0-4921308-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.14.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.13.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added support for per-frame (temporal) arguments to the Gaussian Blur and Laplacian operators (#3715 and #3723).
- Optimized audio decoder resampling for ARM (#3745).
- Improved the debug (immediate execution) mode:
- Added support for GPU positional arguments in the Slice operator (#3741).
- Documentation improvements:
- Split the operator documentation into separate pages (#3794).
- Added a mechanism for cross-referencing examples and operators (#3748).
- Added an FAQ section to the DALI user guide (#3761).
- Added new GTC talks (#3757).
- Added shuffling and shards handling snippets to the parallel external source examples (#3744).
Fixed Issues
- Fixed the handling of samples that exceed 2GBs in the parallel external source (#3768).
Improvements
- Add per-frame operator (#3723)
- Add support for per-frame arguments to Gaussian Blur and Laplacian operators (#3715)
- Separate the documentation pages! (#3794)
- Update zlib to 1.2.12 version (#3787)
- Trim TL0_tensorflow_plugin and TL0_python-self-test-readers-decoders tests (#3796)
- Add _schema_name attribute in fn API (#3798)
- Add resize checkerboard tests, comparing to ONNX reference precomputed data (#3792)
- Update nvJPEG2000 to 0.5.0 version (#3791)
- Fix header in parallel external source notebook (#3790)
- Update documentation link to the '22 roadmap (#3786)
- Bump Nvidia TF1 version used in tests to 22.03 (#3769)
- Add mechanism for crossreferencing examples and operators (#3748)
- Add direct operator calls in debug mode (#3734)
- Make number of samples in batch signed (#3789)
- Add debug mode benchmark (#3762)
- Fix the cuBLAS version to one compatible with nvTF 22.01 (#3781)
- Apply changes from TV sample encapsulation in NVJPEG2K (#3780)
- Ensure sample encapsulation in Tensor Vector (#3701)
- Add a TL0 test that runs on more than 1 GPU (#3772)
- Add FAQ section to the DALI documentation (#3761)
- Remove the compose operator from the fn API table (#3767)
- Add new GTC talks. Update old link (#3757)
- Update to CUDA 11.6u2 (#3764)
- RNG to use pinned memory for kernel launch args (#3765)
- Revert "Pin webdataset version to the last compatible with python 3.6 (#3746)" (#3763)
- Fix the wrong patch for CVE-2022-0907 which by mistake duplicated CVE-2022-0909 (#3760)
- Quantize GDS chunk size to 1 MB. (#3759)
- Add GDS-compatible allocator with 4k alignment. (#3754)
- Update error messaging of nvJPEG (#3756)
- Allow GPU slice arguments (#3741)
- Add filename to the error message in the numpy reader (#3753)
- Fix libtiff vulnerabilities (#3752)
- Update parallel external source notebook and include shuffling example. (#3744)
- Add supported python version classifier to DALI TF plugin setup.py (#3751)
- Vectorize audio resampling for ARM NEON. (#3745)
- Remove prints from the regular DALI execution flow (#3740)
- Pin webdataset version to the last compatible with python 3.6 (#3746)
- Align test expectations with slice implementation rounding logic (#3738)
- Update RapidJSON (#3737)
- Regenerate getting started jupyter examples (#3732)
- Improve documentation for AccessOrder wait and set_order. (#3736)
Bug Fixes
- Add missing copying of pinned prop when sharing buffer (#3797)
- Disable PES large sample test on Xavier runner (#3788)
- Fix source device in PyTorch cross-device test. (#3775)
- Fix large mini-batch handling in parallel external source (#3768)
- Fix Yolo v4 example non-fatal teardown error (#3739)
- Rework Image Decoder example (#3731)
- Check return value of a CUDA function call. (#3733)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur less frequently than that, the returned frames might be out of sync.
- The DALI TensorFlow plug-in might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have the prebuilt plug-in binary that is shipped with DALI, ensure that the compiler that is used to build TensorFlow exists on the system during the plug-in installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.13.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.13.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.13.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.13.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.13.0-4481322-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.13.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.13.0-4481327-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.13.0-4481327-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.13.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.12.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added support for the GPU-accelerated decoding of videos with a variable frame rate (experimental.readers.video) (#3668).
- Reduced the binary size (#3680 and #3682).
- Improved the TensorFlow plug-in installation even when none of the prebuilt binaries matches the exact TensorFlow version (#3720).
- Improved performance by increasing the usage of pinned memory in argument input buffers (#3728).
- Documentation improvements (#3722, #3684, and #3674).
Fixed Issues
- Fixed the TensorFlow plug-in issue that prevented it from working in the CPU-only mode (#3719).
Improvements
- [DALI TF] Try building from source when TF version doesn't match exactly. Add test step to installation script. (#3720)
- Add supported layouts to Crop, CropMirrorNormalize (#3722)
- Make output buffers for argument inputs to GPU operators pinned. (#3728)
- Bump up TensorFlow version used in tests (#3688)
- Fix coverity issues (#3679)
- Bump up CUDA to 11.6U1 (#3709)
- Add test to check if importing DALI doesn't break Torch process forking (#3669)
- Add non-owning SampleView (#3706)
- Use pinned buffers for kernel parameters and for ToContiguousGPU. (#3689)
- Update deps version for libtiff-CVE-2022-0561 fix (#3693)
- Update documentation regarding GDS being part of CUDA toolkit (#3684)
- Add VideoReaderDecoder GPU (#3668)
- Custom build: subset of file patterns for kernel and operators (#3672)
- Remove lineinfo from RelWithDebInfo DALI builds (#3680)
- Build DALI only for major arch versions (#3682)
- Remove mpiexec affinity binding in TensorFlow TL1 and TL3 RN50 test (#3681)
- Remove Scratchpad from KernelManager (#3678)
- Update dependencies (#3677)
- Use DynamicScratchpad in KernelManager. (#3670)
- Add info about fill_values being used by pad_output in crop_mirror_normalize (#3674)
Bug Fixes
- Fix CVE-2022-0626 in libtiff (#3727)
- Fix TensorFlow plugin operation without GPU (#3719)
- Synchronize at the end of BoxEncoder's constructor. (#3724)
- Fix ES debug mode test failing with missing batch (#3712)
- Add missing import nose.SkipTest in optical flow tests (#3707)
- Fix stream handling in video loader and nvdecoder. (#3705)
- Fix typos found in tensor_shape.h docs (#3695)
- Fix optical flow tests for Turing (#3685)
- Fix Slice's adaptive tiling for smaller output types (#3687)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.12.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.12.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.12.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.12.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.12.0-4144186-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.12.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.12.0-4144197-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.12.0-4144197-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.12.0.tar.gz
FFmpeg source code:
Libsndfile source code:
v1.11.1: Fix stream usage in C API (#3713)
Key Features and Enhancements
This is a patch release.
Fixed Issues
- Fixed wrong handling of input data by GPU external source in multi-GPU scenario
- Fixed wrong usage of streams in C API
Improvements
- None
Bug Fixes
- Fix multi-device GPU external source. (#3710)
- Fix constructing GPU Tensor from DLPack capsule (#3711)
- Fix stream usage in C API (#3713)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
- The experimental.readers.video operator causes a crash during the process teardown with driver versions 460 to 470.21.
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.11.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.11.1
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.11.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.11.1
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.11.1-4069476-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.11.1.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.11.1.tar.gz
FFmpeg source code:
Libsndfile source code: