0.46.0: torch.compile() support; custom ops refactor; Linux aarch64 wheels
Highlights
- Support for `torch.compile` without graph breaks for LLM.int8().
  - Compatible with PyTorch 2.4+, but PyTorch 2.6+ is recommended.
  - Experimental CPU support is included.
- Support for `torch.compile` without graph breaks for 4bit.
  - Compatible with PyTorch 2.4+ for `fullgraph=False`.
  - Requires PyTorch 2.8 nightly for `fullgraph=True`.
- We are now publishing wheels for CUDA Linux aarch64 (sbsa)!
  - Targets are Turing generation and newer: sm75, sm80, sm90, and sm100.
- PyTorch Custom Operators refactoring and integration:
  - We have refactored most of the library code to integrate better with PyTorch via the `torch.library` and custom ops APIs. This helps enable our `torch.compile` and additional hardware compatibility efforts.
  - End-users do not need to change the way they are using `bitsandbytes`.
- Unit tests have been cleaned up for increased determinism and most are now device-agnostic.
- A new nightly CI runs unit tests for CPU (Windows x86-64, Linux x86-64/aarch64) and CUDA (Linux/Windows x86-64).
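The `torch.compile` highlights above can be sketched roughly as follows. This snippet is illustrative, not taken from the release; it assumes bitsandbytes 0.46.0+, PyTorch 2.6+, and a CUDA device, and the layer sizes are arbitrary:

```python
# Hedged sketch: compiling bitsandbytes int8/4bit layers with torch.compile.
# Guarded so it degrades gracefully when the GPU stack is unavailable.
try:
    import torch
    import bitsandbytes as bnb
    HAVE_CUDA = torch.cuda.is_available()
except ImportError:
    HAVE_CUDA = False

if HAVE_CUDA:
    # LLM.int8() inference layer; has_fp16_weights=False keeps weights
    # quantized for inference.
    int8_layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).cuda().half()
    compiled_int8 = torch.compile(int8_layer)  # no graph breaks as of 0.46.0

    x = torch.randn(1, 64, dtype=torch.float16, device="cuda")
    with torch.no_grad():
        y = compiled_int8(x)

    # 4bit: fullgraph=False works on PyTorch 2.4+;
    # fullgraph=True requires a PyTorch 2.8 nightly.
    fp4_layer = bnb.nn.Linear4bit(64, 64).cuda()
    compiled_fp4 = torch.compile(fp4_layer, fullgraph=False)
    with torch.no_grad():
        y4 = compiled_fp4(x)
```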
Compatibility Changes
- Support for Python 3.8 is dropped.
- Support for PyTorch < 2.2.0 is dropped.
- CUDA 12.6 and 12.8 builds are now compatible with `manylinux_2_24` (previously `manylinux_2_34`).
- Many APIs that were previously marked as deprecated have now been removed.
- New deprecations:
  - `bnb.autograd.get_inverse_transform_indices()`
  - `bnb.autograd.undo_layout()`
  - `bnb.functional.create_quantile_map()`
  - `bnb.functional.estimate_quantiles()`
  - `bnb.functional.get_colrow_absmax()`
  - `bnb.functional.get_row_absmax()`
  - `bnb.functional.histogram_scatter_add_2d()`
What's Changed
- PyTorch Custom Operator Integration by @matthewdouglas in #1544
- Bump CUDA 12.8.0 build to CUDA 12.8.1 by @matthewdouglas in #1575
- Drop Python 3.8 support. by @matthewdouglas in #1574
- Test cleanup by @matthewdouglas in #1576
- Fix: Return tuple in get_cuda_version_tuple by @DevKimbob in #1580
- Fix torch.compile issue for LLM.int8() with threshold=0 by @matthewdouglas in #1581
- fix for missing cpu lib by @Titus-von-Koeller in #1585
- Fix #1588 - torch compatability for <=2.4 by @matthewdouglas in #1590
- Add autoloading for backend packages by @matthewdouglas in #1593
- Specify blocksize by @cyr0930 in #1586
- fix typo getitem by @ved1beta in #1597
- fix: Improve CUDA version detection and error handling by @ved1beta in #1599
- Support LLM.int8() inference with torch.compile by @matthewdouglas in #1594
- Updates for device agnosticism by @matthewdouglas in #1601
- Stop building for CUDA toolkit < 11.8 by @matthewdouglas in #1605
- fix intel cpu/xpu installation by @jiqing-feng in #1613
- Support 4bit torch.compile fullgraph with PyTorch nightly by @matthewdouglas in #1616
- Improve torch.compile support for int8 with torch>=2.8 nightly by @matthewdouglas in #1617
- Add simple op implementations for CPU by @matthewdouglas in #1602
- Set up nightly CI for unit tests by @matthewdouglas in #1619
- point to correct latest continuous release main by @winglian in #1621
- ARM runners (faster than cross compilation qemu) by @johnnynunez in #1539
- Linux aarch64 CI updates by @matthewdouglas in #1622
- Moved int8_mm_dequant from CPU to default backend by @Egor-Krivov in #1626
- Refresh content for README.md by @matthewdouglas in #1620
- C lib loading: add fallback with sensible error msg by @Titus-von-Koeller in #1615
- Switch CUDA builds to use Rocky Linux 8 container by @matthewdouglas in #1638
- Improvements to test suite by @matthewdouglas in #1636
- Additional CI runners by @matthewdouglas in #1639
- CI runner updates by @matthewdouglas in #1643
- Optimizer backwards compatibility fix by @matthewdouglas in #1647
- General cleanup & test improvements by @matthewdouglas in #1646
- Add torch.compile tests by @matthewdouglas in #1648
- Documentation Cleanup by @matthewdouglas in #1644
- simplified non_sign_bits by @ved1beta in #1649
New Contributors
- @DevKimbob made their first contribution in #1580
- @cyr0930 made their first contribution in #1586
- @ved1beta made their first contribution in #1597
- @winglian made their first contribution in #1621
- @Egor-Krivov made their first contribution in #1626
Full Changelog: 0.45.4...0.46.0