0.46.0: torch.compile() support; custom ops refactor; Linux aarch64 wheels
Highlights
- Support for `torch.compile` without graph breaks for LLM.int8().
  - Compatible with PyTorch 2.4+, but PyTorch 2.6+ is recommended.
  - Experimental CPU support is included.
- Support for `torch.compile` without graph breaks for 4bit.
  - Compatible with PyTorch 2.4+ for `fullgraph=False`.
  - Requires PyTorch 2.8 nightly for `fullgraph=True`.
- We are now publishing wheels for CUDA Linux aarch64 (sbsa)!
  - Targets are Turing generation and newer: sm75, sm80, sm90, and sm100.
- PyTorch Custom Operators refactoring and integration:
  - We have refactored most of the library code to integrate better with PyTorch via the `torch.library` and custom ops APIs. This helps enable our `torch.compile` and additional hardware compatibility efforts.
  - End-users do not need to change the way they are using `bitsandbytes`.
- Unit tests have been cleaned up for increased determinism and most are now device-agnostic.
- A new nightly CI runs unit tests for CPU (Windows x86-64, Linux x86-64/aarch64) and CUDA (Linux/Windows x86-64).
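The `torch.compile` highlights above can be sketched roughly as follows. This snippet is illustrative, not taken from the release; it assumes bitsandbytes 0.46.0+, PyTorch 2.6+, and a CUDA device, and the layer sizes are arbitrary:

```python
# Hedged sketch: compiling bitsandbytes int8/4bit layers with torch.compile.
# Guarded so it degrades gracefully when the GPU stack is unavailable.
try:
    import torch
    import bitsandbytes as bnb
    HAVE_CUDA = torch.cuda.is_available()
except ImportError:
    HAVE_CUDA = False

if HAVE_CUDA:
    # LLM.int8() inference layer; has_fp16_weights=False keeps weights
    # quantized for inference.
    int8_layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).cuda().half()
    compiled_int8 = torch.compile(int8_layer)  # no graph breaks as of 0.46.0

    x = torch.randn(1, 64, dtype=torch.float16, device="cuda")
    with torch.no_grad():
        y = compiled_int8(x)

    # 4bit: fullgraph=False works on PyTorch 2.4+;
    # fullgraph=True requires a PyTorch 2.8 nightly.
    fp4_layer = bnb.nn.Linear4bit(64, 64).cuda()
    compiled_fp4 = torch.compile(fp4_layer, fullgraph=False)
    with torch.no_grad():
        y4 = compiled_fp4(x)
```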
Compatibility Changes
- Support for Python 3.8 is dropped.
- Support for PyTorch < 2.2.0 is dropped.
- CUDA 12.6 and 12.8 builds are now compatible with `manylinux_2_24` (previously `manylinux_2_34`).
- Many APIs that were previously marked as deprecated have now been removed.
- New deprecations:
  - `bnb.autograd.get_inverse_transform_indices()`
  - `bnb.autograd.undo_layout()`
  - `bnb.functional.create_quantile_map()`
  - `bnb.functional.estimate_quantiles()`
  - `bnb.functional.get_colrow_absmax()`
  - `bnb.functional.get_row_absmax()`
  - `bnb.functional.histogram_scatter_add_2d()`
What's Changed
- PyTorch Custom Operator Integration by @matthewdouglas in #1544
- Bump CUDA 12.8.0 build to CUDA 12.8.1 by @matthewdouglas in #1575
- Drop Python 3.8 support. by @matthewdouglas in #1574
- Test cleanup by @matthewdouglas in #1576
- Fix: Return tuple in get_cuda_version_tuple by @DevKimbob in #1580
- Fix torch.compile issue for LLM.int8() with threshold=0 by @matthewdouglas in #1581
- fix for missing cpu lib by @Titus-von-Koeller in #1585
- Fix #1588 - torch compatability for <=2.4 by @matthewdouglas in #1590
- Add autoloading for backend packages by @matthewdouglas in #1593
- Specify blocksize by @cyr0930 in #1586
- fix typo getitem by @ved1beta in #1597
- fix: Improve CUDA version detection and error handling by @ved1beta in #1599
- Support LLM.int8() inference with torch.compile by @matthewdouglas in #1594
- Updates for device agnosticism by @matthewdouglas in #1601
- Stop building for CUDA toolkit < 11.8 by @matthewdouglas in #1605
- fix intel cpu/xpu installation by @jiqing-feng in #1613
- Support 4bit torch.compile fullgraph with PyTorch nightly by @matthewdouglas in #1616
- Improve torch.compile support for int8 with torch>=2.8 nightly by @matthewdouglas in #1617
- Add simple op implementations for CPU by @matthewdouglas in #1602
- Set up nightly CI for unit tests by @matthewdouglas in #1619
- point to correct latest continuous release main by @winglian in #1621
- ARM runners (faster than cross compilation qemu) by @johnnynunez in #1539
- Linux aarch64 CI updates by @matthewdouglas in #1622
- Moved int8_mm_dequant from CPU to default backend by @Egor-Krivov in #1626
- Refresh content for README.md by @matthewdouglas in #1620
- C lib loading: add fallback with sensible error msg by @Titus-von-Koeller in #1615
- Switch CUDA builds to use Rocky Linux 8 container by @matthewdouglas in #1638
- Improvements to test suite by @matthewdouglas in #1636
- Additional CI runners by @matthewdouglas in #1639
- CI runner updates by @matthewdouglas in #1643
- Optimizer backwards compatibility fix by @matthewdouglas in #1647
- General cleanup & test improvements by @matthewdouglas in #1646
- Add torch.compile tests by @matthewdouglas in #1648
- Documentation Cleanup by @matthewdouglas in #1644
- simplified non_sign_bits by @ved1beta in #1649
New Contributors
- @DevKimbob made their first contribution in #1580
- @cyr0930 made their first contribution in #1586
- @ved1beta made their first contribution in #1597
- @winglian made their first contribution in #1621
- @Egor-Krivov made their first contribution in #1626
Full Changelog: 0.45.4...0.46.0