
0.46.0: torch.compile() support; custom ops refactor; Linux aarch64 wheels

Released by @matthewdouglas on 27 May 21:27

Highlights

  • Support for torch.compile without graph breaks for LLM.int8() (see the sketch after this list).
    • Compatible with PyTorch 2.4+, but PyTorch 2.6+ is recommended.
    • Experimental CPU support is included.
  • Support for torch.compile without graph breaks for 4-bit quantization.
    • Compatible with PyTorch 2.4+ for fullgraph=False.
    • Requires a PyTorch 2.8 nightly build for fullgraph=True.
  • We are now publishing wheels for CUDA Linux aarch64 (sbsa)!
    • Targets are Turing generation and newer: sm75, sm80, sm90, and sm100.
  • PyTorch Custom Operators refactoring and integration:
    • We have refactored most of the library code to integrate better with PyTorch via the torch.library and custom ops APIs. This helps enable our torch.compile support and additional hardware compatibility efforts; a generic illustration of the mechanism follows after this list.
    • End users do not need to change the way they use bitsandbytes.
  • Unit tests have been cleaned up for increased determinism and most are now device-agnostic.
    • A new nightly CI runs unit tests for CPU (Windows x86-64, Linux x86-64/aarch64) and CUDA (Linux/Windows x86-64).
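
A minimal sketch of the new torch.compile path, assuming a CUDA device and PyTorch 2.6+; the layer shapes, inputs, and variable names are illustrative, not part of the release:

```python
import torch
import bitsandbytes as bnb

# LLM.int8() linear layer; weights are quantized to int8 on .cuda()
int8_linear = bnb.nn.Linear8bitLt(
    4096, 4096, bias=False,
    has_fp16_weights=False,  # keep int8 weights (inference-style)
    threshold=6.0,           # LLM.int8() outlier threshold
).cuda()

# 4-bit (NF4) linear layer; weights are also quantized on .cuda()
nf4_linear = bnb.nn.Linear4bit(
    4096, 4096, bias=False,
    quant_type="nf4",
    compute_dtype=torch.float16,
).cuda()

# Both compile without graph breaks; fullgraph=True for the 4-bit
# path requires a PyTorch 2.8 nightly build, per the notes above.
int8_compiled = torch.compile(int8_linear, fullgraph=True)
nf4_compiled = torch.compile(nf4_linear)

x = torch.randn(8, 4096, dtype=torch.float16, device="cuda")
with torch.inference_mode():
    y_int8 = int8_compiled(x)
    y_nf4 = nf4_compiled(x)
```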

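The custom ops refactor builds on PyTorch's torch.library API. The sketch below is a generic illustration of that mechanism with a hypothetical namespace and op name, not bitsandbytes' actual operator definitions: registering an op together with a "fake" (meta) kernel is what lets torch.compile trace through it without graph breaks.

```python
import torch

# Hypothetical library/op name, for illustration only.
@torch.library.custom_op("mylib::int8_matmul", mutates_args=())
def int8_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Stand-in eager implementation (not a real int8 kernel).
    return (a.float() @ b.float()).to(torch.int32)

# The fake kernel tells the compiler the output shape and dtype
# without running the real kernel, so tracing stays in-graph.
@int8_matmul.register_fake
def _(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a.new_empty((a.shape[0], b.shape[1]), dtype=torch.int32)

# Once registered, the op dispatches like any other torch op:
a = torch.randint(-128, 127, (4, 8), dtype=torch.int8)
b = torch.randint(-128, 127, (8, 16), dtype=torch.int8)
out = int8_matmul(a, b)
```
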
Compatibility Changes

  • Support for Python 3.8 is dropped.
  • Support for PyTorch < 2.2.0 is dropped.
  • CUDA 12.6 and 12.8 builds are now compatible with manylinux_2_24 (previously manylinux_2_34).
  • Many APIs that were previously marked as deprecated have now been removed.
  • New deprecations:
    • bnb.autograd.get_inverse_transform_indices()
    • bnb.autograd.undo_layout()
    • bnb.functional.create_quantile_map()
    • bnb.functional.estimate_quantiles()
    • bnb.functional.get_colrow_absmax()
    • bnb.functional.get_row_absmax()
    • bnb.functional.histogram_scatter_add_2d()

What's Changed

New Contributors

Full Changelog: 0.45.4...0.46.0