v0.1.7

github-actions released this 16 Nov 19:04

What's Changed

Build older CUDA wheels by @casper-hansen in #158
Exclude download of CUDA wheels by @casper-hansen in #159
New benchmarks in README by @casper-hansen in #160
Fix typo in benchmark command by @casper-hansen in #161
Yi support by @casper-hansen in #167
Make sure to delete dummy model by @casper-hansen in #180
Fix CUDA error: invalid argument by @casper-hansen in #179
New logic for passing past_key_value by @younesbelkada in #177
Reset cache on new generation by @casper-hansen in #178
Adaptive batch sizing by @casper-hansen in #181
Pass arguments to AutoConfig by @s4rduk4r in #97
Fix cache util logic by @casper-hansen in #186
Fix multi-GPU loading and inference by @casper-hansen in #190
[core] Replace QuantLlamaMLP with QuantFusedMLP by @younesbelkada in #188
[core] Add is_hf_transformers flag by @younesbelkada in #195
Fixed multi-GPU quantization by @casper-hansen in #196

Full Changelog: v0.1.6...v0.1.7

Contributors

s4rduk4r, casper-hansen, and younesbelkada
