Releases: ModelCloud/GPTQModel

GPTQModel v0.9.6

08 Jul 02:59
4fade4c

What's Changed

Added Intel/AutoRound QUANT_METHOD support for potentially higher-quality quantization, including lm_head module quantization for even more VRAM reduction. Models can be exported to FORMAT.GPTQ for maximum inference compatibility.
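
A minimal sketch of what these options might look like in a quantization config. The QUANT_METHOD/FORMAT import path and the quant_method, lm_head, and format field names are assumptions, not verified against the v0.9.6 API:

```python
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.quantization import FORMAT, QUANT_METHOD  # assumed import path

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    quant_method=QUANT_METHOD.AUTO_ROUND,  # assumed: select the Intel/AutoRound algorithm
    lm_head=True,                          # assumed: also quantize lm_head for extra VRAM savings
    format=FORMAT.GPTQ,                    # export in GPTQ format for max inference compatibility
)

# Tiny placeholder calibration set; real runs need a few hundred samples,
# and older APIs may expect pre-tokenized dicts instead of raw strings.
calibration = ["GPTQModel is an LLM quantization toolkit."]

model = GPTQModel.from_pretrained("facebook/opt-125m", quant_config)
model.quantize(calibration)
model.save_quantized("opt-125m-autoround-4bit-gptq")
```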

Full Changelog: v0.9.5...v0.9.6

GPTQModel v0.9.5

05 Jul 13:48
f0a1ee8

What's Changed

Another large update, adding support for Intel/QBits quantization and inference on CPU. CUDA kernels have been fully deprecated in favor of the better-performing Exllama (v1/v2), Marlin, and Triton kernels.
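
A sketch of CPU inference through the QBits path. The Backend enum, its QBITS member, and the backend keyword are assumptions about the v0.9.5 API, and the checkpoint id is hypothetical:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, Backend  # assumed: enum name may differ by version

model_id = "ModelCloud/opt-125m-gptq-4bit"  # hypothetical quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GPTQModel.from_quantized(model_id, device="cpu", backend=Backend.QBITS)

inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```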

Full Changelog: v0.9.4...v0.9.5

GPTQModel v0.9.4

04 Jul 05:41
527cffb

What's Changed

Full Changelog: v0.9.3...v0.9.4

GPTQModel v0.9.3

02 Jul 18:05
26b3dc0

What's Changed

New Contributors

Full Changelog: v0.9.2...v0.9.3

GPTQModel v0.9.2

29 Jun 12:15
6b3923e

What's Changed

Added auto-padding of model in/out-features for Exllama and Exllama v2. Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepSeek V2-Lite.
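
The padding itself is simple round-up-to-a-multiple arithmetic; here is a standalone illustration, where the multiple of 32 is an assumed kernel alignment rather than a value taken from the release:

```python
def pad_to_multiple(n: int, multiple: int = 32) -> int:
    # Round n up to the next multiple of `multiple`.
    return (n + multiple - 1) // multiple * multiple

assert pad_to_multiple(1000) == 1024  # odd-sized in/out-features get padded up
assert pad_to_multiple(1024) == 1024  # already-aligned sizes are unchanged
```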

New Contributors

Full Changelog: v0.9.1...v0.9.2

GPTQModel v0.9.1

27 Jun 07:30
71ed742

What's Changed

v0.9.1 is a huge release with 3 new models added, in addition to new BitBLAS support from Microsoft. Batching in .quantize() has been fixed, so the process is now more than 50% faster when batching is enabled on large amounts of calibration data. Also added quantized model sharding support, with optional hash security checking of weight files on model load.
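
A sketch of batched calibration plus sharded saving. The batch_size and max_shard_size parameter names are assumptions about the v0.9.1 API:

```python
from gptqmodel import GPTQModel, QuantizeConfig

model = GPTQModel.from_pretrained("facebook/opt-125m", QuantizeConfig(bits=4, group_size=128))

calibration = ["GPTQModel supports batched calibration."] * 256  # placeholder data
model.quantize(calibration, batch_size=8)  # assumed: process 8 calibration samples per step
model.save_quantized("opt-125m-4bit", max_shard_size="2GB")  # assumed: split weights into shards
```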

New Contributors

Full Changelog: v0.9.0...v0.9.1

GPTQModel v0.9.0

20 Jun 17:50
6bf62cf

What's Changed (first release since the AutoGPTQ fork)

4 new models, plus sym=False asymmetric quantization and lm_head quantized inference support.
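
For reference, a minimal config sketch enabling asymmetric quantization; sym is a QuantizeConfig field, while the other values here simply mirror the earlier examples:

```python
from gptqmodel import QuantizeConfig

# sym=False selects asymmetric quantization (per-group zero-points instead of
# a fixed symmetric zero), which can recover accuracy on some models.
config = QuantizeConfig(bits=4, group_size=128, sym=False)
```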

Full Changelog: https://github.com/ModelCloud/GPTQModel/commits/v0.9.0