@IST-DASLab

IST Austria Distributed Algorithms and Systems Lab

Popular repositories

  1. gptq Public

    Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

    Python · 2.1k stars · 171 forks

  2. marlin Public

    FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

    Python · 831 stars · 67 forks

  3. sparsegpt Public

    Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

    Python · 800 stars · 105 forks

  4. PanzaMail Public

    Python · 291 stars · 19 forks

  5. qmoe Public

    Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

    Python · 275 stars · 21 forks

  6. QUIK Public

    Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024).

    C++ · 182 stars · 13 forks

Repositories

Showing 10 of 61 repositories
